R Language Tutorial => Weighting

Example

Sometimes we want the model to give more weight to some data points or examples than others. This is possible by specifying the weight for the input data while learning the model. There are generally two kinds of scenarios where we might use non-uniform weights over the examples:

Analytic Weights: Reflect the different levels of precision of different observations. For example, if analyzing data where each observation is the average results from a geographic area, the analytic weight is proportional to the inverse of the estimated variance. Useful when dealing with averages in data by providing a proportional weight given the number of observations. Source

Sampling Weights (Inverse Probability Weights - IPW): a statistical technique for calculating statistics standardized to a population different from that in which the data was collected. Study designs with a disparate sampling population and population of target inference (target population) are common in application. Useful when dealing with data that have missing values. Source

The lm() function does analytic weighting. For sampling weights the survey package is used to build a survey design object and run svyglm(). By default, the survey package uses sampling weights. (NOTE: lm(), and svyglm() with family gaussian() will all produce the same point estimates, because they both solve for the coefficients by minimizing the weighted least squares. They differ in how standard errors are calculated.)

Test Data

data <- structure(list(lexptot = c(9.1595012302023, 9.86330744180814, 
8.92372556833205, 8.58202430280175, 10.1133857229336), progvillm = c(1L, 
1L, 1L, 1L, 0L), sexhead = c(1L, 1L, 0L, 1L, 1L), agehead = c(79L, 
43L, 52L, 48L, 35L), weight = c(1.04273509979248, 1.01139605045319, 
1.01139605045319, 1.01139605045319, 0.76305216550827)), .Names = c("lexptot", 
"progvillm", "sexhead", "agehead", "weight"), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L))

Analytic Weights

lm.analytic <- lm(lexptot ~ progvillm + sexhead + agehead, 
                            data = data, weight = weight)
summary(lm.analytic)

Output

Call:
lm(formula = lexptot ~ progvillm + sexhead + agehead, data = data, 
    weights = weight)

Weighted Residuals:
         1          2          3          4          5 
 9.249e-02  5.823e-01  0.000e+00 -6.762e-01 -1.527e-16

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.016054   1.744293   5.742    0.110
progvillm   -0.781204   1.344974  -0.581    0.665
sexhead      0.306742   1.040625   0.295    0.818
agehead     -0.005983   0.032024  -0.187    0.882

Residual standard error: 0.8971 on 1 degrees of freedom
Multiple R-squared:  0.467, Adjusted R-squared:  -1.132 
F-statistic: 0.2921 on 3 and 1 DF,  p-value: 0.8386

Sampling Weights (IPW)

library(survey)
data$X <- 1:nrow(data)             # Create unique id
        
# Build survey design object with unique id, ipw, and data.frame
des1 <- svydesign(id = ~X,  weights = ~weight, data = data)
        
# Run glm with survey design object
prog.lm <- svyglm(lexptot ~ progvillm + sexhead + agehead, design=des1)

Output

Call:
svyglm(formula = lexptot ~ progvillm + sexhead + agehead, design = des1)

Survey design:
svydesign(id = ~X, weights = ~weight, data = data)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) 10.016054   0.183942  54.452   0.0117 *
progvillm   -0.781204   0.640372  -1.220   0.4371  
sexhead      0.306742   0.397089   0.772   0.5813  
agehead     -0.005983   0.014747  -0.406   0.7546  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.2078647)

Number of Fisher Scoring iterations: 2

PDF - Download R Language for free

Previous Next

R Language

Fastest Entity Framework Extensions

Example

Got any R Language Question?

R Language

R Language Linear Models (Regression) Weighting

Fastest Entity Framework Extensions

Example

Got any R Language Question?