Sometimes we want the model to give more weight to some data points or examples than others. This is possible by specifying the weight for the input data while learning the model. There are generally two kinds of scenarios where we might use non-uniform weights over the examples:
The lm()
function does analytic weighting. For sampling weights the survey
package is used to build a survey design object and run svyglm()
. By default, the survey
package uses sampling weights. (NOTE: lm()
, and svyglm()
with family gaussian()
will all produce the same point estimates, because they both solve for the coefficients by minimizing the weighted least squares. They differ in how standard errors are calculated.)
Test Data
data <- structure(list(lexptot = c(9.1595012302023, 9.86330744180814,
8.92372556833205, 8.58202430280175, 10.1133857229336), progvillm = c(1L,
1L, 1L, 1L, 0L), sexhead = c(1L, 1L, 0L, 1L, 1L), agehead = c(79L,
43L, 52L, 48L, 35L), weight = c(1.04273509979248, 1.01139605045319,
1.01139605045319, 1.01139605045319, 0.76305216550827)), .Names = c("lexptot",
"progvillm", "sexhead", "agehead", "weight"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
Analytic Weights
lm.analytic <- lm(lexptot ~ progvillm + sexhead + agehead,
data = data, weight = weight)
summary(lm.analytic)
Output
Call:
lm(formula = lexptot ~ progvillm + sexhead + agehead, data = data,
weights = weight)
Weighted Residuals:
1 2 3 4 5
9.249e-02 5.823e-01 0.000e+00 -6.762e-01 -1.527e-16
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.016054 1.744293 5.742 0.110
progvillm -0.781204 1.344974 -0.581 0.665
sexhead 0.306742 1.040625 0.295 0.818
agehead -0.005983 0.032024 -0.187 0.882
Residual standard error: 0.8971 on 1 degrees of freedom
Multiple R-squared: 0.467, Adjusted R-squared: -1.132
F-statistic: 0.2921 on 3 and 1 DF, p-value: 0.8386
Sampling Weights (IPW)
library(survey)
data$X <- 1:nrow(data) # Create unique id
# Build survey design object with unique id, ipw, and data.frame
des1 <- svydesign(id = ~X, weights = ~weight, data = data)
# Run glm with survey design object
prog.lm <- svyglm(lexptot ~ progvillm + sexhead + agehead, design=des1)
Output
Call:
svyglm(formula = lexptot ~ progvillm + sexhead + agehead, design = des1)
Survey design:
svydesign(id = ~X, weights = ~weight, data = data)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.016054 0.183942 54.452 0.0117 *
progvillm -0.781204 0.640372 -1.220 0.4371
sexhead 0.306742 0.397089 0.772 0.5813
agehead -0.005983 0.014747 -0.406 0.7546
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.2078647)
Number of Fisher Scoring iterations: 2