# Can You Say “Heteroscedasticity” 3 Times Fast?

**Mad (Data) Scientist**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Most books on regression analysis assume homoscedasticity, the situation in which Var(Y | X = t), for a response variable Y and vector of predictor variables X, is the same for all t. Yet, needless to say, almost all data in real life is heteroscedastic. For Y = human weight and X = height, say, we know that the assumption of homoscedasticity can’t be true, even approximately.

Typical books discuss assessment of that assumption using residual plots and the like, but then leave it at that. Rather few books mention Eicker-White theory, which develops valid asymptotic inference for heteroscedastic data. E-W is really nice, and guess what! — it’s long been available in R, in the **sandwich** and **car** packages on CRAN. (Note that the latter package is intended for use with a book that does cover this topic, J. Fox and S. Weisberg, *An R Companion to Applied Regression*, Second Edition, Sage, 2011.) Then instead of using R’s standard **vcov()** function to obtain estimated variances and covariances of the estimates of the β_{i}, we use **vcovHC()** and **hccm()**, respectively.

One can make a similar derivation for nonlinear regression, which is available as the function **nlshc()** in my **regtools** package. The package will be associated with my own book on regression, currently in progress. (The package is currently in progress as well, but already contains several useful functions.) The rest of this post is adapted from an example in the book.

The model I chose for this simple example was E(Y | X = t) = 1 / t’β, where the distributions of the quantities can be seen in the following simulation code:

sim <- function(n,nreps) { b <- 1:2 res <- replicate(nreps,{ x <- matrix(rexp(2*n),ncol=2) meany <- 1 / (x %*% b) y <- meany + (runif(n) - 0.5) * meany xy <- cbind(x,y) xy <- data.frame(xy) nlout <- nls(X3 ~ 1 / (b1*X1+b2*X2), data=xy,start=list(b1 = 1,b2=1)) b <- coef(nlout) vc <- vcov(nlout) vchc <- nlsvcovhc(nlout) z1 <- (b[1] - 1) / sqrt(vc[1,1]) z2 <- (b[1] - 1) / sqrt(vchc[1,1]) c(z1,z2) }) print(mean(res[1,] < 1.28)) print(mean(res[2,] < 1.28)) }

The results were striking:

> sim(250,2500) [1] 0.6188 [1] 0.9096

Since the true value should be 0.90 (1.28 is the 0.90 quantile of N(0,1)), the Eicker-White method is doing an outstanding job here — and R’s built-in nonlinear regression function, **nls()** is not. The latter function’s reported standard errors are way, way off the mark.

**leave a comment**for the author, please follow the link and comment on their blog:

**Mad (Data) Scientist**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.