R - Controlling the value (TRUE or FALSE) of dummy variables in interaction terms when using lm()
When I estimate a model that has an interaction between two variables that don't enter the model as standalone variables, and when one of these variables is a dummy (class "logical") variable, R "flips the sign" of the dummy variable. That is, it reports the estimate of the coefficient on the interaction term when the dummy is FALSE, not when it is TRUE. Here is an example:
    data(trees)
    trees$dheight <- trees$Height > 76
    trees$cgirth <- trees$Girth - mean(trees$Girth)
    lm(Volume ~ Girth + Girth:dheight, data = trees)   # estimate is for Girth:dheightTRUE
    lm(Volume ~ Girth + cgirth:dheight, data = trees)  # estimate is for cgirth:dheightFALSE

Why does the regression in the last line produce an estimate of the interaction when dheight is FALSE rather than TRUE? (I would like R to report the estimate when dheight is TRUE.)
This is not a big problem, but I would like to better understand why R is doing what it's doing. I know about relevel() and contrasts(), but I can't see that they would make a difference here.
dheight is a logical. Within the model it is coerced to a factor, and the levels are sorted lexicographically (i.e. F before T).
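To see the coercion directly, here is a minimal sketch (the object names lvl and X are only for illustration) that prints the factor levels R derives from the logical, and the columns it builds for the interaction-only specification from the question:

```r
# Rebuild the variables from the question.
data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth <- trees$Girth - mean(trees$Girth)

# A logical coerced to a factor gets the levels FALSE, then TRUE.
lvl <- levels(factor(trees$dheight))
print(lvl)

# Design-matrix columns for Girth + cgirth:dheight.
X <- model.matrix(~ Girth + cgirth:dheight, data = trees)
print(colnames(X))
```

Both levels of dheight get their own interaction column here, with the FALSE column ahead of the TRUE column.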
As noted in @hongooi's answer, you can't estimate 4 parameters, so R will fit the terms in the order they appear (FALSE before TRUE).
If you want to force R to fit the TRUE value first, you could fit the model with !dheight:

    lm(formula = Volume ~ Girth + cgirth:!dheight, data = trees)

Note that !dheightFALSE is the equivalent of dheightTRUE.

You will note that in this simple case, changing the sign on the coefficient doesn't matter for the model fit.
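A quick way to check this claim, as a sketch (fit_false and fit_true are names I made up for the two fits): the fitted values should agree, and the one estimated interaction coefficient should differ only in sign.

```r
# Rebuild the variables from the question.
data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth <- trees$Girth - mean(trees$Girth)

# Original specification: the FALSE interaction is estimated, TRUE is NA.
fit_false <- lm(Volume ~ Girth + cgirth:dheight, data = trees)
# Negated dummy: the interaction for dheight == TRUE is estimated instead.
fit_true <- lm(Volume ~ Girth + cgirth:!dheight, data = trees)

# Same fitted values, so the two parameterizations describe the same fit.
same_fit <- isTRUE(all.equal(fitted(fit_false), fitted(fit_true)))
print(same_fit)

# The third coefficient is the estimated interaction in each model.
print(unname(coef(fit_false)[3]))
print(unname(coef(fit_true)[3]))
```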
Edit: a far better approach

R can recognize that cgirth and Girth are collinear, therefore we can fit the following, remembering that a/b expands to a + a:b.

    lm(formula = Volume ~ Girth + cgirth/dheight, data = trees)

    Coefficients:
           (Intercept)               Girth              cgirth  cgirth:dheightTRUE
               -27.198               4.251                  NA               1.286

This provides coefficients with easy-to-interpret names, and R will sensibly fail to return a coefficient for cgirth.
R can tell that Girth and cgirth are collinear when both are in the model as "main effect" or standalone terms.

There is no way R should be able to tell, when fitting Girth + cgirth:dheight, that cgirth and Girth are collinear and that, given dheight is a logical, you want the cgirth:dheightTRUE coefficient to be fit. (You could write your own formula parser if you wanted.)
Another approach to fit the model you wanted, without any collinear terms, would be to use

    lm(formula = Volume ~ Girth + I(cgirth*dheight), data = trees)

which coerces dheight to numeric (TRUE becomes 1).
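As a sanity check on the I() version, here is a small sketch (fit_I and fit_nested are illustrative names) comparing it with the cgirth/dheight formulation. The product term should get the same estimate as cgirth:dheightTRUE, and no coefficient should come back as NA.

```r
# Rebuild the variables from the question.
data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth <- trees$Girth - mean(trees$Girth)

# Product term: dheight is coerced to 0/1 inside I().
fit_I <- lm(Volume ~ Girth + I(cgirth * dheight), data = trees)
# Nested formulation: expands to cgirth + cgirth:dheight.
fit_nested <- lm(Volume ~ Girth + cgirth / dheight, data = trees)

print(coef(fit_I))
print(coef(fit_nested))
```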
Edit: to labor the point.

When you fit ~Girth + Girth:dheight, what you are saying is that there is a main effect for Girth plus an adjustment for dheight. R considers the first level of a factor to be the reference level. The slope for dheightFALSE is the value for Girth, and then you have the adjustment for when dheight == TRUE (Girth:dheightTRUE).
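To make the reference-level reading concrete, here is a short sketch (slope_false and slope_true are illustrative names) that recovers the Girth slope within each group from the fitted coefficients:

```r
# Rebuild the dummy from the question.
data(trees)
trees$dheight <- trees$Height > 76

fit <- lm(Volume ~ Girth + Girth:dheight, data = trees)
b <- coef(fit)

# Slope for the reference level (dheight == FALSE) is the Girth coefficient.
slope_false <- unname(b["Girth"])
# Slope for dheight == TRUE adds the interaction adjustment.
slope_true <- unname(b["Girth"] + b["Girth:dheightTRUE"])

print(c(slope_false = slope_false, slope_true = slope_true))
```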
When you fit ~Girth + cgirth:dheight, R does not have a mind-reading parser that can tell that, because cgirth and Girth are collinear, you would like the interaction reported for the second level of dheight (TRUE), with FALSE treated as the reference level.
Imagine if you had a variable that was totally unrelated to Girth, e.g.

    set.seed(1)
    trees$cg <- runif(nrow(trees))

Then when you fit Girth + cg:dheight, 4 parameters will be estimated:

    lm(formula = Volume ~ Girth + cg:dheight, data = trees)

    Call:
    lm(formula = Volume ~ Girth + cg:dheight, data = trees)

    Coefficients:
        (Intercept)            Girth  cg:dheightFALSE   cg:dheightTRUE
          -31.79645          4.79435         -5.92168          0.09578

which is sensible.
When R processes Girth + cgirth:dheight, it will expand it out (with the first level of the factor first) to 1 + Girth + cgirth:dheightFALSE + cgirth:dheightTRUE, work out that it can't estimate all 4 parameters, and estimate only the first 3.
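The rank deficiency can be inspected directly. A minimal sketch (n_cols, rank_X and dropped are names chosen for illustration): the design matrix has four columns but only rank three, and the coefficient lm() leaves as NA is the last one in the expansion.

```r
# Rebuild the variables from the question.
data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth <- trees$Girth - mean(trees$Girth)

# Four columns are generated for the design matrix...
X <- model.matrix(~ Girth + cgirth:dheight, data = trees)
n_cols <- ncol(X)
# ...but they span only three dimensions, since the two interaction
# columns sum to cgirth, which is Girth minus a constant.
rank_X <- qr(X)$rank
print(c(columns = n_cols, rank = rank_X))

# lm() reports NA for the coefficient it could not estimate.
fit <- lm(Volume ~ Girth + cgirth:dheight, data = trees)
dropped <- names(coef(fit))[is.na(coef(fit))]
print(dropped)
```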