r - Controlling the value (TRUE or FALSE) of dummy variables in interaction terms when using lm()


When I estimate a model that has an interaction between two variables that don't enter the model as standalone variables, and one of these variables is a dummy (class "logical") variable, R "flips the sign" of the dummy variable. That is, it reports an estimate of the coefficient on the interaction term for when the dummy is FALSE, not for when it is TRUE. Here is an example:

    data(trees)
    trees$dheight <- trees$Height > 76                  # logical dummy
    trees$cgirth  <- trees$Girth - mean(trees$Girth)    # centered girth
    lm(Volume ~ Girth +  Girth:dheight, data = trees)   # estimate is for  Girth:dheightTRUE
    lm(Volume ~ Girth + cgirth:dheight, data = trees)   # estimate is for cgirth:dheightFALSE

Why does the regression in the last line produce an estimate for the interaction in which dheight is FALSE rather than TRUE? (I would like R to report the estimate for when dheight is TRUE.)

This is not a big problem, but I would like to better understand why R is doing what it's doing. I know about relevel() and contrasts(), but I can't see that they would make a difference here.

dheight is logical. Within the model it is coerced to a factor, and the levels are sorted lexicographically (i.e. F before T).

As noted in @hongooi's answer, you can't estimate 4 parameters, so R fits the terms in the order in which they appear (FALSE before TRUE).

If you want to force R to fit the TRUE value first, you can fit the model with !dheight:
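The level ordering can be checked directly. This is a minimal sketch in base R; calling factor() by hand is only a stand-in for the coercion that happens inside the model, and dheight is the variable defined in the question.

```r
data(trees)
trees$dheight <- trees$Height > 76

# Converting the logical to a factor sorts the levels, so FALSE comes first
levels(factor(trees$dheight))
# [1] "FALSE" "TRUE"
```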

    lm(formula = Volume ~ Girth + cgirth:!dheight, data = trees)

Note that !dheightFALSE is the equivalent of dheightTRUE.

You will also note that in this simple case the change of sign on the coefficient doesn't matter to the model fit.
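To see that the two parameterizations describe the same model, one can compare them side by side. The sketch below is illustrative only; the object names fit_false and fit_true are made up here, and the interaction coefficient is picked by position (third) rather than by name.

```r
data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth  <- trees$Girth - mean(trees$Girth)

fit_false <- lm(Volume ~ Girth + cgirth:dheight,  data = trees)
fit_true  <- lm(Volume ~ Girth + cgirth:!dheight, data = trees)

# Same fitted values: the column space of the design is unchanged
all.equal(fitted(fit_false), fitted(fit_true))

# The estimated interaction coefficient only changes sign
all.equal(unname(coef(fit_false)[3]), -unname(coef(fit_true)[3]))
```

The sign flip follows from cgirth:dheightTRUE = cgirth - cgirth:dheightFALSE, where cgirth itself is a linear combination of the intercept and Girth.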


EDIT: a far better approach

R can recognize that cgirth and Girth are collinear when both are standalone terms, so we can fit the model by remembering that a/b expands to a + a:b:

    lm(formula = Volume ~ Girth + cgirth/dheight, data = trees)

    Coefficients:
           (Intercept)               Girth              cgirth  cgirth:dheightTRUE
               -27.198               4.251                  NA               1.286

This gives coefficients with easy-to-interpret names, and R sensibly declines to return a coefficient for cgirth.


R can tell that Girth and cgirth are collinear when both are in the model as "main effect" (standalone) terms.

There is no way R should be able to tell, when fitting Girth + cgirth:dheight, that cgirth and Girth are collinear and, given that dheight is logical, that we want cgirth:dheightTRUE to be the coefficient that is fit. (You could write your own formula parser if you wanted.)
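The a/b expansion and the dropped cgirth term can both be inspected. This is a small sketch using terms() on a formula with placeholder names a and b, followed by the nested fit on trees; fit_nested is a name chosen here for illustration.

```r
# The nesting operator expands to a main effect plus an interaction
attr(terms(~ a / b), "term.labels")
# [1] "a"   "a:b"

data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth  <- trees$Girth - mean(trees$Girth)

fit_nested <- lm(Volume ~ Girth + cgirth / dheight, data = trees)

# Coefficients reported as NA are the aliased (collinear) terms
names(which(is.na(coef(fit_nested))))
# [1] "cgirth"
```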

Another approach that fits the model you wanted, without any collinear terms, is to use

    lm(formula = Volume ~ Girth + I(cgirth*dheight), data = trees)

which coerces dheight to numeric (TRUE becomes 1, FALSE becomes 0).
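As a rough check, the I() version should give the same slope adjustment as the cgirth/dheight fit, with no aliased term. The names fit_i and fit_nested below are illustrative, and the I() coefficient is taken by position to avoid depending on how its name is deparsed.

```r
data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth  <- trees$Girth - mean(trees$Girth)

fit_i      <- lm(Volume ~ Girth + I(cgirth * dheight), data = trees)
fit_nested <- lm(Volume ~ Girth + cgirth / dheight,    data = trees)

# Only three columns in the design, so nothing is dropped
anyNA(coef(fit_i))

# The product term matches the cgirth:dheightTRUE estimate
all.equal(unname(coef(fit_i)[3]),
          unname(coef(fit_nested)["cgirth:dheightTRUE"]))
```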


EDIT to labor the point.

When you fit ~Girth + Girth:dheight,

what you are saying is that there is a main effect for Girth plus adjustments for dheight. R considers the first level of a factor to be the reference level. The slope for dheight == FALSE is the coefficient on Girth, and then you have the adjustment for when dheight == TRUE (Girth:dheightTRUE).

When you fit ~Girth + cgirth:dheight, R does not have a mind-reading parser that can tell that cgirth and Girth are collinear when you fit the interaction of the two terms, so the result looks as if the second level of dheight were the reference level.

Imagine you had a variable totally unrelated to Girth, e.g.

    set.seed(1)
    trees$cg <- runif(nrow(trees))

Then when you fit Girth + cg:dheight, all 4 parameters are estimated:

    lm(formula = Volume ~ Girth + cg:dheight, data = trees)

    Call:
    lm(formula = Volume ~ Girth + cg:dheight, data = trees)

    Coefficients:
        (Intercept)            Girth  cg:dheightFALSE   cg:dheightTRUE
          -31.79645          4.79435         -5.92168          0.09578

which is sensible.

When R processes Girth + cgirth:dheight, it expands it (with the first level of the factor first) to 1 + Girth + cgirth:dheightFALSE + cgirth:dheightTRUE, works out that it can't estimate 4 parameters, and estimates just the first 3.
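The expansion and the rank deficiency can be inspected with model.matrix(). This is a minimal sketch; mm and fit are names chosen here for illustration.

```r
data(trees)
trees$dheight <- trees$Height > 76
trees$cgirth  <- trees$Girth - mean(trees$Girth)

# Design matrix R builds for the formula: four columns, FALSE level first
mm <- model.matrix(Volume ~ Girth + cgirth:dheight, data = trees)
colnames(mm)

# Only three of the four columns are linearly independent
qr(mm)$rank

# lm() keeps the first three columns and reports NA for the last one
fit <- lm(Volume ~ Girth + cgirth:dheight, data = trees)
names(which(is.na(coef(fit))))
```

The fourth column is redundant because the two interaction columns sum to cgirth, which is Girth minus a constant.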

