variables - multicollinearity test in R


I'm working with a large data set and suspect that I have multicollinearity issues, because the variance-covariance matrix has a negative eigenvalue (small compared with the rest), and the ratio of max eigenvalue to min eigenvalue is > 3000.

My question is: is there a test or routine in R to identify which variables are redundant? (I don't work with regression models.) I could fit linear regressions on pairs of variables, plot graphs, or use the pairs(data) command, but I would appreciate numerical tests: I have 200 variables, and graphs aren't much decision support in this matter.

If I understood correctly, this is what you are looking for:

If you have in mind a correlation threshold that you want to use to exclude variables, you can try the following.

In the example here I'm generating a random matrix:

> set.seed(3)
> data <- data.frame(v1=rnorm(20),v2=rnorm(20),v3=rnorm(20),v4=rnorm(20),v5=rnorm(20))
> cor.mat <- cor(data)
> diag(cor.mat)=0

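This is the correlation matrix of the variables v1, v2, v3, v4, v5 (the diagonal was set to 0 above so that each variable's correlation with itself is ignored):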

> cor.mat
            v1          v2         v3         v4         v5
v1  0.00000000 -0.14464568 0.09047839 -0.1200863 -0.1110384
v2 -0.14464568  0.00000000 0.04340839  0.1929009 -0.4354569
v3  0.09047839  0.04340839 0.00000000  0.1185795  0.1760463
v4 -0.12008631  0.19290090 0.11857953  0.0000000 -0.2080077
v5 -0.11103839 -0.43545694 0.17604633 -0.2080077  0.0000000

Now, in the if statement of the following loop, substitute the threshold value that you want to use to select redundant variables. (Here I use 0.4, not because it indicates redundancy, but because it is close to the highest absolute value that came out of the random matrix.)

> high_cor = vector()
> for (i in 1:nrow(cor.mat)){
+     for (j in 1:ncol(cor.mat)){
+        if (abs(cor.mat[i,j]) >= 0.4) {high_cor[i]=paste(rownames(cor.mat)[i], "-",
+                                                         colnames(cor.mat)[j])}
+     }
+ }
> high_cor <- high_cor[!is.na(high_cor)]

In this case the only variables whose absolute correlation exceeds 0.4 are v2 and v5:

> high_cor
[1] "v2 - v5" "v5 - v2"
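With 200 variables it may be handier to get each pair only once, together with its correlation. The following is a minimal base-R sketch of the same thresholding idea, not a different test: it looks only at the upper triangle of the matrix, so "v2 - v5" and "v5 - v2" collapse into one row, and a variable with several partners keeps all of them. The names `threshold`, `idx` and `high_pairs` are made up for this illustration, and 0.4 is again just the example cut-off.

```r
# Rebuild the example data so the sketch runs on its own.
set.seed(3)
data <- data.frame(v1 = rnorm(20), v2 = rnorm(20), v3 = rnorm(20),
                   v4 = rnorm(20), v5 = rnorm(20))
cor.mat <- cor(data)

threshold <- 0.4  # assumed cut-off; replace with your own

# upper.tri() excludes the diagonal and the mirrored lower half,
# so every pair is reported at most once.
idx <- which(abs(cor.mat) >= threshold & upper.tri(cor.mat), arr.ind = TRUE)

# One row per pair: the two variable names and their correlation.
high_pairs <- data.frame(
  var1 = rownames(cor.mat)[idx[, "row"]],
  var2 = colnames(cor.mat)[idx[, "col"]],
  r    = cor.mat[idx],
  stringsAsFactors = FALSE
)

# Strongest correlations first.
high_pairs <- high_pairs[order(-abs(high_pairs$r)), ]
print(high_pairs)
```

Keep in mind that pairwise correlations only catch redundancy between two variables at a time; a variable that is close to a linear combination of several others can pass this check unnoticed.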

Hope this helps.

