How to turn variable names into factors in a data frame in R


Say I have a data frame containing time-series data, where one column is the time index and the remaining columns contain different data streams, named descriptively, as in the following example:

    temps <- data.frame(matrix(1:20, nrow = 2, ncol = 10))
    names(temps) <- c("flr1_dirn_areaa", "flr1_dirs_areaa", "flr1_dirn_areab", "flr1_dirs_areab",
                      "flr2_dirn_areaa", "flr2_dirs_areaa", "flr2_dirn_areab", "flr2_dirs_areab",
                      "flr3_dirn_areaa", "flr3_dirs_areaa")
    temps$index <- as.Date("2013-07-01") + 0:1   # two consecutive days
    temps
      flr1_dirn_areaa flr1_dirs_areaa ...      index
    1               1               3 ... 2013-07-01
    2               2               4 ... 2013-07-02

Now I want to prepare the data frame for plotting with ggplot2, and I want to include 3 factors: flr, dir, and area.

I can achieve this for the simple example as follows:

    library(reshape2)
    temps.m <- melt(temps, "index")
    temps.m$flr  <- factor(rep(1:3, c(8, 8, 4)))
    temps.m$dir  <- factor(rep(c("n", "s"), each = 2, length.out = 20))
    temps.m$area <- factor(rep(c("a", "b"), each = 4, length.out = 20))
    temps.m
            index        variable value flr dir area
    1  2013-07-01 flr1_dirn_areaa     1   1   n    a
    2  2013-07-02 flr1_dirn_areaa     2   1   n    a
    3  2013-07-01 flr1_dirs_areaa     3   1   s    a
    4  2013-07-02 flr1_dirs_areaa     4   1   s    a
    5  2013-07-01 flr1_dirn_areab     5   1   n    b
    6  2013-07-02 flr1_dirn_areab     6   1   n    b
    7  2013-07-01 flr1_dirs_areab     7   1   s    b
    8  2013-07-02 flr1_dirs_areab     8   1   s    b
    9  2013-07-01 flr2_dirn_areaa     9   2   n    a
    10 2013-07-02 flr2_dirn_areaa    10   2   n    a
    11 2013-07-01 flr2_dirs_areaa    11   2   s    a
    12 2013-07-02 flr2_dirs_areaa    12   2   s    a
    13 2013-07-01 flr2_dirn_areab    13   2   n    b
    14 2013-07-02 flr2_dirn_areab    14   2   n    b
    15 2013-07-01 flr2_dirs_areab    15   2   s    b
    16 2013-07-02 flr2_dirs_areab    16   2   s    b
    17 2013-07-01 flr3_dirn_areaa    17   3   n    a
    18 2013-07-02 flr3_dirn_areaa    18   3   n    a
    19 2013-07-01 flr3_dirs_areaa    19   3   s    a
    20 2013-07-02 flr3_dirs_areaa    20   3   s    a

In reality, I have data streams (columns) of varying lengths (each comes from its own file and has missing data), and more than 3 factors are encoded in the column (file) names, so this simple method of applying factors won't work. I need something more robust, and I'm inclined to parse the variable names into the different factors and use them to populate the factor columns of the melted data frame.
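A base-R sketch of that plan: melt the wide table by hand, then use `strcapture()` (from the utils package) to pull one capture group per factor out of each variable name. It assumes every stream column follows the `flr<digits>_dir<letter>_area<letter>` pattern; a fourth factor would mean one more capture group in `pattern` and one more column in `proto`. Names that don't match come back as `NA` rather than being silently mis-split.

```r
# Sketch, assuming column names of the form flr<digits>_dir<letter>_area<letter>
temps <- data.frame(matrix(1:20, nrow = 2, ncol = 10))
names(temps) <- c("flr1_dirn_areaa", "flr1_dirs_areaa", "flr1_dirn_areab",
                  "flr1_dirs_areab", "flr2_dirn_areaa", "flr2_dirs_areaa",
                  "flr2_dirn_areab", "flr2_dirs_areab", "flr3_dirn_areaa",
                  "flr3_dirs_areaa")
temps$index <- as.Date("2013-07-01") + 0:1

# Melt by hand: one row per (index, variable) pair, column by column
value_cols <- setdiff(names(temps), "index")
long <- data.frame(
  index    = rep(temps$index, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(temps)),
  value    = unlist(temps[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# One capture group per factor; proto supplies the output column names and types
pattern <- "^flr([0-9]+)_dir([a-z])_area([a-z])$"
proto   <- data.frame(flr = character(), dir = character(), area = character(),
                      stringsAsFactors = FALSE)
parsed  <- strcapture(pattern, long$variable, proto)

# Convert the parsed columns to factors and attach them to the long data
parsed[] <- lapply(parsed, factor)
long <- cbind(long, parsed)

# Drop gaps (NA values) so each stream keeps only its observed points
long <- long[!is.na(long$value), ]
```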

My end goal is a plot like this:

    library(ggplot2)
    ggplot(temps.m, aes(x = index, y = value, color = area, linetype = dir)) +
      geom_line() +
      facet_grid(flr ~ .)

[Figure: example of the plot, with multiple factors]

I imagine reshape, reshape2, plyr, or some other package can do this in 1 or 2 statements, but I struggle with melt/cast/ddply and the rest of them. Any suggestions?
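For the "1 or 2 statements" part, one option outside reshape2 and plyr is tidyr's `pivot_longer()`, whose `names_pattern` argument splits each column name into several key columns while reshaping, and whose `values_drop_na` drops the gaps. This sketch assumes tidyr (1.0 or later) is installed and the same `flr…_dir…_area…` naming scheme:

```r
library(tidyr)

temps <- data.frame(matrix(1:20, nrow = 2, ncol = 10))
names(temps) <- c("flr1_dirn_areaa", "flr1_dirs_areaa", "flr1_dirn_areab",
                  "flr1_dirs_areab", "flr2_dirn_areaa", "flr2_dirs_areaa",
                  "flr2_dirn_areab", "flr2_dirs_areab", "flr3_dirn_areaa",
                  "flr3_dirs_areaa")
temps$index <- as.Date("2013-07-01") + 0:1

# One call: reshape to long form, split each name into three keys, drop NAs
temps.m <- pivot_longer(temps,
                        cols           = -index,
                        names_to       = c("flr", "dir", "area"),
                        names_pattern  = "flr([0-9]+)_dir([a-z])_area([a-z])",
                        values_to      = "value",
                        values_drop_na = TRUE)

# The keys come back as character; make them factors for ggplot2
key_cols <- c("flr", "dir", "area")
temps.m[key_cols] <- lapply(temps.m[key_cols], factor)
```

The result is a tibble with columns `index`, `flr`, `dir`, `area`, `value`, which is the shape the `ggplot()` call above expects.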

Also, if you can suggest an entirely different (better) approach to structuring the data, I'm all ears.
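One alternative structure, sketched here under assumptions: keep every stream in long form (index, value, name) from the moment it is read, and stack the streams. Since nothing is ever aligned into a wide table, streams of different lengths or with gaps need no special handling. The list `streams` is a hypothetical stand-in for reading each file (for example with `read.csv()`), and the name parsing assumes the underscore-separated `flr…_dir…_area…` scheme.

```r
# Hypothetical stand-in for the per-file data: one data frame per stream,
# each with its own index, its own length and possibly missing values
streams <- list(
  flr1_dirn_areaa = data.frame(index = as.Date("2013-07-01") + 0:2,
                               value = c(1, NA, 3)),
  flr1_dirs_areab = data.frame(index = as.Date("2013-07-01") + 0:1,
                               value = c(5, 6)),
  flr2_dirn_areaa = data.frame(index = as.Date("2013-07-02") + 0:3,
                               value = c(9, 10, 11, 12))
)

# Tag every row with the stream (file) name it came from
tagged <- Map(function(df, nm) {
  df$variable <- nm
  df
}, streams, names(streams))

# Stack the streams into one long data frame; differing lengths are fine
long <- do.call(rbind, unname(tagged))
long <- long[!is.na(long$value), ]

# Split each name on "_" and strip the prefixes to get one factor per part
parts <- do.call(rbind, strsplit(long$variable, "_", fixed = TRUE))
long$flr  <- factor(sub("^flr",  "", parts[, 1]))
long$dir  <- factor(sub("^dir",  "", parts[, 2]))
long$area <- factor(sub("^area", "", parts[, 3]))
```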

Thanks in advance.

You can use regular expressions to create the factors:

    res <- do.call(rbind, strsplit(gsub('flr([0-9]+).*dir([a-z]).*area([a-z])',
                                        '\\1,\\2,\\3',
                                        temps.m$variable),
                                   ','))
    res
          [,1] [,2] [,3]
     [1,] "1"  "n"  "a"
     [2,] "1"  "n"  "a"
     [3,] "1"  "s"  "a"
     [4,] "1"  "s"  "a"
     [5,] "1"  "n"  "b"
     [6,] "1"  "n"  "b"
     [7,] "1"  "s"  "b"
     [8,] "1"  "s"  "b"
    ........
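One caveat with the `gsub()`/`strsplit()` route: `gsub()` returns a non-matching string unchanged, so a name that doesn't fit the pattern becomes a single field and `rbind()` silently recycles it across all three columns. A sketch using base R's `regexec()`/`regmatches()` instead yields `NA` for such names; `index_backup` is a made-up non-matching name used only to show the difference.

```r
vars <- c("flr1_dirn_areaa", "flr2_dirs_areab", "flr10_dirn_areaa", "index_backup")
pattern <- "flr([0-9]+).*dir([a-z]).*area([a-z])"

# The gsub/strsplit route: the non-matching name is recycled into every column
bad <- do.call(rbind, strsplit(gsub(pattern, "\\1,\\2,\\3", vars), ","))

# regexec + regmatches give the capture groups, or character(0) on no match
m <- regmatches(vars, regexec(pattern, vars))
res <- t(vapply(m, function(g) {
  # g[1] is the whole match; g[-1] are the three capture groups
  if (length(g) == 0) rep(NA_character_, 3) else g[-1]
}, character(3)))
colnames(res) <- c("flr", "dir", "area")
res
```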

You may need a further step to transform the columns into factors:

    library(plyr)
    res <- colwise(as.factor)(data.frame(res))
    res
       X1 X2 X3
    1   1  n  a
    2   1  n  a
    3   1  s  a
    4   1  s  a
    ........
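If you'd rather not load plyr just for this, base R's `data.frame(..., stringsAsFactors = TRUE)` converts every character column to a factor while building the data frame, which has the same effect as `colwise(as.factor)` here. The small matrix below is a hypothetical stand-in for the output of the `strsplit()` step.

```r
# Hypothetical stand-in for the character matrix from the strsplit step
res <- matrix(c("1", "n", "a",
                "1", "s", "a",
                "2", "n", "b"), ncol = 3, byrow = TRUE)

# Every character column becomes a factor; unnamed columns get X1, X2, X3
res <- data.frame(res, stringsAsFactors = TRUE)
str(res)
```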

To combine the result with the melted data, you can use cbind:

    temps.m <- cbind(temps.m, res)
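After this `cbind()`, the new columns are still called `X1`, `X2` and `X3`, while the plotting call in the question refers to `area`, `dir` and `flr`. Renaming them before binding closes that gap. The two small data frames are hypothetical stand-ins for `temps.m` and `res` at this point in the answer.

```r
# Hypothetical stand-ins for the melted data and the parsed factor columns
temps.m <- data.frame(index = as.Date("2013-07-01") + 0:1,
                      variable = "flr1_dirn_areaa", value = 1:2,
                      stringsAsFactors = FALSE)
res <- data.frame(X1 = factor(c("1", "1")), X2 = factor(c("n", "n")),
                  X3 = factor(c("a", "a")))

# Name the factor columns to match aes(color = area, linetype = dir)
# and facet_grid(flr ~ .) in the plotting call
names(res) <- c("flr", "dir", "area")
temps.m <- cbind(temps.m, res)
```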
