dataframe - How to turn variable names into factors in a data frame in R
Say I have a data frame containing time-series data, where the first column is the index, and the remaining columns contain different data streams and are named descriptively, as in the following example:
    temps = data.frame(matrix(1:20,nrow=2,ncol=10))
    names(temps) <- c("flr1_dirn_areaa","flr1_dirs_areaa","flr1_dirn_areab","flr1_dirs_areab",
                      "flr2_dirn_areaa","flr2_dirs_areaa","flr2_dirn_areab","flr2_dirs_areab",
                      "flr3_dirn_areaa","flr3_dirs_areaa")
    temps$index <- as.Date(2013,7,1:2)
    temps
      flr1_dirn_areaa flr1_dirs_areaa ... index
    1               1               3 ... 1975-07-15
    2               2               4 ... 1975-07-16
Now I want to prep the data frame for plotting with ggplot2, and I want to include 3 factors: flr, dir, and area.
I can achieve this for this simple example as follows:
    library(reshape2)  # provides melt()
    temps.m <- melt(temps,"index")
    temps.m$flr <- factor(rep(1:3,c(8,8,4)))
    temps.m$dir <- factor(rep(c("n","s"),each=2,len=20))
    temps.m$area <- factor(rep(c("a","b"),each=4,len=20))
    temps.m
            index        variable value flr dir area
    1  1975-07-15 flr1_dirn_areaa     1   1   n    a
    2  1975-07-16 flr1_dirn_areaa     2   1   n    a
    3  1975-07-15 flr1_dirs_areaa     3   1   s    a
    4  1975-07-16 flr1_dirs_areaa     4   1   s    a
    5  1975-07-15 flr1_dirn_areab     5   1   n    b
    6  1975-07-16 flr1_dirn_areab     6   1   n    b
    7  1975-07-15 flr1_dirs_areab     7   1   s    b
    8  1975-07-16 flr1_dirs_areab     8   1   s    b
    9  1975-07-15 flr2_dirn_areaa     9   2   n    a
    10 1975-07-16 flr2_dirn_areaa    10   2   n    a
    11 1975-07-15 flr2_dirs_areaa    11   2   s    a
    12 1975-07-16 flr2_dirs_areaa    12   2   s    a
    13 1975-07-15 flr2_dirn_areab    13   2   n    b
    14 1975-07-16 flr2_dirn_areab    14   2   n    b
    15 1975-07-15 flr2_dirs_areab    15   2   s    b
    16 1975-07-16 flr2_dirs_areab    16   2   s    b
    17 1975-07-15 flr3_dirn_areaa    17   3   n    a
    18 1975-07-16 flr3_dirn_areaa    18   3   n    a
    19 1975-07-15 flr3_dirs_areaa    19   3   s    a
    20 1975-07-16 flr3_dirs_areaa    20   3   s    a
In reality, I have data streams (columns) of varying lengths. Each of them comes from its own file, has missing data, and has more than 3 factors encoded in its column (file) name, so this simple method of applying factors won't work. I need something more robust, and I'm inclined to parse the variable names into the different factors and populate the factor columns of the melted data frame.
My end goal is to plot something like this:
ggplot(temps.m,aes(x=index,y=value,color=area,linetype=dir))+geom_line()+facet_grid(flr~.)
I imagine reshape, reshape2, plyr, or some other package can do this in 1 or 2 statements, but I struggle with melt/cast/ddply and the rest of them. Any suggestions?
Also, if you can suggest an entirely different [better] approach to structuring my data, I'm all ears.
Thanks in advance.
You can use regular expressions to create the factors:
    res <- do.call(rbind,strsplit(gsub('flr([0-9]+).*dir([a-z]).*area([a-z])',
                                       '\\1,\\2,\\3',
                                       temps.m$variable),
                                  ','))
         [,1] [,2] [,3]
    [1,] "1"  "n"  "a"
    [2,] "1"  "n"  "a"
    [3,] "1"  "s"  "a"
    [4,] "1"  "s"  "a"
    [5,] "1"  "n"  "b"
    [6,] "1"  "n"  "b"
    [7,] "1"  "s"  "b"
    [8,] "1"  "s"  "b"
    ........
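To see what the pattern does, it may help to run it on a single column name. This is only an illustrative sketch: `nm` and `joined` are names introduced here, and the pattern is the one from the answer.

```r
# One column name taken from the question's example data.
nm <- "flr2_dirs_areab"

# Each parenthesised group captures one factor level;
# \\1, \\2 and \\3 refer back to those groups in the replacement.
joined <- gsub('flr([0-9]+).*dir([a-z]).*area([a-z])', '\\1,\\2,\\3', nm)
joined                        # "2,s,b"

# Splitting on the comma yields one element per factor.
strsplit(joined, ',')[[1]]    # "2" "s" "b"
```

Because the pattern matches the whole name, the replacement keeps only the three captured levels.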
Maybe you need a further step to transform the columns into factors.
    # colwise() comes from the plyr package
    res <- colwise(as.factor)(data.frame(res))
      X1 X2 X3
    1  1  n  a
    2  1  n  a
    3  1  s  a
    4  1  s  a
    ........
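If you would rather not depend on plyr for this step, base R can probably do the same conversion. The sketch below uses a small stand-in matrix (`res_mat`, a hypothetical name) instead of the real `res`, and the column names `flr`, `dir` and `area` are an assumption borrowed from the question. Note that since R 4.0.0 `data.frame()` no longer converts strings to factors by default, so the argument is spelled out.

```r
# Stand-in for the character matrix returned by do.call(rbind, strsplit(...)).
res_mat <- rbind(c("1", "n", "a"),
                 c("1", "s", "a"),
                 c("2", "n", "b"))

# stringsAsFactors = TRUE turns every character column into a factor.
res_df <- data.frame(res_mat, stringsAsFactors = TRUE)

# Descriptive names (an assumption; the default names would be X1, X2, X3).
names(res_df) <- c("flr", "dir", "area")

str(res_df)
```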
To combine the result with the melted data, you can use cbind:
temps.m <- cbind(temps.m,res)
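Since the question mentions more than 3 factors encoded in the names, one possible generalisation is to loop over a list of known keys instead of hard-coding one pattern. The following is a minimal base-R sketch, not part of the original answer: `parse_factors` is a hypothetical helper, `long` is a small stand-in for `temps.m`, and it assumes every name is built from "_"-separated key/level pairs.

```r
# Hypothetical helper: for each known key, extract the level that follows it.
# Assumes names look like "<key><level>_<key><level>_..." with no "_" inside a level.
parse_factors <- function(variable, keys) {
  variable <- as.character(variable)
  out <- lapply(keys, function(key) {
    # Match the key only at the start of the name or right after an underscore.
    pattern <- paste0("(^|.*_)", key, "([^_]+).*")
    factor(sub(pattern, "\\2", variable))
  })
  names(out) <- keys
  as.data.frame(out)
}

# Small long-format stand-in for temps.m, built with base R only.
vars <- c("flr1_dirn_areaa", "flr1_dirs_areab", "flr2_dirn_areaa", "flr3_dirs_areaa")
long <- data.frame(variable = rep(vars, each = 2), value = 1:8)

# Attach one named factor column per key.
long <- cbind(long, parse_factors(long$variable, c("flr", "dir", "area")))
long
```

One caveat with this design: if a name does not contain a given key, `sub()` returns the name unchanged, so it is worth checking the resulting factor levels before plotting.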