R: Subset data frame and perform function based on columns
Sample data (I'm not sure how to use the code block system here yet):
df <- data.frame(c(1,1,1,2,2,2,3,3,3), c(1990,1991,1992,1990,1991,1992,1990,1991,1992), c(1,2,3,3,2,1,2,1,3))
colnames(df) <- c("id", "year", "value")
That generates a simple data frame:
id year value
1 1990 1
1 1991 2
1 1992 3
2 1990 3
2 1991 2
2 1992 1
3 1990 2
3 1991 1
3 1992 3
I've been sorting through R subsetting questions, but I couldn't figure out how to apply the second step with the ddply function from {plyr}.
The logic: within each id subgroup, find the highest value (which is 3) at the earliest time point.
I'm confused about which syntax to use here. From searching SO, I think ddply is the best choice, but I can't figure out how. Ideally, the output should contain each id only once (since only one row is selected per id), with the entire row taken for it. This isn't working in R for me, but it's the best "logic" I could come up with:
ddply( (ddply(df,id)), year, which.min(value) )
The desired output would be:
id year value
1 1992 3
2 1990 3
3 1992 3
If 3 is not available, the next highest (2, or 1) should be taken. Any ideas?
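One base-R way to express this logic, sketched here as an alternative to ddply rather than as the asker's or plyr's method: sort by id, then by value descending and year ascending, and keep the first row of each id. Because the maximum is taken per id, the fallback to 2 or 1 happens automatically. The helper `top_per_id` and the modified data set `df2` are hypothetical, added only for illustration.

```r
# Same sample data as the question, with named columns
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992),
  value = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)

# Hypothetical helper: highest value per id, earliest year on ties
top_per_id <- function(d) {
  # Sort by id, then value (descending), then year (ascending)
  ord <- d[order(d$id, -d$value, d$year), ]
  # The first row of each id is now the one we want
  out <- ord[!duplicated(ord$id), ]
  rownames(out) <- NULL
  out
}

top_per_id(df)

# Hypothetical variant where id 2 has no 3: its best value is 2,
# reached in both 1990 and 1992, so the earlier year (1990) is kept
df2 <- df
df2$value[df2$id == 2] <- c(2, 1, 2)
top_per_id(df2)
```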
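You need to understand how ddply works:
it splits the original data.frame into smaller data.frames according to the splitting variable(s). Thus, it needs a function that takes a data.frame as its argument and returns a value (here, a one-row data.frame).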
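To make the split-apply-combine idea concrete, the same steps can be written by hand in base R. This is a rough illustration of the pattern, not how plyr implements ddply internally; `pick_row` is a hypothetical helper name. It keeps all rows tied for the highest value before choosing the earliest year, so ties are handled explicitly.

```r
# Same sample data as the question, with named columns
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992),
  value = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)

# Hypothetical per-piece function: takes a data.frame, returns one row
pick_row <- function(d) {
  top <- d[d$value == max(d$value), ]  # all rows tied for the highest value
  top[which.min(top$year), ]           # earliest year among them
}

pieces   <- split(df, df$id)           # split: one data.frame per id
results  <- lapply(pieces, pick_row)   # apply: run pick_row on each piece
combined <- do.call(rbind, results)    # combine: stack the one-row results
rownames(combined) <- NULL
combined
```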
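library(plyr)

# ddply passes each id's sub-data.frame to the function as `d`
ddply(df, .(id), function(d) {
  res <- d[d$value == max(d$value), ]  # all rows with the highest value
  res[which.min(res$year), ]           # earliest year among those rows
})
#   id year value
# 1  1 1992     3
# 2  2 1990     3
# 3  3 1992     3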