dataframe - R: Subset data frame and perform function based on columns


Sample data. I'm not sure how to use the code block system on SO yet.

df <- data.frame(c(1,1,1,2,2,2,3,3,3),
                 c(1990,1991,1992,1990,1991,1992,1990,1991,1992),
                 c(1,2,3,3,2,1,2,1,3))
colnames(df) <- c("id", "year", "value")

That generates a simple data frame:

id year value
1 1990 1
1 1991 2
1 1992 3
2 1990 3
2 1991 2
2 1992 1
3 1990 2
3 1991 1
3 1992 3

I was sorting through R subsetting questions and couldn't figure out the second step in how the ddply function {plyr} is applied to this.

Logic: for each id subgroup, find the highest value (which is 3) at the earliest time point.

I'm confused about the syntax to use here. From searching SO, I think ddply is the best choice, but I can't figure out how. Ideally, the output should be a vector of unique ids (as only 1 is selected), with the entire row taken along with it. This isn't working in R for me, but it's the best "logic" I could come up with.

ddply( (ddply(df,id)), year, which.min(value) )

e.g.

id year value
1 1992 3
2 1990 3
3 1992 3

If 3 is not available, the next highest (2, or 1) should be taken. Any ideas?

You need to understand that ddply splits the original data.frame into smaller data.frames according to the splitting variable(s). Thus, it needs a function that takes a data.frame as its argument and returns a data.frame as its return value.
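As a rough illustration of that split-apply-combine idea, here is a base R sketch of what happens conceptually for this example. It is not plyr's actual implementation, and the helper name pick_row is made up for illustration.

```r
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992),
  value = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)

# Hypothetical helper: takes one sub-data.frame, returns a one-row data.frame.
pick_row <- function(d) {
  top <- d[d$value == max(d$value), ]  # all rows holding the highest value
  top[which.min(top$year), ]           # the earliest year among those rows
}

pieces  <- split(df, df$id)            # split: one data.frame per id
results <- lapply(pieces, pick_row)    # apply: run the function on each piece
out     <- do.call(rbind, results)     # combine: stack the one-row results
rownames(out) <- NULL
out
```

The function passed to ddply plays the same role as pick_row here: it only ever sees the rows for a single id.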

library(plyr)
ddply(df, .(id), function(df) {
  res <- df[which.max(df$value), ]
  res[which.min(res$year), ]
})
#   id year value
# 1  1 1992     3
# 2  2 1990     3
# 3  3 1992     3
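One caveat worth noting: which.max returns only the first maximum, so the function above picks the earliest year among tied maxima only when the rows are already sorted by year within each id. A possible alternative that does not depend on row order, sketched here with base R only (the names shuffled, ord and best are illustrative), is to sort and then keep the first row per id:

```r
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992),
  value = c(1, 2, 3, 3, 2, 1, 2, 1, 3)
)

# Reverse the rows to show that the result does not rely on input order.
shuffled <- df[nrow(df):1, ]

# Sort by id, then highest value first, then earliest year first.
ord <- shuffled[order(shuffled$id, -shuffled$value, shuffled$year), ]

# Keep only the first row of each id, i.e. the top value at the earliest year.
best <- ord[!duplicated(ord$id), ]
rownames(best) <- NULL
best
```

Because the sort already falls back to the next highest value when 3 is absent, no special handling is needed for that case.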
