R - tapply - creating NA?

March 15, 2013

I'm trying to calculate the mean number of unique fruits per person (my usual practice data). This works with both of these lines of code:

    with(df, tapply(fruit, names, FUN = function(x) length(unique(x))))->uniques
    sum(uniques)/length(unique(df$names))

    aggregate(df[,"fruit"], by=list(id=names), FUN = function(x) length(unique(x)))->d1
    sum(d1$x)/length(unique(df$names))

My problem is that when I use this code on my real data it doesn't work. The real data is prescribing data, and I want the mean number of unique drugs per person. The tapply code appears to have created brand new patient IDs that do not exist in the original df, and it has given thousands of NA values. There are no missing values in the id column and none in the drug_code column either.

    with(dt3, tapply(drug_code, id, FUN = function(x) length(unique(x))))->uniques
    head(uniques)
                       uniques
    patient hai0000001      NA
    patient hai0000003      NA
    patient hai0000008      NA
    patient hai0000010      NA
    patient hai0000014      NA
    patient hai0000020      NA

    table(dt3$id=="patient hai0000001")  ## checking to see if hai0000001 occurs in the original df. dim of df is 228954 rows and 5 cols

     FALSE
    228954

For the aggregate code I get this error:

    aggregate(dt3[,"drug_code"], by=list(id=id), FUN = function(x) length(unique(x)))->d1
    Error in aggregate.data.frame(as.data.frame(x), ...) :
      arguments must have same length

I don't understand what's happening. My real data is similar to the practice data in that it has an id column and a drug/fruit column. There is no missing data in either df. I know lapply is better for data frames, but I don't need a df back, and in any case the tapply code works on the practice df. Does anyone have an idea of what is happening here?

Practice df:

    names<-as.character(c("john", "john", "john", "john", "john", "mary", "mary","mary","mary","mary",
                          "jim", "sylvia","ted","ted","mary", "sylvia", "jim", "ted", "john", "ted"))
    dates<-as.Date(c("2010-07-01", "2010-09-01", "2010-11-01", "2010-12-01", "2011-01-01",
                     "2010-08-12", "2010-11-11", "2010-05-12", "2010-12-03", "2010-07-12",
                     "2010-12-21", "2010-02-18", "2010-10-29", "2010-08-13", "2010-11-11",
                     "2010-05-12", "2010-04-01", "2010-05-06", "2010-09-28", "2010-11-28"))
    fruit<-as.character(c("kiwi","apple","banana","orange","apple","orange","apple","orange",
                          "apple", "apple", "pineapple", "peach", "nectarine", "grape", "melon",
                          "apricot", "plum", "lychee", "watermelon", "apple"))
    df<-data.frame(names,dates,fruit)

Example of real data:

    head(dt3)
                      id quantity date_of_claim drug_code      index
    1 patient hai0000560        1    2009-10-15   r03ac02 2010-04-06
    2 patient hai0000560        1    2009-10-15   r03ak06 2010-04-06
    3 patient hai0000560       30    2009-10-15   r03bb04 2010-04-06
    4 patient hai0000560       30    2009-10-15   a02bc01 2010-04-06
    5 patient hai0000560       50    2009-10-15   m02aa15 2010-04-06
    6 patient hai0000560       30    2009-10-15   n02be51 2010-04-06

In this case you are asking for a single number: the mean of the individual lengths of a particular vector (unique(fruit)) within each patient ID. This shows first the individual unique counts and then the result of the mean function:

    > with(df,  tapply(fruit, names, function(x) length(unique(x)) ))
       jim   john   mary sylvia    ted
         2      5      3      2      4
    > mean ( with(df,  tapply(fruit, names, function(x) length(unique(x)) )) )
    [1] 3.2
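NA counts for IDs that never occur in the data would be consistent with id being a factor that still carries unused levels, for example after subsetting a larger data set. That is only an assumption here, since the question does not show str(dt3). A minimal sketch with a hypothetical toy data frame (toy, sub and the p1/p2/p3 ids are invented for illustration) shows the effect and one way around it. It also passes the grouping column to aggregate() explicitly, because by = list(id = id) looks for an object called id outside the data frame, which may not have the same length as dt3.

```r
# Hypothetical toy data standing in for dt3; the real columns may differ.
toy <- data.frame(
  id = factor(c("p1", "p1", "p2", "p3", "p3", "p3")),
  drug_code = c("a", "b", "a", "c", "c", "d"),
  stringsAsFactors = FALSE
)

# Subsetting a data frame keeps every factor level, even levels with no rows left.
sub <- toy[toy$id != "p2", ]

# tapply() creates one cell per level, so the empty level "p2" shows up as NA.
counts <- with(sub, tapply(drug_code, id, function(x) length(unique(x))))
print(counts)

# Dropping unused levels (as.character(id) would also work) removes the phantom group.
sub$id <- droplevels(sub$id)
counts_clean <- with(sub, tapply(drug_code, id, function(x) length(unique(x))))
print(mean(counts_clean))

# aggregate() with the grouping vector taken from the same data frame.
per_id <- aggregate(sub["drug_code"], by = list(id = sub$id),
                    FUN = function(x) length(unique(x)))
print(per_id)
```

If nlevels(dt3$id) is larger than length(unique(dt3$id)), unused levels are the likely explanation; if the two agree, the cause is something else.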

I would comment that the test for containment of a particular value in your code above had a trailing space, which might have caused problems. "string " is not equal to "string". I have put a copy of the trim function from pkg::gdata in my .Rprofile file to make it easier for me to handle that possibility.
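A minimal sketch of the whitespace point, using base R's trimws() as a stand-in for the gdata trim function mentioned above (the ids vector is made up for illustration):

```r
# Hypothetical ids; the first and third carry stray whitespace.
ids <- c("patient 001 ", "patient 001", " patient 002")

# Exact comparison treats "patient 001 " and "patient 001" as different strings.
raw_match <- ids == "patient 001"
print(raw_match)

# trimws() strips leading and trailing whitespace before comparing.
clean <- trimws(ids)
clean_match <- clean == "patient 001"
print(clean_match)

# Counts per cleaned id, similar in spirit to the table() check in the question.
print(table(clean))
```

Trimming the id column once, right after reading the data, avoids having to remember the stray space in every later comparison.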
