Data rearrangement using R/SQL without looping -

January 15, 2015

i know how rearrange source data (table) in order output desired table using r or sql, displayed below.

since looping slow in r, , dataset quite large... it's not preferred have looping in script. efficiency important.

source data table:

date    | country | id | fruit  | favorite | money 20120101         1    apple     book      100 20120101         2    orange    knife     150 20120101         3    banana    watch     80 20120101         4    melon     water     90 20120102         1    apple     phone     120 20120102         2    apple     knife     130 20120102         3    banana    watch     100            .....     ......    ..   .....     ......    ......

output table:

date    | country | field   | id 1 | id 2  | id 3  | id 4 20120101         fruit     apple  orange  banana  melon 20120101         favorite  book   knife   watch   water 20120101         money     100    150     80      90 20120102         fruit     apple  apple   banana  n.a. ....      ....      ....      ....   ....    ....    ....

here approach in r, using sample data:

x <- cbind(mydf[, c("date", "country", "id")],             stack(mydf[, c("fruit", "favorite", "money")])) reshape(x, direction = "wide", idvar = c("date", "country", "ind"), timevar="id") #        date country      ind values.1 values.2 values.3 values.4 # 1  20120101         fruit    apple   orange   banana    melon # 5  20120102         fruit    apple    apple   banana     <na> # 8  20120101      favorite     book    knife    watch    water # 12 20120102      favorite    phone    knife    watch     <na> # 15 20120101         money      100      150       80       90 # 19 20120102         money      120      130      100     <na>

to round other options, here's melt + dcast approach (which can taken "data.table" or "reshape2") , "dplyr" + "tidyr" approach.

library(data.table) dcast(   suppresswarnings(     melt(as.data.table(mydf), c("date", "country", "id"))),     ... ~ id, value.var = "value") #        date country variable     1      2      3     4 # 1: 20120101         fruit apple orange banana melon # 2: 20120101      favorite  book  knife  watch water # 3: 20120101         money   100    150     80    90 # 4: 20120102         fruit apple  apple banana    na # 5: 20120102      favorite phone  knife  watch    na # 6: 20120102         money   120    130    100    na    library(dplyr) library(tidyr) mydf %>%   gather(variable, value, fruit:money) %>%   spread(id, value) #       date country variable     1      2      3     4 # 1 20120101         fruit apple orange banana melon # 2 20120101      favorite  book  knife  watch water # 3 20120101         money   100    150     80    90 # 4 20120102         fruit apple  apple banana  <na> # 5 20120102      favorite phone  knife  watch  <na> # 6 20120102         money   120    130    100  <na>

in answer, mydf defined as:

mydf <- structure(   list(date = c(20120101l, 20120101l, 20120101l,                  20120101l, 20120102l, 20120102l, 20120102l),         country = c("us", "us", "us", "us", "us", "us", "us"),         id = c(1l, 2l, 3l, 4l, 1l, 2l, 3l),        fruit = c("apple", "orange", "banana", "melon",                   "apple", "apple", "banana"),         favorite = c("book", "knife", "watch", "water",                      "phone", "knife", "watch"),         money = c(100l, 150l, 80l, 90l, 120l, 130l, 100l)),    .names = c("date", "country", "id",               "fruit", "favorite", "money"),    class = "data.frame", row.names = c(na, -7l))

Search This Blog

Live

Data rearrangement using R/SQL without looping -

Comments

Post a Comment

Popular posts from this blog

php - Calling a template part from a post -

Firefox SVG shape not printing when it has stroke -

How to mention the localhost in android -