Data rearrangement using R/SQL without looping -
i know how rearrange source data (table) in order output desired table using r or sql, displayed below.
since looping slow in r, , dataset quite large... it's not preferred have looping in script. efficiency important.
source data table:
date | country | id | fruit | favorite | money 20120101 1 apple book 100 20120101 2 orange knife 150 20120101 3 banana watch 80 20120101 4 melon water 90 20120102 1 apple phone 120 20120102 2 apple knife 130 20120102 3 banana watch 100 ..... ...... .. ..... ...... ......
output table:
date | country | field | id 1 | id 2 | id 3 | id 4 20120101 fruit apple orange banana melon 20120101 favorite book knife watch water 20120101 money 100 150 80 90 20120102 fruit apple apple banana n.a. .... .... .... .... .... .... ....
here approach in r, using sample data:
x <- cbind(mydf[, c("date", "country", "id")], stack(mydf[, c("fruit", "favorite", "money")])) reshape(x, direction = "wide", idvar = c("date", "country", "ind"), timevar="id") # date country ind values.1 values.2 values.3 values.4 # 1 20120101 fruit apple orange banana melon # 5 20120102 fruit apple apple banana <na> # 8 20120101 favorite book knife watch water # 12 20120102 favorite phone knife watch <na> # 15 20120101 money 100 150 80 90 # 19 20120102 money 120 130 100 <na>
to round other options, here's melt
+ dcast
approach (which can taken "data.table" or "reshape2") , "dplyr" + "tidyr" approach.
library(data.table) dcast( suppresswarnings( melt(as.data.table(mydf), c("date", "country", "id"))), ... ~ id, value.var = "value") # date country variable 1 2 3 4 # 1: 20120101 fruit apple orange banana melon # 2: 20120101 favorite book knife watch water # 3: 20120101 money 100 150 80 90 # 4: 20120102 fruit apple apple banana na # 5: 20120102 favorite phone knife watch na # 6: 20120102 money 120 130 100 na library(dplyr) library(tidyr) mydf %>% gather(variable, value, fruit:money) %>% spread(id, value) # date country variable 1 2 3 4 # 1 20120101 fruit apple orange banana melon # 2 20120101 favorite book knife watch water # 3 20120101 money 100 150 80 90 # 4 20120102 fruit apple apple banana <na> # 5 20120102 favorite phone knife watch <na> # 6 20120102 money 120 130 100 <na>
in answer, mydf
defined as:
mydf <- structure( list(date = c(20120101l, 20120101l, 20120101l, 20120101l, 20120102l, 20120102l, 20120102l), country = c("us", "us", "us", "us", "us", "us", "us"), id = c(1l, 2l, 3l, 4l, 1l, 2l, 3l), fruit = c("apple", "orange", "banana", "melon", "apple", "apple", "banana"), favorite = c("book", "knife", "watch", "water", "phone", "knife", "watch"), money = c(100l, 150l, 80l, 90l, 120l, 130l, 100l)), .names = c("date", "country", "id", "fruit", "favorite", "money"), class = "data.frame", row.names = c(na, -7l))
Comments
Post a Comment