R - Save storage space for small integers or factors with few levels


R seems to require 4 bytes of storage per integer, even for small ones:

> object.size(rep(1L, 10000))
40040 bytes

And, even more, for factors:

> object.size(factor(rep(1L, 10000)))
40456 bytes

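I think that, especially in the latter case, this could be handled much better. Is there a solution that would help me reduce the storage requirements for this case to 8 or even 2 bits per row? Perhaps a solution that uses the raw type internally for storage but behaves like a normal factor otherwise. The bit package offers this for bits, but I haven't found anything similar for factors.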

My data frame with just a few million rows is consuming gigabytes, and that's a huge waste of memory and run time (!). Compression would reduce the required disk space, but again at the expense of run time.


Since you mention raw (and assuming you have fewer than 256 factor levels), you could do the prerequisite conversion operations yourself if memory is your bottleneck and CPU time isn't. For example:

    f = factor(rep(1L, 1e5))
    object.size(f)
    # 400456 bytes

    f.raw = as.raw(f)
    object.size(f.raw)
    # 100040 bytes

    # to go back:
    identical(as.factor(as.integer(f.raw)), f)
    # [1] TRUE

You can also save the factor levels separately and recover them if that's something you're interested in doing, but as far as grouping and all that goes, you can do it all with raw and never go back to factors (except for presentation).
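A minimal sketch of keeping the levels separately, assuming fewer than 256 levels and no NA values (raw has no NA, so missing values would need separate handling). The names `f`, `lev`, `codes`, `counts` and `restored` and the sample data are illustrative and not part of the original answer:

```r
# Illustrative factor with three levels (hypothetical data).
f <- factor(c("low", "high", "mid", "high"), levels = c("low", "mid", "high"))

# Keep the level labels separately; store only the integer codes as raw bytes.
lev   <- levels(f)
codes <- as.raw(as.integer(f))   # only valid for codes in 0..255

# Grouping can work directly on the integer view of the raw codes.
counts <- tabulate(as.integer(codes), nbins = length(lev))
names(counts) <- lev

# Rebuild the factor only when it is needed for presentation.
restored <- factor(lev[as.integer(codes)], levels = lev)
identical(restored, f)
```

The raw vector costs one byte per element instead of four, at the price of an extra conversion whenever a real factor is required.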

If you have specific use cases where you have trouble with this method, please post them; otherwise I think this should work just fine.


Here's a starting point for your byte.factor class:

    byte.factor = function(f) {
      res = as.raw(f)
      attr(res, "levels") <- levels(f)
      attr(res, "class") <- "byte.factor"
      res
    }

    as.factor.byte.factor = function(b) {
      factor(attributes(b)$levels[as.integer(b)], attributes(b)$levels)
    }

So you can do things like:

    f = factor(c('a','b'), letters)
    f
    #[1] a b
    #Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

    b = byte.factor(f)
    b
    #[1] 01 02
    #attr(,"levels")
    # [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    #[20] "t" "u" "v" "w" "x" "y" "z"
    #attr(,"class")
    #[1] "byte.factor"

    as.factor.byte.factor(b)
    #[1] a b
    #Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

Check out how data.table overrides rbind.data.frame if you want to make as.factor generic, and just add whatever functions you want to add. It should all be quite straightforward.
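One possible way to do that, as a rough sketch rather than the approach data.table itself takes: define a generic `as.factor` in your own environment (this masks `base::as.factor` there) and add a few S3 methods. The subsetting and print methods below are assumptions about what "whatever functions you want" might include; they are not from the original answer.

```r
# Constructor adapted from the answer above, repeated so the block is self-contained.
byte.factor <- function(f) {
  res <- as.raw(as.integer(f))   # assumes fewer than 256 levels and no NAs
  attr(res, "levels") <- levels(f)
  attr(res, "class") <- "byte.factor"
  res
}

# Make as.factor generic in the current environment (masks base::as.factor here).
as.factor <- function(x, ...) UseMethod("as.factor")
as.factor.default <- function(x, ...) base::as.factor(x)
as.factor.byte.factor <- function(x, ...) {
  lev <- attr(x, "levels")
  factor(lev[as.integer(x)], levels = lev)
}

# Subsetting that keeps the levels and the class (hypothetical helper method).
`[.byte.factor` <- function(x, i) {
  res <- unclass(x)[i]
  attr(res, "levels") <- attr(x, "levels")
  class(res) <- "byte.factor"
  res
}

# Print like an ordinary factor (hypothetical helper method).
print.byte.factor <- function(x, ...) {
  print(as.factor(x), ...)
  invisible(x)
}

# Usage
b <- byte.factor(factor(c("a", "b", "c"), letters))
b[2:3]              # prints like a factor, still stored as raw bytes
as.factor(b[2:3])   # converts back to a regular factor
```

Other methods (for example `levels<-`, `c`, or `as.character`) would follow the same pattern: operate on the raw codes and reattach the `levels` attribute.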

