Why do some Unicode characters display in matrices, but not data frames in R?
In at least some cases, Asian characters are printable if they are contained in a matrix or a vector, but not in a data.frame. Here is an example:
    q <- '天'
    q            # works
    # [1] "天"
    matrix(q)    # works
    #      [,1]
    # [1,] "天"
    q2 <- data.frame(q, stringsAsFactors = FALSE)
    q2           # does not work
    #          q
    # 1 <U+5929>
    q2[1, ]      # works again
    # [1] "天"

Clearly, the device is capable of displaying the character, but when it is in a data.frame, it does not work.
Doing some digging, I found that the print.data.frame function runs format on each column. It turns out that if you run format.default directly, the same problem occurs:
    format(q)
    # "<U+5929>"

Digging into format.default, I find that it calls the internal format, which is written in C.
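Before going into the C code, a quick check from the R side may help narrow things down. The sketch below reports how R has marked the string's encoding, whether the current native encoding is UTF-8, and whether the character can be converted into that native encoding at all. The helper name `representable_in_locale` is hypothetical. My assumption (not confirmed from the C source) is that in a code page 1252 locale such as `English_Canada.1252` the conversion fails, which would be consistent with `format()` falling back to the `<U+5929>` escape.

```r
# Hypothetical helper: TRUE where a string can be converted from UTF-8
# into the current native encoding (to = "" means "the current locale").
# iconv() returns NA for elements it cannot convert (sub = NA by default).
representable_in_locale <- function(x) {
  !is.na(iconv(enc2utf8(x), from = "UTF-8", to = ""))
}

q <- "\u5929"  # the same character as '天', written as an escape

Encoding(q)                 # how R has marked the string's encoding
l10n_info()[["UTF-8"]]      # TRUE if the native encoding is UTF-8
representable_in_locale(q)  # can the native encoding hold this character?
format(q)                   # compare this result with the line above
```

If `representable_in_locale(q)` is `FALSE` while `format(q)` shows the escape, that points at the locale rather than the display device.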
Before I dig any further, I want to know whether others can reproduce this behaviour. Is there some configuration of R that would allow me to display these characters within data.frames?
My sessionInfo(), if it helps:
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
    [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
    [5] LC_TIME=English_Canada.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via namespace (and not attached):
    [1] tools_3.0.1
I hate to answer my own question, but although the comments and answers helped, they weren't quite right. In Windows, it doesn't seem that you can set a generic 'UTF-8' locale. You can, however, set country-specific locales, which work in this case:
    Sys.setlocale("LC_CTYPE", locale = "chinese")
    q2 # works fine
    #   q
    # 1 天

But it does make me wonder why format seems to use the locale, and whether there is a way to have it ignore the locale in Windows. I also wonder whether there is a generic UTF-8 locale on Windows that I don't know about.
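Since `Sys.setlocale()` changes `LC_CTYPE` for the rest of the session, it may be more convenient to switch it only while printing and then restore the previous value. The sketch below does that; `with_ctype` is a hypothetical name, and `"chinese"` is a Windows locale name (other platforms use names such as `"zh_CN.UTF-8"`, if that locale is installed). If the locale cannot be set, `Sys.setlocale()` itself warns and returns `""`, and the expression is evaluated in the unchanged locale.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to
# `locale`, restoring the previous LC_CTYPE afterwards (even on error).
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", locale)
  force(expr)  # `expr` is a lazy promise, so it is evaluated only here
}

q2 <- data.frame(q = "\u5929", stringsAsFactors = FALSE)
with_ctype("chinese", print(q2))  # "chinese" is a Windows locale name
```

As a side note, R 4.2.0 and later use UTF-8 as the native encoding on recent Windows 10/11 systems, which should make this workaround unnecessary there.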