node.js - Google Ngram sorting?


From what I understand, each file in Google's Ngram dataset contains a list of ngrams, sorted alphabetically and then numerically by year. However, assuming the data is UTF-8 (which file says is correct), и is 1080 and I is 73, so I don't understand why использовал_num comes before i'academie_pron. Here are the relevant lines from the file (starting at line #131356):

использовал_num 2005    4       1
i'academie_pron 1813    1       1
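
To make the expectation concrete, here is a minimal sketch that prints the code points involved and checks the order of the two ngrams under JavaScript's default string comparison. The variable names are only illustrative, and the ngrams are copied as they appear in the excerpt above. Whether the Latin letter is uppercase (73) or lowercase (105), it is well below 1080.

```javascript
// Code points of the leading characters under discussion.
console.log('и'.codePointAt(0)); // 1080 (U+0438)
console.log('I'.codePointAt(0)); // 73 (U+0049)
console.log('i'.codePointAt(0)); // 105 (U+0069)

// The two ngrams in the order they appear in the file.
const first = 'использовал_num';
const second = "i'academie_pron";

// The < operator on strings compares UTF-16 code units.
// true would mean the file order agrees with code-unit order.
const fileOrderMatchesCodeUnits = first < second;
console.log(fileOrderMatchesCodeUnits); // false
```

So under a plain code-unit comparison the Cyrillic ngram should come after the Latin one, which is the opposite of the order in the file.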

Here's ngram-sort-test.js, with the broken comparison function highlighted. To run it, download this file from Google and un-gzip it in the same directory as ngram-sort-test.js.

This is not an answer, but a workaround is to manually sort the file using LC_ALL=C sort <googlebooks-eng-all-1gram-20120701-i >googlebooks-eng-all-1gram-20120701-i.sorted.
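
For reference, the byte-wise ordering that LC_ALL=C sort applies can be approximated in Node by comparing the UTF-8 bytes of whole lines. The following is a rough sketch, not the comparison function from ngram-sort-test.js: compareBytes and isByteSorted are hypothetical names, and the two sample lines are the ones quoted in the question.

```javascript
// Compare two lines by their UTF-8 bytes (hypothetical helper).
function compareBytes(a, b) {
  return Buffer.compare(Buffer.from(a, 'utf8'), Buffer.from(b, 'utf8'));
}

// Return true if every line is >= the previous one in byte order.
function isByteSorted(lines) {
  for (let i = 1; i < lines.length; i++) {
    if (compareBytes(lines[i - 1], lines[i]) > 0) return false;
  }
  return true;
}

// The two records from the question, in file order.
const lines = [
  'использовал_num 2005    4       1',
  "i'academie_pron 1813    1       1",
];

const inByteOrder = isByteSorted(lines);
console.log(inByteOrder); // false: the file order is not byte order

// Sorting a copy with the byte comparator puts the Latin ngram first.
const sorted = [...lines].sort(compareBytes);
console.log(sorted[0]);
```

Comparing bytes rather than using the default string comparison matters only for characters outside the Basic Multilingual Plane, where UTF-16 code-unit order and UTF-8 byte order can differ. For a file of this size, the lines would need to be streamed rather than held in an array.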

