predict.svm function in R for text mining?
I have a training set that contains sentences and labels (1 and -1). After creating an SVM model, I want to predict the label and the score of new data. This is my code:
    library(tm); require(RcmdrPlugin.temis); library(RTextTools); require(e1071)

    news = read.csv("c:..polarity.csv", header=F, sep=';')

    # training data
    traindata <- as.data.frame(news[1:196,])
    trainvector <- as.vector(traindata[,1])   # choose sentences without labels
    trainsource <- VectorSource(trainvector)
    traincorpus <- Corpus(trainsource)        # create training corpus

    # cleaning training corpus
    traincorpus <- tm_map(traincorpus, stripWhitespace)
    traincorpus <- tm_map(traincorpus, tolower)
    traincorpus <- tm_map(traincorpus, removeWords, stopwords("french"))
    traincorpus <- tm_map(traincorpus, removeNumbers)
    traincorpus <- tm_map(traincorpus, function(x) gsub("(['’\n??]|[[:punct:]]|[[:space:]]|[[:cntrl:]])+", " ", x))

    # import corpus of test
    corpus1 <- Corpus(DirSource("c.../file", encoding="utf-8"), readerControl=list(language="fr"))

    # create copy and clean it
    testcorpus = corpus1
    testcorpus <- tm_map(testcorpus, stripWhitespace)
    testcorpus <- tm_map(testcorpus, tolower)
    testcorpus <- tm_map(testcorpus, removeWords, stopwords("french"))
    testcorpus <- tm_map(testcorpus, removeNumbers)
    testcorpus <- tm_map(testcorpus, function(x) gsub("(['’\n??]|[[:punct:]]|[[:space:]]|[[:cntrl:]])+", " ", x))

    # creating DTM of test and train corpus with word stemming
    tr_matrix <- create_matrix(traincorpus, language="french", stemWords=TRUE, removeStopwords=TRUE, weighting=weightTf)
    tr = as.matrix(tr_matrix)
    ts_matrix <- create_matrix(testcorpus, language="french", stemWords=TRUE, removeStopwords=TRUE, weighting=weightTf)
    ts = as.matrix(ts_matrix)

    y = traindata[,2]
    model <- svm(y~tr)
    pred = fitted(model)
    pred <- predict(model, ts, decision.values = TRUE, probability = TRUE, na.action=TRUE)
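I get this error:

    Erreur dans matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns, : la longueur de 'dimnames' [1] n'est pas égale à l'étendue du tableau

In English: "Error in matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns, : length of 'dimnames' [1] not equal to array extent".

I guess this is because of a structural difference between the training data and the new data. Can anyone help me, please? Thanks.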
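
To make the suspected structure difference concrete, here is a minimal sketch of forcing a new-data matrix onto the training vocabulary before calling predict. It is an illustration under assumptions, not a confirmed fix for the error above: the small matrices are made-up stand-ins for the real `tr` and `ts`, `align_to_train` is a hypothetical helper (it is not part of e1071, tm or RTextTools), and the model is fitted through the matrix interface `svm(x, y)` rather than the formula `y ~ tr`.

```r
library(e1071)

# Toy term-frequency matrices standing in for the real DTMs (made-up data).
tr <- matrix(c(2, 0, 1,
               0, 3, 0,
               1, 0, 2,
               0, 2, 1),
             nrow = 4, byrow = TRUE,
             dimnames = list(NULL, c("bon", "mauvais", "film")))
y <- factor(c(1, -1, 1, -1))

# The new data has a different vocabulary: "acteur" was never seen in training,
# while "bon" and "mauvais" are missing.
ts <- matrix(c(1, 1,
               2, 0),
             nrow = 2, byrow = TRUE,
             dimnames = list(NULL, c("film", "acteur")))

# Hypothetical helper: rebuild new_mat with exactly the training columns,
# in the training order. Unseen terms are dropped, missing terms become 0.
align_to_train <- function(new_mat, train_terms) {
  aligned <- matrix(0, nrow = nrow(new_mat), ncol = length(train_terms),
                    dimnames = list(rownames(new_mat), train_terms))
  shared <- intersect(colnames(new_mat), train_terms)
  aligned[, shared] <- new_mat[, shared, drop = FALSE]
  aligned
}

ts_aligned <- align_to_train(ts, colnames(tr))

# Matrix interface: the model only knows column positions, so the new data
# must have the same columns as the training matrix.
model <- svm(x = tr, y = y, kernel = "linear", scale = FALSE)
pred <- predict(model, ts_aligned, decision.values = TRUE)

# One predicted label per new document, plus its decision value (the score).
scores <- attr(pred, "decision.values")
```

In this sketch `pred` holds one label per row of `ts_aligned`, and `scores` holds the matching decision values. Whether the same alignment removes the 'dimnames' error with the real corpora is something I have not verified.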