text mining - R-Project: no applicable method for 'meta' applied to an object of class "character"
I am trying to run the following code (Ubuntu 12.04, R 3.1.1):
# Load the requisite packages
library(tm)
library(ggplot2)
library(lsa)

# Place the Enron email snippets into a single vector
text <- c(
  "to mr. ken lay, i’m writing urge donate millions of dollars made selling enron stock before company declared bankruptcy.",
  "while netted on $100 million, many of enron's employees financially devastated when company declared bankruptcy , retirement plans wiped out",
  "you sold $101 million worth of enron stock while aggressively urging company’s employees keep buying it",
  "this reminder of enron’s email retention policy. email retention policy provides follows . . .",
  "furthermore, against policy store email outside of outlook mailbox and/or public folders. please not copy email onto floppy disks, zip disks, cds or network.",
  "based on our receipt of various subpoenas, preserving past , future email. please prudent in circulation of email relating work , activities.",
  "we have recognized on $550 million of fair value gains on stocks via our swaps raptor.",
  "the raptor accounting treatment looks questionable. a. enron booked $500 million gain equity derivatives related party.",
  "in third quarter have $250 million problem raptor 3 if don’t “enhance” capital structure of raptor 3 commit more ene shares.")
view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
df <- data.frame(text, view, stringsAsFactors = FALSE)

# Prepare the mini-Enron corpus
corpus <- Corpus(VectorSource(df$text))
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
corpus <- tm_map(corpus, stemDocument, language = "english")
corpus  # check the corpus: the mini-Enron corpus has 9 text documents

# Compute a term-document matrix that contains the occurrence of terms in each email
td.mat <- as.matrix(TermDocumentMatrix(corpus))

# Compute the distance between pairs of documents
dist.mat <- dist(t(as.matrix(td.mat)))
dist.mat  # check the distance matrix

# Scale the multidimensional semantic space (MDS) onto 2 dimensions
fit <- cmdscale(dist.mat, eig = TRUE, k = 2)
points <- data.frame(x = fit$points[, 1], y = fit$points[, 2])
ggplot(points, aes(x = x, y = y)) +
  geom_point(data = points, aes(x = x, y = y, color = df$view)) +
  geom_text(data = points, aes(x = x, y = y - 0.2, label = row.names(df)))
However, when I run it I get this error (at the td.mat <- as.matrix(TermDocumentMatrix(corpus)) line):
Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
I am not sure what to look at. All the modules are loaded.
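The latest version of tm (0.60) made it so you can't use functions with tm_map that operate on simple character values any more. The problem is your tolower step, since it isn't a "canonical" transformation (see getTransformations()). tolower returns a plain character vector, so each document loses its document class, and TermDocumentMatrix later fails when it calls meta() on it. Just replace it with: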
corpus <- tm_map(corpus, content_transformer(tolower))
The content_transformer function wrapper will convert everything to the correct data type within the corpus. You can use content_transformer with any function that is intended to manipulate character vectors so that it will work in a tm_map pipeline.
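As a minimal sketch of that pattern, assuming tm 0.6 or later: the three sample documents and the squash_digits helper below are hypothetical and stand in for the Enron snippets. The sketch calls VCorpus() directly, which is my choice so the documents are ordinary text documents; the same wrapping applies to the Corpus() call in the question. Functions listed by getTransformations() go to tm_map as they are, while anything written for plain character vectors is wrapped in content_transformer.

```r
library(tm)

# Hypothetical sample documents (not the Enron snippets from the question)
docs <- c("Please KEEP buying the stock.",
          "The Email retention policy provides as follows.",
          "Raptor 3 accounting looks questionable!")

corpus <- VCorpus(VectorSource(docs))

# Base R function on character vectors: wrap it so each element
# stays a text document instead of becoming a bare character vector
corpus <- tm_map(corpus, content_transformer(tolower))

# Transformations listed by getTransformations() can be passed directly
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Hypothetical custom helper that also works on character vectors
squash_digits <- function(x) gsub("[0-9]+", "", x)
corpus <- tm_map(corpus, content_transformer(squash_digits))

# Sanity check: every element should still inherit from "TextDocument"
all_docs_ok <- all(vapply(content(corpus), inherits, logical(1),
                          what = "TextDocument"))

# With the document class intact, this step no longer fails on meta()
td.mat <- as.matrix(TermDocumentMatrix(corpus))
```

If all_docs_ok comes back FALSE after some step, that step is probably the one that needs the content_transformer wrapper.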