text mining - R-Project no applicable method for 'meta' applied to an object of class "character"


I am trying to run this code (Ubuntu 12.04, R 3.1.1):

# Load requisite packages
library(tm)
library(ggplot2)
library(lsa)

# Place Enron email snippets into a single vector
text <- c(
  "to mr. ken lay, i’m writing urge donate millions of dollars made selling enron stock before company declared bankruptcy.",
  "while netted on $100 million, many of enron's employees financially devastated when company declared bankruptcy , retirement plans wiped out",
  "you sold $101 million worth of enron stock while aggressively urging company’s employees keep buying it",
  "this reminder of enron’s email retention policy. email retention policy provides follows . . .",
  "furthermore, against policy store email outside of outlook mailbox and/or public folders. please not copy email onto floppy disks, zip disks, cds or network.",
  "based on our receipt of various subpoenas, preserving past , future email. please prudent in circulation of email relating work , activities.",
  "we have recognized on $550 million of fair value gains on stocks via our swaps raptor.",
  "the raptor accounting treatment looks questionable. a. enron booked $500 million gain equity derivatives related party.",
  "in third quarter have $250 million problem raptor 3 if don’t “enhance” capital structure of raptor 3 commit more ene shares.")
view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
df <- data.frame(text, view, stringsAsFactors = FALSE)

# Prepare the mini-Enron corpus
corpus <- Corpus(VectorSource(df$text))
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
corpus <- tm_map(corpus, stemDocument, language = "english")
corpus # check corpus

# mini-enron corpus 9 text documents

# Compute a term-document matrix that contains the occurrence of terms in each email
# Compute the distance between pairs of documents and scale the multidimensional semantic space (MDS) onto 2 dimensions
td.mat <- as.matrix(TermDocumentMatrix(corpus))
dist.mat <- dist(t(as.matrix(td.mat)))
dist.mat # check distance matrix

# Compute the distance between pairs of documents and scale the multidimensional semantic space onto 2 dimensions
fit <- cmdscale(dist.mat, eig = TRUE, k = 2)
points <- data.frame(x = fit$points[, 1], y = fit$points[, 2])
ggplot(points, aes(x = x, y = y)) +
  geom_point(data = points, aes(x = x, y = y, color = df$view)) +
  geom_text(data = points, aes(x = x, y = y - 0.2, label = row.names(df)))

However, when I run it I get this error (at the td.mat <- as.matrix(TermDocumentMatrix(corpus)) line):

Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code

I am not sure what to look at; all the modules loaded.

The latest version of tm (0.60) made it so that you can no longer use functions with tm_map that operate on simple character values. The problem is your tolower step, since it isn't a "canonical" transformation (see getTransformations()). Replace it with

corpus <- tm_map(corpus, content_transformer(tolower)) 

The content_transformer function wrapper converts everything to the correct data type within the corpus. You can use content_transformer with any function that is intended to manipulate character vectors, so that it will work in a tm_map pipeline.
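
To illustrate the point about wrapping arbitrary character functions, here is a minimal sketch, assuming tm 0.6 or later is installed. The helper name collapse_spaces and the two sample sentences are made up for this example and are not part of tm or of the question's data.

```r
library(tm)

# Hypothetical helper: an ordinary function on character vectors
# that knows nothing about tm document objects.
collapse_spaces <- function(x) gsub("\\s+", " ", x)

# Made-up sample documents, used only for illustration.
docs <- VCorpus(VectorSource(c("Enron  STOCK   sold",
                               "Email   Retention POLICY")))

# Wrapping each plain function keeps every element a text document
# rather than replacing it with a bare character vector.
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, content_transformer(collapse_spaces))

# The documents still carry their metadata, so this step can find 'meta'.
tdm <- TermDocumentMatrix(docs)
inspect(tdm)
```

Transformations that tm ships itself, such as removePunctuation and stemDocument, already accept text documents and so should not need the wrapper.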

