This is a short R script to generate a wordcloud and is something I was mucking around with after a day of study.
I’ve been studying PCA for most of the day so figured I’d use the PCA page from Wikipedia to generate a word cloud.
The code is pretty straightforward using the ‘tm’ text mining and ‘wordcloud’ packages. The only thing that originally caught me out was passing R's tolower function directly to tm_map, which is no longer supported in tm. You now need to wrap it as ‘content_transformer(tolower)’.
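To make the tolower point concrete, here is a minimal sketch on a tiny made-up corpus. The sample strings and the use of VCorpus (rather than Corpus, as in the script) are my own assumptions for illustration; the idea is only that the wrapper lets tm_map apply a plain character function while keeping each element a tm document.

```r
library(tm)

# Hypothetical sample text, standing in for the real PCA page
docs <- c("Principal Component Analysis", "PCA Finds Orthogonal Components")
corpus <- VCorpus(VectorSource(docs))

# content_transformer() wraps tolower so it acts on each document's content
# and hands back a document, not a bare character vector
corpus <- tm_map(corpus, content_transformer(tolower))

# Pull the text back out to check the result
lowered <- unlist(lapply(corpus, as.character))
print(lowered)
```

The same wrapper should work for other base R string functions you want to run over a corpus.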
# load the packages
library(tm)
## Loading required package: NLP
library(wordcloud)
## Loading required package: RColorBrewer
# load the text
aFile <- readLines("PCA.txt")
## Warning in readLines("PCA.txt"): incomplete final line found on 'PCA.txt'
myCorpus <- Corpus(VectorSource(aFile))

# parse and munge the text
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))
# extra custom words to drop; lower case so they match the lowercased text
myCorpus <- tm_map(myCorpus, removeWords, c("taylor", "swift"))

# generate a matrix of words
# (wordLengths replaces the old minWordLength control option)
myDTM <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
m <- as.matrix(myDTM)

# sort the words by frequency
v <- sort(rowSums(m), decreasing = TRUE)

# generate the wordcloud
wordcloud(names(v), v, scale = c(5, 0.5), max.words = 100, random.order = FALSE,
          rot.per = 0.35, use.r.layout = FALSE, colors = brewer.pal(8, "Dark2"))
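If you want to see roughly what the munging and counting steps boil down to, here is a base R sketch with no packages involved. The sample sentences and the two-word stop list are made up for illustration, and this is not how tm does it internally; it just mirrors the lower-case, strip, filter and count sequence above.

```r
# Hypothetical sample text, standing in for the lines read from PCA.txt
sample_lines <- c("PCA finds the principal components.",
                  "The first principal component has the largest variance.")

# Lower-case, then strip punctuation and digits (like the tm_map steps)
clean <- tolower(sample_lines)
clean <- gsub("[[:punct:][:digit:]]", "", clean)

# Split on whitespace and drop any empty tokens
words <- unlist(strsplit(clean, "[[:space:]]+"))
words <- words[nzchar(words)]

# Tiny hand-picked stop list (an assumption; stopwords("english") is far longer)
stop_words <- c("the", "has")
words <- words[!words %in% stop_words]

# Count each word and sort, the analogue of sort(rowSums(m), decreasing = TRUE)
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

With the wordcloud package loaded, names(freq) and as.vector(freq) could be passed to wordcloud() in place of names(v) and v.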