Big Social Data Analytics in Journalism and Mass Communication
Lei Guo (1), Chris J. Vargo (2), Zixuan Pan (3), Weicong Ding (4), Prakash Ishwar (5)
Paper’s reference in the IEEE style?
L. Guo, C. J. Vargo, Z. Pan, W. Ding, and P. Ishwar, “Big Social Data Analytics in Journalism and Mass Communication: Comparing Dictionary-Based Text Analysis and Unsupervised Topic Modeling,” Journalism & Mass Communication Quarterly, vol. 93, no. 2, pp. 332–359, Jun. 2016.
How did you find the paper?
Searched the UniSA library website.
If applicable, write a list of the search terms you used.
- "Big Data Text Analytics Platforms"
Was the paper peer reviewed? Explain how you found out.
Yes. The publisher, SAGE, describes the journal as follows: "Journalism & Mass Communication Quarterly (JMCQ) is the flagship journal of the Association for Education in Journalism and Mass Communication (AEJMC). It is a quarterly, peer-reviewed journal ranked in the Journal Citation Reports that focuses on research in journalism and mass communication" (Source: https://au.sagepub.com/en-gb/oce/journalism-mass-communication-quarterly/journal202061).
Does the author(s) work in a university or a government-funded research institute? If so, which university or research institute? If not, where do they work?
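Three of the five authors work at universities: Lei Guo and Prakash Ishwar at Boston University, MA, and Chris J. Vargo at The University of Alabama, Tuscaloosa. The other two work in industry: Zixuan Pan at Yodlee, Redwood City, CA, and Weicong Ding at Technicolor Research, Los Altos, CA.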
What does this tell you about their expertise? Are they an expert in the topic area?
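All authors appear to be experts in the field, given their research affiliations and the publication of this work in a peer-reviewed flagship journal.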
What was the paper about?
The paper presents the results of a study of 77 million tweets about the 2012 US presidential election, comparing dictionary-based text analysis with Latent Dirichlet Allocation (LDA) topic modeling.
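A dictionary-based approach assigns a tweet to a topic when the tweet contains words from a predefined keyword list for that topic. The sketch below illustrates the general idea only; the topic names, keyword lists and example tweets are hypothetical placeholders, not the dictionaries or data used in the paper.

```python
import re
from collections import Counter

# Hypothetical topic dictionaries, for illustration only (not the paper's).
TOPIC_DICTIONARY = {
    "economy": {"jobs", "economy", "tax", "taxes", "unemployment"},
    "healthcare": {"obamacare", "healthcare", "medicare", "insurance"},
    "foreign_policy": {"iran", "israel", "libya", "military"},
}

def tokenize(text):
    """Lowercase the tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def code_tweet(text):
    """Return every topic whose keyword list matches at least one token."""
    tokens = set(tokenize(text))
    return sorted(topic for topic, words in TOPIC_DICTIONARY.items() if tokens & words)

def topic_proportions(tweets):
    """Share of tweets coded under each topic (a tweet may match several)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(code_tweet(tweet))
    total = len(tweets)
    return {topic: counts[topic] / total for topic in TOPIC_DICTIONARY}

# Hypothetical example tweets.
tweets = [
    "Romney says his tax plan will create jobs",
    "Obamacare is working for millions of families",
    "Debate tonight: Iran, Israel and the economy",
    "Go vote!",
]
print(code_tweet(tweets[2]))       # ['economy', 'foreign_policy']
print(topic_proportions(tweets))   # {'economy': 0.5, 'healthcare': 0.25, 'foreign_policy': 0.25}
```

A tweet that contains none of the listed words, such as the last example, is left uncoded, which is one way a fixed dictionary can miss or misclassify topics.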
The following research questions were posed using the Twitter data:
- Using a dictionary-based approach, what is the qualitative structure and proportion of the topics in the coverage of Obama and Romney during the election?
- What are the structure and proportion of the topics when using LDA?
- How do the results of the two approaches compare?
- Which approach produced the more valid results?
The study found that both computerized methods had difficulty accurately identifying or classifying the topics of tweets, but that overall LDA performed better than the dictionary-based approach.
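Unlike a dictionary, LDA infers topics from word co-occurrence without predefined keyword lists. The following is a minimal collapsed Gibbs sampler for LDA written with the Python standard library only, to show the mechanics; it is not the paper's implementation, and the toy corpus, number of topics K, hyperparameters alpha and beta, and iteration count are all assumptions. Analysing tens of millions of tweets would call for an optimized library rather than this sketch.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (doc_topic, topic_words), where
    doc_topic[d] is the estimated topic mixture of document d and
    topic_words[k] maps each word to its estimated probability in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * K for _ in docs]                # topic counts per document
    n_kw = [defaultdict(int) for _ in range(K)]   # word counts per topic
    n_k = [0] * K                                 # total tokens per topic
    z = []                                        # topic assignment of every token

    # Randomly assign each token to a topic to initialise the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                # Resample the token's topic and add it back to the counts.
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_words = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(K)
    ]
    return doc_topic, topic_words

# Hypothetical toy corpus of pre-tokenized "tweets".
docs = [
    "tax jobs economy tax jobs".split(),
    "economy jobs tax unemployment".split(),
    "iran israel military iran".split(),
    "military libya iran israel".split(),
]
doc_topic, topic_words = lda_gibbs(docs, K=2)
for k, words in enumerate(topic_words):
    print(k, sorted(words, key=words.get, reverse=True)[:3])
```

Which topic index ends up holding the economic words and which the foreign-policy words depends on the random seed, so the order of the printed topics is not fixed. The per-document mixtures in doc_topic are the kind of output that can be aggregated into topic proportions for comparison with a dictionary-based coding.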
Some comments about big data from the paper:
"Big data is a broad term used for datasets that have a size (e.g., dimensionality, volume, and velocity of generation) and complexity (e.g., diversity, variability) that exceed the capabilities of traditionally used tools for capturing, processing, curating, and analyzing data within a tolerable timeframe."
"...the criterion to evaluate whether a dataset is “big” is a function of the amount of time required for a human to make a decision on a given unit. For complex problems and large documents, datasets that extend beyond 10,000 may be considered “big.” For smaller documents that require little time per unit to code (e.g., tweet), datasets in the 100,000s are generally considered too large for manual methods (Riffe et al., 2014)."
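Because this criterion is expressed in human coding time per unit, it can be made concrete with simple arithmetic. The sketch below estimates total manual coding hours; the 30 seconds per tweet and the 8-hour working day are illustrative assumptions, not figures from the paper or from Riffe et al.

```python
def manual_coding_hours(n_units, seconds_per_unit):
    """Total hours a single human coder needs to code n_units items."""
    return n_units * seconds_per_unit / 3600

# Assumed coding speed: 30 seconds per tweet (hypothetical).
hours = manual_coding_hours(100_000, seconds_per_unit=30)
print(round(hours, 1))   # 833.3
print(round(hours / 8))  # 104 eight-hour working days

# The same assumption applied to the 77 million tweets collected for the study.
print(round(manual_coding_hours(77_000_000, seconds_per_unit=30)))  # 641667
```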
If applicable, is this paper similar to other papers you have read for this assignment? If so, which papers and why?
This paper is related to previous papers reviewed on text analytics including:
If applicable, is this paper different to other papers you have read for this assignment? If so, which papers and why?
While related to text analytics, this is the first paper reviewed for this assignment that discusses and compares techniques for analysing large text datasets, specifically the dictionary-based and LDA approaches.
What do these similarities and differences suggest? What are your observations? Do you have any new ideas? Do you have any conclusions?
As of 1997, newspapers (46.7%) were the predominant medium for content analysis, while 24.3% of studies focused on television content; all were coded by humans. With the explosion of the internet and social media, the volume of available text data means it is no longer practical to rely on manual, human-based methods alone to extract meaningful information from it. For this paper, 77 million tweets were collected. After pre-processing, this resulted in over 30 million tweets by more than 1.5 million users about Obama, and nearly 20 million tweets by over 1 million users about Romney.
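These figures indicate roughly how much of the collected volume remained after pre-processing. The arithmetic below uses the rounded values quoted (30 million and 20 million); because the paper gives them as "over" and "nearly", and because a tweet mentioning both candidates may be counted under each, the result is only an approximation.

```python
collected = 77_000_000
obama_tweets = 30_000_000   # "over 30m" (rounded)
romney_tweets = 20_000_000  # "nearly 20m" (rounded)

# Share of the collected tweets retained across both candidates.
combined = obama_tweets + romney_tweets
print(f"{combined / collected:.1%}")  # 64.9%
```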
As a check on the performance of the dictionary-based and LDA approaches, a sample of 100 tweets was coded manually. This is an extremely small sample relative to the size of the dataset.
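One way to quantify how small a 100-tweet validation sample is uses the normal-approximation margin of error for a proportion, 1.96 * sqrt(p * (1 - p) / n) at 95% confidence. The sketch assumes simple random sampling; the worst case p = 0.5 is standard, while the 80% accuracy in the second call is a hypothetical value, not a result from the paper.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a 100-tweet validation sample.
print(round(margin_of_error(0.5, 100), 3))   # 0.098
# Hypothetical: 80 of 100 tweets coded correctly.
print(round(margin_of_error(0.8, 100), 3))   # 0.078
# Sample size needed for a +/-3 percentage point margin in the worst case.
print(math.ceil((1.96 / 0.03) ** 2 * 0.25))  # 1068
```

For a population of this size the margin depends almost entirely on n rather than on the share of the dataset sampled, so the main weakness of 100 tweets is the wide uncertainty (about +/-10 percentage points) around any accuracy estimate.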
To gain insight from this ever-increasing volume, velocity and variety of data, new and improved tools will be required.
This question is to be answered after your critical analysis is completed. In which sections (if any) of your critical analysis was this paper cited?
References
1, 5. Boston University, MA, USA
2. The University of Alabama, Tuscaloosa, USA
3. Yodlee, Redwood City, CA, USA
4. Technicolor Research, Los Altos, CA, USA