The GPU Enhanced Parallel Computing for Large Scale Data Clustering
Rutrell Yasin (29 November 2012)
Paper’s reference in the IEEE style?
X. Cui, J. S. Charles, and T. E. Potok, “The GPU Enhanced Parallel Computing for Large Scale Data Clustering,” in 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2011, pp. 220–225.
How did you find the paper?
Search on the IEEE website (http://ieeexplore.ieee.org/).
If applicable, write a list of the search terms you used.
- "Cui, Xiaohui"
Was the paper peer reviewed? Explain how you found out.
Does the author(s) work in a university or a government-funded research institute? If so, which university or research institute? If not, where do they work?
The authors work at the U.S. Department of Energy's Oak Ridge National Laboratory or at Carnegie Mellon University.
What does this tell you about their expertise? Are they an expert in the topic area?
The authors are experts in the field of large-volume text analysis.
What was the paper about?
The paper discusses research into clustering documents using highly parallel processors (GPUs) and a flocking-type model. The flocking model borrows from the behavior exhibited by birds, in which each individual's movement decisions are based only on the behavior of its neighbors and on other environmental factors.
A simple flocking model for birds follows three basic rules:
- separation, steering to avoid collision with neighbors
- alignment, steering toward the average heading and matching velocity of neighbors
- cohesion, steering toward the average position of neighbors.
For this research into document clustering, a fourth rule was added:
- feature similarity.
For each document, a feature vector is determined and cosine distance is used to create a similarity matrix.
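To make the similarity step concrete, here is a minimal sketch of building a pairwise cosine similarity matrix. The term-count vectors and the function names are hypothetical and are not taken from the paper; the sketch also reports cosine similarity directly (cosine distance is usually defined as one minus this value).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarity for every pair of documents."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical term-count vectors for three documents.
docs = [[2, 1, 0, 0], [4, 2, 0, 0], [0, 0, 3, 1]]
sim = similarity_matrix(docs)
```

In this toy data the first two documents use the same terms in the same proportions, so they score as near-identical, while the third shares no terms with either.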
Then the documents are given random positions in a 2D plane and the flocking algorithm is applied to their positions. For document pairs that fell below a similarity threshold, the alignment and cohesion rules were nullified, effectively causing them to repel.
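The effect of the similarity threshold can be illustrated with a small single-threaded sketch of one flocking update. The weights, the neighborhood radius, the threshold value and the inverse-distance form of the separation term are all assumptions chosen for illustration, not values from the paper.

```python
import math

def flock_step(pos, vel, sim, threshold=0.5, radius=10.0, dt=1.0,
               w_sep=1.0, w_ali=0.5, w_coh=0.5):
    """One update of every document's position and velocity (illustrative only)."""
    n = len(pos)
    new_pos, new_vel = [], []
    for i in range(n):
        sep = [0.0, 0.0]
        ali = [0.0, 0.0]
        coh = [0.0, 0.0]
        similar = 0
        for j in range(n):
            if j == i:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0 or dist > radius:
                continue
            # Separation applies to every neighbor and grows as they get closer.
            sep[0] -= dx / (dist * dist)
            sep[1] -= dy / (dist * dist)
            # Alignment and cohesion only count for sufficiently similar documents.
            if sim[i][j] >= threshold:
                ali[0] += vel[j][0]
                ali[1] += vel[j][1]
                coh[0] += dx
                coh[1] += dy
                similar += 1
        if similar:
            ali = [ali[0] / similar - vel[i][0], ali[1] / similar - vel[i][1]]
            coh = [coh[0] / similar, coh[1] / similar]
        vx = vel[i][0] + w_sep * sep[0] + w_ali * ali[0] + w_coh * coh[0]
        vy = vel[i][1] + w_sep * sep[1] + w_ali * ali[1] + w_coh * coh[1]
        new_vel.append((vx, vy))
        new_pos.append((pos[i][0] + vx * dt, pos[i][1] + vy * dt))
    return new_pos, new_vel

# Two documents, four units apart, initially at rest.
pos = [(0.0, 0.0), (4.0, 0.0)]
vel = [(0.0, 0.0), (0.0, 0.0)]
alike = [[1.0, 0.9], [0.9, 1.0]]    # hypothetical similarities above the threshold
unlike = [[1.0, 0.1], [0.1, 1.0]]   # hypothetical similarities below the threshold
near, _ = flock_step(pos, vel, alike)
far, _ = flock_step(pos, vel, unlike)
```

With these made-up settings the similar pair ends the step closer together, while the dissimilar pair, left with separation alone, drifts apart.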
On the GPU, one kernel is created for each document pair (n^2 kernels) to calculate the value of the rules for that pair. A second kernel is then run to calculate the resulting position and velocity of each document.
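The two-pass structure can be sketched sequentially as follows. This is a sketch of the general pattern only: the flat index mapping, the buffer layout and the stand-in `toward` pair rule are hypothetical, and the velocity update is omitted for brevity. On a GPU, the loop bodies in each pass would run concurrently rather than one after another.

```python
def pair_from_index(k, n):
    """Map a flat work-item index k in [0, n*n) to a document pair (i, j)."""
    return k // n, k % n

def pass_one(pos, pair_rule):
    """Pass 1: one independent work item per ordered pair, written to a flat buffer."""
    n = len(pos)
    buf = [(0.0, 0.0)] * (n * n)
    for k in range(n * n):
        i, j = pair_from_index(k, n)
        if i != j:
            buf[k] = pair_rule(pos[i], pos[j])
    return buf

def pass_two(pos, buf, dt=1.0):
    """Pass 2: one work item per document sums its row and moves the document."""
    n = len(pos)
    out = []
    for i in range(n):
        row = buf[i * n:(i + 1) * n]
        fx = sum(f[0] for f in row)
        fy = sum(f[1] for f in row)
        out.append((pos[i][0] + fx * dt, pos[i][1] + fy * dt))
    return out

def toward(a, b, gain=0.25):
    """Hypothetical stand-in pair rule: a small pull from a toward b."""
    return (gain * (b[0] - a[0]), gain * (b[1] - a[1]))

pos = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
moved = pass_two(pos, pass_one(pos, toward))
```

Because no work item in the first pass reads another's output, the pairwise stage has no ordering constraints, which is what makes it a natural fit for a highly parallel processor.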
The research was completed on a desktop computer using a single NVIDIA GPU. A GPU-based implementation was run and compared against a single-threaded CPU implementation written in C.
The GPU solution performed between 36 and 59 times faster than the CPU method.
If applicable, is this paper similar to other papers you have read for this assignment? If so, which papers and why?
This paper is related to:
If applicable, is this paper different to other papers you have read for this assignment? If so, which papers and why?
What do these similarities and differences suggest? What are your observations? Do you have any new ideas? Do you have any conclusions?
The use of GPUs for text analysis can deliver significant improvements in speed and reductions in costs. However, programming GPUs can be difficult and new parallel techniques are required to utilise them.
This question is to be answered after your critical analysis is completed. Which sections (if any) of your critical analysis was this paper cited in?