Efficient Estimation of Word Representations in Vector Space
Mikolov et al. (2013)
Paper’s reference in the IEEE style?
T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
How did you find the paper?
Link from the Google blog post on Word2Vec: https://opensource.googleblog.com/2013/08/learning-meaning-behind-words.html
If applicable, write a list of the search terms you used.
Not applicable; the paper was found via a direct link rather than a search.
Was the paper peer reviewed? Explain how you found out.
No. The reference above identifies it as an arXiv preprint, and arXiv does not peer review submissions.
Does the author(s) work in a university or a government-funded research institute? If so, which university or research institute? If not, where do they work?
No; the authors all work at Google.
What does this tell you about their expertise? Are they an expert in the topic area?
They are experienced industry researchers; working at Google on large-scale machine learning suggests genuine expertise in the topic area.
What was the paper about?
[200-300 words]
If applicable, is this paper similar to other papers you have read for this assignment? If so, which papers and why?
The paper presents two techniques for computing continuous vector representations of words from very large data sets (billions of words, with millions of distinct words in the vocabulary). The resulting vectors capture multiple degrees of similarity between words, both syntactic and semantic.
Having accurate, continuous representations of word similarity makes algebraic operations on words possible, such as:

vector("King") - vector("Man") + vector("Woman") ≈ vector("Queen")
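As an illustration, this arithmetic can be reproduced with the gensim library (a third-party tool, not the paper's own C implementation; the pretrained file name "vectors.bin" below is a placeholder, not from the paper):

```python
# Minimal sketch of word-vector arithmetic with gensim; assumes a pretrained
# model file in word2vec binary format ("vectors.bin" is a placeholder name).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# most_similar() sums the "positive" vectors, subtracts the "negative" ones,
# and returns the vocabulary words closest to the result by cosine similarity:
# vector("king") - vector("man") + vector("woman") ~= vector("queen").
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# With good vectors this prints something like: [('queen', 0.71)]
```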
The two techniques are:
- Continuous bag-of-words (CBOW) model
Similar to a feedforward neural network language model (NNLM), but with the non-linear hidden layer removed and the projection layer shared across all words. The model predicts the current word from a continuous representation of its context, e.g. the two preceding and two following words.
- Continuous skip-gram model
Similar to CBOW, but inverted: it uses the current word to classify (predict) the other words within a range before and after it in the same sentence. Increasing this range improves the quality of the resulting vectors, but also increases the computational complexity. (Both models are sketched in code below.)
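As a rough sketch of how the two architectures are chosen in practice, gensim's Word2Vec exposes both through a single flag (parameter names follow gensim 4.x; the toy corpus is an assumption, whereas the paper trains on billions of words):

```python
from gensim.models import Word2Vec

# Toy two-sentence corpus, purely for illustration.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]

# CBOW (sg=0): predict the current word from the surrounding context words.
cbow = Word2Vec(sentences, sg=0, vector_size=100, window=2, min_count=1)

# Skip-gram (sg=1): predict the surrounding context words from the current
# word. A wider window improves vector quality but raises training cost.
skipgram = Word2Vec(sentences, sg=1, vector_size=100, window=5, min_count=1)

print(cbow.wv["king"].shape)  # (100,) -- one dense vector per vocabulary word
```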
To find the word that relates to one word in the same sense that another pair of words are related (e.g. the word related to "woman" as "king" is related to "man"), the vector space can be searched for the word whose vector is closest to the LHS of the equation above, measured by cosine distance.
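Concretely, this search is a cosine-similarity argmax over the vocabulary. A self-contained sketch with toy three-dimensional vectors (the numbers are purely illustrative; real models use hundreds of dimensions and millions of words):

```python
import numpy as np

# Toy word vectors with made-up values, for illustration only.
vocab = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "man":   np.array([0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.2, 0.8]),
    "queen": np.array([0.1, 0.8, 0.9]),
    "apple": np.array([0.5, 0.1, 0.2]),
}

def closest_word(target, exclude=()):
    """Return the vocabulary word with the highest cosine similarity
    (i.e. the smallest cosine distance) to the target vector."""
    best_word, best_sim = None, -1.0
    for word, vec in vocab.items():
        if word in exclude:
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# LHS of the equation: vector("King") - vector("Man") + vector("Woman").
lhs = vocab["king"] - vocab["man"] + vocab["woman"]
print(closest_word(lhs, exclude={"king", "man", "woman"}))  # -> "queen"
```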
Examples of the semantic and syntactic questions that can be answered include:
- Semantic: Athens is to Greece as Oslo is to Norway (capital city to country).
- Syntactic: big is to biggest as small is to smallest (superlative); walking is to walked as swimming is to swam (past tense).
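The paper scores such questions automatically: an analogy a : b :: c : ? counts as correct only if the word nearest to vector(b) - vector(a) + vector(c) exactly matches the expected answer. A sketch of that scoring, reusing the toy vocab and closest_word() helper from the snippet above:

```python
# Analogy questions as (a, b, c, expected) tuples; a question is counted
# as correct only on an exact match, as in the paper's evaluation.
questions = [
    ("man", "king", "woman", "queen"),
]

def analogy_accuracy(questions):
    correct = 0
    for a, b, c, expected in questions:
        guess = closest_word(vocab[b] - vocab[a] + vocab[c], exclude={a, b, c})
        correct += (guess == expected)
    return correct / len(questions)

print(analogy_accuracy(questions))  # -> 1.0 with the toy vectors above
```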
These models were trained on up to 6 billion words. The following table (Table 6 of the paper) illustrates the magnitude of the training exercise and the number of CPU cores needed:

| Model | Vector dimensionality | Training words | Training time (days x CPU cores) |
|-----------|------|----|-----------|
| NNLM | 100 | 6B | 14 x 180 |
| CBOW | 1000 | 6B | 2 x 140 |
| Skip-gram | 1000 | 6B | 2.5 x 125 |
If applicable, is this paper different to other papers you have read for this assignment? If so, which papers and why?
This paper is related to the following papers, which also use machine learning for NLP:
What do these similarities and differences suggest? What are your observations? Do you have any new ideas? Do you have any conclusions?
This paper is from 2013, only three years ago, and NLP using machine learning is still seeing a large amount of development, with much more to come.
The technical complexity of NLP (semantic, syntactic and contextual), the corpus sizes involved and the processing capacity required to deliver results are all challenging.
There is a lot of work still to do and a lot of gains still to be made.
This question is to be answered after your critical analysis is completed. Which sections (if any) of your critical analysis was this paper cited in?