WordSim353 is a test collection for measuring word similarity or relatedness, developed and maintained by E. Gabrilovich.
This page provides a split of the WordSim353 test set into two subsets, one for evaluating similarity and the other for evaluating relatedness. The split follows the procedure described in this paper:
Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, Aitor Soroa,
A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches,
In Proceedings of NAACL-HLT 2009.
If you publish results based on this dataset, please reference this paper.
The dataset is available for download from this page. It contains the following files:
- wordsim353_annotator1.txt: the classification of the pairs by the first annotator.
- wordsim353_annotator2.txt: the classification of the pairs by the second annotator.
- wordsim353_agreed.txt: the classification of the pairs after the annotators reached agreement.
- wordsim_relatedness_goldstandard.txt: the final gold standard for measuring relatedness, in the same format as the WordSim353 distribution.
- wordsim_similarity_goldstandard.txt: the final gold standard for measuring similarity, in the same format as the WordSim353 distribution.