Tagging accuracy analysis on part-of-speech taggers

Date
2014
Authors
YUMUŞAK, Semih
DOĞDU, Erdoğan
KODAZ, Halife
Abstract
Part-of-speech (POS) tagging can be performed with many tools and in many programming languages.
This work focuses on the Natural Language Toolkit (NLTK) library for Python and the gold-standard
corpora that can be installed with it. Using Python, the corpora and tagging methods are analyzed and
compared, and the taggers are evaluated by their tagging accuracy on data from three corpora: Brown,
Penn Treebank and NPS Chat. The taggers used in the analysis are the default tagger, the regular
expression (regex) tagger and the n-gram taggers. Applying all of these taggers to the three corpora,
we have shown that although the unigram tagger gives the best accuracy of the individual taggers on
every corpus, a combination of taggers does better when the taggers are chained in the correct order.
We have also observed that the NPS Chat Corpus gives different accuracy results from the other two
corpora.
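In NLTK, the taggers named in the abstract correspond to classes in `nltk.tag` such as `DefaultTagger`, `RegexpTagger`, `UnigramTagger` and `BigramTagger`, and the last three accept a `backoff` argument for chaining. The following is a minimal pure-Python sketch of the same ideas, not NLTK's implementation, written without NLTK so that it runs without downloading any corpus. The names `Tagger`, `RegexTagger`, `NgramTagger` and `accuracy`, the tiny training and test sentences, and the suffix patterns are all hypothetical stand-ins; the numbers it prints say nothing about the paper's Brown, Penn Treebank or NPS Chat results. It shows how each tagger is scored by per-token accuracy and why backoff order matters: a chain running from the most specific tagger (bigram) to the most general (default) lets each tagger fill the gaps of the one before it.

```python
import re
from collections import Counter, defaultdict

# Minimal sketch, not NLTK's implementation; all names and data here are hypothetical.
# Toy data: sentences as lists of (word, Penn Treebank-style tag) pairs.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("runs", "VBZ"), ("quickly", "RB")],
]
TEST = [
    [("the", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("bird", "NN"), ("sings", "VBZ"), ("loudly", "RB")],
]

class Tagger:
    """Tags left to right; when choose_tag returns None, the backoff tagger decides."""
    def __init__(self, backoff=None):
        self.backoff = backoff

    def tag_one(self, tokens, index, history):
        tag = self.choose_tag(tokens, index, history)
        if tag is None and self.backoff is not None:
            tag = self.backoff.tag_one(tokens, index, history)
        return tag

    def tag(self, tokens):
        history = []  # tags already chosen for tokens[:index]
        for index in range(len(tokens)):
            history.append(self.tag_one(tokens, index, history))
        return list(zip(tokens, history))

class DefaultTagger(Tagger):
    """Gives every token the same tag, so a backoff behind it is never consulted."""
    def __init__(self, tag, backoff=None):
        super().__init__(backoff)
        self.default_tag = tag

    def choose_tag(self, tokens, index, history):
        return self.default_tag

class RegexTagger(Tagger):
    """Returns the tag of the first pattern matching the word, or None."""
    def __init__(self, patterns, backoff=None):
        super().__init__(backoff)
        self.patterns = [(re.compile(p), tag) for p, tag in patterns]

    def choose_tag(self, tokens, index, history):
        for pattern, tag in self.patterns:
            if pattern.match(tokens[index]):
                return tag
        return None

class NgramTagger(Tagger):
    """Most frequent training tag for the context (previous n-1 tags, current word)."""
    def __init__(self, n, train_sents, backoff=None):
        super().__init__(backoff)
        self.n = n
        counts = defaultdict(Counter)
        for sent in train_sents:
            words, tags = [w for w, _ in sent], [t for _, t in sent]
            for i in range(len(words)):
                counts[self._context(words, i, tags)][tags[i]] += 1
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def _context(self, tokens, index, history):
        return tuple(history[max(0, index - self.n + 1):index]), tokens[index]

    def choose_tag(self, tokens, index, history):
        return self.table.get(self._context(tokens, index, history))  # None if unseen

def accuracy(tagger, gold_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        predicted = tagger.tag([w for w, _ in sent])
        for (_, guess), (_, gold) in zip(predicted, sent):
            correct += int(guess == gold)
            total += 1
    return correct / total

PATTERNS = [(r".*ly$", "RB"), (r".*s$", "VBZ")]  # hypothetical suffix rules
taggers = {
    "default": DefaultTagger("NN"),
    "regex": RegexTagger(PATTERNS),
    "unigram": NgramTagger(1, TRAIN),
    "bigram": NgramTagger(2, TRAIN),
    # Most specific first, most general last: bigram -> unigram -> regex -> default.
    "combined": NgramTagger(2, TRAIN, backoff=NgramTagger(
        1, TRAIN, backoff=RegexTagger(PATTERNS, backoff=DefaultTagger("NN")))),
    # Misordered: the default tagger answers first, hiding the unigram tagger.
    "misordered": DefaultTagger("NN", backoff=NgramTagger(1, TRAIN)),
}
for name, tagger in taggers.items():
    print(f"{name:10s} {accuracy(tagger, TEST):.3f}")
```

With this toy data the script prints 0.286 for the default tagger, 0.429 for the regex tagger, 0.571 for both the unigram and bigram taggers, 1.000 for the correctly ordered chain and 0.286 for the misordered one. The bigram tagger alone gains nothing here because one unseen word leaves `None` in the history, after which every later bigram context is also unseen; backing off to more general taggers is what repairs this.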

DSpace@Karatay by Karatay University Institutional Repository is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 Unported License.