Tagging accuracy analysis on part-of-speech taggers

YUMUŞAK, Semih; DOĞDU, Erdoğan; KODAZ, Halife

Author(s) YUMUŞAK, Semih
DOĞDU, Erdoğan
KODAZ, Halife
Publication Type Article
Publication Date 2014
URI https://hdl.handle.net/20.500.12498/1048

Part-of-speech (POS) tagging can be performed with many tools and in many programming languages. This work focuses on the Natural Language Toolkit (NLTK) library for Python and the gold-standard corpora that can be installed with it. The corpora and tagging methods are analyzed and compared using Python. The taggers are compared by tagging accuracy on data from three corpora: Brown, Penn Treebank, and NPS Chat. The taggers analyzed are the default tagger, the regex tagger, and the n-gram taggers. We applied all taggers to the three corpora and found that, while the unigram tagger is the most accurate single tagger on every corpus, a combination of taggers performs better when the taggers are correctly ordered. We also observed that the NPS Chat corpus yields accuracy results that differ from those of the other two corpora.
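The idea of combining taggers in a backoff order and comparing them by accuracy can be illustrated with a minimal plain-Python sketch. It is not NLTK's implementation and not the authors' code: the class names only mirror the default and unigram taggers named in the abstract, the `tag_word` and `accuracy` helpers are hypothetical, and the tiny `TRAIN` and `TEST` sets are made up for illustration, so the resulting numbers say nothing about the Brown, Penn Treebank, or NPS Chat results.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not taken from any corpus): lists of (word, tag) sentences.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
]
TEST = [
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
]


class DefaultTagger:
    """Assigns the same tag to every word."""

    def __init__(self, tag):
        self.tag = tag

    def tag_word(self, word):
        return self.tag


class UnigramTagger:
    """Assigns each word its most frequent training tag.

    Unseen words are passed to the backoff tagger, if one is given;
    otherwise they receive no tag (None).
    """

    def __init__(self, train, backoff=None):
        counts = defaultdict(Counter)
        for sentence in train:
            for word, tag in sentence:
                counts[word][tag] += 1
        self.table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.backoff = backoff

    def tag_word(self, word):
        if word in self.table:
            return self.table[word]
        if self.backoff is not None:
            return self.backoff.tag_word(word)
        return None


def accuracy(tagger, gold):
    """Fraction of gold tokens whose predicted tag matches the gold tag."""
    pairs = [(word, tag) for sentence in gold for word, tag in sentence]
    correct = sum(tagger.tag_word(word) == tag for word, tag in pairs)
    return correct / len(pairs)


default = DefaultTagger("NOUN")
unigram = UnigramTagger(TRAIN)
combined = UnigramTagger(TRAIN, backoff=default)  # unigram first, default last

for name, tagger in [("default", default), ("unigram", unigram), ("combined", combined)]:
    print(f"{name}: {accuracy(tagger, TEST):.3f}")
```

In this sketch the order of the chain matters: the specific tagger is consulted first and the general one only handles words it has never seen. Reversing the order would let the default tagger answer every word, so the unigram table would never be used.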

Collections Faculties
Faculty of Engineering and Natural Sciences
Computer Engineering

Open Access

Title (dc.title)	Tagging accuracy analysis on part-of-speech taggers
Publication Type (dc.type)	Article
Author(s) (dc.contributor.author)	YUMUŞAK, Semih
Author(s) (dc.contributor.author)	DOĞDU, Erdoğan
Author(s) (dc.contributor.author)	KODAZ, Halife
Citation Index (dc.source.database)	WoS
Citation Index (dc.source.database)	Scopus
Publication Date (dc.date.issued)	2014
Record Entry Date (dc.date.accessioned)	2019-07-10T08:21:16Z
Open Access Date (dc.date.available)	2019-07-10T08:21:16Z
ISSN (dc.identifier.issn)	2327-5227
Publication Language (dc.language.iso)	en
URI (dc.identifier.uri)	https://hdl.handle.net/20.500.12498/1048

All resources on this site are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.