正體中文斷詞系統應用於大型語料庫之多方評估研究

Wen Chao Yeh, Yu Lun Hsieh, Yung Chun Chang, Wen Lian Hsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 citation (Scopus)

Abstract

This study evaluates the three most popular word segmentation tools on a large Traditional Chinese corpus in terms of efficiency, resource consumption, and cost. Specifically, we compare the performance of Jieba, CKIP, and MONPA on word segmentation, part-of-speech (POS) tagging, and named entity recognition (NER) through extensive experiments. Experimental results show that running MONPA on a GPU for batch segmentation can greatly reduce the processing time for massive datasets. In addition, its combined support for word segmentation, POS tagging, and NER is beneficial to downstream applications.
Translated title of the contribution: Multifaceted Assessments of Traditional Chinese Word Segmentation Tool on Large Corpora
Original language: Chinese
Title of host publication: ROCLING 2022 - Proceedings of the 34th Conference on Computational Linguistics and Speech Processing
Editors: Yung-Chun Chang, Yi-Chin Huang, Jheng-Long Wu, Ming-Hsiang Su, Hen-Hsen Huang, Yi-Fen Liu, Lung-Hao Lee, Chin-Hung Chou, Yuan-Fu Liao
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 193-199
Number of pages: 7
ISBN (electronic): 9789869576956
Publication status: Published - 2022
Event: 34th Conference on Computational Linguistics and Speech Processing, ROCLING 2022 - Taipei, Taiwan
Duration: November 21, 2022 to November 22, 2022

Publication series

Name: ROCLING 2022 - Proceedings of the 34th Conference on Computational Linguistics and Speech Processing

Conference

Conference: 34th Conference on Computational Linguistics and Speech Processing, ROCLING 2022
Country/Territory: Taiwan
City: Taipei
Period: 11/21/22 to 11/22/22

Keywords

  • Chinese Word Segmentation
  • NER
  • NLP
  • POS

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
