NER in Hindi Language Using Transformer Model: XLM-Roberta
Author
Abstract

Natural language processing (NLP) is a field of computing concerned with enabling computers to read and understand text and spoken words much as people do. Named Entity Recognition (NER) is a crucial task within NLP. It extracts information from text and is used in machine translation, text-to-speech synthesis, natural language understanding, and other applications. Its main goal is to assign words in a text that represent names to predefined tags such as location, organization, person name, date, time, and measure. In this paper, the proposed method extracts entities from an annotated Hindi Fraud Call corpus (not publicly available) using XLM-Roberta (base-sized model). To build an accurate NER system for this dataset, the authors use the pre-trained XLM-Roberta model as a multi-layer bidirectional transformer encoder for learning deep bidirectional Hindi word representations. The method relies on fine-tuning: the XLM-Roberta model is fine-tuned to extract nine entity types from sentences based on their context, to achieve better performance. An annotated corpus for Hindi is used, with a tag set of nine different Named Entity (NE) classes defined as part of the NER Shared Task for South and Southeast Asian Languages (SSEAL) at IJCNLP. Nine entity types are recognized from sentences. The obtained micro and macro F1-scores are 0.96 and 0.80, respectively.
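
The abstract reports both a micro and a macro F1-score. The sketch below shows how the two averages are typically computed and why they can differ when some classes are rare. It is an illustration only: the function name `f1_scores`, the label names, and the toy label sequences are hypothetical and are not taken from the paper, and per-token scoring over all labels is an assumption, since the abstract does not state how the authors evaluated.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for two equal-length label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[g] += 1   # class g was missed
    labels = sorted(set(gold) | set(pred))

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy data (made up for illustration): one frequent label, two rare ones.
gold = ["O"] * 8 + ["PERSON", "LOCATION"]
pred = ["O"] * 8 + ["PERSON", "O"]

micro, macro = f1_scores(gold, pred)
print(f"micro F1 = {micro:.2f}, macro F1 = {macro:.2f}")
# prints "micro F1 = 0.90, macro F1 = 0.65"
```

In this toy case a single missed rare class barely moves the micro average but lowers the macro average sharply, because the macro average weights every class equally regardless of frequency.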

Year of Publication
2022
Date Published
Sep
Publisher
IEEE
Conference Location
Pune, India
ISBN Number
978-1-66542-832-3
URL
https://ieeexplore.ieee.org/document/9935841/
DOI
10.1109/ICBDS53701.2022.9935841