Named entity recognition in aviation products domain based on BERT

Date published

2024-12-12

Free to read from

2025-01-07

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Type

Article

ISSN

2169-3536

Citation

Yang M, Namoano B, Farsi M, Ahmet Erkoyuncu J. (2024) Named entity recognition in aviation products domain based on BERT. IEEE Access, Volume 12, December 2024, pp. 189710-189721

Abstract

The aviation product manufacturing industry is undergoing a profound shift towards intelligent systems, and building a knowledge graph for the aviation field has become central to achieving cognitive intelligence. Named entity recognition (NER) is a key step in knowledge graph construction and one of the main tasks of knowledge extraction. Because aviation product text is highly specialised and its contextual information spans long passages, existing models often perform poorly at entity extraction. This paper proposes BBC-Ap, an NER method tailored to the aviation product field that combines a domain-specific ontology with deep learning to improve the accuracy and efficiency of entity extraction from complex technical documents. The method first establishes an ontology model of aviation products and annotates the relevant text data to form a dataset for training the NER model. It then adopts a multi-level model structure based on BERT: BERT generates word vector representations, a bidirectional long short-term memory network (BiLSTM) acts as an encoder to extract semantic features, and a conditional random field (CRF) acts as a decoder to achieve optimal label assignment. In experiments on the constructed aviation product dataset, the model achieved a precision of 91.74%, a recall of 92.46%, and an F1 score of 92.1%. Compared with other baseline models, the F1 score improved by 0.9% to 1.5%. The model also performs well on standard datasets such as CoNLLpp, with a precision of 92.87%, a recall of 92.54%, and an F1 score of 92.70%. Finally, the model was used to construct a knowledge graph in Neo4j that reflects the relationships between aviation products, further demonstrating the effectiveness and practicality of the method.
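As a quick consistency check on the figures quoted in the abstract, the sketch below recomputes F1 as the harmonic mean of precision and recall. The function name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper, and the inputs are only the rounded percentages given above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as quoted in the abstract.
reported = {
    "aviation product dataset": (91.74, 92.46),
    "CoNLLpp": (92.87, 92.54),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}%")
```

Under these inputs the recomputed values round to 92.10% and 92.70%, which agrees with the F1 scores stated in the abstract.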

Keywords

4605 Data Management and Data Science, 46 Information and Computing Sciences, 4602 Artificial Intelligence, 4611 Machine Learning, Networking and Information Technology R&D (NITRD), 40 Engineering

Rights

Attribution 4.0 International

Funder/s

Engineering and Physical Sciences Research Council (EPSRC), Grant Number: EP/Z533221/1
