Optimisation of Relationship Extraction of Tibetan Medicine Entities Based on ALBERT Model and Span Methods
Keywords:
Tibetan Medicine, Relationship Extraction, ALBERT, Span Methods, Transformer Models, Knowledge Graphs, Clinical Decision Support Systems, Low-Resource NLP
Abstract
Extracting relationships from Tibetan medicine texts is essential for building knowledge graphs and improving information retrieval. However, existing methods struggle with the linguistic characteristics of the Tibetan language and with the scarcity of annotated data. Traditional approaches, such as rule-based techniques and classical machine learning models, often lack accuracy and generalize poorly. Transformer-based models such as BERT offer improvements, but they remain inadequate for specialized Tibetan medical texts.
To address this, we propose an optimized ALBERT-Span model that combines ALBERT's contextual embeddings with span-based extraction to handle overlapping and nested entities. Enhanced with data augmentation and hyperparameter tuning, the model significantly outperforms baselines such as BERT, BiLSTM-CRF, and CNN models, achieving a 5.4% higher F1 score and a 5.2% increase in accuracy. Ablation studies and statistical testing confirm the robustness and effectiveness of the proposed approach.
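To illustrate why span-based extraction can represent overlapping and nested entities, the following is a minimal sketch of span enumeration and candidate pairing. It is not the implementation proposed here: the function names `enumerate_spans` and `candidate_pairs`, the `max_width` limit, and the placeholder tokens are all assumptions introduced for illustration. In a full model, each span and each span pair would additionally be scored by classifiers over ALBERT embeddings, which this sketch omits.

```python
from typing import List, Tuple

# A span is (start, end) over token indices, with end exclusive.
Span = Tuple[int, int]


def enumerate_spans(tokens: List[str], max_width: int) -> List[Span]:
    """Return every contiguous token span of at most max_width tokens (hypothetical helper)."""
    spans: List[Span] = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def candidate_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Pair each span with every other span as an ordered (head, tail) relation candidate."""
    return [(head, tail) for head in spans for tail in spans if head != tail]


# Placeholder tokens; a real pipeline would use the tokenizer's output.
tokens = ["t1", "t2", "t3"]
spans = enumerate_spans(tokens, max_width=2)
pairs = candidate_pairs(spans)

# (0, 1) is nested inside (0, 2), and (0, 2) overlaps (1, 3):
# both kinds of span coexist in the candidate set.
print(spans)
print(len(pairs))
```

Because every span is an independent candidate, a nested or overlapping mention does not conflict with another one, as it would under a single tag-per-token sequence labeling scheme. The cost is that the number of candidates grows with sentence length and `max_width`, so the width is usually capped.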
Our method supports practical applications such as knowledge graph construction and clinical decision support systems (CDSS). Future work will focus on integrating multimodal data and exploring few-shot learning for low-resource scenarios.