Mitigating Class Imbalance in Sentiment Analysis through GPT-3-Generated Synthetic Sentences
- Subject (keywords) GPT-3, imbalanced sentiment analysis, sentiment analysis, sentiment classification, synthetics review generation, text classification, text generation
- Management Information Technology faculty
- Indexed in SCIE, SCOPUS
- OA type All Open Access; Gold Open Access
- Publisher Multidisciplinary Digital Publishing Institute (MDPI)
- Publication year 2023
- Series type Journal
- URI http://www.dcollection.net/handler/ewha/000000211572
- Language of text English
- Published As https://doi.org/10.3390/app13179766
Abstract/Summary
In this paper, we explore the effectiveness of the GPT-3 model in tackling imbalanced sentiment analysis, focusing on the Coursera online course review dataset that exhibits high imbalance. Training on such skewed datasets often results in a bias towards the majority class, undermining the classification performance for minority sentiments, thereby accentuating the necessity for a balanced dataset. Two primary initiatives were undertaken: (1) synthetic review generation via fine-tuning of the Davinci base model from GPT-3 and (2) sentiment classification utilizing nine models on both imbalanced and balanced datasets. The results indicate that good-quality synthetic reviews substantially enhance sentiment classification performance. Every model demonstrated an improvement in accuracy, with an average increase of approximately 12.76% on the balanced dataset. Among all the models, the Multinomial Naïve Bayes achieved the highest accuracy, registering 75.12% on the balanced dataset. This study underscores the potential of the GPT-3 model as a feasible solution for addressing data imbalance in sentiment analysis and offers significant insights for future research. © 2023 by the authors.
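The abstract's point about majority-class bias can be illustrated with a minimal sketch. It is not the authors' code or data: the label counts below are hypothetical, and `majority_baseline_report` is an illustrative helper name. It scores a trivial classifier that always predicts the most frequent label, which looks accurate on a skewed set while never recovering a minority sentiment.

```python
from collections import Counter


def majority_baseline_report(labels):
    """Score a classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    majority, majority_count = counts.most_common(1)[0]
    accuracy = majority_count / len(labels)
    # Recall is 1.0 for the majority label and 0.0 for every other label,
    # because the baseline never predicts anything else.
    recall = {label: (1.0 if label == majority else 0.0) for label in counts}
    macro_recall = sum(recall.values()) / len(recall)
    return {"majority": majority, "accuracy": accuracy, "macro_recall": macro_recall}


# Hypothetical label counts, NOT the Coursera dataset's actual distribution.
skewed = ["positive"] * 90 + ["neutral"] * 6 + ["negative"] * 4
balanced = ["positive"] * 30 + ["neutral"] * 30 + ["negative"] * 30

skewed_report = majority_baseline_report(skewed)
balanced_report = majority_baseline_report(balanced)

print(skewed_report)
print(balanced_report)
```

On the skewed labels the baseline's accuracy is high even though its macro-averaged recall is only one third; on the balanced labels accuracy falls to the same one third, so accuracy no longer rewards ignoring the minority classes. This is one reason results on balanced and imbalanced sets are usually compared with per-class metrics as well as accuracy.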