dCollection Digital Academic Information Distribution System

Understanding recurrent neural network for texts using English-Korean corpora

Subject (keywords) Keras, Neural machine translation, NLP, RNN, Seq2Seq
Indexed in SCOPUS, KCI
Publisher Korean Statistical Society
Publication year 2020
Series type Journal
URI http://www.dcollection.net/handler/ewha/000000175803
Language of text English
Published As http://dx.doi.org/10.29220/CSAM.2020.27.3.313

Abstract/Summary

Deep learning is the most important key to the development of artificial intelligence (AI). There are several distinct neural network architectures, such as MLP, CNN, and RNN. Among them, we examine the recurrent neural network (RNN), which differs from the other architectures in handling sequential data, including time series and text. As one of the main recent tasks in natural language processing (NLP), we consider neural machine translation (NMT) using RNNs. We also summarize the fundamental structures of recurrent networks and some topics on representing natural-language words as suitable numeric vectors. We organize the topics to follow the estimation procedure from representing input source sequences to predicting translated target sequences. In addition, we apply multiple translation models with Gated Recurrent Units (GRUs) in Keras to English-Korean sentences comprising about 26,000 sentence pairs in total from two different corpora, colloquial text and news. We verified some crucial factors that influence the quality of training. We found that the loss decreases with more recurrent dimensions and with a bidirectional RNN in the encoder when dealing with short sequences. We also computed BLEU scores, which are the main measure of translation performance, and compared them with the score from Google Translate on the same test sentences. We summarize some difficulties in training a proper translation model and in dealing with the Korean language. Using Keras in Python for all tasks, from processing raw text to evaluating the translation model, also allows us to include some useful functions and vocabulary libraries. © 2020 The Korean Statistical Society, and Korean International Statistical Society.
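
The BLEU score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which BLEU implementation or settings the authors used, so everything below is an assumption made for illustration: sentence-level scoring against a single reference, n-gram orders 1 to 4 with uniform weights, no smoothing, and whitespace tokenization. The function names `ngram_counts` and `sentence_bleu` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference (illustrative)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


# Hypothetical example sentences, tokenized on whitespace.
reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
score = sentence_bleu(candidate, reference)
```

Unsmoothed sentence-level BLEU is zero whenever any n-gram order has no match, which is common for short sentences; corpus-level aggregation or smoothing is the usual remedy, and a maintained library implementation is preferable to this sketch for reporting results.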
