Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: A tutorial review
- Subject (keywords) Named entity recognition, Natural language processing, Text mining
- Administrative information faculty
- Indexed in SCOPUS
- Publisher Korea Genome Organization
- Publication year 2020
- Series type Journal
- URI http://www.dcollection.net/handler/ewha/000000182375
- Language of text English
- Published As http://dx.doi.org/10.5808/GI.2020.18.2.e13
Abstract/Summary
The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon. © 2020, Korea Genome Organization.
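The kind of conversion mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the tool developed at the hackathon. It assumes tab-separated Genia tagger output (word, base form, POS tag, chunk tag, named-entity tag in IOB style, with blank lines between sentences) and a PubAnnotation-style JSON object with `text` and `denotations` fields. The function name `genia_to_pubannotation` is hypothetical, and the text is rebuilt by joining tokens with single spaces, so the offsets refer to that rebuilt text rather than to the original article.

```python
import json


def genia_to_pubannotation(genia_output: str) -> dict:
    """Convert Genia-tagger-style lines into a PubAnnotation-style dict.

    Assumption: the first tab-separated field is the token and the last
    field is an IOB named-entity tag such as B-protein, I-protein, or O.
    """
    tokens = []
    denotations = []
    offset = 0
    current = None  # open entity as [begin, end, label], or None

    def close():
        # Flush the currently open entity, if any, into the denotation list.
        nonlocal current
        if current is not None:
            denotations.append({
                "id": f"T{len(denotations) + 1}",
                "span": {"begin": current[0], "end": current[1]},
                "obj": current[2],
            })
            current = None

    for line in genia_output.splitlines():
        if not line.strip():
            close()  # a blank line ends a sentence, and any open entity
            continue
        fields = line.split("\t")
        word, ne = fields[0], fields[-1]
        if tokens:
            offset += 1  # account for the single space joining tokens
        begin, end = offset, offset + len(word)
        tokens.append(word)
        offset = end

        same_entity = current is not None and current[2] == ne[2:]
        if ne.startswith("I-") and same_entity:
            current[1] = end  # extend the open entity
        elif ne.startswith(("B-", "I-")):
            close()
            current = [begin, end, ne[2:]]  # start a new entity
        else:
            close()  # an "O" tag ends any open entity

    close()
    return {"text": " ".join(tokens), "denotations": denotations}


# Illustrative input, written by hand for this sketch.
sample = (
    "IL-2\tIL-2\tNN\tB-NP\tB-protein\n"
    "receptor\treceptor\tNN\tI-NP\tI-protein\n"
    "expression\texpression\tNN\tI-NP\tO\n"
)
result = genia_to_pubannotation(sample)
print(json.dumps(result, indent=2))
```

A real converter would also need to align the tagger's tokens with the original article text so that span offsets match the document stored in PubAnnotation; this sketch sidesteps that step.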