2020 Vol.35, Issue 4

Research Article

29 February 2020. pp. 563-582
Abstract
This study proposes a methodology for constructing linguistic resources to filter out irrelevant keywords from social media texts related to disasters such as earthquakes or typhoons. When disaster-related social media texts are collected for sentiment analysis, many noisy keywords are observed that are used metaphorically, such as ‘pupil-earthquake’ (meaning astonishment). Filtering out these noisy expressions is therefore crucial for accurate text classification and sentiment analysis. This study examines two types of linguistic patterns for filtering noisy expressions in texts related to natural and social disasters, and proposes a bootstrap method based on the DECO Korean electronic dictionary and the Local-Grammar Graph (LGG) formalism. For each of six keywords, about 110 to 470 patterns are described as LGGs. Applying these patterns to a new corpus through the DECO Noise-Filter platform yielded an F-measure of about 88.4%. The proposed methodology may also be adopted to filter other types of noisy expressions, which would improve the reliability of sentiment analysis of social media texts.
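As a rough illustration of the filtering-and-evaluation idea in the abstract, the sketch below flags posts in which a disaster keyword is used figuratively and then computes a balanced F-measure against hand-assigned labels. It makes several assumptions: plain regular expressions stand in for LGGs, the two patterns and the four English sample posts are invented for this example, and the names `NOISE_PATTERNS`, `is_noise`, and `f_measure` are hypothetical. It is not the DECO Noise-Filter platform and does not reproduce the patterns or results of the study.

```python
import re

# Hypothetical noise patterns for the keyword "earthquake".
# Assumption: regexes stand in for Local-Grammar Graphs; the study's
# actual patterns are not given in the abstract.
NOISE_PATTERNS = [
    re.compile(r"\bpupil-earthquake\b"),    # metaphorical compound
    re.compile(r"\blike an earthquake\b"),  # simile-style use
]


def is_noise(text):
    """Return True if any noise pattern matches the text."""
    return any(p.search(text) for p in NOISE_PATTERNS)


def f_measure(predicted, gold):
    """Balanced F-measure (F1) over two parallel lists of booleans."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy posts paired with a gold label (True = noisy, figurative use).
SAMPLES = [
    ("An earthquake of magnitude 5.4 struck this morning", False),
    ("Total pupil-earthquake when I saw my exam score", True),
    ("The bass hit like an earthquake at the concert", True),
    ("What an earthquake of a plot twist", True),  # no pattern covers this
]

predicted = [is_noise(text) for text, _ in SAMPLES]
gold = [label for _, label in SAMPLES]
score = f_measure(predicted, gold)
print(f"F-measure on toy data: {score:.2f}")
```

The last sample is deliberately not covered by either pattern, so recall falls below 1. In a bootstrap setting, misses of this kind are what would motivate adding further patterns for the keyword.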
Information
  • Publisher: The Modern Linguistic Society of Korea
  • Publisher (Ko): 한국현대언어학회
  • Journal Title: The Journal of Studies in Language
  • Journal Title (Ko): 언어연구
  • Volume: 35
  • No: 4
  • Pages: 563-582