Standalone tools and techniques for natural language processing (NLP) are being developed at an increasing rate. In recent years, news agencies have focused on extracting knowledge from the large volume of news text published across various media. However, little work has been done on a unified platform for mining and monitoring Persian news agencies. In this paper, we present an integrated platform for monitoring Persian news agencies. The platform consists of six main segments: a web crawler, an HTML parser, a news penetration estimator, a category classifier, a multi-level news clustering engine, and a visualizer. Various open-source tools and techniques were employed to design and implement each of these segments. The final platform has been deployed at one of the most influential Iranian news agencies as a decision support system for comparing the agency's position and rank with those of its competitors.
@InProceedings{TAGHIPOUR18.7,
  author    = {Mohammad Taghipour and Foad Aboutorabi and Vahid Zarrabi and Habibollah Asghari},
  title     = {An Integrated text mining Platform for Monitoring of Persian News Agencies},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year      = {2018},
  month     = {may},
  date      = {7-12},
  location  = {Miyazaki, Japan},
  editor    = {Octavian Popescu and Carlo Strapparava},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {979-10-95546-11-5},
  language  = {english}
}