ArabicWeb16 is the largest publicly available Arabic Web crawl, containing 150M Web pages. We envisage many uses of this dataset to advance research in fields such as information retrieval (IR), machine learning, and natural language processing. However, working with such a large dataset requires substantial storage and processing resources, which may not be available to many research teams. In this paper, we present iArabicWeb16, a freely available Web-based tool that makes the ArabicWeb16 dataset more accessible to the research community via a Web interface and a programming API. iArabicWeb16 allows users to search the collection efficiently using various ranking and indexing methods, and to retrieve Web pages directly. We evaluate its efficiency and its scalability with respect to the number of users it can serve, and show that it is a useful research tool for exploring and searching the ArabicWeb16 dataset.
@InProceedings{YASSER18.12,
  author = {Khaled Yasser and Reem Suwaileh and Abdelrahman Shouman and Yassmine Barkallah and Mucahid Kutlu and Tamer Elsayed},
  title = {{iArabicWeb16}: Making a Large Web Collection More Accessible for Research},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Hend Al-Khalifa and Walid Magdy and Kareem Darwish and Tamer Elsayed},
  editoraffiliation = {Hend Al-Khalifa (King Saud University, KSA); Walid Magdy (University of Edinburgh, UK); Kareem Darwish (Qatar Computing Research Institute, Qatar); Tamer Elsayed (Qatar University, Qatar)},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-25-2},
  language = {english}
}