A statistical model for measuring structural similarity between webpages

Zhenisbek Assylbekov, Assulan Nurkas, Inês Russinho Mouga

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

This paper presents a statistical model for measuring structural similarity between webpages from bilingual websites. Starting from basic assumptions, we derive the model and propose an algorithm to estimate its parameters in an unsupervised manner. The statistical approach appears to benefit the structural similarity measure: in the task of distinguishing parallel webpages from bilingual websites, our language-independent model achieves an F-score of 0.94-0.99, which is comparable to the results of language-dependent methods involving content similarity measures.

Original language: English
Pages (from-to): 24-31
Number of pages: 8
Journal: International Conference Recent Advances in Natural Language Processing, RANLP
Volume: 2015-January
Publication status: Published - Jan 1 2015
Event: 10th International Conference on Recent Advances in Natural Language Processing, RANLP 2015 - Hissar, Bulgaria
Duration: Sep 7 2015 - Sep 9 2015

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Artificial Intelligence
  • Electrical and Electronic Engineering
