A statistical model for measuring structural similarity between webpages

Zhenisbek Assylbekov, Assulan Nurkas, Inês Russinho Mouga

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

1 Citation (Scopus)

Abstract

This paper presents a statistical model for measuring structural similarity between webpages from bilingual websites. Starting from basic assumptions, we derive the model and propose an algorithm to estimate its parameters in an unsupervised manner. The statistical approach appears to benefit the structural similarity measure: in the task of distinguishing parallel webpages from bilingual websites, our language-independent model demonstrates an F-score of 0.94-0.99, which is comparable to the results of language-dependent methods involving content similarity measures.

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP
Publisher: Association for Computational Linguistics (ACL)
Pages: 24-31
Number of pages: 8
Volume: 2015-January
Publication status: Published - 2015
Event: 10th International Conference on Recent Advances in Natural Language Processing, RANLP 2015 - Hissar, Bulgaria
Duration: Sep 7 2015 - Sep 9 2015

Other

Other: 10th International Conference on Recent Advances in Natural Language Processing, RANLP 2015
Country: Bulgaria
City: Hissar
Period: 9/7/15 - 9/9/15

Fingerprint

Websites
Statistical Models

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Assylbekov, Z., Nurkas, A., & Mouga, I. R. (2015). A statistical model for measuring structural similarity between webpages. In International Conference Recent Advances in Natural Language Processing, RANLP (Vol. 2015-January, pp. 24-31). Association for Computational Linguistics (ACL).

@inproceedings{df1b481cafef454dbfae6110ef3200a4,
title = "A statistical model for measuring structural similarity between webpages",
abstract = "This paper presents a statistical model for measuring structural similarity between webpages from bilingual websites. Starting from basic assumptions, we derive the model and propose an algorithm to estimate its parameters in an unsupervised manner. The statistical approach appears to benefit the structural similarity measure: in the task of distinguishing parallel webpages from bilingual websites, our language-independent model demonstrates an F-score of 0.94-0.99, which is comparable to the results of language-dependent methods involving content similarity measures.",
author = "Zhenisbek Assylbekov and Assulan Nurkas and Mouga, {In{\^e}s Russinho}",
year = "2015",
language = "English",
volume = "2015-January",
pages = "24--31",
booktitle = "International Conference Recent Advances in Natural Language Processing, RANLP",
publisher = "Association for Computational Linguistics (ACL)",

}

TY  - GEN
T1  - A statistical model for measuring structural similarity between webpages
AU  - Assylbekov, Zhenisbek
AU  - Nurkas, Assulan
AU  - Mouga, Inês Russinho
PY  - 2015
Y1  - 2015
N2  - This paper presents a statistical model for measuring structural similarity between webpages from bilingual websites. Starting from basic assumptions, we derive the model and propose an algorithm to estimate its parameters in an unsupervised manner. The statistical approach appears to benefit the structural similarity measure: in the task of distinguishing parallel webpages from bilingual websites, our language-independent model demonstrates an F-score of 0.94-0.99, which is comparable to the results of language-dependent methods involving content similarity measures.
UR  - http://www.scopus.com/inward/record.url?scp=84949778562&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84949778562&partnerID=8YFLogxK
M3  - Conference contribution
VL  - 2015-January
SP  - 24
EP  - 31
BT  - International Conference Recent Advances in Natural Language Processing, RANLP
PB  - Association for Computational Linguistics (ACL)
ER  -