Inference algorithms for pattern-based CRFs on sequence data

Rustem Takhanov, Vladimir Kolmogorov

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Citations (Scopus)

Abstract

We consider Conditional Random Fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) x 1... xn is the sum of terms over intervals [i, j] where each term is non-zero only if the substring xi... xj equals a prespecified pattern α. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively O(nL), 0(nLℓmax) and O(nL min{|D|, log(ℓmax+1)}) where L is the combined length of input patterns, ℓmax is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of (Ye et al., 2009) whose complexities are respectively O(nL\D\), O (n|Γ|L2max 2) and O(nL\D\), where |Γ| is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights. Finally, we apply pattern-based CRFs to the problem of the protein dihedral angles prediction.

Original languageEnglish
Title of host publication30th International Conference on Machine Learning, ICML 2013
PublisherInternational Machine Learning Society (IMLS)
Pages1182-1190
Number of pages9
EditionPART 2
Publication statusPublished - 2013
Externally publishedYes
Event30th International Conference on Machine Learning, ICML 2013 - Atlanta, GA, United States
Duration: Jun 16 2013Jun 21 2013

Other

Other30th International Conference on Machine Learning, ICML 2013
CountryUnited States
CityAtlanta, GA
Period6/16/136/21/13

Fingerprint

Dihedral angle
Labeling
Sampling
Proteins
energy

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Sociology and Political Science

Cite this

Takhanov, R., & Kolmogorov, V. (2013). Inference algorithms for pattern-based CRFs on sequence data. In 30th International Conference on Machine Learning, ICML 2013 (PART 2 ed., pp. 1182-1190). International Machine Learning Society (IMLS).

Inference algorithms for pattern-based CRFs on sequence data. / Takhanov, Rustem; Kolmogorov, Vladimir.

30th International Conference on Machine Learning, ICML 2013. PART 2. ed. International Machine Learning Society (IMLS), 2013. p. 1182-1190.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Takhanov, R & Kolmogorov, V 2013, Inference algorithms for pattern-based CRFs on sequence data. in 30th International Conference on Machine Learning, ICML 2013. PART 2 edn, International Machine Learning Society (IMLS), pp. 1182-1190, 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, United States, 6/16/13.
Takhanov R, Kolmogorov V. Inference algorithms for pattern-based CRFs on sequence data. In 30th International Conference on Machine Learning, ICML 2013. PART 2 ed. International Machine Learning Society (IMLS). 2013. p. 1182-1190
Takhanov, Rustem ; Kolmogorov, Vladimir. / Inference algorithms for pattern-based CRFs on sequence data. 30th International Conference on Machine Learning, ICML 2013. PART 2. ed. International Machine Learning Society (IMLS), 2013. pp. 1182-1190
@inproceedings{ca9b4c0dedd641b59ab658ad0bc7333b,
title = "Inference algorithms for pattern-based CRFs on sequence data",
abstract = "We consider Conditional Random Fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) x 1... xn is the sum of terms over intervals [i, j] where each term is non-zero only if the substring xi... xj equals a prespecified pattern α. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively O(nL), 0(nLℓmax) and O(nL min{|D|, log(ℓmax+1)}) where L is the combined length of input patterns, ℓmax is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of (Ye et al., 2009) whose complexities are respectively O(nL\D\), O (n|Γ|L2ℓmax 2) and O(nL\D\), where |Γ| is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights. Finally, we apply pattern-based CRFs to the problem of the protein dihedral angles prediction.",
author = "Rustem Takhanov and Vladimir Kolmogorov",
year = "2013",
language = "English",
pages = "1182--1190",
booktitle = "30th International Conference on Machine Learning, ICML 2013",
publisher = "International Machine Learning Society (IMLS)",
edition = "PART 2",

}

TY - GEN

T1 - Inference algorithms for pattern-based CRFs on sequence data

AU - Takhanov, Rustem

AU - Kolmogorov, Vladimir

PY - 2013

Y1 - 2013

N2 - We consider Conditional Random Fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) x 1... xn is the sum of terms over intervals [i, j] where each term is non-zero only if the substring xi... xj equals a prespecified pattern α. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively O(nL), 0(nLℓmax) and O(nL min{|D|, log(ℓmax+1)}) where L is the combined length of input patterns, ℓmax is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of (Ye et al., 2009) whose complexities are respectively O(nL\D\), O (n|Γ|L2ℓmax 2) and O(nL\D\), where |Γ| is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights. Finally, we apply pattern-based CRFs to the problem of the protein dihedral angles prediction.

AB - We consider Conditional Random Fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) x 1... xn is the sum of terms over intervals [i, j] where each term is non-zero only if the substring xi... xj equals a prespecified pattern α. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively O(nL), 0(nLℓmax) and O(nL min{|D|, log(ℓmax+1)}) where L is the combined length of input patterns, ℓmax is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of (Ye et al., 2009) whose complexities are respectively O(nL\D\), O (n|Γ|L2ℓmax 2) and O(nL\D\), where |Γ| is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights. Finally, we apply pattern-based CRFs to the problem of the protein dihedral angles prediction.

UR - http://www.scopus.com/inward/record.url?scp=84897534660&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897534660&partnerID=8YFLogxK

M3 - Conference contribution

SP - 1182

EP - 1190

BT - 30th International Conference on Machine Learning, ICML 2013

PB - International Machine Learning Society (IMLS)

ER -