Inference Algorithms for Pattern-Based CRFs on Sequence Data

Vladimir Kolmogorov, Rustem Takhanov

    Research output: Contribution to journalArticle

    Abstract

    We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is non-zero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.

    Original languageEnglish
    JournalAlgorithmica
    DOIs
    Publication statusAccepted/In press - Jun 20 2015

    Fingerprint

    Conditional Random Fields
    Labeling
    Sampling
    Efficient Algorithms
    Computing
    Tagging
    Term
    Partition Function
    Strings

    Keywords

    • Conditional random fields
    • Sequence tagging
    • String algorithms

    ASJC Scopus subject areas

    • Computer Science(all)
    • Computer Science Applications
    • Applied Mathematics

    Cite this

    Inference Algorithms for Pattern-Based CRFs on Sequence Data. / Kolmogorov, Vladimir; Takhanov, Rustem.

    In: Algorithmica, 20.06.2015.

    Research output: Contribution to journalArticle

    @article{d561d631ad7949a983534688e13503ea,
    title = "Inference Algorithms for Pattern-Based CRFs on Sequence Data",
    abstract = "We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is non-zero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.",
    keywords = "Conditional random fields, Sequence tagging, String algorithms",
    author = "Vladimir Kolmogorov and Rustem Takhanov",
    year = "2015",
    month = "6",
    day = "20",
    doi = "10.1007/s00453-015-0017-7",
    language = "English",
    journal = "Algorithmica",
    issn = "0178-4617",
    publisher = "Springer New York",

    }

    TY - JOUR

    T1 - Inference Algorithms for Pattern-Based CRFs on Sequence Data

    AU - Kolmogorov, Vladimir

    AU - Takhanov, Rustem

    PY - 2015/6/20

    Y1 - 2015/6/20

    N2 - We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is non-zero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.

    AB - We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is non-zero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.

    KW - Conditional random fields

    KW - Sequence tagging

    KW - String algorithms

    UR - http://www.scopus.com/inward/record.url?scp=84931864193&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84931864193&partnerID=8YFLogxK

    U2 - 10.1007/s00453-015-0017-7

    DO - 10.1007/s00453-015-0017-7

    M3 - Article

    JO - Algorithmica

    JF - Algorithmica

    SN - 0178-4617

    ER -