### Abstract

We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is non-zero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.

Original language | English |
---|---|

Journal | Algorithmica |

DOIs | |

Publication status | Accepted/In press - Jun 20 2015 |

### Fingerprint

### Keywords

- Conditional random fields
- Sequence tagging
- String algorithms

### ASJC Scopus subject areas

- Computer Science(all)
- Computer Science Applications
- Applied Mathematics

### Cite this

*Algorithmica*. https://doi.org/10.1007/s00453-015-0017-7

**Inference Algorithms for Pattern-Based CRFs on Sequence Data.** / Kolmogorov, Vladimir; Takhanov, Rustem.

Research output: Contribution to journal › Article

}

TY - JOUR

T1 - Inference Algorithms for Pattern-Based CRFs on Sequence Data

AU - Kolmogorov, Vladimir

AU - Takhanov, Rustem

PY - 2015/6/20

Y1 - 2015/6/20

N2 - We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is non-zero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.

AB - We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) (Formula presented.) is the sum of terms over intervals [i, j] where each term is non-zero only if the substring (Formula presented.) equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.) where L is the combined length of input patterns, (Formula presented.) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively (Formula presented.), (Formula presented.) and (Formula presented.), where (Formula presented.) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.

KW - Conditional random fields

KW - Sequence tagging

KW - String algorithms

UR - http://www.scopus.com/inward/record.url?scp=84931864193&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84931864193&partnerID=8YFLogxK

U2 - 10.1007/s00453-015-0017-7

DO - 10.1007/s00453-015-0017-7

M3 - Article

JO - Algorithmica

JF - Algorithmica

SN - 0178-4617

ER -