## Abstract

We consider Conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) x_{1}… x_{n} is the sum of terms over intervals [i, j] where each term is non-zero only if the substring x_{i}… x_{j} equals a prespecified pattern w. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively O(nL) , O(nLℓ_{max}) and O(nLmin{|D|,log(ℓmax+1)}) where L is the combined length of input patterns, ℓ_{max} is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009) whose complexities are respectively O(nL| D|) , O(n|Γ|L2ℓmax2) and O(nL| D|) , where | Γ| is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.

Original language | English |
---|---|

Pages (from-to) | 17-46 |

Number of pages | 30 |

Journal | Algorithmica |

Volume | 76 |

Issue number | 1 |

DOIs | |

Publication status | Published - Sep 1 2016 |

## Keywords

- Conditional random fields
- Sequence tagging
- String algorithms

## ASJC Scopus subject areas

- Computer Science(all)
- Computer Science Applications
- Applied Mathematics