## Abstract

We consider Conditional Random Fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) x _{1}... x_{n} is the sum of terms over intervals [i, j] where each term is non-zero only if the substring x_{i}... xj equals a prespecified pattern α. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively O(nL), 0(nLℓ_{max}) and O(nL min{|D|, log(ℓ_{max}+1)}) where L is the combined length of input patterns, ℓ_{max} is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of (Ye et al., 2009) whose complexities are respectively O(nL\D\), O (n|Γ|L^{2}ℓ_{max}^{2}) and O(nL\D\), where |Γ| is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights. Finally, we apply pattern-based CRFs to the problem of the protein dihedral angles prediction.

Original language | English |
---|---|

Pages | 1182-1190 |

Number of pages | 9 |

Publication status | Published - Jan 1 2013 |

Event | 30th International Conference on Machine Learning, ICML 2013 - Atlanta, GA, United States Duration: Jun 16 2013 → Jun 21 2013 |

### Other

Other | 30th International Conference on Machine Learning, ICML 2013 |
---|---|

Country | United States |

City | Atlanta, GA |

Period | 6/16/13 → 6/21/13 |

## ASJC Scopus subject areas

- Human-Computer Interaction
- Sociology and Political Science