### Abstract

Given a set S of k strings of maximum length n, the goal of the closest substring problem (CSSP) is to find the smallest integer d (and a corresponding string t of length ℓ ≤ n) such that each string s ∈ S has a substring of length ℓ of "distance" at most d to t. The closest string problem (CSP) is a special case of CSSP where ℓ = n. CSP and CSSP arise in many applications in bioinformatics and are extensively studied in the context of Hamming and edit distance. In this paper we consider a recently introduced distance measure, namely the rank distance. First, we show that the CSP and CSSP via rank distance are NP-hard. Then, we present a polynomial time k-approximation algorithm for the CSP problem. Finally, we give a parametrized algorithm for the CSP (the parameter is the number of input strings) if the alphabet is binary and each string has the same number of 0's and 1's.

Original language | English |
---|---|

Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |

Pages | 413-426 |

Number of pages | 14 |

Volume | 7354 LNCS |

DOIs | |

Publication status | Published - 2012 |

Externally published | Yes |

Event | 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012 - Helsinki, Finland Duration: Jul 3 2012 → Jul 5 2012 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|

Volume | 7354 LNCS |

ISSN (Print) | 03029743 |

ISSN (Electronic) | 16113349 |

### Other

Other | 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012 |
---|---|

Country | Finland |

City | Helsinki |

Period | 7/3/12 → 7/5/12 |

### Fingerprint

### ASJC Scopus subject areas

- Computer Science(all)
- Theoretical Computer Science

### Cite this

*Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)*(Vol. 7354 LNCS, pp. 413-426). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7354 LNCS). https://doi.org/10.1007/978-3-642-31265-6_33

**On the closest string via rank distance.** / Dinu, Liviu P.; Popa, Alexandru.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).*vol. 7354 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7354 LNCS, pp. 413-426, 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012, Helsinki, Finland, 7/3/12. https://doi.org/10.1007/978-3-642-31265-6_33

}

TY - GEN

T1 - On the closest string via rank distance

AU - Dinu, Liviu P.

AU - Popa, Alexandru

PY - 2012

Y1 - 2012

N2 - Given a set S of k strings of maximum length n, the goal of the closest substring problem (CSSP) is to find the smallest integer d (and a corresponding string t of length ℓ ≤ n) such that each string s ∈ S has a substring of length ℓ of "distance" at most d to t. The closest string problem (CSP) is a special case of CSSP where ℓ = n. CSP and CSSP arise in many applications in bioinformatics and are extensively studied in the context of Hamming and edit distance. In this paper we consider a recently introduced distance measure, namely the rank distance. First, we show that the CSP and CSSP via rank distance are NP-hard. Then, we present a polynomial time k-approximation algorithm for the CSP problem. Finally, we give a parametrized algorithm for the CSP (the parameter is the number of input strings) if the alphabet is binary and each string has the same number of 0's and 1's.

AB - Given a set S of k strings of maximum length n, the goal of the closest substring problem (CSSP) is to find the smallest integer d (and a corresponding string t of length ℓ ≤ n) such that each string s ∈ S has a substring of length ℓ of "distance" at most d to t. The closest string problem (CSP) is a special case of CSSP where ℓ = n. CSP and CSSP arise in many applications in bioinformatics and are extensively studied in the context of Hamming and edit distance. In this paper we consider a recently introduced distance measure, namely the rank distance. First, we show that the CSP and CSSP via rank distance are NP-hard. Then, we present a polynomial time k-approximation algorithm for the CSP problem. Finally, we give a parametrized algorithm for the CSP (the parameter is the number of input strings) if the alphabet is binary and each string has the same number of 0's and 1's.

UR - http://www.scopus.com/inward/record.url?scp=84861902988&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84861902988&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-31265-6_33

DO - 10.1007/978-3-642-31265-6_33

M3 - Conference contribution

SN - 9783642312649

VL - 7354 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 413

EP - 426

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -