Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
This query has been bothering me for the past 10 hours. Here we go:

I want to compare some data I am pulling. I am pulling names, and I want to remove names that are similar to one already returned so that they do not come back from the query.

Example:

I have the following names:

  • Seaside Heights
  • Seaside HGTS
  • Talladega
  • Tornkal Center
  • Tornkal CTR
  • Yonkers
  • Zebraville

I want it to return like this:

  • Seaside Heights
  • Talladega
  • Tornkal Center
  • Yonkers
  • Zebraville

Basically, I think it should use substring(name, 0, 8) to get the first 8 characters, then compare those 8 characters against the next entry and, if they match, ignore that entry.
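The prefix idea can be sketched outside the query like this. This is only an illustration under assumptions: it is written in Python rather than SQL, the function name `drop_similar_by_prefix` is made up for the example, the comparison is case-insensitive, and each name is checked against every prefix seen so far rather than only the previous entry.

```python
def drop_similar_by_prefix(names, prefix_len=8):
    """Keep the first name seen for each prefix; skip later names sharing it."""
    seen_prefixes = set()
    kept = []
    for name in names:
        # Assumption: names are compared case-insensitively.
        key = name[:prefix_len].lower()
        if key in seen_prefixes:
            continue  # same first `prefix_len` characters as an earlier name
        seen_prefixes.add(key)
        kept.append(name)
    return kept


names = [
    "Seaside Heights",
    "Seaside HGTS",
    "Talladega",
    "Tornkal Center",
    "Tornkal CTR",
    "Yonkers",
    "Zebraville",
]

prefix_result = drop_similar_by_prefix(names)
print(prefix_result)
```

Input order decides which variant survives, so the full spelling has to come before the abbreviation, as it does in the list above.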

Maybe I am thinking way too deep into this. Any insight or concepts that might work would be appreciated.


1 Answer

First, query all the data.

Then, for each pair of records returned, run the LCS (Longest Common Subsequence) algorithm.

If the length of the longest common subsequence between two different records reaches a threshold of your choosing, you can class them as similar.

http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
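A minimal sketch of the LCS approach described above, in Python for illustration. The names `lcs_length`, `is_similar` and `drop_similar_by_lcs` are hypothetical, and so is the rule used here: two names count as similar when the LCS covers at least 80% of the shorter name, compared case-insensitively. The threshold is the "number of your choosing" and would need tuning on real data.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_similar(a, b, threshold=0.8):
    """Assumed rule: LCS must cover at least `threshold` of the shorter name."""
    a, b = a.lower(), b.lower()
    shorter = min(len(a), len(b))
    if shorter == 0:
        return False
    return lcs_length(a, b) / shorter >= threshold


def drop_similar_by_lcs(names, threshold=0.8):
    """Keep a name only if it is not similar to any name already kept."""
    kept = []
    for name in names:
        if not any(is_similar(name, other, threshold) for other in kept):
            kept.append(name)
    return kept


lcs_names = [
    "Seaside Heights", "Seaside HGTS", "Talladega",
    "Tornkal Center", "Tornkal CTR", "Yonkers", "Zebraville",
]
lcs_result = drop_similar_by_lcs(lcs_names)
print(lcs_result)
```

Comparing every name against every kept name is quadratic in the number of records, which is fine for a short list but worth keeping in mind for large result sets.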

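edit: It just so happens that PHP has a nice built-in function for scoring how similar two strings are, similar_text(). It does not compute the LCS itself, but it serves the same purpose here: http://php.net/manual/en/function.similar-text.php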
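For readers not working in PHP, a ready-made similarity score can be tried in a comparable way. The sketch below uses Python's standard-library `difflib.SequenceMatcher`, which is not the PHP function and may score differently; the helper name `similarity` and the 0.8 cutoff are assumptions for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity score between 0.0 and 1.0, compared case-insensitively."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Assumed cutoff; tune it against real data.
CUTOFF = 0.8

for left, right in [("Seaside Heights", "Seaside HGTS"),
                    ("Tornkal Center", "Tornkal CTR"),
                    ("Yonkers", "Zebraville")]:
    score = similarity(left, right)
    verdict = "similar" if score >= CUTOFF else "different"
    print(f"{left!r} vs {right!r}: {score:.2f} ({verdict})")
```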

