Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have this program which is supposed to find the Longest Common Substring of a number of strings. Which it does, but if the strings are very long (i.e. >8000 characters long), it works slowly (1.5 seconds). Is there any way to optimise that?

The program is this:

//#include "stdafx.h"
#include <iostream>
#include <string>
#include <vector>
#include <cassert>


using namespace std;

const unsigned short MAX_STRINGS = 10;
const unsigned int  MAX_SIZE=10000;
vector<string> strings;
unsigned int len;

string GetLongestCommonSubstring( string string1, string string2 );
inline void readNumberSubstrings();
inline const string getMaxSubstring();

void readNumberSubstrings()
{
    cin >> len;

    assert(len > 1 && len <=MAX_STRINGS);

    strings.resize(len);

    for(register unsigned int i=0; i<len;i++)
        strings[i]=string(MAX_SIZE,0);

    for(register unsigned int i=0; i<len; i++)
        cin>>strings[i];
}

 const string getMaxSubstring()
{
    string maxSubstring=strings[0];
    for(register unsigned int i=1; i < len; i++)
        maxSubstring=GetLongestCommonSubstring(maxSubstring, strings[i]);
    return maxSubstring;
}

string GetLongestCommonSubstring( string string1, string string2 ) 
{

    const int solution_size = string2.length()+ 1;

    int *x=new int[solution_size]();
    int *y= new int[solution_size]();

    int **previous = &x;
    int **current = &y;

    int max_length = 0;
    int result_index = 0;

    int j;
    int length;
    int M=string2.length() - 1;

    for(register int i = string1.length() - 1; i >= 0; i--)
    {
        for(register int j = M; j >= 0; j--) 
        {
            if(string1[i] != string2[j]) 
                (*current)[j] = 0;
            else 
            {
                length = 1 + (*previous)[j + 1];
                if (length > max_length)
                {
                    max_length = length;
                    result_index = i;
                }

                (*current)[j] = length;
            }
        }

        swap(previous, current);
    }
    string1[max_length+result_index]='';
    return &(string1[result_index]);
}

int main()
{
    readNumberSubstrings();
    cout << getMaxSubstring() << endl;
    return 0;
}

Note: there is a reason why I didn't write code that would solve this problem with suffix trees (they're large).

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
160 views
Welcome To Ask or Share your Answers For Others

1 Answer

Often when it comes to optimization, a different approach might be your only true option rather than trying to incrementally improve the current implementation. Here's my idea:

  • create a list of valid characters that might appear in the longest common substring. I.e., if a character doesn't appear in all strings, it can't be part of the longest common substring.

  • separate each string into multiple strings containing only valid characters

  • for every such string, create every possible substring and add it to the list as well

  • filter (as with the characters) all strings, that don't show up in all lists.

The complexity of this obviously depends largely on the number of invalid characters. if it's zero, this approach doesn't help at all.

Some remarks on your code: Don't try to be overly clever. The compiler will optimize so much, there's really no need for you to put register in your code. Second, your allocating strings and then overwrite them (in readNumberSubstrings), that's totally unnecessary. Third, pass by const reference if you can. Fourth, don't use raw pointers, especially if you never delete [] your new []d objects. Use std::vectors instead, it behaves well with exceptions (which you might encounter, you're using strings a lot!).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...