Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm looking for a way to strip the longest common trailing pattern from a list of strings.

I have a list of approximately 1000 web page titles, and they all share a common suffix: the name of the website.

They follow this pattern:

['art gallery - museum and visits | expand knowledge',
 'lasergame - entertainment | expand knowledge',
 'coffee shop - confort and food | expand knowledge',
 ...
]

How can I automatically strip the common suffix " | expand knowledge" from all of the strings?

Thanks!

Edit: Sorry, I did not make myself clear enough. I have no information about the " | expand knowledge" suffix in advance. I want to be able to clear a list of strings of a potential common suffix, even if I do not know what it is.


1 Answer

Here's a solution using the os.path.commonprefix function on the reversed titles:

import os

titles = ['art gallery - museum and visits | expand knowledge',
 'lasergame - entertainment | expand knowledge',
 'coffee shop - confort and food | expand knowledge',
]

# Find the longest common suffix by reversing the strings and using a
# library function to find the common "prefix".
common_suffix = os.path.commonprefix([title[::-1] for title in titles])[::-1]

# Strip the common suffix from every title. Slicing by explicit length
# (rather than title[:-n]) also works when the common suffix is empty.
stripped_titles = [title[:len(title) - len(common_suffix)] for title in titles]

Result:

['art gallery - museum and visits', 'lasergame - entertainment', 'coffee shop - confort and food']

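Because it finds the common suffix by itself, it works on any group of titles, even if you don't know the suffix in advance. Note that os.path.commonprefix compares character by character, so the detected suffix can also include trailing characters that the titles merely happen to share.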
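
For reuse, the same idea can be wrapped in a small function. The following is only a sketch: the function name strip_common_suffix and its optional separator argument are hypothetical and not part of the original answer. It guards against two edge cases: an empty common suffix (slicing with [:-0] would return an empty string) and a suffix that cuts into words because every title happens to end with the same letters.

```python
import os


def strip_common_suffix(titles, separator=None):
    """Return the titles with their longest shared suffix removed.

    `separator` is a hypothetical optional argument: when given, only the
    part of the common suffix starting at its first occurrence is removed.
    """
    # With fewer than two titles there is nothing to compare against.
    if len(titles) < 2:
        return list(titles)

    # commonprefix compares character by character, so reverse each title
    # to turn the shared suffix into a shared prefix, then reverse back.
    suffix = os.path.commonprefix([t[::-1] for t in titles])[::-1]

    if separator is not None:
        idx = suffix.find(separator)
        # Keep only the part of the suffix that begins at the separator;
        # if the separator is absent, strip nothing.
        suffix = suffix[idx:] if idx != -1 else ""

    # Slice by explicit length so that an empty suffix leaves titles intact.
    return [t[:len(t) - len(suffix)] for t in titles]
```

Called without a separator, strip_common_suffix(['cats | site', 'dogs | site']) removes 's | site' because both titles end in "s" before the site name; passing separator=' | ' limits the removal to ' | site'.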

