Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I have an assignment that requires me to use regular expressions in Python to find alliterative names in a file that contains a list of names. Here are the specific instructions: "Open a file and return all of the alliterative names in the file. For our purposes a "name" is two sequences of letters separated by a space, with capital letters only in the leading positions. We call a name alliterative if the first and last names begin with the same letter, with the exception that s and sh are considered distinct, and likewise for c/ch and t/th. The names file will contain a list of strings separated by commas. Suggestion: Do this in two stages." This is my attempt so far:

def check(regex, string, flags=0):
    return not (re.match("(?:" + regex + r")", string, flags=flags)) is None

def alliterative(names_file):
    f = open(names_file)
    string = f.read()
    lst = string.split(',')
    lst2 = []
    for i in lst:
        x=lst[i]
        if re.search(r'[A-Z][a-z]* [A-Z][a-z]*', x):
            k=x.split(' ')
            if check('{}'.format(k[0][0]), k[1]):
                if not check('[cst]', k[0][0]):
                    lst2.append(x)
                elif len(k[0])==1:
                    if len(k[1])==1:
                        lst2.append(x)
                    elif not check('h',k[1][1]):
                        lst2.append(x)
                elif len(k[1])==1:
                    if not check('h',k[0][1]):
                        lst2.append(x)
    return lst2

I have two issues. First, what I coded makes sense to me. The general idea is this: check that each name is in the correct format (first name, then last name, letters only, with only the first letter of each capitalized); check that the first letters of the first and last names match; then check whether that letter is c, s, or t. If it is not, add the name to the new list. If it is, make sure we are not accidentally matching a [cst] with a [cst]h. The code compiles, but when I ran it on this list of names: Umesh Vazirani, Vijay Vazirani, Barbara Liskov, Leslie Lamport, Scott Shenker, R2D2 Rover, Shaq, Sam Spade, Thomas Thing

it returns an empty list instead of ["Vijay Vazirani", "Leslie Lamport", "Sam Spade", "Thomas Thing"], which it is supposed to return. I added print statements to alliterative to see where things were going wrong, and the line if check('{}'.format(k[0][0]), k[1]): seems to be the issue.

More than the issues with my program though, I feel like I am missing the point of regular expressions: am I overcomplicating this? Is there a nicer way to do this with regular expressions?


1 Answer

Please consider improving your question.

In particular, the question as written is only useful to someone who has exactly the same assignment, which is unlikely. Please think about how to generalize it so that this Q&A can be helpful to others.


I think your direction is about right.

  • It's a good idea to validate the input with a regular expression. r'[A-Z][a-z]* [A-Z][a-z]*' is a good expression.
  • You can group parts of the pattern with parentheses, so that you can easily extract the first and last name later on.
  • Keep in mind the difference between re.match and re.search. re.search looks for a match anywhere in the string, so re.search(r'[A-Z][a-z]* [A-Z][a-z]*', 'aaRob Smith') returns a match object even though the input is not a valid name.
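
The last two points can be sketched as follows. This is only an illustration using the standard re module; the constant names LOOSE and GROUPED and the sample strings are made up for the example.

```python
import re

# Unanchored pattern from the question.
LOOSE = r'[A-Z][a-z]* [A-Z][a-z]*'
# Same pattern with a capture group around each part of the name.
GROUPED = r'([A-Z][a-z]*) ([A-Z][a-z]*)'

# re.search scans the whole string, so the junk before the name is skipped.
found = re.search(LOOSE, 'aaRob Smith')
print(found.group(0))  # Rob Smith

# re.match only tries at position 0, so the same input is rejected.
rejected_prefix = re.match(LOOSE, 'aaRob Smith')
print(rejected_prefix)  # None

# re.fullmatch must consume the entire string, so trailing junk is rejected too.
rejected_suffix = re.fullmatch(GROUPED, 'Rob Smith2')
print(rejected_suffix)  # None

# The groups give direct access to the first and last name.
first, last = re.fullmatch(GROUPED, 'Rob Smith').groups()
print(first, last)  # Rob Smith
```

Anchoring the pattern with ^ and $, or using re.fullmatch, keeps inputs like 'aaRob Smith' from slipping through the format check.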

A comment on general programming style:

  • It's better to name the variables first and last for readability, rather than k[0] and k[1] (and how was the letter k picked!?)

Here's one way to do it:

import re

FULL_NAME_RE = re.compile(r'^([A-Z][a-z]*) ([A-Z][a-z]*)$')

def is_alliterative(name):
    """Returns True if name meets the alliterative requirement, otherwise False."""
    # Reject anything that is not a well-formed "First Last" name
    match = FULL_NAME_RE.match(name)
    if not match:
        return False
    first, last = match.group(1, 2)
    first, last = first.lower(), last.lower()  # lower-case both so the checks below can ignore case

    if first[0] != last[0]:
        return False

    if first[0] in 'cst':  # Check sh/ch/th
        # Do special check
        return _is_cst_h(first) == _is_cst_h(last)

    # All check passed!
    return True


def _is_cst_h(text):
    """Returns True if the second letter of text is 'h', i.e. text starts with 'ch', 'sh', or 'th'."""
    # Assumes the caller has already checked that the first letter is c, s, or t
    return text[1:].startswith('h')


names = [
    'Umesh Vazirani', 'Vijay Vazirani', 'Barbara Liskov',
    'Leslie Lamport', 'Scott Shenker', 'R2D2 Rover', 'Shaq', 'Sam Spade', 'Thomas Thing'
]
print([name for name in names if is_alliterative(name)])
# Expected output:
# ['Vijay Vazirani', 'Leslie Lamport', 'Sam Spade', 'Thomas Thing']
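
On the question of whether regular expressions alone can do this: the whole check can be folded into one pattern with backreferences and a negative lookahead. The sketch below is one possible way, not the only one. It assumes Python 3 and assumes the names in the file are separated by commas with optional surrounding whitespace. ALLITERATIVE_RE and find_alliterative are names invented for this example; alliterative(names_file) keeps the signature from the question.

```python
import re

# One alternative per case; each backreference repeats the captured prefix.
ALLITERATIVE_RE = re.compile(
    r'(?:'
    r'([CST]h)[a-z]* \1'             # Ch/Sh/Th must pair with the same digraph
    r'|([CST])(?!h)[a-z]* \2(?!h)'   # plain C/S/T must not pair with Ch/Sh/Th
    r'|([ABD-RU-Z])[a-z]* \3'        # any other capital: same first letter
    r')[a-z]*'                       # rest of the last name
)


def find_alliterative(text):
    """Split comma-separated text into names and keep the alliterative ones."""
    # Stage 1: split on commas and trim surrounding whitespace.
    names = [part.strip() for part in text.split(',')]
    # Stage 2: fullmatch enforces the name format and the alliteration rule.
    return [name for name in names if ALLITERATIVE_RE.fullmatch(name)]


def alliterative(names_file):
    """Read names_file and return its alliterative names."""
    with open(names_file) as f:
        return find_alliterative(f.read())


sample = ('Umesh Vazirani, Vijay Vazirani, Barbara Liskov, Leslie Lamport, '
          'Scott Shenker, R2D2 Rover, Shaq, Sam Spade, Thomas Thing')
print(find_alliterative(sample))
# ['Vijay Vazirani', 'Leslie Lamport', 'Sam Spade', 'Thomas Thing']
```

Whether this is nicer than the two-step version above is a matter of taste: the single pattern is compact, but the explicit first/last comparison is easier to read and to change.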
