Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

What's the best way to use re to remove brackets and their content, as well as any trailing whitespace, from a string? Note that not every string is formatted the same way.

Script:

import pandas as pd
import re

df = pd.DataFrame({'name':
          ['University of Southampton (UK)', 
          'The College of William and Mary', 
          'University of Reading (UK)', 
          'Queensland University (Australia)']})

def cleaning(text):
    cleaned = re.findall(re.compile('^([^,]+).+'), text)
    cleaned = re.findall(re.compile('(.*)'), str(cleaned)) # Why do I have to str() here btw?
    return cleaned

df['name'].apply(lambda x: cleaning(x))

Returns:

0    []
1    []
2    []
3    []

Desired output (no whitespace at the end):

0    University of Southampton
1    The College of William and Mary
2    University of Reading
3    Queensland University

Thanks for your help!

1 Answer

This only works for this specific case, but you can do:

df.name.str.split('(',expand=True)[0].str.strip()
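For strings where the bracketed part is not always at the end, a regex substitution may be more robust than splitting on '('. The following is a minimal sketch using only the standard re module; the names PAREN_GROUP and strip_parenthetical are made up for illustration, and the pattern assumes brackets are round and not nested.

```python
import re

# Optional whitespace followed by a "(...)" group, e.g. " (UK)".
# Assumption: parentheses are not nested.
PAREN_GROUP = re.compile(r'\s*\([^)]*\)')

def strip_parenthetical(text):
    """Remove every "(...)" group, then trim leading/trailing whitespace."""
    return PAREN_GROUP.sub('', text).strip()

names = [
    'University of Southampton (UK)',
    'The College of William and Mary',
    'University of Reading (UK)',
    'Queensland University (Australia)',
]

cleaned = [strip_parenthetical(name) for name in names]
print(cleaned)
```

On the DataFrame from the question, the same function could be applied with df['name'].apply(strip_parenthetical), or the pattern could be passed to df['name'].str.replace(r'\s*\([^)]*\)', '', regex=True).str.strip().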
