pandas - Python Text to Data Frame with Specific Pattern

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I am trying to convert a bunch of text files into a data frame using Pandas.

Thanks to Stack Overflow's amazing community, I almost have the desired output (original post: Python Text File to Data Frame with Specific Pattern).

Basically, I need to turn text that follows a specific pattern (but is sometimes missing data) into a data frame using Pandas.

Here is an example:

Number 01600 London                           Register  4314

Some random text...

************************************* B ***************************************
 1 SHARE: 73/1284
   John Smith
   BORN: 1960-01-01 ADDR: Streetname 3/2   1000
   f 4222/2001
   h 1334/2000
   i 5774/2000
 4 SHARE: 58/1284
   Boris Morgan
   BORN:            ADDR: Streetname 4   2000
 5 SHARE: 23/1284
   James Klein
   BORN:            ADDR:      
   c 4222/1988 Supporting Text
   f 4222/2000 Extra Text
************************************* C ***************************************
More random text...

From the example above, I need to transform the text between ***B*** and ***C*** into a data frame with the following output:

Number	Register	City	Id	Share	Name	Born	Address	c	f	h	i
01600	4314	London	1	73/1284	John Smith	1960-01-01	Streetname 3/2 1000	NaN	4222/2001	1334/2000	5774/2000
01600	4314	London	4	58/1284	Boris Morgan	NaN	Streetname 4 2000	NaN	NaN	NaN	NaN
01600	4314	London	5	23/1284	James Klein	NaN	NaN	4222/1988 Supporting Text	4222/2000 Extra Text	NaN	NaN

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:13:41+0000

I wouldn't use the same variable i for both the inner and outer loops. Changing your for loop to the following should be cleaner:

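import pandas as pd

# `items`, `number`, `register`, `city` and `data` are assumed to come from
# the parsing code in the original post; each item is the list of stripped
# lines belonging to one record.
for i in items:
    d = {'Number': number,
         'Register': register,
         'City': city,
         'Id': int(i[0].split()[0]),
         'Share': i[0].split(': ')[1],
         'Name': i[1],
         }

    if "ADDR" in i[2]:
        born, address = i[2].split("ADDR:")
        # Empty fields become None so they show up as missing values.
        d['Born'] = born.replace("BORN:", "").strip() or None
        d['Address'] = address.strip() or None
    else:
        # No ADDR field on this line: keep whatever follows "BORN:".
        d['Born'] = i[2].replace("BORN:", "").strip() or None

    # Any remaining lines look like "<key> <value>", e.g. "f 4222/2001".
    for j in i[3:]:
        key, value = j.split(" ", 1)
        d[key] = value
    data.append(d)

# Load the list of dicts as a DataFrame
df = pd.DataFrame(data)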
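
The loop above relies on variables built earlier in the original post. For anyone without that code, here is a minimal end-to-end sketch that starts from the example text in the question. The names `SAMPLE` and `parse_text` and both regular expressions are my own assumptions, not the asker's code, and they only cover the layout shown in the example (one header line, records between the B and C separator rows).

```python
import re

import pandas as pd

# Example text copied from the question.
SAMPLE = """Number 01600 London                           Register  4314

Some random text...

************************************* B ***************************************
 1 SHARE: 73/1284
   John Smith
   BORN: 1960-01-01 ADDR: Streetname 3/2   1000
   f 4222/2001
   h 1334/2000
   i 5774/2000
 4 SHARE: 58/1284
   Boris Morgan
   BORN:            ADDR: Streetname 4   2000
 5 SHARE: 23/1284
   James Klein
   BORN:            ADDR:
   c 4222/1988 Supporting Text
   f 4222/2000 Extra Text
************************************* C ***************************************
More random text...
"""

BASE_COLUMNS = ["Number", "Register", "City", "Id", "Share",
                "Name", "Born", "Address"]


def parse_text(text):
    """Hypothetical helper: parse one text file's content into a DataFrame."""
    # Header line: "Number <number> <city> ... Register <register>"
    header = re.search(r"Number\s+(\S+)\s+(.+?)\s+Register\s+(\S+)", text)
    number, city, register = header.groups()

    # Keep only the lines between the "* B *" and "* C *" separator rows.
    section = re.search(r"\*+ B \*+\n(.*?)\n\*+ C \*+", text, re.S).group(1)

    # Group lines into one list per record; a record starts at "<id> SHARE:".
    items = []
    for line in section.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.match(r"\d+\s+SHARE:", line):
            items.append([line])
        elif items:
            items[-1].append(line)

    data = []
    for item in items:
        born, _, address = item[2].partition("ADDR:")
        d = {
            "Number": number,
            "Register": register,
            "City": city,
            "Id": int(item[0].split()[0]),
            "Share": item[0].split("SHARE:")[1].strip(),
            "Name": item[1],
            # Empty fields become None so pandas treats them as missing.
            "Born": born.replace("BORN:", "").strip() or None,
            # Collapse runs of spaces inside the address.
            "Address": " ".join(address.split()) or None,
        }
        # Remaining lines look like "<key> <value>", e.g. "f 4222/2001".
        for extra in item[3:]:
            key, _, value = extra.partition(" ")
            d[key] = value.strip()
        data.append(d)

    df = pd.DataFrame(data)
    # Fixed columns first, then the single-letter keys in alphabetical order.
    extras = sorted(col for col in df.columns if col not in BASE_COLUMNS)
    return df.reindex(columns=BASE_COLUMNS + extras)


df = parse_text(SAMPLE)
print(df)
```

To handle a bunch of files, one option is to call `parse_text` on each file's content and combine the results with `pd.concat`.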
