Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I am trying to convert a bunch of text files into a data frame using Pandas.

Thanks to Stack Overflow's amazing community, I have almost got the desired output (see my earlier question, "Python Text File to Data Frame with Specific Pattern").

Basically, I need to turn text that follows a specific pattern (but sometimes has missing data) into a data frame using Pandas.

Here is an example:

Number 01600 London                           Register  4314

Some random text...

************************************* B ***************************************
 1 SHARE: 73/1284
   John Smith
   BORN: 1960-01-01 ADDR: Streetname 3/2   1000
   f 4222/2001
   h 1334/2000
   i 5774/2000
 4 SHARE: 58/1284
   Boris Morgan
   BORN:            ADDR: Streetname 4   2000
 5 SHARE: 23/1284
   James Klein
   BORN:            ADDR:      
   c 4222/1988 Supporting Text
   f 4222/2000 Extra Text
************************************* C ***************************************
More random text...

From the example above, I need to transform the text between ***B*** and ***C*** into a data frame with the following output:

Number | Register | City   | Id | Share   | Name         | Born       | Address             | c                         | f                    | h         | i
01600  | 4314     | London | 1  | 73/1284 | John Smith   | 1960-01-01 | Streetname 3/2 1000 | NaN                       | 4222/2001            | 1334/2000 | 5774/2000
01600  | 4314     | London | 4  | 58/1284 | Boris Morgan | NaN        | Streetname 4 2000   | NaN                       | NaN                  | NaN       | NaN
01600  | 4314     | London | 5  | 23/1284 | James Klein  | NaN        | NaN                 | 4222/1988 Supporting Text | 4222/2000 Extra Text | NaN       | NaN
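Before any rows can be built, the header fields and the lines belonging to each entry have to be pulled out of the raw text. The following is a minimal sketch of that step, assuming the header always looks like "Number <number> <city> Register <register>", that the entries sit between the "*** B ***" and "*** C ***" separator lines, and that every entry starts with an "<id> SHARE:" line. The names SAMPLE, parse_header, section_lines and group_entries are hypothetical, chosen for illustration only.

```python
import re

# Hypothetical sample: the text from the question, embedded as a string.
SAMPLE = """Number 01600 London                           Register  4314

Some random text...

************************************* B ***************************************
 1 SHARE: 73/1284
   John Smith
   BORN: 1960-01-01 ADDR: Streetname 3/2   1000
   f 4222/2001
   h 1334/2000
   i 5774/2000
 4 SHARE: 58/1284
   Boris Morgan
   BORN:            ADDR: Streetname 4   2000
 5 SHARE: 23/1284
   James Klein
   BORN:            ADDR:
   c 4222/1988 Supporting Text
   f 4222/2000 Extra Text
************************************* C ***************************************
More random text...
"""


def parse_header(text):
    """Return (number, city, register) from the 'Number ... Register ...' line."""
    m = re.search(r"^Number\s+(\S+)\s+(.+?)\s+Register\s+(\S+)", text, re.MULTILINE)
    if m is None:
        raise ValueError("header line not found")
    return m.group(1), m.group(2), m.group(3)


def section_lines(text, start="B", end="C"):
    """Return the lines strictly between the '*** B ***' and '*** C ***' separators."""
    sep = re.compile(r"^\*+\s*(\S+)\s*\*+$")
    out, inside = [], False
    for line in text.splitlines():
        m = sep.match(line.strip())
        if m and m.group(1) == start:
            inside = True
            continue
        if m and m.group(1) == end and inside:
            break
        if inside:
            out.append(line)
    return out


def group_entries(lines):
    """Split section lines into one list per entry, each starting at an 'N SHARE:' line."""
    entry_start = re.compile(r"^\s*\d+\s+SHARE:")
    items = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if entry_start.match(line):
            items.append([line.strip()])  # a new entry begins
        elif items:
            items[-1].append(line.strip())  # continuation of the current entry
    return items


number, city, register = parse_header(SAMPLE)
items = group_entries(section_lines(SAMPLE))
print(number, city, register)
for entry in items:
    print(entry)
```

Each element of items is then a list whose first line is the SHARE line, the second is the name, the third is the BORN/ADDR line, and any further lines are the lettered extras, which is the shape a row-building loop can index into.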
1 Answer

I wouldn't use the same variable i for both the inner and outer loops. Changing your for loop to the following should be cleaner:

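import pandas as pd

# number, register, city and items are assumed to come from your existing
# parsing code; each element of items is a list of that entry's stripped lines.
data = []

for i in items:
    d = {'Number': number,
         'Register': register,
         'City': city,
         'Id': int(i[0].split()[0]),      # "1 SHARE: 73/1284" -> 1
         'Share': i[0].split(': ')[1],    # "1 SHARE: 73/1284" -> "73/1284"
         'Name': i[1],
         }

    # "BORN: <date> ADDR: <address>"; either value may be blank
    if "ADDR" in i[2]:
        born, address = i[2].split("ADDR:", 1)
        d['Born'] = born.replace("BORN:", "").strip()
        d['Address'] = address.strip()
    else:
        # no ADDR: part on this line, so it only holds the birth date
        d['Born'] = i[2].replace("BORN:", "").strip()

    # remaining lines are "<letter> <value>" pairs, e.g. "f 4222/2001";
    # slicing past the end yields an empty list, so no length check is needed
    for j in i[3:]:
        key, value = j.strip().split(" ", 1)
        d[key] = value
    data.append(d)

# load the list of dicts as a DataFrame
df = pd.DataFrame(data)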
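To match the exact table in the question, the resulting DataFrame may still need a little post-processing: blank BORN:/ADDR: values come through as empty strings rather than NaN, a letter column such as c only exists if at least one entry has it, and the columns appear in the order keys were first inserted. The following is a minimal sketch using hypothetical hard-coded rows shaped like what the loop would produce for the sample; COLUMNS is taken from the question's expected output, and collapsing repeated spaces in Address is an assumption based on that output.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the dicts the loop builds for the sample
# (empty strings where BORN:/ADDR: had no value).
data = [
    {"Number": "01600", "Register": "4314", "City": "London", "Id": 1,
     "Share": "73/1284", "Name": "John Smith", "Born": "1960-01-01",
     "Address": "Streetname 3/2   1000",
     "f": "4222/2001", "h": "1334/2000", "i": "5774/2000"},
    {"Number": "01600", "Register": "4314", "City": "London", "Id": 4,
     "Share": "58/1284", "Name": "Boris Morgan", "Born": "",
     "Address": "Streetname 4   2000"},
    {"Number": "01600", "Register": "4314", "City": "London", "Id": 5,
     "Share": "23/1284", "Name": "James Klein", "Born": "", "Address": "",
     "c": "4222/1988 Supporting Text", "f": "4222/2000 Extra Text"},
]

# Column order from the desired output; reindex also creates any column
# that no entry happened to have, filled with NaN.
COLUMNS = ["Number", "Register", "City", "Id", "Share", "Name",
           "Born", "Address", "c", "f", "h", "i"]

df = pd.DataFrame(data)
df = df.replace("", np.nan)  # blank BORN:/ADDR: fields become NaN
# collapse runs of spaces inside addresses; NaN values pass through unchanged
df["Address"] = df["Address"].str.split().str.join(" ")
df = df.reindex(columns=COLUMNS)
print(df)
```

Using reindex rather than selecting columns with df[COLUMNS] means a file in which no entry has, say, an h line still produces an h column of NaN instead of raising a KeyError.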
