I am trying to convert a bunch of text files into a single data frame using pandas.
Thanks to Stack Overflow's amazing community, I have almost got the desired output (original post: Python Text File to Data Frame with Specific Pattern).
Each file follows a fixed pattern, but some fields are occasionally missing, and every record still has to end up as one row of the data frame.
Here is an example:
Number 01600 London Register 4314
Some random text...
************************************* B ***************************************
1 SHARE: 73/1284
John Smith
BORN: 1960-01-01 ADDR: Streetname 3/2 1000
f 4222/2001
h 1334/2000
i 5774/2000
4 SHARE: 58/1284
Boris Morgan
BORN: ADDR: Streetname 4 2000
5 SHARE: 23/1284
James Klein
BORN: ADDR:
c 4222/1988 Supporting Text
f 4222/2000 Extra Text
************************************* C ***************************************
More random text...
From the example above, I need to transform the text between the ***B*** and ***C*** separator lines into a data frame with one row per person, where missing values become NaN:
Number | Register | City | Id | Share | Name | Born | Address | c | f | h | i |
---|---|---|---|---|---|---|---|---|---|---|---|
01600 | 4314 | London | 1 | 73/1284 | John Smith | 1960-01-01 | Streetname 3/2 1000 | NaN | 4222/2001 | 1334/2000 | 5774/2000 |
01600 | 4314 | London | 4 | 58/1284 | Boris Morgan | NaN | Streetname 4 2000 | NaN | NaN | NaN | NaN |
01600 | 4314 | London | 5 | 23/1284 | James Klein | NaN | NaN | 4222/1988 Supporting Text | 4222/2000 Extra Text | NaN | NaN |
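To make the target concrete, here is a minimal sketch of one way the transformation could be approached. It is not a tested solution for the real files. It assumes the layout is exactly as in the example: a `Number ... Register ...` header line, `B` and `C` separator lines made of asterisks, a name line directly after each `SHARE:` line, and single lowercase letter codes. The function name `parse_register` and the variable `sample` are made up for illustration.

```python
import re
import pandas as pd

# Sample input copied from the example above.
sample = """Number 01600 London Register 4314
Some random text...
************************************* B ***************************************
1 SHARE: 73/1284
John Smith
BORN: 1960-01-01 ADDR: Streetname 3/2 1000
f 4222/2001
h 1334/2000
i 5774/2000
4 SHARE: 58/1284
Boris Morgan
BORN: ADDR: Streetname 4 2000
5 SHARE: 23/1284
James Klein
BORN: ADDR:
c 4222/1988 Supporting Text
f 4222/2000 Extra Text
************************************* C ***************************************
More random text...
"""

def parse_register(text):
    """Hypothetical helper: one row per person found between the B and C lines."""
    # Header line, e.g. "Number 01600 London Register 4314" (kept as strings
    # so that the leading zero in "01600" survives).
    number, city, register = re.search(
        r"Number\s+(\S+)\s+(.+?)\s+Register\s+(\S+)", text).groups()
    # Only the lines between the two separator lines are of interest.
    block = re.search(r"\*+ B \*+\n(.*?)\n\*+ C \*+", text, re.S).group(1)

    records, codes = [], set()
    current, expect_name = None, False
    for line in (raw.strip() for raw in block.splitlines()):
        if not line:
            continue
        share = re.match(r"(\d+)\s+SHARE:\s*(\S*)", line)
        if share:
            # A "<id> SHARE: <value>" line starts a new record.
            current = {"Number": number, "Register": register,
                       "City": city, "Id": share.group(1)}
            if share.group(2):
                current["Share"] = share.group(2)
            records.append(current)
            expect_name = True
            continue
        if current is None:
            continue  # stray text before the first record
        if expect_name and not line.startswith("BORN:"):
            current["Name"] = line  # assumption: name follows the SHARE line
            expect_name = False
            continue
        expect_name = False
        born = re.match(r"BORN:\s*(.*?)\s*ADDR:\s*(.*)$", line)
        if born:
            # Empty fields are simply not stored, so they show up as NaN later.
            if born.group(1):
                current["Born"] = born.group(1)
            if born.group(2):
                current["Address"] = born.group(2)
            continue
        code = re.match(r"([a-z])\s+(.+)$", line)
        if code:
            current[code.group(1)] = code.group(2)
            codes.add(code.group(1))

    columns = ["Number", "Register", "City", "Id", "Share",
               "Name", "Born", "Address"] + sorted(codes)
    # Keys missing from a record are filled with NaN by the constructor.
    return pd.DataFrame(records, columns=columns)

df = parse_register(sample)
print(df)
```

For a whole folder of files, each file could be parsed this way and the results combined with `pd.concat`. The letter columns are collected from whatever codes actually occur, so files with other letters would produce additional columns.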