Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have a text string with multiple lines, and each line contains a mix of letters, numbers, and spaces.

Here is what a couple of the lines look like:

WEIGHT                         VOLUME                    CHARGEABLE                PACKAGES
                                                             
398.000 KG                     4.999 M3                  833.500 KG                12 PLT
                                                                                         
MAWB                                    HAWB
    / MH616 /                                                                                         
8947806753                             ABC20018830
  

The output I am looking for is a dict with the headers above as keys and the entries beneath them as values:

{
 "WEIGHT": "398.000 KG",
 "VOLUME": "4.999 M3",
 "CHARGEABLE": "833.500 KG",
 "PACKAGES": "12 PLT",
 "MAWB": "8947806753",
 "HAWB": "ABC20018830"
}

I am not sure how to fetch the value for a field when it sits on a different line. If the value is on the same line as the field name, I can fetch it with a pattern, but here each value is directly underneath its field, on a separate line.


1 Answer

You can use a regex to easily split the text into a list containing all the fields:

import re

a = """WEIGHT                         VOLUME                    CHARGEABLE                PACKAGES
                                                                         398.000 KG                     4.999 M3                  833.500 KG                12 PLT
                                                                                         MAWB                                    HAWB
    / MH616 /                                                                                           8947806753                             ABC20018830
"""

# Split on runs of 4 or more whitespace characters (this keeps each unit attached to its number).
# strip() removes the trailing newline so the last field comes out clean.
data = re.split(r'\s{4,}', a.strip())
print(data)

['WEIGHT', 'VOLUME', 'CHARGEABLE', 'PACKAGES', '398.000 KG', '4.999 M3', '833.500 KG', '12 PLT', 'MAWB', 'HAWB', '/ MH616 /', '8947806753', 'ABC20018830']

Since the keys and values are mixed in one flat list, there probably isn't an easy way to determine automatically which is which. However, if they are always in the same positions, you can pick them out manually by index, e.g.:

b = {
    # WEIGHT
    data[0]: data[4],
    # VOLUME
    data[1]: data[5]
}
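
If the text is laid out as in the question's sample, with each row of values on its own line below its row of headers, one possible heuristic is to work line by line instead of on a flat list. The sketch below is only an illustration under that assumption: the function name parse_fields is hypothetical, a "header row" is assumed to consist solely of upper-case alphabetic words, and a "value row" is assumed to be the next non-blank line with the same number of columns (so a stray line such as "/ MH616 /" is skipped).

```python
import re

# Sample text laid out as in the question: value rows sit on their own lines.
text = "\n".join([
    "WEIGHT                         VOLUME                    CHARGEABLE                PACKAGES",
    "",
    "398.000 KG                     4.999 M3                  833.500 KG                12 PLT",
    "",
    "MAWB                                    HAWB",
    "    / MH616 /",
    "8947806753                             ABC20018830",
])


def parse_fields(text):
    """Pair each header row with the next row that has the same column count.

    Assumptions (not taken from the original answer): headers are upper-case
    alphabetic words, and columns are separated by 2 or more whitespace
    characters.
    """
    # Split every non-blank line into columns on runs of 2+ whitespace.
    rows = [
        re.split(r"\s{2,}", line.strip())
        for line in text.splitlines()
        if line.strip()
    ]

    result = {}
    headers = None
    for cells in rows:
        if headers is None:
            # Look for a header row: every cell is an upper-case word.
            if all(c.isalpha() and c.isupper() for c in cells):
                headers = cells
        elif len(cells) == len(headers):
            # Same number of columns as the pending headers: a value row.
            result.update(zip(headers, cells))
            headers = None
        # Any other line (e.g. "/ MH616 /") is ignored.
    return result


fields = parse_fields(text)
print(fields)
# {'WEIGHT': '398.000 KG', 'VOLUME': '4.999 M3', 'CHARGEABLE': '833.500 KG',
#  'PACKAGES': '12 PLT', 'MAWB': '8947806753', 'HAWB': 'ABC20018830'}
```

This relies on the column counts matching, so it would need adjusting if a value is missing or if a value itself contains 2 or more consecutive spaces.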
