python - Pythonic way of processing a file between two previously known strings

Question

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I process log files with Python. Let's say I have a log file that contains a line that is START and a line that is END, like below:

START
one line
two line
...
n line
END

What I want is to store the content between the START and END lines for further processing.

I do the following in Python:

data = []            # Parsed lines found between START and END
found_start = False
found_end = False
with open(file) as name_of_file:
    for line in name_of_file:
        if 'START' in line:  # We found the start delimiter
            print(line)
            found_start = True
            for line in name_of_file:  # We now read until the end delimiter
                if 'END' in line:  # We exit here as we have the info
                    found_end = True
                    break
                elif not line.isspace():  # Skip blank lines
                    # Store the line as a list of words, with commas removed
                    data.append(line.replace(',', '').strip().split())
if found_start and found_end:
    relevant_data = data

And then I process the relevant_data.

This looks far too complicated for the purity of Python, hence my question: is there a more Pythonic way of doing this?

Thanks!

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:15:53+0000

You are right that a nested loop over the same iterator is awkward. File objects are already iterators, and you can use that to your advantage. For example, to find the first line containing START:

line = next(l for l in name_of_file if 'START' in l)

This raises StopIteration if there is no such line. Otherwise it leaves the file iterator positioned just after the START line, so the next line it yields is the first one you care about.
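A minimal sketch of this behaviour, using io.StringIO as an in-memory stand-in for the log file (an assumption for illustration; a text file object iterates line by line the same way, and the sample lines are made up). It also shows the two-argument next(iterator, default) form, which returns None instead of raising StopIteration when there is no START.

```python
import io

# In-memory stand-in for a log file (illustrative data only).
log = io.StringIO("header\nSTART\none line\ntwo line\nEND\nfooter\n")

# The generator expression pulls lines from `log` until one contains 'START'.
start_line = next(l for l in log if 'START' in l)
print(repr(start_line))   # 'START\n'

# Iterating `log` again resumes right after the START line.
following = next(log)
print(repr(following))    # 'one line\n'

# With a default, next() returns None instead of raising StopIteration.
missing = next((l for l in io.StringIO("no markers here\n") if 'START' in l), None)
print(missing)            # None
```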

Stopping at the END line without consuming anything after it is a bit more complicated, because it's awkward to keep external state in a generator expression. Instead, you can write a simple generator function:

def interesting_lines(file):
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break
        line = line.strip()
        if not line:
            continue
        yield line.replace(',', '').split()

The generator yields nothing if there is no START, but if there is no END it yields every remaining line until the end of the file. That differs slightly from your implementation, which only sets relevant_data when END is found. You can use the generator to replace your loop entirely:

with open(name_of_file) as file:
    data = list(interesting_lines(file))

if data:
    ... # process data
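To see what the generator actually produces, here is a small self-contained run. It repeats interesting_lines from above so it can execute on its own, and feeds it io.StringIO objects instead of a real file (an assumption for illustration; the sample lines are made up). It covers the normal case, a missing START, and a missing END.

```python
import io

def interesting_lines(file):
    # Skip ahead to the first line containing 'START'; give up if there is none.
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break                # stop at the END delimiter
        line = line.strip()
        if not line:
            continue             # skip blank lines
        yield line.replace(',', '').split()

sample = io.StringIO("noise\nSTART\n1, 2, 3\n\nfoo bar\nEND\ntrailing\n")
data = list(interesting_lines(sample))
print(data)       # [['1', '2', '3'], ['foo', 'bar']]

# No START: nothing is yielded.
no_start = list(interesting_lines(io.StringIO("a\nb\nEND\n")))
print(no_start)   # []

# START without END: every remaining non-blank line is yielded.
no_end = list(interesting_lines(io.StringIO("START\nx y\n")))
print(no_end)     # [['x', 'y']]
```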

Wrapping the generator in list() consumes it immediately, so the data persists after the file is closed. Because each call leaves the file iterator just past the END line, you can call the generator repeatedly to extract several START/END blocks from one file:

with open(name_of_file) as file:
    for data in iter(lambda: list(interesting_lines(file)), []):
        ...  # Process another data set.

The lesser-known two-argument form, iter(callable, sentinel), turns any callable that takes no arguments into an iterator. Iteration stops as soon as the callable returns the sentinel value, in this case an empty list.
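A short sketch of the iter(callable, sentinel) loop on a file with two blocks, again using io.StringIO as a stand-in for the real file and made-up sample lines. The second run illustrates one consequence of using [] as the sentinel: a START/END block containing only blank lines also yields an empty list, so it ends the loop before any later blocks are read.

```python
import io

def interesting_lines(file):
    # Same generator as in the answer: skip to START, yield cleaned lines until END.
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break
        line = line.strip()
        if not line:
            continue
        yield line.replace(',', '').split()

two_blocks = io.StringIO("START\na, b\nEND\njunk\nSTART\nc d\ne\nEND\n")
# Each call resumes where the previous one stopped; [] signals "no more blocks".
blocks = list(iter(lambda: list(interesting_lines(two_blocks)), []))
print(blocks)   # [[['a', 'b']], [['c', 'd'], ['e']]]

# Caveat: an all-blank block returns [] and stops the loop early.
early_file = io.StringIO("START\n\nEND\nSTART\nz\nEND\n")
early = list(iter(lambda: list(interesting_lines(early_file)), []))
print(early)    # []
```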
