python csv: getting subset

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

Here is a snapshot of my CSV:

alex    123f    1
harry   fwef    2
alex    sef     3
alex    gsdf    4
alex    wf35    6
harry   sdfsdf  3

I would like to get the subset of this data where the value in the first column (harry, alex) occurs at least 4 times. So I want the resulting data set to be:

alex    123f    1
alex    sef     3
alex    gsdf    4
alex    wf35    6

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:23:29+0000

Clearly, you cannot decide which rows are interesting until you've seen all of them: the very last row might be the one that turns some count from three to four, thereby making some previously seen rows interesting ;-). So, unless your CSV file is horribly huge, suck it all into memory first, as a list:

import csv

with open('thefile.csv', 'rb') as f:
    data = list(csv.reader(f))

Then do the counting. Python 2.7 has a better way (collections.Counter), but assuming you're still on 2.6 like most of us:

import collections
counter = collections.defaultdict(int)
for row in data:
    counter[row[0]] += 1
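
For readers on Python 2.7 or later, the counting step can be sketched with collections.Counter instead. This is only an illustration: the rows are copied from the question and hard-coded so the snippet stands alone, in place of the list read from the file.

```python
import collections

# Sample rows from the question, standing in for list(csv.reader(f)).
data = [
    ['alex', '123f', '1'],
    ['harry', 'fwef', '2'],
    ['alex', 'sef', '3'],
    ['alex', 'gsdf', '4'],
    ['alex', 'wf35', '6'],
    ['harry', 'sdfsdf', '3'],
]

# Counter tallies how often each first-column value occurs.
counter = collections.Counter(row[0] for row in data)
```

The resulting counter can be indexed exactly like the defaultdict, so the selection loop does not need to change.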

And finally, do the selection loop:

for row in data:
    if counter[row[0]] >= 4:
        print row
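
If the file really is too large to hold in memory, one possible alternative is to read it twice: once to count, once to select. The sketch below assumes Python 3 (not the 2.6 used above). The function name rows_with_min_count and its parameters are made up for illustration. The delimiter is also an assumption: the snapshot in the question looks whitespace-separated, so you may need to pass delimiter='\t' if the real file uses tabs.

```python
import collections
import csv


def rows_with_min_count(path, min_count=4, delimiter=','):
    """Yield rows whose first-column value occurs at least min_count times.

    Hypothetical helper: reads the file twice instead of keeping
    every row in memory.
    """
    # First pass: count first-column values without storing the rows.
    with open(path, newline='') as f:
        counter = collections.Counter(
            row[0] for row in csv.reader(f, delimiter=delimiter) if row
        )

    # Second pass: re-read the file and yield only the qualifying rows.
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter=delimiter):
            if row and counter[row[0]] >= min_count:
                yield row
```

Only the counts (one entry per distinct first-column value) stay in memory, at the cost of reading the file a second time.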

Of course, this prints each interesting row as a rough-hewn list (with square brackets and quotes around the items), but it is easy to format it any way you prefer.
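
As one example of such formatting, the selected rows could be written back out in delimited form with csv.writer. This is a minimal sketch assuming Python 3 and tab-separated output. The helper name format_rows is hypothetical, and the two rows are copied from the question.

```python
import csv
import io


def format_rows(rows, delimiter='\t'):
    """Return the rows as delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


# Two of the selected rows from the question, for illustration.
selected = [['alex', '123f', '1'], ['alex', 'sef', '3']]
print(format_rows(selected), end='')
```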
