python - Pandas loc dynamic conditional list

Question

Welcome To Ask or Share your Answers For Others

python - Pandas loc dynamic conditional list

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

I have a Pandas DataFrame and I want to find all rows where the i'th column values are 10 times greater than other columns. Here is an example of my DataFrame:

For example, looking at column i=0, row B (0.344) its is 10x greater than values in the same row but in other columns (0.001, 0, 0.009, 0). So I would like:

my_list_0=[False,True,False,False,False,False,False,False,False,False,False]

The number of columns might change hence I don't want a solution like:

#This is good only for a DataFrame with 4 columns.
my_list_i = data.loc[(data.iloc[:,i]>10*data.iloc[:,(i+1)%num_cols]) &
                     (data.iloc[:,i]>10*data.iloc[:,(i+2)%num_cols]) &
                     (data.iloc[:,i]>10*data.iloc[:,(i+3)%num_cols])]

Any idea? thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

233 views

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:21:19+0000

Given the df:

df = pd.DataFrame({'cell1':[0.006209, 0.344955, 0.004521, 0, 0.018931, 0.439725, 0.013195, 0.009045, 0, 0.02614, 0],
              'cell2':[0.048043, 0.001077, 0,0.010393, 0.031546, 0.287264, 0.016732, 0.030291, 0.016236, 0.310639,0], 
              'cell3':[0,0,0.020238, 0, 0.03811, 0.579348, 0.005906, 0,0,0.068352, 0.030165],
              'cell4':[0.016139, 0.009359, 0,0,0.025449, 0.47779, 0, 0.01282, 0.005107, 0.004846, 0],
              'cell5': [0,0,0,0.012075, 0.031668, 0.520258, 0,0,0,2.728218, 0.013418]})
i = 0

You can use

(10 * df.drop(df.columns[i], axis=1)).lt(df.iloc[:,i], axis=0).all(1)

To get

0     False
1      True
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
dtype: bool

for any number of columns. This drops column i, multiplies the remaining df by 10, and checks row-wise for being less than i, then returns True only if all values in the row are True. So it returns a vector of True for each row where this obtains and False for others.

If you want to give an arbitrary threshold, you can sum the Trues and divide by the number of columns - 1, then compare with your threshold:

thresh = 0.5  # or whatever you want
(10 * df.drop(df.columns[i], axis=1)).lt(df.iloc[:,i], axis=0).sum(1) / (df.shape[1] - 1) > thresh

0     False
1      True
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
dtype: bool

Categories

python - Pandas loc dynamic conditional list

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags