Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have a Pandas DataFrame and I want to find all rows where the i'th column values are 10 times greater than other columns. Here is an example of my DataFrame:

Sample data

For example, looking at column i=0, row B (0.344) its is 10x greater than values in the same row but in other columns (0.001, 0, 0.009, 0). So I would like:

my_list_0=[False,True,False,False,False,False,False,False,False,False,False]

The number of columns might change hence I don't want a solution like:

#This is good only for a DataFrame with 4 columns.
my_list_i = data.loc[(data.iloc[:,i]>10*data.iloc[:,(i+1)%num_cols]) &
                     (data.iloc[:,i]>10*data.iloc[:,(i+2)%num_cols]) &
                     (data.iloc[:,i]>10*data.iloc[:,(i+3)%num_cols])]

Any idea? thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
233 views
Welcome To Ask or Share your Answers For Others

1 Answer

Given the df:

df = pd.DataFrame({'cell1':[0.006209, 0.344955, 0.004521, 0, 0.018931, 0.439725, 0.013195, 0.009045, 0, 0.02614, 0],
              'cell2':[0.048043, 0.001077, 0,0.010393, 0.031546, 0.287264, 0.016732, 0.030291, 0.016236, 0.310639,0], 
              'cell3':[0,0,0.020238, 0, 0.03811, 0.579348, 0.005906, 0,0,0.068352, 0.030165],
              'cell4':[0.016139, 0.009359, 0,0,0.025449, 0.47779, 0, 0.01282, 0.005107, 0.004846, 0],
              'cell5': [0,0,0,0.012075, 0.031668, 0.520258, 0,0,0,2.728218, 0.013418]})
i = 0

You can use

(10 * df.drop(df.columns[i], axis=1)).lt(df.iloc[:,i], axis=0).all(1)

To get

0     False
1      True
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
dtype: bool

for any number of columns. This drops column i, multiplies the remaining df by 10, and checks row-wise for being less than i, then returns True only if all values in the row are True. So it returns a vector of True for each row where this obtains and False for others.

If you want to give an arbitrary threshold, you can sum the Trues and divide by the number of columns - 1, then compare with your threshold:

thresh = 0.5  # or whatever you want
(10 * df.drop(df.columns[i], axis=1)).lt(df.iloc[:,i], axis=0).sum(1) / (df.shape[1] - 1) > thresh

0     False
1      True
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
dtype: bool

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...