pandas - Python - Delete duplicates in a dataframe based on two columns combinations?

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I have a dataframe with 3 columns in Python:

Name1 Name2 Value
Juan  Ale   1
Ale   Juan  1

and I would like to remove duplicates based on the combination of the Name1 and Name2 columns, regardless of their order.

In my example both rows contain the same pair of names, just in a different order, so I would like to delete the second row and keep only the first. The end result should be:

Name1 Name2 Value
Juan  Ale   1

Any idea will be really appreciated!

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:48:42+0000

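Use np.sort together with duplicated. Sorting the two name columns within each row makes (Juan, Ale) and (Ale, Juan) identical, so duplicated can flag the repeats. On its own, the mask selects the rows that would be dropped:

df[pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
Out[614]: 
  Name1 Name2  Value
1   Ale  Juan      1

Negate the mask with ~ to keep the first occurrence of each pair instead:

df[~pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]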
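
For reference, here is a minimal self-contained sketch of the same idea, assuming the frame from the question. The helper name drop_unordered_duplicates is hypothetical and is not part of pandas; the mask is converted to a plain NumPy array so that it does not depend on how the frame is indexed.

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df, cols):
    """Keep the first row for each unordered combination of values in `cols`.

    Hypothetical helper written for illustration; not a pandas function.
    """
    # Sort the values inside each row so (Juan, Ale) and (Ale, Juan) match.
    sorted_pairs = np.sort(df[cols].to_numpy(), axis=1)
    # duplicated() marks every repeat after the first occurrence as True.
    is_repeat = pd.DataFrame(sorted_pairs).duplicated().to_numpy()
    # Use a positional boolean array so df's own index labels do not matter.
    return df[~is_repeat]


# The example frame from the question.
df = pd.DataFrame(
    {"Name1": ["Juan", "Ale"], "Name2": ["Ale", "Juan"], "Value": [1, 1]}
)

result = drop_unordered_duplicates(df, ["Name1", "Name2"])
print(result)
```

On the two-row example this should leave only the first row (Juan, Ale, 1), which is the result asked for in the question.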

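Performance

Timing on 200,000 rows (the two-row frame repeated 100,000 times), compared against a frozenset-based alternative: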

df=pd.concat([df]*100000)

%timeit df[pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
10 loops, best of 3: 69.3 ms per loop
%timeit df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]
1 loop, best of 3: 3.72 s per loop
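
As a rough check that the two approaches timed above flag the same rows, the sketch below builds both masks on a smaller repeated frame and compares them. The variable names are illustrative, and ignore_index=True is an assumption added here so that the repeated frame gets a fresh 0..n-1 index that lines up with the masks.

```python
import numpy as np
import pandas as pd

# The example frame from the question, repeated to get more rows.
df = pd.DataFrame(
    {"Name1": ["Juan", "Ale"], "Name2": ["Ale", "Juan"], "Value": [1, 1]}
)
big = pd.concat([df] * 1000, ignore_index=True)
cols = ["Name1", "Name2"]

# Approach 1: sort the values within each row, then look for repeated rows.
mask_sort = pd.DataFrame(np.sort(big[cols].to_numpy(), axis=1)).duplicated()

# Approach 2: turn each row into a frozenset, then look for repeated sets.
mask_set = big[cols].apply(frozenset, axis=1).duplicated()

# Both masks should mark exactly the same rows as repeats.
same = bool((mask_sort.to_numpy() == mask_set.to_numpy()).all())

# Keep only the first occurrence of each unordered pair.
kept = big[~mask_sort]
print(same, len(kept))
```

One design point to keep in mind: a frozenset discards repeated values within a row, so with three or more columns rows such as (A, A, B) and (A, B, B) would be treated as equal by the frozenset approach but not by the sorting approach. With two columns, as here, the two should agree.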
