Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I want to collect three fields from one DataFrame into an array and then use that array to make sure the codes stored in it do not appear in another DataFrame.

df1

id; id1; code; date_create
1; 100; 50; 2021-10-10
2; 200; 60; 2021-10-10
3; 300; 70; 2021-10-10
4; 400; 80; 2021-10-10
5; 500; 90; 2021-10-10

df2

id; id1; code; date_create
1; 100; 50; 2021-10-10
2; 200; 60; 2021-10-10
3; 300; 70; 2021-10-10
4; 400; 80; 2021-10-15
5; 500; 90; 2021-10-15
6; 600; 100; 2021-10-15
7; 700; 101; 2021-10-15

I would like to:

read df2 where date_create equals 2021-10-15 and save the fields id, id1 and code in an array;

then read the array and rebuild df1 without the rows whose id, id1 and code are in the array.
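A plain-Python sketch of these two steps, assuming each DataFrame is modelled as a list of (id, id1, code, date_create) tuples copied from the tables above. It only illustrates the logic; in PySpark the key list would come from something like `df2.filter(...).select("id", "id1", "code").collect()`.

```python
# Rows modelled as (id, id1, code, date_create) tuples: a stand-in for the DataFrames
df1 = [
    (1, 100, 50, "2021-10-10"),
    (2, 200, 60, "2021-10-10"),
    (3, 300, 70, "2021-10-10"),
    (4, 400, 80, "2021-10-10"),
    (5, 500, 90, "2021-10-10"),
]
df2 = [
    (1, 100, 50, "2021-10-10"),
    (2, 200, 60, "2021-10-10"),
    (3, 300, 70, "2021-10-10"),
    (4, 400, 80, "2021-10-15"),
    (5, 500, 90, "2021-10-15"),
    (6, 600, 100, "2021-10-15"),
    (7, 700, 101, "2021-10-15"),
]

# Step 1: save (id, id1, code) of the df2 rows created on 2021-10-15.
# A set gives constant-time membership checks instead of scanning an array per row.
keys = {(i, i1, c) for (i, i1, c, d) in df2 if d == "2021-10-15"}

# Step 2: rebuild df1 without the rows whose (id, id1, code) is in that set.
df1_filtered = [row for row in df1 if row[:3] not in keys]

print(df1_filtered)
```

With the sample data only rows 1 to 3 of df1 remain, because rows 4 and 5 share their (id, id1, code) with 2021-10-15 rows of df2.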

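More or less like this (a rough sketch of the idea, filtering df1 once per collected key):

    import pyspark.sql.functions as F
    # Collect (id, id1, code) of the df2 rows created on 2021-10-15
    keys = df2.filter(F.col("date_create") == "2021-10-15").select("id", "id1", "code").collect()
    # Remove each matching row from df1, one filter per collected key
    for k in keys:
        df1 = df1.filter(~((F.col("id") == k["id"]) & (F.col("id1") == k["id1"]) & (F.col("code") == k["code"])))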

Then I was going to union the filtered df1 with df2

df2.union(df1)

so that the final DataFrame has no duplicate rows.
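Spark's `DataFrame.union` behaves like SQL `UNION ALL`: it keeps duplicate rows and matches columns by position (`distinct()` can be applied afterwards to deduplicate). In the sample data df2 also contains the 2021-10-10 rows 1 to 3, so unioning the filtered df1 with all of df2 would repeat them. The plain-Python sketch below shows that effect, assuming lists of tuples as stand-ins for the DataFrames and taking the filtered df1 to be rows 1 to 3; restricting df2 to its 2021-10-15 rows is one possible fix, not necessarily the intended one.

```python
# df1 after removing the keys found in df2 on 2021-10-15 (rows 1-3 of the sample data)
df1_filtered = [
    (1, 100, 50, "2021-10-10"),
    (2, 200, 60, "2021-10-10"),
    (3, 300, 70, "2021-10-10"),
]
df2 = [
    (1, 100, 50, "2021-10-10"),
    (2, 200, 60, "2021-10-10"),
    (3, 300, 70, "2021-10-10"),
    (4, 400, 80, "2021-10-15"),
    (5, 500, 90, "2021-10-15"),
    (6, 600, 100, "2021-10-15"),
    (7, 700, 101, "2021-10-15"),
]

# UNION ALL semantics (what DataFrame.union does): plain concatenation
union_all = df2 + df1_filtered

# Rows that occur more than once in the concatenation
dupes = sorted({row for row in union_all if union_all.count(row) > 1})

# Restricting df2 to the 2021-10-15 rows before the union avoids the repeats
df2_new = [row for row in df2 if row[3] == "2021-10-15"]
result = sorted(df1_filtered + df2_new)

print(len(union_all), dupes)
print(result)
```

Here `union_all` has 10 rows with rows 1 to 3 duplicated, while `result` has the seven rows of the expected result.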

If anyone can help me I would appreciate it.

Expected result:
    id; id1; code; date_create
    1; 100; 50; 2021-10-10
    2; 200; 60; 2021-10-10
    3; 300; 70; 2021-10-10
    4; 400; 80; 2021-10-15
    5; 500; 90; 2021-10-15
    6; 600; 100; 2021-10-15
    7; 700; 101; 2021-10-15

462 views

1 Answer

You can do a left anti-join on (id, id1, code) to drop the df1 rows that already exist in df2, and then union the result with df2:

result = df1.join(df2, ['id', 'id1', 'code'], 'anti').union(df2)
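To see why this produces the expected result, here is a plain-Python sketch of the same two operations on the sample data, assuming lists of tuples as stand-ins for the DataFrames. A left anti-join keeps only the df1 rows with no match in df2 on (id, id1, code); since every df1 key also occurs somewhere in df2, no df1 row survives and the union returns exactly df2's seven rows. Note that this matches against all of df2, not only its 2021-10-15 rows, which is also what keeps rows 1 to 3 from being duplicated.

```python
df1 = [
    (1, 100, 50, "2021-10-10"),
    (2, 200, 60, "2021-10-10"),
    (3, 300, 70, "2021-10-10"),
    (4, 400, 80, "2021-10-10"),
    (5, 500, 90, "2021-10-10"),
]
df2 = [
    (1, 100, 50, "2021-10-10"),
    (2, 200, 60, "2021-10-10"),
    (3, 300, 70, "2021-10-10"),
    (4, 400, 80, "2021-10-15"),
    (5, 500, 90, "2021-10-15"),
    (6, 600, 100, "2021-10-15"),
    (7, 700, 101, "2021-10-15"),
]

def key(row):
    """Join key: (id, id1, code); date_create is not part of the match."""
    return row[:3]

# Left anti-join: keep the df1 rows whose key has no match in df2
df2_keys = {key(row) for row in df2}
anti = [row for row in df1 if key(row) not in df2_keys]

# Union (UNION ALL): append df2's rows after the surviving df1 rows
result = anti + df2

for row in result:
    print("; ".join(map(str, row)))
```

The printed lines match the expected result table. A df1 row whose (id, id1, code) did not appear in df2 would be kept by the anti-join and appear before the df2 rows.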

...