I want to collect three fields (id, id1, code) from one DataFrame into an array, and then use that array to filter another DataFrame so that the rows whose keys are stored in the array are removed from it.
df1
id; id1; code; date_create
1; 100; 50; 2021-10-10
2; 200; 60; 2021-10-10
3; 300; 70; 2021-10-10
4; 400; 80; 2021-10-10
5; 500; 90; 2021-10-10
df2
id; id1; code; date_create
1; 100; 50; 2021-10-10
2; 200; 60; 2021-10-10
3; 300; 70; 2021-10-10
4; 400; 80; 2021-10-15
5; 500; 90; 2021-10-15
6; 600; 100; 2021-10-15
7; 700; 101; 2021-10-15
I would like to store the keys in an array:
read df2 where date_create equals 2021-10-15 and save the fields id, id1 and code.
Then read the array and rebuild df1 without the rows whose id, id1 and code are in the array.
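More or less like this; the idea is to collect the keys to the driver and filter df1 once per key:
from pyspark.sql import functions as F
# Keys (id, id1, code) of the df2 rows created on 2021-10-15
keys = df2.filter(F.col("date_create") == "2021-10-15").select("id", "id1", "code").collect()
for row in keys:
    # Drop the df1 rows that match this key on all three fields
    df1 = df1.filter(~((F.col("id") == row["id"]) & (F.col("id1") == row["id1"]) & (F.col("code") == row["code"])))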
Then I was going to union the filtered df1 with df2:
df2.union(df1)
so that the rows replaced by df2 are not duplicated.
If anyone can help me I would appreciate it.
Expected result:
id; id1; code; date_create
1; 100; 50; 2021-10-10
2; 200; 60; 2021-10-10
3; 300; 70; 2021-10-10
4; 400; 80; 2021-10-15
5; 500; 90; 2021-10-15
6; 600; 100; 2021-10-15
7; 700; 101; 2021-10-15