I have a list of DataFrames: at each position in the list there is one DataFrame, and I need to combine all of them into a single DataFrame. This needs to be done in PySpark; previously, in pandas, I was using:

dataframe_new = pd.concat(listName)
Solution 1:
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql import SparkSession, Row

customSchema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", StringType(), True),
    StructField("col3", StringType(), True),
    StructField("col4", StringType(), True),
    StructField("col5", StringType(), True),
    StructField("col6", StringType(), True),
    StructField("col7", StringType(), True)
])
df = spark.createDataFrame(queried_dfs[0], schema=customSchema)

But this only converts the first element of the list, not the whole list.
Solution 2 I tried (iterating through the list of DataFrames, but I don't know how to combine them):

for x in ListOfDataframe:
    new_df = union_all()

but this overwrites new_df on every iteration instead of accumulating the result.
Any help to resolve this?
question from:https://stackoverflow.com/questions/65923884/make-single-dataframe-from-list-of-dataframes