For example, the result of this:
df.filter("project = 'en'").select("title","count").groupBy("title").sum()
returns a DataFrame.
How can I save a Spark DataFrame as a CSV file on disk?
Apache Spark does not natively support writing CSV output to disk (prior to Spark 2.x).
You have four available solutions though:
You can convert your Dataframe into an RDD :
import org.apache.spark.sql.Row
def convertToReadableString(r: Row): String = r.mkString(",") // naive: joins fields with "," and does no quoting/escaping
df.rdd.map{ convertToReadableString }.saveAsTextFile(filepath)
This will create a folder at filepath. Under that path you'll find the partition files (e.g. part-000*).
What I usually do, if I want to concatenate all the partitions into one big CSV, is
cat filepath/part* > mycsvfile.csv
Some will use coalesce(1, false) to create one partition from the RDD. It's usually bad practice, since it funnels all of the data into a single partition, which can overwhelm the executor that ends up holding it.
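If you do use it anyway, a minimal sketch (reusing the convertToReadableString helper from above) could look like this:
df.rdd
  .map(convertToReadableString)   // format each Row as one CSV line
  .coalesce(1, shuffle = false)   // funnel everything into a single partition
  .saveAsTextFile(filepath)       // still writes a folder, now containing a single part-00000 file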
Note that df.rdd will return an RDD[Row].
With Spark < 2.0, you can use the Databricks spark-csv library:
Spark 1.4+:
df.write.format("com.databricks.spark.csv").save(filepath)
Spark 1.3:
df.save(filepath,"com.databricks.spark.csv")
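In either case the library has to be on the classpath, for example via --packages (the version and Scala suffix here are illustrative):
spark-shell --packages com.databricks:spark-csv_2.11:1.5.0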
With Spark 2.x the spark-csv package is not needed, as the CSV writer is included in Spark:
df.write.format("csv").save(filepath)
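The built-in writer also accepts options; as a minimal sketch, assuming you want a header row and a custom separator (filepath as above):
df.write
  .option("header", "true")   // write the column names as the first line
  .option("sep", ";")         // optional: override the default "," separator
  .csv(filepath)              // shorthand for format("csv").save(filepath)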
You can convert the DataFrame to a local pandas DataFrame with toPandas and then use its to_csv method (PySpark only). Beware that this collects the entire dataset into the driver's memory.
Note: solutions 1, 2 and 3 will result in CSV-format files (part-*) generated by the underlying Hadoop API that Spark calls when you invoke save. You will have one part-* file per partition.