Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Is there a way to convert a Spark Df (not RDD) to pandas DF

I tried the following:

var some_df = Seq(
 ("A", "no"),
 ("B", "yes"),
 ("B", "yes"),
 ("B", "no")

 ).toDF(
"user_id", "phone_number")

Code:

%pyspark
pandas_df = some_df.toPandas()

Error:

 NameError: name 'some_df' is not defined

Any suggestions.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
360 views
Welcome To Ask or Share your Answers For Others

1 Answer

following should work

some_df = sc.parallelize([
 ("A", "no"),
 ("B", "yes"),
 ("B", "yes"),
 ("B", "no")]
 ).toDF(["user_id", "phone_number"])
pandas_df = some_df.toPandas()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...