Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I have a JSON file, nodes.json, that looks like this:

[{"toid":"osgb4000000031043205","point":[508180.748,195333.973],"index":1}
,{"toid":"osgb4000000031043206","point":[508163.122,195316.627],"index":2}
,{"toid":"osgb4000000031043207","point":[508172.075,195325.719],"index":3}
,{"toid":"osgb4000000031043208","point":[508513,196023],"index":4}]

I am able to read and manipulate this record with Python.

I am trying to read this file in Scala through the spark-shell.

From this tutorial, I can see that it is possible to read JSON via sqlContext.read.json:

val vfile = sqlContext.read.json("path/to/file/nodes.json")

However, instead of the expected schema, this results in a DataFrame with a single _corrupt_record column:

vfile: org.apache.spark.sql.DataFrame = [_corrupt_record: string]

Can anyone shed some light on this error? I can read and use the file with other applications, so I am confident that it is valid JSON and not corrupt.


1 Answer

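By default Spark expects the JSON Lines format, in which every line of the file is a complete JSON value on its own, rather than a typical JSON document. Here a single array is spread over several lines, so no individual line parses and each one ends up in _corrupt_record. You can tell Spark to read a regular multi-line JSON document by setting the multiline option: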

val df = spark.read.option("multiline", "true").json("<file>")
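If the multiline option is not available (it was added in Spark 2.2, while the question uses the older sqlContext entry point), another route is to rewrite the file as JSON Lines before loading it. The following is only a minimal sketch in Python, which the asker already uses to read the file; the function name array_to_json_lines and the shortened two-record sample are made up for illustration.

```python
import json


def array_to_json_lines(text):
    """Turn a JSON document holding one top-level array into JSON Lines text.

    Hypothetical helper for illustration; not part of Spark or any library.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Emit one compact JSON object per line, the layout Spark's default
    # JSON reader expects.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


# Shortened sample in the same shape as nodes.json from the question.
sample = (
    '[{"toid":"osgb4000000031043205","point":[508180.748,195333.973],"index":1}\n'
    ',{"toid":"osgb4000000031043206","point":[508163.122,195316.627],"index":2}]'
)

json_lines = array_to_json_lines(sample)
print(json_lines, end="")
```

Writing the converted text to a new file and pointing sqlContext.read.json at that file should then work without any extra option, because every line is a self-contained JSON object.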
