
scala - Reading JSON with Apache Spark - `corrupt_record`

I have a JSON file, nodes, that looks like this:

[{"toid":"osgb4000000031043205","point":[508180.748,195333.973],"index":1}
,{"toid":"osgb4000000031043206","point":[508163.122,195316.627],"index":2}
,{"toid":"osgb4000000031043207","point":[508172.075,195325.719],"index":3}
,{"toid":"osgb4000000031043208","point":[508513,196023],"index":4}]

I am able to read and manipulate this record with Python.

I am trying to read this file in Scala through the spark-shell.

From this tutorial, I can see that it is possible to read JSON via sqlContext.read.json:

val vfile = sqlContext.read.json("path/to/file/nodes.json")

However, this results in a DataFrame containing nothing but a _corrupt_record column:

vfile: org.apache.spark.sql.DataFrame = [_corrupt_record: string]

Can anyone shed some light on this error? I can read and use the file with other applications, and I am confident it is not corrupt but sound JSON.


1 Reply


By default, Spark expects JSON Lines format (one complete JSON object per line), not a typical multi-line JSON document, so a pretty-printed array like yours ends up in _corrupt_record. You can tell Spark to read standard JSON by setting the multiLine option:

val df = spark.read.option("multiLine", "true").json("<file>")
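
For context, here is a minimal spark-shell sketch of the full fix (Spark 2.x). The file path and field names come from the question; the schema shown in the comments is an assumption, based on what Spark typically infers for the sample data above:

// Read a standard multi-line JSON array as a DataFrame
val df = spark.read
  .option("multiLine", "true")
  .json("path/to/file/nodes.json")

df.printSchema()
// root
//  |-- index: long (nullable = true)
//  |-- point: array (nullable = true)
//  |    |-- element: double (containsNull = true)
//  |-- toid: string (nullable = true)

df.select("toid", "index").show()

Alternatively, you can rewrite the file itself in JSON Lines format, with one complete object per line and no enclosing array; the default spark.read.json (and the original sqlContext.read.json) then parses it without any options.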
