Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
585 views
in Technique[技术] by (71.8m points)

json - Spark:为JSON字符串生成JSON模式(Spark: Generating JSON schema for a JSON string)

Im using Spark 2.4.3 and Scala 2.11

(我正在使用Spark 2.4.3和Scala 2.11)

Below is my current JSON string in a DataFrame column.

(以下是我在DataFrame列中当前的JSON字符串。)

Im trying to store the schema of this JSON string in another column using schema_of_json function.

(我试图使用schema_of_json函数将这个JSON string的模式存储在另一列中。)

But its throwing below the error.

(但是它低于错误。)

How could I resolve this?

(我该如何解决?)

{
  "company": {
    "companyId": "123",
    "companyName": "ABC"
  },
  "customer": {
    "customerDetails": {
      "customerId": "CUST-100",
      "customerName": "CUST-AAA",
      "status": "ACTIVE",
      "phone": {
        "phoneDetails": {
          "home": {
            "phoneno": "666-777-9999"
          },
          "mobile": {
            "phoneno": "333-444-5555"
          }
        }
      }
    },
    "address": {
      "loc": "NORTH",
      "adressDetails": [
        {
          "street": "BBB",
          "city": "YYYYY",
          "province": "AB",
          "country": "US"
        },
        {
          "street": "UUU",
          "city": "GGGGG",
          "province": "NB",
          "country": "US"
        }
      ]
    }
  }
}

Code:

(码:)

val df = spark.read.textFile("./src/main/resources/json/company.txt")
df.printSchema()
df.show()

root
 |-- value: string (nullable = true)

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                                                                                                                                                                                                                                                              |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"company":{"companyId":"123","companyName":"ABC"},"customer":{"customerDetails":{"customerId":"CUST-100","customerName":"CUST-AAA","status":"ACTIVE","phone":{"phoneDetails":{"home":{"phoneno":"666-777-9999"},"mobile":{"phoneno":"333-444-5555"}}}},"address":{"loc":"NORTH","adressDetails":[{"street":"BBB","city":"YYYYY","province":"AB","country":"US"},{"street":"UUU","city":"GGGGG","province":"NB","country":"US"}]}}}|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


df.withColumn("jsonSchema",schema_of_json(col("value")))

Error:

(错误:)

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'schemaofjson(`value`)' due to data type mismatch: The input json should be a string literal and not null; however, got `value`.;;
'Project [value#0, schemaofjson(value#0) AS jsonSchema#10]
+- Project [value#0]
   +- Relation[value#0] text
  ask by Leibnitz translate from so

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Since SPARK-24709 was introduced schema_of_json accepts just literal strings.

(自从引入SPARK-24709以来,schema_of_json schema_of_json接受文字字符串。)

You can extract schema of String in DDL format by calling

(您可以通过调用DDL格式提取String模式)

spark.read
  .json(df.select("value").as[String])
  .schema
  .toDDL

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...