Your input isn't valid JSON, so you can't read it with spark.read.json. Instead, load it as a text DataFrame with spark.read.text and parse the stringified dict into JSON with a UDF:
import ast
import json

from pyspark.sql import functions as F
from pyspark.sql.types import *

# Target schema for the parsed records.
schema = StructType([
    StructField("event_date_utc", StringType(), True),
    StructField("deleted", BooleanType(), True),
    StructField("cost", IntegerType(), True),
    StructField("name", StringType(), True)
])

# Evaluate the Python-literal dict and re-serialize it as valid JSON.
dict_to_json = F.udf(lambda x: json.dumps(ast.literal_eval(x)))

df = (spark.read.text("xxx")
      .withColumn("value", F.from_json(dict_to_json("value"), schema))
      .select("value.*"))

df.show()
#+--------------+-------+----+----+
#|event_date_utc|deleted|cost|name|
#+--------------+-------+----+----+
#|null |false |1 |Mike|
#+--------------+-------+----+----+
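If you'd rather skip the JSON round-trip, a UDF can return the struct directly by using the schema as its return type. A minimal sketch, assuming each line holds exactly one Python-literal dict (parse_literal is an illustrative name):

# Alternative: parse the literal straight into the struct type.
parse_literal = F.udf(ast.literal_eval, schema)

df2 = (spark.read.text("xxx")
       .withColumn("value", parse_literal("value"))
       .select("value.*"))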