在线时间:8:00-16:00
迪恩网络APP
随时随地掌握行业动态
扫描二维码
关注迪恩网络微信公众号
开源软件名称:rcongiu/Hive-JSON-Serde开源软件地址:https://github.com/rcongiu/Hive-JSON-Serde开源编程语言:Java 99.7%开源软件介绍:JsonSerde - a read/write SerDe for JSON DataBuild Status: This library enables Apache Hive to read and write in JSON format. It includes support for serialization and deserialization (SerDe) as well as JSON conversion UDF. Features
InstallationDownload the latest binaries ( JSON Data FilesUpload JSON files to HDFS with The following example will work. { "key" : 10 }
{ "key" : 20 } The following example will not work. {
"key" : 10
}
{
"key" : 20
} Loading a JSON File and Querying DataUses json-serde/src/test/scripts/test-without-cr-lf.json.
Querying Complex FieldsUses json-serde/src/test/scripts/data.txt.
If you have complex json it can be tedious to create tables manually. Try hive-json-schema to build your schema from data. See json-serde/src/test/scripts for more examples. Defining Nested StructuresADD JAR json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;
CREATE TABLE json_nested_test (
country string,
languages array<string>,
religions map<string,array<int>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
-- data : {"country":"Switzerland","languages":["German","French","Italian"],
-- "religions":{"catholic":[10,20],"protestant":[40,50]}}
LOAD DATA LOCAL INPATH 'nesteddata.txt' OVERWRITE INTO TABLE json_nested_test;
select * from json_nested_test;
-- result: Switzerland ["German","French","Italian"] {"catholic":[10,20],"protestant":[40,50]}
select languages[0] from json_nested_test;
-- result: German
select religions['catholic'][0] from json_nested_test;
-- result: 10 Using ArraysData in JSON arrays should be ordered identically to Hive columns, similarly to text/csv. For example, array data as follows. ["John", 26 ]
["Mary", 23 ] Can be imported into the following table. CREATE TABLE people (name string, age int) Arrays can also be nested. CREATE TABLE complex_array (
name string, address struct<street:string,city:string>
)
-- data:
["John", { street:"10 green street", city:"Paris" } .. ] Importing Malformed DataThe SerDe will raise exceptions with malformed data. For example, the following malformed JSON will raise
{"country":"Italy","languages" "Italian","religions":{"catholic":"90"}}
This may not be desirable if you have a few bad lines you wish to ignore. Set ALTER TABLE json_table SET SERDEPROPERTIES ( "ignore.malformed.json" = "true"); While this option will not make the query fail, a NULL record will be inserted instead.
Promoting a Scalar to an ArrayIt is a common issue to have a field that sometimes is a scalar and sometimes an array. { "field" : "hello", .. }
{ "field" : [ "hello", "world" ], ... Declare your table as Support for UNIONTYPEA JSON data does not store anything describing the type, so the SerDe will try and infer it. The order matters.
For example, if you define a field The following data will be parsed as { "f": "123" } The following data will parsed as a { "f": "asv" } It's worth noting that complex Mapping Hive KeywordsSometimes JSON data has attributes named like reserved words in hive. For instance, you may have a JSON attribute
named 'timestamp', and hive will fail when issuing a In the following example CREATE TABLE mytable (
myfield string, ts string
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "mapping.ts" = "timestamp" )
STORED AS TEXTFILE; Mapping Names with PeriodsHive doesn't support column names containing periods. In theory they should work when quoted in backtics, but
doesn't, as noted in SO#35344480.
To work around this issue set the property For example, create the following table. CREATE TABLE mytable (
my_field string,
other struct<with_dots:string> )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("dots.in.keys" = "true" ) Load the following JSON.
Query data substituting periods with underscores. SELECT my_field, other.with_dots from mytable
value, blah Case Sensitivity in mappingsSince hive is case insensitive, all JSON keys are by default lowercased, to accomodate situations where the same JSON key is in a different case. However, this may not be what you want, you may need to treat the same key with different case as two different ones. You'll then have to use mappings, since hive does not support case sensitive columns, and you'll also have to tell the SerDe not to be case insensitive (the default). CREATE TABLE mytable (
time1 string,
time2 string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
"case.insensitive" = "false", --tells hive to ignore key case
"mapping.time1"= "time", -- lowercase 'time' mapped into 'time1'
"mapping.time2"= "Time") -- uppercase to 'time2'
-- Data: { "time" : "2012-10-22:, "Time": "2012-11-22"}
SELECT time1,time2 from mytable Explicit Null Value In Serialized JSON StringIn order to be complaint with some object oriented systems an explicit 'null' json value is required in the serialized string. As default, Hive-JSON-Serde will not produce null values in the output serialized JSON string and just drop the key, if you do want to have explicit 'null' values in your output JSON string, use the following:
User Defined Functions (UDF)tjsonThe
The SerDe must also be in the classpath for the UDF to work. If not installed
as a hive extra library, you should also TimestampsNote that the system default timezone is used to convert timestamps. ContributingSee CONTRIBUTING for how to build the project. HistoryThis library is written by Roberto Congiu <[email protected]> during his time at OpenX Technologies, Inc.. See CHANGELOG for details. ThanksThanks to Douglas Crockford for the liberal license for his JSON library, and thanks to my employer OpenX and my boss Michael Lum for letting me open source the code. |
2023-10-27
2022-08-15
2022-08-17
2022-09-23
2022-08-13
请发表评论