• 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    迪恩网络公众号

neuecc/Utf8Json: Definitely Fastest and Zero Allocation JSON Serializer for C#(N ...

原作者: [db:作者] 来自: 网络 收藏 邀请

开源软件名称:

neuecc/Utf8Json

开源软件地址:

https://github.com/neuecc/Utf8Json

开源编程语言:

C# 99.9%

开源软件介绍:

THIS PROJECT IS ARCHIVED, USE COMMUNITY FORK INSTEAD.

Utf8Json - Fast JSON Serializer for C#

Join the chat at https://gitter.im/neuecc/Utf8Json Releases

Definitely Fastest and Zero Allocation JSON Serializer for C#(.NET, .NET Core, Unity and Xamarin), this serializer write/read directly to UTF8 binary so boostup performance. And I adopt the same architecture as the fastest binary serializer, MessagePack for C# that I've developed.

image

This benchmark is convert object to UTF8 and UTF8 to object benchmark. It is not to string(.NET UTF16), so Jil, NetJSON and Json.NET contains additional UTF8.GetBytes/UTF8.GetString call. Definitely means does not exists encoding/decoding cost. Benchmark code is in sandbox/PerfBenchmark by BenchmarkDotNet.

I've tested more benchmark - Benchmark of Jil vs Utf8Json for test many dataset patterns(borrwed from Jil Benchmark) and three input/output compare(Object <-> byte[](Utf8), Object <-> Stream(Utf8) and Object <-> String(UTF16)). If target is UTF8(both byte[] and Stream), Utf8Json wins and memory allocated is extremely small.

Utf8Json does not beat MessagePack for C#(binary), but shows a similar memory consumption(there is no additional memory allocation) and achieves higher performance than other JSON serializers.

The crucial difference is that read and write directly to UTF8 binaries means that there is no overhead. Normaly serialization requires serialize to Stream or byte[], it requires additional UTF8.GetBytes cost or StreamReader/Writer overhead(it is very slow!).

TargetClass obj1;

// Object to UTF8 byte[]
[Benchmark]
public byte[] Utf8JsonSerializer()
{
    return Utf8Json.JsonSerializer.Serialize(obj1, jsonresolver);
}

// Object to String to UTF8 byte[]
[Benchmark]
public byte[] Jil()
{
    return utf8.GetBytes(global::Jil.JSON.Serialize(obj1));
}

// Object to Stream with StreamWriter
[Benchmark]
public void JilTextWriter()
{
    using (var ms = new MemoryStream())
    using (var sw = new StreamWriter(ms, utf8))
    {
        global::Jil.JSON.Serialize(obj1, sw);
    }
}

For example, the OutputFormatter of ASP.NET Core needs to write to Body(Stream), but using Jil's TextWriter overload is slow. (This not means Jil is slow, for example StreamWriter allocate many memory(char[1024] and byte[3075]) on constructor (streamwriter.cs#L203-L204) and other slow features unfortunately).

// ASP.NET Core, OutputFormatter
public class JsonOutputFormatter : IOutputFormatter //, IApiResponseTypeMetadataProvider
{
    const string ContentType = "application/json";
    static readonly string[] SupportedContentTypes = new[] { ContentType };

    public Task WriteAsync(OutputFormatterWriteContext context)
    {
        context.HttpContext.Response.ContentType = ContentType;

        // Jil, normaly JSON Serializer requires serialize to Stream or byte[].
        using (var writer = new StreamWriter(context.HttpContext.Response.Body))
        {
            Jil.JSON.Serialize(context.Object, writer, _options);
            writer.Flush();
            return Task.CompletedTask;
        }

        // Utf8Json
        // Utf8Json.JsonSerializer.NonGeneric.Serialize(context.ObjectType, context.HttpContext.Response.Body, context.Object, resolver);
    }
}

The approach of directly write/read from JSON binary is similar to corefxlab/System.Text.Json and corefxlab/System.Text.Formatting. But it is not yet finished and not be general serializer.

Corefxlab has UTF8String and C# discussing UTF8String Constants but maybe it is far future.

Install and QuickStart

The library provides in NuGet except for Unity. Standard library availables for .NET Framework 4.5 and .NET Standard 2.0.

Install-Package Utf8Json

And official Extension Packages for support other library(ImmutableCollection) or binding for framework(ASP.NET Core MVC).

Install-Package Utf8Json.ImmutableCollection
Install-Package Utf8Json.UnityShims
Install-Package Utf8Json.AspNetCoreMvcFormatter

NuGet page links - Utf8Json, Utf8Json.ImmutableCollection, Utf8Json.UnityShims, Utf8Json.AspNetCoreMvcFormatter

for Unity, you can download from releases page. There providing .unitypackage. Unity support details, see Unity section.

You can find third-party extension package like Utf8Json.FSharpExtensions for F# types, NServiceBus.Utf8Json or others.

QuickStart, you can call Utf8Json.JsonSerializer.Serialize/Deserialize.

var p = new Person { Age = 99, Name = "foobar" };

// Object -> byte[] (UTF8)
byte[] result = JsonSerializer.Serialize(p);

// byte[] -> Object
var p2 = JsonSerializer.Deserialize<Person>(result);

// Object -> String
var json = JsonSerializer.ToJsonString(p2);

// Write to Stream
JsonSerializer.Serialize(stream, p2);

In default, you can serialize all public members. You can customize serialize to private, exclude null, change DateTime format(default is ISO8601), enum handling, etc. see the Resolver section.

Performance of Serialize

This image is what code is generated when object serializing.

image

// Disassemble generated serializer code.
public sealed class PersonFormatter : IJsonFormatter<Person>
{
    private readonly byte[][] stringByteKeys;
    
    public PersonFormatter()
    {
        // pre-encoded escaped string byte with "{", ":" and ",".
        this.stringByteKeys = new byte[][]
        {
            JsonWriter.GetEncodedPropertyNameWithBeginObject("Age"), // {\"Age\":
            JsonWriter.GetEncodedPropertyNameWithPrefixValueSeparator("Name") // ,\"Name\":
        };
    }

    public sealed Serialize(ref JsonWriter writer, Person person, IJsonFormatterResolver jsonFormatterResolver)
    {
        if (person == null) { writer.WriteNull(); return; }

        // WriteRawX is optimize byte->byte copy when we know src size.
        UnsafeMemory64.WriteRaw7(ref writer, this.stringByteKeys[0]);
        writer.WriteInt32(person.Age); // itoa write directly to avoid ToString + UTF8 encode
        UnsafeMemory64.WriteRaw8(ref writer, this.stringByteKeys[1]);
        writer.WriteString(person.Name);

        writer.WriteEndObject();
    }

    // public unsafe Person Deserialize(ref JsonReader reader, IJsonFormatterResolver jsonFormatterResolver)
}

Object to JSON's main serialization cost is write property name. Utf8Json create cache at first and after that only do memory copy. Optimize part1, concatenate "{", ":" and "." to cached propertyname. Optimize part2, use optimized custom memory copy method(see: UnsafeMemory.cs). Normally memory copy is used Buffer.BlockCopy but it has some overhead when target binary is small enough, releated to dotnet/coreclr - issue #9786 Optimize Buffer.MemoryCopy and dotnet/coreclr - Add a fast path for byte[] to Buffer.BlockCopy #3118. Utf8Json don't use Buffer.BlockCopy and generates length specialized copy code that can reduce branch cost.

Number conversion is often high cost. If target encoding is UTF8 only, we can use itoa algorithm so avoid int.ToString and UTF8 encode cost. Especialy double-conversion, Utf8Json ported google/double-conversion algorithm, it is fast dtoa and atod works.

Other optimize techniques.

  • High-level API uses internal memory pool, don't allocate working memory under 64K
  • Struct JsonWriter does not allocate any more and write underlying byte[] directly, don't use TextWriter
  • Avoid boxing all codes, all platforms(include Unity/IL2CPP)
  • Heavyly tuned dynamic IL code generation, it generates per option so reduce option check: see:DynamicObjectResolver.cs
  • Call Primitive API directly when IL code generation knows target is primitive
  • Getting cached generated formatter on static generic field(don't use dictionary-cache because lookup is overhead)
  • Don't use IEnumerable<T> abstraction on iterate collection, specialized each collection types, see:CollectionFormatter.cs
  • Uses optimized type key dictionary for non-generic methods, see: ThreadsafeTypeKeyHashTable.cs

Performance of Deserialize

When deserializing, requires property name to target member name matching. Utf8Json avoid string key decode for matching, generate automata based IL inlining code.

image

use raw byte[] slice and try to match each ulong type (per 8 character, if it is not enough, pad with 0).

// Disassemble generated serializer code.
public sealed class PersonFormatter : IJsonFormatter<Person>
{
    // public sealed Serialize(ref JsonWriter writer, Person person, IJsonFormatterResolver jsonFormatterResolver)

    public unsafe Person Deserialize(ref JsonReader reader, IJsonFormatterResolver jsonFormatterResolver)
    {
        if (reader.ReadIsNull()) return null;
        
        reader.ReadIsBeginObjectWithVerify(); // "{"
        
        byte[] bufferUnsafe = reader.GetBufferUnsafe();
        int age;
        string name;
        fixed (byte* ptr2 = &bufferUnsafe[0])
        {
            int num;
            while (!reader.ReadIsEndObjectWithSkipValueSeparator(ref num)) // "}" or ","
            {
                // don't decode string, get raw slice
                ArraySegment<byte> arraySegment = reader.ReadPropertyNameSegmentRaw();
                byte* ptr3 = ptr2 + arraySegment.Offset;
                int count = arraySegment.Count;
                if (count != 0)
                {
                    // match automata per long
                    ulong key = AutomataKeyGen.GetKey(ref ptr3, ref count);
                    if (count == 0)
                    {
                        if (key == 6645569uL)
                        {
                            age = reader.ReadInt32(); // atoi read directly to avoid GetString + int.Parse
                            continue;
                        }
                        if (key == 1701667150uL)
                        {
                            name = reader.ReadString();
                            continue;
                        }
                    }
                }
                reader.ReadNextBlock();
            }
        }

        return new PersonSample
        {
            Age = age,
            Name = name
        };
    }
}

Of course number conversion(decode to string -> try parse) is high cost. Utf8Json directly convert byte[] to number by atoi/atod algorithm.

Built-in support types

These types can serialize by default.

Primitives(int, string, etc...), Enum, Nullable<>, TimeSpan, DateTime, DateTimeOffset, Guid, Uri, Version, StringBuilder, BitArray, Type, ArraySegment<>, BigInteger, Complext, ExpandoObject , Task, Array[], Array[,], Array[,,], Array[,,,], KeyValuePair<,>, Tuple<,...>, ValueTuple<,...>, List<>, LinkedList<>, Queue<>, Stack<>, HashSet<>, ReadOnlyCollection<>, IList<>, ICollection<>, IEnumerable<>, Dictionary<,>, IDictionary<,>, SortedDictionary<,>, SortedList<,>, ILookup<,>, IGrouping<,>, ObservableCollection<>, ReadOnlyOnservableCollection<>, IReadOnlyList<>, IReadOnlyCollection<>, ISet<>, ConcurrentBag<>, ConcurrentQueue<>, ConcurrentStack<>, ReadOnlyDictionary<,>, IReadOnlyDictionary<,>, ConcurrentDictionary<,>, Lazy<>, Task<>, custom inherited ICollection<> or IDictionary<,> with paramterless constructor, IEnumerable, ICollection, IList, IDictionary and custom inherited ICollection or IDictionary with paramterless constructor(includes ArrayList and Hashtable), Exception and inherited exception types(serialize only) and your own class or struct(includes anonymous type).

Utf8Json has sufficient extensiblity. You can add custom type support and has some official/third-party extension package. for example ImmutableCollections(ImmutableList<>, etc), Utf8Json.FSharpExtensions(FSharpOption, FSharpList, etc...). Please see extensions section.

Object Serialization

Utf8Json can serialze your own public Class or Struct. In default, serializer search all public instance member(field or property) and uses there member name as json property name. If you want to avoid serialization target, you can use [IgnoreDataMember] attribute of System.Runtime.Serialization to target member. If you want to change property name, you can use [DataMember(Name = string)] attribute of System.Runtime.Serialization.

// JsonSerializer.Serialize(new FooBar { FooProperty = 99, BarProperty = "BAR" });
// Result : {"foo":99}
public class FooBar
{
    [DataMember(Name = "foo")]
    public int FooProperty { get; set; }

    [IgnoreDataMember]
    public string BarProperty { get; set; }
}

Utf8Json has other option, allows private/internal member serialization, convert property name to camelCalse/snake_case, if value is null does not create property. Or you can use a different DateTime format(default is ISO8601). The details, please read Resolver section. Here is sample.

// default serializer change to allow private/exclude null/snake_case serializer.
JsonSerializer.SetDefaultResolver(StandardResolver.AllowPrivateExcludeNullSnakeCase);

var json = JsonSerializer.ToJsonString(new Person { Age = 23, FirstName = null, LastName = "Foo" });

// {"age":23,"last_name":"Foo"}
Console.WriteLine(json);

Serialize ImmutableObject(SerializationConstructor)

Utf8Json can deserialize immutable object like this.

public struct CustomPoint
{
    public readonly int X;
    public readonly int Y;

    public CustomPoint(int x, int y)
    {
        this.X = x;
        this.Y = y;
    }
}

Utf8Json choose constructor with the most matched argument by name(ignore case).

MessagePack for C# choose least matched argument, please be aware of the opposite. This is design miss of MessagePack for C#.

If can not match automatically, you can specify to use constructor manually by [SerializationConstructorAttribute].

public class CustomPoint
{
    public readonly int X;
    public readonly int Y;

    public CustomPoint(int x, int y)
    {
        this.X = x;
        this.Y = y;
    }

    // used this constructor.
    [SerializationConstructor]
    public CustomPoint(int x)
    {
        this.X = x;
    }
}

ShouldSerializeXXX pattern

UtfJson supports ShouldSerialize feature of Json.NET. If defined public bool ShouldMemberName() method, call method before serialize member value and if false does not output member.

public class MyPerson
{
    public string Name { get; set; }
    public string[] Addresses { get; set; }

    // ShouldSerialize*membername**
    // method must be `public` and return `bool` and parameter less.
    public bool ShouldSerializeAddresses()
    {
        if (Addresses != null && Addresses.Length != 0)
        {
            return true;
        }
        else
        {
            return false;
        }
    }
}

--

// {"Name":"foo"}
JsonSerializer.ToJsonString(new MyPerson { Name = "foo", Addresses = new string[0] });
        
// {"Name":"bar","Addresses":["tokyo","kyoto"]}
JsonSerializer.ToJsonString(new MyPerson { Name = "bar", Addresses = new[] { "tokyo", "kyoto" } });

Dynamic Deserialization

If use JsonSerializer.Deserialize or JsonSerializer.Deserialize, convert json to bool, double, string, IDictionary<string, object>, List<object>.

// dynamic json deserialize
var json = JsonSerializer.Deserialize<dynamic>(@"{""foo"":""json"",""bar"":100,""nest"":{""foobar"":true}}");

var r1 = json["foo"]; // "json" - dynamic(string)
var r2 = json["bar"]; // 100 - dynamic(double), it can cast to int or other number.
var r3 = json["nest"]["foobar"]; // true

If target is object, you access by string indexer.

JSON Comments

JSON Comments is invalid JSON Format but used widely(for example, VSCode - settings.json) and also supports JSON.NET. Utf8Json suports both single-line comment and multi-line comment.

{
    // allow single line comment
    "foo": true, // trailing
    /*
      allow
      multi
      line
      comment
    */
    "bar": 999 /* trailing */
}

Encoding

Utf8Json only supports UTF-8 but it is valid on latest JSON Spec - RFC8259 The JavaScript Object Notation (JSON) Data Interchange Format, DECEMBER 2017.

It mentions about encoding.

8.1. Character Encoding JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

Which serializer should be used

The performance of binary(protobuf, msgpack, avro, etc...) vs text(json, xml, yaml, etc...) depends on the implementation. However, binary has advantage basically. Utf8Json write directly to byte[] it is close to the binary serializer. But especialy double is still slower than binary write(Utf8Json uses google/double-conversion algorithm, it is good but there are many processes, it can not be the fastest), write string requires escape and large payload must pay copy cost.

I recommend use MessagePack for C# for general use serializer, C# to C#, C# to NoSQL, Save to File, communicate internal cross platform(multi-language), etc. MessagePack for C# has many options(Union ,Typeless, Compression) and definitely fastest.

But JSON is still better on web, for public Web API, send for JavaScript and easy to integrate between multi-language communication. For example, use Utf8Json for Web API formatter and use MessagePack for C# for Redis. It is perfect.

For that reason Utf8Json is focusing performance and cross-platform compatibility. I don't implement original format(like Union, Typeless, Cyclic-Reference) if you want to use it should be use binary serializer. But customizability for serialize/deserialize JSON is important for cross-platform communication. IJsonFormatterResolver can serialize/deserialize all patterns and you can create own pattern.

High-Level API(JsonSerializer)

JsonSerializer is the entry point of Utf8Json. Its static methods are main API of Utf8Json.

API Description
DefaultResolver FormatterResolver that used resolver less overloads. If does not set it, used StandardResolver.Default.
SetDefaultResolver Set default resolver of JsonSerializer APIs.
Serialize<T> Convert object to byte[] or write to stream. There has IJsonFormatterResolver overload, used specified resolver.
SerializeUnsafe<T> Same as Serialize<T> but return ArraySegement<byte>. The result of ArraySegment is contains internal buffer pool, it can not share across thread and can not hold, so use quickly.
SerializeAsync<T> Convert object to byte[] and write to stream async.
ToJsonString<T> Convert object to string.
Deserialize<T> Convert byte[] or ArraySegment<byte> or stream to object. There has IFormatterResolver overload, used specified resolver.
DeserializeAsync<T> Convert stream(read async to buffer byte[]) to object.
PrettyPrint Output indented json string.
PrettyPrintByteArray Output indented json string(UTF8 byte[]).
NonGeneric.* NonGeneric APIs of Serialize/Deserialize. There accept type parameter at first argument. This API is bit slower than generic API but useful for framework integration such as ASP.NET formatter.

Utf8Json operates at the byte[] level, so Deserialize<T>(Stream) read to end at first, it is not truly streaming deserialize.

High-Level API uses memory pool internaly to avoid unnecessary memory allocation. If result size is under 64K, allocates GC memory only for the return bytes.

Low-Level API(IJsonFormatter)

IJsonFormatter is serializer by each type. For example Int32Formatter : IJsonFormatter<Int32> represents Int32 JSON serializer.

public interface IJsonFormatter<T> : IJsonFormatter
{
    void Serialize(ref JsonWriter writer, T value, IJsonFormatterResolver formatterResolver);
    T Deserialize(ref JsonReader reader, IJsonFormatterResolver formatterResolver);
}

Many builtin formatters exists under Utf8Json.Formatters. You can get sub type serializer by formatterResolver.GetFormatter<T>. Here is sample of write own formatter.


鲜花

握手

雷人

路过

鸡蛋
该文章已有0人参与评论

请发表评论

全部评论

专题导读
热门推荐
阅读排行榜

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap