JBlam.JsonSequence 1.0.1

Install with whichever tool you use:

  • .NET CLI: dotnet add package JBlam.JsonSequence --version 1.0.1
  • Package Manager Console in Visual Studio (this uses the NuGet module's version of Install-Package): NuGet\Install-Package JBlam.JsonSequence -Version 1.0.1
  • PackageReference, for projects that support it (copy this XML node into the project file): <PackageReference Include="JBlam.JsonSequence" Version="1.0.1" />
  • Paket CLI: paket add JBlam.JsonSequence --version 1.0.1
  • #r directive, for F# Interactive and Polyglot Notebooks (copy this into the interactive tool or the source code of the script): #r "nuget: JBlam.JsonSequence, 1.0.1"
  • Cake Addin: #addin nuget:?package=JBlam.JsonSequence&version=1.0.1
  • Cake Tool: #tool nuget:?package=JBlam.JsonSequence&version=1.0.1

JBlam's JSON Sequence

Implements JSON Text Sequences (RFC7464) and NDJSON (Newline-Delimited JSON, spec) using .NET's System.Text.Json as the backend.

Format overview

At the time of writing, in the subjective opinion of the author, neither RFC 7464 nor any of the less formal "JSON with newlines" specs has emerged as a clear winner. Here's a very short overview of the two formats:

  • JSON Text Sequences (or "the RFC") prefixes each record with the ASCII Record Separator \u001E and suffixes each record with LF (\u000A). The RFC suggests that the parser SHOULD continue parsing after any failures parsing an individual record.
  • NDJSON has no record prefix, and suffixes each record with LF (\u000A). The spec suggests that the parser SHOULD raise an error if an individual record is unparseable (though the behaviour SHOULD be configurable).
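
To make the two framings concrete, here is a minimal sketch that writes the same records in both formats using only System.Text.Json. The Frame and Show helpers are hypothetical names made up for this illustration; they are not part of JBlam.JsonSequence and say nothing about how the library is implemented.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

const byte RecordSeparator = 0x1E; // ASCII RS: the RFC 7464 record prefix
const byte LineFeed = 0x0A;        // LF: the record suffix in both formats

// Hypothetical helper, not part of JBlam.JsonSequence:
// writes each value as one delimited record.
byte[] Frame(int[] values, bool rfc7464)
{
    using var stream = new MemoryStream();
    foreach (var value in values)
    {
        if (rfc7464) stream.WriteByte(RecordSeparator);
        stream.Write(JsonSerializer.SerializeToUtf8Bytes(new { I = value }));
        stream.WriteByte(LineFeed);
    }
    return stream.ToArray();
}

// Hypothetical helper: makes the delimiters visible before printing.
string Show(byte[] bytes) => Encoding.UTF8.GetString(bytes)
    .Replace("\u001E", "<RS>")
    .Replace("\n", "<LF>");

Console.WriteLine(Show(Frame(new[] { 1, 2 }, rfc7464: true)));  // <RS>{"I":1}<LF><RS>{"I":2}<LF>
Console.WriteLine(Show(Frame(new[] { 1, 2 }, rfc7464: false))); // {"I":1}<LF>{"I":2}<LF>
```

The only difference on the wire is the leading record separator byte; the JSON payload of each record is identical.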

API overview

Similar to System.Text.Json.JsonSerializer, the JBlam.JsonSequence.JsonSequenceSerializer provides static methods to directly serialise and deserialise IEnumerable<T> and IAsyncEnumerable<T>.

record R(int I);
var options = new JsonSequenceSerializerOptions(
    // or NdJson, whichever makes you happy
    JsonSequenceSerializerDefaults.Rfc7464);

// serialise -- supports async API as well
IEnumerable<R> items = [ new(1), new(2), new(3) ];
MemoryStream utf8Json = new();
JsonSequenceSerializer.Serialize(
    utf8Json,
    items,
    // unlike with JsonSerializer, the options are not optional,
    // because I need to know what set of delimiters you want.
    options);

// deserialise
utf8Json.Position = 0; // let's read the same stream back
var roundtrip = JsonSequenceSerializer.Deserialize<R>(utf8Json, options);

Similar to StreamWriter, the JBlam.JsonSequence.JsonSequenceWriter<T> class allows you to serialise items one-by-one as you generate them:

// JsonSequenceWriter supports synchronous IDisposable and 
// serialisation API as well
await using var writer = JsonSequenceWriter.Create<R>(
    utf8Json,
    options,
    leaveStreamOpen: true);
await writer.SerializeAsync(new(1));
await writer.SerializeAsync(new(2));
// disposing the writer will dispose the stream by default, but we asked it not to.

The example code above is tested in JBlam.JsonSequence.Tests.DocTest.

Error tolerance

Both RFC7464 and NDJSON specify some kind of error tolerance for handling empty or truncated records. The behaviours are controlled by the properties of JsonSequenceSerializerOptions:

  • EmptyRecordPolicy instructs the parser how to treat any record which contains zero bytes: the default for both RFC and NDJSON is to silently ignore empties; otherwise you can emit the default value (for example, null for reference types), or let the underlying JSON parser throw.
  • AllowWhitespaceInEmptyRecords causes the parser to treat a record which consists only of whitespace as though it were empty (per EmptyRecordPolicy): default for NDJSON.
  • ErrorTolerancePolicy allows deserialisation to continue unconditionally after any error (RFC default) or throw if the record is not syntactically-legal JSON (NDJSON default), unless the record was already skipped because it was empty.
  • TruncationDetectionPolicy causes the parser to treat an otherwise-legal record as an error if it's possible that the record was truncated (default for RFC only); see the RFC for details.

If you want to detect errors explicitly, JsonSequenceSerializer.DeserializeItems{Async}<T> yields a "result object" SequenceItem<T>, which you can inspect to determine if any of the above error conditions occurred, or otherwise retrieve the successfully-parsed item.
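
The idea of skipping empty records and reporting errors per record, rather than aborting the whole read, can be sketched without the library. The following is a hand-rolled illustration over newline-delimited integers; ReadRecords and its tuple result are hypothetical stand-ins and are not the library's SequenceItem<T> or its parser.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-in for a per-record result; NOT the library's SequenceItem<T>.
// Each tuple carries either a parsed value or the error message for that record.
IEnumerable<(bool Ok, int Value, string? Error)> ReadRecords(
    string text, bool allowWhitespaceInEmptyRecords)
{
    foreach (var record in text.Split('\n'))
    {
        var isEmpty = allowWhitespaceInEmptyRecords
            ? string.IsNullOrWhiteSpace(record)
            : record.Length == 0;
        if (isEmpty) continue; // silently ignore empties

        (bool Ok, int Value, string? Error) item;
        try
        {
            item = (true, JsonSerializer.Deserialize<int>(record), null);
        }
        catch (JsonException e)
        {
            // Keep going: the failure is reported for this record only.
            item = (false, 0, e.Message);
        }
        yield return item;
    }
}

var input = "1\n\n   \noops\n2\n";
foreach (var item in ReadRecords(input, allowWhitespaceInEmptyRecords: true))
{
    Console.WriteLine(item.Ok ? $"value {item.Value}" : "error");
}
```

With whitespace-only records treated as empty, the loop prints a value, an error, then a value. With allowWhitespaceInEmptyRecords set to false, the whitespace-only record reaches the JSON parser and is reported as a second error.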

Control serialisation of elements

The above serialisation and deserialisation methods can all be called with a JsonSerializerOptions instance that is passed directly to JsonSerializer when handling individual records.

Alternatively, if you're using the System.Text.Json source generator, you can instead pass an instance of JsonTypeInfo<T> which is forwarded to JsonSerializer.

The library is properly annotated for trimming when using JsonTypeInfo<T>.

Transparently convert a stream to JSON

JsonSequenceSerializer provides an extension method CopyAsJsonArrayTo on Stream, so if you have a sequence file that you need converted to "normal JSON", you can do that. The individual records are not parsed or otherwise validated, so the resulting stream may contain invalid JSON if your source was corrupted.
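
As a rough illustration of what "transforming only the delimiters" can mean for the RFC format, here is a minimal sketch. CopyRfc7464AsJsonArray is a hypothetical helper written for this example; the library's CopyAsJsonArrayTo may work quite differently.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper; not the library's CopyAsJsonArrayTo.
// The first record separator becomes '[', later ones become ','.
// Every other byte is copied verbatim, so records are never parsed.
void CopyRfc7464AsJsonArray(Stream source, Stream destination)
{
    var seenRecord = false;
    int b;
    while ((b = source.ReadByte()) != -1)
    {
        if (b == 0x1E)
        {
            destination.WriteByte(seenRecord ? (byte)',' : (byte)'[');
            seenRecord = true;
        }
        else if (seenRecord)
        {
            // LF is legal whitespace inside a JSON array, so it can stay.
            destination.WriteByte((byte)b);
        }
    }
    if (!seenRecord) destination.WriteByte((byte)'[');
    destination.WriteByte((byte)']');
}

var source = new MemoryStream(Encoding.UTF8.GetBytes("\u001E{\"I\":1}\n\u001E{\"I\":2}\n"));
var destination = new MemoryStream();
CopyRfc7464AsJsonArray(source, destination);

var json = Encoding.UTF8.GetString(destination.ToArray());
using var document = JsonDocument.Parse(json);
Console.WriteLine(document.RootElement.GetArrayLength()); // 2
```

Because nothing is validated, an empty or truncated record in the source would pass straight through and produce an invalid array, which matches the caveat above.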

Perf

As a lukewarm take: if you're using JSON sequences, you are probably generating data so infrequently that perf is irrelevant. Nonetheless, here's some funky tables.

I ran the benchmarks on an old low-spec laptop because the CPU makes my lap nice and toasty. The reported numbers are for .NET 8; you can also run the benchmarks on .NET 6, where on my machine they all take between 1.5 and 2 times as long.


BenchmarkDotNet v0.13.10, Linux Mint 21.2 (Victoria)
Intel Core i3-6100U CPU 2.30GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2


Reading

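| Method             | ElementCount | Mean      | Ratio | Allocated | Alloc Ratio |
|--------------------|--------------|-----------|-------|-----------|-------------|
| Array              | 1000000      | 381.73 ms | 1.00  | 30.9 MB   | 1.00        |
| Enumerable         | 1000000      | 933.94 ms | 2.45  | 22.89 MB  | 0.74        |
| Sequence           | 1000000      | 823.21 ms | 2.16  | 22.89 MB  | 0.74        |
| Null               | 1000000      | 40.63 ms  | 0.11  | 1 MB      | 0.03        |
| TypeInfoArray      | 1000000      | 390.83 ms | 1.02  | 30.89 MB  | 1.00        |
| TypeInfoEnumerable | 1000000      | 889.52 ms | 2.33  | 22.89 MB  | 0.74        |
| TypeInfoSequence   | 1000000      | 817.68 ms | 2.14  | 22.89 MB  | 0.74        |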

All tests handle a sequence of 1 million elements which I generate using an implementation of Stream that is crafted such that I don't need to allocate a gigantic string.

  • Null measures the overhead of reading through my custom Stream. The benchmark method allocates that 1MB as a buffer. I'm unconcerned about the overhead because it's less than 10% of the actual system under test.
  • Array is just pointing JsonSerializer at the "complete" JSON array. It caches the entire output in memory before returning, though looking at the 8MB allocation difference between this method and the following one, it's not totally clear whether it's allocating 8 bytes per record (I think it should be 4 bytes) or whether I made the dynamic list sizing algorithm double the backing array at an inconvenient juncture.
  • Enumerable uses JsonSerializer's DeserializeAsyncEnumerable method, which streams the output instead of collecting it. I believe the 22MB allocation is the lower bound of "handling each item".
  • Sequence is my implementation. The execution time difference between Sequence and Enumerable is statistically significant (BenchmarkDotNet reports standard deviations of ~4ms) but not especially exciting.
  • TypeInfo{X} provides source-generated JsonTypeInfo<T> to JsonSerializer. The effect on performance is marginal, which I wasn't expecting.

Copying

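| Method                               | ElementCount | Mean        | Allocated |
|--------------------------------------|--------------|-------------|-----------|
| DeserialiseCollectionThenReserialise | 1000000      | 962.13 ms   | 66.2 MB   |
| DeserialiseByElementThenReserialise  | 1000000      | 1,184.63 ms | 203.14 MB |
| CopyToArray                          | 1000000      | 46.56 ms    | 12.4 MB   |
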
  • DeserialiseCollectionThenReserialise uses this library to deserialise a sequence, collects it into a List<T>, then uses System.Text.Json to serialise the list to the output stream.
  • DeserialiseByElementThenReserialise uses this library to deserialise a sequence; while iterating over the sequence, each item is serialised one-by-one using System.Text.Json (the parent JSON array terminals [ ] and delimiters , are manually written).
  • CopyToArray uses JsonSequenceSerializer.CopyAsJsonArrayTo to copy the source bytes, transforming only the sequence delimiters without parsing the records.

CopyToArray performs suspiciously well here; I suspect I may have a faulty benchmark. PRs welcome!

Not shown here: I ran a benchmark of CopyAsJsonArrayTo followed by JsonSerializer.Deserialize<List<T>> against the existing reading benchmarks above; it performed significantly worse in time and allocations.

FAQ

(The "A" stands for "answered".)

Which sequence format should I use?

I'm not an authority on this, but here's my recommendation:

  1. If you're going to interact with microcontrollers, use NDJSON.

    Reasoning: the RFC is somewhat harder to implement. ArduinoJson already supports NDJSON.

  2. If you want to hand-edit files, use NDJSON.

    Reasoning: it's hard to type \u001E.

  3. Otherwise, use the RFC.

    Reasoning: the mandatory record delimiter \u001E is not legal in regular JSON, so no valid input JSON-like data will "accidentally" break the sequence.
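
A quick way to see the difference between the two delimiters, using only System.Text.Json: a serialiser must escape both characters inside strings, but LF can still appear between tokens as whitespace (for example in indented output), whereas the record separator never appears raw in valid JSON. This is a sketch of the reasoning, not a test of this library.

```csharp
using System;
using System.Text.Json;

// Both control characters are escaped when they occur inside a string value.
var compact = JsonSerializer.Serialize(new { Text = "a\u001Eb\nc" });

// Indented output puts raw line breaks between tokens.
var indented = JsonSerializer.Serialize(
    new { Text = "x" },
    new JsonSerializerOptions { WriteIndented = true });

Console.WriteLine(compact.Contains('\u001E')); // False
Console.WriteLine(compact.Contains('\n'));     // False
Console.WriteLine(indented.Contains('\n'));    // True
```

So a producer that emits indented JSON would split one NDJSON record across several lines, while the RFC framing is unaffected by it.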

Why is your license so stupid?

Because software licenses are stupid. If you want this released under a different license, contact me and explain why.

What's up with spelling in this project?

The major API naming follows the American spellings of System.Text.Json (for example, Deserialize), to avoid surprising developers who are likely familiar with those spellings. Everything else follows Australian English spelling because this project is written in Australian English.

Can you make this directly work with JSON arrays?

TL;DR: for your own good, no.

You could theoretically make a JsonSequenceWriter<T> correctly write items as they're produced. However, reading items is not feasible using the "result monad" SequenceItem approach this library takes, because the record delimiter is the comma (,), a character which is essential to any practically-useful JSON document.
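
A small sketch of why the comma makes a poor record delimiter: splitting a JSON array on it cuts through the records themselves, so there is no cheap way to isolate one damaged record from its neighbours. The naive split below is purely illustrative and is not something this library does.

```csharp
using System;
using System.Text.Json;

// Two records, but the first one contains a comma of its own.
var array = "[{\"a\":1,\"b\":2},{\"a\":3}]";
var naive = array.Trim('[', ']').Split(',');

var valid = 0;
foreach (var piece in naive)
{
    try
    {
        using var doc = JsonDocument.Parse(piece);
        valid++;
    }
    catch (JsonException)
    {
        // The piece is a fragment of a record, not a record.
    }
}

Console.WriteLine($"{naive.Length} pieces, {valid} valid"); // 3 pieces, 1 valid
```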

Taking a step back: you'd want this if you need to interoperate a "sequence JSON producer" and a "complete JSON consumer". But the implicit assumption of the "sequence producer" is that you can't know when the end of the sequence will happen until it happens, whereas JSON's design insists that you must know where the end of the document is. The two are fundamentally incompatible.

Compatible and additional computed target framework versions

  • Compatible: net6.0, net8.0.
  • Computed: net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.


Version Downloads Last updated
1.0.1 214 12/24/2023