JBlam.JsonSequence
1.0.1
dotnet add package JBlam.JsonSequence --version 1.0.1
NuGet\Install-Package JBlam.JsonSequence -Version 1.0.1
<PackageReference Include="JBlam.JsonSequence" Version="1.0.1" />
paket add JBlam.JsonSequence --version 1.0.1
#r "nuget: JBlam.JsonSequence, 1.0.1"
// Install JBlam.JsonSequence as a Cake Addin
#addin nuget:?package=JBlam.JsonSequence&version=1.0.1

// Install JBlam.JsonSequence as a Cake Tool
#tool nuget:?package=JBlam.JsonSequence&version=1.0.1
JBlam's JSON Sequence
Implements JSON Text Sequences (RFC7464) and NDJSON (Newline-Delimited JSON, spec) using .NET's System.Text.Json as the backend.
Format overview
At the time of writing, in the author's subjective opinion, neither RFC7464 nor any of the less formal "JSON with newlines" specs has emerged as a clear winner. Here's a very short overview of the two formats:
- JSON Text Sequences (or "the RFC") prefixes each record with the ASCII Record Separator `\u001E` and suffixes each record with LF (`\u000A`). The RFC suggests that the parser SHOULD continue parsing after any failure to parse an individual record.
- NDJSON has no record prefix, and suffixes each record with LF (`\u000A`). The spec suggests that the parser SHOULD raise an error if an individual record is unparseable (though the behaviour SHOULD be configurable).
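To make the framing concrete, here's a minimal sketch of both formats built by hand (using only System.Text.Json, not this library):

```csharp
using System;
using System.Linq;
using System.Text.Json;

var records = new[] { new { I = 1 }, new { I = 2 } };

// NDJSON: each record is followed by LF (U+000A)
string ndjson = string.Concat(
    records.Select(r => JsonSerializer.Serialize(r) + "\u000A"));

// RFC7464: each record is prefixed with RS (U+001E) and followed by LF
string rfc7464 = string.Concat(
    records.Select(r => "\u001E" + JsonSerializer.Serialize(r) + "\u000A"));

Console.WriteLine(ndjson.EndsWith("\n"));        // True
Console.WriteLine(rfc7464.StartsWith("\u001E")); // True
```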
API overview
Similar to `System.Text.Json.JsonSerializer`, the `JBlam.JsonSequence.JsonSequenceSerializer` provides static methods to directly serialise and deserialise `IEnumerable<T>` and `IAsyncEnumerable<T>`.
```csharp
record R(int I);

var options = new JsonSequenceSerializerOptions(
    // or NdJson, whichever makes you happy
    JsonSequenceSerializerDefaults.Rfc7464);

// serialise -- supports async API as well
IEnumerable<R> items = [ new(1), new(2), new(3) ];
MemoryStream utf8Json = new();
JsonSequenceSerializer.Serialize(
    utf8Json,
    items,
    // unlike with JsonSerializer, the options are not optional,
    // because I need to know what set of delimiters you want.
    options);

// deserialise
utf8Json.Position = 0; // let's read the same stream back
var roundtrip = JsonSequenceSerializer.Deserialize<R>(utf8Json, options);
```
Similar to `StreamWriter`, the `JBlam.JsonSequence.JsonSequenceWriter<T>` class allows you to serialise items one-by-one as you generate them:
```csharp
// JsonSequenceWriter supports synchronous IDisposable and
// serialisation API as well
await using var writer = JsonSequenceWriter.Create<R>(
    utf8Json,
    options,
    leaveStreamOpen: true);
await writer.SerializeAsync(new(1));
await writer.SerializeAsync(new(2));
// disposing the writer will dispose the stream by default, but we asked it not to.
```
The example code above is tested in `JBlam.JsonSequence.Tests.DocTest`.
Error tolerance
Both RFC7464 and NDJSON specify some kind of error tolerance for handling empty or truncated records. The behaviours are controlled by the properties of `JsonSequenceSerializerOptions`:

- `EmptyRecordPolicy` instructs the parser how to treat any record which contains zero bytes: the default for both RFC and NDJSON is to silently ignore empties; otherwise you can emit the default value (for example, `null` for reference types), or let the underlying JSON parser throw.
- `AllowWhitespaceInEmptyRecords` causes the parser to treat a record which consists only of whitespace as though it were empty (per `EmptyRecordPolicy`): default for NDJSON.
- `ErrorTolerancePolicy` allows deserialisation to continue unconditionally after any error (RFC default) or throw if the record is not syntactically-legal JSON (NDJSON default), unless the record was already skipped because it was empty.
- `TruncationDetectionPolicy` causes the parser to treat an otherwise-legal record as an error if it's possible that the record was truncated (default for RFC only); see the RFC for details.
If you want to detect errors explicitly, `JsonSequenceSerializer.DeserializeItems{Async}<T>` yields a "result object" `SequenceItem<T>`, which you can inspect to determine whether any of the above error conditions occurred, or otherwise retrieve the successfully-parsed item.
Control serialisation of elements
The above serialisation and deserialisation methods can all be called with a `JsonSerializerOptions` instance that is passed directly to `JsonSerializer` when handling individual records.

Alternatively, if you're using the System.Text.Json source generator, you can instead pass an instance of `JsonTypeInfo<T>` which is forwarded to `JsonSerializer`.

The library is properly annotated for trimming when using `JsonTypeInfo<T>`.
Transparently convert a stream to JSON
`JsonSequenceSerializer` provides an extension method `CopyAsJsonArrayTo` on `Stream`, so if you have a sequence file that you need converted to "normal JSON", you can do that. The individual records are not parsed or otherwise validated, so the resulting stream may contain invalid JSON if your source was corrupted.
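The transformation itself is purely mechanical; for NDJSON it amounts to something like this sketch (not the library's implementation, which works on raw stream bytes):

```csharp
using System;

// Convert NDJSON text to a JSON array by rewriting delimiters only;
// records are never parsed, so corrupt input yields corrupt output.
static string NdjsonToJsonArray(string ndjson)
{
    var records = ndjson.Split('\u000A', StringSplitOptions.RemoveEmptyEntries);
    return "[" + string.Join(",", records) + "]";
}

Console.WriteLine(NdjsonToJsonArray("{\"I\":1}\n{\"I\":2}\n")); // [{"I":1},{"I":2}]
```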
Perf
As a lukewarm take: if you're using JSON sequences, you are probably generating data so infrequently that perf is irrelevant. Nonetheless, here's some funky tables.
I ran the benchmarks on an old low-spec laptop because the CPU makes my lap nice and toasty. The reported numbers are for .NET 8; you can run the benchmarks on .NET 6, where on my machine they all run longer by a factor of between 1.5 and 2.
```
BenchmarkDotNet v0.13.10, Linux Mint 21.2 (Victoria)
Intel Core i3-6100U CPU 2.30GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
```
Reading
Method | ElementCount | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|---|
Array | 1000000 | 381.73 ms | 1.00 | 30.9 MB | 1.00 |
Enumerable | 1000000 | 933.94 ms | 2.45 | 22.89 MB | 0.74 |
Sequence | 1000000 | 823.21 ms | 2.16 | 22.89 MB | 0.74 |
Null | 1000000 | 40.63 ms | 0.11 | 1 MB | 0.03 |
TypeInfoArray | 1000000 | 390.83 ms | 1.02 | 30.89 MB | 1.00 |
TypeInfoEnumerable | 1000000 | 889.52 ms | 2.33 | 22.89 MB | 0.74 |
TypeInfoSequence | 1000000 | 817.68 ms | 2.14 | 22.89 MB | 0.74 |
All tests handle a sequence of 1 million elements which I generate using an implementation of `Stream` that is crafted such that I don't need to allocate a gigantic string.

- Null measures the overhead of reading through my custom Stream. The benchmark method allocates that 1MB as a buffer. I'm unconcerned about the overhead because it's less than 10% of the actual system under test.
- Array is just pointing `JsonSerializer` at the "complete" JSON array. It caches the entire output in memory before returning, though looking at the 8MB allocation difference between this method and the following one, it's not totally clear whether it's allocating 8 bytes per record (I think it should be 4 bytes) or whether I made the dynamic list sizing algorithm double the backing array at an inconvenient juncture.
- Enumerable is using `JsonSerializer`'s `DeserializeAsyncEnumerable` method, which streams the output instead of collecting it. I believe the 22MB allocation is the lower bound of "handling each item".
- Sequence is my implementation. The execution time difference between Sequence and Enumerable is statistically significant (BenchmarkDotNet reports standard deviations of ~4ms) but not especially exciting.
- TypeInfo{X} provides source-generated `JsonTypeInfo<T>` to `JsonSerializer`. The effect on performance is marginal, which I wasn't expecting.
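For reference, the streaming approach the Enumerable benchmark measures looks roughly like this with the built-in API (a sketch, independent of this library):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

using var stream = new MemoryStream(
    Encoding.UTF8.GetBytes("[{\"I\":1},{\"I\":2},{\"I\":3}]"));

var sum = 0;
// Streams the elements of a JSON array one at a time instead of
// buffering the whole collection in memory first.
await foreach (var item in JsonSerializer.DeserializeAsyncEnumerable<R>(stream))
{
    sum += item!.I;
}
Console.WriteLine(sum); // 6

record R(int I);
```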
Copying
Method | ElementCount | Mean | Allocated |
---|---|---|---|
DeserialiseCollectionThenReserialise | 1000000 | 962.13 ms | 66.2 MB |
DeserialiseByElementThenReserialise | 1000000 | 1,184.63 ms | 203.14 MB |
CopyToArray | 1000000 | 46.56 ms | 12.4 MB |
- DeserialiseCollectionThenReserialise uses this library to deserialise a sequence, collects it into a `List<T>`, then uses `System.Text.Json` to serialise the list to the output stream.
- DeserialiseByElementThenReserialise uses this library to deserialise a sequence; iterating over the sequence, each item is serialised one-by-one using `System.Text.Json` (the parent JSON array terminals `[` `]` and delimiters `,` are manually written).
- CopyToArray uses `JsonSequenceSerializer.CopyAsJsonArrayTo` to copy the source bytes, transforming only the sequence delimiters without parsing the records.
CopyToArray performs suspiciously well here; I suspect I may have a faulty benchmark. PRs welcome!
Not shown here: I ran a benchmark of `CopyAsJsonArrayTo` followed by `JsonSerializer.Deserialize<List<T>>` against the existing reading benchmarks above; it performed significantly worse in time and allocations.
FAQ
(The "A" stands for "answered".)
Which sequence format should I use?
I'm not an authority on this, but here's my recommendation:
- If you're going to interact with microcontrollers, use NDJSON. Reasoning: the RFC is somewhat harder to implement, and ArduinoJson already supports NDJSON.
- If you want to hand-edit files, use NDJSON. Reasoning: it's hard to type `\u001E`.
- Otherwise, use the RFC. Reasoning: the mandatory record delimiter `\u001E` is not legal in regular JSON, so no valid input JSON-like data will "accidentally" break the sequence.
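You can check that last point directly: System.Text.Json rejects a raw record separator in a document, so a stray `\u001E` can never be mistaken for document content:

```csharp
using System;
using System.Text.Json;

string result;
try
{
    // A raw RS (U+001E) is not JSON whitespace, and control characters
    // must be escaped inside strings, so this document is invalid.
    JsonDocument.Parse("\u001E{\"I\":1}");
    result = "parsed";
}
catch (JsonException)
{
    result = "rejected";
}
Console.WriteLine(result); // rejected
```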
Why is your license so stupid?
Because software licenses are stupid. If you want this released under a different license, contact me and explain why.
What's up with spelling in this project?
The major API naming follows the American spellings of `System.Text.Json` (for example, `Deserialize`), to avoid surprising developers who are likely familiar with those spellings. Everything else follows Australian English spelling because this project is written in Australian English.
Can you make this directly work with JSON arrays?
TL;DR: for your own good, no.
You could theoretically make a `JsonSequenceWriter<T>` correctly write items as they're produced. However, reading items is not feasible using the "result monad" `SequenceItem` approach this library takes, because the record delimiter is `,`, a character which is essential to any practically-useful JSON document.
Taking a step back: you'd want this if you need to interoperate between a "sequence JSON producer" and a "complete JSON consumer". But the implicit assumption of the "sequence producer" is that you can't know when the end of the sequence will happen until it happens, whereas JSON's design insists that you must know where the end of the document is. The two are fundamentally incompatible.
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net6.0 and net8.0 are compatible. Computed: net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows. |
Dependencies

- net6.0
  - System.Text.Json (>= 8.0.0)
- net8.0
  - System.Text.Json (>= 8.0.0)
Version | Downloads | Last updated |
---|---|---|
1.0.1 | 214 | 12/24/2023 |