whfmt.FileFormatCatalog
675+ embedded file format and language definitions for automatic format detection and syntax highlighting.
Cross-platform net8.0 — works in any .NET 8 application.
dotnet add package whfmt.FileFormatCatalog
Full documentation: whfmt-FileFormatCatalog-guide.md — API reference, architecture, integration guides (Level 1–3), and .whfmt format specification.
About
This catalog grew out of the format detection engine inside WpfHexEditorIDE — a full-featured binary/text IDE built on WPF. Every time a file is opened, the IDE needs to know what it is, which editor to route it to, and how to syntax-highlight it. Rather than hardcoding rules, we built a declarative .whfmt format — a JSON definition file that captures magic bytes, extensions, MIME types, entropy hints, quality scores, and syntax grammars in one place.
Over time the catalog grew to 675+ definitions covering everything from Nintendo ROMs and audio codecs to machine learning models and certificate formats. The syntax grammar side expanded to 35 languages to drive the built-in code editor.
This package extracts that catalog as a standalone, cross-platform library — useful for any application that needs to identify files, route them to the right handler, or provide syntax highlighting without taking a dependency on a full IDE framework.
Quick Start
1 — Add the using directives
using WpfHexEditor.Core.Definitions;
using WpfHexEditor.Core.Contracts;
var catalog = EmbeddedFormatCatalog.Instance;
2 — Detect a format by extension
EmbeddedFormatEntry? entry = catalog.GetByExtension(".zip");
Console.WriteLine(entry?.Name); // "ZIP Archive"
Console.WriteLine(entry?.PreferredEditor); // "structure-editor"
3 — Detect by magic bytes
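// Pass at least the first 16 bytes; 512 bytes are recommended.
// Math.Min guards against files shorter than 512 bytes, where [..512] would throw.
byte[] bytes = File.ReadAllBytes("unknown.bin");
byte[] header = bytes[..Math.Min(512, bytes.Length)];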
EmbeddedFormatEntry? detected = catalog.DetectFromBytes(header);
Console.WriteLine(detected?.Name); // e.g. "PNG Image"
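For large files it can be wasteful to load everything just to inspect the header. The following is a minimal sketch of reading only the leading bytes, assuming a hypothetical helper named `HeaderReader.ReadHeader` (it is not part of the package). It relies only on `Stream.ReadAtLeast` from the .NET base class library, and the returned array can be passed to `DetectFromBytes`.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of whfmt.FileFormatCatalog.
public static class HeaderReader
{
    // Reads up to maxBytes from the start of the file.
    // Returns fewer bytes when the file is shorter than maxBytes.
    public static byte[] ReadHeader(string path, int maxBytes = 512)
    {
        ArgumentOutOfRangeException.ThrowIfNegative(maxBytes);

        using FileStream fs = File.OpenRead(path);
        byte[] buffer = new byte[maxBytes];

        // ReadAtLeast keeps reading until the buffer is full or the stream ends;
        // throwOnEndOfStream: false makes a short file return a short count
        // instead of throwing.
        int read = fs.ReadAtLeast(buffer, maxBytes, throwOnEndOfStream: false);

        return buffer.AsSpan(0, read).ToArray();
    }
}
```

A call such as `catalog.DetectFromBytes(HeaderReader.ReadHeader("unknown.bin"))` would then work for files of any length.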
4 — Detect by MIME type
EmbeddedFormatEntry? entry = catalog.GetByMimeType("image/png");
5 — Browse a category
// Enum overload — IntelliSense, no typos
IReadOnlyList<EmbeddedFormatEntry> games = catalog.GetByCategory(FormatCategory.Game);
// String overload — for dynamic/runtime scenarios
IReadOnlyList<EmbeddedFormatEntry> same = catalog.GetByCategory("Game");
6 — Extract a syntax grammar for a code editor
EmbeddedFormatEntry? cs = catalog.GetByExtension(".cs");
if (cs?.HasSyntaxDefinition == true)
{
    string? grammar = catalog.GetSyntaxDefinitionJson(cs.ResourceKey);
    // Feed grammar into your tokenizer / syntax highlighter
}
7 — Access the full JSON or schema
// Full .whfmt JSON for any entry (cached)
string json = catalog.GetJson(entry.ResourceKey);
// Embedded JSON schema — enum overload (recommended)
string? schema = catalog.GetSchemaJson(SchemaName.Whfmt);
// String overload
string? same = catalog.GetSchemaJson("whfmt");
8 — Route to the right editor
IReadOnlyList<string> editors = catalog.GetCompatibleEditorIds("report.pdf");
// ["hex-editor", "structure-editor"]
Fast Startup — PreWarm
// Call once from a background thread at startup to pre-load all JSON into cache
await Task.Run(() => EmbeddedFormatCatalog.Instance.PreWarm());
Advanced Examples
Batch folder scanner — group files by detected category
var catalog = EmbeddedFormatCatalog.Instance;
var byCategory = Directory
    .EnumerateFiles(@"C:\Downloads", "*.*", SearchOption.AllDirectories)
    .Select(path =>
    {
        // Try extension first (fast), fall back to magic bytes (accurate)
        var entry = catalog.GetByExtension(Path.GetExtension(path));
        if (entry is null)
        {
            using var fs = File.OpenRead(path);
            var header = new byte[512];
            int read = fs.Read(header, 0, header.Length);
            entry = catalog.DetectFromBytes(header.AsSpan(0, read));
        }
        return (Path: path, Category: entry?.Category ?? "Unknown", Entry: entry);
    })
    .GroupBy(f => f.Category)
    .OrderByDescending(g => g.Count());
foreach (var group in byCategory)
    Console.WriteLine($"{group.Key}: {group.Count()} file(s)");
// Pull only the game ROMs using the enum
var roms = catalog.GetByCategory(FormatCategory.Game);
Console.WriteLine($"Known game formats: {roms.Count}");
Magic-byte validator — detect extension spoofing
var catalog = EmbeddedFormatCatalog.Instance;
bool IsExtensionSpoofed(string filePath)
{
    var byExtension = catalog.GetByExtension(Path.GetExtension(filePath));
    if (byExtension is null) return false; // unknown format, skip
    using var fs = File.OpenRead(filePath);
    var header = new byte[512];
    int read = fs.Read(header, 0, header.Length);
    var byBytes = catalog.DetectFromBytes(header.AsSpan(0, read));
    // Spoofed if bytes point to a different known format
    return byBytes is not null && byBytes.ResourceKey != byExtension.ResourceKey;
}
// Usage
if (IsExtensionSpoofed(@"C:\uploads\invoice.pdf"))
    Console.WriteLine("Warning: file content does not match its extension.");
Grammar loader — wire syntax highlighting into a custom editor
var catalog = EmbeddedFormatCatalog.Instance;
// Load grammars only for the Programming category (enum — no typo risk)
var languages = catalog.GetByCategory(FormatCategory.Programming)
    .Where(e => e.HasSyntaxDefinition)
    .OrderBy(e => e.Name);
foreach (var lang in languages)
{
    string? grammarJson = catalog.GetSyntaxDefinitionJson(lang.ResourceKey);
    if (grammarJson is null) continue;
    // Deserialize into your tokenizer model and register
    // MyTokenizerRegistry.Register(lang.Name, grammarJson);
    Console.WriteLine($"Loaded grammar: {lang.Name} ({lang.Extensions.FirstOrDefault()})");
}
// Output: Loaded grammar: C# (.cs), Loaded grammar: Python (.py), ...
// Validate your own .whfmt file against the embedded schema
string? whfmtSchema = catalog.GetSchemaJson(SchemaName.Whfmt);
// Pass whfmtSchema to your JSON schema validator (e.g. JsonSchema.Net)
MIME negotiation — extension ↔ MIME bidirectional mapping
var catalog = EmbeddedFormatCatalog.Instance;
// Extension → MIME (e.g. for HTTP Content-Type)
string? GetMimeType(string extension)
    => catalog.GetByExtension(extension)?.MimeTypes?.FirstOrDefault();
// MIME → canonical extension (e.g. for file download naming)
string? GetExtension(string mimeType)
    => catalog.GetByMimeType(mimeType)?.Extensions.FirstOrDefault();
// Examples
Console.WriteLine(GetMimeType(".png")); // "image/png"
Console.WriteLine(GetMimeType(".zip")); // "application/zip"
Console.WriteLine(GetExtension("image/png")); // ".png"
Console.WriteLine(GetExtension("audio/mpeg")); // ".mp3"
Features
Format Detection
- 675+ embedded `.whfmt` definitions: extension, MIME type, and magic-byte lookup
- `DetectFromBytes(ReadOnlySpan<byte>)`: zero-alloc magic-byte scoring across all signatures
- `GetByExtension`, `GetByMimeType`, `GetByCategory`: multiple lookup strategies
- `GetCompatibleEditorIds`: returns all compatible editor IDs for a given file path
- 27 categories: Archives, Audio, Images, Game, Documents, Video, System, 3D, and more
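To illustrate what magic-byte scoring means in general, here is a minimal sketch under stated assumptions. It is not the package's implementation: the `MagicSignature` record, the `MagicMatcher` class, and the "longest matching pattern wins" rule are all hypothetical, chosen only to show the idea of comparing a header against several signatures without allocating.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical signature model: a byte pattern expected at a fixed offset.
public sealed record MagicSignature(string FormatName, int Offset, byte[] Pattern);

public static class MagicMatcher
{
    // Returns the format whose matching pattern is longest, or null when
    // no signature matches. Works on spans, so no header copy is allocated.
    public static string? Detect(ReadOnlySpan<byte> header, IReadOnlyList<MagicSignature> signatures)
    {
        string? best = null;
        int bestScore = 0;

        foreach (MagicSignature sig in signatures)
        {
            // Skip signatures that would read past the end of the header.
            if (sig.Offset < 0 || sig.Offset + sig.Pattern.Length > header.Length)
                continue;

            ReadOnlySpan<byte> window = header.Slice(sig.Offset, sig.Pattern.Length);
            if (!window.SequenceEqual(sig.Pattern.AsSpan()))
                continue;

            // Assumed scoring rule: a longer pattern is a more specific match.
            if (sig.Pattern.Length > bestScore)
            {
                bestScore = sig.Pattern.Length;
                best = sig.FormatName;
            }
        }

        return best;
    }
}
```

With signatures for ZIP (`50 4B 03 04`) and PNG (`89 50 4E 47 0D 0A 1A 0A`) registered, a header starting with the PNG bytes resolves to the PNG entry, and a header that matches nothing yields `null`.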
Syntax Highlighting
- 35 language grammars with `syntaxDefinition` blocks (C#, Python, JS/TS, Go, Rust, Java, Kotlin, Swift, Dart, PHP, Ruby, Lua, SQL, YAML, TOML, Markdown, and more)
- `GetSyntaxDefinitionJson(resourceKey)`: raw grammar JSON ready for a tokenizer
- `HasSyntaxDefinition` flag for fast filtering
Enum API
- `FormatCategory` enum: all 27 categories with IntelliSense, no string typos
- `SchemaName` enum: `Whfmt`, `Whcd`, `Whdbg`, `Whidews`, `Whscd`
- All enum overloads delegate to string overloads, so both variants are always available
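The delegation pattern described above can be sketched as follows. This is an illustration only: `DemoCategory`, `DemoCatalog`, and the sample format names are hypothetical stand-ins, and the real catalog may convert enum values differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a category enum.
public enum DemoCategory { Archives, Audio, Game }

// Hypothetical stand-in for a catalog with string and enum overloads.
public sealed class DemoCatalog
{
    // Demo data only; keys are compared case-insensitively.
    private readonly Dictionary<string, List<string>> _byCategory =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["Archives"] = new() { "ZIP Archive" },
            ["Game"] = new() { "NES ROM", "Game Boy ROM" },
        };

    // String overload: the single place that holds the lookup logic.
    public IReadOnlyList<string> GetByCategory(string category)
    {
        if (_byCategory.TryGetValue(category, out List<string>? names))
            return names;
        return Array.Empty<string>();
    }

    // Enum overload: converts the member name and delegates, so both
    // overloads always return the same result for the same category.
    public IReadOnlyList<string> GetByCategory(DemoCategory category)
        => GetByCategory(category.ToString());
}
```

Keeping the logic in one overload means a typo-proof call such as `GetByCategory(DemoCategory.Game)` and a runtime-driven call such as `GetByCategory("game")` cannot drift apart.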
Performance
- Singleton with lazy thread-safe initialization (`LazyInitializer`)
- Entries backed by `FrozenSet<T>`: O(1) set operations
- JSON cache: each resource key is read once, then served from memory
- `PreWarm()`: pre-load all JSON on a background thread before first use
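The techniques listed above can be combined roughly as in the sketch below. It is a hypothetical `DemoResourceCache`, not the package's source: the key set, the fake JSON payload, and the `LoadCount` counter exist only to make the caching behaviour observable. The calls used (`LazyInitializer.EnsureInitialized`, `ToFrozenSet`, `ConcurrentDictionary.GetOrAdd`) are standard .NET 8 APIs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Frozen;
using System.Threading;

// Hypothetical cache illustrating the singleton, frozen-set and JSON-cache ideas.
public sealed class DemoResourceCache
{
    private static DemoResourceCache? _instance;

    // EnsureInitialized may run the factory more than once under a race,
    // but only one instance is ever published to callers.
    public static DemoResourceCache Instance
        => LazyInitializer.EnsureInitialized(ref _instance, () => new DemoResourceCache());

    private readonly FrozenSet<string> _keys;
    private readonly ConcurrentDictionary<string, string> _jsonCache =
        new(StringComparer.OrdinalIgnoreCase);
    private int _loadCount;

    private DemoResourceCache()
    {
        // Frozen sets are built once and optimized for fast read-only lookups.
        _keys = new[] { "zip", "png", "pdf" }.ToFrozenSet(StringComparer.OrdinalIgnoreCase);
    }

    // Number of times the (simulated) resource load actually ran.
    public int LoadCount => Volatile.Read(ref _loadCount);

    public string? GetJson(string key)
    {
        if (!_keys.Contains(key)) return null;
        // After the first load, the value is served from memory.
        return _jsonCache.GetOrAdd(key, Load);
    }

    // Touches every key so later GetJson calls hit the cache.
    public void PreWarm()
    {
        foreach (string key in _keys) GetJson(key);
    }

    // Stand-in for reading an embedded resource.
    private string Load(string key)
    {
        Interlocked.Increment(ref _loadCount);
        return $"{{\"key\":\"{key}\"}}";
    }
}
```

Calling `PreWarm()` from `Task.Run` at startup moves the one-time load cost off the UI or request thread, which is the usage pattern the Fast Startup section describes.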
What's New in 1.0.0
- New: Initial NuGet release, cross-platform `net8.0`.
- New: `FormatCategory` enum for a type-safe `GetByCategory()` overload.
- New: `SchemaName` enum for a type-safe `GetSchemaJson()` overload.
- New: `DetectFromBytes(ReadOnlySpan<byte>)` for magic-byte detection across all 675+ signatures.
- New: `GetByMimeType(string)` for MIME type lookup.
- New: `GetByCategory(string/FormatCategory)` for category browsing.
- New: `GetSchemaJson(string/SchemaName)` for access to 5 embedded JSON schemas.
- New: `MimeTypes` and `Signatures` fields on `EmbeddedFormatEntry`.
Included Assemblies
Both assemblies are bundled inside the package, with zero external NuGet dependencies:
| Assembly | Purpose |
|---|---|
| WpfHexEditor.Core.Definitions | EmbeddedFormatCatalog singleton + 675+ embedded .whfmt definitions |
| WpfHexEditor.Core.Contracts | IEmbeddedFormatCatalog, EmbeddedFormatEntry, FormatCategory, SchemaName |
License
GNU Affero General Public License v3.0 (AGPL-3.0)
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0: No dependencies.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0 | 90 | 4/16/2026 |