2024.js/idv.md

2.9 KiB

The Indented Document Values Format

Overview

The Indented Document Values (IDV) format is a text-based, whitespace-sensitive serialization format.

IDV is designed to prioritize human readability and writability by minimizing visual noise- there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.

As a tradeoff, IDV is not a self-describing data format- you have to know what type of data an IDV document represents at the time you parse it.

Example

TODO: need something both concise and nontrivial. LDAP user data is certainly an option

Syntax

IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.

TODO: possible redraft: sequence of comments, entry headers, and documents, defined by line types (blank, comment, entry header, indented)

The lines of an IDV document represent a single flat list of Comments and Entries.

A Comment is any line whose first character is a # character. Comment lines are for human use and are ignored by the parser.

# This line is ignored

An Entry's first line is unindented and contains the name of a Category, up to the first : character, followed by a Distinguisher. All following lines with indentation, if any, are the entry's Document:

Collection: distinguisher
  Indented
  document

  with a blank line
  1. The Category and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
  2. Backslash unescaping is performed on the Category and Distinguisher.
  3. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
  4. The first line of a Document defines the document's indentation- subsequent lines can be indented deeper, but no line may be indented less than the first line.
  5. It is ambiguous whether blank lines are part of a document or just aesthetic spacing for Entries; to resolve this, blank lines before and after a Document are ignored, but internal blank lines are considered part of the Document.
  6. Backslash unescaping is not performed on the Document. However, backslashes may be processed later, when the document is interpreted.

Data Model

TODO: tuples, can be interpreted according to patterns

Patterns

Primitive Property

TODO: one of distinguisher | document non-empty, parsing based on expected type

Object Property

TODO: distinguisher ignored, document is IDV

Union Property

TODO: distinguisher determines how the document is parsed

List

TODO: property specified multiple times

Map

TODO: distinguisher defines key, document parsed for value

Property Map

TODO: Category defines key, parsed as property for value

Merged Map

See Also

TODO:

  • yaml
  • dpkg control files