The Indented Document Values Format

Overview

The Indented Document Values (IDV) format is a meta-syntax for machine-readable textual data.

IDV is designed to prioritize human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.

As a tradeoff, IDV is not a self-describing data format. It can serve as the basis for a serialization or configuration format, but systems using it need to layer their own semantics on top of it.

Example

Person: Alice
  Uid: 1000
  Phone: 555-1234
  Group: users
  Group: sudo
  Banner:
    ============================
    This is my ASCII art login message
    ============================

Person: Bob
  Uid: 1001
  Phone: 555-5656
  Group: users

Syntax

IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
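
As a rough illustration of this first step, here is a minimal Python sketch. The function name split_lines is hypothetical, and the set of line separators accepted is an assumption, since the text above does not enumerate them.

```python
def split_lines(text: str) -> list[str]:
    """Split raw input into lines with trailing whitespace removed."""
    # str.splitlines() recognises \n, \r\n, \r and several other separators;
    # which separators IDV accepts is an assumption made for this sketch.
    return [line.rstrip() for line in text.splitlines()]


print(split_lines("Tag: value  \r\n  body\t\n\n"))
# prints ['Tag: value', '  body', '']
```

Leading whitespace is deliberately left alone here, because indentation is significant later on.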

Comments

A Comment is any line whose first character is a # character. Comment lines are for human use and are ignored by the parser.

# This line is ignored

Blank Lines

A Blank Line is any line that only contains whitespace. Because trailing whitespace is always trimmed, all Blank Lines are indistinguishable from each other.

Blank Lines are ignored unless they are part of a Document (see below).
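
The two definitions above can be sketched as a small line classifier. This is only an illustration: classify is a hypothetical name, and treating an indented # line as ordinary content is a literal reading of "first character", not something the text states explicitly.

```python
def classify(line: str) -> str:
    """Label one line as 'comment', 'blank', or 'content'."""
    if line.startswith("#"):
        # Only a # in the very first column counts (literal reading).
        return "comment"
    if line.strip() == "":
        return "blank"
    return "content"


for sample in ["# This line is ignored", "   ", "Tag: value", "  # indented"]:
    print(repr(sample), "->", classify(sample))
```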

Entries

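An Entry is composed of one or more lines: a first line holding a Tag and an optional Distinguisher, followed by an optional indented Document.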

Tags

Each Entry begins with a Tag, terminated by a colon (:). A Tag can contain any characters except newlines, colons, and leading or trailing whitespace:

Tag:

Distinguishers

Optionally, a Distinguisher can follow the Tag on the same line. A Distinguisher can contain any characters except newlines and leading or trailing whitespace:

Tag: distinguisher
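
One way to separate the two parts is to cut the line at its first colon that is not preceded by a backslash. The sketch below assumes that reading. split_header is a hypothetical name, escape sequences (see the Escapes section) are left unprocessed in the result, and an absent Distinguisher is returned as an empty string by assumption.

```python
def split_header(line: str) -> tuple[str, str]:
    """Split an Entry's first line into (raw tag, raw distinguisher)."""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == ":":
            # Surrounding whitespace is trimmed; internal whitespace is kept.
            return line[:i].strip(), line[i + 1:].strip()
        i += 1
    raise ValueError(f"no unescaped colon in entry line: {line!r}")


print(split_header("Tag: distinguisher"))       # ('Tag', 'distinguisher')
print(split_header("Url: http://example.com"))  # ('Url', 'http://example.com')
```

Only the first unescaped colon is special, so colons inside the Distinguisher pass through untouched.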

Escapes

Within Tags and Distinguishers, backslash escapes may be used to represent non-permitted or inconvenient characters:

Tag With \: And Spaces:

Tag: \ distinguisher with leading whitespace and\nA newline

Escape sequence   Replacement
\<space>          A literal space
\n                A newline
\:                A colon (:)
\\                A backslash (\)
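
The four escape sequences can be applied with a single left-to-right scan. The following Python sketch is illustrative only: unescape and ESCAPES are hypothetical names, and rejecting unknown or dangling escapes with an error is an assumption, since the behaviour in those cases is not specified.

```python
ESCAPES = {" ": " ", "n": "\n", ":": ":", "\\": "\\"}


def unescape(text: str) -> str:
    """Replace backslash escapes in a Tag or Distinguisher."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash")  # assumption: an error
        code = text[i + 1]
        if code not in ESCAPES:
            raise ValueError(f"unknown escape: \\{code}")  # assumption
        out.append(ESCAPES[code])
        i += 2
    return "".join(out)


print(repr(unescape(r"Tag With \: And Spaces")))  # 'Tag With : And Spaces'
```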

TODO: additional escapes? i.e., hex or Unicode?

Documents

After the first line of an Entry, any indented lines make up the Document portion of the Entry:

Tag: distinguisher
  First Line
    Second Line
  Third Line
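
Grouping lines into Entries this way might look like the sketch below. It assumes the lines have already had trailing whitespace removed. group_entries is a hypothetical name, and both skipping comment lines wherever they occur and rejecting an indented line that precedes any Entry are assumptions.

```python
def group_entries(lines: list[str]) -> list[tuple[str, list[str]]]:
    """Group lines into (first line, raw document lines) pairs."""
    groups: list[tuple[str, list[str]]] = []
    for line in lines:
        if line.startswith("#"):
            continue  # comment lines are dropped (assumed: anywhere)
        if line == "" or line[0].isspace():
            if groups:
                # Blank and indented lines attach to the current Entry.
                groups[-1][1].append(line)
            elif line:
                raise ValueError(f"indented line before any entry: {line!r}")
        else:
            groups.append((line, []))
    return groups


sample = ["Tag: distinguisher", "  First Line", "    Second Line", "  Third Line"]
print(group_entries(sample))
```

The document lines are kept raw here, with their indentation and any blank lines still in place.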

The first line of a Document defines the Document's indentation. Subsequent lines can be indented deeper, but no line may be indented less than the first line. This indentation is removed from the beginning of each line when determining the Document's value.

Blank Lines cannot carry indentation information. To resolve this ambiguity, Documents may not begin or end with Blank Lines; such lines are ignored. Blank Lines that occur between indented lines are considered part of the Document.

Tag:

  The above blank line is ignored.
  The below blank line is part of the Document.

  The below blank line is ignored.

Tag:
  Other stuff
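
The indentation and Blank Line rules can be sketched as a function from an Entry's raw indented lines to its Document value. This is a minimal illustration: document_value is a hypothetical name, indentation is compared as an exact prefix (how tabs and spaces mix is not specified, so that is an assumption), and an under-indented line is treated as an error.

```python
def document_value(raw_lines: list[str]) -> list[str]:
    """Strip boundary blank lines and the first line's indentation."""
    lines = list(raw_lines)
    while lines and lines[0].strip() == "":
        lines.pop(0)  # leading Blank Lines are ignored
    while lines and lines[-1].strip() == "":
        lines.pop()  # trailing Blank Lines are ignored
    if not lines:
        return []
    first = lines[0]
    indent = first[: len(first) - len(first.lstrip())]
    result = []
    for line in lines:
        if line.strip() == "":
            result.append("")  # interior Blank Lines are kept
        elif line.startswith(indent):
            result.append(line[len(indent):])
        else:
            raise ValueError(f"line indented less than first line: {line!r}")
    return result


print(document_value(["", "  one", "", "    two", ""]))
# prints ['one', '', '  two']
```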

Backslash escapes are not processed within a Document. However, backslashes may be processed later, by higher-layered semantics.

In many cases the Document will contain recursive IDV data, and the rules above are designed to play nicely with this case, but it is up to the concrete format to decide how to parse the Document. It could just as easily contain free text, XML, or a base64 blob.

Disambiguations:

  1. The Tag and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
  2. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.

Data Model

Applying minimal interpretation, IDV data can be represented as a list of Entries.

An Entry can be represented as a 3-tuple of:

  1. a string (the Tag)
  2. a string (the optional Distinguisher)
  3. a list of strings (the lines of the Document)
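
In Python, this 3-tuple could be modelled with a named tuple. The sketch below is one possible shape, not a prescribed one: the Entry class name and its field names are hypothetical, and representing an absent Distinguisher as an empty string is an assumption.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    tag: str
    distinguisher: str   # "" when absent (an assumption of this sketch)
    document: list[str]  # Document lines with indentation already removed


# The first Entry of the Example section, with a shortened Document.
alice = Entry("Person", "Alice", ["Uid: 1000", "Phone: 555-1234"])

tag, distinguisher, document = alice  # unpacks like a plain 3-tuple
print(tag, distinguisher, len(document))
# prints Person Alice 2
```

IDV data as a whole would then be a list of such Entry values.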

How Entries are interpreted by the application is not specified, but see below for some suggested patterns that should line up with things people usually want to do.

Patterns

Primitive Property

TODO: one of distinguisher | document non-empty, parsing based on expected type

Object Property

TODO: distinguisher ignored, document is IDV

Union Property

TODO: distinguisher determines how the document is parsed

List

TODO: property specified multiple times

Map

TODO: distinguisher defines key, document parsed for value

Property Map

TODO: Category defines key, parsed as property for value

Merged Map

See Also

TODO:

  • YAML
  • dpkg control files