The Indented Document Values Format
Overview
The Indented Document Values (IDV) format is a meta-syntax for machine-readable textual data.
IDV is designed to prioritize human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
As a tradeoff, IDV is not a self-describing data format. It can be used to define a serialization or configuration format, but systems using it need to layer their own semantics on top of it.
Example
Person: Alice
    Uid: 1000
    Phone: 555-1234
    Group: users
    Group: sudo
    Banner:
        ============================
        This is my ASCII art login message
        ============================
Person: Bob
    Uid: 1001
    Phone: 555-5656
    Group: users
Syntax
IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
Comments
A Comment is any line whose first character is a # character. Comment lines are for human use and are ignored by the parser.
# This line is ignored
Blank Lines
A Blank Line is any line that only contains whitespace. Because trailing whitespace is always trimmed, all Blank Lines are indistinguishable from each other.
Blank Lines are ignored unless they are part of a Document (see below).
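The line-level rules so far can be sketched in a few lines of Python. This is only an illustration: the names `split_lines` and `classify` are hypothetical, and the use of `str.splitlines` (which recognizes more separators than `\n`) is an assumption, since the format does not say which line separators are accepted.

```python
def split_lines(text: str) -> list[str]:
    """Split the input into lines and drop trailing whitespace from each."""
    # Assumption: any separator recognized by str.splitlines ends a line.
    return [line.rstrip() for line in text.splitlines()]


def classify(line: str) -> str:
    """Classify one already-trimmed line."""
    if line == "":
        return "blank"      # all Blank Lines look the same after trimming
    if line.startswith("#"):
        return "comment"    # only a '#' in the first column counts
    if line[0].isspace():
        return "indented"   # candidate Document line
    return "entry"          # first line of an Entry
```

Note that an indented line beginning with `#` is classified as `indented` here, because its first character is whitespace rather than `#`.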
Entries
An Entry is composed of one or more lines:
Tags
Each entry begins with a Tag, terminated by a colon (:). A Tag can contain any characters except leading or trailing whitespace, newlines, and colons:
Tag:
Distinguishers
Optionally, a Distinguisher can follow the Tag on the same line. A Distinguisher can contain any characters except leading or trailing whitespace, and newlines:
Tag: distinguisher
Escapes
Within Tags and Distinguishers, backslash escapes may be used to represent non-permitted or inconvenient characters:
Tag With \: And Spaces:
Tag: \ distinguisher with leading whitespace and\nA newline
| Escape sequence | Replacement |
|---|---|
| \<space> | A literal space |
| \n | A newline |
| \: | A colon (:) |
| \\ | A backslash (\) |
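A minimal sketch of applying the escape table to a Tag or Distinguisher, in Python. The function name `unescape` is hypothetical, and rejecting unknown or dangling escapes with `ValueError` is an assumption: the format does not yet say how such input should be handled.

```python
# The four escapes from the table above: character after the backslash -> replacement.
ESCAPES = {" ": " ", "n": "\n", ":": ":", "\\": "\\"}


def unescape(text: str) -> str:
    """Replace backslash escapes in a Tag or Distinguisher."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            # Assumption: a backslash with nothing after it is an error.
            raise ValueError("dangling backslash at end of text")
        code = text[i + 1]
        if code not in ESCAPES:
            # Assumption: unknown escapes are rejected rather than passed through.
            raise ValueError(f"unknown escape sequence: \\{code}")
        out.append(ESCAPES[code])
        i += 2
    return "".join(out)
```

For example, `unescape(r"Tag With \: And Spaces")` yields `Tag With : And Spaces`.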
TODO: additional escapes? e.g., hex or Unicode?
Documents
After the first line of an entry, any indented lines make up the Document portion of the entry:
Tag: distinguisher
    First Line
    Second Line
    Third Line
The first line of a Document defines the Document's indentation: subsequent lines can be indented deeper, but no line may be indented less than the first line. This indentation is removed from the beginning of each line when determining the Document's value.
Blank Lines cannot carry indentation information. To resolve this ambiguity, Documents may not begin or end with Blank Lines; such lines are ignored. Blank Lines that occur between indented lines are considered part of the Document.
Tag:

    The above blank line is ignored.
    The below blank line is part of the Document.

    The below blank line is ignored.

Tag:
    Other stuff
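The indentation and Blank Line rules above can be sketched as follows. The function name `document_value` is hypothetical, and raising `ValueError` for an under-indented line is an assumption about error handling. The input is taken to be the raw lines that follow an Entry's first line.

```python
def document_value(raw_lines: list[str]) -> list[str]:
    """Compute a Document's value from the lines following an Entry's first line."""
    lines = [line.rstrip() for line in raw_lines]
    # Leading and trailing Blank Lines are not part of the Document.
    while lines and lines[0] == "":
        lines.pop(0)
    while lines and lines[-1] == "":
        lines.pop()
    if not lines:
        return []
    # The first remaining line defines the Document's indentation.
    first = lines[0]
    indent = first[: len(first) - len(first.lstrip())]
    value = []
    for line in lines:
        if line == "":
            value.append("")  # interior Blank Line, kept as part of the Document
        elif line.startswith(indent):
            value.append(line[len(indent):])  # deeper indentation is preserved
        else:
            raise ValueError(f"line indented less than the first line: {line!r}")
    return value
```

Backslashes are left untouched, in line with the rule that escapes are not processed within a Document.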
Backslash escapes are not processed within a Document. However, backslashes may be processed later, by higher-layered semantics.
In many cases the Document will contain recursive IDV data, and the rules above are designed to play nicely with this case, but it is up to the concrete format to decide how to parse the Document. It could just as easily contain free text, XML, or a base64 blob.
Disambiguations:
- The Tag and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
- The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
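One way to read these two rules is: split the first line at the first colon that is not escaped, then trim both halves. The sketch below assumes that reading. `split_first_line` is a hypothetical name, and it returns the raw (still escaped) text so that escape processing can happen afterwards.

```python
def split_first_line(line: str) -> tuple[str, str]:
    """Split an Entry's first line into (raw Tag, raw Distinguisher)."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if line[i] == ":":
            # Everything after the first unescaped colon is the Distinguisher,
            # including any further colons.
            return line[:i].strip(), line[i + 1:].strip()
        i += 1
    raise ValueError(f"no unescaped colon in line: {line!r}")
```

Trimming before unescaping keeps an escaped leading space intact: in `Tag: \ x` the Distinguisher starts with a backslash after trimming, so the space it escapes survives.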
Data Model
Applying minimal interpretation, IDV data can be represented as a list of Entries.
An Entry can be represented as a 3-tuple of:
- a string (the Tag)
- a string (the optional Distinguisher)
- a list of strings (the lines of the Document)
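As an illustration of this data model, here is a minimal parser sketch in Python. The names `Entry` and `parse` are hypothetical, and several behaviors are assumptions rather than part of the format: an absent Distinguisher is represented as an empty string, escapes are left unprocessed, indentation is compared character by character, and malformed input raises `ValueError`.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    tag: str              # raw Tag text (escapes not processed here)
    distinguisher: str    # raw Distinguisher, "" when absent (assumption)
    document: list[str]   # Document lines with the indentation removed


# Tag, then the first colon not consumed by a backslash escape, then the rest.
_FIRST_LINE = re.compile(r"((?:\\.|[^\\:])*):(.*)")


def _dedent(lines: list[str]) -> list[str]:
    # Drop leading and trailing Blank Lines, then strip the first line's indentation.
    while lines and lines[0] == "":
        lines = lines[1:]
    while lines and lines[-1] == "":
        lines = lines[:-1]
    if not lines:
        return []
    indent = lines[0][: len(lines[0]) - len(lines[0].lstrip())]
    out = []
    for line in lines:
        if line and not line.startswith(indent):
            raise ValueError(f"under-indented Document line: {line!r}")
        out.append(line[len(indent):])
    return out


def parse(text: str) -> list[Entry]:
    entries: list[Entry] = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("#"):
            continue  # Comment
        if line == "" or line[0].isspace():
            if entries:
                entries[-1].document.append(line)  # Document or Blank Line
            elif line:
                raise ValueError(f"indented line before any Entry: {line!r}")
            continue
        match = _FIRST_LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"missing colon after Tag: {line!r}")
        entries.append(Entry(match.group(1).strip(), match.group(2).strip(), []))
    return [e._replace(document=_dedent(e.document)) for e in entries]
```

A concrete format that treats a Document as nested IDV could call `parse("\n".join(entry.document))` on it. That choice belongs to the concrete format, not to this sketch.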
How Entries are interpreted by the application is not specified, but see below for some suggested patterns that should line up with things people usually want to do.
Patterns
Primitive Property
TODO: one of distinguisher | document non-empty, parsing based on expected type
Object Property
TODO: distinguisher ignored, document is IDV
Union Property
TODO: distinguisher determines how the document is parsed
List
TODO: property specified multiple times
Map
TODO: distinguisher defines key, document parsed for value
Property Map
TODO: Category defines key, parsed as property for value
Merged Map
See Also
TODO:
- yaml
- dpkg control files