# The Indented Document Values Format

## Overview

The Indented Document Values (IDV) format is a meta-syntax for machine-readable textual data.

IDV is designed to prioritize human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.

As a tradeoff, IDV is not a self-describing data format. It can be used to define a serialization or configuration format, but systems using it need to layer their own semantics on top of it.

### Example

```
Person: Alice
  Uid: 1000
  Phone: 555-1234
  Group: users
  Group: sudo
  Banner:
    ============================
    This is my ASCII art login message
    ============================

Person: Bob
  Uid: 1001
  Phone: 555-5656
  Group: users
```

## Syntax

IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.

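The pre-processing step described above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the name `split_lines` is hypothetical, and relying on `str.splitlines()` to decide what counts as a line separator is an assumption, since the format does not enumerate the separators it recognizes.

```python
def split_lines(text: str) -> list[str]:
    """Split raw input into lines and drop trailing whitespace from each.

    Assumption: str.splitlines() decides what counts as a line separator.
    """
    return [line.rstrip() for line in text.splitlines()]


# Trailing spaces, tabs, and line separators all disappear;
# leading indentation is preserved for later parsing.
lines = split_lines("Tag: value   \r\n  Indented line\t\n\n")
print(lines)  # ['Tag: value', '  Indented line', '']
```

Every later rule can then assume that a whitespace-only line has already become an empty string.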
### Comments

A **Comment** is any line whose first character is a `#` character. Comment lines are for human use and are ignored by the parser.

```
# This line is ignored
```

### Blank Lines

A **Blank Line** is any line that contains only whitespace. Because trailing whitespace is always trimmed, all Blank Lines are indistinguishable from each other.

Blank Lines are ignored unless they are part of a Document (see below).

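As a rough sketch, the two line classes above could be recognized as follows. The helper names `is_comment` and `is_blank` are hypothetical, and the code reads "first character" literally, so an indented `#` is assumed not to start a Comment.

```python
def is_comment(line: str) -> bool:
    """A Comment is a line whose first character is '#'."""
    return line.startswith("#")


def is_blank(line: str) -> bool:
    """A Blank Line contains only whitespace (or nothing at all)."""
    return line.strip() == ""


sample = ["# a comment", "", "   ", "  # indented, so not a comment", "Tag: value"]
kinds = [
    "comment" if is_comment(line) else "blank" if is_blank(line) else "other"
    for line in sample
]
print(kinds)  # ['comment', 'blank', 'blank', 'other', 'other']
```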
### Entries

An **Entry** is composed of one or more lines, made up of the parts described below.

#### Tags

Each Entry begins with a **Tag**, terminated by a colon (`:`). A Tag can contain any characters except leading or trailing whitespace, newlines, and colons:

```
Tag:
```

#### Distinguishers

Optionally, a **Distinguisher** can follow the Tag on the same line. A Distinguisher can contain any characters except newlines and leading or trailing whitespace:

```
Tag: distinguisher
```

#### Escapes

Within Tags and Distinguishers, backslash escapes may be used to represent non-permitted or inconvenient characters:

```
Tag With \: And Spaces:

Tag: \ distinguisher with leading whitespace and\nA newline
```

| Escape sequence | Replacement       |
| --------------- | ----------------- |
| \\_\<space>_    | A literal space   |
| \\n             | A newline         |
| \\:             | A colon (`:`)     |
| \\\\            | A backslash (`\`) |

> TODO: additional escapes? e.g., hex or Unicode?

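The escape table above could be applied with a small function like the one below. This is a sketch under stated assumptions: the names `ESCAPES` and `unescape` are hypothetical, and rejecting unknown escapes or a lone trailing backslash is a choice made here, because the format does not say how those cases should be handled.

```python
# Maps the character after the backslash to its replacement.
ESCAPES = {" ": " ", "n": "\n", ":": ":", "\\": "\\"}


def unescape(raw: str) -> str:
    """Replace the four escape sequences listed in the table.

    Assumption: an unknown escape or a trailing lone backslash is an error.
    """
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # None when the backslash is the last character
        if nxt not in ESCAPES:
            raise ValueError(f"unsupported escape in {raw!r}")
        out.append(ESCAPES[nxt])
    return "".join(out)


print(repr(unescape(r"Tag With \: And Spaces")))  # 'Tag With : And Spaces'
```

Per the rules on Documents, this step would apply only to Tags and Distinguishers, never to Document lines.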
#### Documents

After the first line of an Entry, any indented lines make up the **Document** portion of the Entry:

```
Tag: distinguisher
  First Line
  Second Line
  Third Line
```

The first line of a Document defines the Document's indentation. Subsequent lines can be indented deeper, but no line may be indented _less_ than the first line. This indentation is removed from the beginning of each line when determining the Document's value.

Blank Lines cannot carry indentation information. To resolve this ambiguity, Documents may not begin or end with Blank Lines; such lines are ignored. Blank Lines that occur _between_ indented lines _are_ considered part of the Document.

```
Tag:

  The above blank line is ignored.
  The below blank line is part of the Document.

  The below blank line is ignored.

Tag:
  Other stuff
```

Backslash escapes are _not_ processed within a Document. However, backslashes may be processed later, by higher-layered semantics.

In many cases the Document will contain recursive IDV data, and the rules above are designed to play nicely with this case, but it is up to the concrete format to decide how to parse the Document. It could just as easily contain free text, XML, or a base64 blob.

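The Document rules above (collect indented lines, drop leading and trailing Blank Lines, strip the first line's indentation) might be implemented roughly as follows. The function name `extract_document` is hypothetical. The sketch assumes lines are already right-stripped, measures indentation as a count of leading whitespace characters, and leaves Comment handling to the caller; none of these details are fixed by the text.

```python
def extract_document(following: list[str]) -> list[str]:
    """Collect the Document from the lines that follow an Entry's first line.

    Assumptions: lines are already right-stripped, indentation is counted in
    leading whitespace characters, and Comments are handled elsewhere.
    """
    body = []
    for line in following:
        if line and not line[0].isspace():
            break  # an unindented line starts the next Entry
        body.append(line)
    # Documents may not begin or end with Blank Lines.
    while body and body[0] == "":
        body.pop(0)
    while body and body[-1] == "":
        body.pop()
    if not body:
        return []
    # The first line defines the indentation removed from every line.
    indent = len(body[0]) - len(body[0].lstrip())
    result = []
    for line in body:
        if line == "":
            result.append("")  # interior Blank Lines are kept
        elif len(line) - len(line.lstrip()) < indent:
            raise ValueError(f"line indented less than the first line: {line!r}")
        else:
            result.append(line[indent:])
    return result


doc = extract_document(["", "  first", "    deeper", "", "  last", "", "Tag:"])
print(doc)  # ['first', '  deeper', '', 'last']
```

Because the result is again a list of plain lines, a concrete format that nests IDV inside a Document could feed it back into the same parser.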
#### Disambiguations

1. The Tag and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.

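One way to honor both rules is to split the first line of an Entry at the first unescaped colon and trim each side. The sketch below is illustrative only: `split_header` is a hypothetical name, both parts are returned still escaped, and rejecting a line with no unescaped colon is an assumption, since that case is not defined above.

```python
def split_header(line: str) -> tuple[str, str]:
    """Split an Entry's first line into its raw Tag and raw Distinguisher.

    Both parts are returned still escaped. Assumption: a line without an
    unescaped colon is rejected. Known gap: a Tag ending in an escaped space
    right before the colon is not preserved by the plain strip() used here.
    """
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif line[i] == ":":
            # Only the first unescaped colon terminates the Tag.
            return line[:i].strip(), line[i + 1:].strip()
        else:
            i += 1
    raise ValueError(f"no Tag terminator in {line!r}")


print(split_header("Url: http://example.com:80"))  # ('Url', 'http://example.com:80')
```

The `Url` line is a made-up input, chosen to show that colons after the first one stay in the Distinguisher.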
## Data Model

Applying minimal interpretation, IDV data can be represented as a list of Entries.

An Entry can be represented as a 3-tuple of:

1. a string (the Tag)
2. a string (the optional Distinguisher)
3. a list of strings (the lines of the Document)

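In Python, the 3-tuple above could be modeled with a `NamedTuple`. The class name `Entry`, its field names, and the use of an empty string for a missing Distinguisher are all assumptions made for this sketch. The sample value assumes the lines under `Person: Bob` in the opening example form that Entry's Document.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    tag: str
    distinguisher: str  # assumption: "" when no Distinguisher is given
    document: list[str]  # Document lines, indentation already removed


# The "Person: Bob" Entry from the opening example, written out by hand.
bob = Entry(
    tag="Person",
    distinguisher="Bob",
    document=["Uid: 1001", "Phone: 555-5656", "Group: users"],
)

tag, distinguisher, document = bob  # still unpacks as a plain 3-tuple
print(tag, distinguisher, len(document))  # Person Bob 3
```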
How Entries are interpreted by the application is not specified, but see below for some suggested patterns that should line up with things people usually want to do.

## Patterns

### Primitive Property

> TODO: one of distinguisher | document non-empty, parsing based on expected type

### Object Property

> TODO: distinguisher ignored, document is IDV

### Union Property

> TODO: distinguisher determines how the document is parsed

### List

> TODO: property specified multiple times

### Map

> TODO: distinguisher defines key, document parsed for value

### Property Map

> TODO: Category defines key, parsed as property for value

### Merged Map

## See Also

> TODO:
>
> - YAML
> - dpkg control files