# The Indented Document Values Format
## Overview
The Indented Document Values (IDV) format is a meta-syntax for machine-readable textual data.
IDV prioritizes human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
As a tradeoff, IDV is not a self-describing data format. It can be used to define a serialization or configuration format, but systems using it must layer their own semantics on top of it.
### Example
```
Person: Alice
    Uid: 1000
    Phone: 555-1234
    Group: users
    Group: sudo
    Banner:
        ============================
        This is my ASCII art login message
        ============================
Person: Bob
    Uid: 1001
    Phone: 555-5656
    Group: users
```
## Syntax
IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
### Comments
A **Comment** is any line whose first character is a `#` character. Comment lines are for human use and are ignored by the parser.
```
# This line is ignored
```
### Blank Lines
A **Blank Line** is any line that only contains whitespace. Because trailing whitespace is always trimmed, all Blank Lines are indistinguishable from each other.
Blank Lines are ignored unless they are part of a Document (see below).
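
A minimal Python sketch of this line-oriented preprocessing follows. The names `Kind` and `classify` are hypothetical, the labels anticipate the Entry and Document rules below, and treating only spaces and tabs as indentation is an assumption the spec does not state.

```python
from enum import Enum


class Kind(Enum):
    COMMENT = "comment"    # first character is '#'
    BLANK = "blank"        # nothing left after trimming trailing whitespace
    INDENTED = "indented"  # starts with whitespace: a Document line
    HEADER = "header"      # anything else: the first line of an Entry


def classify(text: str) -> list[tuple[Kind, str]]:
    """Split input into lines, trim trailing whitespace, and label each line."""
    result = []
    for raw in text.splitlines():
        line = raw.rstrip()  # trailing whitespace is never significant
        if line.startswith("#"):
            result.append((Kind.COMMENT, line))
        elif not line:
            result.append((Kind.BLANK, line))
        elif line[0] in " \t":
            result.append((Kind.INDENTED, line))
        else:
            result.append((Kind.HEADER, line))
    return result
```

Because trimming happens first, a whitespace-only line and an empty line both come out as the same `BLANK` entry.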
### Entries
An **Entry** is composed of one or more lines:
#### Tags
Each entry begins with a **Tag**, terminated by a colon (`:`). A Tag can contain any characters except leading or trailing whitespace, newlines, and colons:
```
Tag:
```
#### Distinguishers
Optionally, a **Distinguisher** can follow the Tag on the same line. A Distinguisher can contain any characters except leading or trailing whitespace and newlines:
```
Tag: distinguisher
```
#### Escapes
Within Tags and Distinguishers, backslash escapes may be used to represent non-permitted or inconvenient characters:
```
Tag With \: And Spaces:
Tag: \ distinguisher with leading whitespace and\nA newline
```
| Escape sequence | Replacement |
| --------------- | ----------------- |
| \\_\<space>_ | A literal space |
| \\n | A newline |
| \\: | A colon (`:`) |
| \\\\ | A backslash (`\`) |
> TODO: additional escapes? ie, hex or unicode?
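
The Tag, Distinguisher, and escape rules can be sketched in Python as follows. This is illustrative only: `ESCAPES`, `decode`, and `split_header` are hypothetical names, and rejecting any escape sequence not in the table is an assumption, since additional escapes are still an open question. The sketch processes escapes before trimming and trims only *unescaped* whitespace, so that a leading `\ ` survives as the spec's example requires.

```python
# Escape table from the section above. Rejecting anything else is an assumption.
ESCAPES = {" ": " ", "n": "\n", ":": ":", "\\": "\\"}


def decode(raw: str) -> str:
    """Process escapes in a raw Tag or Distinguisher, then trim only the
    unescaped surrounding whitespace."""
    chars = []  # (character, came_from_escape) pairs
    it = iter(raw)
    for ch in it:
        if ch == "\\":
            nxt = next(it, None)  # the character after the backslash, if any
            if nxt not in ESCAPES:
                raise ValueError(f"invalid escape sequence in {raw!r}")
            chars.append((ESCAPES[nxt], True))
        else:
            chars.append((ch, False))
    while chars and not chars[0][1] and chars[0][0].isspace():
        chars.pop(0)
    while chars and not chars[-1][1] and chars[-1][0].isspace():
        chars.pop()
    return "".join(ch for ch, _ in chars)


def split_header(line: str) -> tuple[str, str]:
    """Split an Entry's first line into (Tag, Distinguisher) at the first
    unescaped colon; later colons belong to the Distinguisher."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped character, which may be a colon
        elif line[i] == ":":
            return decode(line[:i]), decode(line[i + 1:])
        else:
            i += 1
    raise ValueError(f"missing ':' after tag: {line!r}")
```

For example, `split_header(r"Tag With \: And Spaces:")` yields the Tag `Tag With : And Spaces` and an empty Distinguisher.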
#### Documents
After the first line of an entry, any indented lines make up the **Document** portion of the entry:
```
Tag: distinguisher
    First Line
    Second Line
    Third Line
```
The first line of a Document defines the Document's indentation: subsequent lines can be indented deeper, but no line may be indented _less_ than the first line. This indentation is removed from the beginning of each line when determining the Document's value.
Blank Lines cannot carry indentation information. To resolve this ambiguity, Blank Lines at the beginning or end of a Document are ignored and are not part of it. Blank Lines that occur _between_ indented lines _are_ considered part of the Document.
```
Tag:

    The above blank line is ignored.
    The below blank line is part of the Document.

    The below blank line is ignored.

Tag:
    Other stuff
```
Backslash escapes are _not_ processed within a Document. However, backslashes may be processed later by higher-level semantics.
In many cases the Document will contain recursive IDV data, and the rules above are designed to work well in that case, but it is up to the concrete format to decide how to parse the Document. It could just as easily contain free text, XML, or a base64 blob.
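
The indentation and Blank Line rules above can be sketched as a function that turns the raw lines following an Entry's first line into the Document's value. This is a sketch under stated assumptions: `document_value` is a hypothetical name, the caller is assumed to have already collected every line up to the next unindented one, and indentation is compared as an exact string prefix (so mixing tabs and spaces is rejected), which the spec does not say.

```python
def document_value(lines: list[str]) -> list[str]:
    """Turn the indented lines that follow an Entry's first line into the
    Document's value (a list of strings, per the Data Model)."""
    lines = [line.rstrip() for line in lines]  # trailing whitespace is never significant
    # Leading and trailing Blank Lines carry no indentation, so they are dropped.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    if not lines:
        return []
    # The first line's leading whitespace defines the Document's indentation.
    first = lines[0]
    indent = first[: len(first) - len(first.lstrip(" \t"))]
    value = []
    for line in lines:
        if not line:
            value.append("")  # an interior Blank Line is part of the Document
        elif line.startswith(indent):
            value.append(line[len(indent):])  # deeper indentation is preserved
        else:
            raise ValueError(f"line indented less than the Document: {line!r}")
    return value
```

Backslashes in the returned lines are left untouched, consistent with the rule that escapes are not processed within a Document.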
#### Disambiguations
1. The Tag and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
## Data Model
Applying minimal interpretation, IDV data can be represented as a list of Entries.
An Entry can be represented as a 3-tuple of:
1. a string (the Tag)
2. a string (the optional Distinguisher)
3. a list of strings (the lines of the Document)

How Entries are interpreted by the application is not specified, but see below for some suggested patterns that should cover the most common use cases.
## Patterns
### Primitive Property
> TODO: one of distinguisher | document non-empty, parsing based on expected type
### Object Property
> TODO: distinguisher ignored, document is IDV
### Union Property
> TODO: distinguisher determines how the document is parsed
### List
> TODO: property specified multiple times
### Map
> TODO: distinguisher defines key, document parsed for value
### Property Map
> TODO: Category defines key, parsed as property for value
### Merged Map
## See Also
> TODO:
>
> - yaml
> - dpkg control files