Record first set of IDV spec thoughts

Tangent Wantwight 2024-07-20 00:52:21 -04:00
parent af07866ac0
commit 763bfbc8cf

idv.md Normal file

# The Indented Document Values Format
## Overview
The Indented Document Values (IDV) format is a text-based, whitespace-sensitive serialization format.
IDV is designed to prioritize human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
As a tradeoff, IDV is not a self-describing data format: you have to know what type of data an IDV document represents at the time you parse it.
### Example
> TODO: need something both concise and nontrivial. LDAP user data is certainly an option
## Syntax
IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
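
A minimal sketch of this first pass, in Python. The helper name `split_lines` is hypothetical, and the spec does not say which line separators IDV recognizes; this sketch simply assumes whatever `str.splitlines()` accepts (`\n`, `\r\n`, `\r`, and a few less common separators).

```python
def split_lines(text: str) -> list[str]:
    """Split the input into lines and drop trailing whitespace from each."""
    # splitlines() removes the line separators; rstrip() removes any other
    # trailing whitespace, so whitespace-only lines become empty strings.
    return [line.rstrip() for line in text.splitlines()]
```

Under these assumptions, a line containing only spaces or tabs is indistinguishable from an empty line by the time later parsing steps see it.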
> TODO: possible redraft: sequence of comments, entry headers, and documents, defined by line types (blank, comment, entry header, indented)
The lines of an IDV document represent a single flat list of Comments and Entries.
A **Comment** is any line whose first character is a `#` character. Comment lines are for human use and are ignored by the parser.
```
# This line is ignored
```
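
As a sketch of the rule above (the function name `is_comment` is hypothetical), the check looks only at the first character of the line. Read literally, that means an indented line that starts with `#` after its indentation is not a comment.

```python
def is_comment(line: str) -> bool:
    """Return True if the line's very first character is '#'."""
    # An indented '#' does not count: its first character is whitespace.
    return line.startswith("#")
```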
An **Entry** begins with an unindented line containing the name of a **Category**, which runs up to the first `:` character, followed by a **Distinguisher**. Any indented lines that follow form the entry's **Document**:
```
Collection: distinguisher
    Indented
    document

    with a blank line
```
1. The Category and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
1. Backslash unescaping is performed on the Category and Distinguisher.
1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
1. The first line of a Document defines the document's indentation: subsequent lines can be indented deeper, but no line may be indented _less_ than the first line.
1. It is ambiguous whether blank lines are part of a document or just aesthetic spacing for Entries; to resolve this, blank lines before and after a Document are ignored, but internal blank lines are considered part of the Document.
1. Backslash unescaping is **not** performed on the Document. However, backslashes may be processed later, when the document is interpreted.
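
The rules above can be sketched as a small Python parser. This is an illustration under stated assumptions, not a reference implementation. The names `Entry`, `unescape`, `split_header`, `trim_document`, and `parse` are all hypothetical. The spec does not yet define the escape rules, so the sketch assumes a backslash makes the next character literal and that an escaped colon does not end the Category. It also assumes the Document's base indentation is stripped from each line, and that a header without a colon, or an indented line before any header, is an error.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    category: str
    distinguisher: str
    document: list[str]  # document lines with the base indentation removed


def unescape(text: str) -> str:
    """Assumed escape rule: a backslash makes the next character literal."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, "\\"))  # a lone trailing backslash is kept
        else:
            out.append(ch)
    return "".join(out)


def split_header(line: str) -> tuple[str, str]:
    """Split at the first unescaped colon, then trim and unescape both parts."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped character (assumption, see lead-in)
        elif line[i] == ":":
            # Later colons stay in the distinguisher as ordinary characters.
            return unescape(line[:i].strip()), unescape(line[i + 1:].strip())
        else:
            i += 1
    raise ValueError(f"entry header has no colon: {line!r}")


def trim_document(lines: list[str]) -> list[str]:
    """Drop leading/trailing blank lines and check indentation."""
    while lines and not lines[0]:
        lines = lines[1:]
    while lines and not lines[-1]:
        lines = lines[:-1]
    if not lines:
        return []
    # The first document line defines the base indentation.
    indent = len(lines[0]) - len(lines[0].lstrip())
    result = []
    for line in lines:
        if line and len(line) - len(line.lstrip()) < indent:
            raise ValueError(f"line indented less than first line: {line!r}")
        result.append(line[indent:])  # internal blank lines stay as ""
    return result


def parse(text: str) -> list[Entry]:
    """Parse IDV text into a flat list of entries; comments are skipped."""
    entries: list[Entry] = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("#"):
            continue  # comment line
        if line and not line[0].isspace():
            entries.append(Entry(*split_header(line), document=[]))
        elif entries:
            entries[-1].document.append(line)  # indented or blank line
        elif line:
            raise ValueError("indented line before any entry header")
    for entry in entries:
        # No unescaping here: the document is kept as written.
        entry.document = trim_document(entry.document)
    return entries
```

With these assumptions, `parse("Key: value\n    text\n")` returns a single `Entry` with category `Key`, distinguisher `value`, and document `["text"]`. Indentation is measured by counting whitespace characters, so mixing tabs and spaces would need a rule the spec has not stated yet.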
## Data Model
> TODO: tuples, can be interpreted according to patterns
## Patterns
### Primitive Property
> TODO: one of distinguisher | document non-empty, parsing based on expected type
### Object Property
> TODO: distinguisher ignored, document is IDV
### Union Property
> TODO: distinguisher determines how the document is parsed
### List
> TODO: property specified multiple times
### Map
> TODO: distinguisher defines key, document parsed for value
### Property Map
> TODO: Category defines key, parsed as property for value
### Merged Map
## See Also
> TODO:
>
> - yaml
> - dpkg control files