Revisions to IDV spec
parent 763bfbc8cf
commit 924d8ccf48
1 changed files with 103 additions and 18 deletions
121
idv.md
|
@ -2,23 +2,36 @@
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The Indented Document Values (IDV) format is a text-based, whitespace-sensitive serialization format.
|
The Indented Document Values (IDV) format is a meta-syntax for machine-readable textual data.
|
||||||
|
|
||||||
IDV is designed to prioritize human readability and writability by minimizing visual noise- there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
|
IDV is designed to prioritize human readability and writability by minimizing visual noise- there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
|
||||||
|
|
||||||
As a tradeoff, IDV is not a self-describing data format- you have to know what type of data an IDV document represents at the time you parse it.
|
As a tradeoff, IDV is not a self-describing data format- while it can be used for defining a serialization or configuration format, systems using it need to layer their own semantics on top of it.
|
||||||
|
|
||||||
### Example
|
### Example
|
||||||
|
|
||||||
> TODO: need something both concise and nontrivial. LDAP user data is certainly an option
|
```
Person: Alice
    Uid: 1000
    Phone: 555-1234
    Group: users
    Group: sudo
    Banner:
        ============================
        This is my ASCII art login message
        ============================

Person: Bob
    Uid: 1001
    Phone: 555-5656
    Group: users
```
|
||||||
|
|
||||||
## Syntax
|
## Syntax
|
||||||
|
|
||||||
IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
|
IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
|
||||||
|
|
||||||
> TODO: possible redraft: sequence of comments, entry headers, and documents, defined by line types (blank, comment, entry header, indented)
|
### Comments
|
||||||
|
|
||||||
The lines of an IDV document represent a single flat list of Comments and Entries.
|
|
||||||
|
|
||||||
A **Comment** is any line whose first character is a `#` character. Comment lines are for human use and are ignored by the parser.
|
A **Comment** is any line whose first character is a `#` character. Comment lines are for human use and are ignored by the parser.
|
||||||
|
|
||||||
|
@ -26,26 +39,98 @@ A **Comment** is any line whose first character is a `#` character. Comment line
|
||||||
# This line is ignored
|
# This line is ignored
|
||||||
```
|
```
|
||||||
|
|
||||||
An **Entry**'s first line is unindented and contains the name of a **Category**, up to the first `:` character, followed by a **Distinguisher**. All following lines with indentation, if any, are the entry's **Document**:
|
### Blank Lines
|
||||||
|
|
||||||
|
A **Blank Line** is any line that only contains whitespace. Because trailing whitespace is always trimmed, all Blank Lines are indistinguishable from each other.
|
||||||
|
|
||||||
|
Blank Lines are ignored unless they are part of a Document (see below).
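
The preprocessing and line types described so far can be sketched in Python. This is only an illustration: the names `preprocess` and `classify` are hypothetical, and it assumes that "whitespace" and "line separators" mean whatever Python's `str.rstrip()` and `str.splitlines()` treat as such, which the spec does not pin down.

```python
def preprocess(text):
    """Split the input into lines and drop trailing whitespace from each."""
    # Assumption: Python's notion of line separators and whitespace.
    return [line.rstrip() for line in text.splitlines()]


def classify(line):
    """Classify an already-trimmed line as 'blank', 'comment', or 'content'."""
    if line == "":
        # After trimming, every Blank Line is the empty string.
        return "blank"
    if line.startswith("#"):
        # Only a '#' in the very first column makes a Comment.
        return "comment"
    return "content"
```

Note that an indented line starting with `#` is classified as `content` here, since its first character is whitespace rather than `#`.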
|
||||||
|
|
||||||
|
### Entries
|
||||||
|
|
||||||
|
An **Entry** is composed of one or more lines:
|
||||||
|
|
||||||
|
#### Tags
|
||||||
|
|
||||||
|
Each entry begins with a **Tag**, terminated by a colon (`:`). A Tag can contain any characters except leading or trailing whitespace, newlines, and colons:
|
||||||
|
|
||||||
```
|
```
|
||||||
Collection: distinguisher
|
Tag:
|
||||||
Indented
|
|
||||||
document
|
|
||||||
|
|
||||||
with a blank line
|
|
||||||
```
|
```
|
||||||
|
|
||||||
1. The Category and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
|
#### Distinguishers
|
||||||
1. Backslash unescaping is performed on the Category and Distinguisher.
|
|
||||||
|
Optionally, a Distinguisher can follow the Tag on the same line. A Distinguisher can contain any characters except leading or trailing whitespace, and newlines:
|
||||||
|
|
||||||
|
```
|
||||||
|
Tag: distinguisher
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Escapes
|
||||||
|
|
||||||
|
Within Tags and Distinguishers, backslash escapes may be used to represent non-permitted or inconvenient characters:
|
||||||
|
|
||||||
|
```
|
||||||
|
Tag With \: And Spaces:
|
||||||
|
|
||||||
|
Tag: \ distinguisher with leading whitespace and\nA newline
|
||||||
|
```
|
||||||
|
|
||||||
|
| Escape sequence | Replacement |
|
||||||
|
| --------------- | ----------------- |
|
||||||
|
| \\_\<space>_ | A literal space |
|
||||||
|
| \\n | A newline |
|
||||||
|
| \\: | A colon (`:`) |
|
||||||
|
| \\\\ | A backslash (`\`) |
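
As a rough illustration of the escape table above, the following Python sketch unescapes a Tag or Distinguisher and splits an entry's first line at the first unescaped colon. The names `ESCAPES`, `unescape`, and `split_header` are hypothetical. Leaving unrecognized escape sequences untouched and returning an empty string for a missing Distinguisher are assumptions; the spec does not say how to handle either case.

```python
# Escape character (the one after the backslash) -> replacement text.
ESCAPES = {" ": " ", "n": "\n", ":": ":", "\\": "\\"}


def unescape(text):
    """Replace the escape sequences from the table with their characters."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            # Assumption: unknown escapes and a lone trailing backslash
            # are kept verbatim.
            out.append(text[i])
            i += 1
    return "".join(out)


def split_header(line):
    """Split an entry's first line into (tag, distinguisher)."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped character, which may be a colon
            continue
        if line[i] == ":":
            # Trim surrounding whitespace first, then unescape.
            # Simplification: an escaped space at the very end of either
            # part is not preserved by strip().
            return unescape(line[:i].strip()), unescape(line[i + 1:].strip())
        i += 1
    raise ValueError("entry line has no unescaped colon")
```

Trimming before unescaping is what lets `\ ` protect a leading space in the Distinguisher.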
|
||||||
|
|
||||||
|
> TODO: additional escapes? i.e., hex or unicode?
|
||||||
|
|
||||||
|
#### Documents
|
||||||
|
|
||||||
|
After the first line of an entry, any indented lines make up the **Document** portion of the entry:
|
||||||
|
|
||||||
|
```
Tag: distinguisher
    First Line
    Second Line
    Third Line
```
|
||||||
|
|
||||||
|
The first line of a Document defines the Document's indentation- subsequent lines can be indented deeper, but no line may be indented _less_ than the first line. This indentation is removed from the beginning of each line when determining the Document's value.
|
||||||
|
|
||||||
|
Blank Lines cannot carry indentation information. To resolve this ambiguity, Documents may not begin or end with Blank Lines- such lines are ignored. Blank Lines that occur _between_ indented lines _are_ considered part of the Document.
|
||||||
|
|
||||||
|
```
Tag:

    The above blank line is ignored.
    The below blank line is part of the Document.

    The below blank line is ignored.

Tag:
    Other stuff
```
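
The indentation and Blank Line rules above can be sketched as a small Python function. The name `document_value` is hypothetical. The sketch assumes it is given only the lines that follow an entry's first line, each already trimmed of trailing whitespace, and that a line indented less than the first Document line is an error. The spec forbids such lines but does not say how a parser should react.

```python
def document_value(lines):
    """Return the Document's lines with the shared indentation removed."""
    # Drop Blank Lines before and after the Document.
    start, end = 0, len(lines)
    while start < end and lines[start] == "":
        start += 1
    while end > start and lines[end - 1] == "":
        end -= 1
    body = lines[start:end]
    if not body:
        return []

    # The first line defines the Document's indentation.
    first = body[0]
    indent = first[: len(first) - len(first.lstrip())]

    result = []
    for line in body:
        if line == "":
            # Blank Lines between indented lines are part of the Document.
            result.append("")
        elif line.startswith(indent):
            result.append(line[len(indent):])
        else:
            # Assumption: under-indented lines are rejected.
            raise ValueError("line is indented less than the first line")
    return result
```

Deeper indentation survives as leading whitespace in the result, which is what allows a Document to hold nested IDV data.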
|
||||||
|
|
||||||
|
Backslash escapes are _not_ processed within a Document. However, backslashes may be processed later, by higher-layered semantics.
|
||||||
|
|
||||||
|
In many cases the Document will contain recursive IDV data, and the rules above are designed to play nicely with this case- but it is up to the concrete format to decide how to parse the Document. It could just as easily contain free text, XML, or a base64 blob.
|
||||||
|
|
||||||
|
#### Disambiguations:
|
||||||
|
|
||||||
|
1. The Tag and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
|
||||||
1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
|
1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
|
||||||
1. The first line of a Document defines the document's indentation- subsequent lines can be indented deeper, but no line may be indented _less_ than the first line.
|
|
||||||
1. It is ambiguous whether blank lines are part of a document or just aesthetic spacing for Entries; to resolve this, blank lines before and after a Document are ignored, but internal blank lines are considered part of the Document.
|
|
||||||
1. Backslash unescaping is **not** performed on the Document. However, backslashes may be processed later, when the document is interpreted.
|
|
||||||
|
|
||||||
## Data Model
|
## Data Model
|
||||||
|
|
||||||
> TODO: tuples, can be interpreted according to patterns
|
Applying minimal interpretation, IDV data can be represented as a list of Entries.
|
||||||
|
|
||||||
|
An Entry can be represented as a 3-tuple of:
|
||||||
|
|
||||||
|
1. a string (the Tag)
|
||||||
|
2. a string (the optional Distinguisher)
|
||||||
|
3. a list of strings (the lines of the Document)
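
One possible in-memory shape for this tuple, sketched in Python. The `Entry` type and the sample values are hypothetical, and representing an absent Distinguisher as `None` rather than an empty string is an assumption, not something the spec requires.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    tag: str                      # the Tag
    distinguisher: Optional[str]  # the optional Distinguisher (None if absent)
    document: List[str]           # the lines of the Document


# IDV data under minimal interpretation: a flat list of Entries.
entries = [
    Entry("Person", "Alice", ["Uid: 1000", "Phone: 555-1234"]),
    Entry("Note", None, []),
]
```

Because the Document is kept as plain lines, an application is free to parse it again as nested IDV or to treat it as opaque text.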
|
||||||
|
|
||||||
|
How Entries are interpreted by the application is not specified, but see below for some suggested patterns that should line up with things people usually want to do.
|
||||||
|
|
||||||
## Patterns
|
## Patterns
|
||||||
|
|
||||||
|
|