Revisions to IDV spec
parent
763bfbc8cf
commit
924d8ccf48
1 changed files with 103 additions and 18 deletions
121
idv.md
|
@ -2,23 +2,36 @@
|
|||
|
||||
## Overview
|
||||
|
||||
The Indented Document Values (IDV) format is a text-based, whitespace-sensitive serialization format.
|
||||
The Indented Document Values (IDV) format is a meta-syntax for machine-readable textual data.
|
||||
|
||||
IDV is designed to prioritize human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
|
||||
|
||||
As a tradeoff, IDV is not a self-describing data format- you have to know what type of data an IDV document represents at the time you parse it.
|
||||
As a tradeoff, IDV is not a self-describing data format. It can be used to define a serialization or configuration format, but systems using it need to layer their own semantics on top of it.
|
||||
|
||||
### Example
|
||||
|
||||
> TODO: need something both concise and nontrivial. LDAP user data is certainly an option
|
||||
```
|
||||
Person: Alice
|
||||
Uid: 1000
|
||||
Phone: 555-1234
|
||||
Group: users
|
||||
Group: sudo
|
||||
Banner:
|
||||
============================
|
||||
This is my ASCII art login message
|
||||
============================
|
||||
|
||||
Person: Bob
|
||||
Uid: 1001
|
||||
Phone: 555-5656
|
||||
Group: users
|
||||
```
|
||||
|
||||
## Syntax
|
||||
|
||||
IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
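The line-splitting step can be sketched in Python as follows. This is a minimal illustration, not part of the spec: the function name `split_lines` is hypothetical, and the choice of `str.splitlines` (which recognizes several line separators beyond `\n` and `\r\n`) is an assumption, since the spec does not list which separators count.

```python
def split_lines(text: str) -> list[str]:
    """Split input into lines and drop trailing whitespace from each line.

    Assumption: any separator recognized by str.splitlines ends a line.
    """
    return [line.rstrip() for line in text.splitlines()]


lines = split_lines("Tag: a  \r\n  body\t\n")
```

Leading whitespace is deliberately preserved, because indentation is significant in later parsing steps.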
|
||||
|
||||
> TODO: possible redraft: sequence of comments, entry headers, and documents, defined by line types (blank, comment, entry header, indented)
|
||||
|
||||
The lines of an IDV document represent a single flat list of Comments and Entries.
|
||||
### Comments
|
||||
|
||||
A **Comment** is any line whose first character is a `#` character. Comment lines are for human use and are ignored by the parser.
|
||||
|
||||
|
@ -26,26 +39,98 @@ A **Comment** is any line whose first character is a `#` character. Comment line
|
|||
# This line is ignored
|
||||
```
|
||||
|
||||
An **Entry**'s first line is unindented and contains the name of a **Category**, up to the first `:` character, followed by a **Distinguisher**. All following lines with indentation, if any, are the entry's **Document**:
|
||||
### Blank Lines
|
||||
|
||||
A **Blank Line** is any line that only contains whitespace. Because trailing whitespace is always trimmed, all Blank Lines are indistinguishable from each other.
|
||||
|
||||
Blank Lines are ignored unless they are part of a Document (see below).
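One way to picture the line types is a small classifier. The sketch below is an assumption-laden illustration: the names `LineKind` and `classify` are hypothetical, the four kinds follow the line types named in the TODO note in the Syntax section, and treating every unindented, non-comment line as an entry header is a guess where the spec is silent.

```python
from enum import Enum


class LineKind(Enum):
    BLANK = "blank"
    COMMENT = "comment"
    INDENTED = "indented"
    HEADER = "header"


def classify(line: str) -> LineKind:
    """Classify a single line by its first character."""
    line = line.rstrip()  # trailing whitespace is never significant
    if not line:
        return LineKind.BLANK
    if line[0] == "#":
        return LineKind.COMMENT
    if line[0].isspace():
        return LineKind.INDENTED
    # Assumption: anything else starts an Entry.
    return LineKind.HEADER
```

Note that an indented line beginning with `#` is classified as indented, not as a comment, because a Comment is defined by its first character.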
|
||||
|
||||
### Entries
|
||||
|
||||
An **Entry** is composed of one or more lines:
|
||||
|
||||
#### Tags
|
||||
|
||||
Each entry begins with a **Tag**, terminated by a colon (`:`). A Tag can contain any characters except leading or trailing whitespace, newlines, and colons:
|
||||
|
||||
```
|
||||
Collection: distinguisher
|
||||
Indented
|
||||
document
|
||||
|
||||
with a blank line
|
||||
Tag:
|
||||
```
|
||||
|
||||
1. The Category and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
|
||||
1. Backslash unescaping is performed on the Category and Distinguisher.
|
||||
#### Distinguishers
|
||||
|
||||
Optionally, a Distinguisher can follow the Tag on the same line. A Distinguisher can contain any characters except leading or trailing whitespace, and newlines:
|
||||
|
||||
```
|
||||
Tag: distinguisher
|
||||
```
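Splitting an entry's first line into its Tag and Distinguisher might look like the sketch below. The function names are hypothetical. It assumes the split happens at the first colon that is not preceded by a backslash escape, and it leaves escape sequences untouched so that unescaping can run as a separate pass.

```python
def _rstrip_unescaped(s: str) -> str:
    """Strip trailing whitespace, but keep a final backslash-escaped space."""
    stripped = s.rstrip()
    trailing_backslashes = len(stripped) - len(stripped.rstrip("\\"))
    if trailing_backslashes % 2 == 1 and len(stripped) < len(s):
        stripped += s[len(stripped)]  # restore the escaped character
    return stripped


def split_header(line: str) -> tuple[str, str]:
    """Return (raw_tag, raw_distinguisher), both trimmed but not unescaped."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif line[i] == ":":
            tag = _rstrip_unescaped(line[:i].lstrip())
            distinguisher = _rstrip_unescaped(line[i + 1:].lstrip())
            return tag, distinguisher
        else:
            i += 1
    raise ValueError(f"no unescaped colon in entry header: {line!r}")
```

Because only the first unescaped colon terminates the Tag, any later colons end up in the Distinguisher unchanged. An absent Distinguisher comes back as an empty string in this sketch.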
|
||||
|
||||
#### Escapes
|
||||
|
||||
Within Tags and Distinguishers, backslash escapes may be used to represent non-permitted or inconvenient characters:
|
||||
|
||||
```
|
||||
Tag With \: And Spaces:
|
||||
|
||||
Tag: \ distinguisher with leading whitespace and\nA newline
|
||||
```
|
||||
|
||||
| Escape sequence | Replacement |
|
||||
| --------------- | ----------------- |
|
||||
| \\_\<space>_ | A literal space |
|
||||
| \\n | A newline |
|
||||
| \\: | A colon (`:`) |
|
||||
| \\\\ | A backslash (`\`) |
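The escape table can be applied with a single left-to-right pass. The following is a sketch only: `ESCAPES` and `unescape` are hypothetical names, and rejecting unknown or dangling escapes with an error is an assumption, since the spec does not yet say how they should be handled.

```python
# Character after the backslash -> replacement, taken from the table above.
ESCAPES = {" ": " ", "n": "\n", ":": ":", "\\": "\\"}


def unescape(text: str) -> str:
    """Replace backslash escapes in a Tag or Distinguisher."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # None when the backslash is the last character
        if nxt not in ESCAPES:
            # Assumption: unknown or dangling escapes are errors.
            raise ValueError(f"invalid escape in {text!r}")
        out.append(ESCAPES[nxt])
    return "".join(out)
```

This function is meant for Tags and Distinguishers only; Document lines are passed through without unescaping.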
|
||||
|
||||
> TODO: additional escapes? i.e., hex or unicode?
|
||||
|
||||
#### Documents
|
||||
|
||||
After the first line of an entry, any indented lines make up the **Document** portion of the entry:
|
||||
|
||||
```
|
||||
Tag: distinguisher
|
||||
First Line
|
||||
Second Line
|
||||
Third Line
|
||||
```
|
||||
|
||||
The first line of a Document defines the Document's indentation: subsequent lines can be indented deeper, but no line may be indented _less_ than the first line. This indentation is removed from the beginning of each line when determining the Document's value.
|
||||
|
||||
Blank Lines cannot carry indentation information. To resolve this ambiguity, Documents may not begin or end with Blank Lines; such lines are ignored. Blank Lines that occur _between_ indented lines _are_ considered part of the Document.
|
||||
|
||||
```
|
||||
Tag:
|
||||
|
||||
The above blank line is ignored.
|
||||
The below blank line is part of the Document.
|
||||
|
||||
The below blank line is ignored.
|
||||
|
||||
Tag:
|
||||
Other stuff
|
||||
```
|
||||
|
||||
Backslash escapes are _not_ processed within a Document. However, backslashes may be processed later, by higher-layered semantics.
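The Document rules (strip surrounding Blank Lines, take the indentation from the first line, remove it from every line, leave backslashes alone) can be sketched as below. The function name `extract_document` is hypothetical. The sketch assumes its input is the list of lines that follow an entry's first line, already trimmed of trailing whitespace, and it assumes indentation is compared as an exact prefix, so mixed tabs and spaces would be rejected.

```python
def extract_document(lines: list[str]) -> list[str]:
    """Turn the raw lines after an entry's first line into Document lines."""
    start, end = 0, len(lines)
    while start < end and lines[start] == "":
        start += 1  # leading Blank Lines are ignored
    while end > start and lines[end - 1] == "":
        end -= 1  # trailing Blank Lines are ignored
    body = lines[start:end]
    if not body:
        return []

    first = body[0]
    indent = first[: len(first) - len(first.lstrip())]
    document = []
    for line in body:
        if line == "":
            document.append("")  # internal Blank Lines are kept
        elif line.startswith(indent):
            document.append(line[len(indent):])  # no unescaping here
        else:
            raise ValueError(f"line indented less than first line: {line!r}")
    return document
```

For example, `extract_document(["", "  First", "    Second", "", "  Third", ""])` keeps the deeper indentation of the second line relative to the first and preserves the internal blank line.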
|
||||
|
||||
In many cases the Document will contain recursive IDV data, and the rules above are designed to play nicely with this case, but it is up to the concrete format to decide how to parse the Document. It could just as easily contain free text, XML, or a base64 blob.
|
||||
|
||||
#### Disambiguations
|
||||
|
||||
1. The Tag and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
|
||||
1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
|
||||
1. The first line of a Document defines the document's indentation: subsequent lines can be indented deeper, but no line may be indented _less_ than the first line.
|
||||
1. It is ambiguous whether blank lines are part of a document or just aesthetic spacing for Entries; to resolve this, blank lines before and after a Document are ignored, but internal blank lines are considered part of the Document.
|
||||
1. Backslash unescaping is **not** performed on the Document. However, backslashes may be processed later, when the document is interpreted.
|
||||
|
||||
## Data Model
|
||||
|
||||
> TODO: tuples, can be interpreted according to patterns
|
||||
Applying minimal interpretation, IDV data can be represented as a list of Entries.
|
||||
|
||||
An Entry can be represented as a 3-tuple of:
|
||||
|
||||
1. a string (the Tag)
|
||||
2. a string (the optional Distinguisher)
|
||||
3. a list of strings (the lines of the Document)
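As an illustration of this data model, the 3-tuple maps naturally onto a named tuple. Everything named here is hypothetical: `Entry`, its field names, the use of `None` for an absent Distinguisher, and the `group_by_tag` helper, which only hints at the kind of interpretation an application might layer on top.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    tag: str
    distinguisher: Optional[str]  # None when absent (an assumption)
    document: list[str]  # Document lines, indentation already removed


def group_by_tag(entries: list[Entry]) -> dict[str, list[Entry]]:
    """Collect entries under their Tag, preserving document order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.tag].append(entry)
    return dict(grouped)


entries = [
    Entry("Tag", "distinguisher", ["First Line", "Second Line", "Third Line"]),
    Entry("Tag", None, ["Other stuff"]),
]
by_tag = group_by_tag(entries)
```

Keeping the parsed result as a flat list preserves both ordering and repeated Tags; grouping is left to the consuming application.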
|
||||
|
||||
How Entries are interpreted by the application is not specified, but see below for some suggested patterns that should line up with things people usually want to do.
|
||||
|
||||
## Patterns
|
||||
|
||||