diff --git a/idv.md b/idv.md
new file mode 100644
index 0000000..904a32d
--- /dev/null
+++ b/idv.md
@@ -0,0 +1,83 @@
+# The Indented Document Values Format
+
+## Overview
+
+The Indented Document Values (IDV) format is a text-based, whitespace-sensitive serialization format.
+
+IDV is designed to prioritize human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
+
+As a tradeoff, IDV is not a self-describing data format: you have to know what type of data an IDV document represents at the time you parse it.
+
+### Example
+
+> TODO: need something both concise and nontrivial. LDAP user data is certainly an option
+
+## Syntax
+
+IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
+
+> TODO: possible redraft: sequence of comments, entry headers, and documents, defined by line types (blank, comment, entry header, indented)
+
+The lines of an IDV document represent a single flat list of Comments and Entries.
+
+A **Comment** is any line whose first character is `#`. Comment lines are for human use and are ignored by the parser.
+
+```
+# This line is ignored
+```
+
+An **Entry**'s first line is unindented and contains the name of a **Category**, up to the first `:` character, followed by a **Distinguisher**. All following lines with indentation, if any, are the entry's **Document**:
+
+```
+Collection: distinguisher
+  Indented
+  document
+
+  with a blank line
+```
+
+1. The Category and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
+1. Backslash unescaping is performed on the Category and Distinguisher.
+1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
+1. The first line of a Document defines the document's indentation: subsequent lines can be indented deeper, but no line may be indented _less_ than the first line.
+1. Blank lines could be read either as part of a Document or as aesthetic spacing between Entries; to resolve this ambiguity, blank lines before and after a Document are ignored, but internal blank lines are considered part of the Document.
+1. Backslash unescaping is **not** performed on the Document. However, backslashes may be processed later, when the document is interpreted.
+
+## Data Model
+
+> TODO: tuples, can be interpreted according to patterns
+
+## Patterns
+
+### Primitive Property
+
+> TODO: one of distinguisher | document non-empty, parsing based on expected type
+
+### Object Property
+
+> TODO: distinguisher ignored, document is IDV
+
+### Union Property
+
+> TODO: distinguisher determines how the document is parsed
+
+### List
+
+> TODO: property specified multiple times
+
+### Map
+
+> TODO: distinguisher defines key, document parsed for value
+
+### Property Map
+
+> TODO: Category defines key, parsed as property for value
+
+### Merged Map
+
+## See Also
+
+> TODO:
+>
+> - yaml
+> - dpkg control files
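
Review note: the line-level rules in the Syntax section can be sketched as a small parser, which may help check the wording against concrete behavior. This is a minimal, non-authoritative Python sketch. The names `Entry`, `parse_idv`, and `_dedent` are hypothetical and not part of the proposal. It assumes that an unindented line without a colon, and an indented line before any entry, are errors (the text does not say), and it leaves out backslash unescaping because the escape set is not defined in the draft.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One Entry: Category, Distinguisher, and the lines of its Document."""
    category: str
    distinguisher: str
    document: list[str] = field(default_factory=list)


def _dedent(lines: list[str]) -> list[str]:
    """Drop outer blank lines and strip the first line's indentation."""
    while lines and not lines[0]:
        lines.pop(0)   # blank lines before the Document are ignored
    while lines and not lines[-1]:
        lines.pop()    # blank lines after the Document are ignored
    if not lines:
        return []
    first = lines[0]
    indent = first[: len(first) - len(first.lstrip())]
    result = []
    for line in lines:
        if not line:
            result.append("")  # internal blank lines are kept
        elif line.startswith(indent):
            result.append(line[len(indent):])
        else:
            raise ValueError(f"line indented less than the first: {line!r}")
    return result


def parse_idv(text: str) -> list[Entry]:
    """Parse text into a flat list of entries (comments are skipped)."""
    entries: list[Entry] = []
    raw_docs: list[list[str]] = []  # indented and blank lines per entry
    for raw in text.splitlines():
        line = raw.rstrip()  # trailing whitespace is ignored
        if line.startswith("#"):
            continue  # Comment line
        if not line or line[0].isspace():
            if raw_docs:
                raw_docs[-1].append(line)
            elif line:
                # Assumption: the draft does not cover this case.
                raise ValueError("indented line before any entry")
            continue
        # Entry header: Category up to the first ':', then the Distinguisher.
        category, sep, distinguisher = line.partition(":")
        if not sep:
            # Assumption: a header without a colon is rejected.
            raise ValueError(f"entry header without ':': {line!r}")
        entries.append(Entry(category.strip(), distinguisher.strip()))
        raw_docs.append([])
    for entry, lines in zip(entries, raw_docs):
        entry.document = _dedent(lines)
    return entries


# Usage: the example entry from the Syntax section.
sample = """\
# This line is ignored
Collection: distinguisher
  Indented
  document

  with a blank line
"""
entries = parse_idv(sample)
```

Collecting raw lines first and trimming them afterwards keeps the blank-line rule simple: a blank line is only known to be internal once a later indented line has been seen.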