diff --git a/idv.md b/idv.md index 904a32d..9ad7de1 100644 --- a/idv.md +++ b/idv.md @@ -2,23 +2,36 @@ ## Overview -The Indented Document Values (IDV) format is a text-based, whitespace-sensitive serialization format. +The Indented Document Values (IDV) format is a meta-syntax for machine-readable textual data. IDV is designed to prioritize human readability and writability by minimizing visual noise- there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes. -As a tradeoff, IDV is not a self-describing data format- you have to know what type of data an IDV document represents at the time you parse it. +As a tradeoff, IDV is not a self-describing data format- while it can be used for defining a serialization or configuration format, systems using it need to layer their own semantics on top of it. ### Example -> TODO: need something both concise and nontrivial. LDAP user data is certainly an option +``` +Person: Alice + Uid: 1000 + Phone: 555-1234 + Group: users + Group: sudo + Banner: + ============================ + This is my ASCII art login message + ============================ + +Person: Bob + Uid: 1001 + Phone: 555-5656 + Group: users +``` ## Syntax IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored. -> TODO: possible redraft: sequence of comments, entry headers, and documents, defined by line types (blank, comment, entry header, indented) - -The lines of an IDV document represent a single flat list of Comments and Entries. +### Comments A **Comment** is any line whose first character is a `#` character. Comment lines are for human use and are ignored by the parser. @@ -26,26 +39,98 @@ A **Comment** is any line whose first character is a `#` character. 
Comment line # This line is ignored ``` -An **Entry**'s first line is unindented and contains the name of a **Category**, up to the first `:` character, followed by a **Distinguisher**. All following lines with indentation, if any, are the entry's **Document**: +### Blank Lines + +A **Blank Line** is any line that only contains whitespace. Because trailing whitespace is always trimmed, all Blank Lines are indistinguishable from each other. + +Blank Lines are ignored unless they are part of a Document. (see below) + +### Entries + +An **Entry** is composed of one or more lines: + +#### Tags + +Each entry begins with a **Tag**, terminated by a colon (`:`). A Tag can contain any characters except leading or trailing whitespace, newlines, and colons: ``` -Collection: distinguisher - Indented - document - - with a blank line +Tag: ``` -1. The Category and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact. -1. Backslash unescaping is performed on the Category and Distinguisher. +#### Distinguishers + +Optionally, a Distinguisher can follow the Tag on the same line. A Distinguisher can contain any characters except leading or trailing whitespace, and newlines: + +``` +Tag: distinguisher +``` + +#### Escapes + +Within Tags and Distinguishers, backslash escapes may be used to represent non-permitted or inconvenient characters: + +``` +Tag With \: And Spaces: + +Tag: \ distinguisher with leading whitespace and\nA newline +``` + +| Escape sequence | Replacement | +| --------------- | ----------------- | +| \\_\_ | A literal space | +| \\n | A newline | +| \\: | A colon (`:`) | +| \\\\ | A backslash (`\`) | + +> TODO: additional escapes? ie, hex or unicode? 
+ + #### Documents + +After the first line of an entry, any indented lines make up the **Document** portion of the entry: + +``` +Tag: distinguisher + First Line + Second Line + Third Line +``` + +The first line of a Document defines the Document's indentation- subsequent lines can be indented deeper, but no line may be indented _less_ than the first line. This indentation is removed from the beginning of each line when determining the Document's value. + +Blank Lines cannot carry indentation information. To resolve this ambiguity, Documents may not begin or end with Blank Lines- such lines are ignored. Blank Lines that occur _between_ indented lines _are_ considered part of the Document. + +``` +Tag: + + The above blank line is ignored. + The below blank line is part of the Document. + + The below blank line is ignored. + +Tag: + Other stuff +``` + +Backslash escapes are _not_ processed within a Document. However, backslashes may be processed later, by higher-layered semantics. + +In many cases the Document will contain recursive IDV data, and the rules above are designed to play nicely with this case- but it is up to the concrete format to decide how to parse the Document. It could just as easily contain free text, XML, or a base64 blob. + +#### Disambiguations: + +1. The Tag and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact. 1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning. -1. The first line of a Document defines the document's indentation- subsequent lines can be indented deeper, but no line may be indented _less_ than the first line. -1. It is ambiguous whether blank lines are part of a document or just aesthetic spacing for Entries; to resolve this, blank lines before and after a Document are ignored, but internal blank lines are considered part of the Document. -1. 
Backslash unescaping is **not** performed on the Document. However, backslashes may be processed later, when the document is interpreted. ## Data Model -> TODO: tuples, can be interpreted according to patterns +Applying minimal interpretation, IDV data can be represented as a list of Entries. + +An Entry can be represented as a 3-tuple of: + +1. a string (the Tag) +2. a string (the optional Distinguisher) +3. a list of strings (the lines of the Document) + +How Entries are interpreted by the application is not specified, but see below for some suggested patterns that should line up with things people usually want to do. ## Patterns