Record first set of IDV spec thoughts

2024-07-20 00:52:21 -04:00 · 2024-07-20 00:52:21 -04:00 · 763bfbc8cf
commit 763bfbc8cf
parent af07866ac0
1 changed files with 83 additions and 0 deletions
--- a/idv.md
+++ b/idv.md
@@ -0,0 +1,83 @@
+# The Indented Document Values Format
+
+## Overview
+
+The Indented Document Values (IDV) format is a text-based, whitespace-sensitive serialization format.
+
+IDV is designed to prioritize human readability and writability by minimizing visual noise: there are no sigils, quotes, or brackets, only colons, indentation, and (when necessary) backslash escapes.
+
+As a tradeoff, IDV is not a self-describing data format: you must already know what type of data an IDV document represents when you parse it.
+
+### Example
+
+> TODO: need something both concise and nontrivial. LDAP user data is certainly an option
+
+## Syntax
+
+IDV is a line-oriented format. Before any other parsing is done, the input is split into lines, and any trailing whitespace on a line (including line separators) is ignored.
+
+> TODO: possible redraft: sequence of comments, entry headers, and documents, defined by line types (blank, comment, entry header, indented)
+
+The lines of an IDV document represent a single flat list of Comments and Entries.
+
+A **Comment** is any line whose first character is `#`. Comment lines are for human use and are ignored by the parser.
+
+```
+# This line is ignored
+```
+
+An **Entry** begins with an unindented line that contains the name of a **Category**, which runs up to the first `:` character, followed by a **Distinguisher**. Any indented lines that follow form the entry's **Document**:
+
+```
+Category: distinguisher
+  Indented
+  document
+
+  with a blank line
+```
+
+1. The Category and Distinguisher are both trimmed of surrounding whitespace before being interpreted, but internal whitespace is left intact.
+1. Backslash unescaping is performed on the Category and Distinguisher.
+1. The Distinguisher may contain literal colons; these are treated as regular characters and carry no special meaning.
+1. The first line of a Document defines the document's indentation: subsequent lines can be indented deeper, but no line may be indented _less_ than the first line.
+1. It is ambiguous whether blank lines are part of a document or just aesthetic spacing for Entries; to resolve this, blank lines before and after a Document are ignored, but internal blank lines are considered part of the Document.
+1. Backslash unescaping is **not** performed on the Document. However, backslashes may be processed later, when the document is interpreted.
+
+## Data Model
+
+> TODO: tuples, can be interpreted according to patterns
+
+## Patterns
+
+### Primitive Property
+
+> TODO: one of distinguisher | document non-empty, parsing based on expected type
+
+### Object Property
+
+> TODO: distinguisher ignored, document is IDV
+
+### Union Property
+
+> TODO: distinguisher determines how the document is parsed
+
+### List
+
+> TODO: property specified multiple times
+
+### Map
+
+> TODO: distinguisher defines key, document parsed for value
+
+### Property Map
+
+> TODO: Category defines key, parsed as property for value
+
+### Merged Map
+
+## See Also
+
+> TODO:
+>
+> - YAML
+> - dpkg control files
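
The line-level rules in the Syntax section of `idv.md` can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the spec: the names `Entry`, `parse`, `unescape`, and `finish` are hypothetical, the escape rule (a backslash makes the next character literal) is a guess because the draft does not define the escape set, and raising `ValueError` for malformed input is likewise an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    category: str
    distinguisher: str
    document: list[str] = field(default_factory=list)


def unescape(text: str) -> str:
    # Assumption: a backslash makes the following character literal.
    # The draft does not define the escape set yet.
    return re.sub(r"\\(.)", r"\1", text)


def finish(entry: Entry) -> None:
    """Trim outer blank lines and strip the Document's base indentation."""
    lines = entry.document
    while lines and lines[-1] == "":
        lines.pop()
    while lines and lines[0] == "":
        lines.pop(0)
    if not lines:
        return
    # The first Document line defines the indentation for the whole Document.
    prefix = lines[0][: len(lines[0]) - len(lines[0].lstrip())]
    for i, line in enumerate(lines):
        if line == "":
            continue  # internal blank lines are kept as part of the Document
        if not line.startswith(prefix):
            raise ValueError(f"line indented less than the first line: {line!r}")
        lines[i] = line[len(prefix):]


def parse(text: str) -> list[Entry]:
    entries: list[Entry] = []
    for raw in text.splitlines():
        line = raw.rstrip()  # trailing whitespace is ignored
        if line.startswith("#"):
            continue  # Comment: first character is '#'
        if line == "":
            if entries:
                entries[-1].document.append("")  # outer blanks trimmed in finish()
            continue
        if line[0].isspace():
            if not entries:
                raise ValueError("indented line before any entry header")
            entries[-1].document.append(line)
            continue
        # Entry header: the Category runs up to the first ':' character.
        category, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"entry header without a colon: {line!r}")
        entries.append(Entry(unescape(category.strip()), unescape(rest.strip())))
    for entry in entries:
        finish(entry)
    return entries
```

Calling `parse` on the Entry example from the Syntax section should yield one `Entry` whose `document` holds the three text lines and the internal blank line, with the two-space base indentation removed. Comments are dropped rather than kept in the result, which is a simplification of the "flat list of Comments and Entries" wording.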