Changes between Version 18 and Version 19 of StructuredBinaryData


Timestamp: 2012-08-06T21:42:18Z
Author: Sean Bartell
Comment: Document bit blobs and add future features

  • StructuredBinaryData

    v18 v19
     68  68  integer, and a string holds a Unicode string.
     69  69
     70      A blob node represents an arbitrary sequence of raw bytes. Blob nodes are
     71      polymorphic, allowing any source of raw binary data to be used. Bithenge
         70  A blob node represents an arbitrary sequence of raw bytes or bits. Blob nodes
         71  are polymorphic, allowing any source of raw binary data to be used. Bithenge
     72  72  includes blob node implementations for in‐memory buffers, files, and block
     73  73  devices. An implementation has also been written that reads another task’s
     
    115 115  which takes a 4‐byte blob node as the input tree and provides an integer node
    116 116  as the output tree. Another example would be `FAT16_filesystem`, a transform
    117      that takes a blob node as the input tree and provides a complex output tree
    118      with various decoded information about the filesystem. Some transforms, like
    119      `uint32le`, are built into Bithenge; more complicated transforms can be loaded
    120      from a script file.
        117  that takes a byte blob node as the input tree and provides a complex output
        118  tree with various decoded information about the filesystem. Some transforms,
        119  like `uint32le`, are built into Bithenge; more complicated transforms can be
        120  loaded from a script file.
    121 121
    122 122  Transforms are represented in Bithenge with a polymorphic object. The primary
     
    132 132
    133 133  ||= name =||= input =||= output =||= description =||= example =||
    134      ||ascii             ||blob node        ||string       ||decodes some bytes as ASCII characters ||  `hex:6869` becomes `"hi"` ||
    135      ||known_length(len) ||blob node        ||blob node    ||requires the input to have a known length || ||
    136      ||nonzero_boolean   ||integer          ||boolean      ||decodes a boolean where nonzero values are true || `0` becomes `false` ||
    137      ||uint8             ||1‐byte blob node ||integer node ||decodes a 1‐byte unsigned integer ||  `hex:11` becomes `17` ||
    138      ||uint16be          ||2‐byte blob node ||integer node ||decodes a 2‐byte big‐endian unsigned integer ||  `hex:0201` becomes `513` ||
    139      ||uint16le          ||2‐byte blob node ||integer node ||decodes a 2‐byte little‐endian unsigned integer ||  `hex:0101` becomes `257` ||
    140      ||uint32be          ||4‐byte blob node ||integer node ||decodes a 4‐byte big‐endian unsigned integer ||  `hex:00000201` becomes `513` ||
    141      ||uint32le          ||4‐byte blob node ||integer node ||decodes a 4‐byte little‐endian unsigned integer ||  `hex:01010000` becomes `257` ||
    142      ||uint64be          ||8‐byte blob node ||integer node ||decodes an 8‐byte big‐endian unsigned integer ||  `hex:0000000000000201` becomes `513` ||
    143      ||uint64le          ||8‐byte blob node ||integer node ||decodes an 8‐byte little‐endian unsigned integer ||  `hex:0101000000000000` becomes `257` ||
    144      ||zero_terminated   ||blob node        ||blob node    ||takes bytes up until the first `00` ||  `hex:7f0400` becomes `hex:7f04` ||
        134  ||ascii             ||byte blob node   ||string        ||decodes some bytes as ASCII characters ||  `hex:6869` becomes `"hi"` ||
        135  ||bit               ||1‐bit blob node  ||boolean       ||decodes a single bit || `1` becomes `true` ||
        136  ||bits_be           ||byte blob node   ||bit blob node ||decodes bytes as bits, starting with the most‐significant bit || `hex:0f` becomes `bit:00001111` ||
        137  ||bits_le           ||byte blob node   ||bit blob node ||decodes bytes as bits, starting with the least‐significant bit || `hex:0f` becomes `bit:11110000` ||
        138  ||known_length(len) ||blob node        ||blob node     ||requires the input to have a known length || ||
        139  ||nonzero_boolean   ||integer          ||boolean       ||decodes a boolean where nonzero values are true || `0` becomes `false` ||
        140  ||uint8             ||1‐byte blob node ||integer node  ||decodes a 1‐byte unsigned integer ||  `hex:11` becomes `17` ||
        141  ||uint16be          ||2‐byte blob node ||integer node  ||decodes a 2‐byte big‐endian unsigned integer ||  `hex:0201` becomes `513` ||
        142  ||uint16le          ||2‐byte blob node ||integer node  ||decodes a 2‐byte little‐endian unsigned integer ||  `hex:0101` becomes `257` ||
        143  ||uint32be          ||4‐byte blob node ||integer node  ||decodes a 4‐byte big‐endian unsigned integer ||  `hex:00000201` becomes `513` ||
        144  ||uint32le          ||4‐byte blob node ||integer node  ||decodes a 4‐byte little‐endian unsigned integer ||  `hex:01010000` becomes `257` ||
        145  ||uint64be          ||8‐byte blob node ||integer node  ||decodes an 8‐byte big‐endian unsigned integer ||  `hex:0000000000000201` becomes `513` ||
        146  ||uint64le          ||8‐byte blob node ||integer node  ||decodes an 8‐byte little‐endian unsigned integer ||  `hex:0101000000000000` becomes `257` ||
        147  ||uint_be(len)      ||bit blob node    ||integer node  ||decodes bits as an unsigned integer, starting with the most‐significant bit || ||
        148  ||uint_le(len)      ||bit blob node    ||integer node  ||decodes bits as an unsigned integer, starting with the least‐significant bit || ||
        149  ||zero_terminated   ||byte blob node   ||byte blob node||takes bytes up until the first `00` ||  `hex:7f0400` becomes `hex:7f04` ||
    145 150
    146 151  == Basic syntax ==
     
    267 272     could pass the whole blob as a parameter and apply transforms to subblobs.
    268 273     This is essential for non‐sequential blobs like filesystems.
    269       Bitfields:: `struct` will be extended to work with bits instead of just bytes.
    270 274   Complex expressions:: Expressions that use operators or call transforms.
        275   Accessing outer fields:: Expressions can use previously decoded fields of the
        276     current `struct`, but they need a way to access the previously decoded
        277     fields of an outer `struct`.
    271 278   Assertions:: These could be implemented as transforms that don't actually
    272 279     change the input. There could be multiple levels, ranging from “warning” to
     
    274 281   Enumerations:: An easier way to handle many constant values, like
    275 282     `enum { 0: "none", 1: "file", 2: "directory", 3: "symlink" }`.
        283   Merge blobs and internal nodes:: Currently, `struct`, `repeat`, and so on only
        284     work with blobs, which must be either byte sequences or bit sequences.
        285     Numbered internal nodes (such as those made by `repeat`) should be supported
        286     as well.
    276 287   Transforming internal nodes:: After binary data is decoded into a tree, it
    277 288     should be possible to apply further transforms to interpret the data
     
    279 290     filesystem have been decoded, a further transform could determine the data
    280 291     for each file.
        292   More information in repeat subtransforms:: Repeat subtransforms should have
        293     access to the current index and previously decoded items.
    281 294   Hidden fields:: Some fields, such as length fields, are no longer interesting
    282 295     after the data is decoded, so they should be hidden by default.
     
    290 303     decoded within a blob. There would need to be some sort of scoping to
    291 304     determine which transforms have the automatic parameters.
        305   Smarter length calculation:: Bithenge should automatically detect the length
        306     of certain composed transforms, such as `repeat(8) {bit} <- bits_le`. This
        307     would also be addressed by the constraint‐based version.
    292 308
    293 309  === Constraint‐based version ===
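The bit blob transforms added in this revision (`bits_be`, `bits_le`, `bit`, `uint_be(len)` and `uint_le(len)`) can be illustrated with a short Python sketch. This is not Bithenge code and uses no Bithenge API. The function names only mirror the transform names. A bit blob is assumed to be a plain list of 0/1 integers. The `uint_be`/`uint_le` stand‐ins read every bit they are given instead of taking a `len` parameter. Under those assumptions, the sketch reproduces the `hex:0f` examples from the table and approximates the composition `repeat(8) {bit} <- bits_le`.

```python
# Illustrative sketch only; not Bithenge's implementation or API.
# A "bit blob" is modelled here as a plain list of 0/1 integers (an assumption).


def bits_be(data: bytes) -> list:
    """Like `bits_be`: each byte becomes 8 bits, most-significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_le(data: bytes) -> list:
    """Like `bits_le`: each byte becomes 8 bits, least-significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def bit(bits: list) -> bool:
    """Like `bit`: decodes a 1-bit blob as a boolean."""
    if len(bits) != 1:
        raise ValueError("bit expects exactly one bit")
    return bits[0] == 1


def uint_be(bits: list) -> int:
    """Like `uint_be(len)` with len == len(bits): first bit is most significant."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def uint_le(bits: list) -> int:
    """Like `uint_le(len)` with len == len(bits): first bit is least significant."""
    return sum(b << i for i, b in enumerate(bits))


def show(bits: list) -> str:
    """Format bits the way the table writes them, e.g. bit:00001111."""
    return "bit:" + "".join(str(b) for b in bits)


data = bytes.fromhex("0f")
print(show(bits_be(data)))  # bit:00001111
print(show(bits_le(data)))  # bit:11110000

# Roughly `repeat(8) {bit} <- bits_le`: split into 1-bit blobs, decode each.
print([bit([b]) for b in bits_le(data)])  # [True, True, True, True, False, False, False, False]

# Bit order decides which end of the byte a short field comes from.
print(uint_be(bits_be(data)[:4]), uint_le(bits_le(data)[:4]))  # 0 15
```

Under these assumptions, applying `uint_be` to the output of `bits_be` matches `uint16be` on the same bytes (`hex:0201` gives 513). Likewise, applying `uint_le` to the output of `bits_le` matches `uint16le` (`hex:0101` gives 257). Both results follow from the sketch's definitions; whether Bithenge composes these transforms in exactly this way is not stated here.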