Things get a little trickier when the length of a binary string can’t be predicted, but the solution is pretty straightforward: the string is prefixed with useful information about its length. The core of variable-length serialization is the varint pseudotype.
Variable integers
We’ve met 4 integer types so far: int8, int16, int32 and int64. What if we wanted to save space on average, though? With millions of transactions, the blockchain benefits noticeably from a more compact integer serialization, hence the varint type.
A varint may be of any of the above lengths, as long as the length is specified in an additional 1-byte prefix (except for int8): `fd` for 16-bit, `fe` for 32-bit and `ff` for 64-bit values.
8-bit varints have no such prefix because they’re a value per se. A table will hopefully shed some light:
size | value | encoding |
---|---|---|
8-bit | 8c | 8c |
16-bit | 12 a4 | fd 12 a4 |
32-bit | 12 a4 5b 78 | fe 12 a4 5b 78 |
64-bit | 12 a4 5b 78 12 c4 56 d8 | ff 12 a4 5b 78 12 c4 56 d8 |
See how the varint prefix introduces the size of the number coming after. The only limitation of varint8 is that it’s unable to represent the values `fd` through `ff`, as they have a special meaning as prefixes, so a varint16 is required for them.
Check out varint.h for a varint parsing implementation.
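To give a rough idea of what such a parser might look like, here is a minimal sketch. It is not the contents of varint.h: the name `read_varint` and its signature are assumptions made for illustration, and the payload is assumed to be little-endian, consistent with the example in this article where the bytes 7d 80 decode to 807d.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the article's varint.h).
 * Reads a varint from `bytes`, stores its value in `*value` and
 * returns the number of bytes consumed: 1, 3, 5 or 9.
 * Assumes `bytes` holds at least that many bytes.
 */
size_t read_varint(const uint8_t *bytes, uint64_t *value)
{
    size_t payload;                 /* bytes following the prefix */
    size_t i;

    switch (bytes[0]) {
    case 0xfd: payload = 2; break;  /* varint16 */
    case 0xfe: payload = 4; break;  /* varint32 */
    case 0xff: payload = 8; break;  /* varint64 */
    default:                        /* varint8: the byte is the value */
        *value = bytes[0];
        return 1;
    }

    /* Assemble the little-endian payload one byte at a time. */
    *value = 0;
    for (i = 0; i < payload; i++)
        *value |= (uint64_t)bytes[1 + i] << (8 * i);

    return 1 + payload;
}
```

Called on the bytes fd 7d 80, this sketch consumes 3 bytes and yields 807d. The return value is what lets the caller skip to the field that follows the varint.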
Example
Consider the byte string:
13 9c fd 7d 80 44 6b a2 20 cc
as seen in ex-varints.c:
and the corresponding high-level structure:
The struct has 3 fixed-length integers and 1 variable-length integer (by contract). Since varints can hold up to 64-bit values, we need to allocate the largest size. Here’s how we proceed to decode the binary string into the struct:
In other words:
- The first field is an int16: `9c13`.
- Go ahead and move to `bytes + 2` (int16 takes 2 bytes).
- `bytes + 2` holds `fd` and announces a varint16.
- Skip to the following 2 bytes.
- The second field is `807d`.
- Go ahead and move to `bytes + 5` (varint16 takes `varlen = 3` bytes).
- The third field is an int32: `20a26b44`.
- The fourth field is an int8: `cc`.
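The steps above can be sketched in C roughly as follows. This is not the code from ex-varints.c: the struct `example_t`, its field names and the helpers `read_le` and `decode_example` are all hypothetical, chosen only to mirror the walkthrough.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: 3 fixed-length integers and 1 varint. */
typedef struct {
    uint16_t fixed1;
    uint64_t var2;      /* varint: allocate the largest size */
    uint32_t fixed3;
    uint8_t  fixed4;
} example_t;

/* Read an n-byte little-endian unsigned integer (n <= 8). */
uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode the struct from `bytes`; returns the number of bytes consumed. */
size_t decode_example(const uint8_t *bytes, example_t *out)
{
    size_t off = 0;
    size_t varlen;      /* total size of the varint, prefix included */

    out->fixed1 = (uint16_t)read_le(bytes + off, 2);
    off += 2;

    /* The first byte of the varint tells how long it is. */
    switch (bytes[off]) {
    case 0xfd: varlen = 3; break;
    case 0xfe: varlen = 5; break;
    case 0xff: varlen = 9; break;
    default:   varlen = 1; break;
    }
    out->var2 = (varlen == 1) ? bytes[off]
                              : read_le(bytes + off + 1, varlen - 1);
    off += varlen;

    out->fixed3 = (uint32_t)read_le(bytes + off, 4);
    off += 4;

    out->fixed4 = bytes[off];
    off += 1;

    return off;
}
```

Fed the 10-byte string from this example, the sketch fills the fields with 9c13, 807d, 20a26b44 and cc, and reports 10 bytes consumed.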
Variable data
Now that you’re able to read a varint, deserializing variable data is a no-brainer. Technically, variable data is just some binary data prefixed with a varint holding its length. Consider the 13-byte string:
fd 0a 00 e3 03 41 8b a6
20 e1 b7 83 60
as seen in ex-vardata.c:
Here’s the decoding process:
Like in the previous example, we find a varint16 at the beginning of the array holding the value `0a`, that is 10 in decimal base. 10 is the length of the data coming next, so we read 10 bytes starting from `bytes + 3` because a varint16 takes `varlen = 3` bytes. That’s it!
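As a hedged sketch of that process (not the code from ex-vardata.c), the function below locates the payload of a variable-data field. The name `vardata_payload` and its signature are assumptions made for this example, and the length is read as little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: parses the varint length prefix of a
 * variable-data field, stores the payload length in `*len` and
 * returns a pointer to the first payload byte.
 * Assumes `bytes` is long enough to hold the prefix.
 */
const uint8_t *vardata_payload(const uint8_t *bytes, uint64_t *len)
{
    size_t width = 0;   /* size of the length field after the prefix */
    size_t i;

    switch (bytes[0]) {
    case 0xfd: width = 2; break;
    case 0xfe: width = 4; break;
    case 0xff: width = 8; break;
    default:   break;   /* varint8: the byte itself is the length */
    }

    if (width == 0) {
        *len = bytes[0];
        return bytes + 1;
    }

    /* Assemble the little-endian length. */
    *len = 0;
    for (i = 0; i < width; i++)
        *len |= (uint64_t)bytes[1 + i] << (8 * i);

    return bytes + 1 + width;
}
```

For the 13-byte string above, the sketch reports a length of 10 and returns `bytes + 3`, which points at the byte e3. A real parser would also check that the announced length does not exceed the bytes actually available.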
The same applies to variable strings: you just encode them in UTF-8 before serialization.
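Going the other way, serializing a variable string might look like the sketch below. Both `write_varint` and `write_varstr` are hypothetical names, not functions from the article's source. The input is assumed to be a NUL-terminated C string that is already UTF-8 encoded, and the output buffer is assumed to be large enough.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: writes `n` as a varint into `out` using the
 * smallest encoding that fits. Returns bytes written: 1, 3, 5 or 9.
 */
size_t write_varint(uint8_t *out, uint64_t n)
{
    size_t payload;
    size_t i;

    if (n < 0xfd) {                 /* fd..ff are reserved as prefixes */
        out[0] = (uint8_t)n;
        return 1;
    }
    if (n <= 0xffffULL) {
        out[0] = 0xfd;
        payload = 2;
    } else if (n <= 0xffffffffULL) {
        out[0] = 0xfe;
        payload = 4;
    } else {
        out[0] = 0xff;
        payload = 8;
    }

    /* Emit the payload in little-endian order. */
    for (i = 0; i < payload; i++)
        out[1 + i] = (uint8_t)(n >> (8 * i));

    return 1 + payload;
}

/*
 * Hypothetical helper: writes a varint length followed by the raw
 * bytes of a UTF-8 string. Returns the total bytes written.
 */
size_t write_varstr(uint8_t *out, const char *utf8)
{
    size_t len = strlen(utf8);      /* length in bytes, not characters */
    size_t off = write_varint(out, len);

    memcpy(out + off, utf8, len);
    return off + len;
}
```

Note that the prefix counts bytes rather than characters: a string holding the single character é (c3 a9 in UTF-8) gets a length of 2.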
Get the code!
Full source on GitHub.
Next block in chain?
You learned how to serialize variable-length data for the blockchain. You’re fully set to exploit the bigger entities!
In the next article I’ll teach you some concepts about keys and blockchain property.