Enough theory, it’s time for some real code! Before treating full-blown entities, though, you need to learn how to serialize generic data through a small set of primitives. Think of C variables: we certainly want to know what int and char mean before defining a custom struct.
Remember, little-endian is the default byte order. Code examples may include common.h for general-purpose routines and endian.h for naïve endian conversions. Hash functions are defined in hash.h with help from OpenSSL. From now on, I expect you to be comfortable with pointer arithmetic.
Integers
First of all, there’s no use for negative integers in the blockchain. Integers are always unsigned and can hold 8-bit, 16-bit, 32-bit or 64-bit values.
In ex-integers.c we serialize “n8 + n16 + n32 + n64” into (1 + 2 + 4 + 8) = 15 bytes. When storing single bytes we don’t care about endianness, but we must in all other cases. That’s why little-endian order has to be enforced for multibyte values.
If the machine is little-endian, the numbers are stored without additional manipulation. If not, their bytes are reversed.
The resulting ser array (15 bytes):
01
23 45
67 89 ab cd
ef 12 34 56 78 9a bc de
Fixed-length data
By fixed-length data I mean data whose length is known in advance and therefore doesn’t need to be serialized along with it. In actual code, memcpy is all we need to serialize such binary data.
Null-padded strings
Fixed-length strings are encoded in UTF-8 and padded with \0 characters up to the desired length. This is the case with the Bitcoin p2p protocol, where messages are identified by human-readable names like version, tx, getblocks, etc., with a maximum length of 12 characters.
In ex-fixed-strings.c we serialize “n32 + str + n16” into (4 + 10 + 2) = 16 bytes. We can safely assume that ASCII strings encode to raw bytes for free, since ASCII is a subset of UTF-8. The actual string length is required to compute the padding: “FooBar” is 6 bytes long, so it gets 10 - 6 = 4 null bytes.
Final packing:
The resulting ser array (16 bytes):
8b a3 f7 68
46 6f 6f 42 61 72 00 00 00 00
12 ee
Hashes
Hashes are another typical example of fixed-length data. In ex-hashes.c (requires OpenSSL) we serialize “prefix + hash256(message) + suffix” into (2 + 32 + 1) = 35 bytes, where hash256 is SHA-256 applied twice. We first calculate the SHA-256 digest of the message.
The SHA-256 algorithm yields a 256-bit digest, so we allocate an array of 32 bytes in advance. OpenSSL’s SHA256_DIGEST_LENGTH would be equivalent here, but I want to be as explicit as possible. The SHA-256 digest for the “Hello Bitcoin!” string is:
51 8a d5 a3 75 fa 52 f8
4b 2b 3d f7 93 3a d6 85
eb 62 cf 69 86 9a 96 73
15 61 f9 4d 10 82 6b 5c
By hashing the digest again, we get the hash256 digest:
90 98 6e a4 e2 8b 84 7c
c7 f9 be ba 87 ea 81 b2
21 ca 6e af 98 28 a8 b0
4c 29 0c 21 d8 91 bc da
with 90 being the most significant byte (MSB), because SHA-256 works big-endian.
Final packing:
The resulting ser array (35 bytes):
7f d1
90 98 6e a4 e2 8b 84 7c
c7 f9 be ba 87 ea 81 b2
21 ca 6e af 98 28 a8 b0
4c 29 0c 21 d8 91 bc da
8c
Get the code!
Full source on GitHub.
Next block in chain?
You learned how to serialize fixed-length data for the blockchain.
In the second part we’ll deal with variable-length data.