
The MultiversX Serialization Format

In MultiversX, there is a specific serialization format for all data that interacts with a smart contract. The serialization format is central to any project because all values entering and exiting a contract are represented as byte arrays that need to be interpreted according to a consistent specification.

In Rust, the multiversx-sc-codec crate deals exclusively with this format. Both the Go and Rust implementations of scenarios have a component that serializes to this format. DApp developers need to be aware of this format when interacting with the smart contract on the backend.

Rationale

We want the format to be somewhat readable and to interact with the rest of the blockchain ecosystem as easily as possible. This is why we have chosen big endian representation for all numeric types.

More importantly, the format needs to be as compact as possible, since each additional byte costs additional fees.

The concept of top-level vs. nested objects

There is a perk that is central to the formatter: we know the size of the byte arrays entering the contract. All arguments have a known size in bytes, and we normally learn the length of storage values before loading the value itself into the contract. This gives us some additional data straight away that allows us to encode less.

Imagine that we have an argument of type int32. During a smart contract call we want to transmit the value "5" to it. A standard deserializer might expect us to send the full 4 bytes 0x00000005, but there is clearly no need for the leading zeroes. It is a single argument and we know where to stop, so there is no risk of reading too much. Sending 0x05 is enough, and we have saved 3 bytes. Here we say that the integer is represented in its top-level form: it exists on its own and can be represented more compactly.

But now imagine an argument that deserializes as a vector of int32. The numbers are serialized one after the other. We can no longer use variable-length integers, because we would not know where one number ends and the next begins. Should we interpret 0x0101 as [1, 1] or [257]? The solution is to always represent each integer in its full 4-byte form. [1, 1] is thus represented as 0x0000000100000001 and [257] as 0x00000101, so there is no more ambiguity. The integers here are said to be in their nested form: because they are part of a larger structure, the length of their representation must be apparent from the encoding.

But what about the vector itself? Its representation must always be a multiple of 4 bytes in length, so from the representation we can always deduce the length of the vector by dividing the number of bytes by 4. If the encoded byte length is not divisible by 4, this is a deserialization error. Because the vector is top-level we don't have to worry about encoding its length, but if the vector itself gets embedded into an even larger structure, this can be a problem. If, for instance, the argument is a vector of vectors of int32, each nested vector also needs to have its length encoded before its data.
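
The distinction between the two forms can be sketched in plain Rust. This is only an illustration using the standard library, not the multiversx-sc-codec API; the function names are hypothetical.

```rust
/// Top-level form of a u32: big-endian bytes with leading zero bytes dropped.
fn top_encode_u32(value: u32) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    let start = bytes.iter().position(|&b| b != 0).unwrap_or(bytes.len());
    bytes[start..].to_vec()
}

/// Nested form of a u32: always the full 4 bytes, big endian.
fn nested_encode_u32(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}

/// Top-level Vec<u32>: nested forms of the items, concatenated, no length prefix.
fn top_encode_vec_u32(items: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(items.len() * 4);
    for &item in items {
        out.extend_from_slice(&nested_encode_u32(item));
    }
    out
}

/// Decodes a top-level Vec<u32>; the item count is the byte length divided by 4.
fn top_decode_vec_u32(bytes: &[u8]) -> Result<Vec<u32>, &'static str> {
    if bytes.len() % 4 != 0 {
        return Err("byte length is not divisible by 4");
    }
    Ok(bytes
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

With these helpers, `top_encode_u32(5)` yields the single byte 0x05, while `top_encode_vec_u32(&[1, 1])` yields the 8 bytes 0x0000000100000001, and decoding the 2-byte input 0x0101 as a vector is rejected.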

A note about the value zero

We are used to writing the number zero as "0" or "0x00", but if we think about it, we don't need 1 byte to represent it: 0 bytes, or an "empty byte array", represent the number 0 just as well. Just as the leading 0 byte in 0x0005 is superfluous, the single byte 0x00 is itself nothing more than an unnecessary leading zero.

With this being said, the format always top-encodes zeroes of any type as empty byte arrays. In nested form, a zero still occupies whatever width its type requires.
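
A decoder gets this behaviour almost for free. The sketch below uses a hypothetical helper, not the crate's API, and assumes that inputs with redundant leading zero bytes are accepted. It folds the bytes into an accumulator, so the empty byte array decodes to zero without any special case.

```rust
/// Top-decodes an unsigned integer of at most 8 bytes (hypothetical helper).
fn top_decode_u64(bytes: &[u8]) -> Result<u64, &'static str> {
    if bytes.len() > 8 {
        return Err("input too long for u64");
    }
    // Folding over zero bytes leaves the accumulator at 0,
    // so the empty input and the input [0x00] both decode to zero.
    Ok(bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}
```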

How each type gets serialized

Fixed-width numbers

Small numbers can be stored in variables of up to 64 bits.

Rust types: u8, u16, u32, usize, u64, i8, i16, i32, isize, i64.

Top-encoding: The same as for all numerical types, the minimum number of bytes that can fit their 2's complement, big endian representation.

Nested encoding: Fixed width big endian encoding of the type, using 2's complement.

info

A note about the types usize and isize: these Rust-specific types have the width of the underlying architecture, i.e. 32 on 32-bit systems and 64 on 64-bit systems. However, smart contracts always run on a wasm32 architecture, so these types will always be identical to u32 and i32 respectively. Even when simulating smart contract execution on 64-bit systems, they must still be serialized on 32 bits.

Examples

| Type | Number | Top-level encoding | Nested encoding |
|---|---|---|---|
| u8 | 0 | 0x | 0x00 |
| u8 | 1 | 0x01 | 0x01 |
| u8 | 0x11 | 0x11 | 0x11 |
| u8 | 255 | 0xFF | 0xFF |
| u16 | 0 | 0x | 0x0000 |
| u16 | 0x11 | 0x11 | 0x0011 |
| u16 | 0x1122 | 0x1122 | 0x1122 |
| u32 | 0 | 0x | 0x00000000 |
| u32 | 0x11 | 0x11 | 0x00000011 |
| u32 | 0x1122 | 0x1122 | 0x00001122 |
| u32 | 0x112233 | 0x112233 | 0x00112233 |
| u32 | 0x11223344 | 0x11223344 | 0x11223344 |
| u64 | 0 | 0x | 0x0000000000000000 |
| u64 | 0x11 | 0x11 | 0x0000000000000011 |
| u64 | 0x1122 | 0x1122 | 0x0000000000001122 |
| u64 | 0x112233 | 0x112233 | 0x0000000000112233 |
| u64 | 0x11223344 | 0x11223344 | 0x0000000011223344 |
| u64 | 0x1122334455 | 0x1122334455 | 0x0000001122334455 |
| u64 | 0x112233445566 | 0x112233445566 | 0x0000112233445566 |
| u64 | 0x11223344556677 | 0x11223344556677 | 0x0011223344556677 |
| u64 | 0x1122334455667788 | 0x1122334455667788 | 0x1122334455667788 |
| usize | 0 | 0x | 0x00000000 |
| usize | 0x11 | 0x11 | 0x00000011 |
| usize | 0x1122 | 0x1122 | 0x00001122 |
| usize | 0x112233 | 0x112233 | 0x00112233 |
| usize | 0x11223344 | 0x11223344 | 0x11223344 |
| i8 | 0 | 0x | 0x00 |
| i8 | 1 | 0x01 | 0x01 |
| i8 | -1 | 0xFF | 0xFF |
| i8 | 127 | 0x7F | 0x7F |
| i8 | -128 | 0x80 | 0x80 |
| i8 | -0x11 | 0xEF | 0xEF |
| i16 | -1 | 0xFF | 0xFFFF |
| i16 | -0x11 | 0xEF | 0xFFEF |
| i16 | -0x1122 | 0xEEDE | 0xEEDE |
| i32 | -1 | 0xFF | 0xFFFFFFFF |
| i32 | -0x11 | 0xEF | 0xFFFFFFEF |
| i32 | -0x1122 | 0xEEDE | 0xFFFFEEDE |
| i32 | -0x112233 | 0xEEDDCD | 0xFFEEDDCD |
| i32 | -0x11223344 | 0xEEDDCCBC | 0xEEDDCCBC |
| i64 | -1 | 0xFF | 0xFFFFFFFFFFFFFFFF |
| i64 | -0x11 | 0xEF | 0xFFFFFFFFFFFFFFEF |
| i64 | -0x1122 | 0xEEDE | 0xFFFFFFFFFFFFEEDE |
| i64 | -0x112233 | 0xEEDDCD | 0xFFFFFFFFFFEEDDCD |
| i64 | -0x11223344 | 0xEEDDCCBC | 0xFFFFFFFFEEDDCCBC |
| i64 | -0x1122334455 | 0xEEDDCCBBAB | 0xFFFFFFEEDDCCBBAB |
| i64 | -0x112233445566 | 0xEEDDCCBBAA9A | 0xFFFFEEDDCCBBAA9A |
| i64 | -0x11223344556677 | 0xEEDDCCBBAA9989 | 0xFFEEDDCCBBAA9989 |
| i64 | -0x1122334455667788 | 0xEEDDCCBBAA998878 | 0xEEDDCCBBAA998878 |
| isize | 0 | 0x | 0x00000000 |
| isize | -1 | 0xFF | 0xFFFFFFFF |
| isize | -0x11 | 0xEF | 0xFFFFFFEF |
| isize | -0x1122 | 0xEEDE | 0xFFFFEEDE |
| isize | -0x112233 | 0xEEDDCD | 0xFFEEDDCD |
| isize | -0x11223344 | 0xEEDDCCBC | 0xEEDDCCBC |
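
To see why a single byte such as 0xEF can stand for a negative number, here is a sketch of top-decoding a signed value by sign extension. The helper name is hypothetical, and for simplicity it always decodes into an i64; a real decoder would also check that the result fits the target type.

```rust
/// Top-decodes a signed integer of at most 8 bytes by sign-extending it.
fn top_decode_i64(bytes: &[u8]) -> Result<i64, &'static str> {
    if bytes.len() > 8 {
        return Err("input too long for i64");
    }
    // Empty input means zero; otherwise the top bit of the first byte is the sign.
    let negative = bytes.first().map_or(false, |&b| (b & 0x80) != 0);
    // Pre-fill with the sign byte, then copy the input into the low-order end.
    let mut buf = if negative { [0xFFu8; 8] } else { [0u8; 8] };
    buf[8 - bytes.len()..].copy_from_slice(bytes);
    Ok(i64::from_be_bytes(buf))
}
```

Under this sketch, 0xEF decodes to -0x11 and 0xEEDE to -0x1122, matching the rows above, while 0x0080 decodes to 128 and 0x80 to -128.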

Arbitrary width (big) numbers

For most smart contract applications, numbers larger than the maximum uint64 value are needed. EGLD balances, for instance, are represented as fixed-point decimal numbers with 18 decimals. This means that even 1 EGLD is represented as the number 10^18, so a regular 64-bit unsigned integer already overflows at roughly 18.4 EGLD.

Rust types: BigUint, BigInt.

info

These types are managed by the MultiversX VM. In many cases the contract never sees the data, only a handle. This reduces the burden on the smart contract.

Top-encoding: The same as for all numerical types, the minimum number of bytes that can fit their big endian representation (2's complement for the signed BigInt).

Nested encoding: Since these types are variable length, we need to encode their length, so that the decoder knows when to stop decoding. The length of the encoded number always comes first, on 4 bytes (usize/u32). Next we encode:

  • For BigUint the big endian bytes
  • For BigInt the shortest 2's complement number that can unambiguously represent the number. Positive numbers must always have the most significant bit 0, while the negative ones 1. See examples below.

Examples

| Type | Number | Top-level encoding | Nested encoding | Explanation |
|---|---|---|---|---|
| BigUint | 0 | 0x | 0x00000000 | The length of 0 is considered 0. |
| BigUint | 1 | 0x01 | 0x0000000101 | 1 can be represented on 1 byte, so the length is 1. |
| BigUint | 256 | 0x0100 | 0x000000020100 | 256 is the smallest number that takes 2 bytes. |
| BigInt | 0 | 0x | 0x00000000 | Signed 0 is also represented as zero-length bytes. |
| BigInt | 1 | 0x01 | 0x0000000101 | Signed 1 is also represented as 1 byte. |
| BigInt | -1 | 0xFF | 0x00000001FF | The shortest 2's complement representation of -1 is FF. The most significant bit is 1. |
| BigUint | 127 | 0x7F | 0x000000017F | |
| BigInt | 127 | 0x7F | 0x000000017F | |
| BigUint | 128 | 0x80 | 0x0000000180 | |
| BigInt | 128 | 0x0080 | 0x000000020080 | The most significant bit of this number is 1, so to avoid ambiguity an extra 0 byte needs to be prepended. |
| BigInt | 255 | 0x00FF | 0x0000000200FF | Same as above. |
| BigInt | 256 | 0x0100 | 0x000000020100 | 256 requires 2 bytes to represent, of which the MSB is 0, no more need to prepend a 0 byte. |
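
The rules above can be sketched with u128 and i128 standing in for BigUint and BigInt. This is an assumption made purely to keep the example self-contained: the real types are arbitrary width and are handled by the VM. All function names are hypothetical.

```rust
/// Shortest unsigned big-endian bytes (stand-in for a BigUint top-encoding).
fn big_uint_top(value: u128) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    let start = bytes.iter().position(|&b| b != 0).unwrap_or(bytes.len());
    bytes[start..].to_vec()
}

/// Shortest two's complement big-endian bytes (stand-in for a BigInt top-encoding).
fn big_int_top(value: i128) -> Vec<u8> {
    if value == 0 {
        return Vec::new();
    }
    let bytes = value.to_be_bytes();
    let mut start = 0;
    // Drop a leading byte only while it is pure sign extension of the next byte,
    // so that the remaining most significant bit still carries the sign.
    while start + 1 < bytes.len() {
        let next_is_negative = (bytes[start + 1] & 0x80) != 0;
        let redundant = (bytes[start] == 0x00 && !next_is_negative)
            || (bytes[start] == 0xFF && next_is_negative);
        if !redundant {
            break;
        }
        start += 1;
    }
    bytes[start..].to_vec()
}

/// Nested form: 4-byte big-endian length, followed by the top-encoded bytes.
fn with_length_prefix(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}
```

For example, `big_uint_top(128)` gives 0x80, whereas `big_int_top(128)` gives 0x0080, because the signed form must keep its most significant bit at 0.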

Boolean values

Booleans are serialized the same as a byte (u8) that can take values 1 or 0.

Rust type: bool

Values

| Type | Value | Top-level encoding | Nested encoding |
|---|---|---|---|
| bool | true | 0x01 | 0x01 |
| bool | false | 0x | 0x00 |

Lists of items

This is an umbrella term for all types of lists or arrays of various item types. They all serialize the same way.

Rust types: &[T], Vec<T>, Box<[T]>, LinkedList<T>, VecMapper<T>, etc.

Top-encoding: All nested encodings of the items, concatenated.

Nested encoding: First, the length of the list, encoded on 4 bytes (usize/u32). Then, all nested encodings of the items, concatenated.

Examples

| Type | Value | Top-level encoding | Nested encoding | Explanation |
|---|---|---|---|---|
| Vec<u8> | vec![1, 2] | 0x0102 | 0x00000002 0102 | Length = 2 |
| Vec<u16> | vec![1, 2] | 0x00010002 | 0x00000002 00010002 | Length = 2 |
| Vec<u16> | vec![] | 0x | 0x00000000 | Length = 0 |
| Vec<u32> | vec![7] | 0x00000007 | 0x00000001 00000007 | Length = 1 |
| Vec<Vec<u32>> | vec![vec![7]] | 0x00000001 00000007 | 0x00000001 00000001 00000007 | There is 1 element, which is a vector. In both cases the inner Vec needs to be nested-encoded in the larger Vec. |
| Vec<&[u8]> | vec![&[7u8][..]] | 0x00000001 07 | 0x00000001 00000001 07 | Same as above, but the inner list is a simple list of bytes. |
| Vec<BigUint> | vec![7u32.into()] | 0x00000001 07 | 0x00000001 00000001 07 | BigUints need to encode their length when nested. The 7 is encoded the same way as a list of bytes of length 1, so the same as above. |
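
The recursive nature of list encoding can be illustrated with a small trait. `NestedEncodeSketch` and `top_encode_list` are hypothetical names for this sketch and are not the crate's actual traits.

```rust
/// Hypothetical stand-in for a nested-encoding trait.
trait NestedEncodeSketch {
    fn nested_encode(&self, out: &mut Vec<u8>);
}

impl NestedEncodeSketch for u8 {
    fn nested_encode(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

impl NestedEncodeSketch for u16 {
    fn nested_encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_be_bytes());
    }
}

impl NestedEncodeSketch for u32 {
    fn nested_encode(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_be_bytes());
    }
}

impl<T: NestedEncodeSketch> NestedEncodeSketch for Vec<T> {
    fn nested_encode(&self, out: &mut Vec<u8>) {
        // Nested list: item count on 4 bytes, then every item nested-encoded.
        out.extend_from_slice(&(self.len() as u32).to_be_bytes());
        for item in self {
            item.nested_encode(out);
        }
    }
}

/// Top-encoding of a list: the items nested-encoded, without a length prefix.
fn top_encode_list<T: NestedEncodeSketch>(items: &[T]) -> Vec<u8> {
    let mut out = Vec::new();
    for item in items {
        item.nested_encode(&mut out);
    }
    out
}
```

Because the `Vec<T>` implementation always writes its length, an inner vector gets a prefix even when the outer vector is top-encoded, which is exactly the Vec<Vec<u32>> case in the table.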

Arrays and tuples

The only difference between these types and the lists in the previous section is that their length is known at compile time. Therefore, there is never any need to encode their length.

Rust types: [T; N], Box<[T; N]>, (T1, T2, ... , TN).

Top-encoding: All nested encodings of the items, concatenated.

Nested encoding: All nested encodings of the items, concatenated.

Examples

| Type | Value | Top-level encoding | Nested encoding |
|---|---|---|---|
| [u8; 2] | [1, 2] | 0x0102 | 0x0102 |
| [u16; 2] | [1, 2] | 0x00010002 | 0x00010002 |
| (u8, u16, u32) | (1u8, 2u16, 3u32) | 0x01000200000003 | 0x01000200000003 |
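
Since the layout is fixed at compile time, a decoder can read each field at a known offset. Below is a minimal sketch for the (u8, u16, u32) tuple; the helper names are hypothetical.

```rust
/// Encodes the tuple as its fields nested-encoded back to back (7 bytes in total).
fn encode_tuple(value: (u8, u16, u32)) -> Vec<u8> {
    let mut out = vec![value.0];
    out.extend_from_slice(&value.1.to_be_bytes());
    out.extend_from_slice(&value.2.to_be_bytes());
    out
}

/// Decodes the tuple from exactly 7 bytes; no length information is stored or needed.
fn decode_tuple(bytes: &[u8]) -> Result<(u8, u16, u32), &'static str> {
    if bytes.len() != 7 {
        return Err("expected exactly 7 bytes");
    }
    let first = bytes[0];
    let second = u16::from_be_bytes([bytes[1], bytes[2]]);
    let third = u32::from_be_bytes([bytes[3], bytes[4], bytes[5], bytes[6]]);
    Ok((first, second, third))
}
```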

Byte slices and ASCII strings

A special case of the list types, they behave according to the same rules as lists of items.

info

Strings are treated from the point of view of serialization as series of bytes. Using Unicode strings, while often a good practice in programming, tends to add unnecessary overhead to smart contracts. The difference is that Unicode strings get validated on input and concatenation.

We consider it best practice to use Unicode on the frontend, but to keep all messages and error messages in ASCII format at the smart contract level.

Rust types: BoxedBytes, &[u8], Vec<u8>, String, &str.

Top-encoding: The byte slice, as-is.

Nested encoding: The length of the byte slice on 4 bytes, followed by the byte slice as-is.

Examples

| Type | Value | Top-level encoding | Nested encoding | Explanation |
|---|---|---|---|---|
| &'static [u8] | b"abc" | 0x616263 | 0x00000003616263 | ASCII strings are regular byte slices or buffers. |
| BoxedBytes | BoxedBytes::from(b"abc") | 0x616263 | 0x00000003616263 | BoxedBytes are just optimized owned byte slices that cannot grow. |
| Vec<u8> | b"abc".to_vec() | 0x616263 | 0x00000003616263 | Use Vec for a buffer that can grow. |
| &'static str | "abc" | 0x616263 | 0x00000003616263 | Unicode string (slice). |
| String | "abc".to_string() | 0x616263 | 0x00000003616263 | Unicode string (owned). |
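
On the decoding side, the length prefix tells the reader where a nested byte slice ends and the next field begins. The following sketch uses a hypothetical helper, not the crate's API. It reads one nested byte slice and returns the input that is left over.

```rust
/// Reads one nested-encoded byte slice from the front of `input`.
/// Returns the decoded slice and the remaining, not yet consumed input.
fn nested_decode_bytes(input: &[u8]) -> Result<(&[u8], &[u8]), &'static str> {
    if input.len() < 4 {
        return Err("input too short for the length prefix");
    }
    let (len_bytes, rest) = input.split_at(4);
    let len =
        u32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    if rest.len() < len {
        return Err("input shorter than the declared length");
    }
    Ok(rest.split_at(len))
}
```

Applied to 0x00000003616263 followed by one extra byte 0x06, it returns the slice "abc" and leaves 0x06 for the next field.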

Options

An Option represents an optional value: every Option is either Some and contains a value, or None, and does not.

Rust types: Option<T>.

Top-encoding: If Some, a 0x01 byte gets encoded, followed by the nested encoding of the value. If None, nothing gets encoded.

Nested encoding: If Some, a 0x01 byte gets encoded, followed by the nested encoding of the value. If None, a 0x00 byte gets encoded.

Examples

| Type | Value | Top-level encoding | Nested encoding | Explanation |
|---|---|---|---|---|
| Option<u16> | Some(5) | 0x010005 | 0x010005 | |
| Option<u16> | Some(0) | 0x010000 | 0x010000 | |
| Option<u16> | None | 0x | 0x00 | Note that Some has different encoding than None for any type |
| Option<BigUint> | Some(BigUint::from(0x1234u32)) | 0x01 00000002 1234 | 0x01 00000002 1234 | The Some value is nested-encoded. For a BigUint this adds the length, which here is 2. |
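
As a short illustration, the two rules for Option<u16> could be written as follows. The function names are hypothetical; only the standard library is used.

```rust
/// Nested encoding: a marker byte, plus the nested-encoded value when present.
fn nested_encode_option_u16(value: Option<u16>) -> Vec<u8> {
    match value {
        Some(v) => {
            let mut out = vec![0x01];
            out.extend_from_slice(&v.to_be_bytes());
            out
        }
        None => vec![0x00],
    }
}

/// Top-encoding: identical for Some, but None becomes the empty byte array.
fn top_encode_option_u16(value: Option<u16>) -> Vec<u8> {
    match value {
        Some(_) => nested_encode_option_u16(value),
        None => Vec::new(),
    }
}
```

Note that `Some(0)` keeps its full nested value (0x010000), which is what keeps it distinguishable from `None`.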

Custom structures

Any structure defined in a contract or library can become serializable if it is annotated with any or all of: TopEncode, TopDecode, NestedEncode, NestedDecode.

Example implementation:

#[derive(TopEncode, TopDecode, NestedEncode, NestedDecode)]
pub struct Struct {
    pub int: u16,
    pub seq: Vec<u8>,
    pub another_byte: u8,
    pub uint_32: u32,
    pub uint_64: u64,
}

Top-encoding: All fields nested-encoded one after the other.

Nested encoding: The same, all fields nested-encoded one after the other.

Example value

Struct {
    int: 0x42,
    seq: vec![0x1, 0x2, 0x3, 0x4, 0x5],
    another_byte: 0x6,
    uint_32: 0x12345,
    uint_64: 0x123456789,
}

It will be encoded (both top-encoding and nested encoding) as: 0x004200000005010203040506000123450000000123456789.

Explanation:

[
    /* int */ 0, 0x42,
    /* seq length */ 0, 0, 0, 5,
    /* seq contents */ 1, 2, 3, 4, 5,
    /* another_byte */ 6,
    /* uint_32 */ 0x00, 0x01, 0x23, 0x45,
    /* uint_64 */ 0x00, 0x00, 0x00, 0x01, 0x23, 0x45, 0x67, 0x89
]
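
To make the field-by-field rule concrete, here is a hand-written sketch of what the derived encoder conceptually does for this structure. It uses only the standard library, and `encode_struct` is a hypothetical name; the actual derive macros generate different code.

```rust
struct Struct {
    int: u16,
    seq: Vec<u8>,
    another_byte: u8,
    uint_32: u32,
    uint_64: u64,
}

/// Nested-encodes every field in declaration order and concatenates the results.
fn encode_struct(value: &Struct) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&value.int.to_be_bytes());
    // The Vec is a nested list here, so its length goes first, on 4 bytes.
    out.extend_from_slice(&(value.seq.len() as u32).to_be_bytes());
    out.extend_from_slice(&value.seq);
    out.push(value.another_byte);
    out.extend_from_slice(&value.uint_32.to_be_bytes());
    out.extend_from_slice(&value.uint_64.to_be_bytes());
    out
}
```

Calling `encode_struct` on the example value reproduces the 24 bytes listed in the explanation.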

Custom enums

Any enum defined in a contract or library can become serializable if it is annotated with any or all of: TopEncode, TopDecode, NestedEncode, NestedDecode.

A simple enum example:

Example taken from the multiversx-sc-codec tests.

#[derive(TopEncode, TopDecode, NestedEncode, NestedDecode)]
enum DayOfWeek {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

A more complex enum example:

#[derive(TopEncode, TopDecode, NestedEncode, NestedDecode)]
enum EnumWithEverything {
    Default,
    Today(DayOfWeek),
    Write(Vec<u8>, u16),
    Struct {
        int: u16,
        seq: Vec<u8>,
        another_byte: u8,
        uint_32: u32,
        uint_64: u64,
    },
}

Nested encoding: First, the discriminant is encoded. The discriminant is the index of the variant, starting with 0. Then the fields in that variant (if any) get nested-encoded one after the other.

Top-encoding: Same as nested-encoding, but with an additional rule: if the discriminant is 0 (first variant) and there are no fields, nothing is encoded.

Example values

The examples below are taken from the multiversx-sc-codec tests.

| Value | Top-encoding bytes | Nested encoding bytes |
|---|---|---|
| DayOfWeek::Monday | /* nothing */ | /* discriminant */ 0 |
| DayOfWeek::Tuesday | same as nested encoding | /* discriminant */ 1 |
| EnumWithEverything::Default | /* nothing */ | /* discriminant */ 0 |
| EnumWithEverything::Today(DayOfWeek::Monday) | same as nested encoding | /* discriminant */ 1, /* DayOfWeek discriminant */ 0 |
| EnumWithEverything::Today(DayOfWeek::Friday) | same as nested encoding | /* discriminant */ 1, /* DayOfWeek discriminant */ 4 |
| EnumWithEverything::Write(Vec::new(), 0) | same as nested encoding | /* discriminant */ 2, /* vec length */ 0, 0, 0, 0, /* u16 */ 0, 0 |
| EnumWithEverything::Write([1, 2, 3].to_vec(), 4) | same as nested encoding | /* discriminant */ 2, /* vec length */ 0, 0, 0, 3, /* vec contents */ 1, 2, 3, /* u16 */ 0, 4 |
| EnumWithEverything::Struct { int: 0x42, seq: vec![0x1, 0x2, 0x3, 0x4, 0x5], another_byte: 0x6, uint_32: 0x12345, uint_64: 0x123456789 } | same as nested encoding | /* discriminant */ 3, /* int */ 0, 0x42, /* seq length */ 0, 0, 0, 5, /* seq contents */ 1, 2, 3, 4, 5, /* another_byte */ 6, /* uint_32 */ 0x00, 0x01, 0x23, 0x45, /* uint_64 */ 0x00, 0x00, 0x00, 0x01, 0x23, 0x45, 0x67, 0x89 |
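
The enum rules can also be sketched by hand. The code below is an illustration with hypothetical names (`EnumSketch`, `nested_encode`, `top_encode`), and it omits the struct-like variant for brevity. It assumes the discriminant takes a single byte, as in the examples above.

```rust
#[derive(Clone, Copy)]
enum DayOfWeek {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

enum EnumSketch {
    Default,
    Today(DayOfWeek),
    Write(Vec<u8>, u16),
}

/// Nested encoding: the variant index, then the variant's fields nested-encoded.
fn nested_encode(value: &EnumSketch) -> Vec<u8> {
    let mut out = Vec::new();
    match value {
        EnumSketch::Default => out.push(0),
        EnumSketch::Today(day) => {
            out.push(1);
            // A field-less enum nested inside another value is just its index.
            out.push(*day as u8);
        }
        EnumSketch::Write(bytes, number) => {
            out.push(2);
            out.extend_from_slice(&(bytes.len() as u32).to_be_bytes());
            out.extend_from_slice(bytes);
            out.extend_from_slice(&number.to_be_bytes());
        }
    }
    out
}

/// Top-encoding: same as nested, except that the first field-less variant is empty.
fn top_encode(value: &EnumSketch) -> Vec<u8> {
    match value {
        EnumSketch::Default => Vec::new(),
        other => nested_encode(other),
    }
}
```

Here `Today(DayOfWeek::Monday)` encodes as the two bytes 1, 0 in both forms: the special rule for discriminant 0 applies only to the outermost value, and only when it has no fields.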