Collections

From Rust Community Wiki
Revision as of 22:15, 23 August 2022 by Notriddle (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Rust's standard library contains common collection types, such as sequences, sets, and maps. These live in the std::collectionsThis links to official Rust documentation module. In no_std environments, they can be accessed under alloc::collectionsThis links to official Rust documentation.

All collections are owned and allocated on the heap.

Sequences[edit]

Sequences can contain an arbitrary number of items of the same type. The standard library contains three sequence types: Vec, VecDeque, and LinkedList.

Vec[edit]

VecThis links to official Rust documentation is the most ubiquitous collection type. It is backed by a heap-allocated array. When items are added or removed from the Vec, the array grows or shrinks dynamically. Even though pushing items to a Vec sometimes requires copying the array, the amortized time to push an item is constant. Vec is the most performant sequence type in the standard library for many purposes, and the only one that can be sliced as it is stored contiguously in memory.

Length and capacity[edit]

Vec manages to achieve a constant amortized push complexity by allocating space for more elements than it contains. Since most of the time the underlying storage is in fact larger than the required storage, the push operation is simply a matter of copying one item to the end and incrementing the length.

When writing performance-critical code, reallocating can be costly. It is therefore advised to manually control the capacity of your Vec using methods such as Vec::with_capacityThis links to official Rust documentation and Vec::reserveThis links to official Rust documentation.

VecDeque[edit]

VecDequeThis links to official Rust documentation is a circular buffer. It works similar to Vec, except that inserting or removing items at the start of the sequence is more efficient.

LinkedList[edit]

LinkedListThis links to official Rust documentation is a doubly-linked list. It allows adding and removing items at both ends in constant time, but doesn't allow indexing items in constant time. LinkedList is less memory efficient and less cache friendly than Vec and VecDeque, which very often leads to worse performance.

Maps[edit]

Maps are data structures that map keys to values. The standard library contains two map types: HashMap and BTreeMap. The primary differences between these two maps are performance and the order in which Iterator yields the map's items.

HashMap[edit]

HashMapThis links to official Rust documentation is an efficient implementation of a hash map.

HashMap items are stored in random order, and can be accessed with amortized O(1) memory accesses. If you're very, very unlucky, operations on a HashMap can require scanning the entire thing, making it O(N).

The Rust standard library hash map has been internally redesigned multiple times (while keeping the same API, of course). The design approach of the version of HashMap that shipped with Rust 1.0, as well as the version that is currently used, were both documented by Gankra:

The current version of the Rust standard library hash map is also published to crates.io as the Cargo vec.svghashbrown crate. Using the version in the standard library provides a guaranteed stable indefinitely API, while using the crate from crates.io allows access to APIs that have not been stablized.

BTreeMap[edit]

BTreeMapThis links to official Rust documentation is an efficient implementation of an in-memory B-Tree. In other words, BTreeMap is like a binary search tree, but faster.

BTreeMap internally keeps its keys sorted, and requires O(log N) memory accesses for most operations, such as fetching, deleting, or inserting a key-value pair. Iterating over the entire map, in order, can be done in an optimal O(N) memory accesses.

BTreeMap also has a blog post by Gankra describing how it works.

It is not currently published as a separate crate like HashMap/hashbrown.

Sets[edit]

Sets are unordered collections that can't contain the same item more than once. The standard library contains the HashSetThis links to official Rust documentation and BTreeSetThis links to official Rust documentation types, which internally use the same implementation as HashMap and BTreeMap (the items are stored in the keys, and the values are empty).

BinaryHeap[edit]

The BinaryHeapThis links to official Rust documentation is a priority queue implemented with a binary heap.