Undefined Behavior

From Rust Community Wiki

Note: Although I believe that the content in this article is correct, I would appreciate it if someone more familiar with the subtleties of UB could extend or revise this page.

Undefined Behavior, often abbreviated UB, is the result of a violated assumption of the compiler or optimizer. UB can usually only be encountered if the program or its dependencies use unsafe code incorrectly.

There are a few known bugs that can cause UB without unsafe in edge cases, which are labeled I-unsound on GitHub. However, it's very unlikely to encounter these bugs on a day-to-day basis.

Example

let b: bool = unsafe {
    std::mem::transmute(2u8)
};

This example uses the unsafe transmute() function to convert the integer 2 to a bool. The problem is that Rust assumes that any bool is either true (represented as 1) or false (represented as 0). This means that b is an invalid value – it is neither true nor false.

Because producing an invalid value is UB, the compiler is free to assume that this code never runs. It can optimize under that assumption, which can have drastic consequences in other parts of the code. This series of articles explains the impacts of UB; it focuses on C, but the points also apply to Rust.
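
The invalid value can be avoided by checking the byte before converting it. The following is only a minimal sketch; the helper name bool_from_u8 is hypothetical and not part of the standard library.

```rust
/// Converts a byte to a `bool` without any `unsafe` code.
/// (`bool_from_u8` is a hypothetical helper used for illustration.)
fn bool_from_u8(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // Every other bit pattern is not a valid `bool`, so it is rejected
        // instead of being transmuted.
        _ => None,
    }
}
```

Calling bool_from_u8(2) returns None, so the caller has to handle the bad input explicitly and no invalid bool is ever produced.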

Consequences of Undefined Behavior

The compiler assumes that UB can never happen and optimizes code under this assumption. However, optimizations that are correct in the absence of UB may be nonsensical or even dangerous if UB is present. What exactly happens is not specified and can't be predicted.

For example, the optimizer is allowed to inline code, move code around, unroll loops, replace arithmetic with bitwise operations, remove "dead code", and much more. So code exhibiting UB might be removed entirely, or might do something else entirely. This is sometimes humorously called "eating your laundry" or "nasal demons".
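
One way such an assumption enters a program explicitly is std::hint::unreachable_unchecked, which promises the optimizer that a branch is dead code. The sketch below uses hypothetical function names and keeps the promise true by wrapping the unsafe function in a checked one.

```rust
use std::hint::unreachable_unchecked;

/// Returns the English name of a small digit.
///
/// # Safety
/// `digit` must be 0, 1, or 2.
unsafe fn small_digit_name_unchecked(digit: u8) -> &'static str {
    match digit {
        0 => "zero",
        1 => "one",
        2 => "two",
        // This tells the optimizer that the arm can never be reached.
        // If a caller passes 3 or more, the promise is broken and the
        // behavior of the whole program is undefined.
        _ => unsafe { unreachable_unchecked() },
    }
}

/// Safe wrapper: checks the precondition before relying on it.
fn small_digit_name(digit: u8) -> Option<&'static str> {
    if digit <= 2 {
        // SAFETY: `digit` was just checked to be in the range 0..=2.
        Some(unsafe { small_digit_name_unchecked(digit) })
    } else {
        None
    }
}
```

Only small_digit_name is safe to call with arbitrary input. Calling the unchecked version with 3 would not reliably panic or return garbage; what happens is simply not defined.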

Difference between Undefined Behavior and Contract Violations

Library types can define their own contracts that must be upheld.

For example, Vec has the contract that its length must never be greater than its capacity. Its implementation ensures that this never happens, and relies on it within unsafe blocks. If the contract were violated, the unsafe code could cause UB.

Contract violations are logic errors, but they can cause UB when unsafe code is involved. If a function doesn't ensure that contracts are upheld, and this can cause UB down the line, then that function should be marked unsafe. Furthermore, fields with a contract should be private.
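
The following sketch shows how these rules fit together for a small library type. NonEmpty and its methods are hypothetical names invented for this illustration; they are not taken from any real crate.

```rust
/// A slice wrapper with the contract "the slice is never empty".
/// The field is private, so code outside the defining module cannot
/// construct a `NonEmpty` that breaks the contract.
pub struct NonEmpty<'a> {
    items: &'a [i32],
}

impl<'a> NonEmpty<'a> {
    /// Safe constructor: checks the contract before building the value.
    pub fn new(items: &'a [i32]) -> Option<Self> {
        if items.is_empty() {
            None
        } else {
            Some(NonEmpty { items })
        }
    }

    /// Unchecked constructor: it does not verify the contract, and a
    /// violation leads to UB in `first`, so it has to be `unsafe`.
    ///
    /// # Safety
    /// `items` must not be empty.
    pub unsafe fn new_unchecked(items: &'a [i32]) -> Self {
        NonEmpty { items }
    }

    /// Relies on the contract inside an `unsafe` block.
    pub fn first(&self) -> i32 {
        // SAFETY: the contract guarantees that index 0 is in bounds.
        unsafe { *self.items.get_unchecked(0) }
    }
}
```

Note that creating a NonEmpty around an empty slice with new_unchecked is a contract violation; the UB itself would only occur later, when first reads out of bounds.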

Contract Violations somewhat overlap with invalid values. For example, a bool can only have the values true and false; this is called a validity invariant. Note that this is not just relied on by the API, but by the language itself, since bool is a primitive type. Therefore, an invalid value is illegal to produce (even if it is never used)[1]. On the other hand, if a library type such as Vec has a violated contract, it causes UB only when it is used (e.g. when the Vec is indexed).

Another example is the str type, which is a slice of text that must be valid UTF-8. However, it has the same validity invariants as [u8][2]; the encoding is "just" an API contract. This means it is not UB to produce a str that isn't valid UTF-8. However, using an incorrectly encoded str can cause UB.
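
In practice, the UTF-8 contract is usually upheld by using the checked conversion from the standard library. A minimal sketch follows; the function name text_from_bytes is hypothetical.

```rust
use std::str;

/// Returns the bytes as text only if they are valid UTF-8.
fn text_from_bytes(bytes: &[u8]) -> Option<&str> {
    // `str::from_utf8` validates the encoding, so the contract of `str`
    // holds for every value this function returns. The unchecked variant,
    // `str::from_utf8_unchecked`, is `unsafe` because it skips this check.
    str::from_utf8(bytes).ok()
}
```

For example, text_from_bytes(b"hello") yields Some("hello"), while the bytes 0xFF 0xFE are rejected with None.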

Behavior considered undefined

This list is taken from the Reference.

Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is Undefined Behavior. Please read the Rustonomicon before writing unsafe code.

Data races

Rust assumes that data races can never occur; operations such as accessing a mutable static are therefore unsafe.

Transmuting a shared reference (&) to an exclusive reference (&mut) is itself UB, and it can also lead to data races. You should never do that.
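
A common safe replacement for a mutable static is a static atomic, which can be updated from several threads without unsafe. This is a sketch under that assumption; COUNTER and bump_from_threads are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// A shared counter. Unlike a `static mut`, it needs no `unsafe` to access,
// because every access is an atomic operation.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Spawns `threads` workers that each increment the counter `per_thread` times.
fn bump_from_threads(threads: usize, per_thread: usize) {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..per_thread {
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // Joining makes all increments visible to the calling thread.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}
```

After bump_from_threads(4, 1000) returns, the counter has grown by exactly 4000. With a plain static mut and unsynchronized writes, the same program would contain a data race.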

Dereferencing a dangling/unaligned raw pointer

Raw pointers (*const T, *mut T) can be dereferenced with the * prefix operator. Doing this is UB if the pointer is dangling (i.e. no longer valid) or unaligned.
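
When the alignment of a pointer cannot be guaranteed, for instance when reading an integer out of a byte buffer, the pointer method read_unaligned can be used instead of the * operator. The sketch below assumes a hypothetical helper named read_u32_at.

```rust
/// Reads a native-endian `u32` starting at `offset`, or returns `None`
/// if fewer than four bytes are available there.
fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None;
    }
    // The slice is in bounds, so the pointer is not dangling. It may still
    // be unaligned for `u32`, so a plain `*ptr` could be UB.
    let ptr = bytes[offset..end].as_ptr() as *const u32;
    // SAFETY: `ptr` points to four initialized, in-bounds bytes, and
    // `read_unaligned` has no alignment requirement.
    Some(unsafe { ptr.read_unaligned() })
}
```

The bounds check rules out the dangling case and read_unaligned rules out the alignment case, so the function is safe to expose.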

Breaking LLVM's pointer aliasing rules

Rust uses LLVM as its compiler backend. References (&T, &mut T) follow LLVM's scoped noalias model, except if the &T contains an UnsafeCell.

Specifically, exclusive references (&mut T) must not alias: While an exclusive reference is valid, no other reference/pointer may access the data it points to.
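
When two parts of the same slice need to be mutated at once, the standard library's split_at_mut provides two exclusive references that are guaranteed not to overlap. A minimal sketch, with a hypothetical function name:

```rust
/// Adds each element of the first half to the matching element of the
/// second half, in place.
fn add_front_to_back(values: &mut [i32]) {
    let mid = values.len() / 2;
    // `split_at_mut` returns two `&mut` slices that cover disjoint ranges,
    // so neither exclusive reference aliases the other.
    let (front, back) = values.split_at_mut(mid);
    for (dst, src) in back.iter_mut().zip(front.iter()) {
        *dst += *src;
    }
}
```

Creating the two halves by hand, for example by turning one &mut into two overlapping &mut through raw pointers, would break the aliasing rule described above.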

Mutating immutable data

All data inside a constant is immutable. Moreover, all data reached through a shared reference or data owned by an immutable binding is immutable, unless that data is contained within an UnsafeCell.
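
Safe code usually reaches UnsafeCell through wrappers such as std::cell::Cell. The sketch below uses a hypothetical Sensor type to show mutation through a shared reference that is allowed because the mutated field lives inside a Cell.

```rust
use std::cell::Cell;

/// A value that counts how often it has been read.
struct Sensor {
    reading: i32,
    // `Cell` is built on `UnsafeCell`, so this field may change even when
    // the `Sensor` is only reachable through `&Sensor`.
    reads: Cell<u32>,
}

impl Sensor {
    fn new(reading: i32) -> Self {
        Sensor { reading, reads: Cell::new(0) }
    }

    /// Takes `&self`, yet updates the read counter.
    fn read(&self) -> i32 {
        self.reads.set(self.reads.get() + 1);
        self.reading
    }

    fn read_count(&self) -> u32 {
        self.reads.get()
    }
}
```

The reading field, by contrast, is not inside an UnsafeCell, so writing to it through &self (for example via a raw pointer cast) would be UB.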

Invoking UB via compiler intrinsics

Executing code compiled with platform features that the current platform does not support

See target_feature.
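
The usual way to avoid this is to detect the feature at run time before calling the specialized function. The following sketch assumes hypothetical function names and uses AVX2 purely as an example feature; on non-x86 targets only the portable path is compiled.

```rust
/// Plain implementation that works on every platform.
fn sum_portable(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}

/// Same logic, compiled with AVX2 enabled so the compiler may emit AVX2
/// instructions for it.
///
/// # Safety
/// The CPU executing this function must support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}

/// Safe entry point: picks an implementation the current CPU supports.
fn sum(values: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at run time just above.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_portable(values)
}
```

Calling sum_avx2 directly on a CPU without AVX2 would be UB; going through sum keeps the check and the call together.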

Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI

Producing an invalid value

"Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. Values must always be valid, even if they're unused or in a private field.

The following values are invalid (at their respective type):

  • A value other than false (0) or true (1) in a bool.
  • A discriminant in an enum not included in the type definition.
  • A null fn pointer.
  • A value in a char which is a surrogate or above char::MAX.
  • A ! (all values are invalid for this type).
  • An integer (i*/u*), floating point value (f*), or raw pointer obtained from uninitialized memory, or uninitialized memory in a str.
  • A reference or Box<T> that is dangling, unaligned, or points to an invalid value.
  • Invalid metadata in a wide reference, Box<T>, or raw pointer:
    • dyn Trait metadata is invalid if it is not a pointer to a vtable for Trait that matches the actual dynamic trait the pointer or reference points to.
    • Slice metadata is invalid if the length is not a valid usize (i.e., it must not be read from uninitialized memory).
  • Invalid values for a type with a custom definition of invalid values. In the standard library, this affects NonNull<T> and NonZero*.

Note: rustc achieves this with the unstable rustc_layout_scalar_valid_range_* attributes.
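
Several of the invalid values listed above can be avoided by using checked conversions instead of transmute. The sketch below uses the standard library's char::from_u32 and NonZeroU32::new; the Color enum and the helper function names are hypothetical.

```rust
use std::convert::TryFrom;
use std::num::NonZeroU32;

/// A hypothetical enum with three valid discriminants.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u8)]
enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

impl TryFrom<u8> for Color {
    type Error = u8;

    /// Accepts only discriminants that appear in the type definition.
    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Color::Red),
            1 => Ok(Color::Green),
            2 => Ok(Color::Blue),
            other => Err(other),
        }
    }
}

/// Rejects surrogates and values above `char::MAX`.
fn checked_char(raw: u32) -> Option<char> {
    char::from_u32(raw)
}

/// Rejects zero, the one value that is invalid for `NonZeroU32`.
fn checked_nonzero(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}
```

Each function returns an error or None for exactly the inputs that would be invalid at the target type, so no invalid value is ever produced.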

References