Undefined Behavior

From Rust Community Wiki

Note: Although I believe that the content in this article is correct, I would appreciate it if someone more familiar with the subtleties of UB could extend or revise this page.

Undefined Behavior, often abbreviated UB, is the result of a violated assumption of the compiler or optimizer. UB can usually only be encountered if the program or its dependencies use unsafe code incorrectly.

There are a few known compiler bugs that can cause UB without unsafe in edge cases; they are labeled I-unsound on GitHub. However, you are very unlikely to encounter these bugs on a day-to-day basis.

Example

let b: bool = unsafe {
    // Undefined behavior: 2 is not a valid bit pattern for `bool`.
    std::mem::transmute(2u8)
};

This example uses the unsafe transmute() function to convert the integer 2 to a bool. The problem is that Rust assumes that any bool is either true (represented as 1) or false (represented as 0). This means that b is an invalid value: it is neither true nor false.

This means that the compiler is free to assume that the code will never run and therefore can make certain optimizations, which can have drastic consequences in other parts of the code. This series of articles explains the impacts of UB; it focuses on C but the points also apply to Rust.
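For contrast, here is a minimal sketch of a conversion that never produces an invalid bool. The helper name `bool_from_byte` is hypothetical and not part of the standard library; it only illustrates checking the byte before treating it as a bool.

```rust
/// Converts a byte to a `bool`, rejecting every value other than 0 and 1.
/// `bool_from_byte` is a hypothetical helper, not a standard library API.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // Any other bit pattern is not a valid `bool`, so report failure
        // instead of producing an invalid value.
        _ => None,
    }
}
```

Calling `bool_from_byte(2)` returns `None`, so the invalid value from the example above is never created.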

Consequences of Undefined Behavior

The compiler assumes that UB can never happen, and optimizes code under this assumption. However, optimizations that are correct in the absence of UB might be nonsensical and even dangerous if UB is present. What exactly happens is not specified and can't be predicted.

For example, the optimizer is allowed to inline code and move code around, to unroll loops, replace arithmetic with bitwise operations, remove "dead code" and much more. So it is possible that code exhibiting UB is removed, or does something else entirely. This is sometimes humorously called "eating your laundry" or nasal demons.

The behaviors permitted once a program contains undefined behavior are open-ended enough that the program may misbehave even before the invalid code is actually executed, as long as execution is guaranteed to eventually reach it. For example, the compiler may remove conditionals that would have to be true if UB were not invoked. This can create the appearance of undefined behavior time traveling.
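As a rough illustration of how an assumption feeds the optimizer, the sketch below tells the compiler that a branch is never taken, using std::hint::unreachable_unchecked. The function name `div_unchecked` is hypothetical. Whether a particular optimization actually happens is up to the compiler; the point is only that it is allowed to act as if the promise holds.

```rust
/// Divides `x` by `divisor` without a usable zero check.
///
/// # Safety
/// `divisor` must not be zero. Calling this with zero is undefined behavior.
/// (`div_unchecked` is a hypothetical example function.)
unsafe fn div_unchecked(x: u32, divisor: u32) -> u32 {
    if divisor == 0 {
        // Promise to the compiler that this branch is never reached.
        // The optimizer may therefore delete the check entirely, along with
        // the division-by-zero panic path below.
        unsafe { std::hint::unreachable_unchecked() }
    }
    x / divisor
}
```

If a caller breaks the contract and passes zero, nothing in the compiled code is required to notice, and the surrounding code may have been optimized under the assumption that the divisor is non-zero.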

Difference between Undefined Behavior and Contract Violations

Library types can define their own contracts that must be upheld.

For example, Vec has the contract that its length must never be greater than its capacity. Its implementation ensures that this never happens, and relies on it within unsafe blocks. If the contract were violated, the unsafe code could cause UB.

Contract Violations are logical errors, but they can cause UB when unsafe code is involved. If a function doesn't ensure that contracts are upheld, and this can cause UB down the line, then that function should be unsafe. Furthermore, fields with a contract should be private.
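The following sketch shows this pattern on a small scale. The module `bounded`, the type `TableIndex` and the static `TABLE` are hypothetical names made up for illustration: the field is private, the checked constructor upholds the contract, and the constructor that skips the check is marked unsafe because breaking the contract would cause UB in `lookup`.

```rust
mod bounded {
    static TABLE: [u8; 4] = [10, 20, 30, 40];

    /// Contract: the wrapped index is always less than `TABLE.len()`.
    /// The field is private, so code outside this module cannot break it.
    pub struct TableIndex(usize);

    impl TableIndex {
        /// Safe constructor: checks the contract.
        pub fn new(index: usize) -> Option<Self> {
            if index < TABLE.len() {
                Some(Self(index))
            } else {
                None
            }
        }

        /// # Safety
        /// `index` must be less than the table length (4 here).
        pub unsafe fn new_unchecked(index: usize) -> Self {
            Self(index)
        }

        /// Relies on the contract inside an unsafe block.
        pub fn lookup(&self) -> u8 {
            // SAFETY: every constructor guarantees `self.0 < TABLE.len()`,
            // either by checking or by delegating the proof to the caller.
            unsafe { *TABLE.get_unchecked(self.0) }
        }
    }
}
```

Creating a `TableIndex` through `new_unchecked` with an out-of-range index is only a contract violation at first; the UB happens when `lookup` is called on it.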

Contract Violations somewhat overlap with invalid values. For example, a bool can only have the values true and false; this is called a validity invariant. Note that this is not just relied on by the API, but by the language itself, since bool is a primitive type. Therefore, an invalid value is illegal to produce (even if it is never used)[1]. On the other hand, if a library type such as Vec has a violated contract, it causes UB only when it is used (e.g. when the Vec is indexed).

Another example is the str type, which is a slice of text that must be valid UTF-8. However, it has the same validity invariants as [u8][2]; the encoding is "just" an API contract. This means it is not UB to produce a str that isn't valid UTF-8. However, using an incorrectly encoded str can cause UB.
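A minimal sketch of upholding the str contract: std::str::from_utf8 validates the bytes and returns an error for invalid input, whereas the unsafe std::str::from_utf8_unchecked skips that check and leaves the contract to the caller. The wrapper name `text_from_bytes` is hypothetical.

```rust
/// Returns the bytes as text only if they are valid UTF-8.
/// (`text_from_bytes` is a hypothetical helper name.)
fn text_from_bytes(bytes: &[u8]) -> Option<&str> {
    // `from_utf8` checks the encoding, so the resulting `str`
    // always satisfies the UTF-8 contract.
    std::str::from_utf8(bytes).ok()
}
```

The byte 0xFF never occurs in valid UTF-8, so `text_from_bytes(&[0xff, 0xfe])` returns `None` instead of a str that violates its contract.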

Behavior considered undefined

This list is taken from the Reference.

Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is Undefined Behavior. Please read the Rustonomicon before writing unsafe code.

Data races

Rust assumes that data races can never occur; operations such as accessing a mutable static are therefore unsafe.

Transmuting a shared reference (&) to an exclusive reference (&mut) is always UB, and it can also lead to data races. You should never do that.
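One common way to avoid a mutable static is an atomic. The sketch below is an illustration under assumptions: the names `COUNTER` and `count_in_parallel` are hypothetical, and sequentially consistent ordering is chosen for simplicity rather than performance.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// A shared counter that is safe to touch from several threads.
// With `static mut COUNTER: usize`, every access would need `unsafe`
// and unsynchronized access from two threads would be a data race.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Spawns `threads` threads that each add 1 to the counter `increments`
/// times, then returns the final count.
/// Not meant to be called from several threads at once, because it
/// resets the global counter first.
fn count_in_parallel(threads: usize, increments: usize) -> usize {
    COUNTER.store(0, Ordering::SeqCst);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..increments {
                    // Atomic read-modify-write: no data race.
                    COUNTER.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    COUNTER.load(Ordering::SeqCst)
}
```

Because every increment is atomic, no update is lost and the whole example needs no unsafe code.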

Dereferencing a dangling/unaligned raw pointer

Raw pointers (*const T, *mut T) can be dereferenced with the * prefix operator. Doing this is UB if the pointer is dangling (i.e. no longer valid) or unaligned.
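When the alignment of a pointer is not known, the raw pointer method read_unaligned can be used instead of the * operator. The sketch below assumes a hypothetical helper `read_u32_at` that reads a native-endian u32 from an arbitrary offset in a byte slice; the bounds check keeps the pointer from dangling, and read_unaligned removes the alignment requirement.

```rust
/// Reads a native-endian `u32` starting at `offset`, or returns `None`
/// if fewer than 4 bytes are available there.
/// (`read_u32_at` is a hypothetical helper name.)
fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    // Bounds check: the pointer below refers to 4 readable bytes.
    let window = bytes.get(offset..end)?;
    let ptr = window.as_ptr().cast::<u32>();
    // SAFETY: `ptr` points to 4 initialized, readable bytes, and
    // `read_unaligned` does not require `ptr` to be aligned for `u32`.
    // Using `*ptr` here could be UB, because `ptr` may be misaligned.
    Some(unsafe { ptr.read_unaligned() })
}
```

In real code, `u32::from_ne_bytes` on a copied 4-byte array achieves the same result without any unsafe code; the raw pointer version is shown only to illustrate the alignment rule.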

Breaking LLVM's pointer aliasing rules

Rust uses LLVM as its compiler backend. References (&T, &mut T) follow LLVM's scoped noalias model, except if the &T contains an UnsafeCell.

Specifically, exclusive references (&mut T) must not alias: While an exclusive reference is valid, no other reference/pointer may access the data it points to.
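In safe code, the borrow checker enforces this rule. A small sketch, with the hypothetical function name `copy_first_half_into_second`: split_at_mut hands out two exclusive references to non-overlapping parts of one slice, so neither aliases the other.

```rust
/// Copies the first half of `values` over the second half.
/// (`copy_first_half_into_second` is a hypothetical example function.)
fn copy_first_half_into_second(values: &mut [u8]) {
    let mid = values.len() / 2;
    // `left` and `right` are both `&mut [u8]`, but they cover disjoint
    // ranges, so the "exclusive references must not alias" rule holds.
    let (left, right) = values.split_at_mut(mid);
    for (dst, src) in right.iter_mut().zip(left.iter()) {
        *dst = *src;
    }
}
```

Creating two overlapping &mut slices by hand, for example with raw pointers, would break the aliasing rule even if the code appeared to work.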

Mutating immutable data

All data inside a constant is immutable. Moreover, all data reached through a shared reference or data owned by an immutable binding is immutable, unless that data is contained within an UnsafeCell.
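The standard library's Cell type is built on UnsafeCell, which makes it a simple way to see the exception in action. In this sketch, the function name `bump` is hypothetical; it mutates data through a shared reference without any unsafe code.

```rust
use std::cell::Cell;

/// Increments the counter through a shared reference and returns the
/// new value. (`bump` is a hypothetical example function.)
fn bump(counter: &Cell<u32>) -> u32 {
    // Allowed: `Cell` wraps its contents in an `UnsafeCell`, so mutation
    // behind `&` is permitted. With a plain `&u32`, writing through the
    // reference (for example via a pointer cast) would be UB.
    counter.set(counter.get() + 1);
    counter.get()
}
```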

Invoking UB via compiler intrinsics

Executing code compiled with platform features that the current platform does not support

See target_feature.
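A hedged sketch of the usual pattern on x86_64: a function compiled with an extra feature is only called after a runtime check with the is_x86_feature_detected! macro. The function names `sum`, `sum_avx2` and `sum_portable` are hypothetical, and AVX2 is just an example feature; on other architectures the sketch always takes the portable path.

```rust
/// Compiled with AVX2 enabled; the compiler may use AVX2 instructions here.
///
/// # Safety
/// Must only be called on a CPU that supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[u32]) -> u32 {
    values.iter().copied().fold(0, u32::wrapping_add)
}

/// Fallback that makes no assumptions about CPU features.
fn sum_portable(values: &[u32]) -> u32 {
    values.iter().copied().fold(0, u32::wrapping_add)
}

/// Wrapping sum that picks an implementation at runtime.
fn sum(values: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the runtime check above confirmed AVX2 support.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_portable(values)
}
```

Calling `sum_avx2` without the check on a CPU that lacks AVX2 would be UB, which is why the function is marked unsafe.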

Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI

Producing an invalid value

"Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. Values must always be valid, even if they're unused or in a private field.

The following values are invalid (at their respective type):

  • A value other than false (0) or true (1) in a bool.
  • A discriminant in an enum not included in the type definition.
  • A null fn pointer.
  • A value in a char which is a surrogate or above char::MAX.
  • A ! (all values are invalid for this type).
  • An integer (i*/u*), floating point value (f*), or raw pointer obtained from uninitialized memory, or uninitialized memory in a str.
  • A reference or Box<T> that is dangling, unaligned, or points to an invalid value.
  • Invalid metadata in a wide reference, Box<T>, or raw pointer:
    • dyn Trait metadata is invalid if it is not a pointer to a vtable for Trait that matches the actual dynamic trait the pointer or reference points to.
    • Slice metadata is invalid if the length is not a valid usize (i.e., it must not be read from uninitialized memory).
  • Invalid values for a type with a custom definition of invalid values. In the standard library, this affects NonNull<T> and NonZero*.

Note: rustc achieves this with the unstable rustc_layout_scalar_valid_range_* attributes.
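For several of the types listed above, the standard library offers checked constructors that refuse to produce an invalid value, and MaybeUninit for memory that is not initialized yet. The sketch below uses only standard library calls; the function names `checked_conversions` and `init_late` are hypothetical.

```rust
use std::mem::MaybeUninit;
use std::num::NonZeroU8;

/// Checked constructors return `None` instead of an invalid value.
/// (`checked_conversions` is a hypothetical example function.)
fn checked_conversions() -> (Option<char>, Option<char>, Option<NonZeroU8>) {
    (
        char::from_u32(0x41),   // 'A' is a valid char
        char::from_u32(0xD800), // a surrogate is not a valid char
        NonZeroU8::new(0),      // zero is not a valid NonZeroU8
    )
}

/// Holds uninitialized memory in a `MaybeUninit` until it is written.
/// (`init_late` is a hypothetical example function.)
fn init_late() -> u32 {
    // A `MaybeUninit<u32>` may be uninitialized; a plain `u32` may not.
    let mut slot = MaybeUninit::<u32>::uninit();
    slot.write(7);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}
```

Calling `assume_init` before the value has been written would produce an integer from uninitialized memory, which is one of the invalid values in the list.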

Avoiding undefined behavior

Safe Rust does not have undefined behavior. If it is possible to induce LLVM-side undefined behavior using safe Rust, that is a bug in the compiler.

Because the results of invoking undefined behavior are so unpredictable, and broken code can often appear to work, it is highly recommended to compile with extra debug information when testing unsafe code for undefined behavior. Unfortunately, the LLVM sanitizers and Miri both require nightly builds of rustc.

References