Undefined Behavior
Note: Although I believe that the content in this article is correct, I would appreciate if someone more familiar with the subtleties of UB could extend or revise this page.
Undefined Behavior, often abbreviated UB, is the result of a violated assumption of the compiler or optimizer. UB can usually only be encountered if the program or its dependencies use unsafe code incorrectly.
There are a few known bugs that can cause UB without unsafe in edge cases, which are labeled I-unsound on GitHub. However, it's very unlikely to encounter these bugs on a day-to-day basis.
Example
let b: bool = unsafe {
std::mem::transmute(2u8)
};
This example uses the unsafe transmute() function to convert the integer 2 to a bool. The problem is that Rust assumes that any bool is either true (represented as 1) or false (represented as 0). This means that b is an invalid value: it is neither true nor false.
Because producing such a value is UB, the compiler is free to assume that the code will never run and can therefore make certain optimizations, which can have drastic consequences in other parts of the code. This series of articles explains the impacts of UB; it focuses on C but the points also apply to Rust.
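For comparison, the same conversion can be written without unsafe. The following is a minimal sketch; the function name bool_from_u8 is made up for this illustration and is not part of the standard library. It rejects every byte that is not a valid bool instead of producing an invalid value.

```rust
/// Converts a byte to a `bool` without `unsafe`.
/// Returns `None` for any byte that is not a valid `bool` representation.
/// (`bool_from_u8` is a hypothetical helper, not a standard library function.)
fn bool_from_u8(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // 2..=255 are not valid `bool` values, so no `bool` is produced.
        _ => None,
    }
}
```

With this helper, bool_from_u8(2) yields None, so the caller has to handle the invalid input explicitly.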
Consequences of Undefined Behavior
The compiler assumes that UB can never happen, and optimizes code under this assumption. However, optimizations that are correct in the absence of UB might be nonsensical and even dangerous, if UB is present. What exactly happens is not specified and can't be predicted.
For example, the optimizer is allowed to inline code and move code around, to unroll loops, replace arithmetic with bitwise operations, remove "dead code" and much more. So it is possible that code exhibiting UB is removed, or does something else entirely. This is sometimes humorously called "eating your laundry" or nasal demons.
The behaviors permitted once undefined behavior is present are open-ended enough that the program may even misbehave before the invalid code is actually executed, as long as execution is guaranteed to eventually reach it. For example, the compiler may remove conditionals that must be true if UB is not invoked. This can create the appearance of undefined behavior time traveling.
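The assumption that UB never happens can also be stated explicitly in code. The sketch below uses std::hint::unreachable_unchecked to promise the optimizer that a branch is never taken; the function names are hypothetical and chosen only for this illustration. Whether the optimizer actually removes the later bounds check depends on the compiler version and optimization level.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical helper: returns `values[index]`.
///
/// # Safety
/// The caller must guarantee `index < values.len()`.
unsafe fn get_assuming_in_bounds(values: &[u8], index: usize) -> u8 {
    if index >= values.len() {
        // SAFETY: the caller promised that this branch is never taken.
        // If that promise is broken, reaching this line is UB.
        unsafe { unreachable_unchecked() }
    }
    // The optimizer may drop the bounds check of this indexing operation,
    // because it was told that `index < values.len()` always holds.
    values[index]
}

/// Safe wrapper that upholds the promise by checking the index first.
fn get_checked(values: &[u8], index: usize) -> Option<u8> {
    if index < values.len() {
        // SAFETY: the bounds were verified on the line above.
        Some(unsafe { get_assuming_in_bounds(values, index) })
    } else {
        None
    }
}
```

Only the safe wrapper should be exposed to callers who cannot prove the index is in bounds.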
Difference between Undefined Behavior and Contract Violations
Library types can define their own contracts that must be upheld.
For example, Vec has the contract that its length must never be greater than its capacity. Its implementation ensures that this never happens, and relies on it within unsafe blocks. If the contract were violated, the unsafe code could cause UB.
Contract Violations are logical errors, but they can cause UB when unsafe code is involved. If a function doesn't ensure that contracts are upheld, and this can cause UB down the line, then that function should be unsafe. Furthermore, fields with a contract should be private.
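These rules can be sketched with a small library type. Everything in the following example (the cursor module, the Cursor type and its methods) is hypothetical and exists only to illustrate the pattern: private fields, a checked safe constructor, an unsafe constructor that shifts the obligation to the caller, and unsafe code that relies on the contract.

```rust
mod cursor {
    /// Hypothetical type with the contract `pos < data.len()`.
    pub struct Cursor {
        // Private fields: code outside this module cannot break the contract.
        data: Vec<u8>,
        pos: usize,
    }

    impl Cursor {
        /// Safe constructor: verifies the contract before building the value.
        pub fn new(data: Vec<u8>, pos: usize) -> Option<Cursor> {
            if pos < data.len() {
                Some(Cursor { data, pos })
            } else {
                None
            }
        }

        /// Skips the check, so it has to be `unsafe`.
        ///
        /// # Safety
        /// The caller must guarantee `pos < data.len()`.
        pub unsafe fn new_unchecked(data: Vec<u8>, pos: usize) -> Cursor {
            Cursor { data, pos }
        }

        /// Relies on the contract inside an `unsafe` block.
        pub fn current(&self) -> u8 {
            // SAFETY: every constructor guarantees `pos < data.len()`.
            unsafe { *self.data.get_unchecked(self.pos) }
        }
    }
}
```

Calling new_unchecked with an out-of-range position is not UB by itself under this design, but a later call to current would be.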
Contract Violations somewhat overlap with invalid values. For example, a bool can only have the values true and false; this is called a validity invariant. Note that this is not just relied on by the API, but by the language itself, since bool is a primitive type. Therefore, an invalid value is illegal to produce (even if it is never used)[1]. On the other hand, if a library type such as Vec has a violated contract, it causes UB only when it is used (e.g. when the Vec is indexed).
Another example is the str type, which is a slice of text that must be valid UTF-8. However, it has the same validity invariants as [u8][2]; the encoding is "just" an API contract. This means it is not UB to produce a str that isn't valid UTF-8. However, using an incorrectly encoded str can cause UB.
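In practice, the UTF-8 contract is usually established with the checked conversion std::str::from_utf8, while std::str::from_utf8_unchecked leaves the obligation with the caller. The wrapper names in this sketch are hypothetical.

```rust
/// Returns the text if `bytes` is valid UTF-8, otherwise `None`.
/// (`text_from_bytes` is a hypothetical wrapper around `std::str::from_utf8`.)
fn text_from_bytes(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Skips the validation, so the caller takes over the contract.
///
/// # Safety
/// `bytes` must be valid UTF-8.
unsafe fn text_from_trusted_bytes(bytes: &[u8]) -> &str {
    // SAFETY: the caller guarantees that `bytes` is valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(bytes) }
}
```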
Behavior considered undefined
This list is taken from the Reference.
Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is Undefined Behavior. Please read the Rustonomicon before writing unsafe code.
Data races
Rust assumes that data races can never occur; things like accessing a mutable static are therefore unsafe.
Data races may also happen when transmuting a shared reference (&) to an exclusive reference (&mut). You should never do that.
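Shared counters are a typical case where a mutable static looks tempting. A minimal sketch of a race-free alternative, assuming an atomic integer fits the use case (the function name count_in_parallel is made up for this example):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Increments one shared counter from several threads without a data race.
fn count_in_parallel(threads: usize, increments: usize) -> usize {
    let counter = AtomicUsize::new(0);
    // Scoped threads may borrow `counter`; all of them are joined
    // before `thread::scope` returns.
    thread::scope(|scope| {
        for _ in 0..threads {
            scope.spawn(|| {
                for _ in 0..increments {
                    // An atomic read-modify-write instead of a plain `+= 1`.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
```

For data that is more complex than a single integer, a Mutex or RwLock plays the same role.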
Dereferencing a dangling/unaligned raw pointer
Raw pointers (*const T, *mut T) can be dereferenced with the * prefix operator. Doing this is UB if the pointer is dangling (i.e. no longer valid) or unaligned.
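When data is not guaranteed to be aligned, std::ptr::read_unaligned can be used instead of a plain dereference. The following sketch (with the hypothetical helper read_u32_at) first checks that the pointer stays inside the buffer, then reads a u32 at an arbitrary byte offset.

```rust
use std::ptr;

/// Reads a native-endian `u32` that starts at `offset` in `bytes`.
/// Returns `None` if fewer than four bytes are available at that offset.
fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    // `get` performs the bounds check, so the pointer below is never dangling.
    let window = bytes.get(offset..end)?;
    let pointer = window.as_ptr().cast::<u32>();
    // SAFETY: `pointer` refers to four readable, initialized bytes, and
    // `read_unaligned` has no alignment requirement. Writing `*pointer`
    // here instead would be UB whenever the address is not 4-byte aligned.
    Some(unsafe { ptr::read_unaligned(pointer) })
}
```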
Breaking LLVM's pointer aliasing rules
Rust uses LLVM as its compiler backend. References (&T, &mut T) follow LLVM's scoped noalias model, except if the &T contains an UnsafeCell.
Specifically, exclusive references (&mut T) must not alias: while an exclusive reference is valid, no other reference/pointer may access the data it points to.
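A common reason for reaching for raw pointers is needing two exclusive references into the same slice. The standard library offers split_at_mut for this, which hands out two non-overlapping halves. A small sketch, with the hypothetical function swap_ends:

```rust
/// Swaps the first and last element through two exclusive references
/// that are guaranteed not to alias.
fn swap_ends(values: &mut [i32]) {
    if values.len() < 2 {
        return;
    }
    // `head` and `tail` cover disjoint parts of `values`.
    let (head, tail) = values.split_at_mut(1);
    if let Some(last) = tail.last_mut() {
        std::mem::swap(&mut head[0], last);
    }
}
```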
Mutating immutable data
All data inside a constant is immutable. Moreover, all data reached through a shared reference or data owned by an immutable binding is immutable, unless that data is contained within an UnsafeCell.
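The safe way to mutate data behind a shared reference is a type built on UnsafeCell, such as Cell, RefCell, Mutex or the atomics. A minimal sketch using Cell (the HitCounter type is hypothetical):

```rust
use std::cell::Cell;

/// Hypothetical counter that can be updated through a shared reference.
struct HitCounter {
    // `Cell` wraps an `UnsafeCell`, so mutation through `&self` is allowed.
    hits: Cell<u32>,
}

impl HitCounter {
    fn new() -> Self {
        HitCounter { hits: Cell::new(0) }
    }

    /// Takes `&self`, not `&mut self`, and returns the new count.
    fn record(&self) -> u32 {
        let updated = self.hits.get() + 1;
        self.hits.set(updated);
        updated
    }
}
```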
Invoking UB via compiler intrinsics
Executing code compiled with platform features that the current platform does not support
See target_feature.
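A usual way to avoid this is to check for the feature at runtime before calling the specialized function. The sketch below assumes an x86_64 target for the specialized path and falls back to plain code everywhere else; the function names are hypothetical, and the AVX2 variant simply reuses the scalar loop body to keep the example short.

```rust
/// Portable fallback that works on every platform.
fn sum_scalar(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Variant compiled with AVX2 enabled.
///
/// # Safety
/// Must only be called on a CPU that supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Picks an implementation at runtime.
fn sum(values: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed on the line above.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_scalar(values)
}
```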
Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI
Producing an invalid value
"Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. Values must always be valid, even if they're unused or in a private field.
The following values are invalid (at their respective type):
- A value other than false (0) or true (1) in a bool.
- A discriminant in an enum not included in the type definition.
- A null fn pointer.
- A value in a char which is a surrogate or above char::MAX.
- A ! (all values are invalid for this type).
- An integer (i*/u*), floating point value (f*), or raw pointer obtained from uninitialized memory, or uninitialized memory in a str.
- A reference or Box<T> that is dangling, unaligned, or points to an invalid value.
- Invalid metadata in a wide reference, Box<T>, or raw pointer:
  - dyn Trait metadata is invalid if it is not a pointer to a vtable for Trait that matches the actual dynamic trait the pointer or reference points to.
  - Slice metadata is invalid if the length is not a valid usize (i.e., it must not be read from uninitialized memory).
- Invalid values for a type with a custom definition of invalid values. In the standard library, this affects NonNull<T> and NonZero*.
Note: rustc achieves this with the unstable rustc_layout_scalar_valid_range_* attributes.
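For several entries in this list, the standard library provides checked constructors that return None instead of producing an invalid value. The sketch below shows a few of them next to a correctly initialized MaybeUninit; the Level enum and the function names are hypothetical.

```rust
use std::mem::MaybeUninit;
use std::num::NonZeroU32;

/// Hypothetical enum with exactly two valid discriminants.
#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(u8)]
enum Level {
    Low = 0,
    High = 1,
}

/// Checked replacement for transmuting a `u8` into `Level`.
fn level_from_u8(raw: u8) -> Option<Level> {
    match raw {
        0 => Some(Level::Low),
        1 => Some(Level::High),
        _ => None,
    }
}

/// `char::from_u32` rejects surrogates and values above `char::MAX`.
fn char_from_code(raw: u32) -> Option<char> {
    char::from_u32(raw)
}

/// `NonZeroU32::new` rejects zero.
fn non_zero(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}

/// Uninitialized memory is only turned into an integer after it was written.
fn initialized_value() -> u32 {
    let mut slot = MaybeUninit::<u32>::uninit();
    slot.write(7);
    // SAFETY: `slot` was fully initialized by the `write` call above.
    unsafe { slot.assume_init() }
}
```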
Avoiding undefined behavior
Safe Rust does not have undefined behavior. If it is possible to induce LLVM-side undefined behavior using safe Rust, that is a bug in the compiler.
Because the results of invoking undefined behavior are so unpredictable, and broken code can often appear to work, anyone who wants to find undefined behavior in unsafe code by testing is strongly advised to compile the source code with extra debug information. Unfortunately, the LLVM sanitizers and Miri both require nightly builds of rustc.