Pattern matching

From Rust Community Wiki
Revision as of 12:06, 26 July 2021 by Aloso

Pattern matching has two purposes in Rust: to check whether a value has certain characteristics, and to bind parts of the value to variables.

Example

Patterns are used in many places. For example, they appear on the left side of match arms.

#[derive(Debug)]
struct Foo {
    bar: i32,
    baz: bool,
}

let value = Some(Foo { bar: 42, baz: true });

match value {
    Some(Foo { bar: 42, baz: false }) => println!("first match arm"),
    Some(Foo { bar: 42, .. })         => println!("second match arm"),
    Some(_)                           => println!("third match arm"),
    None                              => println!("last match arm"),
}
// This prints "second match arm"

When this example runs, the value is tested against the match arms from top to bottom, and only the first arm that matches is executed.

Pattern matching in a match expression must always be exhaustive: every possible value has to be covered by at least one pattern. If the third or the last match arm were removed, the compiler would reject this example.
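Both rules (first matching arm wins, and the arms must be exhaustive) can be seen in a small standalone sketch. The function describe and its temperature thresholds are hypothetical, chosen only for illustration:

```rust
// Hypothetical helper: classifies a temperature in degrees Celsius.
// Arms are tried from top to bottom: 0 would also match 0..=15,
// but the earlier `0` arm wins because it comes first.
fn describe(celsius: i32) -> &'static str {
    match celsius {
        0 => "freezing point",
        i32::MIN..=-1 => "below freezing",
        0..=15 => "cold",
        16..=25 => "mild",
        // Without this wildcard arm the match would not be exhaustive
        // (values above 25 would be uncovered) and would not compile.
        _ => "hot",
    }
}

fn main() {
    for t in [-5, 0, 10, 20, 30] {
        println!("{}: {}", t, describe(t));
    }
}
```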

Bindings

When pattern matching, parts of the value can be bound to new variables. These can be made mutable by prepending the mut keyword:

match value {
    x @ Some(Foo { baz: false, .. }) => println!("x is {:?}", x),
    Some(Foo { bar: mut x, .. })     => println!("x is {:?}", x),
    None                             => println!("value is None"),
}

In this example, if the first arm matches the value, the whole value is bound to the variable x. In the second arm, only the field bar is bound to x.

Note that x is a different binding in each match arm: in the first arm it has type Option<Foo>, in the second arm it has type i32, and only the binding in the second arm is mutable.
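Here is a minimal, self-contained sketch of both binding forms, using a hypothetical normalize function. The @ binding captures the value only if it also lies in a range, and mut n is a separate, mutable binding that exists only in its own arm:

```rust
// Hypothetical helper: returns the value if it lies in 0..=100,
// clamps it into that range otherwise, and maps None to 0.
fn normalize(input: Option<i32>) -> i32 {
    match input {
        // `n` binds the inner value, but only if it is in 0..=100.
        Some(n @ 0..=100) => n,
        // `mut n` is a new, mutable binding local to this arm.
        Some(mut n) => {
            // Out of range: reassign the mutable binding to the nearest bound.
            n = if n < 0 { 0 } else { 100 };
            n
        }
        None => 0,
    }
}

fn main() {
    println!("{}", normalize(Some(42)));
    println!("{}", normalize(Some(-7)));
    println!("{}", normalize(Some(250)));
}
```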

Refutability

A pattern is irrefutable if matching it can never fail; otherwise it is refutable. For example, the wildcard pattern (_) and variable bindings are irrefutable, as matching them always succeeds.

Irrefutable patterns are used in let bindings and in function and closure parameters:

fn pattern_matching((a, b): (i32, &bool)) {
    let &c = b;
}

In this example, (a, b) and &c are patterns. a is an i32, b is a &bool, c is a bool.

The process of taking a value apart with pattern matching and binding its components to new variables is called destructuring.
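Destructuring also works through nested structs, as long as every part of the pattern is irrefutable. The following sketch uses hypothetical Point and Segment types and destructures a whole segment directly in a function parameter:

```rust
// Hypothetical types, used only to illustrate destructuring.
struct Point {
    x: i32,
    y: i32,
}

struct Segment {
    start: Point,
    end: Point,
}

// The parameter pattern takes the Segment apart into four coordinates.
// It is irrefutable because a struct pattern always matches its own type.
fn length_squared(
    Segment {
        start: Point { x: x1, y: y1 },
        end: Point { x: x2, y: y2 },
    }: Segment,
) -> i32 {
    // Tuple destructuring in a `let` binding.
    let (dx, dy) = (x2 - x1, y2 - y1);
    dx * dx + dy * dy
}

fn main() {
    let segment = Segment {
        start: Point { x: 1, y: 2 },
        end: Point { x: 4, y: 6 },
    };
    // dx = 3 and dy = 4, so this prints 25.
    println!("{}", length_squared(segment));
}
```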

Patterns

There are many ways to match values:

  • Wildcard (_): Matches any value and ignores it
  • Binding (e.g. foo): Matches any value and binds it to a new variable.
  • Binding with additional pattern (e.g. foo @ Some(_)): Matches a value against the pattern after the @. If it succeeds, the whole value is bound to the variable.
    • Currently, the pattern after @ can't introduce new bindings. This restriction will be lifted when the bindings_after_at nightly-only feature is stabilized.
  • Literal (e.g. 5, true, 'a', ""): Matches exactly that literal
  • Range (e.g. 5..=10, 'a'..='f'): Matches any value in the given range. Not all kinds of ranges are allowed in patterns:
    • Currently, only inclusive ranges (start..=end) are allowed
    • Half-open ranges such as start.. are expected to reach stable in Rust 1.56.
    • .. is allowed in patterns, but has a different meaning than in expressions: It means that zero or more elements are omitted.
  • Constant (e.g. crate::FOO): Matches a value equal to the constant
  • Reference (&PATTERN): Dereferences a reference and matches the value it points to against PATTERN
  • Tuple (e.g. (PATTERN, .., PATTERN)): Destructures a tuple of values
    • May contain two dots (..) to ignore an arbitrary number of components
  • Array/slice (e.g. [PATTERN, .., PATTERN]): Destructures an array or slice
    • May contain two dots (..) to ignore an arbitrary number of elements. The ignored elements can be bound as an array/slice to a new variable with binding @ .., e.g. [first, rest @ ..]
  • Struct (e.g. Foo { field: PATTERN, .. }, Foo(PATTERN, ..))
    • May contain two dots (..) to ignore an arbitrary number of struct fields
    • If a field is bound to a variable of the same name, the pattern can be omitted. For example, Foo { field: field } is equivalent to Foo { field }, and Foo { field: mut field } can be written as Foo { mut field }.
    • A tuple struct can be pattern-matched with the regular struct syntax, e.g. let Foo { 0: x } = Foo(5);
  • Enum variant (e.g. Option::Some(PATTERN))
    • Enum variant patterns are always refutable, except if the enum has only one variant and doesn't have the #[non_exhaustive] attribute (which only has this effect outside the crate that defines the enum)
    • The fields of an enum variant follow the same pattern-matching rules as struct fields
  • Or-patterns (e.g. A | B): Matches any of the patterns separated with a vertical bar.
    • Or-patterns nested inside other patterns weren't available before Rust 1.53, but match arms could still have multiple top-level patterns separated with |.
    • An or-pattern must be wrapped in parentheses (e.g. (A | B | C)) if:
      • it is not in a tuple, array, slice, struct or enum variant pattern, unless the pattern is in a match arm
      • it is behind a reference, e.g. &(A | B)
  • box (e.g. Some(box x)): This is a nightly-only feature for destructuring a Box. Its use is discouraged, since it might be removed in the future.
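Several of the pattern kinds listed above can be combined in one match. The summarize function below is a hypothetical sketch that uses slice, literal, nested or-, range, @ and rest @ .. patterns on stable Rust:

```rust
// Hypothetical helper combining several pattern kinds from the list above.
fn summarize(values: &[i32]) -> String {
    match values {
        // Slice pattern matching only an empty slice.
        [] => String::from("empty"),
        // Literal inside a slice pattern: exactly one element, equal to 0.
        [0] => String::from("just zero"),
        // Or-pattern nested inside a slice pattern (Rust 1.53+).
        [1 | 2 | 3] => String::from("one small number"),
        // Range pattern whose value is bound with @.
        [n @ 4..=9] => format!("one digit: {}", n),
        // `rest @ ..` binds all remaining elements as a sub-slice.
        // This arm also covers every other slice of length 1 or more.
        [first, rest @ ..] => format!("{} followed by {} more", first, rest.len()),
    }
}

fn main() {
    println!("{}", summarize(&[5, 6, 7]));
}
```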

Binding modes

There are several ways to bind a value to a variable:

  • foo is the default; it creates an immutable binding that takes the value by move or copy.
  • mut foo creates a mutable binding.
  • ref foo creates a shared (immutable) reference to the matched value.
  • ref mut foo creates an exclusive (mutable) reference to the matched value.

For example:

let mut x: Option<i32> = Some(4);
match x {
    Some(ref mut inner) => {
        // inner is a reference into the Option. It has type &mut i32.
        *inner = 5;
    }
    _ => {}
}
dbg!(x);

The ref and ref mut binding modes are rarely needed because of match ergonomics.
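The following sketch, built around a hypothetical User struct, shows why ref exists and how match ergonomics (described in the next section) usually makes it unnecessary. With ref, matching an owned value only borrows the String field, so the value stays usable; matching on a reference achieves the same without ref:

```rust
// Hypothetical struct with a non-Copy field.
struct User {
    name: String,
    age: u32,
}

// Takes `user` by value. `ref name` only borrows the String, and `age`
// is a Copy u32, so the match doesn't move anything out of `user`,
// which can therefore be returned afterwards.
fn greet(user: User) -> (String, User) {
    let greeting = match user {
        User { ref name, age } => format!("{} is {}", name, age),
    };
    (greeting, user)
}

// Same match on a `&User`: match ergonomics binds `name` as `&String`
// and `age` as `&u32`, with no `ref` keyword in the pattern.
fn greet_borrowed(user: &User) -> String {
    match user {
        User { name, age } => format!("{} is {}", name, age),
    }
}

fn main() {
    let user = User { name: String::from("Ferris"), age: 11 };
    let (greeting, user) = greet(user);
    println!("{}", greeting);
    println!("{}", greet_borrowed(&user));
}
```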

Match ergonomics

Match ergonomics (also called default binding modes) make it easier to match on borrowed values by automatically adding &, &mut, ref and ref mut in the pattern where necessary. Imagine you are writing a function that returns a reference to a struct inside a borrowed Result, where the struct doesn't implement the Copy trait:

struct Foo(bool, i32);

fn unwrap(f: &Result<Foo, ()>) -> &Foo {
    match f { ... }
}

Without match ergonomics, we have to use & to match on the reference, and then use the ref keyword to bind the content by reference:

match f {
    &Ok(ref foo) => foo,
    &Err(()) => panic!("Unwrapped an error"),
}

However, with match ergonomics, this isn't necessary: when a non-reference pattern such as Ok(foo) is matched against a reference, the compiler automatically dereferences the value and binds foo by reference. In effect, Ok(foo) is treated like &Ok(ref foo), so foo has type &Foo:

fn unwrap(f: &Result<Foo, ()>) -> &Foo {
    match f {
        Ok(foo) => foo,
        Err(()) => panic!("Unwrapped an error"),
    }
}

This also works for tuples, arrays, slices and unions.
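As a sketch of the same mechanism with mutable references and tuples, the hypothetical functions below write no &, &mut, ref or ref mut in their patterns; the binding modes are inferred from the type being matched:

```rust
// Hypothetical helper: increments every `Some` counter in place.
// `slot` is a `&mut Option<u32>`, so `Some(n)` binds `n` as `&mut u32`.
fn bump_all(counters: &mut [Option<u32>]) {
    for slot in counters.iter_mut() {
        if let Some(n) = slot {
            *n += 1;
        }
    }
}

// Hypothetical helper: matching a `&(String, u32)` with a plain tuple
// pattern binds `label` as `&String` and `_count` as `&u32`.
fn label_of(pair: &(String, u32)) -> &str {
    let (label, _count) = pair;
    label
}

fn main() {
    let mut counters = vec![Some(1), None, Some(41)];
    bump_all(&mut counters);
    println!("{:?}", counters);

    let pair = (String::from("apples"), 3);
    println!("{}", label_of(&pair));
}
```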

Places where patterns are used

Refutable patterns

match block

Full article: Match

With the match keyword, a value is pattern matched against several match arms. Each match arm consists of at least one pattern and an optional if guard. Taken together, the patterns in a match block must be exhaustive.
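A short sketch of arms with several patterns and with an if guard, using a hypothetical classify function. Guards are not taken into account when the compiler checks exhaustiveness, so a catch-all arm is still needed after the guarded one:

```rust
// Hypothetical helper: classifies a point given as (x, y).
fn classify(point: (i32, i32)) -> &'static str {
    match point {
        (0, 0) => "origin",
        // One arm with two patterns separated by `|`.
        (0, _) | (_, 0) => "on an axis",
        // The guard adds a condition that the pattern alone can't express.
        (x, y) if x == y => "on the diagonal",
        // Required: the guarded arm above doesn't count for exhaustiveness.
        _ => "somewhere else",
    }
}

fn main() {
    for p in [(0, 0), (0, 5), (2, 2), (1, 2)] {
        println!("{:?}: {}", p, classify(p));
    }
}
```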

if let and while let

if let and while let are syntactic sugar for match expressions, which are often more verbose. For example:

use std::env::args;

if let Some(x) = args().next() {
    dbg!(x);
}
// desugars to
match args().next() {
    Some(x) => {
        dbg!(x);
    }
    _ => {}
}
let mut iter = "abc".chars();
while let Some(x) = iter.next() {
    dbg!(x);
}
// desugars to
let mut iter = "abc".chars();
loop {
    match iter.next() {
        Some(x) => {
            dbg!(x);
        }
        None => break,
    }
}

Like match blocks, if let and while let blocks can have multiple patterns separated with vertical bars (|). However, they can't have an if guard.

Many people find the names if let and while let confusing. The rationale behind this syntax is that both let and if let accept a pattern; however, the pattern must be irrefutable for let, but can be refutable for if let.
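The following sketch shows an if let with two patterns separated by |, and a while let that drains a stack. The function names are hypothetical; Vec::pop is the standard method that returns None once the vector is empty:

```rust
// Hypothetical helper: true if there is no value or the value is 0.
fn is_empty_or_zero(value: Option<i32>) -> bool {
    // Two refutable patterns in one `if let`, separated by `|`.
    if let None | Some(0) = value {
        true
    } else {
        false
    }
}

// Hypothetical helper: sums a stack by popping until it is empty.
fn sum_stack(mut stack: Vec<i32>) -> i32 {
    let mut total = 0;
    // The loop ends as soon as `pop` returns `None`.
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}

fn main() {
    println!("{}", is_empty_or_zero(Some(0)));
    println!("{}", sum_stack(vec![1, 2, 3]));
}
```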

Irrefutable patterns

let bindings

The left-hand side of a let statement is a pattern. This is why it's possible to destructure values when declaring variables:

let [x, ..] = [1, 2, 3];
let &(a, b) = &(3, 2);

for loops

for loops accept a pattern to bind the iterated-over values, for example:

for (n, &x) in vec![1, 2, 3].iter().enumerate() {
    dbg!(n, x);
}

Function and closure parameters

Function and closure parameters are patterns, for example:

fn foo((a, b): (i32, i32), _: bool) {}

let sums: Vec<i32> = [1, 2, 3]
    .iter()
    .zip(&[4, 5, 6])
    .map(|(&n, &x)| n + x)
    // Iterator adapters are lazy, so collect the results to run them.
    .collect();

Macros

Macros using macro_rules! can accept patterns with a :pat parameter, for example:

macro_rules! matches {
    // accepts an expression and a pattern
    ($e:expr, $p:pat) => {
        match $e {
            $p => true,
            _  => false,
        }
    }
}

In fact, a similar macro, matches!, is part of the standard library.
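A brief usage sketch of the standard library's matches! macro, which also accepts or-patterns and an optional if guard. The two wrapper functions are hypothetical:

```rust
// Hypothetical wrapper: true for ASCII letters, using an or-pattern of ranges.
fn is_ascii_letter(c: char) -> bool {
    matches!(c, 'a'..='z' | 'A'..='Z')
}

// Hypothetical wrapper: `matches!` also accepts an `if` guard after the pattern.
fn is_big(value: Option<u32>) -> bool {
    matches!(value, Some(n) if n > 100)
}

fn main() {
    println!("{} {}", is_ascii_letter('q'), is_ascii_letter('3'));
    println!("{} {}", is_big(Some(101)), is_big(None));
}
```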

Future extensions

Bindings after @

As mentioned above, the pattern after @ can't introduce new bindings. For example, foo @ Some(bar) is illegal.

This restriction will be lifted when the experimental bindings_after_at feature is stabilized. Note that the matched type must implement the Copy trait:

#![feature(bindings_after_at)]

match Some(5) {
    x @ Some(y) => {
        dbg!(x, y);
    }
    _ => {}
}

Destructuring assignment

The left-hand side of assignments (e.g. *x = 4;) is an expression. More precisely, it is a place expression, because it must refer to a memory location.

Destructuring assignment will make the left-hand side of assignments look more like patterns, because it will allow destructuring a value while assigning it:

[a, _, b.c, ..] = ['a', 'w', 'e', 's', 'o', 'm', 'e', '!'];

[a, _, b.c, ..] is an expression, even though it looks a lot like a pattern: It contains a wildcard (_), which is usually not allowed in expressions, and the full range (..) is treated as omission of the remaining elements. On the other hand, it contains a field access, which is not allowed in patterns. This is unfortunate because it blurs the distinction between patterns and expressions. However, destructuring assignment is a very useful feature, so it is considered to be worth the additional complexity.