String Types

Strings are types that represent textual data. There are several string types that have slightly different purposes.

The most common string type, str, is a primitive type and is used for string literals. Other string types in the standard library are OsStr and CStr. All of these types are dynamically sized and can therefore only exist behind a reference or a pointer-like type (e.g. &str). However, they have owned counterparts, String, OsString and CString, which are allocated on the heap and are Sized. This is similar to the distinction between Vec<T> and [T].

Strings are represented as slices of bytes; however, Rust guarantees that strings are always valid. For instance, the str and String types must be valid UTF-8; violating this rule causes undefined behavior.

Example

let s1: &str = "hello world!";
let s2 = String::from("👍👌🤝");
assert_eq!(s1.len(), s2.len());

String and str

String and str are types containing UTF-8 encoded text. This means that characters have a variable width: a character can be between 1 and 4 bytes long. Therefore, these strings can't be indexed by an integer position, and slicing a string to get a substring uses byte offsets. Slicing a string in the middle of a multi-byte character causes a panic:

let s = "hi 😂";
assert_eq!(&s[3..7], "😂");
&s[0..4]; // panics

The len() method returns the byte length of the string; this is not the same as the number of characters. To iterate over the characters of a string, the chars() and char_indices() methods can be used.
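As a rough sketch of the points above, the helper functions below (hypothetical names, not part of the standard library) compare the byte length with the char count, list the byte offset of each char, and use str::get to slice without panicking:

```rust
/// Returns the byte length and the number of chars of a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Pairs every char with the byte offset at which it starts.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

/// Slices by byte range, returning None instead of panicking when the
/// range is out of bounds or does not fall on char boundaries.
fn checked_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

For "hi 😂", byte_and_char_len returns (7, 4), and checked_slice(s, 0, 4) returns None where &s[0..4] would panic.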

A String or str can be converted from a Vec<u8> or [u8] with the String::from_utf8 and str::from_utf8 family of functions, which validate the input.
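A minimal sketch of these conversions follows. The wrapper function names are made up for illustration; String::from_utf8, std::str::from_utf8 and String::from_utf8_lossy are the standard library functions being shown:

```rust
/// Validates an owned byte vector and reuses its allocation on success.
fn decode_owned(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Validates a borrowed byte slice without copying it.
fn decode_borrowed(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Never fails: invalid sequences are replaced with U+FFFD.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

The Option return types discard the error details for brevity; real code would usually keep the FromUtf8Error or Utf8Error to report where validation failed.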

A note about Unicode

In Rust, a char is a Unicode scalar value. This is similar to, but not the same as, a Unicode code point: a char can't be a high or low surrogate. This means that converting a u32 to a char can fail.
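The fallible conversion can be sketched with char::from_u32; the wrapper name to_char is only illustrative:

```rust
/// Converts a numeric code point to a char, if it is a Unicode scalar value.
/// Surrogates (0xD800..=0xDFFF) and values above 0x10FFFF yield None.
fn to_char(code_point: u32) -> Option<char> {
    char::from_u32(code_point)
}
```

For example, to_char(0x41) is Some('A'), while to_char(0xD800), a high surrogate, is None.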

A char is not the same as a character in the general sense. When talking about characters, we often mean graphemes, which can consist of multiple chars. To iterate over or count the graphemes of a string, external crates such as unicode-segmentation should be used. Note that graphemes don't necessarily correspond to what is displayed as a visual unit, since that depends on the text rendering pipeline. For example, fonts can define ligatures, and these can be turned on and off with font features.

Also note that strings that are semantically equal don't necessarily have the same byte representation. Often there are multiple ways to represent a single grapheme, and sometimes different graphemes should be treated as equal. Therefore, when comparing or sorting strings, they should be normalized beforehand to produce correct results. This can be done with the unicode-normalization crate, for example.
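The following sketch uses only the standard library to show the problem, not the solution: the same grapheme "é" is written once as a precomposed scalar value and once as a base letter plus a combining accent. The constant and function names are hypothetical, and actual normalization would still need an external crate.

```rust
/// "é" as a single precomposed scalar value (U+00E9).
const PRECOMPOSED: &str = "\u{00E9}";
/// "é" as the letter "e" followed by a combining acute accent (U+0301).
const DECOMPOSED: &str = "e\u{0301}";

/// Returns (byte length, number of chars) so the two forms can be compared.
fn describe(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Plain == on strings compares bytes, without any normalization.
fn bytewise_equal(a: &str, b: &str) -> bool {
    a == b
}
```

Both constants are a single grapheme, yet describe reports different byte and char counts for them, and bytewise_equal(PRECOMPOSED, DECOMPOSED) is false.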

OsString and OsStr

OsString and OsStr are used when interfacing with platform-specific APIs. Their format varies by system and is therefore not exposed to the programmer. One can infallibly convert a String to an OsString. The reverse is not guaranteed to succeed, as an OsString may contain values that can't be represented in UTF-8, so the conversion is either fallible (OsString::into_string) or lossy (OsStr::to_string_lossy).
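The three conversions mentioned above might look as follows. The wrapper function names are assumptions made for this sketch; the methods they call are from std::ffi:

```rust
use std::ffi::{OsStr, OsString};

/// String -> OsString always succeeds.
fn to_os(s: String) -> OsString {
    OsString::from(s)
}

/// OsString -> String fails if the contents are not valid Unicode;
/// the original OsString is handed back in the error case.
fn to_utf8(os: OsString) -> Result<String, OsString> {
    os.into_string()
}

/// OsStr -> String never fails, but replaces invalid data with U+FFFD.
fn to_utf8_lossy(os: &OsStr) -> String {
    os.to_string_lossy().into_owned()
}
```

A round trip such as to_utf8(to_os(String::from("a.txt"))) yields Ok("a.txt"), since every String is a valid OsString.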

CString and CStr

CString and CStr are constrained by C language requirements: they are terminated by a nul byte (b'\0') and can't contain any other nul bytes. An owned C string can be created with CString::new(), which appends the terminating nul byte itself and fails if the input contains a nul byte.

Note that C strings are not constrained to UTF-8. This means that when a &CStr is converted to a &str, it must be validated.
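A small sketch of both directions, with hypothetical wrapper names around CString::new and CStr::to_str:

```rust
use std::ffi::{CStr, CString};

/// Builds a C string; returns None if the input contains a nul byte.
/// The terminating nul is appended by CString::new.
fn make_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

/// Borrows a C string as &str; returns None if it is not valid UTF-8.
fn c_to_str(c: &CStr) -> Option<&str> {
    c.to_str().ok()
}
```

Here make_c_string("he\0llo") returns None because of the interior nul byte, and c_to_str returns None for a C string holding the single byte 0xFF.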

Important traits

Deref

Owned strings can be dereferenced to their borrowed counterparts:

  • String implements Deref<Target = str>
  • OsString implements Deref<Target = OsStr>
  • CString implements Deref<Target = CStr>

This means that str methods are also available for String because of auto-deref. For example, `String::new().chars()` is equivalent to `String::new().deref().chars()`.
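To make this concrete, the sketch below calls a str method on a String both implicitly and explicitly, and passes a &String where a &str is expected. All function names are invented for the example.

```rust
use std::ops::Deref;

/// A function that only needs a borrowed string slice.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

/// chars() is a str method; auto-deref makes it callable on a String.
fn char_count_auto(owned: &String) -> usize {
    owned.chars().count()
}

/// The same call with the dereference written out.
fn char_count_explicit(owned: &String) -> usize {
    owned.deref().chars().count()
}

/// Deref coercion turns the &String argument into a &str.
fn first_word_of_owned(owned: &String) -> &str {
    first_word(owned)
}
```

The &String parameters are there only to make the coercion visible; idiomatic signatures would take &str directly.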

AsRef

All strings implement the AsRef trait to convert them to the borrowed variant:

  • String and str implement AsRef<str>
  • OsString and OsStr implement AsRef<OsStr>
  • CString and CStr implement AsRef<CStr>

This is useful for writing functions that are generic over string types when a string reference is enough, for example:

fn foo(s: impl AsRef<str>) {
    let s: &str = s.as_ref();
    // do something with s
}

foo(String::from("this works"));
foo("this also works");

For conversions in the opposite direction, from a borrowed string to an owned one, the ToOwned trait can be used; for str, ToString works as well.

FromStr and ToString

These traits are used to convert other values from and into a string.

FromStr is fallible, i.e. it returns a Result. It is used by str::parse().

ToString has a blanket implementation for every T: Display. This means that ToString doesn't need to be implemented manually; instead, one should implement the Display trait, and an implementation of ToString is then provided automatically.
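As an illustration, here is a hypothetical Point type (not from the standard library) that implements Display and FromStr. The "x,y" text format and the String error type are arbitrary choices made for this sketch.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Implementing Display also provides to_string() via the blanket impl.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{},{}", self.x, self.y)
    }
}

// Implementing FromStr makes "…".parse::<Point>() work.
impl FromStr for Point {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (x, y) = s
            .split_once(',')
            .ok_or_else(|| format!("missing comma in {:?}", s))?;
        let x = x.trim().parse::<i32>().map_err(|e| format!("bad x: {}", e))?;
        let y = y.trim().parse::<i32>().map_err(|e| format!("bad y: {}", e))?;
        Ok(Point { x, y })
    }
}
```

With these two implementations, "3, 4".parse::<Point>() returns Ok(Point { x: 3, y: 4 }), and Point { x: 3, y: 4 }.to_string() returns "3,4" without ToString being implemented by hand.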

Borrow and ToOwned

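These traits are used to convert between borrowed and owned strings. ToOwned creates an owned string from a borrowed one: str implements ToOwned<Owned = String>, OsStr implements ToOwned<Owned = OsString> and CStr implements ToOwned<Owned = CString>. Borrow goes the other way: String implements Borrow<str>, OsString implements Borrow<OsStr> and CString implements Borrow<CStr>.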

As a result, strings can be used with the Cow type, which stands for clone on write. It can be used to return a type that is either owned or borrowed, while avoiding allocations unless necessary. For example:

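use std::borrow::Cow;

fn foo(s: &str) -> Cow<'_, str> {
    if s.chars().all(char::is_lowercase) {
        // Nothing to change, so return the input without allocating.
        Cow::Borrowed(s)
    } else {
        Cow::Owned(s.to_lowercase())
    }
}

// Keep the Cow in a variable so that the &str does not borrow from a temporary.
let cow = foo("hello world");
let s: &str = cow.as_ref();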
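The Borrow and ToOwned conversions themselves can be sketched as below. The function names are hypothetical; the HashMap lookup is included because HashMap::get accepts any key type that the stored key can be borrowed as, which is a common place where Borrow<str> for String is relied on.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;

/// Borrowed -> owned: allocates a new String from a &str.
fn owned_copy(s: &str) -> String {
    s.to_owned()
}

/// Owned -> borrowed: views a String as a str through Borrow.
fn borrowed_view(s: &String) -> &str {
    Borrow::<str>::borrow(s)
}

/// Looks up a String-keyed map with a &str key, without allocating.
fn lookup(map: &HashMap<String, u32>, key: &str) -> Option<u32> {
    map.get(key).copied()
}
```

The explicit Borrow::<str> is used because String also implements Borrow<String>, so the target type has to be named or inferred from context.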

Sources of confusion

Rust is more pedantic than other languages when it comes to string handling, which can lead to confusion as to why a certain type or trait is used. Additionally, as strings are such a fundamental type, there are some arguably inelegant or redundant items such as FromStr.

Cow<str>

Cow<str> is not a special case, since Cow can be used with any type that implements ToOwned, e.g. [T], Path and many more. However, Cow<str> is arguably the most common use case.

FromStr and ToString

FromStr differs from From in that it returns a Result, but it seems redundant with TryFrom. Likewise, ToString seems redundant with Into<String>.

Indeed, TryFrom was going to make FromStr obsolete, but it cannot, as it would overlap with impl<T, U> TryFrom<T> for U where U: From<T> (#44174), which violates coherence. FromStr is also used by str::parse. ToString::to_string(), on the other hand, is used as a convenience method for format!("{}", x).
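The relationship between to_string() and format! can be checked with a short sketch; the two generic helper names are made up for the comparison:

```rust
use std::fmt::Display;

/// Formats a value through the ToString blanket implementation.
fn via_to_string<T: Display>(x: &T) -> String {
    x.to_string()
}

/// Formats the same value with the format! macro.
fn via_format<T: Display>(x: &T) -> String {
    format!("{}", x)
}
```

Both helpers go through the value's Display implementation, so they are expected to produce the same text for any T: Display.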