String Types

Strings in Rust are wrappers around byte sequences that restrict them to contain valid text. The most common type is String, which always contains valid UTF-8. Each owned type has a corresponding borrowed type which refers to a valid sub-sequence. String can be borrowed as &str and implements Deref and DerefMut, so that all str operations can be done on a String. This is analogous to Vec<T> and [T]. Other string types include OsString, which is an opaque, operating system-specific string format, and CString, which contains a C-compatible string.
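As a small illustration of the owned/borrowed relationship, the sketch below passes a String where a &str is expected. The helper shout is a hypothetical name introduced only for this example.

```rust
/// Hypothetical helper: accepts any borrowed string slice.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let owned: String = String::from("hello");

    // A &String coerces to &str through Deref.
    let borrowed: &str = &owned;
    assert_eq!(borrowed, "hello");

    // str methods can be called directly on a String.
    assert!(owned.starts_with("he"));
    assert_eq!(shout(&owned), "HELLO");

    // The same relationship holds between Vec<T> and [T].
    let numbers: Vec<i32> = vec![3, 1, 2];
    let slice: &[i32] = &numbers;
    assert_eq!(slice.first(), Some(&3));
}
```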

The advantage of string types is their constraints. As opposed to storing text in a Vec<u8>, string types guarantee their contents to be valid.

String and str are guaranteed to contain a sequence of UTF-8-encoded Unicode scalar values by two properties. First, while indexing is still done by byte, slicing in the middle of a character will cause a panic. Second, the API provides no safe way to modify the string such that it contains a code point outside the range of valid scalar values.
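The byte-based slicing rule can be observed without triggering a panic by using the checked accessor on str. This is a minimal sketch; byte_prefix is a hypothetical helper name.

```rust
/// Hypothetical helper: returns the first `n` bytes of `text` as a string
/// slice, or `None` if `n` is out of range or falls inside a character.
fn byte_prefix(text: &str, n: usize) -> Option<&str> {
    text.get(..n)
}

fn main() {
    // 'n' takes 1 byte, U+00E9 (e with acute accent) takes 2 bytes in UTF-8.
    let word = "n\u{e9}";
    assert_eq!(word.len(), 3);

    assert_eq!(byte_prefix(word, 1), Some("n"));
    // Byte offset 2 is in the middle of the second character.
    assert_eq!(byte_prefix(word, 2), None);
    assert!(!word.is_char_boundary(2));
    assert_eq!(byte_prefix(word, 3), Some(word));

    // The unchecked form `&word[..2]` would panic at run time instead.
}
```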

A String or &str can only be safely converted from a Vec<u8> or &[u8] by the String::from_utf8 and str::from_utf8 family of functions, which validate the input.
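The following sketch shows the validating conversions from bytes, both borrowed and owned, along with the lossy variant that substitutes the replacement character. The decode wrapper is a hypothetical helper used only to keep the example testable.

```rust
use std::str;

/// Hypothetical wrapper around the validating borrowed conversion.
fn decode(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    let valid = [0x68, 0x69]; // "hi"
    let invalid = [0x68, 0xFF]; // 0xFF never appears in valid UTF-8

    // Borrowed conversion: &[u8] to &str.
    assert_eq!(decode(&valid), Ok("hi"));
    assert!(decode(&invalid).is_err());

    // Owned conversion: Vec<u8> to String.
    assert_eq!(String::from_utf8(valid.to_vec()), Ok(String::from("hi")));
    assert!(String::from_utf8(invalid.to_vec()).is_err());

    // Lossy conversion: invalid sequences become U+FFFD.
    assert_eq!(String::from_utf8_lossy(&invalid), "h\u{FFFD}");
}
```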

Unicode and UTF-8
Each character in a str may span multiple bytes; nevertheless, indices still refer to bytes rather than characters. Thus, the length of a str is the number of bytes it takes up rather than the number of characters. Using an index in the middle of a character will cause a panic. The characters and their indices may be iterated using chars and char_indices. The length of these iterators may still not give you the length relevant for your use case.
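To make the difference between byte length and character count concrete, here is a short sketch using only the standard library. The helper byte_and_char_len is a hypothetical name.

```rust
/// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_len(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    // 'a' is 1 byte, U+00E9 is 2 bytes, U+1F600 is 4 bytes in UTF-8.
    let text = "a\u{e9}\u{1F600}";
    assert_eq!(byte_and_char_len(text), (7, 3));

    // char_indices yields the byte offset at which each character starts.
    let indices: Vec<(usize, char)> = text.char_indices().collect();
    assert_eq!(
        indices,
        vec![(0, 'a'), (1, '\u{e9}'), (3, '\u{1F600}')]
    );
}
```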

Multiple characters may be composed to produce a single grapheme, and distinct graphemes are not always semantically distinct. An accent can combine with an adjacent character, and that same grapheme may also be representable as a single character. Thus, semantically identical strings may have a different number of characters. Ligatures and emoji characters may even conditionally combine into fewer graphemes based on the font or by preference. Knowledge of how the text will be rendered is required to calculate the length in graphemes. Additionally, different graphemes may be confusingly or deceptively similar, or even visually identical. Semantically comparing UTF-8 strings requires normalization appropriate for your use case. Such complexities of Unicode are beyond the scope of this article, but various crates are available to help with handling Unicode properly.
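The accent case can be demonstrated with the standard library alone: the precomposed and the combining forms of the same grapheme compare as different strings. This sketch deliberately performs no normalization, since that requires an external crate; char_count is a hypothetical helper.

```rust
/// Hypothetical helper: counts Unicode scalar values, not graphemes.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    // U+00E9: "e with acute" as a single precomposed character.
    let precomposed = "\u{e9}";
    // 'e' followed by U+0301, the combining acute accent.
    let decomposed = "e\u{301}";

    // Both render as the same grapheme, yet they differ byte for byte.
    assert_ne!(precomposed, decomposed);
    assert_eq!(char_count(precomposed), 1);
    assert_eq!(char_count(decomposed), 2);
    assert_eq!(precomposed.len(), 2);
    assert_eq!(decomposed.len(), 3);
}
```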

OsString and OsStr are used when interfacing with platform-specific APIs. Their format varies by system and is therefore opaque. One can infallibly convert a String to an OsString. The reverse is not guaranteed, as an OsString may contain values unrepresentable by Unicode, so the conversion can be done fallibly (into_string) or lossily (to_string_lossy).
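A brief sketch of the conversions between String and OsString follows. It only uses input that is valid Unicode, because constructing an invalid OsString requires platform-specific extension traits. The helper to_display is a hypothetical name.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: lossy conversion, replacing anything that is not
/// valid Unicode with U+FFFD.
fn to_display(os: &OsStr) -> String {
    os.to_string_lossy().into_owned()
}

fn main() {
    // String to OsString never fails.
    let os: OsString = OsString::from(String::from("file.txt"));

    // OsString to String is fallible; on failure the original is returned.
    let back: Result<String, OsString> = os.clone().into_string();
    assert_eq!(back, Ok(String::from("file.txt")));

    // The borrowed form of the fallible conversion returns an Option.
    assert_eq!(os.to_str(), Some("file.txt"));

    // The lossy conversion always produces text.
    assert_eq!(to_display(&os), "file.txt");
}
```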

CString and CStr are constrained by C language requirements. Namely, they are terminated by a nul byte and they do not contain any interior nul bytes. A CString can be created with the CString::new method, which appends the terminating nul and fails if the input already contains a nul byte.
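The nul-byte constraints can be sketched as follows, using CString::new for owned strings and CStr::from_bytes_with_nul for borrowed ones. The helper to_c_string is a hypothetical name.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: returns None if `text` contains a nul byte.
fn to_c_string(text: &str) -> Option<CString> {
    CString::new(text).ok()
}

fn main() {
    // CString::new appends the terminating nul byte itself.
    let c = CString::new("hello").expect("input has no nul byte");
    assert_eq!(c.as_bytes(), b"hello");
    assert_eq!(c.as_bytes_with_nul(), b"hello\0");

    // A nul byte inside the input is rejected, and its position reported.
    let err = CString::new("he\0llo").unwrap_err();
    assert_eq!(err.nul_position(), 2);
    assert!(to_c_string("he\0llo").is_none());

    // A borrowed CStr requires exactly one nul byte, at the end.
    let borrowed = CStr::from_bytes_with_nul(b"hi\0").expect("well-formed");
    assert_eq!(borrowed.to_str(), Ok("hi"));
    assert!(CStr::from_bytes_with_nul(b"h\0i\0").is_err());
    assert!(CStr::from_bytes_with_nul(b"hi").is_err());
}
```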

Sources of confusion
Rust is more pedantic than other languages when it comes to string handling, which can lead to confusion as to why a certain type or trait is used. Additionally, as strings are such a fundamental type, there are some arguably inelegant or redundant items such as FromStr and ToString.

Cow<str> is not a special case, just a generic use of Cow. A copy-on-write type is an enum of Borrowed and Owned. Cow<str> is used in place of String to avoid allocating when borrowing is possible, while still creating a copy when mutability is required (without run-time coordination).
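A typical use of Cow<str> is a function that usually returns its input unchanged and only allocates when it has to modify it. The sketch below assumes a hypothetical helper, underscore_spaces, invented for this example.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces spaces with underscores, allocating a new
/// String only when the input actually contains a space.
fn underscore_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    // No change needed: the input is borrowed, nothing is allocated.
    assert!(matches!(underscore_spaces("no_spaces"), Cow::Borrowed(_)));

    // A change is needed: a new String is created.
    assert!(matches!(underscore_spaces("a b"), Cow::Owned(_)));
    assert_eq!(underscore_spaces("a b"), "a_b");

    // Requesting mutable access clones borrowed data on demand.
    let mut text = underscore_spaces("abc");
    text.to_mut().push('!');
    assert!(matches!(text, Cow::Owned(_)));
    assert_eq!(text, "abc!");
}
```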

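FromStr and ToString
FromStr and ToString are traits with the methods from_str and to_string. While FromStr deviates from From in returning a Result, it seems redundant with TryFrom, and ToString seems redundant with Display.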

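Indeed, TryFrom was going to obsolete FromStr, but cannot, as a blanket implementation would overlap with the existing blanket implementation of TryFrom for types convertible with Into (#44174). FromStr is also used by str::parse. On the other hand, to_string is a convenience method equivalent to format!("{}", value), and ToString is blanket implemented for every type that implements Display.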
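To show how these traits fit together in practice, here is a minimal sketch with a made-up Celsius type. It implements FromStr and Display by hand and receives parse support and to_string for free. The type and its text format are assumptions made purely for illustration.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::FromStr;

/// Hypothetical example type, written as text like "21C".
#[derive(Debug, PartialEq)]
struct Celsius(i32);

impl FromStr for Celsius {
    type Err = ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Strip an optional trailing 'C', then parse the number.
        s.trim_end_matches('C').parse::<i32>().map(Celsius)
    }
}

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}C", self.0)
    }
}

fn main() {
    // str::parse delegates to FromStr::from_str.
    let parsed: Celsius = "21C".parse().expect("valid temperature");
    assert_eq!(parsed, Celsius(21));
    assert_eq!(Celsius::from_str("21C"), Ok(Celsius(21)));
    assert!("warm".parse::<Celsius>().is_err());

    // to_string comes from the blanket ToString impl for Display types.
    assert_eq!(parsed.to_string(), "21C");
    assert_eq!(parsed.to_string(), format!("{}", parsed));
}
```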