From 4cd1833599d7d5298be91600ceb5516ef69a6fee Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Mon, 22 Jul 2024 12:13:07 +0200
Subject: [PATCH 1/3] re-structure 'invalid values' enumeration to instead define what is *valid*

---
 src/behavior-considered-undefined.md | 73 ++++++++++++++++------------
 1 file changed, 43 insertions(+), 30 deletions(-)

diff --git a/src/behavior-considered-undefined.md b/src/behavior-considered-undefined.md
index aab055471..efa1d37a8 100644
--- a/src/behavior-considered-undefined.md
+++ b/src/behavior-considered-undefined.md
@@ -59,33 +59,10 @@ Please read the [Rustonomicon] before writing unsafe code.
 * Executing code compiled with platform features that the current platform
   does not support (see [`target_feature`]), *except* if the platform explicitly documents this to be safe.
 * Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI.
-* Producing an invalid value, even in private fields and locals. "Producing" a
+* Producing an [invalid value][invalid-values]. "Producing" a
   value happens any time a value is assigned to or read from a place, passed to
   a function/primitive operation or returned from a function/primitive
   operation.
-  The following values are invalid (at their respective type):
-  * A value other than `false` (`0`) or `true` (`1`) in a [`bool`].
-  * A discriminant in an `enum` not included in the type definition.
-  * A null `fn` pointer.
-  * A value in a `char` which is a surrogate or above `char::MAX`.
-  * A `!` (all values are invalid for this type).
-  * An integer (`i*`/`u*`), floating point value (`f*`), or raw pointer obtained
-    from [uninitialized memory][undef], or uninitialized memory in a `str`.
-  * A reference or `Box` that is [dangling], misaligned, or points to an invalid value
-    (in case of dynamically sized types, using the actual dynamic type of the
-    pointee as determined by the metadata).
-  * Invalid metadata in a wide reference, `Box`, or raw pointer. The requirement
-    for the metadata is determined by the type of the unsized tail:
-    * `dyn Trait` metadata is invalid if it is not a pointer to a vtable for `Trait`.
-    * Slice (`[T]`) metadata is invalid if the length is not a valid `usize`
-      (i.e., it must not be read from uninitialized memory).
-      Furthermore, for wide references and `Box`, slice metadata is invalid
-      if it makes the total size of the pointed-to value bigger than `isize::MAX`.
-  * Invalid values for a type with a custom definition of invalid values.
-    In the standard library, this affects [`NonNull`] and [`NonZero*`].
-
-    > **Note**: `rustc` achieves this with the unstable
-    > `rustc_layout_scalar_valid_range_*` attributes.
 * Incorrect use of inline assembly. For more details, refer to the [rules] to
   follow when writing code that uses inline assembly.
 * **In [const context](const_eval.md#const-context)**: transmuting or otherwise
@@ -94,11 +71,6 @@ Please read the [Rustonomicon] before writing unsafe code.
   'Reinterpreting' refers to loading the pointer value at integer type without a
   cast, e.g. by doing raw pointer casts or using a union.
 
-**Note:** Uninitialized memory is also implicitly invalid for any type that has
-a restricted set of valid values. In other words, the only cases in which
-reading uninitialized memory is permitted are inside `union`s and in "padding"
-(the gaps between the fields/elements of a type).
-
 > **Note**: Undefined behavior affects the entire program. For example, calling
 > a function in C that exhibits undefined behavior of C means your entire
 > program contains undefined behaviour that can also affect the Rust code. And
@@ -155,6 +127,46 @@ entire range, so it is important that the length metadata is never too large. In
 particular, the dynamic size of a Rust value (as determined by `size_of_val`)
 must never exceed `isize::MAX`.
 
+### Invalid values
+[invalid-values]: #invalid-values
+
+The Rust compiler assumes that all values produced during program execution are
+"valid", and producing an invalid value is hence immediate UB.
+
+Whether a value is valid depends on the type:
+* A [`bool`] value must be `false` (`0`) or `true` (`1`).
+* A `fn` pointer value must be non-null.
+* A `char` value must not be a surrogate (i.e., must not be in the range `0xD800..=0xDFFF`) and must be equal to or less than `char::MAX`.
+* A `!` value must never exist.
+* An integer (`i*`/`u*`), floating point value (`f*`), or raw pointer must be
+  initialized, i.e., must not be obtained from [uninitialized memory][undef].
+* A `str` value is treated like `[u8]`, i.e. it must be initialized.
+* An `enum` must have a valid discriminant, and all fields of the variant indicated by that discriminant must be valid at their respective type.
+* A `struct`, tuple, and array requires all fields/elements to be valid at their respective type.
+* For a `union`, the exact validity requirements are not decided yet. The following is certain:
+  * If the `union` has a zero-sized field, then all values are valid.
+  * If a value is valid for a particular `union` field, then it is valid for the union.
+* A reference or [`Box`] must be aligned, it cannot be [dangling], and it must point to a valid value
+  (in case of dynamically sized types, using the actual dynamic type of the
+  pointee as determined by the metadata).
+* The metadata of a wide reference, [`Box`], or raw pointer must match
+  the type of the unsized tail:
+  * `dyn Trait` metadata must be a pointer to a compiler-generated vtable for `Trait`.
+  * Slice (`[T]`) metadata must be a valid `usize`.
+    Furthermore, for wide references and [`Box`], slice metadata is invalid
+    if it makes the total size of the pointed-to value bigger than `isize::MAX`.
+* If a type has a custom range of valid values, then a valid value must be in that range.
+  In the standard library, this affects [`NonNull`] and [`NonZero`].
+
+  > **Note**: `rustc` achieves this with the unstable
+  > `rustc_layout_scalar_valid_range_*` attributes.
+
+**Note:** Uninitialized memory is also implicitly invalid for any type that has
+a restricted set of valid values. In other words, the only cases in which
+reading uninitialized memory is permitted are inside `union`s and in "padding"
+(the gaps between the fields of a type).
+
+
 [`bool`]: types/boolean.md
 [`const`]: items/constant-items.md
 [noalias]: http://llvm.org/docs/LangRef.html#noalias
@@ -164,7 +176,8 @@ must never exceed `isize::MAX`.
 [`UnsafeCell`]: ../std/cell/struct.UnsafeCell.html
 [Rustonomicon]: ../nomicon/index.html
 [`NonNull`]: ../core/ptr/struct.NonNull.html
-[`NonZero*`]: ../core/num/index.html
+[`NonZero`]: ../core/num/struct.NonZero.html
+[`Box`]: ../alloc/boxed/struct.Box.html
 [place expression context]: expressions.md#place-expressions-and-value-expressions
 [rules]: inline-assembly.md#rules-for-inline-assembly
 [points to]: #pointed-to-bytes

From a16c846b3cc106deeb940390b78b4c3bee7eb472 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Mon, 22 Jul 2024 16:16:50 +0200
Subject: [PATCH 2/3] say more clearly what is still being debated

---
 src/behavior-considered-undefined.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/behavior-considered-undefined.md b/src/behavior-considered-undefined.md
index efa1d37a8..f3db7bcad 100644
--- a/src/behavior-considered-undefined.md
+++ b/src/behavior-considered-undefined.md
@@ -143,15 +143,18 @@ Whether a value is valid depends on the type:
 * A `str` value is treated like `[u8]`, i.e. it must be initialized.
 * An `enum` must have a valid discriminant, and all fields of the variant indicated by that discriminant must be valid at their respective type.
 * A `struct`, tuple, and array requires all fields/elements to be valid at their respective type.
-* For a `union`, the exact validity requirements are not decided yet. The following is certain:
-  * If the `union` has a zero-sized field, then all values are valid.
-  * If a value is valid for a particular `union` field, then it is valid for the union.
+* For a `union`, the exact validity requirements are not decided yet.
+  Obviously, all values that can be created entirely in safe code are valid.
+  If the union has a zero-sized field, then every possible value is valid.
+  Further details are [still being debated](https://github.com/rust-lang/unsafe-code-guidelines/issues/438).
 * A reference or [`Box`] must be aligned, it cannot be [dangling], and it must point to a valid value
   (in case of dynamically sized types, using the actual dynamic type of the
   pointee as determined by the metadata).
+  Note that the last point (about pointing to a valid value) is still subject of debate.
 * The metadata of a wide reference, [`Box`], or raw pointer must match
   the type of the unsized tail:
   * `dyn Trait` metadata must be a pointer to a compiler-generated vtable for `Trait`.
+    (For raw pointers, this requirement is still subject of debate.)
   * Slice (`[T]`) metadata must be a valid `usize`.
     Furthermore, for wide references and [`Box`], slice metadata is invalid
     if it makes the total size of the pointed-to value bigger than `isize::MAX`.

From 96b698f83c106b3048db52b9e9e91f7d193c99ee Mon Sep 17 00:00:00 2001
From: Travis Cross
Date: Tue, 23 Jul 2024 22:52:45 +0000
Subject: [PATCH 3/3] Improve some editorial bits

---
 src/behavior-considered-undefined.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/behavior-considered-undefined.md b/src/behavior-considered-undefined.md
index f3db7bcad..1ee03e6d7 100644
--- a/src/behavior-considered-undefined.md
+++ b/src/behavior-considered-undefined.md
@@ -150,11 +150,11 @@ Whether a value is valid depends on the type:
 * A reference or [`Box`] must be aligned, it cannot be [dangling], and it must point to a valid value
   (in case of dynamically sized types, using the actual dynamic type of the
   pointee as determined by the metadata).
-  Note that the last point (about pointing to a valid value) is still subject of debate.
+  Note that the last point (about pointing to a valid value) remains a subject of some debate.
 * The metadata of a wide reference, [`Box`], or raw pointer must match
   the type of the unsized tail:
   * `dyn Trait` metadata must be a pointer to a compiler-generated vtable for `Trait`.
-    (For raw pointers, this requirement is still subject of debate.)
+    (For raw pointers, this requirement remains a subject of some debate.)
   * Slice (`[T]`) metadata must be a valid `usize`.
     Furthermore, for wide references and [`Box`], slice metadata is invalid
     if it makes the total size of the pointed-to value bigger than `isize::MAX`.
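
Illustration (not part of the patch series): a minimal Rust sketch of how code can respect some of the validity rules listed in the new "Invalid values" section. The helper names (`bool_from_byte`, `char_from_raw`, `non_zero_from_raw`, `init_then_read`) are hypothetical and chosen only for this example. The sketch assumes the usual approach of going through checked standard-library constructors, so that an invalid value is rejected before it is ever produced.

```rust
use std::mem::MaybeUninit;
use std::num::NonZeroU32;

/// Converts a raw byte to `bool` without ever producing an invalid `bool`.
/// Only `0` and `1` are valid, so every other byte is rejected.
fn bool_from_byte(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None, // transmuting such a byte to `bool` would be UB
    }
}

/// Converts a raw `u32` to `char`. `char::from_u32` returns `None` for
/// surrogates (`0xD800..=0xDFFF`) and for values above `char::MAX`.
fn char_from_raw(raw: u32) -> Option<char> {
    char::from_u32(raw)
}

/// `NonZeroU32` has a custom range of valid values that excludes `0`;
/// `NonZeroU32::new` returns `None` instead of producing an invalid value.
fn non_zero_from_raw(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}

/// Integers must be initialized. `MaybeUninit` may hold uninitialized
/// memory, and the value is only read as `u32` after it has been written.
fn init_then_read() -> u32 {
    let mut slot = MaybeUninit::<u32>::uninit();
    slot.write(7);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}
```

Each helper returns `Option` (or only reads after initialization), so the caller decides how to handle a rejected input and no `unsafe` conversion runs on unchecked data.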