From d0c9729af21c0b47339c3357d995885b3eab017f Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 22 Jul 2019 17:27:39 +0200 Subject: [PATCH 01/12] add basic memory interface --- wip/memory-interface.md | 110 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 wip/memory-interface.md diff --git a/wip/memory-interface.md b/wip/memory-interface.md new file mode 100644 index 00000000..877ca2d2 --- /dev/null +++ b/wip/memory-interface.md @@ -0,0 +1,110 @@ +# Rust Memory Interface + +**Note:** This document is not normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology. + +The purpose of this document is to describe the interface between a Rust program and memory. +This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface: +* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order. +* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. + +The interface is also opinionated in several ways; this is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. +For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult non-normative defect report responses, to see this). +Another key property of the interface presented below is that it is *untyped*. +This encodes the fact that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules. + +## Pointers + +One key question a memory model has to answer is *what is a pointer*. +It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated]. +So we will leave this question open, and treat `Pointer` as an "associated type" of the memory interface + +## Bytes + +The unit of communication between the memory model and the rest of the program is a *byte*. +Again the question of "what is a byte" is not as trivial as it might seem; beyond `u8` values we have to represent `Pointer`s and [uninitialized memory][uninit]. +We define the `Byte` type (in terms of an arbitrary `Pointer` type) as follows: + +```rust +enum Byte { + /// The "normal" case: a (frozen, initialized) integer in `0..256`. + Raw(u8), + /// An uninitialized byte. + Uninit, + /// One byte of a pointer. + Pointer { + /// The pointer of which this is a byte. + ptr: Pointer, + /// Which byte of the pointer this is. + /// `idx` will always be in `0..size_of::()`. + idx: u8, + } +} +``` + +## Memory interface + +The Rust memory interface is described by the following (not-yet-complete) trait definition: + +```rust +/// *Note*: All memory operations can be non-deterministic, which means that +/// executing the same operation on the same memory can have different results. +/// We also let all operations potentially mutated memory. For example, reads +/// actually do change the current state when considering concurrency or +/// Stacked Borrows. +trait Memory { + /// The type of pointer values. + type Pointer; + + /// The type of memory errors (i.e., ways in which the program can cause UB + /// by interacting with memory). + type Error; + + /// Create a new allocation. + fn allocate(&mut self, size: u64, align: u64) -> Result; + + /// Remove an allocation. + fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Self::Error>; + + /// Write some bytes to memory. + fn write(&mut self, ptr: Self::Pointer, bytes: Vec>) -> Result<(), Self::Error>; + + /// Read some bytes from memory. + fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result>, Self::Error>; + + /// Offset the given pointer. + fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result; + + /// Cast the given pointer to an integer. + fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result; + + /// Cast the given integer to a pointer. + fn int_to_ptr(&mut self, int: u64) -> Result; +} + +/// The rules applying to this pointer offset operation. +enum OffsetMode { + /// Wrapping offset; never UB. + Wrapping, + /// Non-wrapping offset; UB if it wraps. + NonWrapping, + /// In-bounds offset; UB if it wraps or if old and new pointer are not both + /// in-bounds of the same allocation (details are specified by the memory + /// model). + Inbounds, +} +``` + +This is a very basic memory interface that is incomplete in at least the following ways: + +* To implement rules like "dereferencing a null, unaligned, or dangling raw pointer is UB" (even if no memory access happens), there needs to be a way to do an "alignment, bounds and null-check". +* There needs to be some way to do alignment checks -- either using the above operation, or by adding `align` parameters to `read` and `write`. +* To represent concurrency, many operations need to take a "thread ID" and `read` and `write` need to take an [`Ordering`]. +* To represent [Stacked Borrows], there needs to be a "retag" operation, and that one will in fact be "lightly typed" (it cares about `UnsafeCell`). +* Maybe we want operations that can compare pointers without casting them to integers. + +But I think it can still be useful to provide some basic terminology and grounds for further discussion. + +[pointers-complicated]: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html +[uninit]: https://www.ralfj.de/blog/2019/07/14/uninit.html +[`Ordering`]: https://doc.rust-lang.org/nightly/core/sync/atomic/enum.Ordering.html +[Stacked Borrows]: stacked-borrows.md From 499fa9ee1090f489c9f1d4ecc95a4b252dcd780d Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 24 Jul 2019 21:39:36 +0200 Subject: [PATCH 02/12] begin with value domain --- wip/memory-interface.md | 29 ++++++---- wip/value-domain.md | 120 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 137 insertions(+), 12 deletions(-) create mode 100644 wip/value-domain.md diff --git a/wip/memory-interface.md b/wip/memory-interface.md index 877ca2d2..bfde5c61 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -16,6 +16,7 @@ This encodes the fact that in Rust, *operations are typed, but memory is not*--- One key question a memory model has to answer is *what is a pointer*. It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated]. +This becomes even more prominent with aliasing models such as [Stacked Borrows]. So we will leave this question open, and treat `Pointer` as an "associated type" of the memory interface ## Bytes @@ -31,7 +32,7 @@ enum Byte { /// An uninitialized byte. Uninit, /// One byte of a pointer. - Pointer { + PtrFragment { /// The pointer of which this is a byte. ptr: Pointer, /// Which byte of the pointer this is. @@ -41,6 +42,12 @@ enum Byte { } ``` +The purpose of `PtrFragment` is to be able to have a byte-wise representation of a `Pointer`. +On a 32-bit system, the sequence of 4 bytes representing `ptr: Pointer` is: +``` +[PtrFragment { ptr, idx: 0 }, PtrFragment { ptr, idx: 1 }, PtrFragment { ptr, idx: 2 }, PtrFragment { ptr, idx: 3 }] +``` + ## Memory interface The Rust memory interface is described by the following (not-yet-complete) trait definition: @@ -51,34 +58,32 @@ The Rust memory interface is described by the following (not-yet-complete) trait /// We also let all operations potentially mutated memory. For example, reads /// actually do change the current state when considering concurrency or /// Stacked Borrows. +/// And finally, all operations are fallible (they return `Result`); if they +/// fail, that means the program caused UB. trait Memory { /// The type of pointer values. type Pointer; - /// The type of memory errors (i.e., ways in which the program can cause UB - /// by interacting with memory). - type Error; - /// Create a new allocation. - fn allocate(&mut self, size: u64, align: u64) -> Result; + fn allocate(&mut self, size: u64, align: u64) -> Result; /// Remove an allocation. - fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Self::Error>; + fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Error>; /// Write some bytes to memory. - fn write(&mut self, ptr: Self::Pointer, bytes: Vec>) -> Result<(), Self::Error>; + fn write(&mut self, ptr: Self::Pointer, bytes: Vec>) -> Result<(), Error>; /// Read some bytes from memory. - fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result>, Self::Error>; + fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result>, Error>; /// Offset the given pointer. - fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result; + fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result; /// Cast the given pointer to an integer. - fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result; + fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result; /// Cast the given integer to a pointer. - fn int_to_ptr(&mut self, int: u64) -> Result; + fn int_to_ptr(&mut self, int: u64) -> Result; } /// The rules applying to this pointer offset operation. diff --git a/wip/value-domain.md b/wip/value-domain.md new file mode 100644 index 00000000..e63e7e48 --- /dev/null +++ b/wip/value-domain.md @@ -0,0 +1,120 @@ +# Rust Value Domain + +**Note:** This document is not normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology. + +The purpose of this document is to describe what the set of *all possible values* is in Rust. +This is an important definition: one key element of a Rust specification will be to define the [representation relation][representation] of every type. +This relation relates values with lists of bytes, so before we can even start specifying the relation we have to specify the involved domains. +`Byte` is defined as part of [the memory interface][memory-interface]; this document is about defining `Value`. + +[representation]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#representation +[memory-interface]: memory-interface.md + +## Values + +The Rust value domain is described by the following (incomplete) type definition (in terms of an arbitrary `Pointer` type instantiated by [the memory interface][memory-interface]): + +```rust +enum Value { + /// A mathematical integer. + Int(BigInt), + /// A Boolean value. + Bool(bool), + /// A pointer. + Ptr(Pointer), + /// A zero-sized "unit". + Unit, + /// An uninitialized value. + Uninit, + /// An n-tuple. + Tuple(Vec), + /// A variant of a sum type. + Variant { + idx: u64, + data: Box, + }, + /* ... */ +} +``` + +As Rust grows, we might expand this definition. That is okay; all previously defined representation relations are still well-defined when the domain grows, the newly added values will just not be valid for old types as one would expect. + +## Example value relations + +We show some examples for how one might want to use this `Value` domain to define the value relation for a type. + +### `bool` + +The value relation for `bool` relates `Bool(true)` with `[Raw(0)]` and `Bool(false)` with [`Raw(1)`], and that's it. + +### `()` + +The value relation for the `()` type relates `Unit` with the empty list `[]`, and that's it. + +### `!` + +The value relation for the `!` type is empty: nothing is related to anything at this type. + +### `#[repr(C)] struct Pair(T, U)` + +The value relation for `Pair` us based on the value relations for `T` and `U`. +A value `Tuple([t, u])` (assuming we can use array notation to "match" on `Vec`) is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if: + +* `t` is represented by `rt` at type `T`. +* `u` is represented by `ru` at type `U`. +* The length of `rt ++ pad1` is equal to the offset of the `U` field. +* The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair`. + +This relation demonstrates that value of type `Pair` are always 2-tuples (aka, pairs). +It also shows that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length. + +### `&T`/`&mut T` + +Reference types are tricky. +But a possible value relation for sized `T` is: +A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment { ptr, idx: N-1 }]` where `N == size_of::()` if `ptr` is non-NULL and aligned to `align_of::()`. + +### `u8` + +For the value representation of integer types, there are two different reasonable choices. +Certainly, a value `Int(i)` where `i` in `0..256` is related to `[Raw(i as u8)]`. + +And then, maybe, we also want to additionally say that value `Uninit` is related to `[Uninit]`. +This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value). + +## The role of the value representation in the operational semantics + +One key use of the value representation is to define a "typed" interface to memory: + +```rust +trait TypedMemory: Memory { + /// Write a value of the given type to memory. + fn typed_write(&mut self, ptr: Self::Pointer, val: Value, ty: Type) -> Result<(), Error>; + + /// Read a value of the given type. + fn typed_read(&mut self, ptr: Self::Pointer, ty: Type) -> Result; +} +``` + +here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler). +We can implement `TypedMemory` for any `Memory` as follows: +* For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB. +* For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB. + +In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type `T`), or when an argument of type `T` is passed to a function or a return value of type `T` returned from a function, this is called a "typed copy". +Its semantics is basically to do a `typed_read` at `T` followed by a `typed_write` at `T`. +In particular, this means doing a "typed copy" at `T` of some memory that has no valid representation at `T` is undefined behavior! +This also means that for types that have padding, the "typed copy" does not preserve the padding bytes. + +## Relation to validity invariant + +One way we *could* also use the value representation (and the author things this is exceedingly elegant) is to define the validity invariant. +Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`. +We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation". +For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferencable" and "points to valid data" cannot be part of the validity invariant for references. +The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`. +This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything juts falls out of how a "typed copy" applies the value representation twice. + +Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like that is needed to justify the `noalias` attribute. + +[Stacked Borrows]: stacked-borrows.md From 648ec0226f6fb2ebc791b4f6df257ecd25ee748e Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 11:28:40 +0200 Subject: [PATCH 03/12] ptr-to-byte --- wip/memory-interface.md | 34 ++++++++++++++++++++++++++++------ wip/value-domain.md | 9 +++++---- 2 files changed, 33 insertions(+), 10 deletions(-) diff --git a/wip/memory-interface.md b/wip/memory-interface.md index bfde5c61..ccc292bb 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -11,6 +11,7 @@ The interface is also opinionated in several ways; this is not intended to be ab For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult non-normative defect report responses, to see this). Another key property of the interface presented below is that it is *untyped*. This encodes the fact that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules. +At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), so we might have to revisit this choice later. ## Pointers @@ -36,7 +37,7 @@ enum Byte { /// The pointer of which this is a byte. ptr: Pointer, /// Which byte of the pointer this is. - /// `idx` will always be in `0..size_of::()`. + /// `idx` will always be in `0..PTR_SIZE`. idx: u8, } } @@ -48,6 +49,21 @@ On a 32-bit system, the sequence of 4 bytes representing `ptr: Pointer` is: [PtrFragment { ptr, idx: 0 }, PtrFragment { ptr, idx: 1 }, PtrFragment { ptr, idx: 2 }, PtrFragment { ptr, idx: 3 }] ``` +Based on the `PtrToInt` trait (see below), we can turn every initialized `Byte` into an integer in `0..256`: + +```rust +impl Byte { + fn as_int(self) -> Option { + match self { + Byte::Raw(int) => Some(int), + Byte::Uninit => None, + Byte::PtrFragment { ptr, idx } => + ptr.get_byte(idx), + } + } +} +``` + ## Memory interface The Rust memory interface is described by the following (not-yet-complete) trait definition: @@ -55,14 +71,17 @@ The Rust memory interface is described by the following (not-yet-complete) trait ```rust /// *Note*: All memory operations can be non-deterministic, which means that /// executing the same operation on the same memory can have different results. -/// We also let all operations potentially mutated memory. For example, reads +/// We also let all operations potentially mutate memory. For example, reads /// actually do change the current state when considering concurrency or /// Stacked Borrows. /// And finally, all operations are fallible (they return `Result`); if they /// fail, that means the program caused UB. trait Memory { /// The type of pointer values. - type Pointer; + type Pointer: Copy + PtrToInt; + + /// The size of pointer values. + const PTR_SIZE: u64; /// Create a new allocation. fn allocate(&mut self, size: u64, align: u64) -> Result; @@ -79,13 +98,16 @@ trait Memory { /// Offset the given pointer. fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result; - /// Cast the given pointer to an integer. - fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result; - /// Cast the given integer to a pointer. fn int_to_ptr(&mut self, int: u64) -> Result; } +/// The `Pointer` type must know how to extract its bytes, *without any access to the `Memory`*. +trait PtrToInt { + /// Get the `idx`-th byte of the pointer. `idx` must be in `0..PTR_SIZE`. + fn get_byte(self, idx: u8) -> u8; +} + /// The rules applying to this pointer offset operation. enum OffsetMode { /// Wrapping offset; never UB. diff --git a/wip/value-domain.md b/wip/value-domain.md index e63e7e48..b637b451 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -4,7 +4,8 @@ The purpose of this document is to describe what the set of *all possible values* is in Rust. This is an important definition: one key element of a Rust specification will be to define the [representation relation][representation] of every type. -This relation relates values with lists of bytes, so before we can even start specifying the relation we have to specify the involved domains. +This relation relates values with lists of bytes: it says, for a given value and list of bytes, if that value is represented by that list. +However, before we can even start specifying the relation, we have to specify the involved domains. `Byte` is defined as part of [the memory interface][memory-interface]; this document is about defining `Value`. [representation]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#representation @@ -45,7 +46,7 @@ We show some examples for how one might want to use this `Value` domain to defin ### `bool` -The value relation for `bool` relates `Bool(true)` with `[Raw(0)]` and `Bool(false)` with [`Raw(1)`], and that's it. +The value relation for `bool` relates `Bool(b)` with `[bb]` if and only if `bb.as_int() == Some(if b { 1 } else { 0 })`. ### `()` @@ -72,12 +73,12 @@ It also shows that the actual content of the padding bytes is entirely irrelevan Reference types are tricky. But a possible value relation for sized `T` is: -A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment { ptr, idx: N-1 }]` where `N == size_of::()` if `ptr` is non-NULL and aligned to `align_of::()`. +A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment { ptr, idx: PTR_SIZE-1 }]` if `ptr` is non-NULL and appropriately aligned (defining alignment is left open for now). ### `u8` For the value representation of integer types, there are two different reasonable choices. -Certainly, a value `Int(i)` where `i` in `0..256` is related to `[Raw(i as u8)]`. +Certainly, a value `Int(i)` where `i` in `0..256` is related to `[b]` if `b.as_int() == Some(i)`. And then, maybe, we also want to additionally say that value `Uninit` is related to `[Uninit]`. This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value). From ca891ac9199afecce16a752487194aa47fc1bc15 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 11:58:49 +0200 Subject: [PATCH 04/12] more editing --- wip/memory-interface.md | 22 +++++++++++----------- wip/value-domain.md | 31 +++++++++++++++++++------------ 2 files changed, 30 insertions(+), 23 deletions(-) diff --git a/wip/memory-interface.md b/wip/memory-interface.md index ccc292bb..8cd6dabe 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -4,21 +4,22 @@ The purpose of this document is to describe the interface between a Rust program and memory. This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface: -* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order. +* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. * The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. -The interface is also opinionated in several ways; this is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. -For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult non-normative defect report responses, to see this). +The interface shown below is also opinionated in several ways. +It is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. +For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult [non-normative defect report responses](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm), to see this). Another key property of the interface presented below is that it is *untyped*. -This encodes the fact that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules. -At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), so we might have to revisit this choice later. +This implies that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules. +At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), and we might have to revisit this choice later if it turns out to not be workable. ## Pointers One key question a memory model has to answer is *what is a pointer*. It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated]. This becomes even more prominent with aliasing models such as [Stacked Borrows]. -So we will leave this question open, and treat `Pointer` as an "associated type" of the memory interface +So the interface will leave it up to the concrete instance to answer this question, and carry `Pointer` as an associated type. ## Bytes @@ -58,7 +59,7 @@ impl Byte { Byte::Raw(int) => Some(int), Byte::Uninit => None, Byte::PtrFragment { ptr, idx } => - ptr.get_byte(idx), + Some(ptr.get_byte(idx)), } } } @@ -96,9 +97,10 @@ trait Memory { fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result>, Error>; /// Offset the given pointer. - fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result; + fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) + -> Result; - /// Cast the given integer to a pointer. + /// Cast the given integer to a pointer. (The other direction is handled by `PtrToInt` below.) fn int_to_ptr(&mut self, int: u64) -> Result; } @@ -129,8 +131,6 @@ This is a very basic memory interface that is incomplete in at least the followi * To represent [Stacked Borrows], there needs to be a "retag" operation, and that one will in fact be "lightly typed" (it cares about `UnsafeCell`). * Maybe we want operations that can compare pointers without casting them to integers. -But I think it can still be useful to provide some basic terminology and grounds for further discussion. - [pointers-complicated]: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html [uninit]: https://www.ralfj.de/blog/2019/07/14/uninit.html [`Ordering`]: https://doc.rust-lang.org/nightly/core/sync/atomic/enum.Ordering.html diff --git a/wip/value-domain.md b/wip/value-domain.md index b637b451..a8fa24e4 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -23,8 +23,6 @@ enum Value { Bool(bool), /// A pointer. Ptr(Pointer), - /// A zero-sized "unit". - Unit, /// An uninitialized value. Uninit, /// An n-tuple. @@ -34,6 +32,8 @@ enum Value { idx: u64, data: Box, }, + /// A "bag of raw bytes". + RawBag(Vec>), /* ... */ } ``` @@ -46,11 +46,12 @@ We show some examples for how one might want to use this `Value` domain to defin ### `bool` -The value relation for `bool` relates `Bool(b)` with `[bb]` if and only if `bb.as_int() == Some(if b { 1 } else { 0 })`. +The value relation for `bool` relates `Bool(b)` with `[r]` if and only if `r.as_int() == Some(if b { 1 } else { 0 })`. +(`as_int` is defined in [the memory interface][memory-interface].) ### `()` -The value relation for the `()` type relates `Unit` with the empty list `[]`, and that's it. +The value relation for the `()` type relates the empty tuple `Tuple([])` (assuming we can use array notation to "match" on `Vec`) with the empty byte list `[]`, and that's it. ### `!` @@ -58,16 +59,17 @@ The value relation for the `!` type is empty: nothing is related to anything at ### `#[repr(C)] struct Pair(T, U)` -The value relation for `Pair` us based on the value relations for `T` and `U`. -A value `Tuple([t, u])` (assuming we can use array notation to "match" on `Vec`) is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if: +The value relation for `Pair` is based on the value relations for `T` and `U`. +A value `Tuple([t, u])` is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if: * `t` is represented by `rt` at type `T`. * `u` is represented by `ru` at type `U`. * The length of `rt ++ pad1` is equal to the offset of the `U` field. * The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair`. -This relation demonstrates that value of type `Pair` are always 2-tuples (aka, pairs). -It also shows that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length. +This relation specifies that values of type `Pair` are always 2-tuples (aka, pairs). +It also says that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length. +So, for example when considering `Pair`, the value `Tuple[42, 119]` is represented on a little-endian target by `[Raw(42), byte, Raw(119), Raw(0)]` for *any* `byte: Byte`. ### `&T`/`&mut T` @@ -80,9 +82,14 @@ A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment For the value representation of integer types, there are two different reasonable choices. Certainly, a value `Int(i)` where `i` in `0..256` is related to `[b]` if `b.as_int() == Some(i)`. -And then, maybe, we also want to additionally say that value `Uninit` is related to `[Uninit]`. +And then, maybe, we also want to additionally say that value `Uninit` is related to byte list `[Uninit]`. This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value). +### `union` + +The `union` type does not even try to interpret memory, so for a `union` of size `n`, the value relation says that for any byte list `bytes` of that length, `RawBag(bytes)` is related to `bytes`. +(Note however that [this definition might not be implementable](https://github.com/rust-lang/unsafe-code-guidelines/issues/156).) + ## The role of the value representation in the operational semantics One key use of the value representation is to define a "typed" interface to memory: @@ -97,7 +104,7 @@ trait TypedMemory: Memory { } ``` -here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler). +Here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler). We can implement `TypedMemory` for any `Memory` as follows: * For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB. * For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB. @@ -109,13 +116,13 @@ This also means that for types that have padding, the "typed copy" does not pres ## Relation to validity invariant -One way we *could* also use the value representation (and the author things this is exceedingly elegant) is to define the validity invariant. +One way we *could* also use the value representation (and the author thinks this is exceedingly elegant) is to define the validity invariant. Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`. We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation". For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferencable" and "points to valid data" cannot be part of the validity invariant for references. The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`. This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything juts falls out of how a "typed copy" applies the value representation twice. -Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like that is needed to justify the `noalias` attribute. +Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like the `noalias` attribute. [Stacked Borrows]: stacked-borrows.md From 168249a3775f5cdfcc1a68c18120cdcde7ea2119 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 19:16:37 +0200 Subject: [PATCH 05/12] tweaks --- wip/memory-interface.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/wip/memory-interface.md b/wip/memory-interface.md index 8cd6dabe..5166633b 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -4,8 +4,8 @@ The purpose of this document is to describe the interface between a Rust program and memory. This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface: -* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. -* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. +* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order - expressed as calls to the methods of this interface, and reactions to its return values. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. +* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. A memory model is defined by implementing the memory interface. The interface shown below is also opinionated in several ways. It is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. @@ -20,6 +20,9 @@ One key question a memory model has to answer is *what is a pointer*. It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated]. This becomes even more prominent with aliasing models such as [Stacked Borrows]. So the interface will leave it up to the concrete instance to answer this question, and carry `Pointer` as an associated type. +Practically speaking, `Pointer` will be some representation of an "address", plus [provenance] information. + +[provenance]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#pointer-provenance ## Bytes From 53b899ea0daa2d682914dcac362bc6123da44052 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 20:16:29 +0200 Subject: [PATCH 06/12] address many review comments --- wip/memory-interface.md | 39 +++++++++++++++++++++++++-------------- wip/value-domain.md | 22 ++++++++++++---------- 2 files changed, 37 insertions(+), 24 deletions(-) diff --git a/wip/memory-interface.md b/wip/memory-interface.md index 5166633b..a3dfcc9d 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -9,9 +9,9 @@ This interface is a key part of the Rust Abstract Machine: it lets us separate c The interface shown below is also opinionated in several ways. It is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. -For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult [non-normative defect report responses](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm), to see this). +For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult [defect report responses](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm), to see this). Another key property of the interface presented below is that it is *untyped*. -This implies that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules. +This implies that in Rust, *operations are typed, but memory is not* - a key difference to C and C++ with their type-based strict aliasing rules. At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), and we might have to revisit this choice later if it turns out to not be workable. ## Pointers @@ -19,7 +19,7 @@ At the same time, the memory model provides a *side-effect free* way to turn poi One key question a memory model has to answer is *what is a pointer*. It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated]. This becomes even more prominent with aliasing models such as [Stacked Borrows]. -So the interface will leave it up to the concrete instance to answer this question, and carry `Pointer` as an associated type. +So we leave it up to the memory model to answer this question, and make `Pointer` an associated type. Practically speaking, `Pointer` will be some representation of an "address", plus [provenance] information. [provenance]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#pointer-provenance @@ -27,7 +27,7 @@ Practically speaking, `Pointer` will be some representation of an "address", plu ## Bytes The unit of communication between the memory model and the rest of the program is a *byte*. -Again the question of "what is a byte" is not as trivial as it might seem; beyond `u8` values we have to represent `Pointer`s and [uninitialized memory][uninit]. +Again, the question of "what is a byte" is not as trivial as it might seem; beyond `u8` values we have to represent `Pointer`s and [uninitialized memory][uninit]. We define the `Byte` type (in terms of an arbitrary `Pointer` type) as follows: ```rust @@ -39,6 +39,7 @@ enum Byte { /// One byte of a pointer. PtrFragment { /// The pointer of which this is a byte. + /// That is, the byte is a fragment of this pointer. ptr: Pointer, /// Which byte of the pointer this is. /// `idx` will always be in `0..PTR_SIZE`. @@ -47,10 +48,15 @@ enum Byte { } ``` -The purpose of `PtrFragment` is to be able to have a byte-wise representation of a `Pointer`. +The purpose of `PtrFragment` is to enable a byte-wise representation of a `Pointer`. On a 32-bit system, the sequence of 4 bytes representing `ptr: Pointer` is: ``` -[PtrFragment { ptr, idx: 0 }, PtrFragment { ptr, idx: 1 }, PtrFragment { ptr, idx: 2 }, PtrFragment { ptr, idx: 3 }] +[ + PtrFragment { ptr, idx: 0 }, + PtrFragment { ptr, idx: 1 }, + PtrFragment { ptr, idx: 2 }, + PtrFragment { ptr, idx: 3 }, +] ``` Based on the `PtrToInt` trait (see below), we can turn every initialized `Byte` into an integer in `0..256`: @@ -73,13 +79,18 @@ impl Byte { The Rust memory interface is described by the following (not-yet-complete) trait definition: ```rust +/// All operations are fallible, so they return `Result`. If they fail, that +/// means the program caused UB. What exactly the `UndefinedBehavior` type is +/// does not matter here. +type Result = std::result::Result; + /// *Note*: All memory operations can be non-deterministic, which means that /// executing the same operation on the same memory can have different results. /// We also let all operations potentially mutate memory. For example, reads /// actually do change the current state when considering concurrency or /// Stacked Borrows. -/// And finally, all operations are fallible (they return `Result`); if they -/// fail, that means the program caused UB. +/// This is pseudo-Rust, so we just use fully owned types everywhere for +/// symmetry and simplicity. trait Memory { /// The type of pointer values. type Pointer: Copy + PtrToInt; @@ -88,23 +99,23 @@ trait Memory { const PTR_SIZE: u64; /// Create a new allocation. - fn allocate(&mut self, size: u64, align: u64) -> Result; + fn allocate(&mut self, size: u64, align: u64) -> Result; /// Remove an allocation. - fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Error>; + fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result; /// Write some bytes to memory. - fn write(&mut self, ptr: Self::Pointer, bytes: Vec>) -> Result<(), Error>; + fn write(&mut self, ptr: Self::Pointer, bytes: Vec>) -> Result; /// Read some bytes from memory. - fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result>, Error>; + fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result>>; /// Offset the given pointer. fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) - -> Result; + -> Result; /// Cast the given integer to a pointer. (The other direction is handled by `PtrToInt` below.) - fn int_to_ptr(&mut self, int: u64) -> Result; + fn int_to_ptr(&mut self, int: u64) -> Result; } /// The `Pointer` type must know how to extract its bytes, *without any access to the `Memory`*. diff --git a/wip/value-domain.md b/wip/value-domain.md index a8fa24e4..7be69fc0 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -17,28 +17,30 @@ The Rust value domain is described by the following (incomplete) type definition ```rust enum Value { - /// A mathematical integer. + /// A mathematical integer, used for `i*`/`u*` types. Int(BigInt), - /// A Boolean value. + /// A Boolean value, used for `bool`. Bool(bool), - /// A pointer. + /// A pointer value, used for (thin) references and raw pointers. Ptr(Pointer), /// An uninitialized value. Uninit, - /// An n-tuple. + /// An n-tuple, used for arrays, structs, tuples, SIMD vectors. Tuple(Vec), - /// A variant of a sum type. + /// A variant of a sum type, used for enums. Variant { - idx: u64, + idx: BigInt, data: Box, }, - /// A "bag of raw bytes". + /// A "bag of raw bytes", used for unions. RawBag(Vec>), /* ... */ } ``` -As Rust grows, we might expand this definition. That is okay; all previously defined representation relations are still well-defined when the domain grows, the newly added values will just not be valid for old types as one would expect. +The point of this type is to capture the mathematical concepts that are represented by the data we store in memory. +The definition is likely incomplete, and even if it was complete now, we might expand it as Rust grows. +That is okay; all previously defined representation relations are still well-defined when the domain grows, the newly added values will just not be valid for old types as one would expect. ## Example value relations @@ -97,10 +99,10 @@ One key use of the value representation is to define a "typed" interface to memo ```rust trait TypedMemory: Memory { /// Write a value of the given type to memory. - fn typed_write(&mut self, ptr: Self::Pointer, val: Value, ty: Type) -> Result<(), Error>; + fn typed_write(&mut self, ptr: Self::Pointer, val: Value, ty: Type) -> Result; /// Read a value of the given type. - fn typed_read(&mut self, ptr: Self::Pointer, ty: Type) -> Result; + fn typed_read(&mut self, ptr: Self::Pointer, ty: Type) -> Result; } ``` From 91648b1f95eb6b814efb281f7953d34256be9eb5 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 20:26:59 +0200 Subject: [PATCH 07/12] just assume a memory model is in scope --- wip/memory-interface.md | 5 ++++- wip/value-domain.md | 6 +++--- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/wip/memory-interface.md b/wip/memory-interface.md index a3dfcc9d..fab056ef 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -28,7 +28,7 @@ Practically speaking, `Pointer` will be some representation of an "address", plu The unit of communication between the memory model and the rest of the program is a *byte*. Again, the question of "what is a byte" is not as trivial as it might seem; beyond `u8` values we have to represent `Pointer`s and [uninitialized memory][uninit]. -We define the `Byte` type (in terms of an arbitrary `Pointer` type) as follows: +We define the `Byte` type as follows, where `Pointer` will later be instantiated with the `Memory::Pointer` associated type. ```rust enum Byte { @@ -137,6 +137,9 @@ enum OffsetMode { } ``` +We will generally assume we have a particular memory model in scope, and freely refer to its `PTR_SIZE` and `Pointer` items. +We will also write `Byte` for `Byte`. + This is a very basic memory interface that is incomplete in at least the following ways: * To implement rules like "dereferencing a null, unaligned, or dangling raw pointer is UB" (even if no memory access happens), there needs to be a way to do an "alignment, bounds and null-check". diff --git a/wip/value-domain.md b/wip/value-domain.md index 7be69fc0..4cc96482 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -13,10 +13,10 @@ However, before we can even start specifying the relation, we have to specify th ## Values -The Rust value domain is described by the following (incomplete) type definition (in terms of an arbitrary `Pointer` type instantiated by [the memory interface][memory-interface]): +The Rust value domain is described by the following (incomplete) type definition (using the `Pointer` type defined by [the memory interface][memory-interface]): ```rust -enum Value { +enum Value { /// A mathematical integer, used for `i*`/`u*` types. Int(BigInt), /// A Boolean value, used for `bool`. @@ -33,7 +33,7 @@ enum Value { data: Box, }, /// A "bag of raw bytes", used for unions. - RawBag(Vec>), + RawBag(Vec), /* ... */ } ``` From 3785d4c680225046c9a8781f893c018cc72ab4d7 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 20:27:56 +0200 Subject: [PATCH 08/12] typo --- wip/memory-interface.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wip/memory-interface.md b/wip/memory-interface.md index fab056ef..87c7aa07 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -4,7 +4,7 @@ The purpose of this document is to describe the interface between a Rust program and memory. This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface: -* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order - expressed as calls to the methods of this interface, and reactions to its return values. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. +* The *expression/statement semantics* of Rust boils down to explaining which "memory events" (calls to the memory interface) happen in which order - expressed as calls to the methods of this interface, and reactions to its return values. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. * The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. A memory model is defined by implementing the memory interface. The interface shown below is also opinionated in several ways. From 17611a13fc4269269000112d9db3ae09f958a1a8 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 20:29:58 +0200 Subject: [PATCH 09/12] add NonNullU8 --- wip/value-domain.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/wip/value-domain.md b/wip/value-domain.md index 4cc96482..4574b380 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -87,6 +87,11 @@ Certainly, a value `Int(i)` where `i` in `0..256` is related to `[b]` if `b.as_i And then, maybe, we also want to additionally say that value `Uninit` is related to byte list `[Uninit]`. This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value). +### `NonNullU8` + +`NonNullU8` is basically the same as `u8` except that we remove the 0, and certainly do not consider `Uninit` a valid value. +A value `Int(i)` where `i` in `1..256` is related to `[b]` if `b.as_int() == Some(i)`, and that's it. + ### `union` The `union` type does not even try to interpret memory, so for a `union` of size `n`, the value relation says that for any byte list `bytes` of that length, `RawBag(bytes)` is related to `bytes`. From 334c8b696579506e21197cb75b4ca9a7131375f5 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 20:31:23 +0200 Subject: [PATCH 10/12] linebreaks --- wip/memory-interface.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/wip/memory-interface.md b/wip/memory-interface.md index 87c7aa07..9ba8219a 100644 --- a/wip/memory-interface.md +++ b/wip/memory-interface.md @@ -4,8 +4,10 @@ The purpose of this document is to describe the interface between a Rust program and memory. This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface: -* The *expression/statement semantics* of Rust boils down to explaining which "memory events" (calls to the memory interface) happen in which order - expressed as calls to the methods of this interface, and reactions to its return values. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. -* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. A memory model is defined by implementing the memory interface. +* The *expression/statement semantics* of Rust boils down to explaining which "memory events" (calls to the memory interface) happen in which order - expressed as calls to the methods of this interface, and reactions to its return values. + This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. +* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. + A memory model is defined by implementing the memory interface. The interface shown below is also opinionated in several ways. It is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. From d871de683f5487278f4cf844e215e514c1792509 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 25 Jul 2019 22:25:12 +0200 Subject: [PATCH 11/12] NonNullU8 is actually called NonZeroU8 --- wip/value-domain.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/wip/value-domain.md b/wip/value-domain.md index 4574b380..6b3cad2f 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -87,9 +87,9 @@ Certainly, a value `Int(i)` where `i` in `0..256` is related to `[b]` if `b.as_i And then, maybe, we also want to additionally say that value `Uninit` is related to byte list `[Uninit]`. This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value). -### `NonNullU8` +### `NonZeroU8` -`NonNullU8` is basically the same as `u8` except that we remove the 0, and certainly do not consider `Uninit` a valid value. +`NonZeroU8` is basically the same as `u8` except that we remove the 0, and certainly do not consider `Uninit` a valid value. A value `Int(i)` where `i` in `1..256` is related to `[b]` if `b.as_int() == Some(i)`, and that's it. ### `union` From d63559b8c8085c19e31114e95164cafbabd7a799 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 26 Jul 2019 11:14:57 +0200 Subject: [PATCH 12/12] cite cerberus --- wip/value-domain.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/wip/value-domain.md b/wip/value-domain.md index 6b3cad2f..0c840a5d 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -112,6 +112,8 @@ trait TypedMemory: Memory { ``` Here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler). +This interface is inspired by [Cerberus](https://www.cl.cam.ac.uk/~pes20/cerberus/). + We can implement `TypedMemory` for any `Memory` as follows: * For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB. * For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB.