Commit f769982

Add additional anchor on section references
1 parent a4b2f6c commit f769982

1 file changed: +2 -2 lines changed

docs/buffers.md

Lines changed: 2 additions & 2 deletions
@@ -82,13 +82,13 @@ The `acl_bind_buffer_to_device` function first calls the [`acl_do_physical_buffe

**NOTE:** The simulator does not know the memory interfaces of any device until a `.aocx` file is loaded, which usually happens after SYCL* calls the `clEnqueueWriteBuffer` function.

- Then, the `acl_bind_buffer_to_device` function reserves the memory through the [`acl_allocate_block`](https://github.com/intel/fpga-runtime-for-opencl/blob/b08e0af97351718ce0368a9ee507242b35f4929e/src/acl_mem.cpp#L4310-L4565) function. The `acl_allocate_block` function attempts to allocate memory on the preferred bank. If that fails (i.e., the bank's memory is full), the `acl_allocate_block` function attempts to allocate memory from the entire device's global memory. The `acl_allocate_block` function decides on the memory range it can allocate based on the information you provided regarding the device, global memory, and memory bank. It returns a range in the form of `[pointer_to_begin_address, pointer_to_end_address]` (achieved through [`l_get_working_range`](https://github.com/intel/fpga-runtime-for-opencl/blob/b08e0af97351718ce0368a9ee507242b35f4929e/src/acl_mem.cpp#L4253-L4308)). The specifics of how memory is reserved are described in the next subsection *Memory Allocation Algorithm*.
+ Then, the `acl_bind_buffer_to_device` function reserves the memory through the [`acl_allocate_block`](https://github.com/intel/fpga-runtime-for-opencl/blob/b08e0af97351718ce0368a9ee507242b35f4929e/src/acl_mem.cpp#L4310-L4565) function. The `acl_allocate_block` function attempts to allocate memory on the preferred bank. If that fails (i.e., the bank's memory is full), the `acl_allocate_block` function attempts to allocate memory from the entire device's global memory. The `acl_allocate_block` function decides on the memory range it can allocate based on the information you provided regarding the device, global memory, and memory bank. It returns a range in the form of `[pointer_to_begin_address, pointer_to_end_address]` (achieved through [`l_get_working_range`](https://github.com/intel/fpga-runtime-for-opencl/blob/b08e0af97351718ce0368a9ee507242b35f4929e/src/acl_mem.cpp#L4253-L4308)). The specifics of how memory is reserved are described in [Memory Allocation Algorithm](#memory-allocation-algorithm).

**NOTE:** You can partition a single device's global memory into multiple banks (the partition can be interleaving or separate, with interleaving being the default). Interleaving memory provides more load balancing between memory banks. You can query which specific bank to access through runtime calls. For more information about memory banks, see the [Global Memory Accesses Optimization](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-fpga-optimization-guide/top/optimize-your-design/throughput-1/memory-accesses/global-memory-accesses-optimization.html) topic in the *FPGA Optimization Guide for Intel(R) oneAPI Toolkits*.

Once the address range is set, the `acl_allocate_block` function returns the device address. The device address is different from the surface representation in the runtime. Specifically, the device address is a bitwise OR of the device ID and the device pointer, as formatted [here](https://github.com/intel/fpga-runtime-for-opencl/blob/1264543c0361530f5883e35dc0c9d48ac0fd3653/include/acl.h#L264-L274).
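
The following is a minimal sketch of how a device ID and a device pointer can be packed into one address with a bitwise OR. It is not taken from `acl.h`: the 48-bit pointer width, `DEVICE_ID_SHIFT`, and both function names are assumptions chosen for illustration, so consult the linked header for the actual layout.

```python
# Hypothetical layout: the device ID sits in the bits above a 48-bit pointer field.
# The real field widths are defined in acl.h and may differ.
DEVICE_ID_SHIFT = 48
POINTER_MASK = (1 << DEVICE_ID_SHIFT) - 1


def make_device_address(device_id: int, device_ptr: int) -> int:
    """Combine a device ID and a device pointer with a bitwise OR."""
    if device_ptr & ~POINTER_MASK:
        raise ValueError("device pointer does not fit in the pointer field")
    return (device_id << DEVICE_ID_SHIFT) | device_ptr


def split_device_address(address: int) -> tuple:
    """Recover (device_id, device_ptr) from a combined device address."""
    return address >> DEVICE_ID_SHIFT, address & POINTER_MASK
```

Because the two fields occupy disjoint bits in this sketch, the OR is lossless and `split_device_address` recovers both values.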

- Once the 2D list is ready and the current block allocation is set, the `acl_bind_buffer_to_device` function [enqueues a memory transfer](https://github.com/intel/fpga-runtime-for-opencl/blob/b08e0af97351718ce0368a9ee507242b35f4929e/src/acl_mem.cpp#L4726-L5174) from the context's `unwrapped_host_mem` to the buffer's `for_enqueue_writes` as described in the next subsection.
+ Once the 2D list is ready and the current block allocation is set, the `acl_bind_buffer_to_device` function [enqueues a memory transfer](https://github.com/intel/fpga-runtime-for-opencl/blob/b08e0af97351718ce0368a9ee507242b35f4929e/src/acl_mem.cpp#L4726-L5174) from the context's `unwrapped_host_mem` to the buffer's `for_enqueue_writes` as described in [Transfer Memory section](#transfer-memory).

##### Memory Allocation Algorithm
The memory allocation algorithm is first-fit allocation. The allocation starts from the beginning of the requested global memory and then searches for the next available space (a gap between existing allocations or the free space at the end) that satisfies the size requirement. If you request a specific memory bank, then the memory must be non-interleaving. When you specify a bank ID, the first-fit allocation algorithm starts at the address of `(((bank_id - 1) % num_banks) * bank_size + the start of target global_mem)`. As a result, when you specify a bank ID, consecutive memory allocations may not be adjacent to each other. Conversely, if you never specify a bank ID, consecutive memory allocations should be adjacent, assuming no deallocation has occurred.
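
The first-fit search and the bank start address can be sketched as follows. This is an illustrative model, not the runtime's code: the names `bank_start`, `first_fit`, and `allocate`, the half-open `(begin, end)` block tuples, and the choice to limit the preferred-bank search to that bank's range before falling back to the whole global memory are all assumptions.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # half-open [begin, end) address range (assumed representation)


def bank_start(bank_id: int, num_banks: int, bank_size: int, mem_start: int) -> int:
    """Start address for a 1-based bank ID, following the formula in the text."""
    return ((bank_id - 1) % num_banks) * bank_size + mem_start


def first_fit(allocated: List[Block], size: int, lo: int, hi: int) -> Optional[Block]:
    """Return the first free range of `size` bytes inside [lo, hi), or None."""
    cursor = lo
    for begin, end in sorted(allocated):
        if end <= cursor:
            continue  # block lies entirely before the search cursor
        if begin >= hi:
            break  # remaining blocks are past the search window
        if begin - cursor >= size:
            break  # the gap in front of this block is large enough
        cursor = max(cursor, end)  # skip past the occupied block
    if cursor + size <= hi:
        return (cursor, cursor + size)
    return None


def allocate(allocated: List[Block], size: int, mem_start: int, mem_end: int,
             num_banks: int = 1, bank_id: Optional[int] = None) -> Optional[Block]:
    """Try the preferred bank first (if any), then the whole global memory."""
    block = None
    if bank_id is not None:
        bank_size = (mem_end - mem_start) // num_banks
        lo = bank_start(bank_id, num_banks, bank_size, mem_start)
        block = first_fit(allocated, size, lo, lo + bank_size)
    if block is None:
        block = first_fit(allocated, size, mem_start, mem_end)
    if block is not None:
        allocated.append(block)
    return block
```

In this model, two allocations without a bank ID land back to back at the start of memory, while an allocation that names bank 3 of 4 starts at that bank's offset and leaves a gap behind it.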
