The tutorial gives the following example of using packed_accessor64 to access tensor elements efficiently inside a CUDA kernel. However, I still do not understand how to use packed_accessor64 in practice. Can anyone give me a more complete example? Thanks.
__global__ void packed_accessor_kernel(
    PackedTensorAccessor64<float, 2> foo,
    float* trace) {
  // One thread per diagonal element.
  int i = threadIdx.x;
  gpuAtomicAdd(trace, foo[i][i]);
}
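// The kernel dereferences device memory, so the tensor must live on the GPU.
torch::Tensor foo = torch::rand({12, 12}, torch::device(torch::kCUDA));
// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.packed_accessor64<float, 2>();
// The result also needs device storage; a pointer to a host float is not valid inside the kernel.
torch::Tensor trace = torch::zeros({1}, foo.options());
packed_accessor_kernel<<<1, 12>>>(foo_a, trace.data_ptr<float>());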
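
For a more specific picture, here is a minimal sketch of how the snippet above might look as a complete .cu file in a C++/CUDA extension. It is an illustration under assumptions, not the tutorial's own code: the names trace_kernel and diagonal_trace are hypothetical, the header path for gpuAtomicAdd is assumed to be the one used in recent PyTorch releases (older releases kept it elsewhere), and the kernel is launched on the default stream for simplicity.

```cuda
#include <torch/extension.h>
#include <ATen/cuda/Atomic.cuh>      // gpuAtomicAdd (assumed location in recent PyTorch)
#include <c10/cuda/CUDAException.h>  // C10_CUDA_KERNEL_LAUNCH_CHECK
#include <algorithm>

// The packed accessor is passed by value. It carries the data pointer
// together with the sizes and strides, so foo[i][j] works on the device.
__global__ void trace_kernel(
    torch::PackedTensorAccessor64<float, 2, torch::RestrictPtrTraits> foo,
    float* trace) {
  const int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  // Sizes are readable inside the kernel, so surplus threads can exit safely.
  if (i < foo.size(0) && i < foo.size(1)) {
    gpuAtomicAdd(trace, foo[i][i]);
  }
}

// Hypothetical host-side wrapper: returns a 1-element CUDA tensor holding
// the sum of the diagonal of a 2-D float CUDA tensor.
torch::Tensor diagonal_trace(torch::Tensor foo) {
  TORCH_CHECK(foo.is_cuda(), "foo must be a CUDA tensor");
  TORCH_CHECK(foo.dim() == 2, "foo must be 2-dimensional");
  TORCH_CHECK(foo.scalar_type() == torch::kFloat, "foo must hold floats");

  // The output lives on the same device as the input.
  torch::Tensor trace = torch::zeros({1}, foo.options());

  const int threads = 128;
  const int64_t n = std::min(foo.size(0), foo.size(1));
  const int blocks = static_cast<int>((n + threads - 1) / threads);
  if (blocks > 0) {
    trace_kernel<<<blocks, threads>>>(
        foo.packed_accessor64<float, 2, torch::RestrictPtrTraits>(),
        trace.data_ptr<float>());
    C10_CUDA_KERNEL_LAUNCH_CHECK();  // surfaces launch errors as exceptions
  }
  return trace;
}
```

Calling diagonal_trace(torch::rand({12, 12}, torch::device(torch::kCUDA))) should give the same value as foo.trace(), up to floating-point rounding. The template arguments passed to packed_accessor64 (element type, number of dimensions, pointer traits) have to match the accessor type in the kernel signature. Unlike a raw float*, the accessor respects strides, so the kernel also works on non-contiguous inputs such as a transposed tensor.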