Skip to content

[Offload] Define additional device info properties #152533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rafbiels
Copy link

@rafbiels rafbiels commented Aug 7, 2025

Add the following properties in Offload device info:

  • VENDOR_ID
  • NUM_COMPUTE_UNITS
  • [SINGLE|DOUBLE|HALF]_FP_CONFIG
  • NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
  • MAX_CLOCK_FREQUENCY
  • MEMORY_CLOCK_RATE
  • ADDRESS_BITS
  • MAX_MEM_ALLOC_SIZE
  • GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented. Generate the per-type enums using foreach to reduce code duplication.

Use macros in unit test definitions to reduce code duplication.

Copy link

github-actions bot commented Aug 7, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Rafal Bielski (rafbiels)

Changes

Add the following properties in Offload device info:

  • VENDOR_ID
  • NUM_COMPUTE_UNITS
  • [SINGLE|DOUBLE|HALF]_FP_CONFIG
  • PREFERRED_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
  • NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
  • MAX_CLOCK_FREQUENCY
  • MEMORY_CLOCK_RATE
  • ADDRESS_BITS
  • MAX_MEM_ALLOC_SIZE
  • GLOBAL_MEM_SIZE

Introduce templated enumerators in Offload TableGen to reduce code duplication for the per-type definitions.

Use macros in unit test definitions to reduce code duplication.


Patch is 34.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152533.diff

13 Files Affected:

  • (modified) offload/liboffload/API/APIDefs.td (+16)
  • (modified) offload/liboffload/API/Device.td (+32)
  • (modified) offload/liboffload/src/OffloadImpl.cpp (+85-3)
  • (modified) offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h (+6)
  • (modified) offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h (+1)
  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+22-3)
  • (modified) offload/plugins-nextgen/cuda/src/rtl.cpp (+12-4)
  • (modified) offload/tools/offload-tblgen/APIGen.cpp (+25-10)
  • (modified) offload/tools/offload-tblgen/MiscGen.cpp (+22-5)
  • (modified) offload/tools/offload-tblgen/PrintGen.cpp (+41-25)
  • (modified) offload/tools/offload-tblgen/RecordTypes.hpp (+30-1)
  • (modified) offload/unittests/OffloadAPI/device/olGetDeviceInfo.cpp (+64-6)
  • (modified) offload/unittests/OffloadAPI/device/olGetDeviceInfoSize.cpp (+65-34)
diff --git a/offload/liboffload/API/APIDefs.td b/offload/liboffload/API/APIDefs.td
index 640932dcf8464..79f7242a35ecf 100644
--- a/offload/liboffload/API/APIDefs.td
+++ b/offload/liboffload/API/APIDefs.td
@@ -158,16 +158,32 @@ class Etor<string Name, string Desc> {
   string tagged_type;
 }
 
+class EtorTemplate<string Name, string Desc, list<string> TemplateValues> : Etor<Name, Desc> {
+  // Create Etor for every value in template_values by replacing the string
+  // "<TemplateEtor>" with the template value in the name and description
+  list<string> template_values = TemplateValues;
+}
+
 class TaggedEtor<string Name, string Type, string Desc> : Etor<Name, Desc> {
   let tagged_type = Type;
 }
 
+class TaggedEtorTemplate<string Name, string Type, string Desc, list<string> TemplateValues> : TaggedEtor<Name, Type, Desc> {
+  // Create TaggedEtor for every value in template_values by replacing the
+  // string "<TemplateEtor>" with the template value in the name and description
+  list<string> template_values = TemplateValues;
+}
+
 class Enum : APIObject {
   // This refers to whether the enumerator descriptions specify a return
   // type for functions where this enum may be used as an output type. If set,
   // all Etor values must be TaggedEtor records
   bit is_typed = 0;
 
+  // This refers to whether the enumerator is used to name bits of a bit field,
+  // where consecutive values are bit-shifted rather than incremented.
+  bit is_bit_field = 0;
+
   list<Etor> etors = [];
 }
 
diff --git a/offload/liboffload/API/Device.td b/offload/liboffload/API/Device.td
index 857c596124b27..062542364f08e 100644
--- a/offload/liboffload/API/Device.td
+++ b/offload/liboffload/API/Device.td
@@ -34,9 +34,41 @@ def DeviceInfo : Enum {
     TaggedEtor<"DRIVER_VERSION", "char[]", "Driver version">,
     TaggedEtor<"MAX_WORK_GROUP_SIZE", "uint32_t", "Maximum total work group size in work items">,
     TaggedEtor<"MAX_WORK_GROUP_SIZE_PER_DIMENSION", "ol_dimensions_t", "Maximum work group size in each dimension">,
+    TaggedEtor<"VENDOR_ID", "uint32_t", "Device vendor ID">,
+    TaggedEtor<"NUM_COMPUTE_UNITS", "uint32_t", "Number of compute units">,
+    TaggedEtorTemplate<"<TemplateEtor>_FP_CONFIG", "ol_device_fp_capability_flags_t", "<TemplateEtor> precision floating point capability", ["Single","Half","Double"]>,
+    TaggedEtorTemplate<"PREFERRED_VECTOR_WIDTH_<TemplateEtor>", "uint32_t", "Preferred vector width for <TemplateEtor>", ["char","short","int","long","float","double","half"]>,
+    TaggedEtorTemplate<"NATIVE_VECTOR_WIDTH_<TemplateEtor>", "uint32_t", "Native vector width for <TemplateEtor>", ["char","short","int","long","float","double","half"]>,
+    TaggedEtor<"MAX_CLOCK_FREQUENCY", "uint32_t", "Max clock frequency in MHz">,
+    TaggedEtor<"MEMORY_CLOCK_RATE", "uint32_t", "Memory clock frequency in MHz">,
+    TaggedEtor<"ADDRESS_BITS", "uint32_t", "Number of bits used to represent an address in device memory">,
+    TaggedEtor<"MAX_MEM_ALLOC_SIZE", "uint64_t", "The maximum size of memory object allocation in bytes">,
+    TaggedEtor<"GLOBAL_MEM_SIZE", "uint64_t", "The size of global device memory in bytes">,
   ];
 }
 
+def : Enum {
+  let name = "ol_device_fp_capability_flag_t";
+  let desc = "Device floating-point capability flags";
+  let is_bit_field = 1;
+  let etors =[
+    Etor<"CORRECTLY_ROUNDED_DIVIDE_SQRT", "Support correctly rounded divide and sqrt">,
+    Etor<"ROUND_TO_NEAREST", "Support round to nearest">,
+    Etor<"ROUND_TO_ZERO", "Support round to zero">,
+    Etor<"ROUND_TO_INF", "Support round to infinity">,
+    Etor<"INF_NAN", "Support INF to NAN">,
+    Etor<"DENORM", "Support denorm">,
+    Etor<"FMA", "Support fused multiply-add">,
+    Etor<"SOFT_FLOAT", "Basic floating point operations implemented in software">,
+  ];
+}
+
+def : Typedef {
+  let name = "ol_device_fp_capability_flags_t";
+  let desc = "Device floating-point capability flags";
+  let value = "uint32_t";
+}
+
 def : FptrTypedef {
   let name = "ol_device_iterate_cb_t";
   let desc = "User-provided function to be used with `olIterateDevices`";
diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp
index 272a12ab59a06..4ea587e29de4d 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -302,10 +302,57 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
   };
 
   // These are not implemented by the plugin interface
-  if (PropName == OL_DEVICE_INFO_PLATFORM)
+  switch (PropName) {
+  case OL_DEVICE_INFO_PLATFORM:
     return Info.write<void *>(Device->Platform);
-  if (PropName == OL_DEVICE_INFO_TYPE)
+
+  case OL_DEVICE_INFO_TYPE:
     return Info.write<ol_device_type_t>(OL_DEVICE_TYPE_GPU);
+
+  case OL_DEVICE_INFO_SINGLE_FP_CONFIG:
+  case OL_DEVICE_INFO_DOUBLE_FP_CONFIG: {
+    ol_device_fp_capability_flags_t flags{0};
+    flags |= OL_DEVICE_FP_CAPABILITY_FLAG_CORRECTLY_ROUNDED_DIVIDE_SQRT |
+             OL_DEVICE_FP_CAPABILITY_FLAG_ROUND_TO_NEAREST |
+             OL_DEVICE_FP_CAPABILITY_FLAG_ROUND_TO_ZERO |
+             OL_DEVICE_FP_CAPABILITY_FLAG_ROUND_TO_INF |
+             OL_DEVICE_FP_CAPABILITY_FLAG_INF_NAN |
+             OL_DEVICE_FP_CAPABILITY_FLAG_DENORM |
+             OL_DEVICE_FP_CAPABILITY_FLAG_FMA;
+    return Info.write(flags);
+  }
+
+  case OL_DEVICE_INFO_HALF_FP_CONFIG:
+    return Info.write<ol_device_fp_capability_flags_t>(0);
+
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_CHAR:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_SHORT:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_INT:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_LONG:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_FLOAT:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_DOUBLE:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_CHAR:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_SHORT:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_INT:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_LONG:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_FLOAT:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_DOUBLE:
+    return Info.write<uint32_t>(1);
+
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_HALF:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_HALF:
+    return Info.write<uint32_t>(0);
+
+  // None of the existing plugins specify a limit on a single allocation,
+  // so return the global memory size instead
+  case OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE:
+    PropName = OL_DEVICE_INFO_GLOBAL_MEM_SIZE;
+    break;
+
+  default:
+    break;
+  }
+
   if (PropName >= OL_DEVICE_INFO_LAST)
     return createOffloadError(ErrorCode::INVALID_ENUMERATION,
                               "getDeviceInfo enum '%i' is invalid", PropName);
@@ -316,6 +363,7 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
                      "plugin did not provide a response for this information");
   auto Entry = *EntryOpt;
 
+  // Retrieve properties from the plugin interface
   switch (PropName) {
   case OL_DEVICE_INFO_NAME:
   case OL_DEVICE_INFO_VENDOR:
@@ -327,7 +375,18 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
     return Info.writeString(std::get<std::string>(Entry->Value).c_str());
   }
 
-  case OL_DEVICE_INFO_MAX_WORK_GROUP_SIZE: {
+  case OL_DEVICE_INFO_GLOBAL_MEM_SIZE: {
+    // Uint64 values
+    if (!std::holds_alternative<uint64_t>(Entry->Value))
+      return makeError(ErrorCode::BACKEND_FAILURE,
+                       "plugin returned incorrect type");
+    return Info.write(std::get<uint64_t>(Entry->Value));
+  }
+
+  case OL_DEVICE_INFO_MAX_WORK_GROUP_SIZE:
+  case OL_DEVICE_INFO_VENDOR_ID:
+  case OL_DEVICE_INFO_NUM_COMPUTE_UNITS:
+  case OL_DEVICE_INFO_ADDRESS_BITS: {
     // Uint32 values
     if (!std::holds_alternative<uint64_t>(Entry->Value))
       return makeError(ErrorCode::BACKEND_FAILURE,
@@ -339,6 +398,29 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
     return Info.write(static_cast<uint32_t>(Value));
   }
 
+  case OL_DEVICE_INFO_MAX_CLOCK_FREQUENCY:
+  case OL_DEVICE_INFO_MEMORY_CLOCK_RATE: {
+    // Frequency might require unit conversion to MHz
+    if (!std::holds_alternative<uint64_t>(Entry->Value))
+      return makeError(ErrorCode::BACKEND_FAILURE,
+                       "plugin returned incorrect type");
+    auto Value = std::get<uint64_t>(Entry->Value);
+    StringRef Units = Entry->Units;
+    if (!Units.empty() && Units != "MHz") {
+      // Conversion required
+      if (Units == "Hz")
+        Value /= 1000000;
+      else if (Units == "kHz")
+        Value /= 1000;
+      else if (Units == "GHz")
+        Value *= 1000;
+    }
+    if (Value > std::numeric_limits<uint32_t>::max())
+      return makeError(ErrorCode::BACKEND_FAILURE,
+                       "plugin returned out of range device info");
+    return Info.write(static_cast<uint32_t>(Value));
+  }
+
   case OL_DEVICE_INFO_MAX_WORK_GROUP_SIZE_PER_DIMENSION: {
     // {x, y, z} triples
     ol_dimensions_t Out{0, 0, 0};
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
index 61f680bab3a07..ad135f72fff12 100644
--- a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
@@ -70,10 +70,16 @@ typedef enum {
   HSA_ISA_INFO_NAME = 1
 } hsa_isa_info_t;
 
+typedef enum {
+  HSA_MACHINE_MODEL_SMALL = 0,
+  HSA_MACHINE_MODEL_LARGE = 1
+} hsa_machine_model_t;
+
 typedef enum {
   HSA_AGENT_INFO_NAME = 0,
   HSA_AGENT_INFO_VENDOR_NAME = 1,
   HSA_AGENT_INFO_FEATURE = 2,
+  HSA_AGENT_INFO_MACHINE_MODEL = 3,
   HSA_AGENT_INFO_PROFILE = 4,
   HSA_AGENT_INFO_WAVEFRONT_SIZE = 6,
   HSA_AGENT_INFO_WORKGROUP_MAX_DIM = 7,
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
index 3117763e35896..29cfe78082dbb 100644
--- a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
@@ -67,6 +67,7 @@ typedef enum hsa_amd_agent_info_s {
   HSA_AMD_AGENT_INFO_CACHELINE_SIZE = 0xA001,
   HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT = 0xA002,
   HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY = 0xA003,
+  HSA_AMD_AGENT_INFO_MEMORY_MAX_FREQUENCY = 0xA008,
   HSA_AMD_AGENT_INFO_PRODUCT_NAME = 0xA009,
   HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU = 0xA00A,
   HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU = 0xA00B,
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 852c0e99b2266..20d8994f97a39 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -2643,6 +2643,15 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
     if (Status == HSA_STATUS_SUCCESS)
       Info.add("Vendor Name", TmpChar, "", DeviceInfo::VENDOR);
 
+    Info.add("Vendor ID", uint64_t{4130}, "", DeviceInfo::VENDOR_ID);
+
+    hsa_machine_model_t MachineModel;
+    Status = getDeviceAttrRaw(HSA_AGENT_INFO_MACHINE_MODEL, MachineModel);
+    if (Status == HSA_STATUS_SUCCESS)
+      Info.add("Memory Address Size",
+               uint64_t{MachineModel == HSA_MACHINE_MODEL_SMALL ? 32u : 64u},
+               "bits", DeviceInfo::ADDRESS_BITS);
+
     hsa_device_type_t DevType;
     Status = getDeviceAttrRaw(HSA_AGENT_INFO_DEVICE, DevType);
     if (Status == HSA_STATUS_SUCCESS) {
@@ -2693,11 +2702,17 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
 
     Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY, TmpUInt);
     if (Status == HSA_STATUS_SUCCESS)
-      Info.add("Max Clock Freq", TmpUInt, "MHz");
+      Info.add("Max Clock Freq", TmpUInt, "MHz",
+               DeviceInfo::MAX_CLOCK_FREQUENCY);
+
+    Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_MEMORY_MAX_FREQUENCY, TmpUInt);
+    if (Status == HSA_STATUS_SUCCESS)
+      Info.add("Max Memory Clock Freq", TmpUInt, "MHz",
+               DeviceInfo::MEMORY_CLOCK_RATE);
 
     Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, TmpUInt);
     if (Status == HSA_STATUS_SUCCESS)
-      Info.add("Compute Units", TmpUInt);
+      Info.add("Compute Units", TmpUInt, "", DeviceInfo::NUM_COMPUTE_UNITS);
 
     Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU, TmpUInt);
     if (Status == HSA_STATUS_SUCCESS)
@@ -2779,7 +2794,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
 
       Status = Pool->getAttrRaw(HSA_AMD_MEMORY_POOL_INFO_SIZE, TmpSt);
       if (Status == HSA_STATUS_SUCCESS)
-        PoolNode.add("Size", TmpSt, "bytes");
+        PoolNode.add(
+            "Size", TmpSt, "bytes",
+            (Pool->isGlobal() && Pool->isCoarseGrained())
+                ? std::optional<DeviceInfo>{DeviceInfo::GLOBAL_MEM_SIZE}
+                : std::nullopt);
 
       Status = Pool->getAttrRaw(HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED,
                                 TmpBool);
diff --git a/offload/plugins-nextgen/cuda/src/rtl.cpp b/offload/plugins-nextgen/cuda/src/rtl.cpp
index 7649fd9285bb5..18a006f0b3992 100644
--- a/offload/plugins-nextgen/cuda/src/rtl.cpp
+++ b/offload/plugins-nextgen/cuda/src/rtl.cpp
@@ -951,13 +951,20 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     Info.add("Vendor Name", "NVIDIA", "", DeviceInfo::VENDOR);
 
+    Info.add("Vendor ID", uint64_t{4318}, "", DeviceInfo::VENDOR_ID);
+
+    Info.add("Memory Address Size", std::numeric_limits<CUdeviceptr>::digits,
+             "bits", DeviceInfo::ADDRESS_BITS);
+
     Res = cuDeviceTotalMem(&TmpSt, Device);
     if (Res == CUDA_SUCCESS)
-      Info.add("Global Memory Size", TmpSt, "bytes");
+      Info.add("Global Memory Size", TmpSt, "bytes",
+               DeviceInfo::GLOBAL_MEM_SIZE);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, TmpInt);
     if (Res == CUDA_SUCCESS)
-      Info.add("Number of Multiprocessors", TmpInt);
+      Info.add("Number of Multiprocessors", TmpInt, "",
+               DeviceInfo::NUM_COMPUTE_UNITS);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, TmpInt);
     if (Res == CUDA_SUCCESS)
@@ -1018,7 +1025,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_CLOCK_RATE, TmpInt);
     if (Res == CUDA_SUCCESS)
-      Info.add("Clock Rate", TmpInt, "kHz");
+      Info.add("Clock Rate", TmpInt, "kHz", DeviceInfo::MAX_CLOCK_FREQUENCY);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT, TmpInt);
     if (Res == CUDA_SUCCESS)
@@ -1055,7 +1062,8 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE, TmpInt);
     if (Res == CUDA_SUCCESS)
-      Info.add("Memory Clock Rate", TmpInt, "kHz");
+      Info.add("Memory Clock Rate", TmpInt, "kHz",
+               DeviceInfo::MEMORY_CLOCK_RATE);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH, TmpInt);
     if (Res == CUDA_SUCCESS)
diff --git a/offload/tools/offload-tblgen/APIGen.cpp b/offload/tools/offload-tblgen/APIGen.cpp
index 8c61d1f12de7a..ea62fa3cd9e91 100644
--- a/offload/tools/offload-tblgen/APIGen.cpp
+++ b/offload/tools/offload-tblgen/APIGen.cpp
@@ -131,17 +131,32 @@ static void ProcessEnum(const EnumRec &Enum, raw_ostream &OS) {
   OS << formatv("/// @brief {0}\n", Enum.getDesc());
   OS << formatv("typedef enum {0} {{\n", Enum.getName());
 
-  uint32_t EtorVal = 0;
+  uint32_t EtorVal = Enum.isBitField();
   for (const auto &EnumVal : Enum.getValues()) {
-    if (Enum.isTyped()) {
-      OS << MakeComment(
-          formatv("[{0}] {1}", EnumVal.getTaggedType(), EnumVal.getDesc())
-              .str());
-    } else {
-      OS << MakeComment(EnumVal.getDesc());
-    }
-    OS << formatv(TAB_1 "{0}_{1} = {2},\n", Enum.getEnumValNamePrefix(),
-                  EnumVal.getName(), EtorVal++);
+    size_t NumTemplateValues{EnumVal.getTemplateValues().size()};
+    size_t TemplateIndex{
+        NumTemplateValues > 0 ? 0 : std::numeric_limits<size_t>::max()};
+    do {
+      std::string Name{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateName(TemplateIndex)
+                           : EnumVal.getName()};
+      std::string Desc{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateDesc(TemplateIndex)
+                           : EnumVal.getDesc()};
+      if (Enum.isTyped()) {
+        OS << MakeComment(
+            formatv("[{0}] {1}", EnumVal.getTaggedType(), Desc).str());
+      } else {
+        OS << MakeComment(Desc);
+      }
+      OS << formatv(TAB_1 "{0}_{1} = {2},\n", Enum.getEnumValNamePrefix(), Name,
+                    EtorVal);
+      if (Enum.isBitField()) {
+        EtorVal <<= 1u;
+      } else {
+        ++EtorVal;
+      }
+    } while (++TemplateIndex < NumTemplateValues);
   }
 
   // Add last_element/force uint32 val
diff --git a/offload/tools/offload-tblgen/MiscGen.cpp b/offload/tools/offload-tblgen/MiscGen.cpp
index b90e5cfdec8b2..afd76ba34ae06 100644
--- a/offload/tools/offload-tblgen/MiscGen.cpp
+++ b/offload/tools/offload-tblgen/MiscGen.cpp
@@ -107,10 +107,27 @@ void EmitOffloadInfo(const RecordKeeper &Records, raw_ostream &OS) {
 
 )";
 
-  auto ErrorCodeEnum = EnumRec{Records.getDef("DeviceInfo")};
-  uint32_t EtorVal = 0;
-  for (const auto &EnumVal : ErrorCodeEnum.getValues()) {
-    OS << formatv(TAB_1 "OFFLOAD_DEVINFO({0}, \"{1}\", {2})\n",
-                  EnumVal.getName(), EnumVal.getDesc(), EtorVal++);
+  auto Enum = EnumRec{Records.getDef("DeviceInfo")};
+
+  uint32_t EtorVal = Enum.isBitField();
+  for (const auto &EnumVal : Enum.getValues()) {
+    size_t NumTemplateValues{EnumVal.getTemplateValues().size()};
+    size_t TemplateIndex{
+        NumTemplateValues > 0 ? 0 : std::numeric_limits<size_t>::max()};
+    do {
+      std::string Name{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateName(TemplateIndex)
+                           : EnumVal.getName()};
+      std::string Desc{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateDesc(TemplateIndex)
+                           : EnumVal.getDesc()};
+      OS << formatv(TAB_1 "OFFLOAD_DEVINFO({0}, \"{1}\", {2})\n", Name, Desc,
+                    EtorVal);
+      if (Enum.isBitField()) {
+        EtorVal <<= 1u;
+      } else {
+        ++EtorVal;
+      }
+    } while (++TemplateIndex < NumTemplateValues);
   }
 }
diff --git a/offload/tools/offload-tblgen/PrintGen.cpp b/offload/tools/offload-tblgen/PrintGen.cpp
index 89d7c820426cf..1bfa388ada436 100644
--- a/offload/tools/offload-tblgen/PrintGen.cpp
+++ b/offload/tools/offload-tblgen/PrintGen.cpp
@@ -40,10 +40,18 @@ static void ProcessEnum(const EnumRec &Enum, raw_ostream &OS) {
                 Enum.getName());
 
   for (const auto &Val : Enum.getValues()) {
-    auto Name = Enum.getEnumValNamePrefix() + "_" + Val.getName();
-    OS << formatv(TAB_1 "case {0}:\n", Name);
-    OS << formatv(TAB_2 "os << \"{0}\";\n", Name);
-    OS << formatv(TAB_2 "break;\n");
+    size_t NumTemplateValues{Val.getTemplateValues().size()};
+    size_t TemplateIndex{
+        NumTemplateValues > 0 ? 0 : std::numeric_limits<size_t>::max()};
+    do {
+      auto Name = Enum.getEnumValNamePrefix() + "_" +
+                  (TemplateIndex < std::numeric_limits<size_t>::max()
+                       ? Val.getTemplateName(TemplateIndex)
+                       : Val.getName());
+      OS << formatv(TAB_1 "case {0}:\n", Name);
+      OS << formatv(TAB_2 "os << \"{0}\";\n", Name);
+      OS << formatv(TAB_2 "break;\n");
+    } while (++TemplateIndex < NumTemplateValues);
   }
 
   OS << TAB_1 "default:\n" TAB_2 "os << \"unknown enumerator\";\n" TAB_2
@@ -67,29 +75,37 @@ inline void printTagged(llvm::raw_ostream &os, const void *ptr, {0} value, size_
                 Enum.getName());
 
   for (const auto &Val : Enum.getValues()) {
-    auto Name = Enum.getEnumValNamePrefix() + "_" + Val.getName();
-    auto Type = Val.getTaggedType();
-    OS << formatv(TAB_1 "case {0}: {{\n", Name);
-    // Special case for strings
-    if (Type == "char[]") {
-      OS << formatv(TAB_2 "printPtr(os, (const char*) ptr);\n");
-    } else {
-      if (Type == "void *")
-        OS << formatv(TAB_2 "void * const * const tptr = (void * "
-                            "const * const)ptr;\n");
-      else
-        OS << formatv(
-            TAB_2 "const {0} * const tptr = (const {0} * co...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-offload

Author: Rafal Bielski (rafbiels)

Changes

Add the following properties in Offload device info:

  • VENDOR_ID
  • NUM_COMPUTE_UNITS
  • [SINGLE|DOUBLE|HALF]_FP_CONFIG
  • PREFERRED_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
  • NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
  • MAX_CLOCK_FREQUENCY
  • MEMORY_CLOCK_RATE
  • ADDRESS_BITS
  • MAX_MEM_ALLOC_SIZE
  • GLOBAL_MEM_SIZE

Introduce templated enumerators in Offload TableGen to reduce code duplication for the per-type definitions.

Use macros in unit test definitions to reduce code duplication.


Patch is 34.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152533.diff

13 Files Affected:

  • (modified) offload/liboffload/API/APIDefs.td (+16)
  • (modified) offload/liboffload/API/Device.td (+32)
  • (modified) offload/liboffload/src/OffloadImpl.cpp (+85-3)
  • (modified) offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h (+6)
  • (modified) offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h (+1)
  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+22-3)
  • (modified) offload/plugins-nextgen/cuda/src/rtl.cpp (+12-4)
  • (modified) offload/tools/offload-tblgen/APIGen.cpp (+25-10)
  • (modified) offload/tools/offload-tblgen/MiscGen.cpp (+22-5)
  • (modified) offload/tools/offload-tblgen/PrintGen.cpp (+41-25)
  • (modified) offload/tools/offload-tblgen/RecordTypes.hpp (+30-1)
  • (modified) offload/unittests/OffloadAPI/device/olGetDeviceInfo.cpp (+64-6)
  • (modified) offload/unittests/OffloadAPI/device/olGetDeviceInfoSize.cpp (+65-34)
diff --git a/offload/liboffload/API/APIDefs.td b/offload/liboffload/API/APIDefs.td
index 640932dcf8464..79f7242a35ecf 100644
--- a/offload/liboffload/API/APIDefs.td
+++ b/offload/liboffload/API/APIDefs.td
@@ -158,16 +158,32 @@ class Etor<string Name, string Desc> {
   string tagged_type;
 }
 
+class EtorTemplate<string Name, string Desc, list<string> TemplateValues> : Etor<Name, Desc> {
+  // Create Etor for every value in template_values by replacing the string
+  // "<TemplateEtor>" with the template value in the name and description
+  list<string> template_values = TemplateValues;
+}
+
 class TaggedEtor<string Name, string Type, string Desc> : Etor<Name, Desc> {
   let tagged_type = Type;
 }
 
+class TaggedEtorTemplate<string Name, string Type, string Desc, list<string> TemplateValues> : TaggedEtor<Name, Type, Desc> {
+  // Create TaggedEtor for every value in template_values by replacing the
+  // string "<TemplateEtor>" with the template value in the name and description
+  list<string> template_values = TemplateValues;
+}
+
 class Enum : APIObject {
   // This refers to whether the enumerator descriptions specify a return
   // type for functions where this enum may be used as an output type. If set,
   // all Etor values must be TaggedEtor records
   bit is_typed = 0;
 
+  // This refers to whether the enumerator is used to name bits of a bit field,
+  // where consecutive values are bit-shifted rather than incremented.
+  bit is_bit_field = 0;
+
   list<Etor> etors = [];
 }
 
diff --git a/offload/liboffload/API/Device.td b/offload/liboffload/API/Device.td
index 857c596124b27..062542364f08e 100644
--- a/offload/liboffload/API/Device.td
+++ b/offload/liboffload/API/Device.td
@@ -34,9 +34,41 @@ def DeviceInfo : Enum {
     TaggedEtor<"DRIVER_VERSION", "char[]", "Driver version">,
     TaggedEtor<"MAX_WORK_GROUP_SIZE", "uint32_t", "Maximum total work group size in work items">,
     TaggedEtor<"MAX_WORK_GROUP_SIZE_PER_DIMENSION", "ol_dimensions_t", "Maximum work group size in each dimension">,
+    TaggedEtor<"VENDOR_ID", "uint32_t", "Device vendor ID">,
+    TaggedEtor<"NUM_COMPUTE_UNITS", "uint32_t", "Number of compute units">,
+    TaggedEtorTemplate<"<TemplateEtor>_FP_CONFIG", "ol_device_fp_capability_flags_t", "<TemplateEtor> precision floating point capability", ["Single","Half","Double"]>,
+    TaggedEtorTemplate<"PREFERRED_VECTOR_WIDTH_<TemplateEtor>", "uint32_t", "Preferred vector width for <TemplateEtor>", ["char","short","int","long","float","double","half"]>,
+    TaggedEtorTemplate<"NATIVE_VECTOR_WIDTH_<TemplateEtor>", "uint32_t", "Native vector width for <TemplateEtor>", ["char","short","int","long","float","double","half"]>,
+    TaggedEtor<"MAX_CLOCK_FREQUENCY", "uint32_t", "Max clock frequency in MHz">,
+    TaggedEtor<"MEMORY_CLOCK_RATE", "uint32_t", "Memory clock frequency in MHz">,
+    TaggedEtor<"ADDRESS_BITS", "uint32_t", "Number of bits used to represent an address in device memory">,
+    TaggedEtor<"MAX_MEM_ALLOC_SIZE", "uint64_t", "The maximum size of memory object allocation in bytes">,
+    TaggedEtor<"GLOBAL_MEM_SIZE", "uint64_t", "The size of global device memory in bytes">,
   ];
 }
 
+def : Enum {
+  let name = "ol_device_fp_capability_flag_t";
+  let desc = "Device floating-point capability flags";
+  let is_bit_field = 1;
+  let etors =[
+    Etor<"CORRECTLY_ROUNDED_DIVIDE_SQRT", "Support correctly rounded divide and sqrt">,
+    Etor<"ROUND_TO_NEAREST", "Support round to nearest">,
+    Etor<"ROUND_TO_ZERO", "Support round to zero">,
+    Etor<"ROUND_TO_INF", "Support round to infinity">,
+    Etor<"INF_NAN", "Support INF to NAN">,
+    Etor<"DENORM", "Support denorm">,
+    Etor<"FMA", "Support fused multiply-add">,
+    Etor<"SOFT_FLOAT", "Basic floating point operations implemented in software">,
+  ];
+}
+
+def : Typedef {
+  let name = "ol_device_fp_capability_flags_t";
+  let desc = "Device floating-point capability flags";
+  let value = "uint32_t";
+}
+
 def : FptrTypedef {
   let name = "ol_device_iterate_cb_t";
   let desc = "User-provided function to be used with `olIterateDevices`";
diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp
index 272a12ab59a06..4ea587e29de4d 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -302,10 +302,57 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
   };
 
   // These are not implemented by the plugin interface
-  if (PropName == OL_DEVICE_INFO_PLATFORM)
+  switch (PropName) {
+  case OL_DEVICE_INFO_PLATFORM:
     return Info.write<void *>(Device->Platform);
-  if (PropName == OL_DEVICE_INFO_TYPE)
+
+  case OL_DEVICE_INFO_TYPE:
     return Info.write<ol_device_type_t>(OL_DEVICE_TYPE_GPU);
+
+  case OL_DEVICE_INFO_SINGLE_FP_CONFIG:
+  case OL_DEVICE_INFO_DOUBLE_FP_CONFIG: {
+    ol_device_fp_capability_flags_t flags{0};
+    flags |= OL_DEVICE_FP_CAPABILITY_FLAG_CORRECTLY_ROUNDED_DIVIDE_SQRT |
+             OL_DEVICE_FP_CAPABILITY_FLAG_ROUND_TO_NEAREST |
+             OL_DEVICE_FP_CAPABILITY_FLAG_ROUND_TO_ZERO |
+             OL_DEVICE_FP_CAPABILITY_FLAG_ROUND_TO_INF |
+             OL_DEVICE_FP_CAPABILITY_FLAG_INF_NAN |
+             OL_DEVICE_FP_CAPABILITY_FLAG_DENORM |
+             OL_DEVICE_FP_CAPABILITY_FLAG_FMA;
+    return Info.write(flags);
+  }
+
+  case OL_DEVICE_INFO_HALF_FP_CONFIG:
+    return Info.write<ol_device_fp_capability_flags_t>(0);
+
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_CHAR:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_SHORT:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_INT:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_LONG:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_FLOAT:
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_DOUBLE:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_CHAR:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_SHORT:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_INT:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_LONG:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_FLOAT:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_DOUBLE:
+    return Info.write<uint32_t>(1);
+
+  case OL_DEVICE_INFO_PREFERRED_VECTOR_WIDTH_HALF:
+  case OL_DEVICE_INFO_NATIVE_VECTOR_WIDTH_HALF:
+    return Info.write<uint32_t>(0);
+
+  // None of the existing plugins specify a limit on a single allocation,
+  // so return the global memory size instead
+  case OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE:
+    PropName = OL_DEVICE_INFO_GLOBAL_MEM_SIZE;
+    break;
+
+  default:
+    break;
+  }
+
   if (PropName >= OL_DEVICE_INFO_LAST)
     return createOffloadError(ErrorCode::INVALID_ENUMERATION,
                               "getDeviceInfo enum '%i' is invalid", PropName);
@@ -316,6 +363,7 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
                      "plugin did not provide a response for this information");
   auto Entry = *EntryOpt;
 
+  // Retrieve properties from the plugin interface
   switch (PropName) {
   case OL_DEVICE_INFO_NAME:
   case OL_DEVICE_INFO_VENDOR:
@@ -327,7 +375,18 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
     return Info.writeString(std::get<std::string>(Entry->Value).c_str());
   }
 
-  case OL_DEVICE_INFO_MAX_WORK_GROUP_SIZE: {
+  case OL_DEVICE_INFO_GLOBAL_MEM_SIZE: {
+    // Uint64 values
+    if (!std::holds_alternative<uint64_t>(Entry->Value))
+      return makeError(ErrorCode::BACKEND_FAILURE,
+                       "plugin returned incorrect type");
+    return Info.write(std::get<uint64_t>(Entry->Value));
+  }
+
+  case OL_DEVICE_INFO_MAX_WORK_GROUP_SIZE:
+  case OL_DEVICE_INFO_VENDOR_ID:
+  case OL_DEVICE_INFO_NUM_COMPUTE_UNITS:
+  case OL_DEVICE_INFO_ADDRESS_BITS: {
     // Uint32 values
     if (!std::holds_alternative<uint64_t>(Entry->Value))
       return makeError(ErrorCode::BACKEND_FAILURE,
@@ -339,6 +398,29 @@ Error olGetDeviceInfoImplDetail(ol_device_handle_t Device,
     return Info.write(static_cast<uint32_t>(Value));
   }
 
+  case OL_DEVICE_INFO_MAX_CLOCK_FREQUENCY:
+  case OL_DEVICE_INFO_MEMORY_CLOCK_RATE: {
+    // Frequency might require unit conversion to MHz
+    if (!std::holds_alternative<uint64_t>(Entry->Value))
+      return makeError(ErrorCode::BACKEND_FAILURE,
+                       "plugin returned incorrect type");
+    auto Value = std::get<uint64_t>(Entry->Value);
+    StringRef Units = Entry->Units;
+    if (!Units.empty() && Units != "MHz") {
+      // Conversion required
+      if (Units == "Hz")
+        Value /= 1000000;
+      else if (Units == "kHz")
+        Value /= 1000;
+      else if (Units == "GHz")
+        Value *= 1000;
+    }
+    if (Value > std::numeric_limits<uint32_t>::max())
+      return makeError(ErrorCode::BACKEND_FAILURE,
+                       "plugin returned out of range device info");
+    return Info.write(static_cast<uint32_t>(Value));
+  }
+
   case OL_DEVICE_INFO_MAX_WORK_GROUP_SIZE_PER_DIMENSION: {
     // {x, y, z} triples
     ol_dimensions_t Out{0, 0, 0};
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
index 61f680bab3a07..ad135f72fff12 100644
--- a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.h
@@ -70,10 +70,16 @@ typedef enum {
   HSA_ISA_INFO_NAME = 1
 } hsa_isa_info_t;
 
+typedef enum {
+  HSA_MACHINE_MODEL_SMALL = 0,
+  HSA_MACHINE_MODEL_LARGE = 1
+} hsa_machine_model_t;
+
 typedef enum {
   HSA_AGENT_INFO_NAME = 0,
   HSA_AGENT_INFO_VENDOR_NAME = 1,
   HSA_AGENT_INFO_FEATURE = 2,
+  HSA_AGENT_INFO_MACHINE_MODEL = 3,
   HSA_AGENT_INFO_PROFILE = 4,
   HSA_AGENT_INFO_WAVEFRONT_SIZE = 6,
   HSA_AGENT_INFO_WORKGROUP_MAX_DIM = 7,
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
index 3117763e35896..29cfe78082dbb 100644
--- a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
@@ -67,6 +67,7 @@ typedef enum hsa_amd_agent_info_s {
   HSA_AMD_AGENT_INFO_CACHELINE_SIZE = 0xA001,
   HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT = 0xA002,
   HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY = 0xA003,
+  HSA_AMD_AGENT_INFO_MEMORY_MAX_FREQUENCY = 0xA008,
   HSA_AMD_AGENT_INFO_PRODUCT_NAME = 0xA009,
   HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU = 0xA00A,
   HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU = 0xA00B,
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 852c0e99b2266..20d8994f97a39 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -2643,6 +2643,15 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
     if (Status == HSA_STATUS_SUCCESS)
       Info.add("Vendor Name", TmpChar, "", DeviceInfo::VENDOR);
 
+    Info.add("Vendor ID", uint64_t{4130}, "", DeviceInfo::VENDOR_ID);
+
+    hsa_machine_model_t MachineModel;
+    Status = getDeviceAttrRaw(HSA_AGENT_INFO_MACHINE_MODEL, MachineModel);
+    if (Status == HSA_STATUS_SUCCESS)
+      Info.add("Memory Address Size",
+               uint64_t{MachineModel == HSA_MACHINE_MODEL_SMALL ? 32u : 64u},
+               "bits", DeviceInfo::ADDRESS_BITS);
+
     hsa_device_type_t DevType;
     Status = getDeviceAttrRaw(HSA_AGENT_INFO_DEVICE, DevType);
     if (Status == HSA_STATUS_SUCCESS) {
@@ -2693,11 +2702,17 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
 
     Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY, TmpUInt);
     if (Status == HSA_STATUS_SUCCESS)
-      Info.add("Max Clock Freq", TmpUInt, "MHz");
+      Info.add("Max Clock Freq", TmpUInt, "MHz",
+               DeviceInfo::MAX_CLOCK_FREQUENCY);
+
+    Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_MEMORY_MAX_FREQUENCY, TmpUInt);
+    if (Status == HSA_STATUS_SUCCESS)
+      Info.add("Max Memory Clock Freq", TmpUInt, "MHz",
+               DeviceInfo::MEMORY_CLOCK_RATE);
 
     Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, TmpUInt);
     if (Status == HSA_STATUS_SUCCESS)
-      Info.add("Compute Units", TmpUInt);
+      Info.add("Compute Units", TmpUInt, "", DeviceInfo::NUM_COMPUTE_UNITS);
 
     Status = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU, TmpUInt);
     if (Status == HSA_STATUS_SUCCESS)
@@ -2779,7 +2794,11 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
 
       Status = Pool->getAttrRaw(HSA_AMD_MEMORY_POOL_INFO_SIZE, TmpSt);
       if (Status == HSA_STATUS_SUCCESS)
-        PoolNode.add("Size", TmpSt, "bytes");
+        PoolNode.add(
+            "Size", TmpSt, "bytes",
+            (Pool->isGlobal() && Pool->isCoarseGrained())
+                ? std::optional<DeviceInfo>{DeviceInfo::GLOBAL_MEM_SIZE}
+                : std::nullopt);
 
       Status = Pool->getAttrRaw(HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED,
                                 TmpBool);
diff --git a/offload/plugins-nextgen/cuda/src/rtl.cpp b/offload/plugins-nextgen/cuda/src/rtl.cpp
index 7649fd9285bb5..18a006f0b3992 100644
--- a/offload/plugins-nextgen/cuda/src/rtl.cpp
+++ b/offload/plugins-nextgen/cuda/src/rtl.cpp
@@ -951,13 +951,20 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     Info.add("Vendor Name", "NVIDIA", "", DeviceInfo::VENDOR);
 
+    Info.add("Vendor ID", uint64_t{4318}, "", DeviceInfo::VENDOR_ID);
+
+    Info.add("Memory Address Size", std::numeric_limits<CUdeviceptr>::digits,
+             "bits", DeviceInfo::ADDRESS_BITS);
+
     Res = cuDeviceTotalMem(&TmpSt, Device);
     if (Res == CUDA_SUCCESS)
-      Info.add("Global Memory Size", TmpSt, "bytes");
+      Info.add("Global Memory Size", TmpSt, "bytes",
+               DeviceInfo::GLOBAL_MEM_SIZE);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, TmpInt);
     if (Res == CUDA_SUCCESS)
-      Info.add("Number of Multiprocessors", TmpInt);
+      Info.add("Number of Multiprocessors", TmpInt, "",
+               DeviceInfo::NUM_COMPUTE_UNITS);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_GPU_OVERLAP, TmpInt);
     if (Res == CUDA_SUCCESS)
@@ -1018,7 +1025,7 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_CLOCK_RATE, TmpInt);
     if (Res == CUDA_SUCCESS)
-      Info.add("Clock Rate", TmpInt, "kHz");
+      Info.add("Clock Rate", TmpInt, "kHz", DeviceInfo::MAX_CLOCK_FREQUENCY);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT, TmpInt);
     if (Res == CUDA_SUCCESS)
@@ -1055,7 +1062,8 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE, TmpInt);
     if (Res == CUDA_SUCCESS)
-      Info.add("Memory Clock Rate", TmpInt, "kHz");
+      Info.add("Memory Clock Rate", TmpInt, "kHz",
+               DeviceInfo::MEMORY_CLOCK_RATE);
 
     Res = getDeviceAttrRaw(CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH, TmpInt);
     if (Res == CUDA_SUCCESS)
diff --git a/offload/tools/offload-tblgen/APIGen.cpp b/offload/tools/offload-tblgen/APIGen.cpp
index 8c61d1f12de7a..ea62fa3cd9e91 100644
--- a/offload/tools/offload-tblgen/APIGen.cpp
+++ b/offload/tools/offload-tblgen/APIGen.cpp
@@ -131,17 +131,32 @@ static void ProcessEnum(const EnumRec &Enum, raw_ostream &OS) {
   OS << formatv("/// @brief {0}\n", Enum.getDesc());
   OS << formatv("typedef enum {0} {{\n", Enum.getName());
 
-  uint32_t EtorVal = 0;
+  uint32_t EtorVal = Enum.isBitField();
   for (const auto &EnumVal : Enum.getValues()) {
-    if (Enum.isTyped()) {
-      OS << MakeComment(
-          formatv("[{0}] {1}", EnumVal.getTaggedType(), EnumVal.getDesc())
-              .str());
-    } else {
-      OS << MakeComment(EnumVal.getDesc());
-    }
-    OS << formatv(TAB_1 "{0}_{1} = {2},\n", Enum.getEnumValNamePrefix(),
-                  EnumVal.getName(), EtorVal++);
+    size_t NumTemplateValues{EnumVal.getTemplateValues().size()};
+    size_t TemplateIndex{
+        NumTemplateValues > 0 ? 0 : std::numeric_limits<size_t>::max()};
+    do {
+      std::string Name{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateName(TemplateIndex)
+                           : EnumVal.getName()};
+      std::string Desc{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateDesc(TemplateIndex)
+                           : EnumVal.getDesc()};
+      if (Enum.isTyped()) {
+        OS << MakeComment(
+            formatv("[{0}] {1}", EnumVal.getTaggedType(), Desc).str());
+      } else {
+        OS << MakeComment(Desc);
+      }
+      OS << formatv(TAB_1 "{0}_{1} = {2},\n", Enum.getEnumValNamePrefix(), Name,
+                    EtorVal);
+      if (Enum.isBitField()) {
+        EtorVal <<= 1u;
+      } else {
+        ++EtorVal;
+      }
+    } while (++TemplateIndex < NumTemplateValues);
   }
 
   // Add last_element/force uint32 val
diff --git a/offload/tools/offload-tblgen/MiscGen.cpp b/offload/tools/offload-tblgen/MiscGen.cpp
index b90e5cfdec8b2..afd76ba34ae06 100644
--- a/offload/tools/offload-tblgen/MiscGen.cpp
+++ b/offload/tools/offload-tblgen/MiscGen.cpp
@@ -107,10 +107,27 @@ void EmitOffloadInfo(const RecordKeeper &Records, raw_ostream &OS) {
 
 )";
 
-  auto ErrorCodeEnum = EnumRec{Records.getDef("DeviceInfo")};
-  uint32_t EtorVal = 0;
-  for (const auto &EnumVal : ErrorCodeEnum.getValues()) {
-    OS << formatv(TAB_1 "OFFLOAD_DEVINFO({0}, \"{1}\", {2})\n",
-                  EnumVal.getName(), EnumVal.getDesc(), EtorVal++);
+  auto Enum = EnumRec{Records.getDef("DeviceInfo")};
+
+  uint32_t EtorVal = Enum.isBitField();
+  for (const auto &EnumVal : Enum.getValues()) {
+    size_t NumTemplateValues{EnumVal.getTemplateValues().size()};
+    size_t TemplateIndex{
+        NumTemplateValues > 0 ? 0 : std::numeric_limits<size_t>::max()};
+    do {
+      std::string Name{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateName(TemplateIndex)
+                           : EnumVal.getName()};
+      std::string Desc{TemplateIndex < std::numeric_limits<size_t>::max()
+                           ? EnumVal.getTemplateDesc(TemplateIndex)
+                           : EnumVal.getDesc()};
+      OS << formatv(TAB_1 "OFFLOAD_DEVINFO({0}, \"{1}\", {2})\n", Name, Desc,
+                    EtorVal);
+      if (Enum.isBitField()) {
+        EtorVal <<= 1u;
+      } else {
+        ++EtorVal;
+      }
+    } while (++TemplateIndex < NumTemplateValues);
   }
 }
diff --git a/offload/tools/offload-tblgen/PrintGen.cpp b/offload/tools/offload-tblgen/PrintGen.cpp
index 89d7c820426cf..1bfa388ada436 100644
--- a/offload/tools/offload-tblgen/PrintGen.cpp
+++ b/offload/tools/offload-tblgen/PrintGen.cpp
@@ -40,10 +40,18 @@ static void ProcessEnum(const EnumRec &Enum, raw_ostream &OS) {
                 Enum.getName());
 
   for (const auto &Val : Enum.getValues()) {
-    auto Name = Enum.getEnumValNamePrefix() + "_" + Val.getName();
-    OS << formatv(TAB_1 "case {0}:\n", Name);
-    OS << formatv(TAB_2 "os << \"{0}\";\n", Name);
-    OS << formatv(TAB_2 "break;\n");
+    size_t NumTemplateValues{Val.getTemplateValues().size()};
+    size_t TemplateIndex{
+        NumTemplateValues > 0 ? 0 : std::numeric_limits<size_t>::max()};
+    do {
+      auto Name = Enum.getEnumValNamePrefix() + "_" +
+                  (TemplateIndex < std::numeric_limits<size_t>::max()
+                       ? Val.getTemplateName(TemplateIndex)
+                       : Val.getName());
+      OS << formatv(TAB_1 "case {0}:\n", Name);
+      OS << formatv(TAB_2 "os << \"{0}\";\n", Name);
+      OS << formatv(TAB_2 "break;\n");
+    } while (++TemplateIndex < NumTemplateValues);
   }
 
   OS << TAB_1 "default:\n" TAB_2 "os << \"unknown enumerator\";\n" TAB_2
@@ -67,29 +75,37 @@ inline void printTagged(llvm::raw_ostream &os, const void *ptr, {0} value, size_
                 Enum.getName());
 
   for (const auto &Val : Enum.getValues()) {
-    auto Name = Enum.getEnumValNamePrefix() + "_" + Val.getName();
-    auto Type = Val.getTaggedType();
-    OS << formatv(TAB_1 "case {0}: {{\n", Name);
-    // Special case for strings
-    if (Type == "char[]") {
-      OS << formatv(TAB_2 "printPtr(os, (const char*) ptr);\n");
-    } else {
-      if (Type == "void *")
-        OS << formatv(TAB_2 "void * const * const tptr = (void * "
-                            "const * const)ptr;\n");
-      else
-        OS << formatv(
-            TAB_2 "const {0} * const tptr = (const {0} * co...
[truncated]

@RossBrunton
Copy link
Contributor

You'll also need to implement these in olGetDeviceInfoImplDetailHost (which is the imaginary host device and should probably use default or stub values).

@rafbiels rafbiels force-pushed the offload-device-info branch from 72d0264 to 2f5efcd Compare August 8, 2025 14:20
Copy link

github-actions bot commented Aug 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@rafbiels rafbiels force-pushed the offload-device-info branch from 2f5efcd to b391017 Compare August 8, 2025 16:00
@rafbiels rafbiels force-pushed the offload-device-info branch from b391017 to bb00b74 Compare August 12, 2025 14:57
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums
using `foreach` to reduce code duplication.

Use macros in unit test definitions to reduce code duplication.
@rafbiels rafbiels force-pushed the offload-device-info branch from bb00b74 to b53364b Compare August 13, 2025 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants