Skip to content

[MLIR][ROCDL] Implement math.ipowi #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10,000 commits into from
Closed

[MLIR][ROCDL] Implement math.ipowi #4

wants to merge 10,000 commits into from

Conversation

lialan
Copy link

@lialan lialan commented Jan 13, 2025

No description provided.

rnk and others added 30 commits January 9, 2025 11:21
…vm#122029)

Move the common case of FieldDecl::getFieldIndex() inline to mitigate
the cost of removing the extra `FieldNo` induction variable.

Also rename isNoUniqueAddress parameter to isNonVirtualBaseType, which
appears to be more accurate. I think the current name is just a
consequence of autocomplete gone wrong.
Need to sync the mask between cost and actual emission to avoid bugs in
mask calculation

Fixes llvm#122324
I’m seeing a series of errors when trying to run the cmake configure
step on macOS when the cmake generator is set to Xcode. All is well if I
use the Ninja or Unix Makefile generators. Messages are all of the form:
~~~
CMake Error at …llvm-project/clang/cmake/modules/AddClang.cmake:120
(target_compile_definitions):
  Cannot specify compile definitions for target "obj.clangBasic" which
  is not built by this project.
Call Stack (most recent call first):
  …llvm-project/clang/lib/Basic/CMakeLists.txt:57 (add_clang_library)
~~~
The remaining errors are similar but mention targets obj.clangAPINotes,
obj.clangLex, obj.clangParse, and so on.

The regression appears to have been introduced by commit 09fa2f0
(Oct 14 2024) which added the code in this area.

My proposed solution is simply to add a test to ensure that the obj.x
target exists before setting its compile definitions. There is precedent
doing just this in both clang/cmake/modules/AddClang.cmake and
clang/lib/support/CMakeLists.txt as well as in the “MSVC AND NOT
CLANG_LINK_CLANG_DYLIB” path immediately above the offending line.

I’ve also made a couple of grammatical tweaks in the comments
surrounding this code.

In case it's relevant, the cmake settings and definitions I've used to
trigger these errors is:
~~~bash
GENERATOR="Xcode"
OUTDIR=build_macos
cmake \
-S "$SCRIPT_DIR/llvm" \
-B "$SCRIPT_DIR/$OUTDIR" \
-G "$GENERATOR" \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_OSX_ARCHITECTURES=arm64 \
-D LLVM_PARALLEL_LINK_JOBS=1 \
-D LLVM_ENABLE_PROJECTS="clang;lld" \
-D LLVM_TARGETS_TO_BUILD=RISCV \
-D LLVM_DEFAULT_TARGET_TRIPLE=riscv32-unknown-elf \
-D LLVM_OPTIMIZED_TABLEGEN=Yes
~~~
(cmake v3.31.1, Xcode 16.1. I know that not all of these variables are
useful for the Xcode generator!)

Co-authored-by: Paul Bowen-Huggett <[email protected]>
…llvm#122332)

The SEW operand for these instructions should have a value of 0. This
matches what was done for vcpop/vfirst.
…2286)

Don't suggest to comment-out the parameter name if the parameter has an
attribute that's spelled after the parameter name.

This prevents the parameter's attributes from being wrongly applied to
the parameter's type.

This fixes llvm#122191.
…lvm#122190)

The GPU ID operations already implement InferIntRangeInterface, which
gives constant lower and upper bounds on those IDs when appropriate
metadata is prentent on the operations or in the surrounding context.

This commit uses that existing code to implement the
ValueBoundsOpInterface, which is used when analyzing affine operations
(unlike the integer range interface, which is used for arithmetic
optimization).

It also implements the interface for gpu.launch, where we can use it to
express the constraint that block/grid sizes are equal to their value
from outside the launch op and that the corresponding IDs are bounded
above by that size.

As a consequence, the test pass for this inference is updated to work on
a FunctionOpInterface and not a func.func, creating minor churn in other
tests.
)

With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
The test runs asynchronous kernels and depending on the timing the
output is slightly different. We now only check for the common parts of
the output.
Summary:
Previously we had some indirection here, this patch updates these
utilities to just be normal template functions. We use SFINAE to manage
the special case handling for floats. Also this strips address spaces so
it can be used more generally.
Summary:
Use a normal bitcast, remove from the shared utils since it's not
available in
GCC 7.4
…llvm#122354)

This disables the support added in PR121985 by default while we
investigate a compile time crash.
This adds a workflow for running HLSL tests on PRs that modify HLSL and
DirectX code.

The tests enabled here are the LLVM & Clang tests and the Offload
execution tests: https://github.com/llvm-beanz/offload-test-suite/
Pre-commit some tests in preparation to teach ValueTracking's
implied-cond about samesign.
…lvm#120327)

The `sycl_kernel_entry_point` attribute is used to declare a function that
defines a pattern for an offload kernel entry point. The attribute requires
a single type argument that specifies a class type that meets the requirements
for a SYCL kernel name as described in section 5.2, "Naming of kernels", of
the SYCL 2020 specification. A unique kernel name type is required for each
function declared with the attribute. The attribute may not first appear on a
declaration that follows a definition of the function. The function is
required to have a non-deduced `void` return type. The function must not be
a non-static member function, be deleted or defaulted, be declared with the
`constexpr` or `consteval` specifiers, be declared with the `[[noreturn]]`
attribute, be a coroutine, or accept variadic arguments.

Diagnostics are not yet provided for the following:
- Use of a type as a kernel name that does not satisfy the forward
  declarability requirements specified in section 5.2, "Naming of kernels",
  of the SYCL 2020 specification.
- Use of a type as a parameter of the attributed function that does not
  satisfy the kernel parameter requirements specified in section 4.12.4,
  "Rules for parameter passing to kernels", of the SYCL 2020 specification
  (each such function parameter constitutes a kernel parameter).
- Use of language features that are not permitted in device functions as
  specified in section 5.4, "Language restrictions for device functions",
  of the SYCL 2020 specification.

There are several issues noted by various FIXME comments.
- The diagnostic generated for kernel name conflicts needs additional work
  to better detail the relevant source locations; such as the location of
  each declaration as well as the original source of each kernel name.
- A number of the tests illustrate spurious errors being produced due to
  attributes that appertain to function templates being instantiated too
  early (during overload resolution as opposed to after an overload is
  selected).

Included changes allow the `SYCLKernelEntryPointAttr` attribute to be
marked as invalid if a `sycl_kernel_entry_point` attribute is used incorrectly.
This is intended to prevent trying to emit an offload kernel entry point
without having to mark the associated function as invalid since doing so
would affect overload resolution; which this attribute should not do.
Unfortunately, Clang eagerly instantiates attributes that appertain to
functions with the result that errors might be issued for function
declarations that are never selected by overload resolution. Tests have
been added to demonstrate this. Further work will be needed to address
these issues (for this and other attributes).
Summary:
This isn't used anymore, I moved the GPU extensions into `offload/`.
After llvm#120563 malloc_size also
needs intercepting on Apple platforms, otherwise all type-sanitized
binaries crash on startup with an objc error:
realized class 0x12345 has corrupt data pointer: malloc_size(0x567) = 0

PR: llvm#122133
llvm#122371)

…llvm#121991)"

This reverts commit f8f8598.

This breaks ARMv7 and s390x buildbot with the following message:
```
llvm-exegesis error: No available targets are compatible with triple "armv8l-unknown-linux-gnueabihf"
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/tcwg-buildbot/worker/clang-armv7-2stage/stage2/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-2stage/llvm/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
```
…m#120662)

The Clang tablegen built-in function prototype parser has the `__bf16`
type missing. This patch adds the missing type to the parser.
…pace declaration of a negative test. (llvm#122375)

Commit 1a73654 added a missing
diagnostic for incorrect placement of an attribute in a namespace
declaration. This change corrects a SYCL test that inadvertently
exercised the `sycl_kernel_entry_point` attribute in the wrong
declaration location.
These make cross compiling the test suite more difficult, as you need
the
sysroot to contain these headers and libraries cross compiled for your
target.
It's straightforward to stick with the corresponding C headers.
If the pointer to be checked is statically known to be zero, the tag
check will always pass since:
1) the tag is zero
2) shadow memory for address 0 is initialized to 0 and never updated.
We can therefore elide the tag check.

We perform the elision in two places:
1) the HWASan pass
2) when lowering the CHECK_MEMACCESS intrinsic. Conceivably, the HWASan
pass may encounter a "cannot currently statically prove to be null"
pointer (and is therefore unable to omit the intrinsic) that later
optimization passes convert into a statically known-null pointer. As a
last line of defense, we perform elision here too.

This also updates the tests from
llvm#122186
This patch improves the linker’s ability to estimate stub reachability
in the `TextOutputSection::estimateStubsInRangeVA` function. It does so
by including thunks that have already been placed ahead of the current
call site address when calculating the threshold for direct stub calls.

Before this fix, the estimation process overlooked existing forward
thunks. This could result in some thunks not being inserted where
needed. In rare situations, particularly with large and specially
arranged codebases, this might lead to branch instructions being out of
range, causing linking errors.

Although this patch successfully addresses the problem, it is not
feasible to create a test for this issue. The specific layout and order
of thunk creation required to reproduce the corner case are too complex,
making test creation impractical.

Example error messages the issue could generate:
```
ld64.lld: error: banana.o:(symbol OUTLINED_FUNCTION_24949_3875): relocation BRANCH26 is out of range: 134547892 is not in [-134217728, 134217727]; references objc_autoreleaseReturnValue
ld64.lld: error: main.o:(symbol _main+0xc): relocation BRANCH26 is out of range: 134544132 is not in [-134217728, 134217727]; references objc_release
```
pobrn and others added 11 commits January 12, 2025 10:07
…atic functions (llvm#119974)

Static member functions can be considered the same way as free functions
are, so do that.
Msan is not supported on Android as mentioned in google/sanitizers#1381.
We proactively give the warning saying it is unsupported to fix
android/ndk#1958.
…lvm#102299)

This checks that classes/structs inheriting from
``std::enable_shared_from_this`` does so with public inheritance, so it
prevents crashes due to ``std::make_shared`` and ``shared_from_this()``
getting called when the internal weak pointer was not initialized (e.g.
due to private inheritance).
…2634)

Certain non-standard float types were directly passed through in the
LLVM type converter, resulting in invalid IR or failed assertions:

```
mlir-opt: mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp:638: FailureOr<Type> mlir::LLVMTypeConverter::convertVectorType(VectorType) const: Assertion `LLVM::isCompatibleVectorType(vectorType) && "expected vector type compatible with the LLVM dialect"' failed.
```

The LLVM type converter should not define invalid type conversion rules
for such types. If there is no type conversion rule, conversion patterns
will not apply to ops with such operand types.
…cl) appears in the trailing return type of the lambda (llvm#122611)

The (function) type of the lambda function is null while parsing
trailing return type. The type is filled-in when the lambda body is
entered. So, resolving `__PRETTY_FUNCTION__` before the lambda body is
entered causes the crash.

Fixes llvm#121274.
Option allows using full LTO when linking bitcode files compiled with
unified LTO pipeline.
There is a narrow special-case in isImpliedCondICmps that can benefit
from being taught about samesign. Since it costs us nothing to implement
it, teach it about samesign, for completeness. This patch marks the
completion of the effort to teach ValueTracking about samesign.
@lialan lialan closed this Jan 13, 2025
@lialan lialan deleted the lialan/rocdl_lib branch January 14, 2025 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.