Skip to content

[X86][AVX10.2] Decouple AMX-AVX512 from AVX10.2 #148633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions clang/include/clang/Basic/BuiltinsX86_64.td
Original file line number Diff line number Diff line change
Expand Up @@ -290,7 +290,7 @@ let Features = "amx-complex,amx-transpose", Attributes = [NoThrow] in {
def tconjtfp16_internal : X86Builtin<"_Vector<256, int>(unsigned short, unsigned short, _Vector<256, int>)">;
}

let Features = "amx-avx512,avx10.2-512", Attributes = [NoThrow] in {
let Features = "amx-avx512,avx512f,evex512,avx512bf16", Attributes = [NoThrow] in {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since avx512bf16 implies avx512f, we can omit avx512f here and after.

def tcvtrowd2ps_internal : X86Builtin<"_Vector<16, float>(unsigned short, unsigned short, _Vector<256, int>, unsigned int)">;
def tcvtrowps2bf16h_internal : X86Builtin<"_Vector<32, __bf16>(unsigned short, unsigned short, _Vector<256, int>, unsigned int)">;
def tcvtrowps2bf16l_internal : X86Builtin<"_Vector<32, __bf16>(unsigned short, unsigned short, _Vector<256, int>, unsigned int)">;
Expand Down Expand Up @@ -382,7 +382,7 @@ let Features = "amx-complex,amx-transpose", Attributes = [NoThrow] in {
def tconjtfp16 : X86Builtin<"void(_Constant unsigned char, _Constant unsigned char)">;
}

let Features = "amx-avx512,avx10.2-512", Attributes = [NoThrow] in {
let Features = "amx-avx512,avx512f,evex512,avx512bf16", Attributes = [NoThrow] in {
def tcvtrowd2ps : X86Builtin<"_Vector<16, float>(_Constant unsigned char, unsigned int)">;
def tcvtrowps2bf16h : X86Builtin<"_Vector<32, __bf16>(_Constant unsigned char, unsigned int)">;
def tcvtrowps2bf16l : X86Builtin<"_Vector<32, __bf16>(_Constant unsigned char, unsigned int)">;
Expand Down
2 changes: 1 addition & 1 deletion clang/lib/Headers/amxavx512intrin.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@

#define __DEFAULT_FN_ATTRS_AVX512 \
__attribute__((__always_inline__, __nodebug__, \
__target__("amx-avx512,avx10.2-512")))
__target__("amx-avx512,avx512f,evex512,avx512bf16")))

/// Moves a row from a tile register to a zmm destination register, converting
/// the int32 source elements to fp32. The row of the tile is selected by a
Expand Down
4 changes: 2 additions & 2 deletions clang/test/CodeGen/X86/amx_avx512_api.c
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
// RUN: %clang_cc1 %s -flax-vector-conversions=none -ffreestanding -triple=x86_64-unknown-unknown \
// RUN: -target-feature +amx-avx512 -target-feature +avx10.2-512 \
// RUN: -emit-llvm -o - -Werror -pedantic | FileCheck %s --check-prefixes=CHECK
// RUN: -target-feature +amx-avx512 -emit-llvm -o - -Werror -pedantic | \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why missing avx512f...?

// RUN: FileCheck %s --check-prefixes=CHECK

#include <immintrin.h>

Expand Down
2 changes: 1 addition & 1 deletion clang/test/CodeGen/X86/amxavx512-builtins.c
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +amx-tile -target-feature +amx-avx512 \
// RUN: -target-feature +avx10.2-512 -emit-llvm -o - -Wall -Werror -pedantic -Wno-gnu-statement-expression -flax-vector-conversions=none | FileCheck %s
// RUN: -emit-llvm -o - -Wall -Werror -pedantic -Wno-gnu-statement-expression -flax-vector-conversions=none | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird we request avx512f etc. features but not compile fail without them.


#include <immintrin.h>
#include <stddef.h>
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/Target/X86/X86.td
Original file line number Diff line number Diff line change
Expand Up @@ -277,7 +277,8 @@ def FeatureAMXTRANSPOSE : SubtargetFeature<"amx-transpose", "HasAMXTRANSPOSE", "
def FeatureAMXAVX512 : SubtargetFeature<"amx-avx512",
"HasAMXAVX512", "true",
"Support AMX-AVX512 instructions",
[FeatureAMXTILE]>;
[FeatureAMXTILE, FeatureAVX512,
FeatureEVEX512, FeatureBF16]>;
Comment on lines +280 to +281
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see the reason. We cannot add them here.

def FeatureAMXTF32 : SubtargetFeature<"amx-tf32", "HasAMXTF32", "true",
"Support AMX-TF32 instructions",
[FeatureAMXTILE]>;
Expand Down
52 changes: 28 additions & 24 deletions llvm/lib/Target/X86/X86InstrAMX.td
Original file line number Diff line number Diff line change
Expand Up @@ -550,7 +550,7 @@ let Predicates = [HasAMXMOVRS, In64BitMode], SchedRW = [WriteSystem] in {
} // HasAMXMOVRS, In64BitMode

multiclass m_tcvtrowd2ps {
let Predicates = [HasAMXAVX512, HasAVX10_2_512, In64BitMode] in {
let Predicates = [HasAMXAVX512, HasAVX512, HasEVEX512, In64BitMode] in {
let SchedRW = [WriteSystem] in {
def rri : Ii8<0x7, MRMSrcReg, (outs VR512:$dst),
(ins TILE:$src1, i32u8imm:$src2),
Expand All @@ -561,12 +561,12 @@ multiclass m_tcvtrowd2ps {
"tcvtrowd2ps\t{$src2, $src1, $dst|$dst, $src1, $src2}",
[]>, T8,XS, EVEX, VVVV, EVEX_V512;
}
} // HasAMXAVX512, HasAVX10_2_512, In64BitMode
} // HasAMXAVX512, HasAVX512, HasEVEX512, In64BitMode
}

defm TCVTROWD2PS : m_tcvtrowd2ps;

let Predicates = [HasAMXAVX512, HasAVX10_2_512, In64BitMode] in {
let Predicates = [HasAMXAVX512, HasAVX512, HasEVEX512, HasBF16, In64BitMode] in {
let SchedRW = [WriteSystem] in {
let usesCustomInserter = 1 in {
def PTCVTROWD2PSrri : PseudoI<(outs VR512:$dst), (ins u8imm:$src1, i32u8imm:$src2),
Expand Down Expand Up @@ -629,25 +629,29 @@ let Predicates = [HasAMXAVX512, HasAVX10_2_512, In64BitMode] in {
}

multiclass AMXAVX512_BASE<bits<8> Opcode1, bits<8> Opcode2, string Opstr,
Prefix P1, Prefix P2> {
let Predicates = [HasAMXAVX512, HasAVX10_2_512, In64BitMode], SchedRW = [WriteSystem] in {
let OpPrefix = P1 in
def rre : I<Opcode1, MRMSrcReg4VOp3, (outs VR512:$dst),
(ins TILE:$src1, GR32:$src2),
!strconcat(Opstr, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
[]>, EVEX, VVVV, EVEX_V512, T8;
let OpPrefix = P2 in
def rri : Ii8<Opcode2, MRMSrcReg, (outs VR512:$dst),
(ins TILE:$src1, i32u8imm:$src2),
Prefix P1, Prefix P2> {
let Predicates = [HasAMXAVX512, HasAVX512, HasEVEX512, HasBF16, In64BitMode] in {
let SchedRW = [WriteSystem] in {
let OpPrefix = P1 in
def rre : I<Opcode1, MRMSrcReg4VOp3, (outs VR512:$dst),
(ins TILE:$src1, GR32:$src2),
!strconcat(Opstr, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
[]>, EVEX, EVEX_V512, TA;
let usesCustomInserter = 1 in {
def "P"#NAME#"rre" : PseudoI<(outs VR512:$dst), (ins u8imm:$src1, GR32:$src2),
[(set VR512:$dst,
(!cast<Intrinsic>("int_x86_"#Opstr) timm:$src1, GR32:$src2))]>;
def "P"#NAME#"rri" : PseudoI<(outs VR512:$dst), (ins u8imm:$src1, i32u8imm:$src2),
[(set VR512:$dst,
(!cast<Intrinsic>("int_x86_"#Opstr) timm:$src1, imm:$src2))]>;
[]>, EVEX, VVVV, EVEX_V512, T8;
let OpPrefix = P2 in
def rri : Ii8<Opcode2, MRMSrcReg, (outs VR512:$dst),
(ins TILE:$src1, i32u8imm:$src2),
!strconcat(Opstr, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
[]>, EVEX, EVEX_V512, TA;
let usesCustomInserter = 1 in {
def "P"#NAME#"rre" : PseudoI<(outs VR512:$dst), (ins u8imm:$src1, GR32:$src2),
[(set VR512:$dst,
(!cast<Intrinsic>("int_x86_"#Opstr) timm:$src1,
GR32:$src2))]>;
def "P"#NAME#"rri" : PseudoI<(outs VR512:$dst), (ins u8imm:$src1, i32u8imm:$src2),
[(set VR512:$dst,
(!cast<Intrinsic>("int_x86_"#Opstr) timm:$src1,
imm:$src2))]>;
}
}
}
}
Expand All @@ -658,7 +662,7 @@ defm TCVTROWPS2BF16H : AMXAVX512_BASE<0x6d, 0x07, "tcvtrowps2bf16h", XD, XD>;
defm TCVTROWPS2BF16L : AMXAVX512_BASE<0x6d, 0x77, "tcvtrowps2bf16l", XS, XS>;

multiclass m_tilemovrow {
let Predicates = [HasAMXAVX512, HasAVX10_2_512, In64BitMode] in {
let Predicates = [HasAMXAVX512, HasAVX512, HasEVEX512, In64BitMode] in {
let SchedRW = [WriteSystem] in {
def rri : Ii8<0x7, MRMSrcReg, (outs VR512:$dst),
(ins TILE:$src1, u8imm:$src2),
Expand All @@ -669,12 +673,12 @@ multiclass m_tilemovrow {
"tilemovrow\t{$src2, $src1, $dst|$dst, $src1, $src2}",
[]>, T8,PD, EVEX, VVVV, EVEX_V512;
}
} // HasAMXAVX512, HasAVX10_2_512, In64BitMode
} // HasAMXAVX512, HasAVX512, HasEVEX512, In64BitMode
}

defm TILEMOVROW : m_tilemovrow;

let Predicates = [HasAMXAVX512, HasAVX10_2_512, In64BitMode] in {
let Predicates = [HasAMXAVX512, HasAVX512, HasEVEX512, In64BitMode] in {
let SchedRW = [WriteSystem] in {
let usesCustomInserter = 1 in {
def PTILEMOVROWrri : PseudoI<(outs VR512:$dst), (ins u8imm:$src1, i32u8imm:$src2),
Expand Down
2 changes: 1 addition & 1 deletion llvm/lib/TargetParser/X86TargetParser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -616,7 +616,7 @@ constexpr FeatureBitset ImpliedFeaturesAMX_FP8 = FeatureAMX_TILE;
constexpr FeatureBitset ImpliedFeaturesAMX_TRANSPOSE = FeatureAMX_TILE;
constexpr FeatureBitset ImpliedFeaturesAMX_MOVRS = FeatureAMX_TILE;
constexpr FeatureBitset ImpliedFeaturesAMX_AVX512 =
FeatureAMX_TILE | FeatureAVX10_2_512;
FeatureAMX_TILE | FeatureAVX512F | FeatureEVEX512 | FeatureAVX512BF16;
constexpr FeatureBitset ImpliedFeaturesAMX_TF32 = FeatureAMX_TILE;
constexpr FeatureBitset ImpliedFeaturesHRESET = {};

Expand Down
8 changes: 4 additions & 4 deletions llvm/test/CodeGen/X86/amx-across-func-tilemovrow.ll
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -mattr=+avx10.2-512 -mattr=+amx-avx512 -verify-machineinstrs | FileCheck %s
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -mattr=+avx10.2-512 -mattr=+amx-avx512 -verify-machineinstrs -enable-ipra | FileCheck -check-prefix=IPRA %s
; RUN: llc < %s -O0 -mtriple=x86_64-unknown-unknown -mattr=+amx-int8 -mattr=+avx10.2-512 -mattr=+amx-avx512 -verify-machineinstrs | FileCheck -check-prefix=O0 %s
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-int8,+amx-avx512 -verify-machineinstrs | FileCheck %s
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-int8,+amx-avx512 -verify-machineinstrs -enable-ipra | FileCheck -check-prefix=IPRA %s
; RUN: llc < %s -O0 -mtriple=x86_64-unknown-unknown -mattr=+amx-int8,+amx-avx512 -verify-machineinstrs | FileCheck -check-prefix=O0 %s

@buf = dso_local global [3072 x i8] zeroinitializer, align 64

Expand Down Expand Up @@ -95,7 +95,7 @@ define dso_local <16 x i32> @test_api(i16 signext %0, i16 signext %1) nounwind {
; O0-NEXT: movq %rsp, %rbp
; O0-NEXT: andq $-1024, %rsp # imm = 0xFC00
; O0-NEXT: subq $4096, %rsp # imm = 0x1000
; O0-NEXT: vpxor %xmm0, %xmm0, %xmm0
; O0-NEXT: vxorps %xmm0, %xmm0, %xmm0
; O0-NEXT: vmovups %zmm0, {{[0-9]+}}(%rsp)
; O0-NEXT: movb $1, {{[0-9]+}}(%rsp)
; O0-NEXT: movw %si, %cx
Expand Down
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/X86/amx-avx512-intrinsics.ll
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+amx-tile,+amx-avx512,+avx10.2-512 | FileCheck %s
; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+amx-tile,+amx-avx512 | FileCheck %s

define <16 x float> @test_tcvtrowd2ps(i32 %A) {
; CHECK-LABEL: test_tcvtrowd2ps:
Expand Down
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/X86/amx-tile-avx512-internals.ll
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-tile,+amx-bf16,+avx10.2-512, \
; RUN: -mattr=+amx-avx512 -verify-machineinstrs | FileCheck %s
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+amx-tile,+amx-bf16,+amx-avx512 \
; RUN: -verify-machineinstrs | FileCheck %s

define void @test_amx(i8* %pointer, i8* %base, i32 %index, i64 %stride) {
; CHECK-LABEL: test_amx:
Expand Down