WIP vectorization for UTF16->UTF8 #83073

Draft: wants to merge 9 commits into base: main

Conversation

Catfish-Man
Contributor

No description provided.

@Catfish-Man Catfish-Man self-assigned this Jul 15, 2025
#endif
let mask = Word(truncatingIfNeeded: 0x80808080_80808080 as UInt64)

#if (arch(arm64) || arch(arm64_32))// && SWIFT_STDLIB_ENABLE_VECTOR_TYPES
Contributor Author

We should do an x86 version of this, but it was giving me weird errors and I wanted to get the core thing working before dealing with them. TBH, even without a vectorized path, x86 is probably still faster than it was before this change.
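
The `mask` in the diff context above tests the high bit of every byte in a machine word, which is the usual word-at-a-time ASCII check for 8-bit code units. As a minimal sketch of the same idea applied to 16-bit code units, a portable version might look like the following. This is not the PR's code: `allASCII` and the `0xFF80` per-unit mask are illustrative assumptions.

```swift
// Hypothetical helper, not taken from the PR: tests four UTF-16 code units per step.
func allASCII(_ units: [UInt16]) -> Bool {
    // A UTF-16 code unit is ASCII when none of its bits above the low seven are set.
    let nonASCIIMask: UInt64 = 0xFF80_FF80_FF80_FF80
    var i = 0
    // Pack four code units into one 64-bit word and test them together.
    while i + 4 <= units.count {
        let low = UInt64(units[i]) | (UInt64(units[i + 1]) << 16)
        let high = (UInt64(units[i + 2]) << 32) | (UInt64(units[i + 3]) << 48)
        if (low | high) & nonASCIIMask != 0 { return false }
        i += 4
    }
    // Check the remaining zero to three code units one at a time.
    while i < units.count {
        if units[i] & 0xFF80 != 0 { return false }
        i += 1
    }
    return true
}
```

Because it uses only integer operations, a check like this needs no `#if arch(...)` gate, at the cost of handling fewer code units per step than a real vector register would.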

} else {
isASCII = false
var tmp: (
UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32

Making a temporary buffer here is sort of awful, and I want to improve it at some point, but it's not really hurting anything and it simplifies the rest of the code a lot.
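
For readers unfamiliar with the idiom: a homogeneous tuple like the one in the diff context above is a common way to get a fixed-size scratch buffer in Swift without a heap allocation. A minimal sketch follows. It is not the PR's code; `Scratch` and `widenThroughScratch` are hypothetical names, and it relies on a homogeneous tuple being laid out contiguously in memory.

```swift
// Hypothetical names; this only illustrates the tuple-as-scratch-buffer idiom.
typealias Scratch = (UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32)

func widenThroughScratch(_ units: [UInt16]) -> [UInt32] {
    precondition(units.count <= 8, "the scratch buffer holds eight elements")
    var tmp: Scratch = (0, 0, 0, 0, 0, 0, 0, 0)
    return withUnsafeMutableBytes(of: &tmp) { raw -> [UInt32] in
        // View the tuple's storage as eight contiguous UInt32 values.
        let scratch = raw.bindMemory(to: UInt32.self)
        for (i, unit) in units.enumerated() {
            scratch[i] = UInt32(unit)
        }
        // Copy out only the elements that were written.
        return Array(scratch[..<units.count])
    }
}
```

The trade-off the comment describes shows up here too: the buffer costs an extra copy, but everything after it can treat the data as a plain indexable buffer.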

@Catfish-Man

@swift-ci please test

@Catfish-Man

@swift-ci please benchmark

@Catfish-Man

@swift-ci please Apple Silicon benchmark

@Catfish-Man

------- Performance (arm64): -Osize -------
REGRESSION                               OLD        NEW        DELTA       RATIO    
NSString.bridged.byteCount.ascii.utf8    0.0        0.339      +33900.0%   **0.00x (?)**
Calculator                               115.5      128.056    +10.9%      **0.90x**
Chars2                                   2731.25    2982.432   +9.2%       **0.92x**

IMPROVEMENT                              OLD        NEW        DELTA       RATIO    
UTF16Decode.initDecoding                 69.2       4.077      -94.1%      **16.97x**
UTF16Decode.initFromCustom.cont          251.375    21.8       -91.3%      **11.53x**
ArrayAppendGenericStructs                1134.545   870.0      -23.3%      **1.30x (?)**
Array.removeAll.keepingCapacity.Object   2.522      2.238      -11.3%      **1.13x (?)**
InsertCharacterEndIndex                  58.865     54.923     -6.7%       **1.07x**

I'll take that. (The NSString.bridged.byteCount.ascii.utf8 result is noise: after earlier speedups, that benchmark runs too fast to measure reliably.)
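
A note on reading the tables: the columns appear to be related by DELTA = (NEW - OLD) / OLD and RATIO = OLD / NEW, which matches the rows above. A small sketch under that assumption (the function names are hypothetical and are not part of the benchmark tooling):

```swift
// Assumed relationships between the benchmark columns; names are hypothetical.
func deltaPercent(old: Double, new: Double) -> Double {
    // Relative change of NEW against OLD in percent; negative means faster.
    return (new - old) / old * 100
}

func speedRatio(old: Double, new: Double) -> Double {
    // How many times faster NEW is than OLD; values below 1 are regressions.
    return old / new
}

// The UTF16Decode.initDecoding row from the table above.
let d = deltaPercent(old: 69.2, new: 4.077)   // about -94.1
let r = speedRatio(old: 69.2, new: 4.077)     // about 16.97
```

Under the same assumption, a +33900.0% delta with NEW at 0.339 implies an unrounded OLD of roughly 0.001, which is displayed as 0.0. A baseline that small makes the ratio very sensitive to measurement jitter.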

@Catfish-Man

Some of those failures do look real, so this'll stay as a draft for now

@Catfish-Man

Somehow the x86 results look better despite not using the hand-vectorized path? I guess I should try the fallback path on arm64 and see if it does OK there 😂

IMPROVEMENT                                   OLD         NEW         DELTA    RATIO
UTF16Decode.initDecoding                      176.167     6.55        -96.3%   **26.89x**
UTF16Decode.initFromCustom.cont               475.5       37.615      -92.1%   **12.64x**
Breadcrumbs.CopyAllUTF16CodeUnits.longMixed   223.364     160.133     -28.3%   **1.39x**
Breadcrumbs.CopyAllUTF16CodeUnits.Mixed       226.2       162.733     -28.1%   **1.39x**
Breadcrumbs.CopyUTF16CodeUnits.longMixed      229.6       165.643     -27.9%   **1.39x**

@Catfish-Man

@swift-ci please test

@Catfish-Man

@swift-ci please Apple Silicon benchmark

@Catfish-Man

@swift-ci please benchmark

@Catfish-Man

Catfish-Man commented Jul 16, 2025

Turns out not accidentally processing twice as much data improves the speedup!

------- Performance (arm64): -Osize -------

REGRESSION                        OLD       NEW       DELTA    RATIO    
MapReduceClass2                   59.048    64.658    +9.5%    **0.91x**
MapReduceClassShort2              91.654    100.0     +9.1%    **0.92x (?)**

IMPROVEMENT                       OLD       NEW       DELTA    RATIO    
UTF16Decode.initDecoding          72.619    2.239     -96.9%   **32.42x**
UTF16Decode.initFromCustom.cont   252.375   22.0      -91.3%   **11.47x**
BufferFillFromSlice               11.326    10.068    -11.1%   **1.12x (?)**
ArrayAppendToGeneric              179.5     165.496   -7.8%    **1.08x (?)**
String.replaceSubrange.String     6.076     5.615     -7.6%    **1.08x**
InsertCharacterEndIndex           60.472    56.405    -6.7%    **1.07x**

@Catfish-Man

@swift-ci please Apple Silicon benchmark

@Catfish-Man

@swift-ci please benchmark

@Catfish-Man

@swift-ci please Apple Silicon benchmark

@Catfish-Man

Catfish-Man commented Jul 19, 2025

IMPROVEMENT                                  OLD         NEW         DELTA       RATIO    
UTF16Decode.initDecoding                     69.524      2.135       -96.9%      **32.55x**
UTF16Decode.initFromCustom.cont              252.0       21.648      -91.4%      **11.64x**
Calculator                                   128.056     115.55      -9.8%       **1.11x**
StringHasPrefixUnicode                       24014.085   21890.411   -8.8%       **1.10x**

Just as good as before, so I think that means I get to delete all the architecture-specific bits of the patch :)

@Catfish-Man

@swift-ci please benchmark

@Catfish-Man

@swift-ci please Apple Silicon benchmark
