Skip to content

SILOptimizer: two optimization improvements for address-only enums #34115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 30, 2020

Conversation

eeckstein
Copy link
Contributor

This PR contains 2 commits:

  • SILCombine: make the alloc_stack-enum optimization a bit more tolerant:
    When eliminating an enum from an alloc_stack, accept "dead" inject_enum_addr instructions, which inject different enum cases.
  • SimplifyCFG: allow jump-threading for switch_enum_data_addr instructions:
    If the branch-block injects a certain enum case and the destination switches on that enum, it's worth jump threading. E.g.
  inject_enum_addr %enum : $*Optional<T>, #Optional.some
  ... // no memory writes here
  br bb1
bb1:
  ... // no memory writes here
  switch_enum_addr %enum : $*Optional<T>, case #Optional.some ...

Both improvements enable removing all usages of optionals in loops, which iterate over arrays of address-only elements, e.g.

  func test<T>(_ items: [T]) {
    for i in items {
      print(i)
    }
  }

So this change is mainly an improvement for non-specialized generic code.

@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein
Copy link
Contributor Author

@swift-ci benchmark

@eeckstein eeckstein requested a review from atrick September 29, 2020 15:41
@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
Data.append.Sequence.64kB.Count.RE.I 21 44 +109.5% 0.48x
Data.append.Sequence.64kB.Count.RE 21 44 +109.5% 0.48x
DataAppendSequence 6000 8900 +48.3% 0.67x
Data.append.Sequence.809B.Count.RE 61 89 +45.9% 0.69x
Data.append.Sequence.809B.Count.RE.I 61 88 +44.3% 0.69x
AngryPhonebook.ASCII2 109 142 +30.3% 0.77x
StringHasPrefixAscii 1040 1180 +13.5% 0.88x (?)
String.replaceSubrange.ArrChar.Small 36 39 +8.3% 0.92x (?)
Set.isSuperset.Seq.Int.Empty 87 94 +8.0% 0.93x (?)
Set.isDisjoint.Seq.Empty.Box 87 94 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObserverForwarderStruct 535 320 -40.2% 1.67x (?)
IterateData 900 803 -10.8% 1.12x (?)
Set.isDisjoint.Box.Empty 103 93 -9.7% 1.11x (?)
DictionaryOfAnyHashableStrings_insert 5068 4620 -8.8% 1.10x (?)
Set.isDisjoint.Seq.Box.Empty 91 83 -8.8% 1.10x (?)
Data.init.Sequence.2049B.Count.I 59 54 -8.5% 1.09x (?)
AngryPhonebook.Strasse 129 119 -7.8% 1.08x (?)
DataSetCountSmall 91 84 -7.7% 1.08x (?)
Set.isStrictSubset.Int.Empty 54 50 -7.4% 1.08x (?)
Set.isStrictSuperset.Seq.Empty.Int 180 167 -7.2% 1.08x (?)
Data.init.Sequence.809B.Count 58 54 -6.9% 1.07x (?)
Data.init.Sequence.809B.Count.I 58 54 -6.9% 1.07x (?)
NormalizedIterator_fastPrenormal 750 700 -6.7% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
StaticArray.o 11993 12433 +3.7% 0.96x
 
Improvement OLD NEW DELTA RATIO
ObserverForwarderStruct.o 2950 2522 -14.5% 1.17x
DictionaryLiteral.o 1262 1210 -4.1% 1.04x
ObserverUnappliedMethod.o 5297 5113 -3.5% 1.04x
ProtocolConformance.o 3983 3887 -2.4% 1.02x
DictionaryKeysContains.o 8739 8583 -1.8% 1.02x
NibbleSort.o 13344 13145 -1.5% 1.02x
SortIntPyramids.o 9164 9052 -1.2% 1.01x
ObjectiveCBridging.o 59541 58829 -1.2% 1.01x
DictionaryOfAnyHashableStrings.o 8080 7984 -1.2% 1.01x
StringRemoveDupes.o 5469 5405 -1.2% 1.01x
TwoSum.o 4184 4136 -1.1% 1.01x
DictTest2.o 10365 10253 -1.1% 1.01x
DictionaryBridgeToObjC.o 5057 5003 -1.1% 1.01x

Performance: -Osize

Regression OLD NEW DELTA RATIO
Data.append.Sequence.64kB.Count 27 44 +63.0% 0.61x
Data.init.Sequence.64kB.Count.I 27 43 +59.3% 0.63x
Data.init.Sequence.64kB.Count 27 43 +59.3% 0.63x
Data.append.Sequence.64kB.Count.I 28 44 +57.1% 0.64x
Data.init.Sequence.2047B.Count.I 53 77 +45.3% 0.69x
Data.init.Sequence.2049B.Count.I 53 77 +45.3% 0.69x
Data.init.Sequence.809B.Count.I 52 72 +38.5% 0.72x
Data.init.Sequence.809B.Count 53 72 +35.8% 0.74x
Data.append.Sequence.809B.Count.I 65 85 +30.8% 0.76x
AngryPhonebook.ASCII2 109 142 +30.3% 0.77x
Data.append.Sequence.809B.Count 66 85 +28.8% 0.78x
Data.init.Sequence.513B.Count.I 66 84 +27.3% 0.79x
Data.init.Sequence.511B.Count.I 66 84 +27.3% 0.79x
DataCountSmall 15 19 +26.7% 0.79x
DataCountMedium 15 17 +13.3% 0.88x (?)
String.replaceSubrange.Substring.Small 38 42 +10.5% 0.90x (?)
Set.isSubset.Seq.Empty.Int 80 88 +10.0% 0.91x
String.replaceSubrange.String.Small 31 34 +9.7% 0.91x (?)
String.replaceSubrange.ArrChar.Small 35 38 +8.6% 0.92x (?)
ProtocolDispatch 221 239 +8.1% 0.92x (?)
Set.isDisjoint.Empty.Int 86 93 +8.1% 0.92x (?)
Set.isDisjoint.Seq.Empty.Box 87 94 +8.0% 0.93x
Set.isDisjoint.Empty.Box 88 95 +8.0% 0.93x (?)
Set.isSuperset.Seq.Int.Empty 88 95 +8.0% 0.93x (?)
PrefixWhileAnySequenceLazy 1234 1327 +7.5% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DictionaryLiteral 4760 2890 -39.3% 1.65x
DataCreateMedium 6300 4100 -34.9% 1.54x
Set.isStrictSubset.Int.Empty 53 47 -11.3% 1.13x (?)
Set.isSubset.Seq.Int.Empty 130 117 -10.0% 1.11x (?)
Set.subtracting.Empty.Box 10 9 -10.0% 1.11x (?)
Set.isStrictSubset.Seq.Int.Empty 132 119 -9.8% 1.11x (?)
DataAccessBytesMedium 64 58 -9.4% 1.10x (?)
ObjectiveCBridgeStringIsEqualAllSwift 66 60 -9.1% 1.10x (?)
Set.isStrictSuperset.Seq.Empty.Int 181 165 -8.8% 1.10x (?)
NibbleSort 2740 2510 -8.4% 1.09x (?)
DictionaryOfAnyHashableStrings_insert 5180 4774 -7.8% 1.09x (?)
AngryPhonebook.Strasse 129 119 -7.8% 1.08x (?)
Set.isSubset.Int.Empty 53 49 -7.5% 1.08x (?)
ObjectiveCBridgeStubFromNSString 589 548 -7.0% 1.07x (?)
DictionaryOfAnyHashableStrings_lookup 3600 3360 -6.7% 1.07x (?)

Code size: -Osize

Improvement OLD NEW DELTA RATIO
StaticArray.o 11118 10898 -2.0% 1.02x
DictionaryKeysContains.o 7858 7705 -1.9% 1.02x
ProtocolConformance.o 3650 3587 -1.7% 1.02x
ObserverForwarderStruct.o 2847 2809 -1.3% 1.01x
ObserverUnappliedMethod.o 5129 5077 -1.0% 1.01x

Performance: -Onone

Regression OLD NEW DELTA RATIO
AngryPhonebook.ASCII2 110 144 +30.9% 0.76x
 
Improvement OLD NEW DELTA RATIO
Set.isStrictSubset.Empty.Int 565 417 -26.2% 1.35x
DictionaryLiteral 9860 7660 -22.3% 1.29x
Set.isStrictSuperset.Seq.Empty.Int 1344 1144 -14.9% 1.17x
ArrayAppendAsciiSubstring 49500 42264 -14.6% 1.17x (?)
ArrayAppendUTF16Substring 50616 43272 -14.5% 1.17x
ArrayAppendLatin1Substring 50040 43236 -13.6% 1.16x (?)
MapReduceLazySequence 19020 16649 -12.5% 1.14x
ArrayPlusEqualFiveElementCollection 181411 159248 -12.2% 1.14x (?)
ArrayPlusEqualSingleElementCollection 223720 197776 -11.6% 1.13x
Combos 1954 1737 -11.1% 1.12x (?)
ArrayAppendUTF16 26146 23392 -10.5% 1.12x (?)
ArrayAppendAscii 25602 23018 -10.1% 1.11x (?)
Set.isDisjoint.Seq.Empty.Box 907 817 -9.9% 1.11x (?)
Set.isDisjoint.Seq.Box.Empty 1138 1027 -9.8% 1.11x (?)
Set.isDisjoint.Empty.Box 1071 967 -9.7% 1.11x (?)
Set.subtracting.Empty.Box 219 198 -9.6% 1.11x
ArrayAppendLatin1 25500 23086 -9.5% 1.10x (?)
Set.subtracting.Seq.Box.Empty 1401 1271 -9.3% 1.10x (?)
Set.isDisjoint.Box.Empty 1188 1079 -9.2% 1.10x
ArrayPlusEqualThreeElements 7810 7100 -9.1% 1.10x (?)
Set.subtracting.Box.Empty 247 225 -8.9% 1.10x (?)
DictionaryOfAnyHashableStrings_insert 6118 5600 -8.5% 1.09x (?)
AngryPhonebook.Strasse 129 119 -7.8% 1.08x (?)
DropLastAnySequence 13680 12624 -7.7% 1.08x (?)
Set.subtracting.Seq.Empty.Box 1133 1049 -7.4% 1.08x (?)
SetIsSubsetBox0 1240 1149 -7.3% 1.08x (?)
FlattenListFlatMap 247112 228985 -7.3% 1.08x (?)
DropLastSequenceLazy 14182 13184 -7.0% 1.08x (?)
Set.isStrictSubset.Box0 1231 1146 -6.9% 1.07x (?)

Code size: -swiftlibs

Regression OLD NEW DELTA RATIO
libswiftFoundation.dylib 1310720 1327104 +1.2% 0.99x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@eeckstein
Copy link
Contributor Author

about the benchmark results: the Data benchmark regressions seem to be caused by data/code alignment changes. The generated code is the same.

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, with a few comments below of things I would add.

When eliminating an enum from an alloc_stack, accept "dead" inject_enum_addr instructions, which inject different enum cases.
…ons.

If the branch-block injects a certain enum case and the destination switches on that enum, it's worth jump threading. E.g.

  inject_enum_addr %enum : $*Optional<T>, #Optional.some
  ... // no memory writes here
  br DestBB
DestBB:
  ... // no memory writes here
  switch_enum_addr %enum : $*Optional<T>, case #Optional.some ...

This enables removing all code with optionals in a loop, which iterates over an array of address-only elements, e.g.

  func test<T>(_ items: [T]) {
    for i in items {
      print(i)
    }
  }
@eeckstein
Copy link
Contributor Author

@atrick Thanks for reviewing. In this new version I addressed your comments.

@eeckstein
Copy link
Contributor Author

@swift-ci smoke test

@eeckstein
Copy link
Contributor Author

@swift-ci smoke test linux

@eeckstein eeckstein merged commit f6c008f into swiftlang:main Sep 30, 2020
@eeckstein eeckstein deleted the enum-optimizations branch September 30, 2020 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants