SILOptimizer: two optimization improvements for address-only enums #34115

eeckstein · 2020-09-29T15:39:45Z

This PR contains 2 commits:

SILCombine: make the alloc_stack-enum optimization a bit more tolerant:
When eliminating an enum from an alloc_stack, accept "dead" inject_enum_addr instructions, which inject different enum cases.
SimplifyCFG: allow jump-threading for switch_enum_data_addr instructions:
If the branch-block injects a certain enum case and the destination switches on that enum, it's worth jump threading. E.g.

  inject_enum_addr %enum : $*Optional<T>, #Optional.some
  ... // no memory writes here
  br bb1
bb1:
  ... // no memory writes here
  switch_enum_addr %enum : $*Optional<T>, case #Optional.some ...

Both improvements enable removing all usages of optionals in loops, which iterate over arrays of address-only elements, e.g.

  func test<T>(_ items: [T]) {
    for i in items {
      print(i)
    }
  }

So this change is mainly an improvement for non-specialized generic code.

eeckstein · 2020-09-29T15:40:01Z

@swift-ci test

eeckstein · 2020-09-29T15:40:13Z

@swift-ci benchmark

swift-ci · 2020-09-29T16:38:38Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
Data.append.Sequence.64kB.Count.RE.I	21	44	+109.5%	0.48x
Data.append.Sequence.64kB.Count.RE	21	44	+109.5%	0.48x
DataAppendSequence	6000	8900	+48.3%	0.67x
Data.append.Sequence.809B.Count.RE	61	89	+45.9%	0.69x
Data.append.Sequence.809B.Count.RE.I	61	88	+44.3%	0.69x
AngryPhonebook.ASCII2	109	142	+30.3%	0.77x
StringHasPrefixAscii	1040	1180	+13.5%	0.88x (?)
String.replaceSubrange.ArrChar.Small	36	39	+8.3%	0.92x (?)
Set.isSuperset.Seq.Int.Empty	87	94	+8.0%	0.93x (?)
Set.isDisjoint.Seq.Empty.Box	87	94	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObserverForwarderStruct	535	320	-40.2%	1.67x (?)
IterateData	900	803	-10.8%	1.12x (?)
Set.isDisjoint.Box.Empty	103	93	-9.7%	1.11x (?)
DictionaryOfAnyHashableStrings_insert	5068	4620	-8.8%	1.10x (?)
Set.isDisjoint.Seq.Box.Empty	91	83	-8.8%	1.10x (?)
Data.init.Sequence.2049B.Count.I	59	54	-8.5%	1.09x (?)
AngryPhonebook.Strasse	129	119	-7.8%	1.08x (?)
DataSetCountSmall	91	84	-7.7%	1.08x (?)
Set.isStrictSubset.Int.Empty	54	50	-7.4%	1.08x (?)
Set.isStrictSuperset.Seq.Empty.Int	180	167	-7.2%	1.08x (?)
Data.init.Sequence.809B.Count	58	54	-6.9%	1.07x (?)
Data.init.Sequence.809B.Count.I	58	54	-6.9%	1.07x (?)
NormalizedIterator_fastPrenormal	750	700	-6.7%	1.07x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
StaticArray.o	11993	12433	+3.7%	0.96x

Improvement	OLD	NEW	DELTA	RATIO
ObserverForwarderStruct.o	2950	2522	-14.5%	1.17x
DictionaryLiteral.o	1262	1210	-4.1%	1.04x
ObserverUnappliedMethod.o	5297	5113	-3.5%	1.04x
ProtocolConformance.o	3983	3887	-2.4%	1.02x
DictionaryKeysContains.o	8739	8583	-1.8%	1.02x
NibbleSort.o	13344	13145	-1.5%	1.02x
SortIntPyramids.o	9164	9052	-1.2%	1.01x
ObjectiveCBridging.o	59541	58829	-1.2%	1.01x
DictionaryOfAnyHashableStrings.o	8080	7984	-1.2%	1.01x
StringRemoveDupes.o	5469	5405	-1.2%	1.01x
TwoSum.o	4184	4136	-1.1%	1.01x
DictTest2.o	10365	10253	-1.1%	1.01x
DictionaryBridgeToObjC.o	5057	5003	-1.1%	1.01x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
Data.append.Sequence.64kB.Count	27	44	+63.0%	0.61x
Data.init.Sequence.64kB.Count.I	27	43	+59.3%	0.63x
Data.init.Sequence.64kB.Count	27	43	+59.3%	0.63x
Data.append.Sequence.64kB.Count.I	28	44	+57.1%	0.64x
Data.init.Sequence.2047B.Count.I	53	77	+45.3%	0.69x
Data.init.Sequence.2049B.Count.I	53	77	+45.3%	0.69x
Data.init.Sequence.809B.Count.I	52	72	+38.5%	0.72x
Data.init.Sequence.809B.Count	53	72	+35.8%	0.74x
Data.append.Sequence.809B.Count.I	65	85	+30.8%	0.76x
AngryPhonebook.ASCII2	109	142	+30.3%	0.77x
Data.append.Sequence.809B.Count	66	85	+28.8%	0.78x
Data.init.Sequence.513B.Count.I	66	84	+27.3%	0.79x
Data.init.Sequence.511B.Count.I	66	84	+27.3%	0.79x
DataCountSmall	15	19	+26.7%	0.79x
DataCountMedium	15	17	+13.3%	0.88x (?)
String.replaceSubrange.Substring.Small	38	42	+10.5%	0.90x (?)
Set.isSubset.Seq.Empty.Int	80	88	+10.0%	0.91x
String.replaceSubrange.String.Small	31	34	+9.7%	0.91x (?)
String.replaceSubrange.ArrChar.Small	35	38	+8.6%	0.92x (?)
ProtocolDispatch	221	239	+8.1%	0.92x (?)
Set.isDisjoint.Empty.Int	86	93	+8.1%	0.92x (?)
Set.isDisjoint.Seq.Empty.Box	87	94	+8.0%	0.93x
Set.isDisjoint.Empty.Box	88	95	+8.0%	0.93x (?)
Set.isSuperset.Seq.Int.Empty	88	95	+8.0%	0.93x (?)
PrefixWhileAnySequenceLazy	1234	1327	+7.5%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
DictionaryLiteral	4760	2890	-39.3%	1.65x
DataCreateMedium	6300	4100	-34.9%	1.54x
Set.isStrictSubset.Int.Empty	53	47	-11.3%	1.13x (?)
Set.isSubset.Seq.Int.Empty	130	117	-10.0%	1.11x (?)
Set.subtracting.Empty.Box	10	9	-10.0%	1.11x (?)
Set.isStrictSubset.Seq.Int.Empty	132	119	-9.8%	1.11x (?)
DataAccessBytesMedium	64	58	-9.4%	1.10x (?)
ObjectiveCBridgeStringIsEqualAllSwift	66	60	-9.1%	1.10x (?)
Set.isStrictSuperset.Seq.Empty.Int	181	165	-8.8%	1.10x (?)
NibbleSort	2740	2510	-8.4%	1.09x (?)
DictionaryOfAnyHashableStrings_insert	5180	4774	-7.8%	1.09x (?)
AngryPhonebook.Strasse	129	119	-7.8%	1.08x (?)
Set.isSubset.Int.Empty	53	49	-7.5%	1.08x (?)
ObjectiveCBridgeStubFromNSString	589	548	-7.0%	1.07x (?)
DictionaryOfAnyHashableStrings_lookup	3600	3360	-6.7%	1.07x (?)

Code size: -Osize

Improvement	OLD	NEW	DELTA	RATIO
StaticArray.o	11118	10898	-2.0%	1.02x
DictionaryKeysContains.o	7858	7705	-1.9%	1.02x
ProtocolConformance.o	3650	3587	-1.7%	1.02x
ObserverForwarderStruct.o	2847	2809	-1.3%	1.01x
ObserverUnappliedMethod.o	5129	5077	-1.0%	1.01x

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	110	144	+30.9%	0.76x

Improvement	OLD	NEW	DELTA	RATIO
Set.isStrictSubset.Empty.Int	565	417	-26.2%	1.35x
DictionaryLiteral	9860	7660	-22.3%	1.29x
Set.isStrictSuperset.Seq.Empty.Int	1344	1144	-14.9%	1.17x
ArrayAppendAsciiSubstring	49500	42264	-14.6%	1.17x (?)
ArrayAppendUTF16Substring	50616	43272	-14.5%	1.17x
ArrayAppendLatin1Substring	50040	43236	-13.6%	1.16x (?)
MapReduceLazySequence	19020	16649	-12.5%	1.14x
ArrayPlusEqualFiveElementCollection	181411	159248	-12.2%	1.14x (?)
ArrayPlusEqualSingleElementCollection	223720	197776	-11.6%	1.13x
Combos	1954	1737	-11.1%	1.12x (?)
ArrayAppendUTF16	26146	23392	-10.5%	1.12x (?)
ArrayAppendAscii	25602	23018	-10.1%	1.11x (?)
Set.isDisjoint.Seq.Empty.Box	907	817	-9.9%	1.11x (?)
Set.isDisjoint.Seq.Box.Empty	1138	1027	-9.8%	1.11x (?)
Set.isDisjoint.Empty.Box	1071	967	-9.7%	1.11x (?)
Set.subtracting.Empty.Box	219	198	-9.6%	1.11x
ArrayAppendLatin1	25500	23086	-9.5%	1.10x (?)
Set.subtracting.Seq.Box.Empty	1401	1271	-9.3%	1.10x (?)
Set.isDisjoint.Box.Empty	1188	1079	-9.2%	1.10x
ArrayPlusEqualThreeElements	7810	7100	-9.1%	1.10x (?)
Set.subtracting.Box.Empty	247	225	-8.9%	1.10x (?)
DictionaryOfAnyHashableStrings_insert	6118	5600	-8.5%	1.09x (?)
AngryPhonebook.Strasse	129	119	-7.8%	1.08x (?)
DropLastAnySequence	13680	12624	-7.7%	1.08x (?)
Set.subtracting.Seq.Empty.Box	1133	1049	-7.4%	1.08x (?)
SetIsSubsetBox0	1240	1149	-7.3%	1.08x (?)
FlattenListFlatMap	247112	228985	-7.3%	1.08x (?)
DropLastSequenceLazy	14182	13184	-7.0%	1.08x (?)
Set.isStrictSubset.Box0	1231	1146	-6.9%	1.07x (?)

Code size: -swiftlibs

Regression	OLD	NEW	DELTA	RATIO
libswiftFoundation.dylib	1310720	1327104	+1.2%	0.99x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

eeckstein · 2020-09-29T18:08:46Z

about the benchmark results: the Data benchmark regressions seem to be caused by data/code alignment changes. The generated code is the same.

atrick

Looks good to me, with a few comments below of things I would add.

lib/SILOptimizer/SILCombiner/SILCombinerMiscVisitors.cpp

lib/SILOptimizer/Transforms/SimplifyCFG.cpp

When eliminating an enum from an alloc_stack, accept "dead" inject_enum_addr instructions, which inject different enum cases.

…ons. If the branch-block injects a certain enum case and the destination switches on that enum, it's worth jump threading. E.g. inject_enum_addr %enum : $*Optional<T>, #Optional.some ... // no memory writes here br DestBB DestBB: ... // no memory writes here switch_enum_addr %enum : $*Optional<T>, case #Optional.some ... This enables removing all code with optionals in a loop, which iterates over an array of address-only elements, e.g. func test<T>(_ items: [T]) { for i in items { print(i) } }

eeckstein · 2020-09-30T14:47:15Z

@atrick Thanks for reviewing. In this new version I addressed your comments.

eeckstein · 2020-09-30T14:47:23Z

@swift-ci smoke test

eeckstein · 2020-09-30T16:16:58Z

@swift-ci smoke test linux

eeckstein requested a review from atrick September 29, 2020 15:41

atrick approved these changes Sep 30, 2020

View reviewed changes

lib/SILOptimizer/SILCombiner/SILCombinerMiscVisitors.cpp Show resolved Hide resolved

lib/SILOptimizer/Transforms/SimplifyCFG.cpp Outdated Show resolved Hide resolved

lib/SILOptimizer/Transforms/SimplifyCFG.cpp Show resolved Hide resolved

eeckstein added 2 commits September 30, 2020 16:44

SILCombine: make the alloc_stack-enum optimization a bit more tolerant.

28220ff

When eliminating an enum from an alloc_stack, accept "dead" inject_enum_addr instructions, which inject different enum cases.

eeckstein force-pushed the enum-optimizations branch from 8d37f2c to 0a71d0f Compare September 30, 2020 14:46

eeckstein merged commit f6c008f into swiftlang:main Sep 30, 2020

eeckstein deleted the enum-optimizations branch September 30, 2020 18:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SILOptimizer: two optimization improvements for address-only enums #34115

SILOptimizer: two optimization improvements for address-only enums #34115

Uh oh!

eeckstein commented Sep 29, 2020

Uh oh!

eeckstein commented Sep 29, 2020

Uh oh!

eeckstein commented Sep 29, 2020

Uh oh!

swift-ci commented Sep 29, 2020

Uh oh!

eeckstein commented Sep 29, 2020

Uh oh!

atrick left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eeckstein commented Sep 30, 2020

Uh oh!

eeckstein commented Sep 30, 2020

Uh oh!

eeckstein commented Sep 30, 2020

Uh oh!

Uh oh!

SILOptimizer: two optimization improvements for address-only enums #34115

SILOptimizer: two optimization improvements for address-only enums #34115

Uh oh!

Conversation

eeckstein commented Sep 29, 2020

Uh oh!

eeckstein commented Sep 29, 2020

Uh oh!

eeckstein commented Sep 29, 2020

Uh oh!

swift-ci commented Sep 29, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

eeckstein commented Sep 29, 2020

Uh oh!

atrick left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eeckstein commented Sep 30, 2020

Uh oh!

eeckstein commented Sep 30, 2020

Uh oh!

eeckstein commented Sep 30, 2020

Uh oh!

Uh oh!