Commit 2ee52cd

Task T228334710 update tuning guide
stack-info: PR: #3433, branch: drisspg/stack/1
1 parent a96b470 commit 2ee52cd

1 file changed: +13 -63 lines changed

recipes_source/recipes/tuning_guide.py

Lines changed: 13 additions & 63 deletions
@@ -8,10 +8,20 @@
 techniques often can be implemented by changing only a few lines of code and can
 be applied to a wide range of deep learning models across all domains.
 
+Prerequisites
+~~~~~~~~~~~~~
+- PyTorch 2.0 or later
+- Python 3.8 or later
+- CUDA-capable GPU (recommended for GPU optimizations)
+- Linux, macOS, or Windows operating system
+
 General optimizations
 ---------------------
 """
 
+import torch
+import torchvision
+
 ###############################################################################
 # Enable asynchronous data loading and augmentation
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -90,7 +100,7 @@
 # setting it to zero, for more details refer to the
 # `documentation <https://pytorch.org/docs/master/optim.html#torch.optim.Optimizer.zero_grad>`_.
 #
-# Alternatively, starting from PyTorch 1.7, call ``model`` or
+# Alternatively, call ``model`` or
 # ``optimizer.zero_grad(set_to_none=True)``.
 
 ###############################################################################
@@ -129,7 +139,7 @@ def gelu(x):
 ###############################################################################
 # Enable channels_last memory format for computer vision models
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# PyTorch 1.5 introduced support for ``channels_last`` memory format for
+# PyTorch supports ``channels_last`` memory format for
 # convolutional networks. This format is meant to be used in conjunction with
 # `AMP <https://pytorch.org/docs/stable/amp.html>`_ to further accelerate
 # convolutional neural networks with
@@ -250,65 +260,6 @@ def gelu(x):
 #
 # export LD_PRELOAD=<jemalloc.so/tcmalloc.so>:$LD_PRELOAD
 
-###############################################################################
-# Use oneDNN Graph with TorchScript for inference
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# oneDNN Graph can significantly boost inference performance. It fuses some compute-intensive operations such as convolution, matmul with their neighbor operations.
-# In PyTorch 2.0, it is supported as a beta feature for ``Float32`` & ``BFloat16`` data-types.
-# oneDNN Graph receives the model’s graph and identifies candidates for operator-fusion with respect to the shape of the example input.
-# A model should be JIT-traced using an example input.
-# Speed-up would then be observed after a couple of warm-up iterations for inputs with the same shape as the example input.
-# The example code-snippets below are for resnet50, but they can very well be extended to use oneDNN Graph with custom models as well.
-
-# Only this extra line of code is required to use oneDNN Graph
-torch.jit.enable_onednn_fusion(True)
-
-###############################################################################
-# Using the oneDNN Graph API requires just one extra line of code for inference with Float32.
-# If you are using oneDNN Graph, please avoid calling ``torch.jit.optimize_for_inference``.
-
-# sample input should be of the same shape as expected inputs
-sample_input = [torch.rand(32, 3, 224, 224)]
-# Using resnet50 from torchvision in this example for illustrative purposes,
-# but the line below can indeed be modified to use custom models as well.
-model = getattr(torchvision.models, "resnet50")().eval()
-# Tracing the model with example input
-traced_model = torch.jit.trace(model, sample_input)
-# Invoking torch.jit.freeze
-traced_model = torch.jit.freeze(traced_model)
-
-###############################################################################
-# Once a model is JIT-traced with a sample input, it can then be used for inference after a couple of warm-up runs.
-
-with torch.no_grad():
-    # a couple of warm-up runs
-    traced_model(*sample_input)
-    traced_model(*sample_input)
-    # speedup would be observed after warm-up runs
-    traced_model(*sample_input)
-
-###############################################################################
-# While the JIT fuser for oneDNN Graph also supports inference with ``BFloat16`` datatype,
-# performance benefit with oneDNN Graph is only exhibited by machines with AVX512_BF16
-# instruction set architecture (ISA).
-# The following code snippets serves as an example of using ``BFloat16`` datatype for inference with oneDNN Graph:
-
-# AMP for JIT mode is enabled by default, and is divergent with its eager mode counterpart
-torch._C._jit_set_autocast_mode(False)
-
-with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False, dtype=torch.bfloat16):
-    # Conv-BatchNorm folding for CNN-based Vision Models should be done with ``torch.fx.experimental.optimization.fuse`` when AMP is used
-    import torch.fx.experimental.optimization as optimization
-    # Please note that optimization.fuse need not be called when AMP is not used
-    model = optimization.fuse(model)
-    model = torch.jit.trace(model, (example_input))
-    model = torch.jit.freeze(model)
-    # a couple of warm-up runs
-    model(example_input)
-    model(example_input)
-    # speedup would be observed in subsequent runs.
-    model(example_input)
-
 
 ###############################################################################
 # Train a model on CPU with PyTorch ``DistributedDataParallel``(DDP) functionality
@@ -426,9 +377,8 @@ def gelu(x):
 # * enable AMP
 #
 # * Introduction to Mixed Precision Training and AMP:
-# `video <https://www.youtube.com/watch?v=jF4-_ZK_tyc&feature=youtu.be>`_,
 # `slides <https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/dusan_stosic-training-neural-networks-with-tensor-cores.pdf>`_
-# * native PyTorch AMP is available starting from PyTorch 1.6:
+# * native PyTorch AMP is available:
 # `documentation <https://pytorch.org/docs/stable/amp.html>`_,
 # `examples <https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples>`_,
 # `tutorial <https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html>`_
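
Reviewer note: the second hunk (@@ -90,7 +100,7 @@) drops the version qualifier from the comment about ``optimizer.zero_grad(set_to_none=True)``. For anyone checking that the reworded comment still describes the behavior, here is a minimal sketch of what the call does. The toy ``torch.nn.Linear`` model, the SGD optimizer, and the random batch are illustrative assumptions and are not part of this commit.

```python
import torch

# Hypothetical toy model and optimizer, used only for illustration.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One backward pass populates the ``.grad`` attribute of every parameter.
loss = model(torch.rand(8, 4)).sum()
loss.backward()
grads_populated = all(p.grad is not None for p in model.parameters())

# With set_to_none=True the gradient tensors are dropped (set to None)
# rather than being overwritten with zeros.
optimizer.zero_grad(set_to_none=True)
grads_cleared = all(p.grad is None for p in model.parameters())

print(grads_populated, grads_cleared)
```

Both flags should come out true: the gradients exist after ``backward()`` and are ``None`` after the reset, which matches the wording the commit keeps.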
