[37-2] PyTorch profiler
PyTorch profiler is a module that lets you monitor how much compute resources your code uses.
reference: https://pytorch.org/tutorials/recipes/recipes/profiler.html
1️⃣ Setup
Import the required modules.
import torch
import torchvision.models as models
import torch.autograd.profiler as profiler
Load a ResNet-18 model and create an input made of random values (a batch of 5 three-channel 224×224 images).
model = models.resnet18()
inputs = torch.randn(5, 3, 224, 224)
2️⃣ Using the profiler
The profiler is used as a context manager in a with block. profiler.profile collects the events, and profiler.record_function labels an arbitrary code range (here, the model's inference) so it appears under that name in the results.
with profiler.profile(record_shapes=True) as prof:
    with profiler.record_function("model_inference"):
        model(inputs)
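If a GPU is available, the same profiler can also record CUDA kernel times via use_cuda=True. A minimal sketch, assuming a CUDA-capable machine (names like model_gpu are just for illustration):

# Sketch (assumes torch.cuda.is_available() is True): move the model and
# input to the GPU and enable CUDA timing in the profiler.
model_gpu = models.resnet18().cuda()
inputs_gpu = inputs.cuda()

with profiler.profile(use_cuda=True, record_shapes=True) as prof_gpu:
    with profiler.record_function("model_inference"):
        model_gpu(inputs_gpu)

The resulting prof_gpu can then be printed in the same way as prof below, for example sorted by cuda_time_total instead of cpu_time_total.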
3️⃣ Printing profiler results
The profiler results can be printed as a table. For each operator, the table shows its CPU usage, the number of times it was called, and so on.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
model_inference 2.36% 15.300ms 99.99% 648.336ms 648.336ms 1
aten::conv2d 0.04% 276.336us 67.02% 434.575ms 21.729ms 20
aten::convolution 0.04% 266.254us 66.98% 434.299ms 21.715ms 20
aten::_convolution 0.09% 598.371us 66.94% 434.033ms 21.702ms 20
aten::mkldnn_convolution 66.75% 432.800ms 66.85% 433.434ms 21.672ms 20
aten::batch_norm 0.52% 3.371ms 21.80% 141.382ms 7.069ms 20
aten::_batch_norm_impl_index 0.05% 355.960us 21.28% 138.011ms 6.901ms 20
aten::native_batch_norm 13.10% 84.940ms 21.22% 137.617ms 6.881ms 20
aten::select 6.84% 44.368ms 8.07% 52.303ms 3.632us 14400
aten::max_pool2d 0.00% 15.861us 6.98% 45.273ms 45.273ms 1
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 648.402ms
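Any of the table's columns can be used as the sort key. For example (a small variation on the call above), sorting by self_cpu_time_total ranks operators by the time spent in the operator itself, excluding its children:

# Sort by time spent in each operator itself (excluding child operators).
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))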
Passing group_by_input_shape=True to prof.key_averages() breaks the CPU results down further and records the input shapes for each operator.
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=10))
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ---------------------------------------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls Input Shapes
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ---------------------------------------------
model_inference 2.36% 15.300ms 99.99% 648.336ms 648.336ms 1 []
aten::conv2d 0.01% 48.901us 15.43% 100.068ms 25.017ms 4 [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],
aten::convolution 0.01% 67.531us 15.43% 100.019ms 25.005ms 4 [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],
aten::_convolution 0.01% 91.548us 15.42% 99.952ms 24.988ms 4 [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],
aten::mkldnn_convolution 15.39% 99.763ms 15.40% 99.860ms 24.965ms 4 [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],
aten::conv2d 0.01% 46.451us 14.08% 91.319ms 30.440ms 3 [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []
aten::convolution 0.00% 27.435us 14.08% 91.273ms 30.424ms 3 [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []
aten::_convolution 0.01% 64.204us 14.07% 91.245ms 30.415ms 3 [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []
aten::mkldnn_convolution 14.05% 91.119ms 14.06% 91.181ms 30.394ms 3 [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []
aten::conv2d 0.01% 32.570us 10.56% 68.478ms 22.826ms 3 [[5, 128, 28, 28], [128, 128, 3, 3], [], [],
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ---------------------------------------------
Self CPU time total: 648.402ms
Setting profile_memory=True makes the profiler report how much CPU memory each operator allocated (the CPU Mem column). The "Self" memory columns count only the memory allocated by the operator itself, excluding allocations made by the child operators it calls.
with profiler.profile(profile_memory=True, record_shapes=True) as prof:
    model(inputs)
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 0.13% 735.252us 0.13% 735.252us 7.070us 94.79 Mb 94.79 Mb 104
aten::resize_ 0.00% 17.308us 0.00% 17.308us 8.654us 11.48 Mb 11.48 Mb 2
aten::addmm 0.09% 495.177us 0.09% 511.509us 511.509us 19.53 Kb 19.53 Kb 1
aten::add 0.09% 526.031us 0.09% 526.031us 26.302us 160 b 160 b 20
aten::empty_strided 0.00% 4.623us 0.00% 4.623us 4.623us 4 b 4 b 1
aten::conv2d 0.05% 257.399us 65.56% 368.116ms 18.406ms 47.37 Mb 0 b 20
aten::convolution 0.04% 207.070us 65.51% 367.859ms 18.393ms 47.37 Mb 0 b 20
aten::_convolution 0.08% 435.428us 65.48% 367.652ms 18.383ms 47.37 Mb 0 b 20
aten::mkldnn_convolution 65.33% 366.804ms 65.40% 367.216ms 18.361ms 47.37 Mb 0 b 20
aten::as_strided_ 0.02% 136.166us 0.02% 136.166us 6.808us 0 b 0 b 20
--------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 561.502ms
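The aggregated CPU Mem column (which includes memory allocated by child operators) can be used as a sort key in the same way, and the recorded events can also be exported as a Chrome trace and opened at chrome://tracing. A minimal sketch; the file name "trace.json" is only an example:

# Sort by total CPU memory, including allocations made by child operators.
print(prof.key_averages().table(sort_by="cpu_memory_usage", row_limit=10))

# Export the collected events as a Chrome trace (open it at chrome://tracing).
prof.export_chrome_trace("trace.json")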