또르르's 개발 Story

[37-2] PyTorch profiler

또르르21 2021. 3. 17. 00:46

PyTorch profiler is a module for monitoring how much compute resource (time and memory) your code uses.

 

reference: https://pytorch.org/tutorials/recipes/recipes/profiler.html

 

 

1️⃣ Setup

 

Import the required modules.

import torch

import torchvision.models as models

import torch.autograd.profiler as profiler

 

 

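Instantiate a ResNet-18 model and create an input filled with random values. Note that models.resnet18() called without arguments returns a randomly initialized model, not a pre-trained one; that is fine here because only the cost of the operations is being measured.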

model = models.resnet18()

inputs = torch.randn(5, 3, 224, 224)

 

2️⃣ Using the profiler

 

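The profiler is used as a context manager in a with statement. record_shapes=True records the input shapes of each operator, and profiler.record_function attaches a label (here "model_inference") to the code it wraps.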

with profiler.profile(record_shapes=True) as prof:

    with profiler.record_function("model_inference"):
    
        model(inputs)
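
To see how the labels work, here is a minimal sketch that uses two record_function blocks. The tiny_model, tiny_inputs, and the "preprocess" label are hypothetical stand-ins (not from the original post), chosen so the example runs quickly without torchvision.

```python
import torch
import torch.nn as nn
import torch.autograd.profiler as profiler

# Hypothetical stand-in model, much smaller than resnet18, so the sketch runs fast.
tiny_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)
tiny_inputs = torch.randn(2, 3, 32, 32)

with profiler.profile(record_shapes=True) as prof:
    # Operators executed inside each block are reported under that label.
    with profiler.record_function("preprocess"):
        scaled = tiny_inputs * 0.5
    with profiler.record_function("model_inference"):
        tiny_model(scaled)

# Collect the names of everything the profiler recorded.
recorded_names = {evt.key for evt in prof.key_averages()}
print("preprocess" in recorded_names, "model_inference" in recorded_names)
```

Each label shows up as its own row next to the aten:: operators, which makes it easier to tell which part of a pipeline the time belongs to.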

 

 

3️⃣ Printing the profiler results

 

The profiler results can be printed as a table.

For each operator, the table shows the CPU time it took, the number of times it was called, and other statistics.

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  model_inference         2.36%      15.300ms        99.99%     648.336ms     648.336ms             1  
                     aten::conv2d         0.04%     276.336us        67.02%     434.575ms      21.729ms            20  
                aten::convolution         0.04%     266.254us        66.98%     434.299ms      21.715ms            20  
               aten::_convolution         0.09%     598.371us        66.94%     434.033ms      21.702ms            20  
         aten::mkldnn_convolution        66.75%     432.800ms        66.85%     433.434ms      21.672ms            20  
                 aten::batch_norm         0.52%       3.371ms        21.80%     141.382ms       7.069ms            20  
     aten::_batch_norm_impl_index         0.05%     355.960us        21.28%     138.011ms       6.901ms            20  
          aten::native_batch_norm        13.10%      84.940ms        21.22%     137.617ms       6.881ms            20  
                     aten::select         6.84%      44.368ms         8.07%      52.303ms       3.632us         14400  
                 aten::max_pool2d         0.00%      15.861us         6.98%      45.273ms      45.273ms             1  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 648.402ms
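
In the table above, aten::conv2d has a Self CPU of only 0.04% but a CPU total of 67.02%, because almost all of its time is spent in child operators such as aten::mkldnn_convolution. The same numbers can also be read programmatically instead of printed. The following is a minimal sketch, assuming a hypothetical small nn.Linear workload in place of resnet18.

```python
import torch
import torch.nn as nn
import torch.autograd.profiler as profiler

layer = nn.Linear(64, 64)  # hypothetical small workload
x = torch.randn(16, 64)

with profiler.profile() as prof:
    with profiler.record_function("model_inference"):
        layer(x)

# key_averages() yields one aggregated entry per operator name.
rows = sorted(prof.key_averages(), key=lambda evt: evt.cpu_time_total, reverse=True)

for evt in rows[:5]:
    # cpu_time_total includes time spent in child operators;
    # self_cpu_time_total excludes it. Both are in microseconds.
    print(f"{evt.key:30s} calls={evt.count:3d} "
          f"self={evt.self_cpu_time_total:.1f}us total={evt.cpu_time_total:.1f}us")
```

Sorting in Python like this is roughly what sort_by="cpu_time_total" does for the printed table, and it is handy when you want to log or compare the values.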

Passing group_by_input_shape=True to prof.key_averages breaks the results down further by input shape and adds an Input Shapes column to the table.

print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=10))
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls                                   Input Shapes  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  
                  model_inference         2.36%      15.300ms        99.99%     648.336ms     648.336ms             1                                             []  
                     aten::conv2d         0.01%      48.901us        15.43%     100.068ms      25.017ms             4  [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],  
                aten::convolution         0.01%      67.531us        15.43%     100.019ms      25.005ms             4  [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],  
               aten::_convolution         0.01%      91.548us        15.42%      99.952ms      24.988ms             4  [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],  
         aten::mkldnn_convolution        15.39%      99.763ms        15.40%      99.860ms      24.965ms             4  [[5, 64, 56, 56], [64, 64, 3, 3], [], [], [],  
                     aten::conv2d         0.01%      46.451us        14.08%      91.319ms      30.440ms             3  [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []  
                aten::convolution         0.00%      27.435us        14.08%      91.273ms      30.424ms             3  [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []  
               aten::_convolution         0.01%      64.204us        14.07%      91.245ms      30.415ms             3  [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []  
         aten::mkldnn_convolution        14.05%      91.119ms        14.06%      91.181ms      30.394ms             3  [[5, 512, 7, 7], [512, 512, 3, 3], [], [], []  
                     aten::conv2d         0.01%      32.570us        10.56%      68.478ms      22.826ms             3  [[5, 128, 28, 28], [128, 128, 3, 3], [], [],   
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  
Self CPU time total: 648.402ms
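
The effect of grouping is easier to see on a small case. Below is a minimal sketch, assuming two hypothetical convolution layers (conv_a, conv_b) that receive differently shaped inputs; it compares the aten::conv2d rows with and without group_by_input_shape.

```python
import torch
import torch.nn as nn
import torch.autograd.profiler as profiler

# Two hypothetical convolutions that receive differently shaped inputs.
conv_a = nn.Conv2d(3, 4, kernel_size=3, padding=1)
conv_b = nn.Conv2d(4, 4, kernel_size=3, padding=1)
x = torch.randn(2, 3, 8, 8)

# record_shapes=True is required, otherwise no shapes are available to group by.
with profiler.profile(record_shapes=True) as prof:
    conv_b(conv_a(x))

# Without grouping, calls to the same operator are aggregated by name.
merged = [e for e in prof.key_averages() if e.key == "aten::conv2d"]
# With grouping, each distinct set of input shapes gets its own row.
split = [e for e in prof.key_averages(group_by_input_shape=True)
         if e.key == "aten::conv2d"]

print(len(merged), len(split))
for evt in split:
    # The first two entries are the shapes of the input tensor and the weight.
    print(evt.key, evt.count, evt.input_shapes[:2])
```

This is why aten::conv2d appears several times in the table above: each row corresponds to one combination of input and weight shapes.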

 

Passing profile_memory=True to the profiler reports how much CPU memory each operator allocated (the CPU Mem column).

Here, "Self" memory is the memory allocated by the operator itself, excluding memory allocated in child calls to other operators. 

with profiler.profile(profile_memory=True, record_shapes=True) as prof:

    model(inputs)
    

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                      aten::empty         0.13%     735.252us         0.13%     735.252us       7.070us      94.79 Mb      94.79 Mb           104  
                    aten::resize_         0.00%      17.308us         0.00%      17.308us       8.654us      11.48 Mb      11.48 Mb             2  
                      aten::addmm         0.09%     495.177us         0.09%     511.509us     511.509us      19.53 Kb      19.53 Kb             1  
                        aten::add         0.09%     526.031us         0.09%     526.031us      26.302us         160 b         160 b            20  
              aten::empty_strided         0.00%       4.623us         0.00%       4.623us       4.623us           4 b           4 b             1  
                     aten::conv2d         0.05%     257.399us        65.56%     368.116ms      18.406ms      47.37 Mb           0 b            20  
                aten::convolution         0.04%     207.070us        65.51%     367.859ms      18.393ms      47.37 Mb           0 b            20  
               aten::_convolution         0.08%     435.428us        65.48%     367.652ms      18.383ms      47.37 Mb           0 b            20  
         aten::mkldnn_convolution        65.33%     366.804ms        65.40%     367.216ms      18.361ms      47.37 Mb           0 b            20  
                aten::as_strided_         0.02%     136.166us         0.02%     136.166us       6.808us           0 b           0 b            20  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 561.502ms
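
In the table above, aten::conv2d reports 47.37 Mb of CPU Mem but 0 b of Self CPU Mem, since the actual allocation happens inside child operators such as aten::empty. The memory numbers can be read from the aggregated events as well. This is a minimal sketch, assuming a hypothetical single nn.Conv2d layer as the workload.

```python
import torch
import torch.nn as nn
import torch.autograd.profiler as profiler

layer = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # hypothetical small workload
x = torch.randn(4, 3, 64, 64)

with profiler.profile(profile_memory=True, record_shapes=True) as prof:
    layer(x)

rows = sorted(prof.key_averages(),
              key=lambda evt: evt.self_cpu_memory_usage, reverse=True)

for evt in rows[:5]:
    # cpu_memory_usage also counts allocations made inside child operators;
    # self_cpu_memory_usage counts only what this operator allocated directly.
    # Both values are in bytes.
    print(f"{evt.key:30s} total={evt.cpu_memory_usage} B "
          f"self={evt.self_cpu_memory_usage} B")
```

Which operator ends up owning the "self" allocation can differ between PyTorch versions and backends, so treat the exact row names as environment dependent.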