YOLOv5 evaluate error

An error occurs when evaluating the ONNX model exported directly from yolov5.

opset_version was set to 11.


import torch
import onnx
import ktc

dummy_input = torch.randn(1, 3, 640, 640)
model_id = 32769
version = '0001'
platform = '520'
onnx_load_path = 'yolov5_model/yolov5s.onnx'
onnx_save_path = 'yolov5_model/yolov5s-opt.onnx'

if __name__ == '__main__':
    # Load the exported ONNX model and run the Kneron ONNX optimizer on it
    exported_m = onnx.load(onnx_load_path)
    print('onnx model loaded')
    optimized_m = ktc.onnx_optimizer.onnx2onnx_flow(exported_m, eliminate_tail=True, opt_matmul=False, disable_fuse_bn=True)
    print('complete optimize')
    onnx.save(optimized_m, onnx_save_path)
    print('complete save model')

    # Evaluate the optimized model with the KL520 IP evaluator
    print('Evaluation')
    km = ktc.ModelConfig(model_id, version, platform, onnx_path=onnx_save_path)
    eval_result = km.evaluate()

terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
./compilerIpevaluator_520.sh: line 32:  84 Aborted         (core dumped) $LIBS_FOLDER/compiler/compile 520 $model $TMP_FOLDER/config_compiler.json warning compile.log
Traceback (most recent call last):
  File "test_script.py", line 24, in <module>
    eval_result = km.evaluate()
  File "/workspace/miniconda/lib/python3.7/site-packages/ktc/toolchain.py", line 128, in evaluate
    subprocess.run(['./compilerIpevaluator_520.sh', input_model_path], check=True)
  File "/workspace/miniconda/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['./compilerIpevaluator_520.sh', '/docker_mount/convert_script/yolov5_model/yolov5s-opt.onnx']' returned non-zero exit status 1.



Comments

  • @johnson luo

    Hi johnson luo,

    Could you provide the yolov5s.onnx and yolov5s-opt.onnx that you exported, so we can find the cause of the error message?

  • edited July 2022

    The ONNX was exported with export.py from the original repository, with ir_version = 4 and opset_version = 11.

    (base) root@65a275970522:/workspace/libs/ONNX_Convertor/optimizer_scripts# python pytorch_exported_onnx_preprocess.py /docker_mount/convert_script/yolov5_model/yolov5s.onnx /docker_mount/convert_script/yolov5_model/yolov5s.opt.onnx 

    It was then converted once more with pytorch_exported_onnx_preprocess.py, as shown above.

    The resulting ONNX was then passed through this code:
    optimized_m = ktc.onnx_optimizer.onnx2onnx_flow(exported_m, eliminate_tail=True, opt_matmul=False, disable_fuse_bn=True)
    



    Code:

    import torch
    import onnx
    import ktc

    dummy_input = torch.randn(1, 3, 640, 640)
    model_id = 32769
    version = '0001'
    platform = '520'
    onnx_load_path = 'yolov5_model/yolov5s.opt.onnx'
    onnx_save_path = 'yolov5_model/yolov5s-opt.onnx'

    if __name__ == '__main__':
        exported_m = onnx.load(onnx_load_path)
        print('onnx model loaded')
        optimized_m = ktc.onnx_optimizer.onnx2onnx_flow(exported_m, eliminate_tail=True, opt_matmul=False, disable_fuse_bn=True)
        print('complete optimize')
        onnx.save(optimized_m, onnx_save_path)
        print('complete save model')

        print('Evaluation')
        km = ktc.ModelConfig(model_id, version, platform, onnx_path=onnx_save_path)
        eval_result = km.evaluate()


    Output:


  • @johnson luo

    Hi johnson,

    The yolov5s.onnx you provided ends with a large number of operators that are not supported by the Kneron ToolChain. You need to remove everything after the red line, and then implement the removed operators as post-processing on the host; a hedged sketch of cutting the graph is given at the end of this comment.

    An additional reminder:

    If you are using yolov5s, you can use the model provided by the Kneron AI Training Platform Model_Zoo.

    Reference link: http://doc.kneron.com/docs/#model_training/object_detection_yolov5/#model
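
    As an illustration only (not the exact Kneron procedure, which uses the Model Editor linked further down), one common way to cut an ONNX graph at intermediate tensors is onnx.utils.extract_model (requires onnx >= 1.8). The output tensor names below are hypothetical and must be replaced with the real names of the last supported layers in your model (for example, as seen in Netron); the removed tail then has to be re-implemented as host-side post-processing.

    import onnx
    import onnx.utils

    # Hypothetical tensor names; look up the actual ones for your model in Netron.
    kept_outputs = ['conv_out_small', 'conv_out_medium', 'conv_out_large']
    onnx.utils.extract_model('yolov5s.onnx', 'yolov5s-cut.onnx',
                             input_names=['images'], output_names=kept_outputs)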

  • edited July 2022

    Hi @Andy Hsieh,

    I used the yolov5s-noupsample/best.pt provided in your Model Zoo, but an error occurs during conversion. Following the code used on that site, I ran into the problem below.

    import torch
    import onnx
    import ktc

    load_path = 'yolov5s-noupsample/best.pt'
    if __name__ == '__main__':
        # Load the pth saved model
        pth_model = torch.load(load_path, map_location='cpu')
        # Export the model
        dummy_input = torch.randn(1, 3, 640, 640, device='cpu')
        torch.onnx.export(pth_model, dummy_input, 'yolo.onnx', opset_version=11)
        # Load the exported onnx model as an onnx object
        exported_m = onnx.load('yolo.onnx')
        # Optimize the exported onnx object
        result_m = ktc.onnx_optimizer.torch_exported_onnx_flow(exported_m)
        optimized_m = ktc.onnx_optimizer.onnx2onnx_flow(result_m, eliminate_tail=True, opt_matmul=False)
        # Save the optimized onnx object to yolo.opt.onnx
        onnx.save(optimized_m, 'yolo.opt.onnx')


    Error log:

    /workspace/miniconda/lib/python3.7/site-packages/numpy/__init__.py:156: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service
      from . import _distributor_init
    Using TensorFlow backend.
    Traceback (most recent call last):
      File "/opt/project/converter.py", line 11, in <module>
        pth_model = torch.load(load_path, map_location='cpu')
      File "/workspace/miniconda/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
        return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
      File "/workspace/miniconda/lib/python3.7/site-packages/torch/serialization.py", line 853, in _load
        result = unpickler.load()
    ModuleNotFoundError: No module named 'models'

  • Simply loading the model already fails:

    model = torch.load('best.pt', map_location='cpu')
    


    I googled it and it says the directory layout has to match the one used when the model was originally exported, but I don't know what your original path was.
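
    A minimal sketch of the usual workaround for this ModuleNotFoundError, assuming the yolov5 sources shipped in the toolchain docker are at /workspace/ai_training/detection/yolov5/yolov5 (the path mentioned in the next reply); torch.load unpickles classes from the 'models' package, so that package must be importable first:

    import sys
    import torch

    # Hypothetical repo location; adjust it to wherever your yolov5 sources actually live.
    sys.path.insert(0, '/workspace/ai_training/detection/yolov5/yolov5')

    ckpt = torch.load('yolov5s-noupsample/best.pt', map_location='cpu')
    # yolov5 checkpoints are typically a dict with the nn.Module stored under 'model'.
    model = ckpt['model'].float() if isinstance(ckpt, dict) and 'model' in ckpt else ckpt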

  • @johnson luo

    Hi johnson,

    For how to convert best.pt, you can refer to the ai_training/detection/yolov5 tutorial inside the Kneron ToolChain docker.

    The exact location is:

    /workspace/ai_training/detection/yolov5/yolov5/tutorial/tutorial.ipynb

  • edited July 2022

    Hi @Andy Hsieh

    With export.py I can export the ONNX, but there is another problem: when I ran the notebook in the environment from start to finish, the training step raised the error below. Could this be a version problem?


    File "train.py", line 287, in train

    loss, loss_items = compute_loss(pred, targets.to(device), model) # loss scaled by batch_size

    File "/content/drive/MyDrive/kenron_yolov5/yolov5/yolov5/utils/loss.py", line 65, in compute_loss

    tcls, tbox, indices, anchors = build_targets(p, targets, model) # targets

    File "/content/drive/MyDrive/kenron_yolov5/yolov5/yolov5/utils/loss.py", line 174, in build_targets

    indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices

    RuntimeError: result type Float can't be cast to the desired output type long int

  • @johnson luo

    Hi johnson,

    This error message may indeed be caused by a version issue. You can try changing torch>=1.7.0 in requirements.txt to torch==1.7.0

    (exact location: /workspace/ai_training/detection/yolov5/yolov5/requirements.txt)

    and then run pip install -r requirements.txt; once the installation completes, try again.

  • Hi @Andy Hsieh

    Thank you for your reply. After exporting the ONNX with the ai_training/detection/yolov5 tutorial you pointed to, evaluating it with the web GUI fails, while evaluating it from code works fine.

    Using the optimized (opt) model:


    running compiler and IP evaluator...
    Compiler config generated.
    [piano][warning][helper_node.cc:18][EmptyNode] Warning: creating an EmptyNode instance for op_type: ConstantOfShape
    Cmd exec error: exit status 1,
    /workspace/miniconda/lib/python3.7/site-packages/numpy/__init__.py:156: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service
      from . import _distributor_init
    Using TensorFlow backend.
    Cannot inference all shapes. If no other error is raised, please ignore this message.
    Cannot inference all shapes. If no other error is raised, please ignore this message.
    Cannot inference all shapes. If no other error is raised, please ignore this message.
    ./compilerIpevaluator_520.sh: line 32:  988 Segmentation fault   (core dumped) $LIBS_FOLDER/compiler/compile 520 $model $TMP_FOLDER/config_compiler.json warning compile.log
    Traceback (most recent call last):
      File "scripts/onnxFlow.py", line 67, in <module>
        eval_result = km.evaluate()
      File "/workspace/miniconda/lib/python3.7/site-packages/ktc/toolchain.py", line 128, in evaluate
        subprocess.run(['./compilerIpevaluator_520.sh', input_model_path], check=True)
      File "/workspace/miniconda/lib/python3.7/subprocess.py", line 512, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['./compilerIpevaluator_520.sh', '/data1/input.onnx']' returned non-zero exit status 1.

  • @johnson luo

    Hi johnson,

    Could you provide the code and the exact steps that cause the evaluate problem? Everything I have tested here runs fine.

    Code:


    Kneron Toolchain WebGUI:

  • @Andy Hsieh The problem occurs when I use the GUI; the code works fine. Did you use the model I provided?

  • @johnson luo

    Hi johnson,

    I used the model (yolo.opt.onnx) extracted from the yolo.opt.zip that you provided in the post below.


  • It worked for me this time too! OK, then evaluate should be fine.

  • @Andy Hsieh

    Which reference should I follow for doing the quantization flow from code? Quantizing through the GUI works for me, but taking the web GUI's preprocess_func and combining it with the code from

    http://doc.kneron.com/docs/#toolchain/yolo_example/#step-5-quantization fails with the error below.

    log:

    Using TensorFlow backend.
    processing image: test_image10/000000000785.jpg
    processing image: test_image10/000000000885.jpg
    processing image: test_image10/000000000872.jpg
    processing image: test_image10/000000001296.jpg
    processing image: test_image10/000000005193.jpg
    processing image: test_image10/000000005001.jpg
    processing image: test_image10/000000001268.jpg
    processing image: test_image10/309_190.jpg
    processing image: test_image10/000000000139.jpg
    processing image: test_image10/000000001000.jpg
    Traceback (most recent call last):
      File "/opt/.pycharm_helpers/pydev/pydevd.py", line 1491, in _exec
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/opt/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/opt/project/quantization.py", line 56, in <module>
        fix_point_analysis(km, img_list=img_list)
      File "/opt/project/quantization.py", line 37, in fix_point_analysis
        bie_model_path = km.analysis({"input_1_o0": img_list})
      File "/workspace/miniconda/lib/python3.7/site-packages/ktc/toolchain.py", line 105, in analysis
        skip_verify=skip_verify
      File "/workspace/miniconda/lib/python3.7/site-packages/sys_flow/run.py", line 648, in gen_fx_model
        model_fx_report, p_model, success = run_btm_and_release()
      File "/workspace/miniconda/lib/python3.7/site-packages/sys_flow/run.py", line 586, in run_btm_and_release
        p_model = prepare_model(p_user_config)
      File "/workspace/miniconda/lib/python3.7/site-packages/sys_flow/run.py", line 514, in prepare_model
        npy2txt(np_txt, input_names, dims, p_input)
      File "/workspace/miniconda/lib/python3.7/site-packages/sys_flow/flow_utils.py", line 586, in npy2txt
        np_in_s = np_txt[input_names[i_in]]
    KeyError: 'images'

    Code:

    import torch
    import onnx
    import ktc
    import os
    import argparse
    
    import cv2
    import numpy as np
    
    ###  pre process function  ###
    # function name / parameter list can't be changed
    def preprocess_func(img_file_path):
        image = cv2.imread(img_file_path)
        # resize image to match the model input size (in this case: width 640, height 640)
        image = cv2.resize(image, (640, 640), interpolation=cv2.INTER_LINEAR)
        # convert to numpy array
        np_data = np.array(image, dtype='float32')
        # data normalization (for OpenMMLab Kneron Edition, "pixel/256 - 0.5" )
        np_data = np_data/256.
        np_data = np_data - 0.5
    
        return np_data
    
    dummy_input = torch.randn(1, 3, 640, 640)
    model_id = 32769
    version = '0001'
    platform = '520'
    
    
    def init_km_model(onnx_load_path):
        exported_m = onnx.load(onnx_load_path)
        km = ktc.ModelConfig(model_id, version, platform, onnx_path=onnx_load_path)
        return km
    
    
    def fix_point_analysis(km, img_list):
        bie_model_path = km.analysis({"input_1_o0": img_list})
        print("\nFix point analysis done. Save bie model to '" + str(bie_model_path) + "'")
    
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--imgs_dir', type=str, default=None, help='path to image')
        parser.add_argument('--onnx_path', type=str, default=None, help='path to onnx model')
        args = parser.parse_args()
        # load and normalize all image data from folder
    
        img_list = []
        for (dir_path, _, file_names) in os.walk(args.imgs_dir):
            for f_n in file_names:
                fullpath = os.path.join(dir_path, f_n)
                print("processing image: " + fullpath)
                img_data = preprocess_func(img_file_path=fullpath)
                img_list.append(img_data)
        km = init_km_model(args.onnx_path)
        fix_point_analysis(km, img_list=img_list)
    


  • @johnson luo

    Hi johnson,

    If the model you want to quantize is yolo.opt.onnx, then the "input_1_o0" in your bie_model_path = km.analysis({"input_1_o0": img_list}) appears to be wrong; it needs to be changed to "images".
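
    A minimal sketch (plain onnx API, nothing Kneron-specific) for listing a model's real input tensor names, so that the key passed to km.analysis matches the graph input of the ONNX being quantized:

    import onnx

    m = onnx.load('yolo.opt.onnx')
    initializer_names = {init.name for init in m.graph.initializer}
    # Graph inputs that are not initializers are the actual model inputs ('images' here).
    print([i.name for i in m.graph.input if i.name not in initializer_names])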

  • @johnson luo

    Hi johnson,

    Congratulations, and glad we could help.

  • edited July 2022

    @Andy Hsieh When I take the NEF converted with the web GUI and use it directly in kneron_plus/python/example_model_zoo$ python KL520KnModelZooGenericInferenceYolov5.py, changing the model to the newly converted NEF fails. Is there anything in between that I should watch out for?


    [Starting Inference Work]

     - Starting inference loop 1 times

     - - Error: inference failed, error = Error raised in function: generic_raw_inference_receive. Error code: 103. Description: ApiReturnCode.KP_FW_INFERENCE_TIMEOUT_103

  • One more thing: is there a reference postprocess function for the converted bie that I could look at? The documentation does not seem to cover the conversion process for yolov5.

  • @johnson luo 

    Hi johnson,

    You may need to provide the Kneron Plus version you are using and the NEF model you converted, so we can check it for you.


    I am not sure what you mean by a postprocess function for the converted bie, but for an introduction to the BIE model you can refer to this link: http://doc.kneron.com/docs/#toolchain/manual/#4-bie-workflow (4 BIE Workflow, 4.1 Quantization, 4.2 E2E Simulator Check (Fixed Point))

    The bie is the Kneron ONNX model after quantization and encryption. Its main purpose is to let users verify for themselves whether the accuracy after quantization is acceptable. If you mean bie inference, Kneron does not provide a postprocess function, because that is something each user writes for their own application.
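
    A hedged sketch of that accuracy check, reusing preprocess_func, km and img_list from the quantization script above; ktc.get_radix is an assumption taken from the toolchain manual rather than something confirmed in this thread. The idea is simply to run the same preprocessed input through the float ONNX and the fixed-point BIE and compare the outputs:

    import numpy as np
    import ktc

    bie_path = km.analysis({"images": img_list})   # fixed-point analysis, as in the script above
    radix = ktc.get_radix(img_list)                # assumed helper from the toolchain manual

    in_data = preprocess_func('test_image10/000000000139.jpg')[np.newaxis, ...]

    # If the two paths expect different layouts (NCHW vs NHWC, see the shape discussion below),
    # transpose in_data accordingly with np.transpose before one of the calls.
    float_out = ktc.kneron_inference([in_data], onnx_file='yolo.opt.onnx', input_names=['images'])
    fixed_out = ktc.kneron_inference([in_data], bie_file=bie_path, input_names=['images'], radix=radix)

    # Large differences between the two suggest the quantization noticeably hurts accuracy.
    for f, q in zip(float_out, fixed_out):
        print(np.abs(np.asarray(f) - np.asarray(q)).mean())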

  • edited July 2022

    @Andy Hsieh The Kneron Plus version is KL520.

    nef model:

    # bie inference
    out_data = ktc.kneron_inference([in_data], bie_file=bie_path, input_names=["images"], radix=radix)
    


    Do I have to handle the postprocessing after this myself?

  • A question here: why does the originally exported ONNX take an input shape of (1, 3, 640, 640), while converting to bie requires (1, 640, 640, 3)?


    def fix_point_analysis(km, img_list):
        bie_model_path = km.analysis({"images": img_list})
        print("\nFix point analysis done. Save bie model to '" + str(bie_model_path) + "'")
    


    AssertionError: np input size (1, 3, 640, 640) is different from onnx input size (1, 640, 640, 3)


  • Also, the inference output shape is different from what I got when using the original ONNX.

    inf_results = ktc.kneron_inference(input_data, onnx_file=onnx_load_path,
                                       input_names=["images"])
    

    Inference output shape of the original ONNX:


  • @Andy Hsieh

    For now I solved the last two shape mismatches myself by simply transposing the preprocess/postprocess outputs, but I would still like to know why the shapes differ. Thanks!

  • edited July 2022

    @Andy Hsieh

    I can now compile the NEF file from code, but running 5.2 E2E Simulator Check (Hardware) from http://doc.kneron.com/docs/#toolchain/manual/#5-nef-workflow fails. I am not sure whether the problem is in the docker or in the converted model; could you please take a look?

    All of the related converted files:

    In addition, running KL520KnModelZooGenericInferenceYolov5.py directly with the output from the kneron_tool_chain does not work either.


    /usr/bin/python3.9 /home/boshi/kneron/kneron_plus_v1.3.0/kneron_plus/python/example_model_zoo/KL520KnModelZooGenericInferenceYolov5.py

    [Connect Device]

     - Success

    [Set Device Timeout]

     - Success

    [Upload Firmware]

     - Success

    [Upload Model]

     - Success

    [Read Image]

     - Success

    [Starting Inference Work]

     - Starting inference loop 1 times

     - - Error: inference failed, error = Error raised in function: generic_raw_inference_receive. Error code: 103. Description: ApiReturnCode.KP_FW_INFERENCE_TIMEOUT_103

  • edited July 2022

    @johnson luo

    Hi johnson,

    Running 5.2 E2E Simulator Check (Hardware) on my side also fails. We are currently looking into it for you and will let you know as soon as we have an update.



    The Kneron Tool Chain is channel last; the shapes differ so that the data layout matches the Kneron hardware architecture downstream.
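
    A minimal sketch of the layout conversion this implies (plain numpy, nothing Kneron-specific): the ONNX exported from PyTorch is NCHW, while the Kneron toolchain/hardware side is channel last (NHWC), so inputs and outputs may need a transpose between the two conventions.

    import numpy as np

    nchw = np.random.randn(1, 3, 640, 640).astype(np.float32)

    nhwc = np.transpose(nchw, (0, 2, 3, 1))   # NCHW -> NHWC, shape (1, 640, 640, 3)
    back = np.transpose(nhwc, (0, 3, 1, 2))   # NHWC -> NCHW, shape (1, 3, 640, 640)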

  • edited July 2022

    @johnson luo

    Hi johnson,

    After checking on our side, we found that the onnx model you are using (yolo.opt.onnx) contains some Sigmoid operators. The KL520 does not support Sigmoid; if you need Sigmoid support you can use the KL720.

    With the KL520, you need to remove the Sigmoid nodes (as shown in the figure below). After removing them you should be able to run 5.2 E2E Simulator Check (Hardware) successfully; once you have the inference output, you can apply the sigmoid back to the inference output yourself (see the sketch at the end of this comment).

    Reference links:

    https://doc.kneron.com/docs/#toolchain/manual/ (2.3 Supported operators, Table 1.1 and Table 1.2)

    https://doc.kneron.com/docs/#toolchain/converters/#7-model-editor (7 Model Editor)
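
    A minimal sketch of putting the removed sigmoid back on the host, assuming out_data are the raw feature maps returned by ktc.kneron_inference (plain numpy; the YOLOv5 box decoding that follows is application-specific and not shown here):

    import numpy as np

    def sigmoid(x):
        # Elementwise logistic function applied to the raw network outputs.
        return 1.0 / (1.0 + np.exp(-x))

    activated = [sigmoid(np.asarray(o)) for o in out_data]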

The discussion has been closed due to inactivity. To continue with the topic, please feel free to post a new discussion.