Kneron Model Zoo Classification Models and ImageNet Dataset Accuracy Issue

GigaHsu · May 2022

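(Q1) I downloaded the classification ONNX models from the Kneron Model Zoo, converted them to NEF, and ran inference on the 50,000 ImageNet JPEG images.

The test flow runs to completion, but the accuracy is lower than expected.

Could you look over the conversion flow and tell me which parameter is set incorrectly? (With this same set of parameters, converting the Keras MobileNetV2.h5 gives normal accuracy.)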

import tensorflow as tf

def preprocess(pil_img):
    # Despite the name, pil_img is the path to a JPEG file.
    image = tf.io.read_file(pil_img)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])
    image = tf.cast(image / 255., tf.float32)  # scale pixel values to [0, 1]
    array = tf.Session().run(image)
    return array

import onnx
import ktc

m = onnx.load("/data1/benchmark_model/MobileNetV2.onnx")
m = ktc.onnx_optimizer.onnx2onnx_flow(m, eliminate_tail=True, opt_matmul=False)
onnx.save(m,'MobileNetV2_kneron_opt.onnx')

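Alternatively, could you provide correct Python sample code for converting these two Model Zoo ONNX models, MobileNetV2 and ResNet50?

(Q2) ResNet50: converting the resnet50 pb downloaded from the site below to a 32-bit TFLite model and then to ONNX fails (ArgMax was removed).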

https://github.com/mlcommons/inference/tree/master/vision/classification_and_detection

bottom_nodes = []
bottom_nodes.append("ArgMax")
m = ktc.onnx_optimizer.tflite2onnx_flow("/data1/benchmark_model/resnet50_v1.tflite", False, bottom_nodes)
m = ktc.onnx_optimizer.onnx2onnx_flow(m, eliminate_tail=True, opt_matmul=False)
onnx.save(m,'resnet50_v1_opt.tflite.onnx')

(base) root@fa228aeb0eb5:/data1/benchmark_model# python3 resnet50_v1_tflite_to_nef.py

/workspace/miniconda/lib/python3.7/site-packages/numpy/__init__.py:156: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service

from . import _distributor_init

Using TensorFlow backend.

found unsupported op type ARG_MAX, if the node is at buttom, we recommend you use "-bottom_nodes" to delete this node

New Qunatized information saved

Traceback (most recent call last):
  File "resnet50_v1_tflite_to_nef.py", line 78, in <module>
    m = ktc.onnx_optimizer.tflite2onnx_flow("/data1/benchmark_model/resnet50_v1.tflite", True, bottom_nodes)
  File "/workspace/miniconda/lib/python3.7/site-packages/ktc/onnx_optimizer.py", line 308, in tflite2onnx_flow
    return tflite2onnx.main(tflite_path, '/tmp/tflite_converted.onnx', not release_mode, bottom_nodes)
  File "/workspace/libs/ONNX_Convertor/tflite-onnx/onnx_tflite/tflite2onnx.py", line 379, in main
    out_value_info = set_end_node(b_node.node_list[-1], b_node.node_output_shape.tolist())
IndexError: list index out of range

(base) root@fa228aeb0eb5:/data1/benchmark_model#

resnet50_v1.zip

Could you help provide the correct way to convert this 32-bit ResNet50 TFLite model?

thanks,

Giga

Andy Hsieh · May 2022

@GigaHsu

Hi Giga,

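Many factors can affect inference results (accuracy). You can start by checking the following points:

Whether the model itself, the datasets used to train it, and the training parameters match what you want to infer.
The effect of quantization: are the images used for quantization analysis all relevant to the training data or the inference scenario? The image set should also cover as many usage scenarios as possible.
Use more images for quantization.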

For Python sample code that correctly converts the MobileNetV2 and ResNet50 ONNX models to NEF, you can refer to this link:

http://doc.kneron.com/docs/#toolchain/manual/

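Regarding the second question: judging from your error message, the ArgMax op type was not actually removed from your TFLite model, which is why it could not be converted to Kneron ONNX. When removing ArgMax, you could try -bottom_nodes resnet_model/dense/BiasAdd; that should let the ONNX conversion succeed.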

GigaHsu · June 2022

Hi Andy,

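With the setting you suggested, bottom_nodes.append("resnet_model/dense/BiasAdd"),

I can now produce a runnable .nef file, but the accuracy is very poor, only 0.1%.

Q1. I'd like to ask again about accuracy.

Pow has already told me most of what you mentioned, and I understand all of it.

However, MobileNetV1, MobileNetV2, and ResNet50 all use the same 50,000-image ImageNet dataset,

yet some models have poor accuracy while others are acceptable (> 68%). I also understand that converting to an 8-bit TFLite model needs as many input images as possible

to find the max/min values and so obtain better zero-point and scale parameters for the 8-bit model, which is why my Excel sheet lists the accuracy difference between 10 and 100 images.

Is the correct def preprocess(pil_img): the same or different for these three models, MobileNetV1, MobileNetV2, and ResNet50?

The same settings produce different accuracy. If you were doing the conversion, how would you write def preprocess(pil_img) correctly for the ImageNet dataset?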
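
To make the max/min point concrete, here is a minimal sketch of how an observed value range turns into a scale and zero-point under the generic asymmetric 8-bit (affine) scheme. The function names are hypothetical, and this is not necessarily the exact scheme the Kneron toolchain applies internally; it only illustrates why a range estimated from too few images can clip values seen later.

```python
def quant_params(x_min, x_max, num_bits=8):
    """Return (scale, zero_point) for asymmetric unsigned quantization."""
    # The range must include 0.0 so that zero is exactly representable.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    qmin, qmax = 0, 2 ** num_bits - 1
    if x_max == x_min:
        return 1.0, 0  # degenerate range: any scale works
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an integer code, clipping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    # Hypothetical ranges: a narrow one from few images, a wider one from more.
    narrow = quant_params(-1.0, 1.0)
    wide = quant_params(-1.0, 3.0)
    for name, (scale, zp) in (("narrow", narrow), ("wide", wide)):
        restored = dequantize(quantize(3.0, scale, zp), scale, zp)
        print(name, "scale =", scale, "zero_point =", zp, "3.0 ->", restored)
```

With the narrow range, the value 3.0 is clipped to the top code and comes back as roughly 1.0; with the wider range it survives within one quantization step.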

thanks,

Giga

Andy Hsieh · June 2022

@GigaHsu

Hi GigaHsu,

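The .nef file you successfully produced is a model with the ArgMax op removed. I am not sure whether you added the post-processing step (ArgMax) back when running inference; with it in place, the accuracy should not be only 0.1%. Since I don't know how your accuracy is calculated, could you share the code and related files you use to compute it, if convenient?

Basically, each model has its corresponding preprocess. The preprocess exists to convert the data into the format the model needs for inference, so it is not directly tied to the model architecture; it is more closely tied to the settings used when the model was trained.

Getting different accuracy with the same settings may also be caused by the training parameters or the training datasets. How to write the preprocess depends on individual needs, so there is no single standard answer; if you are using Kneron's framework, though, the preprocess in the examples can serve as a reference.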
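
As a rough illustration of the post-processing in question, the sketch below applies an argmax to each raw score vector and computes top-1 accuracy. The function names and the label_offset parameter are hypothetical. The offset is included because some TensorFlow ImageNet models emit 1001 scores with a background class at index 0; whether that applies to this model is not stated in the thread.

```python
def argmax(scores):
    """Index of the largest score (the step the removed ArgMax op performed)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def top1_accuracy(outputs, labels, label_offset=0):
    """Fraction of samples whose argmax equals label + label_offset."""
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    if not outputs:
        return 0.0
    correct = sum(
        1
        for scores, label in zip(outputs, labels)
        if argmax(scores) == label + label_offset
    )
    return correct / len(outputs)


if __name__ == "__main__":
    # Toy scores for three images over three classes (made-up values).
    outputs = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.2, 0.2, 0.6]]
    labels = [1, 0, 1]
    print("top-1 accuracy:", top1_accuracy(outputs, labels))
```

If the labels and the model's class indices are shifted by one relative to each other, accuracy collapses to near zero even when the model itself is fine, so the offset is worth checking alongside the argmax step.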
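
To show how preprocessing can differ between models trained with different settings, here is a small sketch of three common pixel normalization conventions for ImageNet classifiers. Which one a given model expects depends on how it was trained and is not stated in this thread; the function names are hypothetical, and the mean values are a commonly used set assumed here for illustration only.

```python
# Per-channel RGB means used by some VGG/ResNet-style pipelines (assumed values).
VGG_MEAN_RGB = (123.68, 116.78, 103.94)


def scale_unit(pixel):
    """Map 0..255 to [0, 1], as in the preprocess posted earlier in the thread."""
    return tuple(c / 255.0 for c in pixel)


def scale_signed(pixel):
    """Map 0..255 to [-1, 1]."""
    return tuple(c / 127.5 - 1.0 for c in pixel)


def subtract_mean(pixel, mean=VGG_MEAN_RGB):
    """Subtract a per-channel mean and keep the 0..255 scale."""
    return tuple(c - m for c, m in zip(pixel, mean))


NORMALIZERS = {
    "unit": scale_unit,
    "signed": scale_signed,
    "mean": subtract_mean,
}


def normalize(pixel, mode):
    """Normalize one RGB pixel with the convention named by mode."""
    if mode not in NORMALIZERS:
        raise ValueError("unknown mode: %r" % (mode,))
    return NORMALIZERS[mode](pixel)


if __name__ == "__main__":
    pixel = (255, 0, 127.5)
    for mode in NORMALIZERS:
        print(mode, normalize(pixel, mode))
```

Feeding a model inputs normalized with a different convention than the one it was trained with typically still runs without errors but degrades accuracy, which is one reason the same preprocess can give different results across models.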

GigaHsu · June 2022

Hi Andy,

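I simplified the problem to computing accuracy on 10 images, to make it easier for you to analyze.

The .tgz file contains two TFLite models (mobilenetv1, resnet50v1).

After entering the Kneron docker, copy the tgz to the /data1 folder and extract it.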

(1) mobilenetv1_tflite_to_nef.py

Result: mobilenetv1_tflite_to_nef_ok.txt, Accuracy = 60%

(2) resnet50_v1_tflite_to_nef.py

Result: resnet50_v1_tflite_to_nef_failed.txt, Accuracy = 0%

When you have time, please help me look at the ResNet50v1 accuracy issue on the KL720: how should the code that converts resnet50v1 to NEF be fixed?

resnet50_v1.tgz

thanks,

Giga

Andy Hsieh · June 2022

@GigaHsu

Hi GigaHsu,

Your Resnet50v1.tflite model is a quantized model; we recommend using an unquantized float model instead.

GigaHsu · June 2022

Hi Andy,

When converting the pb, I removed converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

and used converter.optimizations = [tf.lite.Optimize.DEFAULT] instead.

(1) resnet50_v1.tflite.png (checked with Netron: Conv2D is float32)

(2) resnet50_v1.quant.tflite.png (result of using converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]: Conv2D is int8)

Q1: The .tgz contains (1) resnet50_v1.tflite. Is the poor accuracy related to removing the Pad shown above? The second argument of tflite2onnx_flow() is set to True:

m = ktc.onnx_optimizer.tflite2onnx_flow("/data1/resnet50_v1/resnet50_v1.tflite", True, bottom_nodes)

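Otherwise the .nef cannot be generated; it reports a problem with the input format.

Q2: You said resnet50_v1.tflite is a quantized model. Could you point out where in Netron, so I can take a look?

Can a quantized model also be converted to .nef successfully? I ask because resnet50_v1.tflite converts to a runnable .nef; only the accuracy is poor.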

thanks,

Giga

Andy Hsieh · June 2022

@GigaHsu

Hi GigaHsu,

Q1.

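I tested this for you: the converted model behaves the same with or without the Pad.

So I believe the poor accuracy is unrelated to whether the Pad is present.

As for the second argument of tflite2onnx_flow(): setting it to True converts the model input shape to channel last, the input shape that the Kneron Tool Chain requires.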

Q2.

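The location in Netron is shown in the figure; when you click open the filter, you can tell whether the model has been quantized.

Conversion support for quantized models is still at an early stage, so we recommend that the TFLite model be unquantized.

For your reference: https://www.kneron.com/forum/discussion/comment/594#Comment_594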
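
Besides inspecting filters in Netron, the same check can be scripted. The sketch below is a hypothetical helper that scans tensor metadata in the shape returned by the TensorFlow Lite interpreter's get_tensor_details() (dicts with "name", "dtype", and a "quantization" pair of scale and zero-point) and lists tensors that look quantized. It assumes unquantized tensors report a scale of 0.0.

```python
def quantized_tensors(tensor_details):
    """Names of tensors with quantization parameters or int8/uint8 data."""
    names = []
    for d in tensor_details:
        scale, _zero_point = d.get("quantization", (0.0, 0))
        dtype = d.get("dtype")
        # dtype may be a numpy type class (has __name__) or a plain string.
        dtype_name = getattr(dtype, "__name__", str(dtype))
        if scale != 0.0 or dtype_name in ("int8", "uint8"):
            names.append(d["name"])
    return names


if __name__ == "__main__":
    # Made-up metadata standing in for a real model's tensor details.
    details = [
        {"name": "input", "dtype": "float32", "quantization": (0.0, 0)},
        {"name": "conv1/filter", "dtype": "int8", "quantization": (0.02, 0)},
    ]
    print(quantized_tensors(details))
    # With TensorFlow installed, real metadata could be obtained like this:
    #   import tensorflow as tf
    #   interpreter = tf.lite.Interpreter(model_path="resnet50_v1.tflite")
    #   print(quantized_tensors(interpreter.get_tensor_details()))
```

An empty result suggests a float model; int8 filter tensors alongside float32 activations would be consistent with weights-only quantization.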

GigaHsu · June 2022

Hi Andy,

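When converting to TFLite, I removed converter.optimizations = [tf.lite.Optimize.DEFAULT]

and kept only tflite_quant_model = converter.convert()

This gives filters in float format:

However, running resnet50_v1_tflite_to_nef.py

still gives Accuracy = 0%. When you have time, could you help look into the cause?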

resnet50_v1.tar.01.gz

resnet50_v1.tar.00.gz

gigahsu@gigahsu-Z390F:~/resnet50_v1/tmp$ ls

resnet50_v1.tar.00.gz resnet50_v1.tar.01.gz

gigahsu@gigahsu-Z390F:~/resnet50_v1/tmp$ cat resnet50_v1.tar.* | tar -zxv

resnet50_v1.tflite

gigahsu@gigahsu-Z390F:~/resnet50_v1/tmp$ ls

resnet50_v1.tar.00.gz resnet50_v1.tar.01.gz resnet50_v1.tflite

thanks,

Giga

Andy Hsieh · June 2022

@GigaHsu

Hi GigaHsu,

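I looked at your model and ran quantization on it with 10 images; it converts to NEF without problems.

I believe the Accuracy = 0% result may be a problem with the .tflite model itself; I suggest you verify your model.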
