Radix/Scaling Clarification for Float-to-INT8 Tensor Transfer Between Models

Nimesh · May 15

We are facing an issue related to radix/scaling between two models in our pipeline.

Currently:

The output tensor from the first model is in float format.
The next classification model expects INT8 quantized input.

Because of this, we need to apply a proper scale factor/radix conversion before passing the tensor to the next model.

Our questions are:

On what basis should we determine the scale factor/radix value for converting float output to INT8 input?
Is there any standard or automated process in the Kneron pipeline for handling this conversion between models?
How should the radix values mentioned in the model JSON files actually be interpreted and applied during tensor transfer?

We discussed this internally, and it seems that in some cases the pipeline works even without explicit quantization, while in other cases radix scaling is required. We currently do not understand what differentiates these cases.

We also checked the input/output radix values mentioned in the model JSON files, but directly applying those radix values is not working correctly for our use case.

Our pipeline is:

Pose Estimation Model → Classification Model

Any clarification, recommended workflow, or examples regarding radix/scaling handling between chained models would be very helpful.

Thanks.

Maria Chen · May 18

Hi Nimesh,

To connect two models, you can write a postprocess function for the first model that converts its output into a suitable input for the second model.

On the KL730, the radix and scale are applied together during dequantization.

You can find more information in the Kneron Documentation Center, on the page "Convert ONNX & NPU Data on the KL730 Platform".
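As a rough illustration, here is a minimal sketch of the conversion in both directions. It assumes the convention float = fixed / (scale * 2^radix), which you should verify against the page above for your toolchain version. The function names and the sample radix/scale values are hypothetical; the real values would come from the input node of your second model.

```python
def dequantize(q, radix, scale):
    """Fixed-point integer -> float, assuming float = q / (scale * 2**radix)."""
    return q / (scale * 2 ** radix)


def quantize_int8(x, radix, scale):
    """Float -> INT8 (inverse of dequantize), rounded and clamped to [-128, 127]."""
    q = round(x * scale * 2 ** radix)
    return max(-128, min(127, q))


# Hypothetical values for illustration only; use the input radix/scale
# reported for the second model instead.
radix, scale = 5, 1.0

values = [0.5, -1.25, 10.0]  # float output of the first model
quantized = [quantize_int8(v, radix, scale) for v in values]
restored = [dequantize(q, radix, scale) for q in quantized]

print(quantized)  # [16, -40, 127]
print(restored)   # [0.5, -1.25, 3.96875]
```

Note that 10.0 saturates at 127 with these sample values, so anything outside the range covered by the chosen radix and scale is clipped. If the float output of the first model does not match the value range the second model was quantized with, applying the radix directly will give wrong results.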

Also, does your second model already require INT8 input as an ONNX model, before conversion? And when you trained the two models, how did you connect the first model's output to the second model's input?
