4-channel Model Inference with KL720

https://www.kneron.com/forum/discussion/190/inference-with-kl720

We are starting a new discussion since the previous one has been marked as closed.


We compiled our 4-ch model again, but with the new NEF file we still get timeout errors on the KL720. This time we used the "generic_raw_inference_bypass_pre_proc_send" API as you suggested and fixed the Input Width Alignment issue. As in our previous attempt, we didn't encounter any errors during conversion/compilation. In the previous discussion, you said that our model was "stuck in the NPU". Could you please explain what might cause that?


There's another question (less important). We compiled a 5-ch model before knowing that models should have <= 4 channels, and also got timeout errors with it. However, it accepts 1-ch, 2-ch, 3-ch, and 4-ch input after we fixed the Input Width Alignment issue. So we are wondering why this model, or the KL720, behaves this way.

Comments

  • edited March 2022

    Hi Mason,

    Could you check the ioinfo.csv file for the NPU input format? Usually, a 4-channel model should run properly with the 4W4C8B input format, while 1W16C8B would be more likely to cause errors (see the buffer-size sketch at the end of this comment).

    If the NPU input was 4W4C8B, you could use "bypass_pre_proc_send" for inferencing. (The pre-processing used in generic inference is meant for images, so running non-image data through it might change the input.)

    If the input was 4W4C8B but the inferencing still didn't work, could you provide us with your .onnx and .nef files (with 4 channels) so we could check where the root cause is?


    Edit: You might also need to do some pre-processing for your input depending on the normalization you use.
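
    For reference, here is a minimal sketch of what the two layouts imply for the raw input buffer size. The alignment rules (4W4C8B aligns the width to 4 and the channels to 4; 1W16C8B aligns the width to 1 and the channels to 16) are the ones given later in this thread; the 1 x 1000 input shape is only an assumption for illustration.

        import math

        def aligned_buffer_size(width, height, channel, npu_format):
            # Bytes the NPU expects for 8-bit data under the given layout
            w_align, c_align = {"4W4C8B": (4, 4), "1W16C8B": (1, 16)}[npu_format]
            width_aligned = w_align * math.ceil(width / w_align)
            channel_aligned = c_align * math.ceil(channel / c_align)
            return width_aligned * height * channel_aligned

        print(aligned_buffer_size(1, 1000, 4, "4W4C8B"))   # 4 * 1000 * 4  = 16000
        print(aligned_buffer_size(1, 1000, 5, "1W16C8B"))  # 1 * 1000 * 16 = 16000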

  • Thanks for your reply. We've checked "ioinfo.csv" and it is "4W4C8B", so the problem might be somewhere else. Could you please provide a cloud storage link for us to upload the model files again?


    Btw, since there could be something we missed in the test code, we'd like to leave some of it here. Please help us check it.

    Data preparation

        import math
        import numpy as np

        # Align the model input width to a multiple of 4 (4W4C8B layout)
        width_aligned = 4 * math.ceil(model_nef_descriptor.models[0].width / 4.0)
        model_input_width = model_nef_descriptor.models[0].width
        model_input_height = model_nef_descriptor.models[0].height
        model_input_channel = model_nef_descriptor.models[0].channel

        # Random test data; subtracting 128 wraps the uint8 values, which
        # matches the two's-complement bytes of signed 8-bit (x - 128)
        in_data = np.random.randint(low=0, high=255, size=(1, 1000, 4), dtype=np.uint8)
        in_data_norm = in_data - 128

        # Pad the aligned buffer, then copy the real data into its corner
        in_data_aligned = np.ones((width_aligned, model_input_height, 4), dtype=np.uint8) * 255
        in_data_aligned[:in_data_norm.shape[0], :in_data_norm.shape[1], :in_data_norm.shape[2]] = in_data_norm

        in_data_aligned_buffer = in_data_aligned.tobytes()
    

    Inference command

        # Header describing the raw, already-aligned input buffer
        generic_raw_image_header = kp.GenericRawBypassPreProcImageHeader(
            model_id=model_nef_descriptor.models[0].id,
            image_buffer_size=len(in_data_aligned_buffer),
            inference_number=0
        )

        # Send the buffer to the device, bypassing the image pre-processing
        kp.inference.generic_raw_inference_bypass_pre_proc_send(
            device_group=device_group,
            generic_raw_image_header=generic_raw_image_header,
            image_buffer=in_data_aligned_buffer
        )

        # Wait for the corresponding raw result
        generic_raw_result = kp.inference.generic_raw_inference_bypass_pre_proc_receive(
            device_group=device_group,
            generic_raw_image_header=generic_raw_image_header,
            model_nef_descriptor=model_nef_descriptor
        )
    
  • edited March 2022

    Hi Mason,

    Thank you for your information. Could you upload your files to this sharepoint folder via the link below? We'll perform testing on our side as well.

    20220308


    Please let us know if you can't upload your files there.

  • Hi Maria,

    We've uploaded our model files (ioinfo.csv as well for reference). Hopefully the following tests will lead us to the root cause.

    Thank you for your help.

  • Hi Mason,

    We got your files, and our team found that the inference flow is stuck in the NPU again. Investigating the root cause inside the NPU will take some time. We'll let you know once we've figured out the cause. Thank you for your patience!

  • Hi Maria,

    Is there any update?

  • Hi Mason,

    Thank you for waiting. According to the hardware team, it seems that one of the model's layers gets stuck in the NPU, so the toolchain team will be fixing that bug. You could wait for the updated toolchain to be released, or use your previous 5-channel model with pre-processing bypassed.

  • edited March 2022

    Hi Maria,

    Regarding the 5-channel model, we don't think we can use it without figuring out why there's a mismatch between the input data dimensions and the model spec. As mentioned before, we'd expect a 5-channel model to accept 5-channel data as input. In fact, it accepts 1-, 2-, 3-, and 4-channel data instead. We therefore don't think the output has the same meaning as what we get by running the source model on a computer. Do you have any idea why the model/KL720 behaves this way?

    *By "accepting data", we mean we can get result response from KL720 with that data and no error occurs

  • Hi Mason,

    By "accepting 1, 2, 3, and 4-channel data," could you explain more about it, such as how you noticed that it didn't accept the data in the 5th channel?

    Also, if the model only accepted the first 4 channels' data, then the problem might have been caused in the toolchain. We'll need to look into it further, so if it's okay with you, could you provide the model, the full source code, and the inference data in the same sharepoint folder again so we can replicate the issue? Thank you for your help.

  • Hi Maria,

    Thanks for the reply.

    To clarify: the model doesn't accept "input data that has 5 channels"; we don't mean "the 5th channel of the input data".

    Let's say we have 5 sets of input data: a) W * H * 1, b) W * H * 2, c) W * H * 3, d) W * H * 4, and e) W * H * 5, i.e., the sets differ only in channel count. We also have a model that was trained on W * H * 5 data and then compiled into a NEF. We used the code posted earlier in this thread (repeated below, followed by a compact test-loop sketch) to feed a, b, c, d, and e into this model, and the results are:

    - Input (e) raises a timeout error (N7 or 103) every time we run "kp.inference.generic_raw_inference_bypass_pre_proc_receive"

    - Inputs (a), (b), (c), and (d) go through "generic_raw_inference_bypass_pre_proc_receive" without exception, and we can get an inference result with "kp.inference.generic_inference_retrieve_float_node"

    We've uploaded the model to the sharepoint.


    Code again (all modified from the examples):

    Data Preparation (NCH, in the middle block, sets the input data channel count)

      import math
      import numpy as np

      # Align the model input width to a multiple of 4 (4W4C8B layout)
      width_aligned = 4 * math.ceil(model_nef_descriptor.models[0].width / 4.0)
      model_input_width = model_nef_descriptor.models[0].width
      model_input_height = model_nef_descriptor.models[0].height
      model_input_channel = model_nef_descriptor.models[0].channel

      # NCH sets the channel count of the test input (1 through 5)
      in_data = np.random.randint(low=0, high=255, size=(1, 1000, NCH), dtype=np.uint8)
      in_data_norm = in_data - 128
      in_data_aligned = np.ones((width_aligned, model_input_height, NCH), dtype=np.uint8) * 255
      in_data_aligned[:in_data_norm.shape[0], :in_data_norm.shape[1], :in_data_norm.shape[2]] = in_data_norm

      in_data_aligned_buffer = in_data_aligned.tobytes()
    

    Inference

      generic_raw_image_header = kp.GenericRawBypassPreProcImageHeader(
        model_id=model_nef_descriptor.models[0].id,
        image_buffer_size=len(in_data_aligned_buffer),
        inference_number=0
      )
      kp.inference.generic_raw_inference_bypass_pre_proc_send(
        device_group=device_group,
        generic_raw_image_header=generic_raw_image_header,
        image_buffer=in_data_aligned_buffer
      )
     
      generic_raw_result = kp.inference.generic_raw_inference_bypass_pre_proc_receive(
        device_group=device_group,
        generic_raw_image_header=generic_raw_image_header,
        model_nef_descriptor=model_nef_descriptor
      )
    

    Getting Results

      # Convert each raw output node into floating-point values (CHW ordering)
      inf_node_output_list = []
      for node_idx in range(generic_raw_result.header.num_output_node):
          inference_float_node_output = kp.inference.generic_inference_retrieve_float_node(
              node_idx=node_idx,
              generic_raw_result=generic_raw_result,
              channels_ordering=kp.ChannelOrdering.KP_CHANNEL_ORDERING_CHW)
          inf_node_output_list.append(inference_float_node_output)
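
    For reproducibility, below is a compact sketch of the test loop we described above. It assumes the data-preparation snippet is wrapped into a helper (prepare_buffer is a hypothetical name) and that a timeout surfaces as kp.ApiKPException, as in the kp examples.

      import math
      import numpy as np
      import kp

      def prepare_buffer(model, nch):
          # Hypothetical helper: 4W4C8B-aligned buffer with nch channels
          width_aligned = 4 * math.ceil(model.width / 4.0)
          in_data = np.random.randint(low=0, high=255, size=(1, 1000, nch), dtype=np.uint8)
          in_data_norm = in_data - 128
          in_data_aligned = np.ones((width_aligned, model.height, nch), dtype=np.uint8) * 255
          in_data_aligned[:1, :1000, :nch] = in_data_norm
          return in_data_aligned.tobytes()

      for nch in range(1, 6):  # inputs (a) through (e)
          buffer = prepare_buffer(model_nef_descriptor.models[0], nch)
          header = kp.GenericRawBypassPreProcImageHeader(
              model_id=model_nef_descriptor.models[0].id,
              image_buffer_size=len(buffer),
              inference_number=0
          )
          try:
              kp.inference.generic_raw_inference_bypass_pre_proc_send(
                  device_group=device_group,
                  generic_raw_image_header=header,
                  image_buffer=buffer
              )
              result = kp.inference.generic_raw_inference_bypass_pre_proc_receive(
                  device_group=device_group,
                  generic_raw_image_header=header,
                  model_nef_descriptor=model_nef_descriptor
              )
              print(nch, "channels: OK,", result.header.num_output_node, "output nodes")
          except kp.ApiKPException as error:
              print(nch, "channels: timeout/error:", error)  # only (e) fails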
    
  • Hi Mason,

    Thank you for providing the information. Our team ran your code, and we think you need to change the alignment for the inference. (Please refer to the code below.)

    The reason the timeout error occurred is that the NPU input for the 5-channel model should be 1W16C8B, but the alignment in the code followed the 4W4C8B format. Trying to fit 4 * 5 * 1000 (20000) bytes of data inside the 1 * 16 * 1000 (16000) bytes the NPU expects would cause an error, since some of the data would fit but the rest wouldn't.

    Therefore, you could try adjusting the alignment according to the format.

    import enum
    import math
    import numpy as np

    class NPU_LAYOUT(enum.Enum):
        # FMT_1W16C8B: W align = 1, C align = 16
        # FMT_4W4C8B:  W align = 4, C align = 4
        FMT_1W16C8B = 0
        FMT_4W4C8B = 1

    model_input_width = model_nef_descriptor.models[0].width
    model_input_height = model_nef_descriptor.models[0].height
    model_input_channel = model_nef_descriptor.models[0].channel
    NCH = model_input_channel
    npu_layout = NPU_LAYOUT.FMT_1W16C8B

    # Align both the width and the channel count to the NPU layout
    if npu_layout == NPU_LAYOUT.FMT_1W16C8B:
        width_aligned = 1 * math.ceil(model_nef_descriptor.models[0].width / 1.0)
        channel_aligned = 16 * math.ceil(model_nef_descriptor.models[0].channel / 16.0)
    elif npu_layout == NPU_LAYOUT.FMT_4W4C8B:
        width_aligned = 4 * math.ceil(model_nef_descriptor.models[0].width / 4.0)
        channel_aligned = 4 * math.ceil(model_nef_descriptor.models[0].channel / 4.0)

    in_data = np.random.randint(low=0, high=255, size=(1, 1000, NCH), dtype=np.uint8)
    in_data_norm = in_data - 128

    # Pad the aligned buffer (both width and channels), then copy the data in
    in_data_aligned = np.ones((width_aligned, model_input_height, channel_aligned), dtype=np.uint8) * 255
    in_data_aligned[:in_data_norm.shape[0], :in_data_norm.shape[1], :in_data_norm.shape[2]] = in_data_norm

    img_buffer = in_data_aligned.tobytes()
    

    Also, please make sure that for the model with 5 channels, the NPU input is 1W16C8B.
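
    As a quick sanity check (a sketch reusing the variables from the code above), you can compare the buffer length against what the chosen layout implies before sending. For the 5-channel model under 1W16C8B, this should be 1 * 1000 * 16 = 16000 bytes, not the 20000 bytes that 4W4C8B alignment produces.

    # Expected raw buffer size for 8-bit data under the chosen layout
    expected_size = width_aligned * model_input_height * channel_aligned
    assert len(img_buffer) == expected_size, (
        "buffer is %d bytes, but the NPU expects %d" % (len(img_buffer), expected_size))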

  • Hi Maria,


    Thanks for your suggestion. I finally managed to get the KL720 to output the inference result of the 5-channel model.


    Btw, I'd like to ask when the toolchain update will be released.

  • Hi Mason,

    That's good to hear!

    We don't have an exact schedule for the toolchain update yet, but our teams are working on it.

The discussion has been closed due to inactivity. To continue with the topic, please feel free to post a new discussion.