4-channel Model Inference with KL720

https://www.kneron.com/forum/discussion/190/inference-with-kl720

We are starting a new discussion since the previous one has been marked as closed.


We compiled our 4-ch model again, but with the new NEF file we still get timeout errors on the KL720. This time we used the "generic_raw_inference_bypass_pre_proc_send" API as you suggested and fixed the Input Width Alignment issue. As in our previous attempt, we didn't encounter any errors during conversion/compilation. In the previous discussion, you said that our model was "stuck in the NPU". Could you please explain what might cause that?


There's another question (less important). We compiled a 5-ch model before knowing that models should have <= 4 channels, and also got timeout errors with it. However, it accepts 1-ch, 2-ch, 3-ch, and 4-ch input after we fixed the Input Width Alignment issue. So we are wondering why this model, or the KL720, behaves this way.

Comments

  • edited March 2022

    Hi Mason,

    Could you check the ioinfo.csv file for the NPU input format? Usually, a 4-channel model should run properly with the 4W4C8B input format, while 1W16C8B would be more likely to cause errors (see the buffer-size sketch at the end of this comment).

    If the NPU input was 4W4C8B, you could use "bypass_pre_proc_send" for inferencing. (The pre-processing used in generic inference is meant for images, so running non-image data through it might change the input.)

    If the input was 4W4C8B but the inferencing still didn't work, could you provide us with your .onnx and .nef files (with 4 channels) so we could check where the root cause is?


    Edit: You might also need to do some pre-processing for your input depending on the normalization you use.
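
    For reference, here is a minimal sketch of what the two layouts imply for the raw input buffer size. The alignment rules (4W4C8B aligns the width to 4 and the channels to 4; 1W16C8B aligns the width to 1 and the channels to 16) are the ones given later in this thread; the 1 x 1000 input shape is only an assumption for illustration.

        import math

        def aligned_buffer_size(width, height, channel, npu_format):
            # Bytes the NPU expects for 8-bit data under the given layout
            w_align, c_align = {"4W4C8B": (4, 4), "1W16C8B": (1, 16)}[npu_format]
            width_aligned = w_align * math.ceil(width / w_align)
            channel_aligned = c_align * math.ceil(channel / c_align)
            return width_aligned * height * channel_aligned

        print(aligned_buffer_size(1, 1000, 4, "4W4C8B"))   # 4 * 1000 * 4  = 16000
        print(aligned_buffer_size(1, 1000, 5, "1W16C8B"))  # 1 * 1000 * 16 = 16000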

  • Thanks for your reply. We've checked "ioinfo.csv" and it is "4W4C8B", so the problem might be somewhere else. Could you please provide a cloud storage link for us to upload the model files again?


    Btw, since there could be something we missed in the test code, we'd like to leave some of it here. Please help us check it.

    Data preparation

        import math
        import numpy as np

        # Align the model input width to a multiple of 4 (4W4C8B layout)
        width_aligned = 4 * math.ceil(model_nef_descriptor.models[0].width / 4.0)
        model_input_width = model_nef_descriptor.models[0].width
        model_input_height = model_nef_descriptor.models[0].height
        model_input_channel = model_nef_descriptor.models[0].channel

        # Random test data; subtracting 128 wraps the uint8 values, which
        # matches the two's-complement bytes of signed 8-bit (x - 128)
        in_data = np.random.randint(low=0, high=255, size=(1, 1000, 4), dtype=np.uint8)
        in_data_norm = in_data - 128

        # Pad the aligned buffer, then copy the real data into its corner
        in_data_aligned = np.ones((width_aligned, model_input_height, 4), dtype=np.uint8) * 255
        in_data_aligned[:in_data_norm.shape[0], :in_data_norm.shape[1], :in_data_norm.shape[2]] = in_data_norm

        in_data_aligned_buffer = in_data_aligned.tobytes()
    

    Inference command

        # Header describing the raw, already-aligned input buffer
        generic_raw_image_header = kp.GenericRawBypassPreProcImageHeader(
            model_id=model_nef_descriptor.models[0].id,
            image_buffer_size=len(in_data_aligned_buffer),
            inference_number=0
        )

        # Send the buffer to the device, bypassing the image pre-processing
        kp.inference.generic_raw_inference_bypass_pre_proc_send(
            device_group=device_group,
            generic_raw_image_header=generic_raw_image_header,
            image_buffer=in_data_aligned_buffer
        )

        # Wait for the corresponding raw result
        generic_raw_result = kp.inference.generic_raw_inference_bypass_pre_proc_receive(
            device_group=device_group,
            generic_raw_image_header=generic_raw_image_header,
            model_nef_descriptor=model_nef_descriptor
        )
    
  • edited March 2022

    Hi Mason,

    Thank you for your information. Could you upload your files to this sharepoint folder via the link below? We'll perform testing on our side as well.

    20220308


    Please let us know if you can't upload your files there.

  • Hi Maria,

    We've uploaded our model files (ioinfo.csv as well for reference). Hopefully the following tests will lead us to the root cause.

    Thank you for your help.

  • Hi Mason,

    We got your files, and our team found that the inference flow is stuck in the NPU again. Investigating the root cause inside the NPU will take some time. We'll let you know once we've figured out the cause. Thank you for your patience!

  • Hi Maria,

    Is there any update?

  • Hi Mason,

    Thank you for waiting. According to the hardware team, it seems that one of the model's layers gets stuck in the NPU, so the toolchain team will be fixing that bug. You could wait for the updated toolchain to be released, or use your previous 5-channel model with pre-processing bypassed.

  • edited March 2022

    Hi Maria,

    Regarding the 5-channel model, we don't think we can use it without figuring out why there's a mismatch between the input data dimensions and the model spec. As mentioned before, we'd expect a 5-channel model to accept 5-channel data as input. In fact, it accepts 1-, 2-, 3-, and 4-channel data instead. We therefore don't think the output has the same meaning as what we get by running the source model on a computer. Do you have any idea why the model/KL720 behaves this way?

    *By "accepting data", we mean we can get result response from KL720 with that data and no error occurs

  • Hi Mason,

    By "accepting 1, 2, 3, and 4-channel data," could you explain more about it, such as how you noticed that it didn't accept the data in the 5th channel?

    Also, if the model only accepted the first 4 channels' data, then the problem might have been caused in the toolchain. We'll need to look into it further, so if it's okay with you, could you provide the model, the full source code, and the inference data in the same sharepoint folder again so we can replicate the issue? Thank you for your help.

  • Hi Maria,

    Thanks for the reply.

    To clarify: the model doesn't accept "input data that has 5 channels"; we don't mean "the 5th channel of the input data".

    Let's say we have 5 sets of input data: a) W * H * 1, b) W * H * 2, c) W * H * 3, d) W * H * 4, and e) W * H * 5, i.e., the sets differ only in channel count. We also have a model that was trained on W * H * 5 data and then compiled into a NEF. We used the code posted earlier in this thread (repeated below, followed by a compact test-loop sketch) to feed a, b, c, d, and e into this model, and the results are:

    - Input (e) raises a timeout error (N7 or 103) every time we run "kp.inference.generic_raw_inference_bypass_pre_proc_receive"

    - Inputs (a), (b), (c), and (d) go through "generic_raw_inference_bypass_pre_proc_receive" without exception, and we can get an inference result with "kp.inference.generic_inference_retrieve_float_node"

    We've uploaded the model to the sharepoint.


    Code again (all modified from the examples):

    Data Preparation (NCH, in the middle block, sets the input data channel count)

      import math
      import numpy as np

      # Align the model input width to a multiple of 4 (4W4C8B layout)
      width_aligned = 4 * math.ceil(model_nef_descriptor.models[0].width / 4.0)
      model_input_width = model_nef_descriptor.models[0].width
      model_input_height = model_nef_descriptor.models[0].height
      model_input_channel = model_nef_descriptor.models[0].channel

      # NCH sets the channel count of the test input (1 through 5)
      in_data = np.random.randint(low=0, high=255, size=(1, 1000, NCH), dtype=np.uint8)
      in_data_norm = in_data - 128
      in_data_aligned = np.ones((width_aligned, model_input_height, NCH), dtype=np.uint8) * 255
      in_data_aligned[:in_data_norm.shape[0], :in_data_norm.shape[1], :in_data_norm.shape[2]] = in_data_norm

      in_data_aligned_buffer = in_data_aligned.tobytes()
    

    Inference

      generic_raw_image_header = kp.GenericRawBypassPreProcImageHeader(
        model_id=model_nef_descriptor.models[0].id,
        image_buffer_size=len(in_data_aligned_buffer),
        inference_number=0
      )
      kp.inference.generic_raw_inference_bypass_pre_proc_send(
        device_group=device_group,
        generic_raw_image_header=generic_raw_image_header,
        image_buffer=in_data_aligned_buffer
      )
     
      generic_raw_result = kp.inference.generic_raw_inference_bypass_pre_proc_receive(
        device_group=device_group,
        generic_raw_image_header=generic_raw_image_header,
        model_nef_descriptor=model_nef_descriptor
      )
    

    Getting Results

      # Convert each raw output node into floating-point values (CHW ordering)
      inf_node_output_list = []
      for node_idx in range(generic_raw_result.header.num_output_node):
          inference_float_node_output = kp.inference.generic_inference_retrieve_float_node(
              node_idx=node_idx,
              generic_raw_result=generic_raw_result,
              channels_ordering=kp.ChannelOrdering.KP_CHANNEL_ORDERING_CHW)
          inf_node_output_list.append(inference_float_node_output)
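
    For reproducibility, below is a compact sketch of the test loop we described above. It assumes the data-preparation snippet is wrapped into a helper (prepare_buffer is a hypothetical name) and that a timeout surfaces as kp.ApiKPException, as in the kp examples.

      import math
      import numpy as np
      import kp

      def prepare_buffer(model, nch):
          # Hypothetical helper: 4W4C8B-aligned buffer with nch channels
          width_aligned = 4 * math.ceil(model.width / 4.0)
          in_data = np.random.randint(low=0, high=255, size=(1, 1000, nch), dtype=np.uint8)
          in_data_norm = in_data - 128
          in_data_aligned = np.ones((width_aligned, model.height, nch), dtype=np.uint8) * 255
          in_data_aligned[:1, :1000, :nch] = in_data_norm
          return in_data_aligned.tobytes()

      for nch in range(1, 6):  # inputs (a) through (e)
          buffer = prepare_buffer(model_nef_descriptor.models[0], nch)
          header = kp.GenericRawBypassPreProcImageHeader(
              model_id=model_nef_descriptor.models[0].id,
              image_buffer_size=len(buffer),
              inference_number=0
          )
          try:
              kp.inference.generic_raw_inference_bypass_pre_proc_send(
                  device_group=device_group,
                  generic_raw_image_header=header,
                  image_buffer=buffer
              )
              result = kp.inference.generic_raw_inference_bypass_pre_proc_receive(
                  device_group=device_group,
                  generic_raw_image_header=header,
                  model_nef_descriptor=model_nef_descriptor
              )
              print(nch, "channels: OK,", result.header.num_output_node, "output nodes")
          except kp.ApiKPException as error:
              print(nch, "channels: timeout/error:", error)  # only (e) fails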
    
  • Hi Mason,

    Thank you for providing the information. Our team ran your code, and we think you need to change the alignment for the inference. (Please refer to the code below.)

    The reason the timeout error occurred is that the NPU input for the 5-channel model should be 1W16C8B, but the alignment in the code followed the 4W4C8B format. Trying to fit 4 * 5 * 1000 (20000) bytes of data inside the 1 * 16 * 1000 (16000) bytes the NPU expects would cause an error, since some of the data would fit but the rest wouldn't.

    Therefore, you could try adjusting the alignment according to the format.

    import enum
    import math
    import numpy as np

    class NPU_LAYOUT(enum.Enum):
        # FMT_1W16C8B: W align = 1, C align = 16
        # FMT_4W4C8B:  W align = 4, C align = 4
        FMT_1W16C8B = 0
        FMT_4W4C8B = 1

    model_input_width = model_nef_descriptor.models[0].width
    model_input_height = model_nef_descriptor.models[0].height
    model_input_channel = model_nef_descriptor.models[0].channel
    NCH = model_input_channel
    npu_layout = NPU_LAYOUT.FMT_1W16C8B

    # Align both the width and the channel count to the NPU layout
    if npu_layout == NPU_LAYOUT.FMT_1W16C8B:
        width_aligned = 1 * math.ceil(model_nef_descriptor.models[0].width / 1.0)
        channel_aligned = 16 * math.ceil(model_nef_descriptor.models[0].channel / 16.0)
    elif npu_layout == NPU_LAYOUT.FMT_4W4C8B:
        width_aligned = 4 * math.ceil(model_nef_descriptor.models[0].width / 4.0)
        channel_aligned = 4 * math.ceil(model_nef_descriptor.models[0].channel / 4.0)

    in_data = np.random.randint(low=0, high=255, size=(1, 1000, NCH), dtype=np.uint8)
    in_data_norm = in_data - 128

    # Pad the aligned buffer (both width and channels), then copy the data in
    in_data_aligned = np.ones((width_aligned, model_input_height, channel_aligned), dtype=np.uint8) * 255
    in_data_aligned[:in_data_norm.shape[0], :in_data_norm.shape[1], :in_data_norm.shape[2]] = in_data_norm

    img_buffer = in_data_aligned.tobytes()
    

    Also, please make sure that for the model with 5 channels, the NPU input is 1W16C8B.
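
    As a quick sanity check (a sketch reusing the variables from the code above), you can compare the buffer length against what the chosen layout implies before sending. For the 5-channel model under 1W16C8B, this should be 1 * 1000 * 16 = 16000 bytes, not the 20000 bytes that 4W4C8B alignment produces.

    # Expected raw buffer size for 8-bit data under the chosen layout
    expected_size = width_aligned * model_input_height * channel_aligned
    assert len(img_buffer) == expected_size, (
        "buffer is %d bytes, but the NPU expects %d" % (len(img_buffer), expected_size))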

  • Hi Maria,


    Thanks for your suggestion. I finally managed to get the KL720 to output the inference result of the 5-channel model.


    Btw, I'd like to ask when the toolchain update will be released.

  • Hi Mason,

    That's good to hear!

    We don't have an exact schedule for the toolchain update yet, but our teams are working on it.

The discussion has been closed due to inactivity. To continue with the topic, please feel free to post a new discussion.