Customized model using DevKit with KL520 and SDK

陳昱堯 · April 2022

Hi,

We used a customized CNN model (the input is a 120x120 image and the output is a single float for a regression problem) and successfully converted it from ONNX to NEF.

The test with the dongle is OK (RMSE = 3-4 for RGBA8888, RMSE = 5-6 for RGB565).

We have two questions:

1. Is it possible to use the RGBA8888 format with the development kit? We want to increase accuracy, so RGBA8888 is a better choice than RGB565, but the "tiny_yolo_v3" example only provides the RGB565 format for camera and display. Is there any way to use RGBA8888 on the dev kit?
2. Our debug messages printed from UART are shown below. The model output should be printed as roast degree:[xx.xx], but the number is always 0.00. No error seems to occur while the program is running. I don't know why the result is incorrect (maybe a data size problem?). Could you help us find the solution?

Thanks!

///// The settings are shown here: /////

#define IMG_W RGB_IMG_SOURCE_W

#define IMG_H RGB_IMG_SOURCE_H

#define IMG_CH 3

#define IMG_SIZE (IMG_W * IMG_H * IMG_CH)

#define IMG_FORMAT IMAGE_FORMAT_RGB565

#define DISPLAY_FORMAT V2K_PIX_FMT_RGB565

#ifdef MODEL_COMPILATION_WITH_ADD_NORM

#define INF_IMG_FORMAT (IMAGE_FORMAT_SUB128 | NPU_FORMAT_RGB565 | IMAGE_FORMAT_PARALLEL_PROC)

/////-----------------------------------------------------/////

The UART msg is shown below:

BOOT MODE: Manual

1. SPI

2. UART(Xmodem)

Please select boot mode[1-2]: 1

[0.000]

#########################################

[0.000] ## With key ##

[0.000] #########################################

[0.000] -> Bootup Status: 0x10000 <-

[0.000] : Power Button Reset

[0.000] [kmdw_cam_mipi_init] init 270548244

[0.000] [kmdw_cam_mipi_init] init 270548252

ncpu: Ready!

[0.000] === Versions (host mode) ===

[0.000] SPL: 1.1.1.0-build.2

[0.000] FW: 1.7.0.0-build.1229

[0.000]

=== Menu ===

[0.000] ( 1) Start Tiny Yolo

[0.000] ( 2) Stop Tiny Yolo

[0.000] ( 3) Toggle pipeline mode

[0.000] ( 4) Quit

[0.000] command >> 1

[4.327] cam 0: frame buf[0] : 0x63c12820

[4.327] cam 0: frame buf[1] : 0x63b31820

[4.327] cam 0: frame buf[2] : 0x63a50820

[4.328] cam 0: frame buf[3] : 0x6396f820

[4.328] cam 0: frame buf[4] : 0x6388e820

[4.328] cam 0: frame buf[5] : 0x637ad820

[4.328] cam 0: frame buf[6] : 0x636cc820

[4.329] cam 0: frame_info buf: 0x636a6e70, size: 7*60

[4.329] command >> [4.539]

=[00004539]= [0] camera # 1 (pre 00004334)

[4.539]

XXX Model predict from here!!!! XXX

[4.684] roast degree:[-0.00]

[4.684] Round trip 936941207: pre/npu/post: 3/25/0 ms

[4.685] -[00004685]- (inf out)

[4.690]

=[00004690]= [1] camera # 2 (pre 00004690)

[4.690]

XXX Model predict from here!!!! XXX

[4.718] roast degree:[-0.00]

[4.718] Round trip 943769139: pre/npu/post: 3/26/0 ms

[4.719] --> FPS: 29.41 (34 ms)

[4.719] -[00004719]- (inf out)

[4.724]

=[00004724]= [2] camera # 3 (pre 00004724)

[4.724]

...

Dereck Hao · April 2022

1. The SDK cannot support RGBA8888.
2. You can use some helper macros in kdpio.h to check the model output. The reference steps are as follows:

- Create your post-processing function and register it in the ncpu project using the kdpio_post_processing_register API.
- Use POSTPROC_OUTPUT_MEM_ADDR (or other macros) in your post-processing function to get the model output address, so that you can check your model output directly.

陳昱堯 · April 2022

Hi Dereck, thanks for the reply!

I tried to modify the post_process function, but the result I get is not correct.

I printed some messages and found that the status is not right.

The code is shown below:

preprocess function: It seems to work fine. I only use the 120x120 region in the center, so I crop the image, and RGB565 is changed into BGR565 because the training data is BGR888. I also subtract 128 from the 0-255 RGB values because the in-model preprocessing includes /255 and +0.5 (this works fine with the dongle).
//////////////////////////////////////////////////code start:

int preprocess_rgb565_bgr_crop_dennis(int model_id, struct kdp_image_s *image_p)

{

int in_row, in_col, in_ch, top, bottom, left, right, channel;

int input_radix, bit_shift;

uint8_t *src_p, *dst_p;

in_row = DIM_INPUT_ROW(image_p);

in_col = DIM_INPUT_COL(image_p);

in_ch = DIM_INPUT_CH(image_p);

top = RAW_CROP_TOP(image_p);

bottom = RAW_CROP_BOTTOM(image_p);

left = RAW_CROP_LEFT(image_p);

right = RAW_CROP_RIGHT(image_p);

channel = 2;

src_p = (uint8_t *)RAW_IMAGE_MEM_ADDR(image_p);

dst_p = (uint8_t *)PREPROC_INPUT_MEM_ADDR(image_p);

int data_size = sizeof(uint8_t)*in_ch;

int len_row = data_size*in_row;

int out_data_size = sizeof(uint8_t)*channel;

int out_len_row = out_data_size*(right-left);

for (int y = 0; y < in_col; y++)

{

for (int x = 0; x < in_row; x++)

{

if ( (x > left) && (x <= right) && (y > top) && (y <= bottom) )

{

uint16_t rgb565 = (uint16_t)*(src_p+len_row*y+data_size*x);

uint8_t r = (uint8_t)(rgb565 >> 11) << 3;

uint8_t g = (uint8_t)(rgb565 >> 5) << 2;

uint8_t b = (uint8_t)(rgb565 << 3);

uint16_t bgr565 = ((r-128)>>3) | ((g-128)<<3) | ((b-128) << 8);

*(dst_p+out_len_row*(y-top-1)+out_data_size*(x-left-1)) = bgr565;

}

return 0;

//////////////////////////////////////////////////:code end
postprocess function: My model output is a single float. I saw that the example uses scale and div to get a float, so I copied that method. However, the result does not change from image to image, and the status seems wrong.
//////////////////////////////////////////////////code start:

int post_dennis(int model_id, struct kdp_image_s *image_p)

{

struct dennis_post_globals_s *gp = get_dennis_gp();

uint8_t *result_p;

int div;

float scale;

int8_t *src_p = (int8_t *)POSTPROC_OUTPUT_MEM_ADDR(image_p);

/* Convert to float */

scale = *(float *)&POSTPROC_OUT_NODE_SCALE(image_p);

div = 1 << POSTPROC_OUT_NODE_RADIX(image_p);

gp->temp.roast_degree = (float)*src_p;

gp->temp.roast_degree = do_div_scale(gp->temp.roast_degree, div, scale);

DSG("model result:[%.2f]\n",(gp->temp.roast_degree));

result_p = (uint8_t *)(POSTPROC_RESULT_MEM_ADDR(image_p));

memcpy(result_p, &(gp->temp), sizeof(struct dennis_post_globals_s));

return sizeof(struct dennis_post_globals_s);

}

//////////////////////////////////////////////////:code end
debug message:
//////////////////////////////////////////////////message start:

[122.240] Round trip -1321960007: pre/npu/post: 1/27/2 ms

[122.240] --> FPS: 12.35 (81 ms)

PreProcessing 0!

Run NPU!!

[122.288]

=[00122288]= [2] camera # 1471 (pre 00122288)

[122.288]

XXX Model predict from here!!!! XXX

Run Post Process!

model result:[10.02]

[122.318] [INFO] IMAGE STATE error, status:1

[122.318] -[00122318]- (inf out)

[122.320] roast degree:[10.02]

[122.321] Round trip -1305760728: pre/npu/post: 0/28/2 ms

[122.321] --> FPS: 12.35 (81 ms)

PreProcessing 0!

Run NPU!!

[122.369]

=[00122369]= [3] camera # 1472 (pre 00122369)

[122.369]

XXX Model predict from here!!!! XXX

Run Post Process!

model result:[10.02]

[122.399] [INFO] IMAGE STATE error, status:1

[122.399] -[00122399]- (inf out)

[122.401] roast degree:[10.02]

[122.402] Round trip -1289560365: pre/npu/post: 1/27/2 ms

[122.402] --> FPS: 12.35 (81 ms)

PreProcessing 0!

Run NPU!!

[122.450]

=[00122450]= [0] camera # 1473 (pre 00122450)

[122.450]

XXX Model predict from here!!!! XXX

Run Post Process!

model result:[10.02]

[122.480] [INFO] IMAGE STATE error, status:1

[122.480] -[00122480]- (inf out)

[122.482] roast degree:[10.02]

[122.483] Round trip -1273363274: pre/npu/post: 1/27/2 ms

[122.483] --> FPS: 12.35 (81 ms)

PreProcessing 0!

Run NPU!!

[122.531]

=[00122531]= [1] camera # 1474 (pre 00122531)

[122.531]

XXX Model predict from here!!!! XXX

Run Post Process!

model result:[10.02]

[122.561] [INFO] IMAGE STATE error, status:1

[122.561] -[00122561]- (inf out)

[122.563] roast degree:[10.02]

[122.564] Round trip -1257160903: pre/npu/post: 0/27/3 ms

[122.564] --> FPS: 12.35 (81 ms)

PreProcessing 0!

Run NPU!!

[122.612]

=[00122612]= [2] camera # 1475 (pre 00122612)

[122.612]

XXX Model predict from here!!!! XXX

Run Post Process!

model result:[10.02]

[122.642] [INFO] IMAGE STATE error, status:1

[122.642] -[00122642]- (inf out)

[122.644] roast degree:[10.02]

//////////////////////////////////////////////////:message end
//////////////////////////////////////////////////The status is printed by the following function:

static int tiny_yolo_run_image(uint32_t app_id, struct kapp_img_run_s *img_run_p)

{

int status;

bool is_dme = false;

if (img_run_p == NULL)

return -1;

//check where the model is stored

//is_dme = kmdw_model_get_location();

kmdw_model_config_img(&img_run_p->img_cfg, &img_run_p->crop_box, &img_run_p->pad_values, img_run_p->ext_param);

kmdw_model_config_result(img_run_p->evt_id, img_run_p->evt_flag);

//status = kmdw_model_run("kapp_tiny_yolo", img_run_p->out, TINY_YOLO_V3_224_224_3, is_dme);

status = kmdw_model_run("dennis_CNN", img_run_p->out, CUSTOMER_MODEL_1, is_dme);

if (status == KMDW_MODEL_RUN_RC_ABORT) {

info_msg("[INFO] Got abort request\n");

return KAPP_ABORT;

} else if(status == KMDW_MODEL_RUN_RC_ERROR) {

info_msg("[INFO] Run Model error\n");

return KAPP_ERR;

} else if(status != IMAGE_STATE_DONE) {

info_msg("[INFO] IMAGE STATE error, status:%d\n",status);

return KAPP_ERR;

}

return KAPP_OP_OK;

}

//////////////////////////////////////////////////
Please help me find a way to get the correct result and solve the status problem. Thank you!

Dereck Hao · April 2022

Did you check that these parameters are correct? (in_row, in_col, in_ch, top, bottom, left, right, channel, ...)

The KL520 model format order is wch, and w is aligned to 16 (see FAQ 4 at the link below for reference).

End to End Simulator - Document Center (kneron.com)

陳昱堯 · April 2022

Hi Dereck, thanks for your hint and the link!

I've figured out the solution!

I also changed the application. It now shows the camera video with a red rectangle (the measurement range).

When the user enters "3" at the command prompt, the dev kit measures the value (predicted by the model).

During debugging, I guessed that the image format was the most important thing.

I printed every parameter and finally found out that the raw format and the input format are different!

(I feed the data in B, G, R order because I trained the model with BGR888 input.)

The corrected preprocess, postprocess, and UART output are shown below:

(If there is any mistake, please let me know. Thank you!)

///////////////////////////////////////////////////////////////////////////// Preprocess

int preprocess_rgb565_bgr_crop_dennis(int model_id, struct kdp_image_s *image_p)

{

int in_row, in_col, in_ch, top, bottom, left, right, channel, raw_row, raw_col, raw_ch;

int input_radix, bit_shift;

uint8_t *src_p, *dst_p;

int32_t *len_p;

raw_row = RAW_INPUT_ROW(image_p);

raw_col = RAW_INPUT_COL(image_p);

top = RAW_CROP_TOP(image_p);

bottom = RAW_CROP_BOTTOM(image_p);

left = RAW_CROP_LEFT(image_p);

right = RAW_CROP_RIGHT(image_p);

raw_ch = 2;

in_row = DIM_INPUT_ROW(image_p);

in_col = DIM_INPUT_COL(image_p);

in_ch = DIM_INPUT_CH(image_p);

//DSG("data:%d,%d,%d,%d,%d,%d,%d\n",top,bottom,left,right,raw_row,raw_col,in_ch);

src_p = (uint8_t *)RAW_IMAGE_MEM_ADDR(image_p);

dst_p = (uint8_t *)PREPROC_INPUT_MEM_ADDR(image_p);

len_p = (int32_t *)&PREPROC_INPUT_MEM_LEN(image_p);

int data_size = sizeof(uint8_t)*raw_ch;

int len_row = data_size*raw_col;

int out_data_size = sizeof(uint8_t)*in_ch;

int out_len_row = out_data_size*in_col;

int num = 0;

for (int y = 0; y < raw_row; y++)

{

for (int x = 0; x < raw_col; x++)

{

if ( (x > left) && (x <= right) && (y > top) && (y <= bottom) )

{

uint16_t rgb565 = *(uint16_t *)(src_p+len_row*y+data_size*x);

uint8_t r = (uint8_t)(rgb565 >> 11) << 3;

uint8_t g = (uint8_t)(rgb565 >> 5) << 2;

uint8_t b = (uint8_t)(rgb565 << 3);

//uint16_t bgr565 = ((r-128)>>3) | ((g-128)<<3) | ((b-128) << 8);

//*(dst_p+out_len_row*(y-top-1)+out_data_size*(x-left-1)) = bgr565;

*(dst_p+out_len_row*(y-top-1)+out_data_size*(x-left-1)) = b-128;

*(dst_p+out_len_row*(y-top-1)+out_data_size*(x-left-1)+1) = g-128;

*(dst_p+out_len_row*(y-top-1)+out_data_size*(x-left-1)+2) = r-128;

num += 3;

} // end of crop-box check

} // end of x loop

} // end of y loop

//DSG("num:%d\n",num);

*len_p = in_row*in_col*in_ch;

//DSG("pre_len,%d\n",PREPROC_INPUT_MEM_LEN(image_p));

return 0;

}
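
For readers following along, the per-pixel step of the preprocess above (unpack RGB565, reorder to B, G, R, subtract 128) can be isolated as a small helper. This is only a sketch: it assumes the common RGB565 bit layout (R in bits 15-11, G in bits 10-5, B in bits 4-0), and the type bgr_s8_t and the function rgb565_to_bgr_sub128 are hypothetical names, not part of the Kneron SDK.

```c
#include <stdint.h>

/* Hypothetical helper type: one pixel in B, G, R order, each channel shifted by -128. */
typedef struct {
    int8_t b;
    int8_t g;
    int8_t r;
} bgr_s8_t;

/* Unpack one RGB565 pixel into 8-bit channels and subtract 128 from each.
 * Assumed bit layout: R = bits 15..11, G = bits 10..5, B = bits 4..0. */
static bgr_s8_t rgb565_to_bgr_sub128(uint16_t px)
{
    /* Expand 5/6/5-bit fields to 8 bits by shifting left (low bits stay 0). */
    uint8_t r = (uint8_t)(((px >> 11) & 0x1F) << 3);
    uint8_t g = (uint8_t)(((px >> 5) & 0x3F) << 2);
    uint8_t b = (uint8_t)((px & 0x1F) << 3);

    bgr_s8_t out;
    out.b = (int8_t)((int)b - 128);
    out.g = (int8_t)((int)g - 128);
    out.r = (int8_t)((int)r - 128);
    return out;
}
```

Masking each field before shifting keeps the channels independent of integer promotion and truncation, which is easy to get wrong when the shifts are written inline.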

///////////////////////////////////////////////////////////////////////////// Postprocess

int post_dennis(int model_id, struct kdp_image_s *image_p)

{

// model output dim:(H, C, W_aligned)

struct dennis_post_globals_s *gp = get_dennis_gp();

float *result_p;

int div;

float scale;

int8_t *src_p = (int8_t *)MODEL_OUTPUT_MEM_ADDR(image_p);

// Convert to float

scale = *(float *)&POSTPROC_OUT_NODE_SCALE(image_p);

div = 1 << POSTPROC_OUT_NODE_RADIX(image_p);

gp->temp.roast_degree = (float)*src_p;

gp->temp.roast_degree = do_div_scale(gp->temp.roast_degree, div, scale);

//DSG("output:[%.2f]\n",gp->temp.roast_degree);

//DSG("scale:[%.2f]\n",scale);

//DSG("div:[%d]\n",div);

result_p = (float *)(POSTPROC_RESULT_MEM_ADDR(image_p));

*result_p = gp->temp.roast_degree;

int32_t model_len = MODEL_OUTPUT_MEM_LEN(image_p);

//DSG("model_len:[%d]\n",model_len); // w aligned to 16 bytes

//DSG("result:[%.2f]\n",*result_p);

return 0;

}

///////////////////////////////////////////////////////////////////////////// UART result

BOOT MODE: Manual

1. SPI

2. UART(Xmodem)

Please select boot mode[1-2]: 1

[0.000]

#########################################

[0.000] ## With key ##

[0.000] #########################################

[0.000] -> Bootup Status: 0x10000 <-

[0.000] : Power Button Reset

[0.000] [kmdw_cam_mipi_init] init 270547368

[0.000] [kmdw_cam_mipi_init] init 270547376

ncpu: Ready!

[0.000] === Versions (host mode) ===

[0.000] SPL: 1.1.1.0-build.2

[0.000] FW: 1.7.0.0-build.1229

[0.000]

=== Menu ===

[0.000] ( 1) Start Camera

[0.000] ( 2) Stop Camera

[0.000] ( 3) measure coffee degree

[0.000] ( 4) Quit

[0.000] command >> 1

[3.272] cam 0: frame buf[0] : 0x63c12850

[3.272] cam 0: frame buf[1] : 0x63b31850

[3.272] cam 0: frame buf[2] : 0x63a50850

[3.272] cam 0: frame buf[3] : 0x6396f850

[3.273] cam 0: frame buf[4] : 0x6388e850

[3.273] cam 0: frame buf[5] : 0x637ad850

[3.273] cam 0: frame buf[6] : 0x636cc850

[3.273] cam 0: frame_info buf: 0x636a6ea0, size: 7*60

[3.274] command >> 3

[8.773] command >> [8.810] do measure!

[8.940] roast degree:[52.60]

[8.941] Round trip 1788148522: pre/npu/post: 25/25/0 ms

3

[13.286] command >> [13.327] do measure!

[13.457] roast degree:[28.39]

[13.458] Round trip -1603416481: pre/npu/post: 25/25/0 ms

3

[17.999] do measure!

[17.999] command >> [18.128] roast degree:[53.44]

[18.129] Round trip -669224694: pre/npu/post: 24/26/0 ms

3

[21.592] command >> [21.612] do measure!

[21.742] roast degree:[52.60]

[21.743] Round trip 53568456: pre/npu/post: 24/26/0 ms

3

[28.859] do measure!

[28.860] command >> [28.989] roast degree:[45.09]

[28.990] Round trip 1502966045: pre/npu/post: 24/26/0 ms

2

[46.051] command >> 4

陳昱堯 · April 2022

I found that the preprocess function was wrong, but there is no detailed comment on the raw data format in the SDK.

Therefore, I tried to debug and changed the code as below. I use NPU_FORMAT_RGBA8888, but my model only uses B, G, R (three channels), so I set A to 0. Is that correct? What exactly is the raw image format? (In the code I use H, C, W.)

After many wrong answers, I finally got a version with "some" right answers (from the UART messages), as shown below.

The correct answer should be 45.225, but I get 10.02 many times. Sometimes it's 45.

Is the result always produced only after the NPU has finished processing? Or should I give the NPU some time to process?

////////////////////////////////////////// preprocess:

src_p = (uint8_t *)RAW_IMAGE_MEM_ADDR(image_p);

dst_p = (uint8_t *)PREPROC_INPUT_MEM_ADDR(image_p);

int len_row = sizeof(uint8_t)*raw_col;

int data_row = len_row*raw_ch;

int out_len_row = sizeof(uint8_t)*in_col;

int out_data_row = out_len_row*in_ch;

for (int y = 0; y < raw_row; y++)

{

for (int x = 0; x < raw_col; x++)

{

if ( (x >= left) && (x < right) && (y >= top) && (y < bottom) )

{

uint8_t a = *(uint8_t *)(src_p+data_row*y+len_row*0+x);

uint8_t b = *(uint8_t *)(src_p+data_row*y+len_row*1+x);

uint8_t g = *(uint8_t *)(src_p+data_row*y+len_row*2+x);

uint8_t r = *(uint8_t *)(src_p+data_row*y+len_row*3+x);

*(dst_p+out_data_row*(y-top)+out_len_row*0+(x-left)) = 0;

*(dst_p+out_data_row*(y-top)+out_len_row*1+(x-left)) = r;

*(dst_p+out_data_row*(y-top)+out_len_row*2+(x-left)) = g;

*(dst_p+out_data_row*(y-top)+out_len_row*3+(x-left)) = b;

}

return 0;

//////////////////////////////////////////////// UART output to PC

[505.309] command >> [505.341] do measure!

[505.425] roast degree:[10.02]

[505.426] Round trip -1994555333: pre/npu/post: 28/26/0 ms

3

[506.675] command >> [506.706] do measure!

[506.786] roast degree:[52.60]

[506.786] Round trip -1722543648: pre/npu/post: 28/26/0 ms

3

[507.584] command >> [507.585] do measure!

[507.664] roast degree:[10.02]

[507.665] Round trip -1546757264: pre/npu/post: 28/26/0 ms

3

[508.318] command >> [508.386] do measure!

[508.466] roast degree:[45.09]

[508.466] Round trip -1386545015: pre/npu/post: 29/25/0 ms

3

[509.070] command >> [509.108] do measure!

[509.188] roast degree:[10.02]

[509.188] Round trip -1242145063: pre/npu/post: 29/25/0 ms

3

[509.823] command >> [509.825] do measure!

[509.909] roast degree:[10.02]

[509.910] Round trip -1097759773: pre/npu/post: 28/26/0 ms

Ethon Lin · May 2022

Hello,

1. If you are using a three-channel (R, G, B) model with RGBA8888 input, it's OK to set channel A = 0 to fit the format. For the order of the RGBA data, please refer to the comment at this link: https://www.kneron.com/forum/discussion/comment/760/#Comment_760
2. On KL520, the raw output is in H, C, W format. You can find the explanation in the 4th FAQ: http://doc.kneron.com/docs/#toolchain/python_app/app_flow_manual/#faq
3. Depending on the image size and the depth of the model, both image transmission and inference take some time. But if you aren't using parallel mode, you should get the results in sequence.
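
Point 2 can be made concrete with a small offset calculation. The sketch below assumes 1-byte elements stored in H, C, W order with each row of W padded up to a multiple of 16, as the FAQ referenced in this thread describes. The names align16 and hcw_offset are hypothetical helpers, not SDK functions.

```c
#include <stddef.h>

/* Round a width up to the next multiple of 16 (assumed KL520 alignment). */
static size_t align16(size_t width)
{
    return (width + 15u) & ~(size_t)15u;
}

/* Byte offset of element (h, c, w) in a buffer laid out as
 * [height][channel][width padded to 16], one byte per element. */
static size_t hcw_offset(size_t h, size_t c, size_t w,
                         size_t channels, size_t width)
{
    return (h * channels + c) * align16(width) + w;
}
```

Under these assumptions a single-value output (1 x 1 x 1) still occupies one padded row of 16 bytes, with the value itself at offset 0.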

陳昱堯 · May 2022

Hi,

I wrote a version with raw output in R->G->B->A order (A set to 0), and the for loop is based on the H, C, W format.
The model result is sometimes correct (error < 3) and sometimes in the range 10-20, but I didn't see this when using the dongle.
I guess the problem is the third point from Ethon Lin. I used parallel mode and got my result. However, when I changed the parameter in the scpu project from [IMAGE_FORMAT_PARALLEL_PROC] to [IMAGE_FORMAT_RAW_OUTPUT], my postprocessing stopped working, and I only get 0.00 because my result callback is not working. Could you explain how [IMAGE_FORMAT_RAW_OUTPUT] works? What code should I write to get the results in sequence? Or how can I make sure [IMAGE_FORMAT_PARALLEL_PROC] gives me the correct answer for every input image (maybe a longer timeout or something)?

Ethon Lin · May 2022

Hello,

IMAGE_FORMAT_PARALLEL_PROC and IMAGE_FORMAT_RAW_OUTPUT are two different things.

You can set or clear the IMAGE_FORMAT_PARALLEL_PROC bit to enable or disable parallel mode, for instance:

img_run[i].img_cfg.image_format |= IMAGE_FORMAT_PARALLEL_PROC; // enable

img_run[i].img_cfg.image_format &= ~IMAGE_FORMAT_PARALLEL_PROC; // disable

The IMAGE_FORMAT_RAW_OUTPUT bit makes the NPU output the raw feature map (fixed point) without running the post-processing function.

陳昱堯 · May 2022

Hi Ethon,

Thanks for your advice!

I found the comment describing the raw output format:

/* raw output format:

* ([output_num][height_outnode1][channel_outnode1][width_outnode1][radix_outnode1][scale_outnode1][h2][c2][w2][r2][s2][...]

* [h_n][c_n][w_n][r_n][s_n][fixed_point_datanode1][fixed_point_datanode2][...][fixed_point_datanodeN])

* 1 byte for each fixed-point data. 4 bytes for each of other data.

* fixed-point data is converted to float data with formula of fp_value / (scale * (2 ^ radix)).

*/

This helped me get the model result on the scpu. (With raw output I don't use postprocessing. Instead, I convert the fixed-point number to float on the scpu.)
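
A minimal sketch of that conversion, based only on the comment quoted above, might look like the following. It assumes native byte order, that the header fields are 32-bit integers except for scale (a 32-bit float), and that the fixed-point data of the first node starts right after the last header. The function names are hypothetical and the layout should be checked against the SDK before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAW_NODE_HDR_BYTES 20u /* h, c, w, radix, scale: 4 bytes each */

/* fp_value / (scale * 2^radix), as stated in the SDK comment. */
static float raw_fixed_to_float(int8_t fp, int32_t radix, float scale)
{
    float pow2 = 1.0f;
    for (int32_t i = 0; i < radix; i++) pow2 *= 2.0f;
    for (int32_t i = 0; i > radix; i--) pow2 *= 0.5f;
    return (float)fp / (scale * pow2);
}

/* Read the first value of output node 1 from a raw output buffer.
 * Returns 0 on success, -1 if the buffer is too short or has no nodes. */
static int raw_first_value(const uint8_t *buf, size_t len, float *out)
{
    uint32_t output_num;
    int32_t radix;
    float scale;
    int8_t fp;
    size_t data_off;

    if (len < 4) return -1;
    memcpy(&output_num, buf, 4);
    if (output_num == 0 || output_num > (len - 4) / RAW_NODE_HDR_BYTES) return -1;

    /* Fixed-point data is assumed to follow all node headers. */
    data_off = 4 + (size_t)output_num * RAW_NODE_HDR_BYTES;
    if (len <= data_off) return -1;

    memcpy(&radix, buf + 4 + 12, 4); /* 4th field of node 1 */
    memcpy(&scale, buf + 4 + 16, 4); /* 5th field of node 1 */
    memcpy(&fp, buf + data_off, 1);
    *out = raw_fixed_to_float(fp, radix, scale);
    return 0;
}
```

For example, with radix = 2, scale = 0.5 and a fixed-point byte of 20, the formula gives 20 / (0.5 * 4) = 10.0.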

However, there is still a problem. My UART output is shown below. The right answer should be 5x.xx, but the result 14.11 keeps coming out. Is the result always generated by the NPU? Or do I get a wrong result because the NPU has not finished yet? How should I solve this problem (for example, by waiting for the NPU to complete)?

Thank you for your patience!!!

[127.180] do measure!

[127.180] command >> [127.314] roast degree:[14.11]

[127.314] Round trip -307135142: pre/npu/post: 28/26/0 ms

3

[127.905] do measure!

[127.905] command >> [128.039] roast degree:[55.59]

[128.039] Round trip -162115811: pre/npu/post: 29/26/0 ms

3

[128.555] command >> [128.575] do measure!

[128.709] roast degree:[14.11]

[128.709] Round trip -28135037: pre/npu/post: 28/26/0 ms

3

[129.254] do measure!

[129.254] command >> [129.388] roast degree:[14.11]

[129.388] Round trip 107680164: pre/npu/post: 29/26/0 ms

3

[130.013] command >> [130.058] do measure!

[130.192] roast degree:[53.10]

[130.192] Round trip 268483120: pre/npu/post: 28/26/0 ms

3

[130.765] do measure!

[130.765] command >> [130.899] roast degree:[53.10]

[130.899] Round trip 409875537: pre/npu/post: 29/26/0 ms

3

[131.460] do measure!

[131.460] command >> [131.594] roast degree:[53.10]

[131.594] Round trip 548881982: pre/npu/post: 29/25/0 ms

3

[132.110] command >> [132.130] do measure!

[132.264] roast degree:[14.11]

[132.264] Round trip 682880716: pre/npu/post: 29/26/0 ms

3

[132.758] command >> [132.800] do measure!

[132.934] roast degree:[54.76]

[132.934] Round trip 816861546: pre/npu/post: 28/26/0 ms

3

[133.404] do measure!

[133.404] command >> [133.538] roast degree:[14.11]

[133.538] Round trip 937659278: pre/npu/post: 29/26/0 ms

3

[134.196] command >> [134.208] do measure!

[134.342] roast degree:[14.11]

[134.342] Round trip 1098473014: pre/npu/post: 28/26/0 ms

3

[134.932] do measure!

[134.932] command >> [135.066] roast degree:[14.11]

[135.066] Round trip 1243278742: pre/npu/post: 28/26/0 ms

3

[136.128] command >> [136.138] do measure!

[136.272] roast degree:[55.59]

[136.272] Round trip 1484456600: pre/npu/post: 29/26/0 ms

3

[136.968] do measure!

[136.968] command >> [137.102] roast degree:[53.93]

[137.102] Round trip 1650467550: pre/npu/post: 29/25/0 ms

Ethon Lin · May 2022

Hello,

You mentioned that you get the correct result with the dongle but abnormal output in host mode on the 96 board. Is my understanding right?

I wonder whether the two results use different input image formats on each platform (MIPI with RGB565 in host mode, but RGBA8888 with the dongle on PLUS). Maybe you can try RGB565 on the dongle to check the accuracy instead.

陳昱堯 · May 2022

Hello Ethon,

Yes, you are right. I get correct results with the dongle (both RGB565 and RGBA8888 are correct), but sometimes wrong results on the 96 board.

I don't know whether I understand the SDK code correctly. Maybe this is the point.

In the settings below, IMG_CH is 3 for RGB565. Does this mean the preprocess function on the NCPU sees the raw data as three bytes per pixel (R, G, B)?

Or does it mean the preprocess function on the NCPU sees two bytes (RGB565) plus one byte (A) per pixel?

Or should I set IMG_CH to 2 for RGB565?

And here is the code for the format: #define INF_IMG_FORMAT (NPU_FORMAT_RGBA8888 | IMAGE_FORMAT_RAW_OUTPUT)

Does this mean the preprocess function sees the raw data in RGBA8888 format (4 bytes per pixel)?

Or does it mean the output of the preprocess function for the NPU is RGBA8888? (Actually, I already feed the model with this format.)

#define IMG_W RGB_IMG_SOURCE_W

#define IMG_H RGB_IMG_SOURCE_H

#define IMG_CH 3

#define IMG_SIZE (IMG_W * IMG_H * IMG_CH)

#define IMG_FORMAT IMAGE_FORMAT_RGB565

#define DISPLAY_FORMAT V2K_PIX_FMT_RGB565

#ifdef MODEL_COMPILATION_WITH_ADD_NORM

#define INF_IMG_FORMAT (IMAGE_FORMAT_SUB128 | NPU_FORMAT_RGB565 | IMAGE_FORMAT_PARALLEL_PROC)

#else

#ifdef MODEL_COMPILATION_WITH_ADD_NORM_DENNIS

#define INF_IMG_FORMAT (NPU_FORMAT_RGBA8888 | IMAGE_FORMAT_RAW_OUTPUT)

//#define INF_IMG_FORMAT (NPU_FORMAT_RGB565 | IMAGE_FORMAT_RAW_OUTPUT)

#else

#define INF_IMG_FORMAT (IMAGE_FORMAT_RIGHT_SHIFT_ONE_BIT | NPU_FORMAT_RGB565 | IMAGE_FORMAT_PARALLEL_PROC)

#endif

Dereck Hao · May 2022

1. IMG_CH is 3 for RGB565. This means the preprocess function sees the raw data as three bytes per pixel (R, G, B).
2. "Does this mean the preprocess function sees raw data as RGBA8888 format? (4 bytes as a pixel)" -> Yes.
3. IMAGE_FORMAT_RAW_OUTPUT means the post process is bypassed and the feature map is output directly.

陳昱堯 · May 2022

Thank you for the reply!

I finally have a version that gives normal results. The error is a little bigger than with the dongle, but that is OK because the camera is different. It should be solved by a better-designed model. At least the result is stable and reasonable now!

Here are some notes for myself and for people who run into the same problem when using the SDK:

1. The crop box is defined not by coordinates but by distances to the edges:

img_run[i].crop_box.top = 180; // pixels to top

img_run[i].crop_box.bottom = 180; // pixels to bottom

img_run[i].crop_box.left = 260; // pixels to left

img_run[i].crop_box.right = 260; // pixels to right

2. Preprocessing can use the default settings. There is no need to write code for the NCPU!

#define INF_IMG_FORMAT (IMAGE_FORMAT_SUB128 | NPU_FORMAT_RGB565 | IMAGE_FORMAT_PARALLEL_PROC)

IMAGE_FORMAT_SUB128 performs r-128, g-128, b-128.

NPU_FORMAT_RGB565 tells the preprocess to convert RGB565 into RGB, so there is no need to do the conversion on the NCPU again.

Therefore, the settings are:

#define IMG_W RGB_IMG_SOURCE_W

#define IMG_H RGB_IMG_SOURCE_H

#define IMG_CH 3

#define IMG_SIZE (IMG_W * IMG_H * 2)

#define IMG_FORMAT IMAGE_FORMAT_RGB565

#define DISPLAY_FORMAT V2K_PIX_FMT_RGB565

3. If you use IMAGE_FORMAT_PARALLEL_PROC and do_parallel=1, the output result should be converted by the NCPU postprocess (see the code inside).

If you use IMAGE_FORMAT_RAW_OUTPUT and do_parallel=0, the raw output follows this format:

/* raw output format:

* ([output_num][height_outnode1][channel_outnode1][width_outnode1][radix_outnode1][scale_outnode1][h2][c2][w2][r2][s2][...]

* [h_n][c_n][w_n][r_n][s_n][fixed_point_datanode1][fixed_point_datanode2][...][fixed_point_datanodeN])

* 1 byte for each fixed-point data. 4 bytes for each of other data.

* fixed-point data is converted to float data with formula of fp_value / (scale * (2 ^ radix)).

*/

4. [Important] Without parallel processing, there is no need to write NCPU code!

5. In this version of the project, the preprocess is SUB128 and the in-model processing is [/255] and [+0.5], so the final input is in the range 0-1.
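
Notes 1 and 5 can be sanity-checked with a little arithmetic. The sketch below assumes a 640x480 camera frame; the thread never states the values of RGB_IMG_SOURCE_W and RGB_IMG_SOURCE_H, so 640x480 is a guess that happens to reproduce the 260 and 180 margins for a 120x120 centered crop. Both function names are hypothetical.

```c
#include <stdint.h>

/* Margin on each side when a crop of crop_len pixels is centered
 * inside a source dimension of src_len pixels (note 1). */
static int centered_margin(int src_len, int crop_len)
{
    return (src_len - crop_len) / 2;
}

/* Value the first model layer effectively sees for one raw byte (note 5):
 * SUB128 in the preprocess, then /255 and +0.5 inside the model. */
static float model_input(uint8_t raw)
{
    return ((float)raw - 128.0f) / 255.0f + 0.5f;
}
```

With these assumptions, centered_margin(640, 120) is 260 and centered_margin(480, 120) is 180, while raw bytes 0 and 255 map to roughly -0.002 and 0.998, which is close to the 0-1 range mentioned in note 5.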

Dereck Hao · May 2022

Congrats on solving the problem!! and thanks for organizing these notes.
