Get the same result for all samples from the NPU of NUCLEO-N657X0-Q

autoome · 2026-05-05

Hello ST Community,

I am trying to run a TFLite model on the Neural-ART NPU of the NUCLEO-N657X0-Q board using STM32CubeIDE + STM32CubeMX and STM32Cube AI Studio. I followed the guide here: https://community.st.com/t5/stm32-mcus/how-to-build-an-ai-application-from-scratch-on-the-nucleo-n657x0/ta-p/828502 but with my own TFLite model.

Model details:

Architecture: 5x FullyConnected layers (41→32→32→16→8→1) with ReLU activations
Input: int8[1,41], scale=0.0717, zero_point=-24
Output: int8[1,1], scale=0.0354, zero_point=1
Task: binary classification

Problem: I have 31 test samples. Every sample produces the same output value 53 regardless of input. I verified the inputs are correctly quantized and different for each sample.

What I investigated: Through per-epoch debugging I found the following epoch flow:

EP0 (HW): reads input
EP1 (HW): continuation
EP2 (SW): DequantizeLinear
EP3 (SW): Conv float → writes correct floats
EP4 (hybrid): outputs ALL ZEROS
EP5 (HW): reads zeros → always produces same output 53
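The "all zeros" check on an epoch's output can be sketched with a small helper like the one below. This is only an assumption about how such a check might look: the helper name is hypothetical, and the address and size of each intermediate buffer would have to be taken from the generated network code, which is not shown here. It looks at raw bytes, so for a quantized tensor a raw 0 is not the same as the real value 0 unless the zero_point is 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the non-zero bytes in buf[0..len). A result of 0 means the
 * buffer is entirely zero. Hypothetical debugging helper; on target,
 * the D-cache for this range may need to be invalidated first so the
 * CPU does not read stale cached data. */
static size_t count_nonzero_bytes(const void *buf, size_t len)
{
    const volatile uint8_t *p = (const volatile uint8_t *)buf;
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (p[i] != 0) {
            ++n;
        }
    }
    return n;
}
```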

I also ran the same TFLite model and dataset in STM32Cube AI Studio and still got the same output value 53 regardless of input.

This is my inference code:

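int aiRun(void) {
    LL_ATON_RT_RetValues_t ret = LL_ATON_RT_DONE;

    /* Reset the network instance so it is ready for a new inference. */
    LL_ATON_RT_Reset_Network(&NN_Instance_network);

    /* Attach the user buffers for input 0 (int8[1,41]) and output 0 (int8[1,1]). */
    LL_ATON_Set_User_Input_Buffer_network(0, stai_input_data, 41);
    LL_ATON_Set_User_Output_Buffer_network(0, stai_output_data, 1);

    /* Write the input back from the D-cache to memory so the NPU reads current data. */
    SCB_CleanDCache_by_Addr((uint32_t*)stai_input_data, 64);
    /* Drop any stale cached copy of the output buffer before the run. */
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);

    /* Run epoch blocks until the inference is done, sleeping while the runtime waits for an event. */
    do {
        ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_network);
        if (ret == LL_ATON_RT_WFE)
            LL_ATON_OSAL_WFE();
    } while (ret != LL_ATON_RT_DONE);

    /* Invalidate again so the CPU reads the output from memory, not from the cache. */
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);
    return 0;
}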

Environment:

Board: NUCLEO-N657X0-Q
ST Edge AI Studio: v4.0.0
OS: Ubuntu 24

Thank you very much in advance for any advice!

Julian E. · 2026-05-05

Hi @autoome,

First of all, we are working on updating this tutorial. Hopefully, it will come out soon.

Then, when you run a "validate on target" in STM32Cube AI Studio, you can download the outputs from the table with the metrics. Are all the outputs the same?

This would help us understand whether the issue is at the model level or at the embedded code level.

You can also download an application template from STM32Cube AI Studio; this could help you.

Have a good day,

Julian

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

autoome · 2026-05-05

Hi @Julian E.

Thank you very much for your reply!

Yes, when I tried the "validate on target" in AI Studio, I got the same output for every run. Here's network_val_c_outputs.csv:

And here's network_val_m_outputs.csv:

Thank you very much for the help!

Julian E. · 2026-05-05

Hi @autoome,

Ok, so it seems that the issue is coming from the converted model and not your implementation.

It may be a bug in the ST Edge AI Core.

Could you run two "validate on target" sessions, one with the NPU and one without it, and send me the metrics you get, in particular the COS?

Comparing the two will help us see whether the issue comes from the NPU compiler part, for example if you get a good COS without the NPU but a bad one with it.

If you get a bad COS in both cases, then it is another story.

Have a good day,

Julian

autoome · 2026-05-11

Hi @Julian E.

Thank you very much for your reply. I ran two validations on target, and the one without the NPU looks good.

Here's the one with NPU:

[image: outputcos]

And here's the one without NPU, this looks normal:

[image: cos]

Thanks again for any advice!

Julian E. · 2026-05-26

Hi @autoome,

Sorry for the very late answer.

In this case, it seems to be a bug in the NPU compiler.

There is pretty much nothing that you can do on your own.

One possibility would be to add outputs at different places in the model to check where the COS starts to diverge.

(For example, add an output in the middle of your model and check the COS there. If it is good, then the problematic part of the model is after this output; otherwise it is before it, and so on.)

There is no guarantee that a single layer is the cause; it could be a pattern of multiple layers under certain conditions that creates the issue. What I mean is that you may not be able to identify the real cause of the issue.
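As a rough sketch of that procedure: once you have one COS value per added output, ordered from the model input towards the model output, the first value that falls below some threshold marks where the divergence becomes visible. The helper name and the threshold are hypothetical, and the COS values themselves would come from your own validation runs.

```c
#include <stddef.h>

/* Returns the index of the first added output whose COS is below
 * the threshold, or -1 if every value is at or above it.
 * cos_per_output must be ordered from model input to model output.
 * Hypothetical helper for illustration only. */
static int first_diverging_output(const float *cos_per_output,
                                  size_t count, float threshold)
{
    for (size_t i = 0; i < count; ++i) {
        if (cos_per_output[i] < threshold) {
            return (int)i;
        }
    }
    return -1;
}
```

The layers between the last good output and the first bad one are then the candidates to look at, keeping in mind the caveat above that a combination of layers may be involved.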

You can edit your model with tools such as ONNX-modifier.

If you find something, a solution could be to edit your model to replace the problematic layer, or to exclude this layer from quantization. It would then be mapped in SW (on the MCU) with a different implementation, which may solve the issue.

Otherwise, could you please share your model?

I will create an internal ticket to look at the error, but I cannot guarantee that it will be fixed.

Please also try to use the latest version of the ST Edge AI Core as every version brings bug fixes.

Have a good day,

Julian

autoome · 2026-05-27

Hi @Julian E.

Thank you again for your reply. I will try to modify the model as you suggested.

Sure, I can share my model here:

It contains the quantized and non-quantized TFLite models, along with their original ONNX model.

Thanks for your time!