2026-05-05 4:48 AM - last edited on 2026-05-05 4:52 AM by mƎALLEm
Hello ST Community,
I am trying to run a TFLite model on the Neural-ART NPU of the NUCLEO-N657X0-Q board using STM32CubeIDE + STM32CubeMX and STM32Cube AI Studio. I followed the guide here: https://community.st.com/t5/stm32-mcus/how-to-build-an-ai-application-from-scratch-on-the-nucleo-n657x0/ta-p/828502 but with my own TFLite model.
Model details:
Problem: I have 31 test samples. Every sample produces the same output value 53 regardless of input. I verified the inputs are correctly quantized and different for each sample.
What I investigated: Through per-epoch debugging I found the following epoch flow:
EP0 (HW): reads input
EP1 (HW): continuation
EP2 (SW): DequantizeLinear
EP3 (SW): Conv float → writes correct floats
EP4 (hybrid): outputs ALL ZEROS
EP5 (HW): reads zeros → always produces same output 53
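A check like "outputs ALL ZEROS" for an epoch can be done with a small helper along these lines. This is only a sketch: `buffer_is_all_zero` is a hypothetical name, and the address and size of each epoch's output buffer are assumed to be known from the generated network files. On target, the D-cache for that range would need to be invalidated first so the CPU reads what the NPU actually wrote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug helper: returns true when every byte in
 * buf[0..len) is zero. An empty range counts as all-zero. */
static bool buffer_is_all_zero(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] != 0u) {
            return false;
        }
    }
    return true;
}
```

Note that for a quantized tensor with a non-zero zero point, a real value of 0.0 is not stored as byte 0, so this test only detects raw zeroed memory.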
I also ran the same TFLite model and dataset in STM32Cube AI Studio and still got the same output value of 53 regardless of input.
This is my inference code:
int aiRun(void) {
    LL_ATON_RT_RetValues_t ret = LL_ATON_RT_DONE;

    LL_ATON_RT_Reset_Network(&NN_Instance_network);

    /* Input is 41 bytes, output is 1 byte. */
    LL_ATON_Set_User_Input_Buffer_network(0, stai_input_data, 41);
    LL_ATON_Set_User_Output_Buffer_network(0, stai_output_data, 1);

    /* Write the input back to memory and drop stale output lines before the run. */
    SCB_CleanDCache_by_Addr((uint32_t*)stai_input_data, 64);
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);

    /* Run epoch blocks until the runtime reports completion. */
    do {
        ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_network);
        if (ret == LL_ATON_RT_WFE)
            LL_ATON_OSAL_WFE();
    } while (ret != LL_ATON_RT_DONE);

    /* Invalidate again so the CPU reads the freshly written output. */
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);
    return 0;
}
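The cache maintenance calls above use a length of 64 for a 41-byte input and a 1-byte output. A minimal sketch of how such a length could be derived, assuming a 32-byte D-cache line (the value CMSIS uses for Cortex-M55); `DCACHE_LINE_SIZE`, `cache_span` and `is_cache_aligned` are hypothetical names, not part of any ST or CMSIS API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* assumption: 32-byte D-cache line */

/* The rounding trick below only works for power-of-two line sizes. */
static_assert((DCACHE_LINE_SIZE & (DCACHE_LINE_SIZE - 1u)) == 0u,
              "cache line size must be a power of two");

/* Round a byte count up to a whole number of cache lines. */
static size_t cache_span(size_t nbytes)
{
    return (nbytes + DCACHE_LINE_SIZE - 1u) & ~(size_t)(DCACHE_LINE_SIZE - 1u);
}

/* Returns non-zero when p starts on a cache-line boundary. */
static int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % DCACHE_LINE_SIZE) == 0u;
}
```

With these assumptions a 41-byte buffer spans 64 bytes and a 1-byte buffer spans 32 bytes. Invalidating more than the buffer's own lines is only safe if nothing else shares those lines, which is why buffers used this way are usually aligned and padded to the cache line size.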
Environment:
2026-05-05 5:04 AM - edited 2026-05-05 7:31 AM
Hi @autoome,
First of all, we are working on updating this tutorial. Hopefully, it will come out soon.
Then, when you run a "validate on target" in STM32Cube AI Studio, you can download the outputs from the table with the metrics. Are all the outputs the same?
That would help us understand whether this is an issue at the model level or at the embedded code level.
You can also download an application template from STM32Cube AI Studio, which could help you.
Have a good day,
Julian
2026-05-05 5:13 AM
Hi @Julian E.
Thank you very much for your reply!
Yes, when I tried "validate on target" in AI Studio, I got the same output for every run. Here's network_val_c_outputs.csv:
And here's network_val_m_outputs.csv:
Thank you very much for the help!
2026-05-05 7:30 AM
Hi @autoome,
OK, so it seems that the issue comes from the converted model and not from your implementation.
It may be a bug in the ST Edge AI Core.
Could you run two "validate on target" passes, one with the NPU and one without it, and send me the metrics you get, in particular the COS?
Comparing the two will show whether the issue comes from the NPU compiler part: for example, if you get a good COS without the NPU but a bad one with it.
If you get a bad COS in both cases, then it is another story.
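For readers unfamiliar with the metric, here is a minimal sketch of a cosine-similarity computation between reference outputs and target outputs, assuming COS denotes cosine similarity over the flattened output values. The function names are hypothetical and this is not the tool's implementation. A small Newton iteration replaces `sqrt()` only to keep the sketch free of a libm dependency; `sqrt()` from `<math.h>` works equally well where it is available.

```c
#include <assert.h>
#include <stddef.h>

/* Square root by Newton's method, starting at or above the root. */
static double sqrt_newton(double x)
{
    if (x <= 0.0) {
        return 0.0;
    }
    double g = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 64; ++i) {
        g = 0.5 * (g + x / g);
    }
    return g;
}

/* Cosine similarity of ref and out, each of length n.
 * Returns 0.0 when either vector has zero norm (a convention chosen
 * for this sketch; the value is mathematically undefined). */
static double cosine_similarity(const float *ref, const float *out, size_t n)
{
    assert(ref != NULL && out != NULL);
    double dot = 0.0, norm_ref = 0.0, norm_out = 0.0;
    for (size_t i = 0; i < n; ++i) {
        dot      += (double)ref[i] * (double)out[i];
        norm_ref += (double)ref[i] * (double)ref[i];
        norm_out += (double)out[i] * (double)out[i];
    }
    if (norm_ref == 0.0 || norm_out == 0.0) {
        return 0.0;
    }
    return dot / sqrt_newton(norm_ref * norm_out);
}
```

A value close to 1 means the target outputs point in the same direction as the reference outputs; a constant output for every sample, as described in this thread, would typically give a much lower value.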
Have a good day,
Julian
2026-05-11 5:57 AM
Hi @Julian E.
Thank you very much for your reply. I ran two "validate on target" passes, and the one without the NPU looks good.
Here's the one with NPU:
[Images: output, cos]
And here's the one without NPU, this looks normal:
[Image: cos]
Thanks again for any advice!
2026-05-26 1:49 AM
Hi @autoome,
Sorry for the very late answer.
In this case, it seems to be a bug in the NPU compiler.
There is pretty much nothing that you can do on your own.
One possibility would be to add outputs at different places in the model to check where the COS starts to diverge.
(For example, add an output in the middle of your model and check whether the COS is better there. If it is, then the problematic part of the model is after this output; otherwise it is before it, and so on.)
There is no guarantee that a single layer is the cause: it could be a pattern of multiple layers under certain conditions that creates the issue. What I mean is that you may not be able to identify the real cause.
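The search for the point of divergence can be sketched as follows, assuming the COS has already been collected for each added intermediate output, ordered from the model input towards the model output. `first_divergent_tap` and the threshold are hypothetical; a linear scan is used rather than a binary search because, as noted above, the COS is not guaranteed to degrade at a single layer.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first intermediate output whose COS falls
 * below threshold, or -1 if every output stays at or above it. */
static int first_divergent_tap(const float *cos_per_tap, size_t n_taps,
                               float threshold)
{
    assert(cos_per_tap != NULL || n_taps == 0);
    for (size_t i = 0; i < n_taps; ++i) {
        if (cos_per_tap[i] < threshold) {
            return (int)i;
        }
    }
    return -1;
}
```

The layers between the last good output and the returned index are then the first candidates to replace or to exclude from quantization.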
You can edit your model with tools such as ONNX-modifier.
If you find something, a solution could be to edit your model to replace the problematic layer,
or to exclude this layer from the quantization: it would then be mapped in SW (MCU) with a different implementation, which may solve the issue.
Otherwise, could you please share your model?
I will create an internal ticket to look at the error, but I cannot guarantee that it will be fixed.
Please also try the latest version of the ST Edge AI Core, as every version brings bug fixes.
Have a good day,
Julian
2026-05-27 3:49 AM
Hi @Julian E.
Thank you again for your reply. I will try to modify the model as you suggested.
Sure, I can share my model here:
It contains the quantized and non-quantized TFLite models and their original ONNX model.
Thanks for your time!