2026-05-06 8:16 AM - edited 2026-05-06 8:24 AM
Dear all,
I think I have discovered a bug in the CMSIS headers supplied for the STM32H7RSx and STM32H7x MCUs.
In "Drivers/CMSIS/Device/ST/STM32H7RSxx/Include/stm32h7rsxx.h", there are definitions for ATOMIC_SET_BIT() and ATOMIC_CLEAR_BIT() among others, see https://github.com/STMicroelectronics/cmsis-device-h7rs/blob/7e6e213ddc397c76622a0aca2f623ccc3b34c010/Include/stm32h7rsxx.h#L162:
/* Use of CMSIS compiler intrinsics for register exclusive access */
/* Atomic 32-bit register access macro to set one or several bits */
#define ATOMIC_SET_BIT(REG, BIT) \
do { \
uint32_t val; \
do { \
val = __LDREXW((__IO uint32_t *)&(REG)) | (BIT); \
} while ((__STREXW(val,(__IO uint32_t *)&(REG))) != 0U); \
} while(0)
The word "register" in the description and the '__IO' qualifier in the code hint that this can be used for peripheral registers. This suspicion is correct, e.g., see https://github.com/STMicroelectronics/stm32h7rsxx-hal-driver/blob/1bde483bc7ab4883c2ef643e5a16ad6b303a631f/Src/stm32h7rsxx_hal_uart.c#L4214:
static void UART_TxISR_8BIT(UART_HandleTypeDef *huart)
{
/* Check that a Tx process is ongoing */
if (huart->gState == HAL_UART_STATE_BUSY_TX)
{
if (huart->TxXferCount == 0U)
{
/* Disable the UART Transmit Data Register Empty Interrupt */
ATOMIC_CLEAR_BIT(huart->Instance->CR1, USART_CR1_TXEIE_TXFNFIE);
Note above that the LDREX instruction is used in the ATOMIC_CLEAR_BIT() implementation.
STM32H7RSx and STM32H7x MCUs host ARM Cortex M7 cores, which follow the ARMv7-M architecture.
The ARMv7-M Architecture Reference Manual (https://developer.arm.com/documentation/ddi0403/d/Application-Level-Architecture/ARM-Architecture-Memory-Model/Synchronization-and-semaphores/Load-Exclusive-and-Store-Exclusive-usage-restrictions) states:
"LDREX and STREX operations must be performed only on memory with the Normal memory attribute."
Most code examples for STM32H7RSx and STM32H7x (e.g., https://github.com/STMicroelectronics/STM32CubeH7RS/blob/4891e67739a01faea3c35d7a9bdccea6970266fd/Projects/STM32H7S78-DK/Examples/GPIO/GPIO_IOToggle/Appli/Src/main.c#L171) don't define memory properties for the peripherals region (0x4000_0000 .. 0x5FFF_FFFF for STM32H7RSx) through an explicit MPU region. Instead, they rely on the implicit MPU region -1 by calling HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT). This means privileged code will access the peripherals using the default memory properties, which can be found in the MCU's reference manual. For the peripherals, this is the Device memory attribute.
Therefore, I conclude this is a violation of the ARM architecture reference manual. This is confirmed by this post (https://community.st.com/t5/stm32-mcus-products/ldrex-instruction-on-the-stm32h755-when-mpu-activated-causes/m-p/174176/highlight/true#M36012) and by someone from ST in the accepted answer of that same topic:
"We don't implement global monitor on STM32H7. We recommend to use the HW semaphore for synchronization."
I happened to run into this using an MPU config that defines an explicit MPU region including the peripherals region with the Device memory attribute set: the target froze on a "LDREX [0x4000_4C00, #0]" instruction (UART4 CR1 on STM32H7S3). Normal debugger access was not possible anymore (I had to use HW instruction tracing to figure out where it hung). The corresponding code for that instruction was in "stm32h7rsxx_hal_uart.c" in UART_TxISR_8BIT():
/* Disable the UART Transmit Data Register Empty Interrupt */
ATOMIC_CLEAR_BIT(huart->Instance->CR1, USART_CR1_TXEIE_TXFNFIE);
Interestingly, this freeze only happened after two days of running without problems. When I randomly interrupt the MCU while it is not frozen, put a breakpoint on that instruction and then continue, the breakpoint hits and I can continue without encountering any freeze. So it seems it only freezes very rarely at that point.
Can someone from ST comment on this, confirm that this is indeed a violation/bug (and what are the suggested fixes on both short term and long term)?
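As a possible short-term workaround while this is open, here is a minimal sketch of a bit set/clear that avoids LDREX/STREX altogether by masking interrupts around a plain read-modify-write. All names in it (masked_clear_bit(), masked_set_bit(), irq_save(), irq_restore(), sim_primask) are hypothetical and not part of the HAL. On the target, the two hooks would presumably be built from the CMSIS intrinsics __get_PRIMASK(), __disable_irq() and __set_PRIMASK(); here they act on a plain variable so the sketch also builds on a host. It only protects against interrupts on the same core, not against other bus masters.

```c
#include <stdint.h>

/* Stand-in for PRIMASK so the sketch builds on a host (assumption).
 * 0 = interrupts enabled, 1 = interrupts masked. */
uint32_t sim_primask = 0U;

/* Hypothetical hook: on target, read __get_PRIMASK() then __disable_irq(). */
uint32_t irq_save(void)
{
    uint32_t prev = sim_primask;
    sim_primask = 1U;            /* mask interrupts */
    return prev;
}

/* Hypothetical hook: on target, __set_PRIMASK(prev). */
void irq_restore(uint32_t prev)
{
    sim_primask = prev;          /* restore the previous mask state */
}

/* Clear bits with a plain load/store pair inside a short critical section. */
void masked_clear_bit(volatile uint32_t *reg, uint32_t bits)
{
    uint32_t prev = irq_save();
    *reg &= ~bits;
    irq_restore(prev);
}

/* Set bits the same way. */
void masked_set_bit(volatile uint32_t *reg, uint32_t bits)
{
    uint32_t prev = irq_save();
    *reg |= bits;
    irq_restore(prev);
}
```

In the UART ISR above, the call would then look like masked_clear_bit(&huart->Instance->CR1, USART_CR1_TXEIE_TXFNFIE). Saving and restoring the previous mask state (instead of unconditionally re-enabling) keeps the helper safe to call from code that already runs with interrupts masked.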
2026-05-31 6:51 AM
Ok, some late followup, just did not have enough time earlier to dive into this too much, especially as it's a PITA.
I am coming from ARMv8-M, so your mileage may vary. There is a local exclusive monitor and a global exclusive monitor. The local one is really one state bit: LDREX sets it, CLREX clears it and STREX checks-and-clears it (yes, exception entry/return clears it as well, and there is a bunch of undefinedness on a fault, but that should not matter here):
LDREX:
if memaddrdesc.memattrs.shareable then
MarkExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);
MarkExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);
STREX:
passed = IsExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);
if memaddrdesc.memattrs.shareable then
passed = passed && IsExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);
// Local monitor remains in, or transitions to the Open Access state.
ClearExclusiveLocal(ProcessorID());
So the local exclusive monitor is always considered by this pseudo code from the architecture reference manual, and the global monitor is considered if the memory is shareable (which device memory always is).
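To make the "one state bit" behaviour concrete, here is a toy single-core model of the local monitor only. This is my own simplification, not code from any ARM document: the names local_monitor_t, model_ldrex(), model_strex() and model_clrex() are made up, the global monitor is ignored, and addresses and sizes are not tracked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: the local monitor is a single state bit. */
typedef struct {
    bool exclusive;   /* true = Exclusive Access, false = Open Access */
} local_monitor_t;

/* LDREX: mark the monitor and return the loaded value. */
uint32_t model_ldrex(local_monitor_t *m, const uint32_t *addr)
{
    m->exclusive = true;
    return *addr;
}

/* STREX: check-and-clear. Returns 0 if the store happened, 1 if it failed,
 * mirroring the status value of the real instruction. */
uint32_t model_strex(local_monitor_t *m, uint32_t value, uint32_t *addr)
{
    bool passed = m->exclusive;
    m->exclusive = false;        /* monitor always ends in Open Access */
    if (passed) {
        *addr = value;
        return 0U;
    }
    return 1U;
}

/* CLREX: clear the monitor. Exception entry/return has the same effect
 * in this model. */
void model_clrex(local_monitor_t *m)
{
    m->exclusive = false;
}
```

In this model a model_ldrex()/model_strex() pair succeeds exactly once, and calling model_clrex() in between (standing in for an exception) makes the following model_strex() return 1 and leave memory untouched, which is what drives the retry loop in ATOMIC_SET_BIT().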
In the CM33 TRM there is some additional constraint added, "Exclusive accesses to the PPB memory region (0xE0000000:0xE00FFFFF) do not update the internal local exclusive monitor. Load exclusive instructions load data into a register and Store exclusive instructions store data from a register. For STREX and STLEX instructions, the status register is always updated with the value 0, indicating the store has updated memory."
CM55 is now where I am starting to get confused (TRM):
"Exclusive Load instructions do not update the internal exclusive monitor if these instructions are in Shareable memory addresses associated with the M-AXI and P-AHB interfaces where a global exclusive monitor is not supported."
"If an Exclusive read access is carried out to a region that does not support a global exclusive
monitor ... These responses do not result in the processor taking an exception, but they do ensure that the
STREX does not pass".
So the first paragraph directly contradicts the architecture reference manual, and the second one says in so many words that a STREX without global monitor support on shareable device memory will always fail (i.e. ALWAYS, not work most of the time).
CM7 now is simpler. There you could define peripheral space as "non-shared" and get the local exclusive monitor only. But it actually says in the TRM that internal and external exclusives are handled for shared device regions. An external exclusive monitor (name change between CM7 and CM33/CM55/CM85) is handled similarly to CM55/CM85, in that if the external exclusive monitor is not supported, the STREX will always fail.
So my conclusion is that the problem you are seeing is something different. Mainly because the local/global (internal/external) exclusive monitors either work all the time or fail all the time; they would not work most of the time and then randomly fail.
2026-06-03 1:46 AM
Dear Thomas,
Thank you very much for your extensive cross-referencing between the various TRMs, very interesting!
@Thomas Roell wrote: CM7 now is simpler. There you could define peripheral space as "non-shared" and get the local exclusive monitor only. But it actually says in the TRM that internal and external exclusives are handled for shared device regions. A external exclusive monitor (name change between CM7 and CM33/CM55/CM85) is handled similar to CM55/CM85 in that if the external exclusive monitor is not supported, it will always fail the STREX.
Does this mean that for CM7 depending on how the peripheral region is defined in the MPU config (shared vs. non-shared), the STREX on peripheral memory could behave differently?
If yes, what is the preferred MPU config for it, would an explicit MPU region defined as "Strongly ordered" be fine?
@Thomas Roell wrote: So my conclusion is that the problem you are seeing is something different. Mainly because either local/global (internal/external) exclusive monitors work, all the time, or they fail, all the time, but not randomly would work.
OK, that seems to make sense, thanks.
2026-06-03 6:04 AM
> Does this mean that for CM7 depending on how the peripheral region is defined in the MPU config (shared vs. non-shared), the STREX on peripheral memory could behave differently?
Possibly, but unlikely (unless there is a hardware bug). The default memory map without MPU is using "shared" for the device region.
I am at a loss as to what your issue could be. Naively I'd say that your problem is really that some other code writes to CR1 while inside ATOMIC_CLEAR_BIT(CR1, ...). The local exclusive monitor does not get reset by any other type of STR operation, only by CLREX or STREX. But then again this can only happen if an ISR interrupts the ATOMIC_CLEAR_BIT(), which means that as a side effect the local exclusive monitor gets cleared. However, even if another bus master trips that, it would not explain the "freeze" that you are observing.
The other thing that really throws me off is that you cannot break into the "freeze" with the debugger. If say the global exclusive monitor would fail, then STREX would simply fail, and you end up having a loop that does not terminate. But you still could break into this with a debugger.
Hence back to "declare device memory as not shared and see what happens".
2026-06-09 1:32 AM
@Thomas Roell wrote: I am lost of what your issue could be. Naively I'd say that your problem is really that some other code writes to CR1 while inside ATOMIC_CLEAR_BIT(CR1, ...). The local exclusive monitor does not get reset by another type of STR operation, other than CLREX or STREX. But then again this can only happen if an ISR interrupts the ATOMIC_CLEAR_BIT(), which means as a side effect the local exclusive monitor gets cleared.
And in that case the instruction trace would have traced the ISR entry, which didn't happen.
@Thomas Roell wrote: The other thing that really throws me off is that you cannot break into the "freeze" with the debugger. If say the global exclusive monitor would fail, then STREX would simply fail, and you end up having a loop that does not terminate. But you still could break into this with a debugger.
Something else I could try to see whether it is either:
is to also include loads and stores in the instruction trace. This would give a bit more clarity on what is exactly happening.
@Thomas Roell wrote: Hence back to "declare device memory as not shared and see what happens".
I will test this. Just to be sure, do you mean define device memory as "Nonshared Device": TEX=0b010 C=0 B=0 S=0 (see https://developer.arm.com/documentation/dui0646/c/Cortex-M7-Peripherals/Optional-Memory-Protection-Unit/MPU-access-permission-attributes?lang=en)?
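For reference, a sketch of how I understand that attribute combination maps onto the ARMv7-M MPU_RASR register. The helper names mpu_rasr_encode() and periph_nonshared_device_rasr() are made up for illustration. The access permission (AP=0b011, full access), XN=1 and a single 512 MB region covering 0x4000_0000 .. 0x5FFF_FFFF are my assumptions and not something confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the ARMv7-M MPU_RASR fields:
 * XN[28] AP[26:24] TEX[21:19] S[18] C[17] B[16] SIZE[5:1] ENABLE[0].
 * size_log2 is log2 of the region size in bytes (region = 2^(SIZE+1)).
 * The subregion disable field (SRD) is left at 0. */
uint32_t mpu_rasr_encode(uint32_t xn, uint32_t ap, uint32_t tex,
                         uint32_t s, uint32_t c, uint32_t b,
                         uint32_t size_log2)
{
    assert(size_log2 >= 5U && size_log2 <= 32U);   /* 32 B .. 4 GB */
    return ((xn  & 1U) << 28)
         | ((ap  & 7U) << 24)
         | ((tex & 7U) << 19)
         | ((s   & 1U) << 18)
         | ((c   & 1U) << 17)
         | ((b   & 1U) << 16)
         | (((size_log2 - 1U) & 0x1FU) << 1)
         | 1U;                                      /* region enable */
}

/* "Nonshared Device": TEX=0b010, C=0, B=0, S=0.
 * Assumed: XN=1, AP=0b011 (full access), 512 MB region (2^29 bytes). */
uint32_t periph_nonshared_device_rasr(void)
{
    return mpu_rasr_encode(1U, 3U, 2U, 0U, 0U, 0U, 29U);
}
```

With these assumptions the register value comes out as 0x13100039, to be written together with a region base address of 0x40000000.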