STM32 I2CSlave race condition causing timeouts #15498

@agausmann

Description of defect

When using I2CSlave on STM32 targets, read or write can time out and return an error even though the transfer succeeded.
This can cause significant stalls (0.3 sec or more) in the thread that called the I2CSlave read or write.

The cause seems to be an ABA problem: the same flag value is used for two distinct states, "addressed, transfer pending" and "transfer in progress", so the driver cannot distinguish between them. If the hardware transitions from "in progress" to "idle" to "addressed" quickly enough that the driver never observes the idle state, the driver never realizes that the transfer completed, and it times out with an error.

Example execution flow in the write case:

  • i2c_slave_write called by Thread X
    • Enters loop waiting for pending_slave_tx_master_rx to be cleared
  • Transfer completes, HAL_I2C_SlaveTxCpltCallback called from ISR
    • clears pending_slave_tx_master_rx
  • [Some other thread may be executed first, delaying the return to Thread X]
  • Master addresses the slave again, HAL_I2C_AddrCallback gets called from ISR
    • sets pending_slave_tx_master_rx
  • When Thread X resumes, pending_slave_tx_master_rx is already set again, so the loop never observes it cleared and keeps waiting until the timeout expires
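The flow above can be replayed in a small host-side model. This is only a sketch, not the code in TARGET_STM/i2c_api.c: the struct, the helper functions, and the poll budget are hypothetical stand-ins, and only the flag name pending_slave_tx_master_rx is taken from the driver. The callbacks are called in order from one function to mimic the interleaving, rather than from real interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver state. Only the flag name comes from the
 * issue; everything else is an assumption made for illustration. */
typedef struct {
    /* One flag covers both "addressed, transfer pending" and
     * "transfer in progress". */
    volatile bool pending_slave_tx_master_rx;
} i2c_model_t;

/* Stand-in for HAL_I2C_SlaveTxCpltCallback: the transfer finished. */
static void on_tx_complete(i2c_model_t *m)
{
    m->pending_slave_tx_master_rx = false;
}

/* Stand-in for HAL_I2C_AddrCallback: the master addressed the slave. */
static void on_addressed(i2c_model_t *m)
{
    m->pending_slave_tx_master_rx = true;
}

/* Stand-in for the wait loop in i2c_slave_write.
 * Returns 0 once the flag is seen cleared, 1 if the poll budget runs out. */
static int wait_for_tx_done(const i2c_model_t *m, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (!m->pending_slave_tx_master_rx) {
            return 0;
        }
    }
    return 1;
}

/* Replays the sequence from the issue. If readdressed_before_resume is true,
 * the next address event lands before Thread X gets to poll the flag. */
int replay_single_flag(bool readdressed_before_resume)
{
    i2c_model_t m = { .pending_slave_tx_master_rx = true }; /* in progress */

    on_tx_complete(&m);          /* ISR: transfer completes            */
    if (readdressed_before_resume) {
        on_addressed(&m);        /* ISR: master addresses slave again  */
    }
    return wait_for_tx_done(&m, 1000); /* Thread X finally resumes     */
}
```

In this model, replay_single_flag(false) returns 0, while replay_single_flag(true) returns 1 even though the transfer completed, which mirrors the spurious timeout described above.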

This seems to be fixed by using separate flags for "addressed" and "transfer in progress" states. I will be creating a PR in a moment to demonstrate this.
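As a rough illustration of the idea (not the contents of the PR), the host-side sketch below keeps "addressed" and "transfer in progress" in separate flags. The flag names addressed and tx_in_progress, the struct, and the helper functions are all hypothetical; the real driver may structure this differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model with one flag per state (names are assumptions). */
typedef struct {
    volatile bool addressed;      /* master addressed us, transfer pending */
    volatile bool tx_in_progress; /* a slave-transmit transfer is running  */
} i2c_model_t;

/* Stand-in for the address-match ISR callback. */
static void on_addressed(i2c_model_t *m)
{
    m->addressed = true;
}

/* Stand-in for the point where the write call starts the transfer. */
static void on_tx_start(i2c_model_t *m)
{
    m->addressed = false;
    m->tx_in_progress = true;
}

/* Stand-in for the transmit-complete ISR callback. */
static void on_tx_complete(i2c_model_t *m)
{
    m->tx_in_progress = false;
}

/* The wait loop polls only tx_in_progress, so a new address event cannot
 * mask the completion. Returns 0 on completion, 1 if the budget runs out. */
static int wait_for_tx_done(const i2c_model_t *m, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (!m->tx_in_progress) {
            return 0;
        }
    }
    return 1;
}

/* Replays the same sequence as in the issue. *still_addressed reports
 * whether the second address event is still visible to the caller. */
int replay_two_flags(bool readdressed_before_resume, bool *still_addressed)
{
    i2c_model_t m = { false, false };

    on_addressed(&m);            /* ISR: first address match           */
    on_tx_start(&m);             /* Thread X starts the write          */
    on_tx_complete(&m);          /* ISR: transfer completes            */
    if (readdressed_before_resume) {
        on_addressed(&m);        /* ISR: master addresses slave again  */
    }

    int rc = wait_for_tx_done(&m, 1000); /* Thread X finally resumes   */
    *still_addressed = m.addressed;
    return rc;
}
```

With separate flags, the model returns 0 in both interleavings, and the second address event stays recorded in addressed for the next call to pick up.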

Target(s) affected by this defect ?

STM

Toolchain(s) (name and version) displaying this defect ?

GCC_ARM

What version of Mbed-os are you using (tag or sha) ?

baf6a30

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

  • mbed-tools 7.59.0
  • arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10.3-2021.10) 10.3.1 20210824 (release)

How is this defect reproduced ?

https://github.com/agausmann/mbed-i2c-stall-repro

This has master and slave combined in one device firmware. I've also encountered the problem with separate devices, but this is the easiest way to demonstrate it.

This example logs to the ST-Link console (9600 baud) each time the slave receives a transfer, using the format <returncode> <payload>.

If returncode is 1, then the I2C slave HAL timed out inside the loop. This can be confirmed by enabling DEBUG_STDIO in the file TARGET_STM/i2c_api.c; it will then print "TIMEOUT or error in i2c_slave_read". Before it prints 1, you will also see a noticeable pause in the output.

[asciicast recording]

The expected behavior is all 0 (success) return codes, and a more consistent output rate in the console with no significant pauses.
