While playing with SMBus alert functionality, I noticed the following
messages if alert was active on more than one device.

smbus_alert 3-000c: SMBALERT# from dev 0x0c, flag 0
smbus_alert 3-000c: no driver alert()!

or:

smbus_alert 3-000c: SMBALERT# from dev 0x28, flag 0

This is seen if multiple devices assert alert at the same time and at least
one of them does not implement SMBus arbitration or implements it
incorrectly.

Once it starts, this message repeats forever at a high rate.
In the worst case, the problem resulted in system crashes after a while.

The following two patches fix the problem for me. The first patch
aborts the endless loop in smbus_alert() if no handler is found
for an alert address. The second patch sends alerts to all devices
with an alert handler if that situation is observed.

I split the changes into two patches since I figured that the first patch
might be easier to accept. However, both patches are really needed to
fix the problem for good.

Note that there is one situation which is not addressed by this set of
patches: If the corrupted address points to yet another device with an
alert handler on the same bus, the alert handler of that device will be
called. If it is not a source of the alert, we are back to the original
problem. I do not know how to address this case.

----------------------------------------------------------------
Guenter Roeck (2):
i2c: smbus: Improve handling of stuck alerts
i2c: smbus: Send alert notifications to all devices if source not found
drivers/i2c/i2c-smbus.c | 64 ++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 58 insertions(+), 6 deletions(-)
The following messages were observed while testing alert functionality
on systems with multiple I2C devices on a single bus if alert was active
on more than one chip.

smbus_alert 3-000c: SMBALERT# from dev 0x0c, flag 0
smbus_alert 3-000c: no driver alert()!

and:

smbus_alert 3-000c: SMBALERT# from dev 0x28, flag 0

Once it starts, this message repeats forever at a high rate. There is no
device at any of the reported addresses.

Analysis shows that this is seen if multiple devices have the alert pin
active. Apparently some devices do not support SMBus arbitration correctly.
They keep sending address bits after detecting an address collision and
handle the collision not at all or too late.
Specifically, address 0x0c is seen with ADT7461A at address 0x4c and
ADM1021 at address 0x18 if alert is active on both chips. Address 0x28 is
seen with ADT7483 at address 0x2a and ADT7461 at address 0x4c if alert is
active on both chips.
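For comparison, the following is a minimal sketch of how the address byte
forms on the bus when several devices answer the Alert Response Address at
once. It is a user-space model, not kernel code: struct ara_dev, ara_read()
and the backs_off flag are hypothetical names introduced here for
illustration. It assumes an open-drain data line (the bus carries the AND of
all driven bits), MSB-first transmission, and that a well-behaved device
releases the bus as soon as it sends a 1 but reads back a 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 8

/* Hypothetical model of one device answering the Alert Response Address. */
struct ara_dev {
	uint8_t addr;		/* 7-bit device address */
	bool backs_off;		/* true: releases the bus after losing arbitration */
};

/*
 * Return the 7-bit address the host would read if all devices in devs[]
 * respond at the same time. Assumption: wired-AND data line, MSB first.
 */
uint8_t ara_read(const struct ara_dev *devs, size_t n)
{
	bool active[MAX_DEVS];
	uint8_t seen = 0;

	assert(devs != NULL && n > 0 && n <= MAX_DEVS);
	for (size_t i = 0; i < n; i++)
		active[i] = true;

	for (int bit = 6; bit >= 0; bit--) {
		unsigned int bus = 1;

		/* Any active device sending 0 pulls the line low. */
		for (size_t i = 0; i < n; i++)
			if (active[i])
				bus &= (devs[i].addr >> bit) & 1u;

		/* A device that sent 1 but reads 0 has lost arbitration. */
		for (size_t i = 0; i < n; i++)
			if (active[i] && devs[i].backs_off &&
			    ((devs[i].addr >> bit) & 1u) && !bus)
				active[i] = false;

		seen = (uint8_t)((seen << 1) | bus);
	}
	return seen;
}
```

With both devices backing off, the model reports the lowest address (0x18
for the 0x4c/0x18 pair), which is what correct arbitration should produce.
Marking 0x4c as never backing off yields 0x0c for that pair. This is only
one possible failure mode under the stated assumptions; it is not a claim
about what the chips actually do, and it does not reproduce the 0x28 case.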
Once the system is in a bad state (alert is set by more than one chip),
it often recovers only after power cycling.
To reduce the impact of this problem, abort the endless loop in
smbus_alert() if the same address is read more than once and not
handled by a driver.
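The termination rule can be sketched in user space as follows. This is a
simplified model, not the patch itself: struct ara_event and alert_loop()
are hypothetical names, and each scripted event stands in for one ARA read
plus the result of the driver lookup. As in the patch, -EBUSY means that a
driver alert handler was called; any other status means the alert was not
handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one ARA read and the driver lookup result. */
struct ara_event {
	int addr;	/* address read from the ARA, or -1 if nothing answered */
	int status;	/* -EBUSY: handler was called; anything else: unhandled */
};

/*
 * Process a scripted sequence of ARA reads and return how many reads
 * were processed before the loop ended.
 */
size_t alert_loop(const struct ara_event *ev, size_t n)
{
	int prev_addr = 0;	/* not a valid address */
	size_t reads = 0;

	assert(ev != NULL || n == 0);

	while (reads < n) {
		const struct ara_event *e = &ev[reads];

		if (e->addr < 0)	/* no device answered: nothing pending */
			break;
		reads++;

		/* Same address again and still unhandled: stop looping. */
		if (e->addr == prev_addr && e->status != -EBUSY)
			break;
		prev_addr = e->addr;
	}
	return reads;
}
```

Feeding the model an endless run of unhandled reads of 0x0c stops it after
the second read, while a sequence of handled alerts runs until no device
answers.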
Fixes: b5527a7766f0 ("i2c: Add SMBus alert support")
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/i2c/i2c-smbus.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/i2c/i2c-smbus.c b/drivers/i2c/i2c-smbus.c
index d3d06e3b4f3b..533c885b99ac 100644
--- a/drivers/i2c/i2c-smbus.c
+++ b/drivers/i2c/i2c-smbus.c
@@ -34,6 +34,7 @@ static int smbus_do_alert(struct device *dev, void *addrp)
struct i2c_client *client = i2c_verify_client(dev);
struct alert_data *data = addrp;
struct i2c_driver *driver;
+ int ret;
if (!client || client->addr != data->addr)
return 0;
@@ -47,16 +48,21 @@ static int smbus_do_alert(struct device *dev, void *addrp)
device_lock(dev);
if (client->dev.driver) {
driver = to_i2c_driver(client->dev.driver);
- if (driver->alert)
+ if (driver->alert) {
driver->alert(client, data->type, data->data);
- else
+ ret = -EBUSY;
+ } else {
dev_warn(&client->dev, "no driver alert()!\n");
- } else
+ ret = -EOPNOTSUPP;
+ }
+ } else {
dev_dbg(&client->dev, "alert with no driver\n");
+ ret = -ENODEV;
+ }
device_unlock(dev);
/* Stop iterating after we find the device */
- return -EBUSY;
+ return ret;
}
/*
@@ -67,6 +73,7 @@ static irqreturn_t smbus_alert(int irq, void *d)
{
struct i2c_smbus_alert *alert = d;
struct i2c_client *ara;
+ unsigned short prev_addr = 0; /* Not a valid address */
ara = alert->ara;
@@ -94,8 +101,19 @@ static irqreturn_t smbus_alert(int irq, void *d)
data.addr, data.data);
/* Notify driver for the device which issued the alert */
- device_for_each_child(&ara->adapter->dev, &data,
- smbus_do_alert);
+ status = device_for_each_child(&ara->adapter->dev, &data,
+ smbus_do_alert);
+ /*
+ * If we read the same address more than once, and the alert
+ * was not handled by a driver, it won't do any good to repeat
+ * the loop because it will never terminate.
+ * Bail out in this case.
+ * Note: This assumes that a driver with alert handler handles
+ * the alert properly and clears it if necessary.
+ */
+ if (data.addr == prev_addr && status != -EBUSY)
+ break;
+ prev_addr = data.addr;
}
return IRQ_HANDLED;
--
2.33.0
If an SMBus alert is received and the originating device is not found,
the reason may be that the address reported on the SMBus alert address
is corrupted, for example because multiple devices asserted alert and
do not correctly implement SMBus arbitration.
If this happens, call alert handlers on all devices connected to the
given I2C bus, in the hope that this cleans up the situation. Retry
twice before giving up.
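The fallback can be sketched with a small user-space simulation. Everything
here is a hypothetical model rather than the kernel code: struct sim_dev,
struct sim_bus and the sim_*() helpers are invented for illustration. It
assumes that the ARA read returns a fixed corrupted address whenever more
than one device is alerting, and that calling a device's handler clears its
alert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_dev {
	int addr;
	bool has_handler;
	bool alerting;
};

struct sim_bus {
	struct sim_dev *devs;
	size_t n;
	int corrupt_addr;	/* reported when several devices alert at once */
};

/* Return the address the host reads, or -1 if no device is alerting. */
int sim_ara_read(const struct sim_bus *bus)
{
	size_t pending = 0;
	int addr = -1;

	for (size_t i = 0; i < bus->n; i++) {
		if (bus->devs[i].alerting) {
			addr = bus->devs[i].addr;
			pending++;
		}
	}
	return pending > 1 ? bus->corrupt_addr : addr;
}

/* Call the handler of the device at addr; true if a handler ran. */
bool sim_notify_one(struct sim_bus *bus, int addr)
{
	for (size_t i = 0; i < bus->n; i++) {
		if (bus->devs[i].addr == addr && bus->devs[i].has_handler) {
			bus->devs[i].alerting = false;
			return true;
		}
	}
	return false;
}

/* Fallback: call every handler on the bus. */
void sim_notify_all(struct sim_bus *bus)
{
	for (size_t i = 0; i < bus->n; i++)
		if (bus->devs[i].has_handler)
			bus->devs[i].alerting = false;
}

/* Return true if the loop ends with no alert pending. */
bool sim_alert_loop(struct sim_bus *bus, bool use_fallback)
{
	int prev_addr = 0;	/* not a valid address */
	int retries = 0;

	assert(bus != NULL && bus->devs != NULL);

	/* The iteration bound exists only to keep the simulation finite. */
	for (int iter = 0; iter < 16; iter++) {
		int addr = sim_ara_read(bus);
		bool handled;

		if (addr < 0)
			return true;
		handled = sim_notify_one(bus, addr);
		if (addr == prev_addr && !handled) {
			if (!use_fallback || retries++)
				return false;
			sim_notify_all(bus);
		} else {
			retries = 0;
		}
		prev_addr = addr;
	}
	return false;
}
```

With two alerting devices at 0x18 and 0x4c and a corrupted address of 0x0c,
the model gives up with both alerts still pending when the fallback is
disabled, and clears both alerts after the second read when it is enabled.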
This change reliably fixed the problem on a system with multiple devices
on a single bus. Example log where the device at address 0x18 (ADM1021)
and the device at address 0x4c (ADT7461A) both had the alert line asserted:
smbus_alert 3-000c: SMBALERT# from dev 0x0c, flag 0
smbus_alert 3-000c: no driver alert()!
smbus_alert 3-000c: SMBALERT# from dev 0x0c, flag 0
smbus_alert 3-000c: no driver alert()!
lm90 3-0018: temp1 out of range, please check!
lm90 3-0018: Disabling ALERT#
lm90 3-0029: Everything OK
lm90 3-002a: Everything OK
lm90 3-004c: temp1 out of range, please check!
lm90 3-004c: temp2 out of range, please check!
lm90 3-004c: Disabling ALERT#
Fixes: b5527a7766f0 ("i2c: Add SMBus alert support")
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/i2c/i2c-smbus.c | 38 ++++++++++++++++++++++++++++++++++++--
1 file changed, 36 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/i2c-smbus.c b/drivers/i2c/i2c-smbus.c
index 533c885b99ac..f48cec19db41 100644
--- a/drivers/i2c/i2c-smbus.c
+++ b/drivers/i2c/i2c-smbus.c
@@ -65,6 +65,32 @@ static int smbus_do_alert(struct device *dev, void *addrp)
return ret;
}
+/* Same as above, but call back all drivers with alert handler */
+
+static int smbus_do_alert_force(struct device *dev, void *addrp)
+{
+ struct i2c_client *client = i2c_verify_client(dev);
+ struct alert_data *data = addrp;
+ struct i2c_driver *driver;
+
+ if (!client || (client->flags & I2C_CLIENT_TEN))
+ return 0;
+
+ /*
+ * Drivers should either disable alerts, or provide at least
+ * a minimal handler. Lock so the driver won't change.
+ */
+ device_lock(dev);
+ if (client->dev.driver) {
+ driver = to_i2c_driver(client->dev.driver);
+ if (driver->alert)
+ driver->alert(client, data->type, data->data);
+ }
+ device_unlock(dev);
+
+ return 0;
+}
+
/*
* The alert IRQ handler needs to hand work off to a task which can issue
* SMBus calls, because those sleeping calls can't be made in IRQ context.
@@ -74,6 +100,7 @@ static irqreturn_t smbus_alert(int irq, void *d)
struct i2c_smbus_alert *alert = d;
struct i2c_client *ara;
unsigned short prev_addr = 0; /* Not a valid address */
+ int retries = 0;
ara = alert->ara;
@@ -111,8 +138,15 @@ static irqreturn_t smbus_alert(int irq, void *d)
* Note: This assumes that a driver with alert handler handles
* the alert properly and clears it if necessary.
*/
- if (data.addr == prev_addr && status != -EBUSY)
- break;
+ if (data.addr == prev_addr && status != -EBUSY) {
+ /* retry once */
+ if (retries++)
+ break;
+ device_for_each_child(&ara->adapter->dev, &data,
+ smbus_do_alert_force);
+ } else {
+ retries = 0;
+ }
prev_addr = data.addr;
}
--
2.33.0
Hi,
On Mon, Jan 10, 2022 at 09:28:55AM -0800, Guenter Roeck wrote:
> While playing with SMBus alert functionality, I noticed the following
> messages if alert was active on more than one device.
>
> smbus_alert 3-000c: SMBALERT# from dev 0x0c, flag 0
> smbus_alert 3-000c: no driver alert()!
>
> or:
>
> smbus_alert 3-000c: SMBALERT# from dev 0x28, flag 0
>
> This is seen if multiple devices assert alert at the same time and at least
> one of them does not implement SMBus arbitration or implements it
> incorrectly.
>
> Once it starts, this message repeats forever at a high rate.
> In the worst case, the problem resulted in system crashes after a while.
>
> The following two patches fix the problem for me. The first patch
> aborts the endless loop in smbus_alert() if no handler is found
> for an alert address. The second patch sends alerts to all devices
> with alert handler if that situation is observed.
>
> I split the changes into two patches since I figured that the first patch
> might be easier to accept. However, both patches are really needed to
> fix the problem for good.
>
> Note that there is one situation which is not addressed by this set of
> patches: If the corrupted address points to yet another device with alert
> handler on the same bus, the alert handler of that device will be called.
> If it is not a source of the alert, we are back to the original problem.
> I do not know how to address this case.
>
> ----------------------------------------------------------------
> Guenter Roeck (2):
> i2c: smbus: Improve handling of stuck alerts
> i2c: smbus: Send alert notifications to all devices if source not found
>
> drivers/i2c/i2c-smbus.c | 64 ++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 58 insertions(+), 6 deletions(-)
Looking through the patches I carry locally, I just noticed that
I never got a reply to this series. Is there a problem with it,
or did it just get lost ?
Thanks,
Guenter
> Looking through the patches I carry locally, I just noticed that
> I never got a reply to this series. Is there a problem with it,
> or did it just get lost ?
The only problem was that I didn't have the bandwidth. But luckily, I
need to work on SMBALERT myself now, so I will handle all related
commits around that.
On 6/12/24 13:21, Wolfram Sang wrote:
>
>> Looking through the patches I carry locally, I just noticed that
>> I never got a reply to this series. Is there a problem with it,
>> or did it just get lost ?
>
> The only problem was that I didn't have the bandwidth. But luckily, I
> need to work on SMBALERT myself now, so I will handle all related
> commits around that.
>
Ah, just the "normal" problem. Let me know if I can help.
I still have the hardware that I used to test that code.
Guenter