2023-12-07 08:22:58

by Jensen Huang

[permalink] [raw]
Subject: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

Possible deadlock scenario (on reboot):
rk3x_i2c_xfer_common(polling)
-> rk3x_i2c_wait_xfer_poll()
-> rk3x_i2c_irq(0, i2c);
--> spin_lock(&i2c->lock);
...
<rk3x i2c interrupt>
-> rk3x_i2c_irq(0, i2c);
--> spin_lock(&i2c->lock); (deadlock here)

Store the IRQ number and disable/enable it around the polling transfer.
This patch has been tested on NanoPC-T4.

Signed-off-by: Jensen Huang <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
---
Changes in v2:
- Add description for member 'irq' to fix build warning

drivers/i2c/busses/i2c-rk3x.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c
index a044ca0c35a1..4362db7c5789 100644
--- a/drivers/i2c/busses/i2c-rk3x.c
+++ b/drivers/i2c/busses/i2c-rk3x.c
@@ -178,6 +178,7 @@ struct rk3x_i2c_soc_data {
* @clk: function clk for rk3399 or function & Bus clks for others
* @pclk: Bus clk for rk3399
* @clk_rate_nb: i2c clk rate change notify
+ * @irq: irq number
* @t: I2C known timing information
* @lock: spinlock for the i2c bus
* @wait: the waitqueue to wait for i2c transfer
@@ -200,6 +201,7 @@ struct rk3x_i2c {
struct clk *clk;
struct clk *pclk;
struct notifier_block clk_rate_nb;
+ int irq;

/* Settings */
struct i2c_timings t;
@@ -1087,13 +1089,18 @@ static int rk3x_i2c_xfer_common(struct i2c_adapter *adap,

spin_unlock_irqrestore(&i2c->lock, flags);

- rk3x_i2c_start(i2c);
-
if (!polling) {
+ rk3x_i2c_start(i2c);
+
timeout = wait_event_timeout(i2c->wait, !i2c->busy,
msecs_to_jiffies(WAIT_TIMEOUT));
} else {
+ disable_irq(i2c->irq);
+ rk3x_i2c_start(i2c);
+
timeout = rk3x_i2c_wait_xfer_poll(i2c);
+
+ enable_irq(i2c->irq);
}

spin_lock_irqsave(&i2c->lock, flags);
@@ -1310,6 +1317,8 @@ static int rk3x_i2c_probe(struct platform_device *pdev)
return ret;
}

+ i2c->irq = irq;
+
platform_set_drvdata(pdev, i2c);

if (i2c->soc_data->calc_timings == rk3x_i2c_v0_calc_timings) {
--
2.42.0


2023-12-07 08:37:28

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On 2023-12-07 09:21, Jensen Huang wrote:
> Possible deadlock scenario (on reboot):
> rk3x_i2c_xfer_common(polling)
> -> rk3x_i2c_wait_xfer_poll()
> -> rk3x_i2c_irq(0, i2c);
> --> spin_lock(&i2c->lock);
> ...
> <rk3x i2c interrupt>
> -> rk3x_i2c_irq(0, i2c);
> --> spin_lock(&i2c->lock); (deadlock here)
>
> Store the IRQ number and disable/enable it around the polling transfer.
> This patch has been tested on NanoPC-T4.

In case you haven't already seen the related discussion linked below,
please have a look. I also added more people to the list of recipients,
in an attempt to make everyone aware of the different approaches to
solving this issue.

https://lore.kernel.org/all/[email protected]/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b

> Signed-off-by: Jensen Huang <[email protected]>
> Reviewed-by: Heiko Stuebner <[email protected]>
> Reviewed-by: Andi Shyti <[email protected]>
> ---
> Changes in v2:
> - Add description for member 'irq' to fix build warning
>
> drivers/i2c/busses/i2c-rk3x.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-rk3x.c
> b/drivers/i2c/busses/i2c-rk3x.c
> index a044ca0c35a1..4362db7c5789 100644
> --- a/drivers/i2c/busses/i2c-rk3x.c
> +++ b/drivers/i2c/busses/i2c-rk3x.c
> @@ -178,6 +178,7 @@ struct rk3x_i2c_soc_data {
> * @clk: function clk for rk3399 or function & Bus clks for others
> * @pclk: Bus clk for rk3399
> * @clk_rate_nb: i2c clk rate change notify
> + * @irq: irq number
> * @t: I2C known timing information
> * @lock: spinlock for the i2c bus
> * @wait: the waitqueue to wait for i2c transfer
> @@ -200,6 +201,7 @@ struct rk3x_i2c {
> struct clk *clk;
> struct clk *pclk;
> struct notifier_block clk_rate_nb;
> + int irq;
>
> /* Settings */
> struct i2c_timings t;
> @@ -1087,13 +1089,18 @@ static int rk3x_i2c_xfer_common(struct
> i2c_adapter *adap,
>
> spin_unlock_irqrestore(&i2c->lock, flags);
>
> - rk3x_i2c_start(i2c);
> -
> if (!polling) {
> + rk3x_i2c_start(i2c);
> +
> timeout = wait_event_timeout(i2c->wait, !i2c->busy,
> msecs_to_jiffies(WAIT_TIMEOUT));
> } else {
> + disable_irq(i2c->irq);
> + rk3x_i2c_start(i2c);
> +
> timeout = rk3x_i2c_wait_xfer_poll(i2c);
> +
> + enable_irq(i2c->irq);
> }
>
> spin_lock_irqsave(&i2c->lock, flags);
> @@ -1310,6 +1317,8 @@ static int rk3x_i2c_probe(struct platform_device
> *pdev)
> return ret;
> }
>
> + i2c->irq = irq;
> +
> platform_set_drvdata(pdev, i2c);
>
> if (i2c->soc_data->calc_timings == rk3x_i2c_v0_calc_timings) {

2023-12-07 10:13:04

by Jensen Huang

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <[email protected]> wrote:
>
> On 2023-12-07 09:21, Jensen Huang wrote:
> > Possible deadlock scenario (on reboot):
> > rk3x_i2c_xfer_common(polling)
> > -> rk3x_i2c_wait_xfer_poll()
> > -> rk3x_i2c_irq(0, i2c);
> > --> spin_lock(&i2c->lock);
> > ...
> > <rk3x i2c interrupt>
> > -> rk3x_i2c_irq(0, i2c);
> > --> spin_lock(&i2c->lock); (deadlock here)
> >
> > Store the IRQ number and disable/enable it around the polling transfer.
> > This patch has been tested on NanoPC-T4.
>
> In case you haven't already seen the related discussion linked below,
> please have a look. I also added more people to the list of recipients,
> in an attempt to make everyone aware of the different approaches to
> solving this issue.
>
> https://lore.kernel.org/all/[email protected]/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b

Thank you for providing the information. I hadn't seen this link before.
After carefully looking into the related discussion, it appears that
Dmitry Osipenko is already working on a suitable patch. To avoid duplication
or conflicts, my patch can be discarded.

--
Best regards,
Jensen

2023-12-07 14:11:02

by Dragan Simic

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On 2023-12-07 10:25, Jensen Huang wrote:
> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <[email protected]> wrote:
>>
>> On 2023-12-07 09:21, Jensen Huang wrote:
>> > Possible deadlock scenario (on reboot):
>> > rk3x_i2c_xfer_common(polling)
>> > -> rk3x_i2c_wait_xfer_poll()
>> > -> rk3x_i2c_irq(0, i2c);
>> > --> spin_lock(&i2c->lock);
>> > ...
>> > <rk3x i2c interrupt>
>> > -> rk3x_i2c_irq(0, i2c);
>> > --> spin_lock(&i2c->lock); (deadlock here)
>> >
>> > Store the IRQ number and disable/enable it around the polling transfer.
>> > This patch has been tested on NanoPC-T4.
>>
>> In case you haven't already seen the related discussion linked below,
>> please have a look. I also added more people to the list of
>> recipients,
>> in an attempt to make everyone aware of the different approaches to
>> solving this issue.
>>
>> https://lore.kernel.org/all/[email protected]/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
>
> Thank you for providing the information. I hadn't seen this link
> before.
> After carefully looking into the related discussion, it appears that
> Dmitry Osipenko is already working on a suitable patch. To avoid
> duplication
> or conflicts, my patch can be discarded.

Thank you for responding so quickly. Perhaps it would be best to hear
from Dmitry as well, before discarding anything. It's been a while
since Dmitry wrote about working on the patch, so he might have
abandoned it.

2023-12-07 16:01:25

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On 12/7/23 17:10, Dragan Simic wrote:
> On 2023-12-07 10:25, Jensen Huang wrote:
>> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <[email protected]> wrote:
>>>
>>> On 2023-12-07 09:21, Jensen Huang wrote:
>>> > Possible deadlock scenario (on reboot):
>>> > rk3x_i2c_xfer_common(polling)
>>> >     -> rk3x_i2c_wait_xfer_poll()
>>> >         -> rk3x_i2c_irq(0, i2c);
>>> >             --> spin_lock(&i2c->lock);
>>> >             ...
>>> >         <rk3x i2c interrupt>
>>> >         -> rk3x_i2c_irq(0, i2c);
>>> >             --> spin_lock(&i2c->lock); (deadlock here)
>>> >
>>> > Store the IRQ number and disable/enable it around the polling
>>> transfer.
>>> > This patch has been tested on NanoPC-T4.
>>>
>>> In case you haven't already seen the related discussion linked below,
>>> please have a look.  I also added more people to the list of recipients,
>>> in an attempt to make everyone aware of the different approaches to
>>> solving this issue.
>>>
>>> https://lore.kernel.org/all/[email protected]/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
>>
>> Thank you for providing the information. I hadn't seen this link before.
>> After carefully looking into the related discussion, it appears that
>> Dmitry Osipenko is already working on a suitable patch. To avoid
>> duplication
>> or conflicts, my patch can be discarded.
>
> Thank you for responding so quickly.  Perhaps it would be best to hear
> from Dmitry as well, before discarding anything.  It's been a while
> since Dmitry wrote about working on the patch, so he might have
> abandoned it.

This patch is okay. In general, will be better to have IRQ disabled by
default like I did in my variant, it should allow to remove the spinlock
entirely. Of course this also can be done later on in a follow up
patches. Jensen, feel free to use my variant of the patch, add my
s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll
be able to send my patch next week.

--
Best regards,
Dmitry

2023-12-08 08:54:17

by Jensen Huang

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko
<[email protected]> wrote:
>
> On 12/7/23 17:10, Dragan Simic wrote:
> > On 2023-12-07 10:25, Jensen Huang wrote:
> >> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <[email protected]> wrote:
> >>>
> >>> On 2023-12-07 09:21, Jensen Huang wrote:
> >>> > Possible deadlock scenario (on reboot):
> >>> > rk3x_i2c_xfer_common(polling)
> >>> > -> rk3x_i2c_wait_xfer_poll()
> >>> > -> rk3x_i2c_irq(0, i2c);
> >>> > --> spin_lock(&i2c->lock);
> >>> > ...
> >>> > <rk3x i2c interrupt>
> >>> > -> rk3x_i2c_irq(0, i2c);
> >>> > --> spin_lock(&i2c->lock); (deadlock here)
> >>> >
> >>> > Store the IRQ number and disable/enable it around the polling
> >>> transfer.
> >>> > This patch has been tested on NanoPC-T4.
> >>>
> >>> In case you haven't already seen the related discussion linked below,
> >>> please have a look. I also added more people to the list of recipients,
> >>> in an attempt to make everyone aware of the different approaches to
> >>> solving this issue.
> >>>
> >>> https://lore.kernel.org/all/[email protected]/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
> >>
> >> Thank you for providing the information. I hadn't seen this link before.
> >> After carefully looking into the related discussion, it appears that
> >> Dmitry Osipenko is already working on a suitable patch. To avoid
> >> duplication
> >> or conflicts, my patch can be discarded.
> >
> > Thank you for responding so quickly. Perhaps it would be best to hear
> > from Dmitry as well, before discarding anything. It's been a while
> > since Dmitry wrote about working on the patch, so he might have
> > abandoned it.
>
> This patch is okay. In general, will be better to have IRQ disabled by
> default like I did in my variant, it should allow to remove the spinlock
> entirely. Of course this also can be done later on in a follow up
> patches. Jensen, feel free to use my variant of the patch, add my
> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll
> be able to send my patch next week.

Thank you for the suggestion. I've updated the patch to your variant, and
as confirmed by others, reboots are functioning correctly. I measured the
overhead of enable_irq/disable_irq() by calculating ktime in the
updated version,
and on rk3399, the minimum delta I observed was 291/875 ns. This extra
cost may impact most interrupt-based transfers. Therefore, I personally lean
towards the current v2 patch and handle the spinlock and irqsave/restore in
a follow up patch. I'd like to hear everyone's thoughts on this.

--
Best regards,
Jensen

2023-12-08 12:18:01

by Dmitry Osipenko

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On 12/8/23 11:53, Jensen Huang wrote:
> On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko
> <[email protected]> wrote:
>>
>> On 12/7/23 17:10, Dragan Simic wrote:
>>> On 2023-12-07 10:25, Jensen Huang wrote:
>>>> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <[email protected]> wrote:
>>>>>
>>>>> On 2023-12-07 09:21, Jensen Huang wrote:
>>>>>> Possible deadlock scenario (on reboot):
>>>>>> rk3x_i2c_xfer_common(polling)
>>>>>> -> rk3x_i2c_wait_xfer_poll()
>>>>>> -> rk3x_i2c_irq(0, i2c);
>>>>>> --> spin_lock(&i2c->lock);
>>>>>> ...
>>>>>> <rk3x i2c interrupt>
>>>>>> -> rk3x_i2c_irq(0, i2c);
>>>>>> --> spin_lock(&i2c->lock); (deadlock here)
>>>>>>
>>>>>> Store the IRQ number and disable/enable it around the polling
>>>>> transfer.
>>>>>> This patch has been tested on NanoPC-T4.
>>>>>
>>>>> In case you haven't already seen the related discussion linked below,
>>>>> please have a look. I also added more people to the list of recipients,
>>>>> in an attempt to make everyone aware of the different approaches to
>>>>> solving this issue.
>>>>>
>>>>> https://lore.kernel.org/all/[email protected]/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
>>>>
>>>> Thank you for providing the information. I hadn't seen this link before.
>>>> After carefully looking into the related discussion, it appears that
>>>> Dmitry Osipenko is already working on a suitable patch. To avoid
>>>> duplication
>>>> or conflicts, my patch can be discarded.
>>>
>>> Thank you for responding so quickly. Perhaps it would be best to hear
>>> from Dmitry as well, before discarding anything. It's been a while
>>> since Dmitry wrote about working on the patch, so he might have
>>> abandoned it.
>>
>> This patch is okay. In general, will be better to have IRQ disabled by
>> default like I did in my variant, it should allow to remove the spinlock
>> entirely. Of course this also can be done later on in a follow up
>> patches. Jensen, feel free to use my variant of the patch, add my
>> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll
>> be able to send my patch next week.
>
> Thank you for the suggestion. I've updated the patch to your variant, and
> as confirmed by others, reboots are functioning correctly. I measured the
> overhead of enable_irq/disable_irq() by calculating ktime in the
> updated version,
> and on rk3399, the minimum delta I observed was 291/875 ns. This extra
> cost may impact most interrupt-based transfers. Therefore, I personally lean
> towards the current v2 patch and handle the spinlock and irqsave/restore in
> a follow up patch. I'd like to hear everyone's thoughts on this.

Please don't use ktime for perf measurements, ktime itself is a slow API
and it should be 200us that ktime takes itself. Also, the 0.2us is
practically nothing, especially compared to I2C transfers measured in ms.

I'm fine with keeping your v2 variant for the bug fix if you prefer
that. Thanks for addressing the issue :)

--
Best regards,
Dmitry

2023-12-19 17:25:20

by Wolfram Sang

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On Thu, Dec 07, 2023 at 04:21:59PM +0800, Jensen Huang wrote:
> Possible deadlock scenario (on reboot):
> rk3x_i2c_xfer_common(polling)
> -> rk3x_i2c_wait_xfer_poll()
> -> rk3x_i2c_irq(0, i2c);
> --> spin_lock(&i2c->lock);
> ...
> <rk3x i2c interrupt>
> -> rk3x_i2c_irq(0, i2c);
> --> spin_lock(&i2c->lock); (deadlock here)
>
> Store the IRQ number and disable/enable it around the polling transfer.
> This patch has been tested on NanoPC-T4.
>
> Signed-off-by: Jensen Huang <[email protected]>
> Reviewed-by: Heiko Stuebner <[email protected]>
> Reviewed-by: Andi Shyti <[email protected]>

Applied to for-current, thanks!

But I'd like to see the follow-up patches somewhen which have been
discussed in this thread.


Attachments:
(No filename) (820.00 B)
signature.asc (849.00 B)
Download all attachments

2023-12-22 08:22:51

by Jensen Huang

[permalink] [raw]
Subject: Re: [PATCH v2] i2c: rk3x: fix potential spinlock recursion on poll

On Wed, Dec 20, 2023 at 1:23 AM Wolfram Sang <[email protected]> wrote:
>
> On Thu, Dec 07, 2023 at 04:21:59PM +0800, Jensen Huang wrote:
> > Possible deadlock scenario (on reboot):
> > rk3x_i2c_xfer_common(polling)
> > -> rk3x_i2c_wait_xfer_poll()
> > -> rk3x_i2c_irq(0, i2c);
> > --> spin_lock(&i2c->lock);
> > ...
> > <rk3x i2c interrupt>
> > -> rk3x_i2c_irq(0, i2c);
> > --> spin_lock(&i2c->lock); (deadlock here)
> >
> > Store the IRQ number and disable/enable it around the polling transfer.
> > This patch has been tested on NanoPC-T4.
> >
> > Signed-off-by: Jensen Huang <[email protected]>
> > Reviewed-by: Heiko Stuebner <[email protected]>
> > Reviewed-by: Andi Shyti <[email protected]>
>
> Applied to for-current, thanks!
>
> But I'd like to see the follow-up patches somewhen which have been
> discussed in this thread.

I've made some attempts, such as removing irqsave/restore, but it
didn't meet my expectations and
requires further testing. Therefore, I may not be able to submit such
a patch in the short term. However,
if someone else submits a new patch, I'd be happy to test it on rk3328/rk3399.


On Fri, Dec 8, 2023 at 8:17 PM Dmitry Osipenko
<[email protected]> wrote:
>
> On 12/8/23 11:53, Jensen Huang wrote:
> > On Fri, Dec 8, 2023 at 12:00 AM Dmitry Osipenko
> > <[email protected]> wrote:
> >>
> >> On 12/7/23 17:10, Dragan Simic wrote:
> >>> On 2023-12-07 10:25, Jensen Huang wrote:
> >>>> On Thu, Dec 7, 2023 at 4:37 PM Dragan Simic <[email protected]> wrote:
> >>>>>
> >>>>> On 2023-12-07 09:21, Jensen Huang wrote:
> >>>>>> Possible deadlock scenario (on reboot):
> >>>>>> rk3x_i2c_xfer_common(polling)
> >>>>>> -> rk3x_i2c_wait_xfer_poll()
> >>>>>> -> rk3x_i2c_irq(0, i2c);
> >>>>>> --> spin_lock(&i2c->lock);
> >>>>>> ...
> >>>>>> <rk3x i2c interrupt>
> >>>>>> -> rk3x_i2c_irq(0, i2c);
> >>>>>> --> spin_lock(&i2c->lock); (deadlock here)
> >>>>>>
> >>>>>> Store the IRQ number and disable/enable it around the polling
> >>>>> transfer.
> >>>>>> This patch has been tested on NanoPC-T4.
> >>>>>
> >>>>> In case you haven't already seen the related discussion linked below,
> >>>>> please have a look. I also added more people to the list of recipients,
> >>>>> in an attempt to make everyone aware of the different approaches to
> >>>>> solving this issue.
> >>>>>
> >>>>> https://lore.kernel.org/all/[email protected]/T/#m6fc9c214452fec6681843e7f455978c35c6f6c8b
> >>>>
> >>>> Thank you for providing the information. I hadn't seen this link before.
> >>>> After carefully looking into the related discussion, it appears that
> >>>> Dmitry Osipenko is already working on a suitable patch. To avoid
> >>>> duplication
> >>>> or conflicts, my patch can be discarded.
> >>>
> >>> Thank you for responding so quickly. Perhaps it would be best to hear
> >>> from Dmitry as well, before discarding anything. It's been a while
> >>> since Dmitry wrote about working on the patch, so he might have
> >>> abandoned it.
> >>
> >> This patch is okay. In general, will be better to have IRQ disabled by
> >> default like I did in my variant, it should allow to remove the spinlock
> >> entirely. Of course this also can be done later on in a follow up
> >> patches. Jensen, feel free to use my variant of the patch, add my
> >> s-o-b+co-developed tags to the commit msg if you'll do. Otherwise I'll
> >> be able to send my patch next week.
> >
> > Thank you for the suggestion. I've updated the patch to your variant, and
> > as confirmed by others, reboots are functioning correctly. I measured the
> > overhead of enable_irq/disable_irq() by calculating ktime in the
> > updated version,
> > and on rk3399, the minimum delta I observed was 291/875 ns. This extra
> > cost may impact most interrupt-based transfers. Therefore, I personally lean
> > towards the current v2 patch and handle the spinlock and irqsave/restore in
> > a follow up patch. I'd like to hear everyone's thoughts on this.
>
> Please don't use ktime for perf measurements, ktime itself is a slow API
> and it should be 200us that ktime takes itself. Also, the 0.2us is
> practically nothing, especially compared to I2C transfers measured in ms.
>
> I'm fine with keeping your v2 variant for the bug fix if you prefer
> that. Thanks for addressing the issue :)

Thank you for clarifying the situation with ktime. Honestly, I also
believe that the overhead
of disable_irq/enable_irq should be minimal. Considering that system
frequency adjustments
involve I2C transfers, I conducted this 'somewhat imprecise' measurements.


--
Best regards,
Jensen