2013-06-06 13:44:09

by Christian Ruppert

[permalink] [raw]
Subject: [PATCH 1/2] i2c: designware: fix race between subsequent xfers

The designware block is not always properly disabled in the case of
transfer errors. Interrupts from aborted transfers might be handled
after the data structures for the following transfer are initialised but
before the hardware is set up. This might corrupt the data structures to
the point that the system is stuck in an infinite interrupt loop (where
FIFOs are never emptied).
This patch cleanly disables the designware-i2c hardware at the end of
every transfer, successful or not.

Signed-off-by: Christian Ruppert <[email protected]>
---
drivers/i2c/busses/i2c-designware-core.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
index 6c0e776..65c0c7a 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -588,10 +588,20 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
ret = wait_for_completion_interruptible_timeout(&dev->cmd_complete, HZ);
if (ret == 0) {
dev_err(dev->dev, "controller timed out\n");
+ /* i2c_dw_init implicitly disables the adapter */
i2c_dw_init(dev);
ret = -ETIMEDOUT;
goto done;
- } else if (ret < 0)
+ }
+
+ /*
+ * We must disable the adapter before unlocking the &dev->lock mutex
+ * below. Otherwise the hardware might continue generating interrupts
+ * which in turn causes a race condition with the following transfer.
+ */
+ __i2c_dw_enable(dev, false);
+
+ if (ret < 0)
goto done;

if (dev->msg_err) {
@@ -601,8 +611,6 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)

/* no error */
if (likely(!dev->cmd_err)) {
- /* Disable the adapter */
- __i2c_dw_enable(dev, false);
ret = num;
goto done;
}
--
1.7.1


2013-06-06 13:44:19

by Christian Ruppert

[permalink] [raw]
Subject: [PATCH 2/2] i2c: designware: make i2c xfers non-interruptible

When the process at the source of an i2c transfer is killed in the
middle of the transfer, the transfer is interrupted. Interrupted
transfers might cause buggy slaves on the bus (or higher level drivers)
to go haywire.
This patch forces ongoing i2c transfers to finish properly, even if the
initiating process is killed.

Signed-off-by: Christian Ruppert <[email protected]>
---
drivers/i2c/busses/i2c-designware-core.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
index 65c0c7a..d903368 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -585,7 +585,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
i2c_dw_xfer_init(dev);

/* wait for tx to complete */
- ret = wait_for_completion_interruptible_timeout(&dev->cmd_complete, HZ);
+ ret = wait_for_completion_timeout(&dev->cmd_complete, HZ);
if (ret == 0) {
dev_err(dev->dev, "controller timed out\n");
/* i2c_dw_init implicitly disables the adapter */
--
1.7.1

2013-06-07 05:19:55

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH 1/2] i2c: designware: fix race between subsequent xfers

Hi Christian,

On Thu, Jun 06, 2013 at 03:43:35PM +0200, Christian Ruppert wrote:
> The designware block is not always properly disabled in the case of
> transfer errors. Interrupts from aborted transfers might be handled
> after the data structures for the following transfer are initialised but
> before the hardware is set up. This might corrupt the data structures to
> the point that the system is stuck in an infinite interrupt loop (where
> FIFOs are never emptied).
> This patch cleanly disables the designware-i2c hardware at the end of
> every transfer, successful or not.

Have you tried with the latest mainline driver? There is a commit that
solves similar problem:

2a2d95e9d6d29e7 i2c: designware: always clear interrupts before enabling them

Maybe it helps?

2013-06-07 05:21:52

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH 2/2] i2c: designware: make i2c xfers non-interruptible

On Thu, Jun 06, 2013 at 03:43:36PM +0200, Christian Ruppert wrote:
> When the process at the source of an i2c transfer is killed in the
> middle of the transfer, the transfer is interrupted. Interrupted
> transfers might cause buggy slaves on the bus (or higher level drivers)
> to go haywire.
> This patch forces ongoing i2c transfers to finish properly, even if the
> initiating process is killed.

I already sent a similar patch ;-)

https://patchwork.kernel.org/patch/2601241/

However, it is not yet picked by Wolfram.

2013-06-07 07:56:36

by Christian Ruppert

[permalink] [raw]
Subject: Re: [PATCH 2/2] i2c: designware: make i2c xfers non-interruptible

On Fri, Jun 07, 2013 at 08:25:55AM +0300, Mika Westerberg wrote:
> On Thu, Jun 06, 2013 at 03:43:36PM +0200, Christian Ruppert wrote:
> > When the process at the source of an i2c transfer is killed in the
> > middle of the transfer, the transfer is interrupted. Interrupted
> > transfers might cause buggy slaves on the bus (or higher level drivers)
> > to go haywire.
> > This patch forces ongoing i2c transfers to finish properly, even if the
> > initiating process is killed.
>
> I already sent a similar patch ;-)
>
> https://patchwork.kernel.org/patch/2601241/
>
> However, it is not yet picked by Wolfram.

Oops, this looks like a human race condition :) I checked main line and
Wolfram's git but didn't think of checking patchwork. I'll rebase the
second patch on yours.

--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pr?-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates

2013-06-07 08:17:11

by Christian Ruppert

[permalink] [raw]
Subject: Re: [PATCH 1/2] i2c: designware: fix race between subsequent xfers

On Fri, Jun 07, 2013 at 08:23:53AM +0300, Mika Westerberg wrote:
> Hi Christian,
>
> On Thu, Jun 06, 2013 at 03:43:35PM +0200, Christian Ruppert wrote:
> > The designware block is not always properly disabled in the case of
> > transfer errors. Interrupts from aborted transfers might be handled
> > after the data structures for the following transfer are initialised but
> > before the hardware is set up. This might corrupt the data structures to
> > the point that the system is stuck in an infinite interrupt loop (where
> > FIFOs are never emptied).
> > This patch cleanly disables the designware-i2c hardware at the end of
> > every transfer, successful or not.
>
> Have you tried with the latest mainline driver? There is a commit that
> solves similar problem:
>
> 2a2d95e9d6d29e7 i2c: designware: always clear interrupts before enabling them
>
> Maybe it helps?

Hi Mika,

Thanks for the hint but I have checked both main line and Wolfram's
branch and I saw this patch. I actually hoped it would fix our problem
but it didn't.

Here some more details: We experienced system lockups (complete lock up,
no reaction whatsoever) in long-term tests under heavy system load with
lots of scheduling and forking/killing. These lockups could be traced to
the I2C driver which after some time ended up in an incoherent state:
i2c_dw_isr was being called with DW_IC_INTR_RX_FULL but
dev->msg_read_idx == dev->msgs_num. This resulted in the FIFO never
being emptied by i2c_dw_read. Since the DW_IC_INTR_RX_FULL interrupt is
cleared by emptying the FIFO, this situation results in an IRQ loop
locking up the system.

We found that the situation systematically occurs just after the
originating process is interrupted (premature return of
wait_for_completion_interruptible_timeout) and further analysis showed
the race condition: Interrupts from the previous transfer are sometimes
triggered after the initialisation of dev in the beginning of
i2c_dw_xfer, thus corrupting the state. If these interrupts occur before
dev is initialised everything works fine.

An alternative solution would probably be to make sure the hardware is
disabled before initialising the dev structure in i2c_dw_xfer.

Greetings,
Christian

--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pr?-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates

2013-06-07 08:52:42

by Christian Ruppert

[permalink] [raw]
Subject: [PATCH V2] i2c: designware: fix race between subsequent xfers

The designware block is not always properly disabled in the case of
transfer errors. Interrupts from aborted transfers might be handled
after the data structures for the following transfer are initialised but
before the hardware is set up. This can corrupt the data structures to
the point that the system is stuck in an infinite interrupt loop (where
FIFOs are never emptied because dev->msg_read_idx == dev->msgs_num).

This patch cleanly disables the designware-i2c hardware at the end of
every transfer, be it successful or not.

This patch requires https://patchwork.kernel.org/patch/2601241/ to be
applied first.

Signed-off-by: Christian Ruppert <[email protected]>
---
drivers/i2c/busses/i2c-designware-core.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
index b75d292..55a9991 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -588,11 +588,19 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
ret = wait_for_completion_timeout(&dev->cmd_complete, HZ);
if (ret == 0) {
dev_err(dev->dev, "controller timed out\n");
+ /* i2c_dw_init implicitly disables the adapter */
i2c_dw_init(dev);
ret = -ETIMEDOUT;
goto done;
}

+ /*
+ * We must disable the adapter before unlocking the &dev->lock mutex
+ * below. Otherwise the hardware might continue generating interrupts
+ * which in turn causes a race condition with the following transfer.
+ */
+ __i2c_dw_enable(dev, false);
+
if (dev->msg_err) {
ret = dev->msg_err;
goto done;
@@ -600,8 +608,6 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)

/* no error */
if (likely(!dev->cmd_err)) {
- /* Disable the adapter */
- __i2c_dw_enable(dev, false);
ret = num;
goto done;
}
--
1.7.1

2013-06-07 09:19:09

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH V2] i2c: designware: fix race between subsequent xfers

On Fri, Jun 7, 2013 at 11:51 AM, Christian Ruppert
<[email protected]> wrote:
> The designware block is not always properly disabled in the case of
> transfer errors. Interrupts from aborted transfers might be handled
> after the data structures for the following transfer are initialised but
> before the hardware is set up. This can corrupt the data structures to
> the point that the system is stuck in an infinite interrupt loop (where
> FIFOs are never emptied because dev->msg_read_idx == dev->msgs_num).
>
> This patch cleanly disables the designware-i2c hardware at the end of
> every transfer, be it successful or not.
>
> This patch requires https://patchwork.kernel.org/patch/2601241/ to be
> applied first.

What if you just move disabling code from i2c_dw_xfer_init() to the
top of i2c_dw_xfer() ?
Will it help?

--
With Best Regards,
Andy Shevchenko

2013-06-14 14:35:49

by Wolfram Sang

[permalink] [raw]
Subject: Re: [PATCH V2] i2c: designware: fix race between subsequent xfers

On Fri, Jun 07, 2013 at 10:51:23AM +0200, Christian Ruppert wrote:
> The designware block is not always properly disabled in the case of
> transfer errors. Interrupts from aborted transfers might be handled
> after the data structures for the following transfer are initialised but
> before the hardware is set up. This can corrupt the data structures to
> the point that the system is stuck in an infinite interrupt loop (where
> FIFOs are never emptied because dev->msg_read_idx == dev->msgs_num).
>
> This patch cleanly disables the designware-i2c hardware at the end of
> every transfer, be it successful or not.
>
> This patch requires https://patchwork.kernel.org/patch/2601241/ to be
> applied first.

These last two lines should be below "---".

>
> Signed-off-by: Christian Ruppert <[email protected]>
> ---
> drivers/i2c/busses/i2c-designware-core.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
> index b75d292..55a9991 100644
> --- a/drivers/i2c/busses/i2c-designware-core.c
> +++ b/drivers/i2c/busses/i2c-designware-core.c
> @@ -588,11 +588,19 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
> ret = wait_for_completion_timeout(&dev->cmd_complete, HZ);
> if (ret == 0) {
> dev_err(dev->dev, "controller timed out\n");
> + /* i2c_dw_init implicitly disables the adapter */
> i2c_dw_init(dev);
> ret = -ETIMEDOUT;
> goto done;
> }
>
> + /*
> + * We must disable the adapter before unlocking the &dev->lock mutex
> + * below. Otherwise the hardware might continue generating interrupts
> + * which in turn causes a race condition with the following transfer.

I added "Needs some more investigation if the additional interrupts are
a hardware bug or this driver doesn't handle them correctly yet." to the
comment and

Applied to for-next, thanks!

BTW since I am currently here: i2c-designware-core should be in the
'algos' directory, no?


Attachments:
(No filename) (1.99 kB)
signature.asc (836.00 B)
Digital signature
Download all attachments

2013-06-17 08:20:38

by Christian Ruppert

[permalink] [raw]
Subject: Re: [PATCH V2] i2c: designware: fix race between subsequent xfers

On Fri, Jun 14, 2013 at 04:37:41PM +0200, Wolfram Sang wrote:
> On Fri, Jun 07, 2013 at 10:51:23AM +0200, Christian Ruppert wrote:
> [...]
> > + /*
> > + * We must disable the adapter before unlocking the &dev->lock mutex
> > + * below. Otherwise the hardware might continue generating interrupts
> > + * which in turn causes a race condition with the following transfer.
>
> I added "Needs some more investigation if the additional interrupts are
> a hardware bug or this driver doesn't handle them correctly yet." to the
> comment and

Good idea, thanks.

> Applied to for-next, thanks!
>
> BTW since I am currently here: i2c-designware-core should be in the
> 'algos' directory, no?

At the risk of passing for a complete moron: What exactly is the
difference between I2C algos and I2C bus drivers?

Greetings,
Christian

--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pr?-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates

2013-06-17 08:30:20

by Mika Westerberg

[permalink] [raw]
Subject: Re: [PATCH V2] i2c: designware: fix race between subsequent xfers

On Fri, Jun 14, 2013 at 04:37:41PM +0200, Wolfram Sang wrote:
> On Fri, Jun 07, 2013 at 10:51:23AM +0200, Christian Ruppert wrote:
> > The designware block is not always properly disabled in the case of
> > transfer errors. Interrupts from aborted transfers might be handled
> > after the data structures for the following transfer are initialised but
> > before the hardware is set up. This can corrupt the data structures to
> > the point that the system is stuck in an infinite interrupt loop (where
> > FIFOs are never emptied because dev->msg_read_idx == dev->msgs_num).
> >
> > This patch cleanly disables the designware-i2c hardware at the end of
> > every transfer, be it successful or not.
> >
> > This patch requires https://patchwork.kernel.org/patch/2601241/ to be
> > applied first.
>
> These last two lines should be below "---".
>
> >
> > Signed-off-by: Christian Ruppert <[email protected]>
> > ---
> > drivers/i2c/busses/i2c-designware-core.c | 10 ++++++++--
> > 1 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
> > index b75d292..55a9991 100644
> > --- a/drivers/i2c/busses/i2c-designware-core.c
> > +++ b/drivers/i2c/busses/i2c-designware-core.c
> > @@ -588,11 +588,19 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
> > ret = wait_for_completion_timeout(&dev->cmd_complete, HZ);
> > if (ret == 0) {
> > dev_err(dev->dev, "controller timed out\n");
> > + /* i2c_dw_init implicitly disables the adapter */
> > i2c_dw_init(dev);
> > ret = -ETIMEDOUT;
> > goto done;
> > }
> >
> > + /*
> > + * We must disable the adapter before unlocking the &dev->lock mutex
> > + * below. Otherwise the hardware might continue generating interrupts
> > + * which in turn causes a race condition with the following transfer.
>
> I added "Needs some more investigation if the additional interrupts are
> a hardware bug or this driver doesn't handle them correctly yet." to the
> comment and
>
> Applied to for-next, thanks!

Sorry for the late response (was on vacation). This patch looks good to me
as well. Thanks for applying it!

2013-06-17 08:33:53

by Jean Delvare

[permalink] [raw]
Subject: Re: [PATCH V2] i2c: designware: fix race between subsequent xfers

On Mon, 17 Jun 2013 10:19:33 +0200, Christian Ruppert wrote:
> On Fri, Jun 14, 2013 at 04:37:41PM +0200, Wolfram Sang wrote:
> > BTW since I am currently here: i2c-designware-core should be in the
> > 'algos' directory, no?
>
> At the risk of passing for a complete moron: What exactly is the
> difference between I2C algos and I2C bus drivers?

The i2c/algos directory contains abstracted code which is common to
multiple hardware implementations. The most popular of these is
i2c-algo-bit which implements software-only I2C over virtually any pair
of controllable pins (parallel port, GPIOs, etc.)

As a general rule, i2c/algos should only contain reusable, architecture
and platform independent code. All the actual hardware access should be
delegated to the bus drivers, through callbacks. If this can't be done
easily then i2c/algos is not the right place.

--
Jean Delvare

2013-06-17 09:02:36

by Christian Ruppert

[permalink] [raw]
Subject: Re: [PATCH V2] i2c: designware: fix race between subsequent xfers

On Mon, Jun 17, 2013 at 10:33:36AM +0200, Jean Delvare wrote:
> On Mon, 17 Jun 2013 10:19:33 +0200, Christian Ruppert wrote:
> > On Fri, Jun 14, 2013 at 04:37:41PM +0200, Wolfram Sang wrote:
> > > BTW since I am currently here: i2c-designware-core should be in the
> > > 'algos' directory, no?
> >
> > At the risk of passing for a complete moron: What exactly is the
> > difference between I2C algos and I2C bus drivers?
>
> The i2c/algos directory contains abstracted code which is common to
> multiple hardware implementations. The most popular of these is
> i2c-algo-bit which implements software-only I2C over virtually any pair
> of controllable pins (parallel port, GPIOs, etc.)
>
> As a general rule, i2c/algos should only contain reusable, architecture
> and platform independent code. All the actual hardware access should be
> delegated to the bus drivers, through callbacks. If this can't be done
> easily then i2c/algos is not the right place.

In this case, busses is the right place for the i2c-designware-core. This
file contains the actual driver implementation (i.e. register access,
interrupt handling etc.) for a dedicated I2C bus driver hardware block
used in SOCs.

Greetings,
Christian

--
Christian Ruppert , <[email protected]>
/|
Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pr?-Fleuri
_// | bilis Systems CH-1228 Plan-les-Ouates