2014-12-10 20:16:34

by Ashwin Chaugule

[permalink] [raw]
Subject: [PATCH] Mailbox: Complete wait event only if Tx was successful

If a wait_for_completion_timeout() call returns due to a timeout,
the mbox code can still call complete() after returning from the wait.
This can cause subsequent transmissions on a channel to fail, since
the wait_for_completion_timeout() sees the completion variable
is !=0, caused by the erroneous complete() call, and immediately
returns without waiting for the time as expected by the client.

Fix this by calling complete() only if the TX was successful.

Signed-off-by: Ashwin Chaugule <[email protected]>
---
drivers/mailbox/mailbox.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 17e9e4a..4acaddb 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
if (mssg && chan->cl->tx_done)
chan->cl->tx_done(chan->cl, mssg, r);

- if (chan->cl->tx_block)
+ if ((!r) && chan->cl->tx_block)
complete(&chan->tx_complete);
}

--
1.9.1


2014-12-12 08:43:11

by Sudeep Holla

[permalink] [raw]
Subject: Re: [Linaro-acpi] [PATCH] Mailbox: Complete wait event only if Tx was successful

Hi Ashwin,

On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
> If a wait_for_completion_timeout() call returns due to a timeout,
> the mbox code can still call complete() after returning from the wait.
> This can cause subsequent transmissions on a channel to fail, since
> the wait_for_completion_timeout() sees the completion variable
> is !=0, caused by the erroneous complete() call, and immediately
> returns without waiting for the time as expected by the client.
>
> Fix this by calling complete() only if the TX was successful.
>
> Signed-off-by: Ashwin Chaugule <[email protected]>
> ---
> drivers/mailbox/mailbox.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 17e9e4a..4acaddb 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
> if (mssg && chan->cl->tx_done)
> chan->cl->tx_done(chan->cl, mssg, r);
>
> - if (chan->cl->tx_block)
> + if ((!r) && chan->cl->tx_block)
> complete(&chan->tx_complete);

Just curious to check if there's another possible race which is
a different issue.

Suppose the timer fired and indicated that the Tx is complete, then
it tries to execute complete while the wait_for_completion_timeout timed
out. Does that make sense ?

So if yes, how about adding !completion_done(..) to the check while you
are at this ?

Regards,
Sudeep

2014-12-12 10:21:57

by Jassi Brar

[permalink] [raw]
Subject: Re: [PATCH] Mailbox: Complete wait event only if Tx was successful

On 11 December 2014 at 01:46, Ashwin Chaugule
<[email protected]> wrote:
> If a wait_for_completion_timeout() call returns due to a timeout,
> the mbox code can still call complete() after returning from the wait.
> This can cause subsequent transmissions on a channel to fail, since
> the wait_for_completion_timeout() sees the completion variable
> is !=0, caused by the erroneous complete() call, and immediately
> returns without waiting for the time as expected by the client.
>
> Fix this by calling complete() only if the TX was successful.
>
> Signed-off-by: Ashwin Chaugule <[email protected]>
> ---
> drivers/mailbox/mailbox.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 17e9e4a..4acaddb 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
> if (mssg && chan->cl->tx_done)
> chan->cl->tx_done(chan->cl, mssg, r);
>
> - if (chan->cl->tx_block)
> + if ((!r) && chan->cl->tx_block)
> complete(&chan->tx_complete);
>
Thanks for finding the bug.
However the fix is flawed. complete() could also be done from
mbox_chan_txdone() calling tx_tick(). And if the controller returned
error, we could never pass on that error code to the user (timeout
fires and then we will move on with -EIO).
Since we could never prevent the controller from returning -EIO as the
error, I think we have to explicitly tell tx_tick() if it needs to
complete() or not.

-Jassi

2014-12-12 17:47:30

by Ashwin Chaugule

[permalink] [raw]
Subject: Re: [Linaro-acpi] [PATCH] Mailbox: Complete wait event only if Tx was successful

On 12 December 2014 at 03:43, Sudeep Holla <[email protected]> wrote:
> Hi Ashwin,

Hi,

> On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
>>
>> If a wait_for_completion_timeout() call returns due to a timeout,
>> the mbox code can still call complete() after returning from the wait.
>> This can cause subsequent transmissions on a channel to fail, since
>> the wait_for_completion_timeout() sees the completion variable
>> is !=0, caused by the erroneous complete() call, and immediately
>> returns without waiting for the time as expected by the client.
>>
>> Fix this by calling complete() only if the TX was successful.
>>
>> Signed-off-by: Ashwin Chaugule <[email protected]>
>> ---
>> drivers/mailbox/mailbox.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> index 17e9e4a..4acaddb 100644
>> --- a/drivers/mailbox/mailbox.c
>> +++ b/drivers/mailbox/mailbox.c
>> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>> if (mssg && chan->cl->tx_done)
>> chan->cl->tx_done(chan->cl, mssg, r);
>>
>> - if (chan->cl->tx_block)
>> + if ((!r) && chan->cl->tx_block)
>> complete(&chan->tx_complete);
>
>
> Just curious to check if there's another possible race which is
> a different issue.
>
> Suppose the timer fired and indicated that the Tx is complete, then
> it tries to execute complete while the wait_for_completion_timeout timed
> out. Does that make sense ?
>
> So if yes, how about adding !completion_done(..) to the check while you
> are at this ?

Yea. Seems like another race condition. I'll add it along with this..

Thanks,
Ashwin

2014-12-12 17:58:29

by Ashwin Chaugule

[permalink] [raw]
Subject: Re: [PATCH] Mailbox: Complete wait event only if Tx was successful

On 12 December 2014 at 05:21, Jassi Brar <[email protected]> wrote:
> On 11 December 2014 at 01:46, Ashwin Chaugule
> <[email protected]> wrote:
>> If a wait_for_completion_timeout() call returns due to a timeout,
>> the mbox code can still call complete() after returning from the wait.
>> This can cause subsequent transmissions on a channel to fail, since
>> the wait_for_completion_timeout() sees the completion variable
>> is !=0, caused by the erroneous complete() call, and immediately
>> returns without waiting for the time as expected by the client.
>>
>> Fix this by calling complete() only if the TX was successful.
>>
>> Signed-off-by: Ashwin Chaugule <[email protected]>
>> ---
>> drivers/mailbox/mailbox.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> index 17e9e4a..4acaddb 100644
>> --- a/drivers/mailbox/mailbox.c
>> +++ b/drivers/mailbox/mailbox.c
>> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>> if (mssg && chan->cl->tx_done)
>> chan->cl->tx_done(chan->cl, mssg, r);
>>
>> - if (chan->cl->tx_block)
>> + if ((!r) && chan->cl->tx_block)
>> complete(&chan->tx_complete);
>>
> Thanks for finding the bug.
> However the fix is flawed. complete() could also be done from
> mbox_chan_txdone() calling tx_tick(). And if the controller returned
> error, we could never pass on that error code to the user (timeout
> fires and then we will move on with -EIO).

If the controller returned error, wouldnt it be passed to the user via
chan->cl->tx_done()?
However, I agree that with this fix, it won't call complete(), when
the controller finished the Tx with an Err.

> Since we could never prevent the controller from returning -EIO as the
> error, I think we have to explicitly tell tx_tick() if it needs to
> complete() or not.

Hm. How about checking if the wait_for_completion_timeout() returned
the timeout value? If so, then return with -EIO and don't complete().

>
> -Jassi

2014-12-16 11:35:19

by Sudeep Holla

[permalink] [raw]
Subject: Re: [Linaro-acpi] [PATCH] Mailbox: Complete wait event only if Tx was successful

On Fri, Dec 12, 2014 at 05:47:26PM +0000, Ashwin Chaugule wrote:
> On 12 December 2014 at 03:43, Sudeep Holla <[email protected]> wrote:
> > Hi Ashwin,
>
> Hi,
>

Hi Ashwin,

> > On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
> >>
> >> If a wait_for_completion_timeout() call returns due to a timeout,
> >> the mbox code can still call complete() after returning from the wait.
> >> This can cause subsequent transmissions on a channel to fail, since
> >> the wait_for_completion_timeout() sees the completion variable
> >> is !=0, caused by the erroneous complete() call, and immediately
> >> returns without waiting for the time as expected by the client.
> >>
> >> Fix this by calling complete() only if the TX was successful.
> >>
> >> Signed-off-by: Ashwin Chaugule <[email protected]>
> >> ---
> >> drivers/mailbox/mailbox.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> >> index 17e9e4a..4acaddb 100644
> >> --- a/drivers/mailbox/mailbox.c
> >> +++ b/drivers/mailbox/mailbox.c
> >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
> >> if (mssg && chan->cl->tx_done)
> >> chan->cl->tx_done(chan->cl, mssg, r);
> >>
> >> - if (chan->cl->tx_block)
> >> + if ((!r) && chan->cl->tx_block)
> >> complete(&chan->tx_complete);
> >
> >
> > Just curious to check if there's another possible race which is
> > a different issue.
> >
> > Suppose the timer fired and indicated that the Tx is complete, then
> > it tries to execute complete while the wait_for_completion_timeout timed
> > out. Does that make sense ?
> >
> > So if yes, how about adding !completion_done(..) to the check while you
> > are at this ?
>
> Yea. Seems like another race condition. I'll add it along with this..
>

Thanks !

Regards,
Sudeep

2014-12-16 13:00:44

by Ashwin Chaugule

[permalink] [raw]
Subject: Re: [Linaro-acpi] [PATCH] Mailbox: Complete wait event only if Tx was successful

On 16 December 2014 at 06:36, Sudeep Holla <[email protected]> wrote:
> On Fri, Dec 12, 2014 at 05:47:26PM +0000, Ashwin Chaugule wrote:
>> On 12 December 2014 at 03:43, Sudeep Holla <[email protected]> wrote:
>> > On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
>> >>
>> >> If a wait_for_completion_timeout() call returns due to a timeout,
>> >> the mbox code can still call complete() after returning from the wait.
>> >> This can cause subsequent transmissions on a channel to fail, since
>> >> the wait_for_completion_timeout() sees the completion variable
>> >> is !=0, caused by the erroneous complete() call, and immediately
>> >> returns without waiting for the time as expected by the client.
>> >>
>> >> Fix this by calling complete() only if the TX was successful.
>> >>
>> >> Signed-off-by: Ashwin Chaugule <[email protected]>
>> >> ---
>> >> drivers/mailbox/mailbox.c | 2 +-
>> >> 1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> >> index 17e9e4a..4acaddb 100644
>> >> --- a/drivers/mailbox/mailbox.c
>> >> +++ b/drivers/mailbox/mailbox.c
>> >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>> >> if (mssg && chan->cl->tx_done)
>> >> chan->cl->tx_done(chan->cl, mssg, r);
>> >>
>> >> - if (chan->cl->tx_block)
>> >> + if ((!r) && chan->cl->tx_block)
>> >> complete(&chan->tx_complete);
>> >
>> >
>> > Just curious to check if there's another possible race which is
>> > a different issue.
>> >
>> > Suppose the timer fired and indicated that the Tx is complete, then
>> > it tries to execute complete while the wait_for_completion_timeout timed
>> > out. Does that make sense ?
>> >
>> > So if yes, how about adding !completion_done(..) to the check while you
>> > are at this ?
>>
>> Yea. Seems like another race condition. I'll add it along with this..
>>
>
> Thanks !

IIUC, it looks like adding the !completion_done() will not really fix
this race. Once the lock inside wait_for_completion.. is released,
completion_done() will return 0, and we'll call complete(), which is
not what we want, since the "wait" is already over (after a timeout).
I think the only right thing here is to increase the timeout in
wait_for_completion_timeout(). Thoughts?

Cheers,
Ashwin