LinuxLists.cc - spidev: fix hang when transfer_one

2014-01-05 23:45:20

Subject: spidev: fix hang when transfer_one_message fails

This corrects a problem in spi_pump_messages() that leads to an spi
message hanging forever when a call to transfer_one_message() fails.
This failure occurs in my MCP2210 driver when the cs_change bit is set
on the last transfer in a message, an operation which the hardware does
not support.

Rationale
Since transfer_one_message() returns an int, we must presume that it
may fail. If transfer_one_message() should never fail, it should return
void. Thus, calls to transfer_one_message() should properly manage a
failure.

Signed-off-by: Daniel Santos <[email protected]>
---
drivers/spi/spi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 98f4b77..907122e 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -735,7 +735,9 @@ static void spi_pump_messages(struct kthread_work *work)
 	ret = master->transfer_one_message(master, master->cur_msg);
 	if (ret) {
 		dev_err(&master->dev,
-			"failed to transfer one message from queue\n");
+			"failed to transfer one message from queue: %d\n", ret);
+		master->cur_msg->status = ret;
+		spi_finalize_current_message(master);
 		return;
 	}
 }
--
1.8.3.2
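To see why the unpatched error path hangs, here is a minimal userspace model in C. It assumes hypothetical, heavily simplified stand-ins (`struct model_msg`, `struct model_master`, `model_finalize()`) for `struct spi_message`, `struct spi_master` and `spi_finalize_current_message()`; none of them are the kernel's real definitions. A driver that returns an error without finalizing leaves the message pending forever in the old path, while the patched path records the error in the message status and completes it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct spi_message and
 * struct spi_master; not the kernel's real definitions. */
struct model_msg {
	int status;    /* like spi_message.status */
	int completed; /* set once the submitter would be woken up */
};

struct model_master {
	struct model_msg *cur_msg;
	int (*transfer_one_message)(struct model_master *m, struct model_msg *msg);
};

/* Models spi_finalize_current_message(): completes the message and
 * clears cur_msg so the pump can fetch the next one. */
static void model_finalize(struct model_master *m)
{
	m->cur_msg->completed = 1;
	m->cur_msg = NULL;
}

/* A driver that rejects the message (e.g. cs_change on the last
 * transfer) and returns an error WITHOUT finalizing it. */
static int rejecting_driver(struct model_master *m, struct model_msg *msg)
{
	(void)m;
	(void)msg;
	return -EINVAL;
}

/* Pre-patch behaviour: log-and-return leaves the message pending. */
static void pump_before(struct model_master *m)
{
	int ret = m->transfer_one_message(m, m->cur_msg);
	if (ret)
		return; /* nobody ever completes cur_msg -> submitter hangs */
}

/* Patched behaviour: record the error and finalize the message. */
static void pump_after(struct model_master *m)
{
	int ret = m->transfer_one_message(m, m->cur_msg);
	if (ret) {
		m->cur_msg->status = ret;
		model_finalize(m);
	}
}
```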

2014-01-05 23:51:05

by Daniel Santos

Subject: Re: [PATCH] spidev: fix hang when transfer_one_message fails

Sorry, the "[PATCH]" prefix didn't make it into the subject.

2014-01-06 12:53:52

by Mark Brown

Subject: Re: spidev: fix hang when transfer_one_message fails

On Sun, Jan 05, 2014 at 05:39:26PM -0600, [email protected] wrote:
> This corrects a problem in spi_pump_messages() that leads to an spi
> message hanging forever when a call to transfer_one_message() fails.
> This failure occurs in my MCP2210 driver when the cs_change bit is set
> on the last transfer in a message, an operation which the hardware does
> not support.

Applied, thanks.

2014-01-23 16:47:06

by Geert Uytterhoeven

Subject: Re: spidev: fix hang when transfer_one_message fails

On Mon, Jan 6, 2014 at 12:39 AM, <[email protected]> wrote:
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -735,7 +735,9 @@ static void spi_pump_messages(struct kthread_work *work)
> ret = master->transfer_one_message(master, master->cur_msg);
> if (ret) {
> dev_err(&master->dev,
> - "failed to transfer one message from queue\n");
> + "failed to transfer one message from queue: %d\n", ret);
> + master->cur_msg->status = ret;

This crashes with drivers using the generic spi_transfer_one_message(),
which always calls spi_finalize_current_message(), which zeroes
master->cur_msg:

spi_master spi0: failed to transfer one message from queue: -110
spi_pump_messages:748 master = ef3d8c00
spi_pump_messages:749 master->cur_msg = (null)
Unable to handle kernel NULL pointer dereference at virtual address 00000020
pgd = c0004000
[00000020] *pgd=00000000
Internal error: Oops: 817 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 30 Comm: spi0 Not tainted
3.13.0-koelsch-00403-gecb6e4e65dea-dirty #274
task: ef250bc0 ti: ef3f0000 task.ti: ef3f0000
PC is at spi_pump_messages+0x22c/0x288
LR is at irq_work_queue+0x6c/0xcc

Probably your transfer_one_message() forgot to call
spi_finalize_current_message()? Is this allowed in case of failure?

* @transfer_one_message: the subsystem calls the driver to transfer a single
* message while queuing transfers that arrive in the meantime. When the
* driver is finished with this message, it must call
* spi_finalize_current_message() so the subsystem can issue the next
* transfer

Alternatively, we need a check for master->cur_msg here.

> + spi_finalize_current_message(master);
> return;
> }
> }

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
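Geert's alternative (a check for master->cur_msg) can be sketched with the same kind of userspace model. The names are hypothetical, and the model assumes, as the oops shows, that finalizing clears `cur_msg`. The core only records the error and finalizes when the driver has not already done so, so the generic path is finalized exactly once and a rejecting driver still gets its message completed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical simplified model; names are not kernel APIs. */
struct model_msg {
	int status;
	int completions; /* how many times the message was finalized */
};

struct model_master {
	struct model_msg *cur_msg;
	int (*transfer_one_message)(struct model_master *m, struct model_msg *msg);
};

/* Like spi_finalize_current_message(): completes and clears cur_msg. */
static void model_finalize(struct model_master *m)
{
	m->cur_msg->completions++;
	m->cur_msg = NULL;
}

/* Mimics the generic spi_transfer_one_message() path: it finalizes the
 * message itself, even when a transfer times out, then returns the error. */
static int generic_style_driver(struct model_master *m, struct model_msg *msg)
{
	msg->status = -ETIMEDOUT;
	model_finalize(m);
	return -ETIMEDOUT;
}

/* Mimics a driver that rejects the message without finalizing it. */
static int rejecting_driver(struct model_master *m, struct model_msg *msg)
{
	(void)m;
	(void)msg;
	return -EINVAL;
}

/* Guarded error path: only touch cur_msg if the driver left it in place. */
static void pump_guarded(struct model_master *m)
{
	int ret = m->transfer_one_message(m, m->cur_msg);
	if (ret && m->cur_msg) {
		m->cur_msg->status = ret;
		model_finalize(m);
	}
}
```

A pointer check cannot tell whether a driver still intends to finalize the message later (for example from an interrupt handler), so it does not replace a clear rule about who owns finalization.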

2014-01-23 18:17:56

by Mark Brown

Subject: Re: spidev: fix hang when transfer_one_message fails

On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote:

> Probably your transfer_one_message() forgot to call
> spi_finalize_current_message()? Is this allowed in case of failure?

Probably not, or at least we should be consistent about requiring it or
not. Do you want to send a revert for this with a suitable changelog?
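The other way to be consistent is to require every driver to finalize on all exit paths, as the kerneldoc Geert quoted already says. Below is a sketch of such a callback using hypothetical model types rather than the real SPI structures; the cs_change check mirrors the MCP2210 limitation described in the original patch and is only illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical simplified model of a message made of transfers. */
struct model_xfer {
	int cs_change;
};

struct model_msg {
	const struct model_xfer *xfers;
	size_t n_xfers;
	int status;
	int completions;
};

struct model_master {
	struct model_msg *cur_msg;
};

/* Stand-in for spi_finalize_current_message(). */
static void model_finalize(struct model_master *m)
{
	m->cur_msg->completions++;
	m->cur_msg = NULL;
}

/* Driver callback that follows the documented contract: every exit path,
 * success or failure, finalizes the message exactly once. */
static int contract_driver(struct model_master *m, struct model_msg *msg)
{
	int ret = 0;

	/* Hypothetical hardware limit taken from the MCP2210 report:
	 * cs_change on the last transfer is unsupported. */
	if (msg->n_xfers && msg->xfers[msg->n_xfers - 1].cs_change) {
		ret = -EINVAL;
		goto out;
	}

	/* ... the transfers would be pushed to the hardware here ... */

out:
	msg->status = ret;
	model_finalize(m);
	return ret;
}
```

With this rule the core's error path only needs to log the return value; it never touches a message the driver has already handed back.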

2014-01-24 02:25:59

by Daniel Santos

Subject: Re: spidev: fix hang when transfer_one_message fails

On 01/23/2014 12:17 PM, Mark Brown wrote:
> On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote:
>
>> Probably your transfer_one_message() forgot to call
>> spi_finalize_current_message()? Is this allowed in case of failure?
> Probably not, or at least we should be consistent about requiring it or
> not.

Hmm, well it sounds like the core problem is a lack of specificity about
the interface.

1. When a message is being rejected, who is responsible for finalizing
it, the spi subsystem or the master driver?
2. What does a non-zero return value from transfer() or
transfer_one_message() mean? transfer() is supposed to just queue the
message and not sleep, so it would seem appropriate that it would mean
that the message was rejected due to something being invalid or
unsupported (or an OOM), etc., but transfer_one_message() is also where
*most but not all* drivers transmit the message, so should it mean the
message was rejected outright for being invalid/unsupported/OOM, or
should it mean a failure such as EIO while transmitting, or both?
3. Is there ever a reason to set the message's status to anything other
than the return value of transfer()/transfer_one_message()? From a
cursory review of mainline spi drivers, this appears to vary. Some
drivers always return zero while setting the status upon error, some
return the status, and still others set the status to one value but
return a different error code.
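Question 3 is essentially about a policy for combining two error channels. One possible rule, sketched here as a hypothetical helper and not as anything the SPI core defines, is to let a non-zero status already stored by the driver win and otherwise fall back to the return value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical policy, not a kernel rule: decide the status reported to
 * the submitter from the callback's return value and whatever status the
 * driver already stored in the message. */
static int reconcile_status(int ret, int msg_status)
{
	if (msg_status != 0)
		return msg_status; /* driver's own, more specific status wins */
	return ret;            /* otherwise the return value becomes the status */
}
```

Applied to the three driver styles above: a driver that returns 0 but sets the status keeps its status, a driver that returns the status gets the same value either way, and a driver that stores one code but returns another keeps the one it stored.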

So if a non-zero return value from transfer() or transfer_one_message()
should also be the status, I'm thinking we can have a small reduction in
the code footprint if it's done in the spi core. However, I suppose that
I can't properly discuss this without delving into an almost unrelated
issue, which may render the point moot.

The only reason I'm using transfer_one_message() at all is because
transfer() is being deprecated. My driver (currently out-of-tree)
supports both but will prefer transfer() as long as it hasn't been
removed or become broken ( which I'm managing via a #if
LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check:
https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143).
The reason for this is that the mcp2210 driver has an internal
command queue that manages (per its requirements) spi messages as well
as other types of commands to the remote (via USB) device (which is both
an spi master and gpio chip). From a cursory review of other spi drivers
in the mainline, I can see that at least two of them do this as well:
spi-pxa2xx and spi-bfin-v3. So perhaps we need a non-deprecated
mechanism to do our own queuing and avoid the overhead of the spi core
providing a thread & queue which we'll just ignore. Then, the core can
take care of setting status and finalizing when calls to transfer() fail
(since there should be no ambiguity about this here), but leave that up
to the driver when calling transfer_one_message()?

Either way, I think that we need to decide and spell it out in the
kerneldocs.

Daniel

2014-01-24 13:01:52

by Mark Brown

Subject: Re: spidev: fix hang when transfer_one_message fails

On Thu, Jan 23, 2014 at 08:21:39PM -0600, Daniel Santos wrote:
> On 01/23/2014 12:17 PM, Mark Brown wrote:
> >On Thu, Jan 23, 2014 at 05:47:02PM +0100, Geert Uytterhoeven wrote:

Please don't write enormous walls of text, it really doesn't make it
easy to read your messages or encourage doing so. Use blank lines
between paragraphs (including within lists) and try to either split or
condense your ideas so that what you're trying to say comes over more
clearly.

> The only reason I'm using transfer_one_message() at all is because
> transfer() is being deprecated. My driver (currently out-of-tree)
> supports both but will prefer transfer() as long as it hasn't been
> removed or become broken ( which I'm managing via a #if
> LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143).

No, don't do that - it's not sensible. If there's something you need,
work upstream to get it implemented or understand how to use the
framework better. Don't code around the frameworks, talk to people
instead.

> of other spi drivers in the mainline, I can see that at least two of
> them do this as well: spi-pxa2xx and spi-bfin-v3. So perhaps we need
> a non-deprecated mechanism to do our own queuing and avoid the

No, that's not what those drivers are doing (nor the others doing
similar things) - they have done some optimisation on the code that
pushes messages to hardware so they don't defer to task context when
they don't have to. There's very little hardware specific about what
they're doing, it's all about how we work with the scheduler to minimise
the idle time for the hardware. A major goal of factoring out the loops
that traverse the messages from the drivers is to allow us to move that
code out of the drivers and into the framework where it belongs.
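The optimisation described above can be modelled in a few lines: when a message completes and another is already queued, start it directly from the completion path instead of waking the pump thread and waiting for the scheduler. This is a hypothetical userspace model (`struct model_ctlr`, `on_complete()`, the `can_start_inline` flag), not the SPI core's implementation, and whether starting inline is safe depends on what the next message needs.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8

/* Hypothetical model of a controller queue; not the SPI core's code. */
struct model_ctlr {
	int queue[QLEN];    /* ids of messages waiting to be sent */
	size_t head, count;
	int busy;           /* hardware currently running a message */
	int running_id;
	int thread_wakeups; /* times we fell back to the pump thread */
};

static void start_hw(struct model_ctlr *c, int id)
{
	c->busy = 1;
	c->running_id = id;
}

/* Append a message id to the ring; returns -1 if the ring is full. */
static int enqueue(struct model_ctlr *c, int id)
{
	if (c->count == QLEN)
		return -1;
	c->queue[(c->head + c->count) % QLEN] = id;
	c->count++;
	return 0;
}

/* Completion path: if a message is queued and it may be started from
 * here, start it at once so the hardware does not sit idle; otherwise
 * defer to the pump thread (task context). */
static void on_complete(struct model_ctlr *c, int can_start_inline)
{
	c->busy = 0;
	if (c->count == 0)
		return;
	if (can_start_inline) {
		int id = c->queue[c->head];
		c->head = (c->head + 1) % QLEN;
		c->count--;
		start_hw(c, id);
	} else {
		c->thread_wakeups++; /* the thread will start it later */
	}
}
```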

> overhead of the spi core providing a thread & queue which we'll just
> ignore. Then, the core can take care of setting status and
> finalizing when calls to transfer() fail (since there should be no
> ambiguity about this here), but leave that up to the driver when
> calling transfer_one_message()?

When the core refactoring is finished, popping up into the thread will be
mostly optional. Things like PIO, clock reprogramming and delays will
need to be pushed up into task context as do some of the DMA operations
and the completions - you don't want to be doing anything slow in
interrupt context.
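The point about interrupt context can be illustrated with a toy split between an interrupt-time handler and a task-context worker. The work kinds follow the examples given (PIO, clock reprogramming, delays) plus a hypothetical cheap `WORK_FIFO_KICK`; the classification is an assumption for illustration, and in a real driver the worker would be a kthread or workqueue rather than a plain function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work kinds; WORK_FIFO_KICK stands for something cheap. */
enum work_kind { WORK_FIFO_KICK, WORK_PIO, WORK_CLOCK_REPROGRAM, WORK_DELAY };

#define MAX_DEFERRED 16

struct deferred {
	enum work_kind items[MAX_DEFERRED];
	size_t n;
};

static int is_slow(enum work_kind k)
{
	return k == WORK_PIO || k == WORK_CLOCK_REPROGRAM || k == WORK_DELAY;
}

/* Modelled interrupt handler: do only cheap work inline and hand slow
 * work to the task-context worker. Returns the number of items handled
 * inline, or -1 if the deferral queue is full (slow work is never run
 * here as a fallback). */
static int irq_handler(struct deferred *d, const enum work_kind *work, size_t n)
{
	int inline_done = 0;

	for (size_t i = 0; i < n; i++) {
		if (!is_slow(work[i]))
			inline_done++;
		else if (d->n < MAX_DEFERRED)
			d->items[d->n++] = work[i];
		else
			return -1;
	}
	return inline_done;
}

/* Runs later in task context, where sleeping and long operations are
 * acceptable; here it just reports how many deferred items it drained. */
static size_t worker_drain(struct deferred *d)
{
	size_t done = d->n;

	d->n = 0;
	return done;
}
```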

2014-01-27 23:15:09

by Daniel Santos

Subject: Re: spidev: fix hang when transfer_one_message fails

On 01/24/2014 07:01 AM, Mark Brown wrote:
> Please don't write enormous walls of text, it really doesn't make it
> easy to read your messages or encourage doing so. Use blank lines
> between paragraphs (including within lists) and try to either split or
> condense your ideas so that what you're trying to say comes over more
> clearly.

Indeed, that was pretty ugly. :) Sorry about that.

>
>> The only reason I'm using transfer_one_message() at all is because
>> transfer() is being deprecated. My driver (currently out-of-tree)
>> supports both but will prefer transfer() as long as it hasn't been
>> removed or become broken ( which I'm managing via a #if
>> LINUX_VERSION_CODE >= KERNEL_VERSION(4,99,99) check: https://github.com/daniel-santos/mcp2210-linux/blob/master/mcp2210-spi.c#L143).
> No, don't do that - it's not sensible. If there's something you need,
> work upstream to get it implemented or understand how to use the
> framework better. Don't code around the frameworks, talk to people
> instead.

I suppose that at the time I worked on this, I had some time pressures,
and I did plan to come back to it and discuss this with linux-spi to
figure out how to manage this better, or whether I should simply use the
SPI core's queue and leave it be. I've faced a lot of challenges thus far
because:

a.) It's my first device driver, and

b.) I must dynamically create/destroy gpio_chips, irq_chips, spi_masters
and their children since this is a USB "bridge" device that can be added
& removed at any point in time.

I originally thought that it was a first in its class, but I've since
discovered another out-of-tree project that is doing very similar
things, USB to i2c/spi (https://github.com/groeck/diolan).

>
>> of other spi drivers in the mainline, I can see that at least two of
>> them do this as well: spi-pxa2xx and spi-bfin-v3. So perhaps we need
>> a non-deprecated mechanism to do our own queuing and avoid the
> No, that's not what those drivers are doing (nor the others doing
> similar things) - they have done some optimisation on the code that
> pushes messages to hardware so they don't defer to task context when
> they don't have to. There's very little hardware specific about what
> they're doing, it's all about how we work with the scheduler to minimise
> the idle time for the hardware. A major goal of factoring out the loops
> that traverse the messages from the drivers is to allow us to move that
> code out of the drivers and into the framework where it belongs.

Oh, that's cool! :) Thanks for the clarification.

>> overhead of the spi core providing a thread & queue which we'll just
>> ignore. Then, the core can take care of setting status and
>> finalizing when calls to transfer() fail (since there should be no
>> ambiguity about this here), but leave that up to the driver when
>> calling transfer_one_message()?
> When the core refactoring is finished, popping up into the thread will be
> mostly optional. Things like PIO, clock reprogramming and delays will
> need to be pushed up into task context as do some of the DMA operations
> and the completions - you don't want to be doing anything slow in
> interrupt context.

I suppose I need to read up more on the refactoring work happening in
this subsystem. Yes, we definitely don't want to spend much time in
interrupt context and my driver currently spends a lot of time there (at
least to me). My strategy has been that when I get an spi message from
transfer(), I create and submit an mcp2210-specific command for that
message. If no command is currently in process, I also submit a 64-byte
interrupt URB for that command before returning (the mcp2210 has a
tiny buffer). I suppose I've been trying to follow the "first make it
correct, then make it fast" credo.

Daniel
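The strategy described above (queue every command, and submit a USB request only when the device is idle) amounts to a small single-in-flight command queue. This is a hypothetical userspace model, not code from the mcp2210 driver: it assumes one URB per command and ignores real-world details such as multi-report commands, errors and locking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMDS 8

/* Hypothetical model of a driver-internal command queue; names are
 * illustrative, not taken from the mcp2210 driver. */
struct cmd_queue {
	int cmds[MAX_CMDS];
	size_t head, count;
	int in_process;     /* a command currently owns the device */
	int urbs_submitted;
};

static void submit_urb(struct cmd_queue *q)
{
	q->in_process = 1;
	q->urbs_submitted++; /* a real driver would call usb_submit_urb() */
}

/* Entry point from the SPI side: queue the command and kick the device
 * only if nothing is already in flight. */
static int queue_cmd(struct cmd_queue *q, int cmd)
{
	if (q->count == MAX_CMDS)
		return -1;
	q->cmds[(q->head + q->count) % MAX_CMDS] = cmd;
	q->count++;
	if (!q->in_process)
		submit_urb(q);
	return 0;
}

/* URB completion (only called while a command is in process): retire
 * the current command and start the next one, if any. */
static void urb_complete(struct cmd_queue *q)
{
	q->head = (q->head + 1) % MAX_CMDS;
	q->count--;
	q->in_process = 0;
	if (q->count)
		submit_urb(q);
}
```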
