2008-12-21 07:32:48

by David Brownell

[permalink] [raw]
Subject: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix

From: David Brownell <[email protected]>

The recent "simplify spi_write_then_read()" patch included a small
regression from the 2.6.27 behavior with its performance win.

All SPI transfers are full duplex, and are packaged as half duplex
by either discarding the data that's read ("write only"), or else
by writing zeroes ("read only"). That patch wasn't ensuring that
zeroes were getting written out during the "half duplex read" part
of the transaction; instead, old RX bits were getting sent.

The fix is trivial: zero the buffer before using it.

Signed-off-by: David Brownell <[email protected]>
Cc: Vernon Sauder <[email protected]>

--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -677,11 +677,13 @@ int spi_write_then_read(struct spi_device *spi,

/* ... unless someone else is using the pre-allocated buffer */
if (!mutex_trylock(&lock)) {
- local_buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL);
+ local_buf = kzalloc(SPI_BUFSIZ, GFP_KERNEL);
if (!local_buf)
return -ENOMEM;
- } else
+ } else {
local_buf = buf;
+ memset(local_buf, 0, x.len);
+ }

memcpy(local_buf, txbuf, n_tx);
x.tx_buf = local_buf;


2008-12-21 23:46:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix



On Sat, 20 Dec 2008, David Brownell wrote:
>
> All SPI transfers are full duplex, and are packaged as half duplex
> by either discarding the data that's read ("write only"), or else
> by writing zeroes ("read only"). That patch wasn't ensuring that
> zeroes were getting written out during the "half duplex read" part
> of the transaction; instead, old RX bits were getting sent.

Hmm. In addition, isn't this broken (in that same function):

memcpy(local_buf, txbuf, n_tx);
x.tx_buf = local_buf;
x.rx_buf = local_buf;

/* do the i/o */
status = spi_sync(spi, &message);
if (status == 0)
memcpy(rxbuf, x.rx_buf + n_tx, n_rx);

shouldn't that 'rx_buf' setup be

x.rx_buf = local_buf + n_tx;

since the whole point was that we allocated a buffer that can hold _both_
the rx and tx parts? Especially as that final copy into the resulting
"rxbuf" thing uses that "+ n_tx" addition?

Linus

2008-12-22 00:48:37

by David Brownell

[permalink] [raw]
Subject: Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix

On Sunday 21 December 2008, Linus Torvalds wrote:
>
> On Sat, 20 Dec 2008, David Brownell wrote:
> >
> > All SPI transfers are full duplex, and are packaged as half duplex
> > by either discarding the data that's read ("write only"), or else
> > by writing zeroes ("read only"). That patch wasn't ensuring that
> > zeroes were getting written out during the "half duplex read" part
> > of the transaction; instead, old RX bits were getting sent.
>
> Hmm. In addition, isn't this broken (in that same function):

No -- this is full duplex. The write_then_read() helper is
simplifying a common half-duplex idiom for short operations,
but the harware still does full duplex. Buffer layout is:

Before: WWWWW0000000
After: xxxxxRRRRRRR

That is, for every bit shifted out (W, 0) another one gets
shifted in (x, R). The I/O primitive essentially swaps
contents of a one-word shift register between master and
slave; or, sequences of such words. Words don't need to
be byte-size, though that's a common option.


> memcpy(local_buf, txbuf, n_tx);
> x.tx_buf = local_buf;
> x.rx_buf = local_buf;
>
> /* do the i/o */
> status = spi_sync(spi, &message);
> if (status == 0)
> memcpy(rxbuf, x.rx_buf + n_tx, n_rx);
>
> shouldn't that 'rx_buf' setup be
>
> x.rx_buf = local_buf + n_tx;
>
> since the whole point was that we allocated a buffer that can hold _both_
> the rx and tx parts? Especially as that final copy into the resulting
> "rxbuf" thing uses that "+ n_tx" addition?

See above. We only want the "R" bits which were shifted in
right *after* the n_tx "W" bits. If we offset rx_buf before
the I/O, we'd start with the "x" don't-care bits and need to
do something else to discard them. (Plus, allocate more
space at the end of the buffer.)

- Dave


> Linus
>
>

2008-12-23 01:54:15

by Ingo Oeser

[permalink] [raw]
Subject: Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix

Hi David,

On Monday 22 December 2008, you wrote:
> On Sunday 21 December 2008, Linus Torvalds wrote:
> >
> > On Sat, 20 Dec 2008, David Brownell wrote:
> > >
> > > All SPI transfers are full duplex, and are packaged as half duplex
> > > by either discarding the data that's read ("write only"), or else
> > > by writing zeroes ("read only"). That patch wasn't ensuring that
> > > zeroes were getting written out during the "half duplex read" part
> > > of the transaction; instead, old RX bits were getting sent.
> >
> > Hmm. In addition, isn't this broken (in that same function):
>
> No -- this is full duplex. The write_then_read() helper is
> simplifying a common half-duplex idiom for short operations,
> but the harware still does full duplex. Buffer layout is:
>
> Before: WWWWW0000000
> After: xxxxxRRRRRRR
>
> That is, for every bit shifted out (W, 0) another one gets
> shifted in (x, R). The I/O primitive essentially swaps
> contents of a one-word shift register between master and
> slave; or, sequences of such words. Words don't need to
> be byte-size, though that's a common option.

> See above. We only want the "R" bits which were shifted in
> right *after* the n_tx "W" bits. If we offset rx_buf before
> the I/O, we'd start with the "x" don't-care bits and need to
> do something else to discard them. (Plus, allocate more
> space at the end of the buffer.)

Wow, what interesting hardware logic and a nice explanation.

Could you put that into a comment somewhere close to those helpers?

You can safely assume, that any code which Linus doesn't understand
is non-trivial and needs a comment :-)


Best Regards

Ingo Oeser

2008-12-23 18:38:12

by David Brownell

[permalink] [raw]
Subject: Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix

On Monday 22 December 2008, Ingo Oeser wrote:
> Hi David,
>
> On Monday 22 December 2008, you wrote:
> > The write_then_read() helper is
> > simplifying a common half-duplex idiom for short operations,
> > but the hardware still does full duplex. Buffer layout is:
> >
> > Before: WWWWW0000000
> > After: xxxxxRRRRRRR
> >
> > That is, for every bit shifted out (W, 0) another one gets
> > shifted in (x, R). The I/O primitive essentially swaps
> > contents of a one-word shift register between master and
> > slave; or, sequences of such words. Words don't need to
> > be byte-size, though that's a common option.
>
> ...
>
> Wow, what interesting hardware logic and a nice explanation.

It's standard SPI ... yes, it's kind of interesting, and I've
not come across anything else that synchronizes RX and TX in
quite the same way.

I suspect that hardware designers find the ability to get
quick transfers (tens of megabits/second) from just a shift
register is fairly attractive. Less work than high speed
I2C, for one example.


> Could you put that into a comment somewhere close to those helpers?

Fair enough. That file does lack any mention of the I/O model
for SPI ... and it's certainly a bit of a surprise to anyone
who's not had to look at it before!


> You can safely assume, that any code which Linus doesn't understand
> is non-trivial and needs a comment :-)

I'm not sure I'd agree with that -- while he's many things,
"expert in everything" is not one of them! -- but I do agree
that comment would be worth adding.


However, with regards to $PATCH ... considering that the '28
release is just around the corner, I changed my mind and would
suggest just reverting f9b90e39cbc5c4d6ef60022fd1f25d541df0aad1
much though I very much like that fix in general.

Reason being: it needs a bit more work, I forgot that it will
break for example drivers using drivers/spi/omap_uwire.c since
that is built on top of truly half-duplex hardware. (Used with
OMAP1 SOCs.) The kerneldoc says it's for providing MicroWire
style transactions, so it's kind of significant that it work
on MicroWire hardware too. ;)

So an updated version would need to both zero the TX bits when
using normal full-duplex SPI hardware ... and use more or less
the current code when running on half-duplex variants.

- Dave

2008-12-23 18:46:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix



On Sun, 21 Dec 2008, David Brownell wrote:

> On Sunday 21 December 2008, Linus Torvalds wrote:
> >
> > Hmm. In addition, isn't this broken (in that same function):
>
> No -- this is full duplex. The write_then_read() helper is
> simplifying a common half-duplex idiom for short operations,
> but the harware still does full duplex. Buffer layout is:
>
> Before: WWWWW0000000
> After: xxxxxRRRRRRR

Then, the problem is this:

x.tx_buf = local_buf;
x.rx_buf = local_buf;

which makes no sense. It's not two separate "tx_buf"/"rx_buf" things, it's
just a single "buf".

Having two separate pointers is not only confusing, but it's actively
WRONG, since the "rx_buf" clearly does not start at local_buf.

But that's not how most spi drivers seem to do it. They literally set
tx_bug and rx_buf to different values. So again, none of what you talk
about makes sense, and none of your "explanations" are anything but just
incoherent noise.

So please explain how it can _possibly_ make sense to

- have two separate pointers

- and have one of them point to what is clearly not the data it is
supposed to point to.

Hmm?

Linus

2008-12-23 20:55:11

by David Brownell

[permalink] [raw]
Subject: Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix

On Tuesday 23 December 2008, Linus Torvalds wrote:
>
> On Sun, 21 Dec 2008, David Brownell wrote:
>
> > On Sunday 21 December 2008, Linus Torvalds wrote:
> > >
> > > Hmm. In addition, isn't this broken (in that same function):
> >
> > No -- this is full duplex. The write_then_read() helper is
> > simplifying a common half-duplex idiom for short operations,
> > but the hardware still does full duplex. Buffer layout is:
> >
> > Before: WWWWW0000000
> > After: xxxxxRRRRRRR
>
> Then, the problem is this:
>
> x.tx_buf = local_buf;
> x.rx_buf = local_buf;
>
> which makes no sense. It's not two separate "tx_buf"/"rx_buf" things,
> it's just a single "buf".

In *this* specific case, yes. That's the whole point of this
routine: letting drivers treat a full duplex hardware protocol
as if it were a half duplex model.

Note that the *caller* provides two separate buffers ... which
can safely be on the stack; memcpy is used to access them.

The whole local_buf thing is a bounce-buffer, letting the caller
ignore DMA and other I/O issues that can be confusing.


You're not quite understanding an optimization, I suspect.

The original version of this code had two separate tranfers,
and each was half duplex: first a TX, then an RX ... and with
the buffer pointer in the RX transfer offset as you seem to
expect. But it uses local_buf in the *same* way. The only thing
different is how the I/O is described. With ", " separating
the first and second transfers:

Before: WWWWW, 0000000
After: xxxxx, RRRRRRR

Previously "discard RX data" (xxxxx) and "TX zeroes" (000000)
were implicit, because the TX transfer had no RX buffer and
then the RX transfer had no TX buffer. (You'll get that code
if you revert the patch, as I suggest for '28-final.) The
"interesting" RX data landed in the same place.

Thing is, the mid-message switch between two transfers made
for nasty performance when DMA or similar pipelining (FIFOs)
is involved. The cost to switch from the TX transfer to the
RX transfer could easily double the time to perform the whole
write-then-read message. (And it exposed a bug in one SPI
master controller driver, since fixed.)


Hence the switch to a lower overhead implementation, which
implements the same on-the-wire protocol more efficiently.
It made the "xxxxx" and "0000000" parts explicit, and uses
one transfer not two. (And exposed DMA bugs in other SPI
master controller drivers ... in one case, its fix seemed
to resolve some longstanding hard-to-reproduce problems.)

And that's why the offset to the RRRRRRR bits went from
being explicit to being implicit: it's doing the whole
thing in one transfer, rather than two.


> Having two separate pointers is not only confusing, but it's actively
> WRONG, since the "rx_buf" clearly does not start at local_buf.

Erm, no. It's right as is. Watch the bits on the wire,
or as they arrive in the buffers -- it's correct. Bits
are arriving and going into rx_buf, starting at exactly
the beginning of local_buf.

But none of those bits are interesting to the caller until
the ones read AFTER the write completes. Hence the name
of the function: write, then read. SPI itself does both
at the same time.


> So please explain how it can _possibly_ make sense to
>
> - have two separate pointers

In the general case, the full duplex nature of SPI is available for
drivers to do what they want. This version of write_then_read() is
using that for a performance optimization.

It would be extremely surprising for a driver to write data and
always have its transmit buffer overwritten -- unless it asked
for that behavior. Especially for a driver that doesn't care
about the data the slave returns from that particular message!
Likewise if it had to initialize each RX buffer before using it.

The per-transfer interface prevents such surprises by splitting
out the RX and TX buffers. Code that wants overwriting can get
it; code that doesn't, isn't forced to cope with that.


> - and have one of them point to what is clearly not the data it is
> supposed to point to.

See above ... you're not seeing the consequences of each transfer
being full duplex. There aren't many ways to package an interface
to hardware transfers where no bit is received without transmitting
another one:

- Only provide half duplex transfers ... not a good choice,
some devices do need full duplex, and custom drivers have
wanted to leverage the full duplex hardware.

- Only provide a primitive where every TX buffer gets clobbered
by RX data, and every RX buffer must be zeroed ... limiting,
because of half-duplex variants (Microwire, and "3-wire SPI")
that would be precluded.

- Provide a primitive with both RX and TX buffers, and let the
drivers set them up as needed.

The current framework uses the last option, which still seems to
me to be the best choice.

- Dave