2020-12-17 20:30:47

by Sowjanya Komatineni

[permalink] [raw]
Subject: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit

This patch marks dummy transfer by setting dummy_data bit to 1.

Controllers supporting dummy transfer by hardware use this bit field
to skip software transfer of dummy bytes and use hardware dummy bytes
transfer.

Signed-off-by: Sowjanya Komatineni <[email protected]>
---
drivers/spi/spi-mem.c | 1 +
include/linux/spi/spi.h | 2 ++
2 files changed, 3 insertions(+)

diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
index f3a3f19..c64371c 100644
--- a/drivers/spi/spi-mem.c
+++ b/drivers/spi/spi-mem.c
@@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1;
xfers[xferpos].len = op->dummy.nbytes;
xfers[xferpos].tx_nbits = op->dummy.buswidth;
+ xfers[xferpos].dummy_data = 1;
spi_message_add_tail(&xfers[xferpos], &msg);
xferpos++;
totalxferlen += op->dummy.nbytes;
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index aa09fdc..708f2f5 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr,
* transfer. If 0 the default (from @spi_device) is used.
* @bits_per_word: select a bits_per_word other than the device default
* for this transfer. If 0 the default (from @spi_device) is used.
+ * @dummy_data: indicates transfer is dummy bytes transfer.
* @cs_change: affects chipselect after this transfer completes
* @cs_change_delay: delay between cs deassert and assert when
* @cs_change is set and @spi_transfer is not the last in @spi_message
@@ -939,6 +940,7 @@ struct spi_transfer {
struct sg_table tx_sg;
struct sg_table rx_sg;

+ unsigned dummy_data:1;
unsigned cs_change:1;
unsigned tx_nbits:3;
unsigned rx_nbits:3;
--
2.7.4


2020-12-18 09:47:49

by Pratyush Yadav

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit

Hi Sowjanya,

On 17/12/20 12:28PM, Sowjanya Komatineni wrote:
> This patch marks dummy transfer by setting dummy_data bit to 1.
>
> Controllers supporting dummy transfer by hardware use this bit field
> to skip software transfer of dummy bytes and use hardware dummy bytes
> transfer.

What is the benefit you get from this change? You add complexity in
spi-mem and the controller driver, so that must come with some benefits.
Here I don't see any. The transfer will certainly take the same amount
of time because the number or period of the dummy cycles has not
changed. So why is this needed?

> Signed-off-by: Sowjanya Komatineni <[email protected]>
> ---
> drivers/spi/spi-mem.c | 1 +
> include/linux/spi/spi.h | 2 ++
> 2 files changed, 3 insertions(+)
>
> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> index f3a3f19..c64371c 100644
> --- a/drivers/spi/spi-mem.c
> +++ b/drivers/spi/spi-mem.c
> @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1;
> xfers[xferpos].len = op->dummy.nbytes;
> xfers[xferpos].tx_nbits = op->dummy.buswidth;
> + xfers[xferpos].dummy_data = 1;
> spi_message_add_tail(&xfers[xferpos], &msg);
> xferpos++;
> totalxferlen += op->dummy.nbytes;
> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> index aa09fdc..708f2f5 100644
> --- a/include/linux/spi/spi.h
> +++ b/include/linux/spi/spi.h
> @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr,
> * transfer. If 0 the default (from @spi_device) is used.
> * @bits_per_word: select a bits_per_word other than the device default
> * for this transfer. If 0 the default (from @spi_device) is used.
> + * @dummy_data: indicates transfer is dummy bytes transfer.
> * @cs_change: affects chipselect after this transfer completes
> * @cs_change_delay: delay between cs deassert and assert when
> * @cs_change is set and @spi_transfer is not the last in @spi_message
> @@ -939,6 +940,7 @@ struct spi_transfer {
> struct sg_table tx_sg;
> struct sg_table rx_sg;
>
> + unsigned dummy_data:1;
> unsigned cs_change:1;
> unsigned tx_nbits:3;
> unsigned rx_nbits:3;
> --
> 2.7.4
>

--
Regards,
Pratyush Yadav
Texas Instruments India

2020-12-18 10:00:45

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit

On Fri, 18 Dec 2020 14:51:08 +0530
Pratyush Yadav <[email protected]> wrote:

> Hi Sowjanya,
>
> On 17/12/20 12:28PM, Sowjanya Komatineni wrote:
> > This patch marks dummy transfer by setting dummy_data bit to 1.
> >
> > Controllers supporting dummy transfer by hardware use this bit field
> > to skip software transfer of dummy bytes and use hardware dummy bytes
> > transfer.
>
> What is the benefit you get from this change? You add complexity in
> spi-mem and the controller driver, so that must come with some benefits.
> Here I don't see any. The transfer will certainly take the same amount
> of time because the number or period of the dummy cycles has not
> changed. So why is this needed?

Well, you don't have to queue TX bytes if you use HW-based dummy
cycles, but I agree, I'd expect the overhead to be negligible,
especially since we're talking about emitting a few bytes, not hundreds.
This being said, the complexity added to the core is reasonable IMHO,
so if it really helps reducing the CPU overhead (we might need some
numbers to prove that), I guess it's okay.

>
> > Signed-off-by: Sowjanya Komatineni <[email protected]>
> > ---
> > drivers/spi/spi-mem.c | 1 +
> > include/linux/spi/spi.h | 2 ++
> > 2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> > index f3a3f19..c64371c 100644
> > --- a/drivers/spi/spi-mem.c
> > +++ b/drivers/spi/spi-mem.c
> > @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> > xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1;
> > xfers[xferpos].len = op->dummy.nbytes;
> > xfers[xferpos].tx_nbits = op->dummy.buswidth;
> > + xfers[xferpos].dummy_data = 1;
> > spi_message_add_tail(&xfers[xferpos], &msg);
> > xferpos++;
> > totalxferlen += op->dummy.nbytes;
> > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> > index aa09fdc..708f2f5 100644
> > --- a/include/linux/spi/spi.h
> > +++ b/include/linux/spi/spi.h
> > @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr,
> > * transfer. If 0 the default (from @spi_device) is used.
> > * @bits_per_word: select a bits_per_word other than the device default
> > * for this transfer. If 0 the default (from @spi_device) is used.
> > + * @dummy_data: indicates transfer is dummy bytes transfer.
> > * @cs_change: affects chipselect after this transfer completes
> > * @cs_change_delay: delay between cs deassert and assert when
> > * @cs_change is set and @spi_transfer is not the last in @spi_message
> > @@ -939,6 +940,7 @@ struct spi_transfer {
> > struct sg_table tx_sg;
> > struct sg_table rx_sg;
> >
> > + unsigned dummy_data:1;
> > unsigned cs_change:1;
> > unsigned tx_nbits:3;
> > unsigned rx_nbits:3;
> > --
> > 2.7.4
> >
>

2020-12-18 18:11:00

by Sowjanya Komatineni

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit


On 12/18/20 1:57 AM, Boris Brezillon wrote:
> On Fri, 18 Dec 2020 14:51:08 +0530
> Pratyush Yadav <[email protected]> wrote:
>
>> Hi Sowjanya,
>>
>> On 17/12/20 12:28PM, Sowjanya Komatineni wrote:
>>> This patch marks dummy transfer by setting dummy_data bit to 1.
>>>
>>> Controllers supporting dummy transfer by hardware use this bit field
>>> to skip software transfer of dummy bytes and use hardware dummy bytes
>>> transfer.
>> What is the benefit you get from this change? You add complexity in
>> spi-mem and the controller driver, so that must come with some benefits.
>> Here I don't see any. The transfer will certainly take the same amount
>> of time because the number or period of the dummy cycles has not
>> changed. So why is this needed?
> Well, you don't have to queue TX bytes if you use HW-based dummy
> cycles, but I agree, I'd expect the overhead to be negligible,
> especially since we're talking about emitting a few bytes, not hundreds.
> This being said, the complexity added to the core is reasonable IMHO,
> so if it really helps reducing the CPU overhead (we might need some
> numbers to prove that), I guess it's okay.

Hardware dummy cycles feature of Tegra QSPI is to save SW transfer cycle
of dummy bytes by filling FIFO.

I don't have numbers as we always use hardware dummy cycles with Tegra QSPI.

>>
>>> Signed-off-by: Sowjanya Komatineni <[email protected]>
>>> ---
>>> drivers/spi/spi-mem.c | 1 +
>>> include/linux/spi/spi.h | 2 ++
>>> 2 files changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
>>> index f3a3f19..c64371c 100644
>>> --- a/drivers/spi/spi-mem.c
>>> +++ b/drivers/spi/spi-mem.c
>>> @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
>>> xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1;
>>> xfers[xferpos].len = op->dummy.nbytes;
>>> xfers[xferpos].tx_nbits = op->dummy.buswidth;
>>> + xfers[xferpos].dummy_data = 1;
>>> spi_message_add_tail(&xfers[xferpos], &msg);
>>> xferpos++;
>>> totalxferlen += op->dummy.nbytes;
>>> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
>>> index aa09fdc..708f2f5 100644
>>> --- a/include/linux/spi/spi.h
>>> +++ b/include/linux/spi/spi.h
>>> @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr,
>>> * transfer. If 0 the default (from @spi_device) is used.
>>> * @bits_per_word: select a bits_per_word other than the device default
>>> * for this transfer. If 0 the default (from @spi_device) is used.
>>> + * @dummy_data: indicates transfer is dummy bytes transfer.
>>> * @cs_change: affects chipselect after this transfer completes
>>> * @cs_change_delay: delay between cs deassert and assert when
>>> * @cs_change is set and @spi_transfer is not the last in @spi_message
>>> @@ -939,6 +940,7 @@ struct spi_transfer {
>>> struct sg_table tx_sg;
>>> struct sg_table rx_sg;
>>>
>>> + unsigned dummy_data:1;
>>> unsigned cs_change:1;
>>> unsigned tx_nbits:3;
>>> unsigned rx_nbits:3;
>>> --
>>> 2.7.4
>>>

2020-12-18 20:50:48

by Pratyush Yadav

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit

On 18/12/20 10:09AM, Sowjanya Komatineni wrote:
>
> On 12/18/20 1:57 AM, Boris Brezillon wrote:
> > On Fri, 18 Dec 2020 14:51:08 +0530
> > Pratyush Yadav <[email protected]> wrote:
> >
> > > Hi Sowjanya,
> > >
> > > On 17/12/20 12:28PM, Sowjanya Komatineni wrote:
> > > > This patch marks dummy transfer by setting dummy_data bit to 1.
> > > >
> > > > Controllers supporting dummy transfer by hardware use this bit field
> > > > to skip software transfer of dummy bytes and use hardware dummy bytes
> > > > transfer.
> > > What is the benefit you get from this change? You add complexity in
> > > spi-mem and the controller driver, so that must come with some benefits.
> > > Here I don't see any. The transfer will certainly take the same amount
> > > of time because the number or period of the dummy cycles has not
> > > changed. So why is this needed?
> > Well, you don't have to queue TX bytes if you use HW-based dummy
> > cycles, but I agree, I'd expect the overhead to be negligible,
> > especially since we're talking about emitting a few bytes, not hundreds.
> > This being said, the complexity added to the core is reasonable IMHO,
> > so if it really helps reducing the CPU overhead (we might need some
> > numbers to prove that), I guess it's okay.
>
> Hardware dummy cycles feature of Tegra QSPI is to save SW transfer cycle of
> dummy bytes by filling FIFO.
>
> I don't have numbers as we always use hardware dummy cycles with Tegra QSPI.

Most flashes use somewhere around 8-10 dummy cycles. Some of the faster
ones working at 166 MHz or above use 20-25. From what I understand, the
only time hardware transfer of dummy cycles will reduce CPU load
significantly will be when multiple small transfers are made in quick
succession. Unless you have such a usecase I don't know how much benefit
you will get by it.

Anyway, if the SPI maintainers think this is worth it, I won't object.

> > > > Signed-off-by: Sowjanya Komatineni <[email protected]>
> > > > ---
> > > > drivers/spi/spi-mem.c | 1 +
> > > > include/linux/spi/spi.h | 2 ++
> > > > 2 files changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> > > > index f3a3f19..c64371c 100644
> > > > --- a/drivers/spi/spi-mem.c
> > > > +++ b/drivers/spi/spi-mem.c
> > > > @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> > > > xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1;
> > > > xfers[xferpos].len = op->dummy.nbytes;
> > > > xfers[xferpos].tx_nbits = op->dummy.buswidth;
> > > > + xfers[xferpos].dummy_data = 1;
> > > > spi_message_add_tail(&xfers[xferpos], &msg);
> > > > xferpos++;
> > > > totalxferlen += op->dummy.nbytes;
> > > > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> > > > index aa09fdc..708f2f5 100644
> > > > --- a/include/linux/spi/spi.h
> > > > +++ b/include/linux/spi/spi.h
> > > > @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr,
> > > > * transfer. If 0 the default (from @spi_device) is used.
> > > > * @bits_per_word: select a bits_per_word other than the device default
> > > > * for this transfer. If 0 the default (from @spi_device) is used.
> > > > + * @dummy_data: indicates transfer is dummy bytes transfer.
> > > > * @cs_change: affects chipselect after this transfer completes
> > > > * @cs_change_delay: delay between cs deassert and assert when
> > > > * @cs_change is set and @spi_transfer is not the last in @spi_message
> > > > @@ -939,6 +940,7 @@ struct spi_transfer {
> > > > struct sg_table tx_sg;
> > > > struct sg_table rx_sg;
> > > > + unsigned dummy_data:1;
> > > > unsigned cs_change:1;
> > > > unsigned tx_nbits:3;
> > > > unsigned rx_nbits:3;
> > > > --
> > > > 2.7.4

--
Regards,
Pratyush Yadav
Texas Instruments India

2020-12-18 20:55:40

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit

On Sat, Dec 19, 2020 at 12:49:38AM +0530, Pratyush Yadav wrote:

> Anyway, if the SPI maintainers think this is worth it, I won't object.

This gets kind of circular, for me it's a question of if there's some
meaningful benefit from using the feature vs the cost to support it and
from the sounds of it we don't have numbers on the benefits from using
it at present.


Attachments:
(No filename) (375.00 B)
signature.asc (499.00 B)
Download all attachments

2020-12-18 20:56:21

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit

On Fri, Dec 18, 2020 at 08:41:02PM +0000, Mark Brown wrote:
> On Sat, Dec 19, 2020 at 12:49:38AM +0530, Pratyush Yadav wrote:

> > Anyway, if the SPI maintainers think this is worth it, I won't object.

> This gets kind of circular, for me it's a question of if there's some
> meaningful benefit from using the feature vs the cost to support it and
> from the sounds of it we don't have numbers on the benefits from using
> it at present.

...although I do have to say looking at the implementation that the cost
seems low, it's just a flag set on an existing transfer. The only issue
is if we'd get more win from coalesing the entire transaction (or entire
transmit) into a single transfer that could be DMAed and/or requires
fewer trips through the stack which does make it seem like an unclear
tradeoff from the point of view of client drivers


Attachments:
(No filename) (864.00 B)
signature.asc (499.00 B)
Download all attachments

2020-12-18 22:07:37

by Sowjanya Komatineni

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit


On 12/18/20 12:44 PM, Mark Brown wrote:
> On Fri, Dec 18, 2020 at 08:41:02PM +0000, Mark Brown wrote:
>> On Sat, Dec 19, 2020 at 12:49:38AM +0530, Pratyush Yadav wrote:
>>> Anyway, if the SPI maintainers think this is worth it, I won't object.
>> This gets kind of circular, for me it's a question of if there's some
>> meaningful benefit from using the feature vs the cost to support it and
>> from the sounds of it we don't have numbers on the benefits from using
>> it at present.
> ...although I do have to say looking at the implementation that the cost
> seems low, it's just a flag set on an existing transfer. The only issue
> is if we'd get more win from coalesing the entire transaction (or entire
> transmit) into a single transfer that could be DMAed and/or requires
> fewer trips through the stack which does make it seem like an unclear
> tradeoff from the point of view of client drivers

Using HW dummy cycles save extra software cycle of transfer which
involves transfer setup register writes, writing dummy bytes to TX FIFO,
interrupt processing.

Implementation wise it just a single bit field added to spi_transfer and
on Tegra controller driver programming dummy cycles with prior transfer
and skipping sw dummy transfer which is actually not complex.

From quick check, I see HW dummy cycles transfer of 128KB shows 18 Mb/s
while SW transfer of bytes shows 17.3 MB/s on average.

When back-to-back read commands are executed using HW dummy cycles will
definitely save cycles.

2020-12-21 16:53:33

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 5/9] spi: spi-mem: Mark dummy transfers by setting dummy_data bit

On Fri, Dec 18, 2020 at 02:01:56PM -0800, Sowjanya Komatineni wrote:

> From quick check, I see HW dummy cycles transfer of 128KB shows 18 Mb/s
> while SW transfer of bytes shows 17.3 MB/s on average.

OK, it's not going to revolutionize the world or anything but that's
definitely a speedup.


Attachments:
(No filename) (300.00 B)
signature.asc (499.00 B)
Download all attachments