This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
for poll_timout". That change introduces a performance regression when
reading sequentially from flash. Logging calls to intel_spi_read without
this change we get:
Start MTD read
[ 20.045527] intel_spi_read(from=1800000, len=400000)
[ 20.045527] intel_spi_read(from=1800000, len=400000)
[ 282.199274] intel_spi_read(from=1c00000, len=400000)
[ 282.199274] intel_spi_read(from=1c00000, len=400000)
[ 544.351528] intel_spi_read(from=2000000, len=400000)
[ 544.351528] intel_spi_read(from=2000000, len=400000)
End MTD read
With this change:
Start MTD read
[ 21.942922] intel_spi_read(from=1c00000, len=400000)
[ 21.942922] intel_spi_read(from=1c00000, len=400000)
[ 23.784058] intel_spi_read(from=2000000, len=400000)
[ 23.784058] intel_spi_read(from=2000000, len=400000)
[ 25.625006] intel_spi_read(from=2400000, len=400000)
[ 25.625006] intel_spi_read(from=2400000, len=400000)
End MTD read
Signed-off-by: Luis Alberto Herrera <[email protected]>
---
drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
index 61d2a0ad2131..2b89361a0d3a 100644
--- a/drivers/mtd/spi-nor/controllers/intel-spi.c
+++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
@@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
u32 val;
return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
- !(val & HSFSTS_CTL_SCIP), 40,
+ !(val & HSFSTS_CTL_SCIP), 0,
INTEL_SPI_TIMEOUT * 1000);
}
@@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
u32 val;
return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
- !(val & SSFSTS_CTL_SCIP), 40,
+ !(val & SSFSTS_CTL_SCIP), 0,
INTEL_SPI_TIMEOUT * 1000);
}
--
2.27.0.278.ge193c7cf3a9-goog
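The two logs above can be turned into a rough per-chunk throughput figure. The following is a minimal sketch, not part of the patch: the helper names are hypothetical, and it assumes that `from` and `len` are printed in hexadecimal (which the log suggests, since `from` advances by exactly 0x400000 between calls) and that the gap between two consecutive timestamps approximates the time spent reading one chunk.

```c
#include <assert.h>

/* Seconds between two consecutive intel_spi_read() log timestamps. */
static double chunk_seconds(double t_first, double t_next)
{
	return t_next - t_first;
}

/* Average throughput in bytes per second for one logged chunk. */
static double chunk_throughput(unsigned long len, double seconds)
{
	assert(seconds > 0.0);
	return (double)len / seconds;
}
```

Under those assumptions, chunk_throughput(0x400000, chunk_seconds(20.045527, 282.199274)) comes to roughly 16 kB/s without the change, and chunk_throughput(0x400000, chunk_seconds(21.942922, 23.784058)) to roughly 2.3 MB/s with it.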
On Wed, Jun 10, 2020 at 10:46:49PM +0000, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
>
> Start MTD read
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
>
> With this change:
>
> Start MTD read
> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
> [ 23.784058] intel_spi_read(from=2000000, len=400000)
> [ 23.784058] intel_spi_read(from=2000000, len=400000)
> [ 25.625006] intel_spi_read(from=2400000, len=400000)
> [ 25.625006] intel_spi_read(from=2400000, len=400000)
> End MTD read
>
> Signed-off-by: Luis Alberto Herrera <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
On 6/11/20 1:46 AM, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
>
> Start MTD read
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
>
> With this change:
>
> Start MTD read
> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
> [ 23.784058] intel_spi_read(from=2000000, len=400000)
> [ 23.784058] intel_spi_read(from=2000000, len=400000)
> [ 25.625006] intel_spi_read(from=2400000, len=400000)
> [ 25.625006] intel_spi_read(from=2400000, len=400000)
> End MTD read
>
> Signed-off-by: Luis Alberto Herrera <[email protected]>
> ---
> drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
> index 61d2a0ad2131..2b89361a0d3a 100644
> --- a/drivers/mtd/spi-nor/controllers/intel-spi.c
> +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
> @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
> u32 val;
>
> return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
> - !(val & HSFSTS_CTL_SCIP), 40,
> + !(val & HSFSTS_CTL_SCIP), 0,
Would 10 us keep the performance as it was before?
Cheers,
ta
Hello Luis,
thank you for the patch!
On 11/06/2020 00:46, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
>
> Start MTD read
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
>
> With this change:
>
> Start MTD read
> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
> [ 23.784058] intel_spi_read(from=2000000, len=400000)
> [ 23.784058] intel_spi_read(from=2000000, len=400000)
> [ 25.625006] intel_spi_read(from=2400000, len=400000)
> [ 25.625006] intel_spi_read(from=2400000, len=400000)
> End MTD read
I've performed my testing as well and got the following results:
Vanilla Linux 4.9 (i.e. before the introduction of the offending
patch):
dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s
Vanilla 4.19 (i.e. with offending patch):
dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s
4.19 + revert:
dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s
Therefore it looks good from my PoV:
Tested-by: Alexander Sverdlin <[email protected]>
> Signed-off-by: Luis Alberto Herrera <[email protected]>
> Acked-by: Mika Westerberg <[email protected]>
> ---
> drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
> index 61d2a0ad2131..2b89361a0d3a 100644
> --- a/drivers/mtd/spi-nor/controllers/intel-spi.c
> +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
> @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
> u32 val;
>
> return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
> - !(val & HSFSTS_CTL_SCIP), 40,
> + !(val & HSFSTS_CTL_SCIP), 0,
> INTEL_SPI_TIMEOUT * 1000);
> }
>
> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
> u32 val;
>
> return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
> - !(val & SSFSTS_CTL_SCIP), 40,
> + !(val & SSFSTS_CTL_SCIP), 0,
> INTEL_SPI_TIMEOUT * 1000);
> }
>
>
--
Best regards,
Alexander Sverdlin.
Hi, Alexander,
On 7/22/20 7:37 PM, Alexander Sverdlin wrote:
> Hello Luis,
>
> thank you for the patch!
>
> On 11/06/2020 00:46, Luis Alberto Herrera wrote:
>> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
>> for poll_timout". That change introduces a performance regression when
>> reading sequentially from flash. Logging calls to intel_spi_read without
>> this change we get:
>>
>> Start MTD read
>> [ 20.045527] intel_spi_read(from=1800000, len=400000)
>> [ 20.045527] intel_spi_read(from=1800000, len=400000)
>> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
>> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
>> [ 544.351528] intel_spi_read(from=2000000, len=400000)
>> [ 544.351528] intel_spi_read(from=2000000, len=400000)
>> End MTD read
>>
>> With this change:
>>
>> Start MTD read
>> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
>> [ 21.942922] intel_spi_read(from=1c00000, len=400000)
>> [ 23.784058] intel_spi_read(from=2000000, len=400000)
>> [ 23.784058] intel_spi_read(from=2000000, len=400000)
>> [ 25.625006] intel_spi_read(from=2400000, len=400000)
>> [ 25.625006] intel_spi_read(from=2400000, len=400000)
>> End MTD read
>
> I've performed my testing as well and got the following results:
>
> Vanilla Linux 4.9 (i.e. before the introduction of the offending
> patch):
>
> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s
>
> Vanilla 4.19 (i.e. with offending patch):
>
> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s
>
> 4.19 + revert:
>
> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s
>
> Therefore it looks good from my PoV:
>
> Tested-by: Alexander Sverdlin <[email protected]>
>
>> Signed-off-by: Luis Alberto Herrera <[email protected]>
>> Acked-by: Mika Westerberg <[email protected]>
>> ---
>> drivers/mtd/spi-nor/controllers/intel-spi.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/mtd/spi-nor/controllers/intel-spi.c b/drivers/mtd/spi-nor/controllers/intel-spi.c
>> index 61d2a0ad2131..2b89361a0d3a 100644
>> --- a/drivers/mtd/spi-nor/controllers/intel-spi.c
>> +++ b/drivers/mtd/spi-nor/controllers/intel-spi.c
>> @@ -292,7 +292,7 @@ static int intel_spi_wait_hw_busy(struct intel_spi *ispi)
>> u32 val;
>>
>> return readl_poll_timeout(ispi->base + HSFSTS_CTL, val,
>> - !(val & HSFSTS_CTL_SCIP), 40,
>> + !(val & HSFSTS_CTL_SCIP), 0,
would you put 10 us here
>> INTEL_SPI_TIMEOUT * 1000);
>> }
>>
>> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
>> u32 val;
>>
>> return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
>> - !(val & SSFSTS_CTL_SCIP), 40,
>> + !(val & SSFSTS_CTL_SCIP), 0,
also here, and re-do a test? I'm curious if the performance will be
as it was before.
Thanks!
>> INTEL_SPI_TIMEOUT * 1000);
>> }
>>
>>
>
> --
> Best regards,
> Alexander Sverdlin.
>
Hello Tudor,
On 22/07/2020 19:03, [email protected] wrote:
> On 7/22/20 7:37 PM, Alexander Sverdlin wrote:
[...]
>> I've performed my testing as well and got the following results:
>>
>> Vanilla Linux 4.9 (i.e. before the introduction of the offending
>> patch):
>>
>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>> 1280+0 records in
>> 1280+0 records out
>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s
>>
>> Vanilla 4.19 (i.e. with offending patch):
>>
>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>> 1280+0 records in
>> 1280+0 records out
>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s
>>
>> 4.19 + revert:
>>
>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>> 1280+0 records in
>> 1280+0 records out
>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s
>>
>> Therefore it looks good from my PoV:
>>
>> Tested-by: Alexander Sverdlin <[email protected]>
[...]
> would you put 10 us here
>>> INTEL_SPI_TIMEOUT * 1000);
>>> }
>>>
>>> @@ -301,7 +301,7 @@ static int intel_spi_wait_sw_busy(struct intel_spi *ispi)
>>> u32 val;
>>>
>>> return readl_poll_timeout(ispi->sregs + SSFSTS_CTL, val,
>>> - !(val & SSFSTS_CTL_SCIP), 40,
>>> + !(val & SSFSTS_CTL_SCIP), 0,
>
> also here, and re-do a test? I'm curious if the performance will be
> as it was before.
with 10us it looks like this:
dd if=/dev/flash/by-name/... of=/dev/null bs=4k
1280+0 records in
1280+0 records out
5242880 bytes (5.2 MB, 5.0 MiB) copied, 4.33816 s, 1.2 MB/s
Which means there is a performance regression, and how bad it is will
depend on the test case...
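
The four dd timings in this thread can be compared with a little arithmetic. This is only a sketch with hypothetical helper names. The per-cycle figure additionally assumes that the driver moves 64 bytes per hardware cycle and waits once per cycle; neither is stated in this thread, so treat that part as an estimate.

```c
#include <assert.h>

/* How many times slower a run is than the baseline run. */
static double slowdown(double seconds, double baseline_seconds)
{
	assert(baseline_seconds > 0.0);
	return seconds / baseline_seconds;
}

/*
 * Extra time per hardware cycle, in microseconds, compared with the
 * baseline run. bytes_per_cycle is an assumption supplied by the caller.
 */
static double extra_us_per_cycle(double seconds, double baseline_seconds,
				 unsigned long total_bytes,
				 unsigned long bytes_per_cycle)
{
	assert(bytes_per_cycle > 0 && total_bytes >= bytes_per_cycle);
	double cycles = (double)(total_bytes / bytes_per_cycle);

	return (seconds - baseline_seconds) * 1e6 / cycles;
}
```

Taking "4.19 + revert" (3.90503 s) as the baseline, slowdown() gives about 1.72 for vanilla 4.19 and about 1.11 for the 10 us variant. With the assumed 64 bytes per cycle, extra_us_per_cycle() for the 5242880-byte read gives about 34 us and about 5.3 us respectively, which is in the same range as the 40 us and 10 us sleep values.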
--
Best regards,
Alexander Sverdlin.
Hi, Mika,
On 7/23/20 12:05 PM, Alexander Sverdlin wrote:
>
> Hello Tudor,
>
> On 22/07/2020 19:03, [email protected] wrote:
>> On 7/22/20 7:37 PM, Alexander Sverdlin wrote:
>
> [...]
>
>>> I've performed my testing as well and got the following results:
>>>
>>> Vanilla Linux 4.9 (i.e. before the introduction of the offending
>>> patch):
>>>
>>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>>> 1280+0 records in
>>> 1280+0 records out
>>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.91981 s, 1.3 MB/s
>>>
>>> Vanilla 4.19 (i.e. with offending patch):
>>>
>>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>>> 1280+0 records in
>>> 1280+0 records out
>>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 6.70891 s, 781 kB/s
>>>
>>> 4.19 + revert:
>>>
>>> dd if=/dev/flash/by-name/XXX of=/dev/null bs=4k
>>> 1280+0 records in
>>> 1280+0 records out
>>> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 3.90503 s, 1.3 MB/s
>>>
[cut]
> with 10us it looks like this:
>
> dd if=/dev/flash/by-name/... of=/dev/null bs=4k
> 1280+0 records in
> 1280+0 records out
> 5242880 bytes (5.2 MB, 5.0 MiB) copied, 4.33816 s, 1.2 MB/s
>
> Which means there is a performance regression, and how bad it is will
> depend on the test case...
>
We need a bit of context here. Using a tight loop for polling while
having a 5 s timeout is fishy. For anything that's expected to
complete in less than a few usec, it's usually better to poll continuously,
but then a timeout of 5 s is way too big. Can we shrink the timeout to
a few msecs?
I'll queue this to spi-nor/next to fix the perf regression, but I would
like to continue the discussion and to come up with an incremental patch
on top of this one.
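
To make the trade-off concrete, here is a user-space sketch of a poll-with-timeout loop. It is not the kernel's readl_poll_timeout() macro, only a model of the same idea: a delay of 0 polls in a tight loop, a non-zero delay sleeps between reads, and the timeout bounds the total wait. The names poll_until, fake_busy and fake_ready are hypothetical, and fake_busy merely stands in for a busy bit that clears after a number of reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

typedef bool (*ready_fn)(void *ctx);

/*
 * Poll ready() until it returns true or timeout_us has elapsed.
 * delay_us == 0 means a tight loop with no sleep between reads.
 * Returns 0 on success, -ETIMEDOUT otherwise.
 */
static int poll_until(ready_fn ready, void *ctx, unsigned int delay_us,
		      uint64_t timeout_us)
{
	uint64_t deadline = now_us() + timeout_us;

	assert(ready != NULL);
	for (;;) {
		if (ready(ctx))
			return 0;
		/* One last read after the deadline, then give up. */
		if (now_us() > deadline)
			return ready(ctx) ? 0 : -ETIMEDOUT;
		if (delay_us) {
			struct timespec ts = {
				.tv_sec = delay_us / 1000000u,
				.tv_nsec = (long)(delay_us % 1000000u) * 1000L,
			};
			nanosleep(&ts, NULL);
		}
	}
}

/* Hypothetical stand-in for a busy bit that clears after N reads. */
struct fake_busy {
	unsigned int polls_left;
};

static bool fake_ready(void *ctx)
{
	struct fake_busy *dev = ctx;

	if (dev->polls_left == 0)
		return true;
	dev->polls_left--;
	return false;
}
```

With delay_us set to 0 the loop returns as soon as the condition holds, at the cost of spinning for up to timeout_us if it never does, which is why a multi-second timeout sits uneasily with a tight loop.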
Cheers,
ta
On Wed, 10 Jun 2020 22:46:49 +0000, Luis Alberto Herrera wrote:
> This change reverts aba3a882a178: "mtd: spi-nor: intel: provide a range
> for poll_timout". That change introduces a performance regression when
> reading sequentially from flash. Logging calls to intel_spi_read without
> this change we get:
>
> Start MTD read
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 20.045527] intel_spi_read(from=1800000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 282.199274] intel_spi_read(from=1c00000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> [ 544.351528] intel_spi_read(from=2000000, len=400000)
> End MTD read
>
> [...]
Applied to spi-nor/next, thanks!
[1/1] mtd: revert "spi-nor: intel: provide a range for poll_timout"
https://git.kernel.org/mtd/c/e93a977367b2
Best regards,
--
Tudor Ambarus <[email protected]>