There are a large number of QSPI irqs that fire during boot/init and later
on every suspend/resume.
This could be made faster by doing DMA instead of PIO.
Below is comparison for number of interrupts raised in 2 acenarios...
Boot up and stabilise
Suspend/Resume
Sequence    PIO    DMA
=======================
Boot-up     69088  19284
S/R         5066   3430
Though we have not measured speed or power, we expect
the performance to be better with DMA mode, and no regressions were
encountered in testing.
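
To put the interrupt counts above in relative terms, here is a minimal sketch that computes how much lower the DMA count is in each scenario. The numbers are copied from the table; the helper name `reduction_percent` is only illustrative. It works out to roughly 72% fewer interrupts for boot-up and roughly 32% fewer for suspend/resume.

```python
# Interrupt counts copied from the PIO/DMA table in this cover letter.
counts = {
    "Boot-up": {"PIO": 69088, "DMA": 19284},
    "S/R": {"PIO": 5066, "DMA": 3430},
}


def reduction_percent(pio, dma):
    """Return how much lower the DMA count is, as a percentage of the PIO count."""
    return 100.0 * (pio - dma) / pio


for sequence, c in counts.items():
    print("%-8s %.1f%% fewer interrupts with DMA" % (
        sequence, reduction_percent(c["PIO"], c["DMA"])))
```

Note that this only compares interrupt counts; it says nothing about transfer speed or power, which were not measured.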

Vijaya Krishna Nivarthi (3):
  spi: dt-bindings: qcom,spi-qcom-qspi: Add iommus
  arm64: dts: qcom: sc7280: Add stream-id of qspi to iommus
  spi: spi-qcom-qspi: Add DMA mode support

---
v2 -> v3:
- Modified commit messages
- Made a change to driver based on re-review
v1 -> v2:
- Added documentation file to the series
- Made changes to driver based on HPG re-review
---
 .../bindings/spi/qcom,spi-qcom-qspi.yaml |   3 +
 arch/arm64/boot/dts/qcom/sc7280.dtsi     |   1 +
 drivers/spi/spi-qcom-qspi.c              | 434 +++++++++++++++++++--
 3 files changed, 407 insertions(+), 31 deletions(-)
--
Qualcomm INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by the Linux Foundation.
Hi,
On Fri, Apr 14, 2023 at 7:06 AM Vijaya Krishna Nivarthi
<[email protected]> wrote:
>
> There are large number of QSPI irqs that fire during boot/init and later
> on every suspend/resume.
> This could be made faster by doing DMA instead of PIO.
> Below is comparison for number of interrupts raised in 2 acenarios...
s/acenarios/scenarios
> Boot up and stabilise
> Suspend/Resume
>
> Sequence    PIO    DMA
> =======================
> Boot-up     69088  19284
> S/R         5066   3430
>
> Though we have not made measurements for speed, power we expect
> the performance to be better with DMA mode and no regressions were
> encountered in testing.
Measuring the speed isn't really very hard, so I gave it a shot.
I used a truly terrible python script to do this on a Chromebook:
--
import os
import time

os.system("""
stop ui
stop powerd

cd /sys/devices/system/cpu/cpufreq
for policy in policy*; do
    cat ${policy}/cpuinfo_max_freq > ${policy}/scaling_min_freq
done
""")

all_times = []
for i in range(1000):
    start = time.time()
    os.system("flashrom -p host -r /tmp/foo.bin")
    end = time.time()

    all_times.append(end - start)
    print("Iteration %d, min=%.2f, max=%.2f, avg=%.2f" % (
        i, min(all_times), max(all_times), sum(all_times) / len(all_times)))
--
The good news is that after applying your patches the loop runs _much_ faster.
The bad news is that it runs much faster because it very quickly fails
and errors out. flashrom just keeps reporting:
Opened /dev/mtd0 successfully
Found Programmer flash chip "Opaque flash chip" (8192 kB,
Programmer-specific) on host.
Reading flash... Cannot read 0x001000 bytes at 0x000000: Connection timed out
read_flash: failed to read (00000000..0x7fffff).
Read operation failed!
FAILED.
FAILED
I went back and tried v1, v2, and v3 and all three versions fail.
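
The loop above looked faster only because os.system()'s return value is ignored, so a failed read gets timed like a successful one. A minimal sketch of one way to avoid that, assuming subprocess.run() is used instead and only runs that exit with status 0 are recorded. The helper names `time_command` and `benchmark` are hypothetical; the flashrom command line is the one from the script above.

```python
import subprocess
import time

# Same flashrom invocation as in the script above.
FLASHROM_CMD = ["flashrom", "-p", "host", "-r", "/tmp/foo.bin"]


def time_command(cmd):
    """Run cmd once; return elapsed seconds, or None if it failed."""
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        # The command could not be started at all (e.g. not installed).
        return None
    elapsed = time.monotonic() - start
    # A non-zero exit status means the read failed, so do not time it.
    return elapsed if result.returncode == 0 else None


def benchmark(cmd, iterations):
    """Collect timings of successful runs and count failures separately."""
    times = []
    failures = 0
    for _ in range(iterations):
        elapsed = time_command(cmd)
        if elapsed is None:
            failures += 1
        else:
            times.append(elapsed)
    return times, failures
```

Calling `benchmark(FLASHROM_CMD, 1000)` would then return the successful timings plus a failure count, so a run where every read times out shows up as 1000 failures rather than as a very fast average.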
Hi,
On Fri, Apr 14, 2023 at 8:48 AM Doug Anderson <[email protected]> wrote:
>
> I went back and tried v1, v2, and v3 and all three versions fail.
Ah, I see what's likely the problem. Your patch series only adds the
"iommus" for sc7280 but I'm testing on sc7180. That means:
1. You need to add the iommus to _all_ the boards that have qspi. That
means sc7280, sc7180, and sdm845.
2. Ideally the code should still be made to work (it should fall back
to PIO mode) if DMA isn't properly enabled. That would keep old device
trees working, which we're supposed to do.
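
One quick way to check a running board for the situation described in point 1 is to look at the live device tree that Linux exposes under /sys/firmware/devicetree/base. The sketch below walks that tree, finds nodes whose "compatible" property lists the QSPI controller, and reports whether an "iommus" property is present. The compatible string "qcom,qspi-v1" is assumed from the qcom,spi-qcom-qspi binding, and the function name `find_qspi_nodes` is hypothetical.

```python
import os

# Assumption: generic compatible string from the qcom,spi-qcom-qspi binding.
QSPI_COMPATIBLE = b"qcom,qspi-v1"
# Location where Linux exposes the live device tree as directories and files.
DT_ROOT = "/sys/firmware/devicetree/base"


def find_qspi_nodes(root=DT_ROOT, compatible=QSPI_COMPATIBLE):
    """Yield (node_path, has_iommus) for each node matching `compatible`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "compatible" not in filenames:
            continue
        with open(os.path.join(dirpath, "compatible"), "rb") as f:
            # The property is a list of NUL-terminated strings.
            entries = f.read().split(b"\0")
        if compatible in entries:
            # Each property of a node appears as a file in its directory.
            yield dirpath, "iommus" in filenames


for path, has_iommus in find_qspi_nodes():
    print(path, "iommus present" if has_iommus else "iommus MISSING")
```

This only reports what the device tree contains. It does not show whether the driver falls back to PIO when the property is missing, which is the behaviour asked for in point 2.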
-Doug
On 4/14/2023 10:12 PM, Doug Anderson wrote:
> Ah, I see what's likely the problem. Your patch series only adds the
> "iommus" for sc7280 but I'm testing on sc7180. That means:
>
> 1. You need to add the iommus to _all_ the boards that have qspi. That
> means sc7280, sc7180, and sdm845.
>
> 2. Ideally the code should still be made to work (it should fall back
> to PIO mode) if DMA isn't properly enabled. That would keep old device
> trees working, which we're supposed to do.
Thank you very much for the review, script, test and quick debug.
Will check same and update a v4.
-Vijay/
> -Doug