There's an unreliable code to handle DMA mappings on unidirection transfers.
This series does two things:
- it reverts the seemingly unnecessary change
- it reworks dummy SG list handling
There is no need to backport that AFAIU, but no harm to apply for v6.10 aka
the current release cycle. Guys, please test these.
Andy Shevchenko (2):
spi: Revert "Check if transfer is mapped before calling DMA sync APIs"
spi: Do not rely on the SG table and respective API implementations
drivers/spi/spi.c | 46 +++++++++++++++++-----------------------------
1 file changed, 17 insertions(+), 29 deletions(-)
--
2.45.1
On Fri, May 31, 2024 at 12:44:31PM +0300, Andy Shevchenko wrote:
> There's an unreliable code to handle DMA mappings on unidirection transfers.
> This series does two things:
> - it reverts the seemingly unnecessary change
> - it reworks dummy SG list handling
>
> There is no need to backport that AFAIU, but no harm to apply for v6.10 aka
> the current release cycle. Guys, please test these.
>
> Andy Shevchenko (2):
> spi: Revert "Check if transfer is mapped before calling DMA sync APIs"
> spi: Do not rely on the SG table and respective API implementations
>
> drivers/spi/spi.c | 46 +++++++++++++++++-----------------------------
> 1 file changed, 17 insertions(+), 29 deletions(-)
>
> --
> 2.45.1
>
Hi Andy,
applying either of these patches causes issues. See the traces for each one
below. This was tested on top of next-20240531, which works fine.
Thanks,
N?colas
patch 1, spi: Revert "Check if transfer is mapped before calling DMA sync APIs":
[ 3.114283] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c
[ 3.123323] Mem abort info:
[ 3.126200] ESR = 0x0000000096000004
[ 3.130053] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.135523] SET = 0, FnV = 0
[ 3.138667] EA = 0, S1PTW = 0
[ 3.141902] FSC = 0x04: level 0 translation fault
[ 3.146913] Data abort info:
[ 3.149884] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 3.155521] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 3.160717] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 3.166173] [000000000000001c] user address but active_mm is swapper
[ 3.169640] input: cros_ec as /devices/platform/soc@0/ac0000.geniqup/a80000.spi/spi_master/spi6/spi6.0/a80000.spi:ec@0:keyboard-controller/input/input0
[ 3.172698] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 3.172703] Modules linked in:
[ 3.172708] CPU: 6 PID: 58 Comm: kworker/u32:1 Not tainted 6.10.0-rc1-next-20240531-00003-g60f06088bdb0 #452
[ 3.172716] Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT)
[ 3.172720] Workqueue: events_unbound deferred_probe_work_func
[ 3.172735] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.172743] pc : iommu_dma_sync_sg_for_device+0x28/0x100
[ 3.172755] lr : __dma_sync_sg_for_device+0x28/0x4c
[ 3.172764] sp : ffff80008036adc0
[ 3.172767] x29: ffff80008036adc0 x28: ffff646682dd4000 x27: ffff646680feb810
[ 3.172777] x26: ffff80008036b008 x25: ffff646682dd4480 x24: ffffbf8f359d40e8
[ 3.172784] x23: ffff646680feb810 x22: 0000000000000001 x21: 0000000000000000
[ 3.172792] x20: ffffbf8f35d3da18 x19: 0000000000000000 x18: ffffbf8f36a1a390
[ 3.172799] x17: 0000000000010108 x16: 0000000000000000 x15: 0000000000000002
[ 3.172806] x14: 0000000000000001 x13: 0000000000067dc0 x12: 0000000000000001
[ 3.172813] x11: ffff80008036acd0 x10: ffff64668370eff8 x9 : ffff646682dd4469
[ 3.172821] x8 : ffff646683629704 x7 : 00000000ffffffff x6 : 0000000000000001
[ 3.172828] x5 : ffffbf8f3513bdb0 x4 : ffffbf8f34308ccc x3 : 0000000000000001
[ 3.172835] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff646680feb810
[ 3.172842] Call trace:
[ 3.172845] iommu_dma_sync_sg_for_device+0x28/0x100
[ 3.172851] __dma_sync_sg_for_device+0x28/0x4c
[ 3.172856] spi_transfer_one_message+0x378/0x6e4
[ 3.172866] __spi_pump_transfer_message+0x1e0/0x504
[ 3.172875] __spi_sync+0x2a0/0x3c4
[ 3.172882] spi_sync+0x30/0x54
[ 3.172889] spi_mem_exec_op+0x26c/0x41c
[ 3.172898] spi_nor_spimem_read_data+0x148/0x158
[ 3.172906] spi_nor_read_data+0x30/0x3c
[ 3.172913] spi_nor_read_sfdp+0x74/0xe4
[ 3.172922] spi_nor_parse_sfdp+0x120/0x11d0
[ 3.172930] spi_nor_sfdp_init_params_deprecated+0x3c/0x8c
[ 3.172938] spi_nor_scan+0x5d8/0xe6c
[ 3.172946] spi_nor_probe+0x94/0x2f0
[ 3.172954] spi_mem_probe+0x6c/0xac
[ 3.172962] spi_probe+0x84/0xe4
[ 3.172966] really_probe+0xbc/0x2a0
[ 3.172973] __driver_probe_device+0x78/0x12c
[ 3.172980] driver_probe_device+0x40/0x160
[ 3.172986] __device_attach_driver+0xb8/0x134
[ 3.172992] bus_for_each_drv+0x84/0xe0
[ 3.172997] __device_attach+0xa8/0x1b0
[ 3.173004] device_initial_probe+0x14/0x20
[ 3.173010] bus_probe_device+0xa8/0xac
[ 3.173015] device_add+0x590/0x750
[ 3.173026] __spi_add_device+0x138/0x208
[ 3.173033] of_register_spi_device+0x394/0x57c
[ 3.173041] spi_register_controller+0x394/0x760
[ 3.173048] qcom_qspi_probe+0x328/0x390
[ 3.173058] platform_probe+0x68/0xd8
[ 3.173066] really_probe+0xbc/0x2a0
[ 3.173072] __driver_probe_device+0x78/0x12c
[ 3.173078] driver_probe_device+0x40/0x160
[ 3.173084] __device_attach_driver+0xb8/0x134
[ 3.173090] bus_for_each_drv+0x84/0xe0
[ 3.173095] __device_attach+0xa8/0x1b0
[ 3.173101] device_initial_probe+0x14/0x20
[ 3.173108] bus_probe_device+0xa8/0xac
[ 3.173113] deferred_probe_work_func+0x88/0xc0
[ 3.173119] process_one_work+0x154/0x298
[ 3.173131] worker_thread+0x2f0/0x3f8
[ 3.173140] kthread+0x118/0x11c
[ 3.173148] ret_from_fork+0x10/0x20
[ 3.173158] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c20)
[ 3.173163] ---[ end trace 0000000000000000 ]---
patch 2, spi: Do not rely on the SG table and respective API implementations:
[ 3.085653] Unable to handle kernel paging request at virtual address ffff801000000000
[ 3.093815] Mem abort info:
[ 3.109892] ESR = 0x0000000096000145
[ 3.109896] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.126955] SET = 0, FnV = 0
[ 3.134900] Unable to handle kernel paging request at virtual address ffff801000000000
[ 3.138038] EA = 0, S1PTW = 0
[ 3.138041] FSC = 0x05: level 1 translation fault
[ 3.141183] Mem abort info:
[ 3.141185] ESR = 0x0000000096000145
[ 3.149305] Data abort info:
[ 3.149308] ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
[ 3.152530] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.157552] CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[ 3.160417] SET = 0, FnV = 0
[ 3.164276] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 3.167237] EA = 0, S1PTW = 0
[ 3.167240] FSC = 0x05: level 1 translation fault
[ 3.172880] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082d2d000
[ 3.178323] Data abort info:
[ 3.178325] ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
[ 3.178330] CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[ 3.183519] [ffff801000000000] pgd=0000000000000000
[ 3.186653] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 3.192114] , p4d=1000000100039003
[ 3.195339] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082d2d000
[ 3.195344] [ffff801000000000] pgd=0000000000000000
[ 3.200367] , pud=0000000000000000
[ 3.207249] , p4d=1000000100039003, pud=0000000000000000
[ 3.207257] Internal error: Oops: 0000000096000145 [#1] PREEMPT SMP
[ 3.207262] Modules linked in:
[ 3.207267] CPU: 7 PID: 70 Comm: kworker/u32:4 Not tainted 6.10.0-rc1-next-20240531-00003-ge34bac32ec36 #453
[ 3.207275] Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT)
[ 3.207278] Workqueue: events_unbound deferred_probe_work_func
[ 3.207292] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.207299] pc : dcache_clean_poc+0x20/0x38
[ 3.207315] lr : arch_sync_dma_for_device+0x24/0x30
[ 3.207323] sp : ffff8000807d2db0
[ 3.207325] x29: ffff8000807d2db0 x28: ffff0cc6c0feb810 x27: ffff0cc6c0feb810
[ 3.207334] x26: ffff0cc6c339b800 x25: ffff0cc6c339bc80 x24: ffffb9dc0508af48
[ 3.207341] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000001
[ 3.207348] x20: ffffc1ffc0000000 x19: ffff0cc6c32b6d20 x18: ffffb9dc0601a390
[ 3.207355] x17: 0000000000010108 x16: 0000000000000000 x15: ffff0cc73b5dd580
[ 3.207363] x14: ffff0cc6c09591c0 x13: 0000000000000400 x12: ffffb9dc05c0d000
[ 3.207370] x11: 071c71c71c71c71c x10: 0000000000000aa0 x9 : ffff8000807d2c80
[ 3.207378] x8 : ffff0cc73b5ec800 x7 : 0000000000000400 x6 : ffff0cc73b5ec828
[ 3.207385] x5 : ffffb9dc0473bdb0 x4 : fffffe32d7000000 x3 : 000000000000003f
[ 3.207392] x2 : 0000000000000040 x1 : ffff801000000000 x0 : ffff801000000000
[ 3.207399] Call trace:
[ 3.207402] dcache_clean_poc+0x20/0x38
[ 3.207410] iommu_dma_sync_sg_for_device+0xd4/0x100
[ 3.207420] __dma_sync_sg_for_device+0x28/0x4c
[ 3.207427] spi_transfer_one_message+0x3a8/0x700
[ 3.207436] __spi_pump_transfer_message+0x1f0/0x504
[ 3.207444] __spi_sync+0x2a0/0x3c4
[ 3.207452] spi_sync+0x30/0x54
[ 3.207459] spi_mem_exec_op+0x26c/0x41c
[ 3.207468] spi_nor_spimem_read_data+0x148/0x158
[ 3.207476] spi_nor_read_data+0x30/0x3c
[ 3.207482] spi_nor_read_sfdp+0x74/0xe4
[ 3.207490] spi_nor_parse_sfdp+0x120/0x11d0
[ 3.207499] spi_nor_sfdp_init_params_deprecated+0x3c/0x8c
[ 3.207507] spi_nor_scan+0x5d8/0xe6c
[ 3.207515] spi_nor_probe+0x94/0x2f0
[ 3.207523] spi_mem_probe+0x6c/0xac
[ 3.207531] spi_probe+0x84/0xe4
[ 3.207536] really_probe+0xbc/0x2a0
[ 3.207542] __driver_probe_device+0x78/0x12c
[ 3.207549] driver_probe_device+0x40/0x160
[ 3.207555] __device_attach_driver+0xb8/0x134
[ 3.207562] bus_for_each_drv+0x84/0xe0
[ 3.207567] __device_attach+0xa8/0x1b0
[ 3.207573] device_initial_probe+0x14/0x20
[ 3.207580] bus_probe_device+0xa8/0xac
[ 3.207586] device_add+0x590/0x750
[ 3.207596] __spi_add_device+0x138/0x208
[ 3.207603] of_register_spi_device+0x394/0x57c
[ 3.207611] spi_register_controller+0x394/0x760
[ 3.207619] qcom_qspi_probe+0x328/0x390
[ 3.207628] platform_probe+0x68/0xd8
[ 3.207636] really_probe+0xbc/0x2a0
[ 3.207642] __driver_probe_device+0x78/0x12c
[ 3.207649] driver_probe_device+0x40/0x160
[ 3.207655] __device_attach_driver+0xb8/0x134
[ 3.207660] bus_for_each_drv+0x84/0xe0
[ 3.207666] __device_attach+0xa8/0x1b0
[ 3.207672] device_initial_probe+0x14/0x20
[ 3.207679] bus_probe_device+0xa8/0xac
[ 3.207684] deferred_probe_work_func+0x88/0xc0
[ 3.207690] process_one_work+0x154/0x298
[ 3.207701] worker_thread+0x2f0/0x3f8
[ 3.207711] kthread+0x118/0x11c
[ 3.207719] ret_from_fork+0x10/0x20
[ 3.207729] Code: d2800082 9ac32042 d1000443 8a230000 (d50b7a20)
[ 3.207733] ---[ end trace 0000000000000000 ]---
[ 3.579215]
[ 3.580760] Internal error: Oops: 0000000096000145 [#2] PREEMPT SMP
[ 3.580767] Modules linked in:
[ 3.580774] CPU: 4 PID: 121 Comm: cros_ec_spi_hig Tainted: G D 6.10.0-rc1-next-20240531-00003-ge34bac32ec36 #453
[ 3.580781] Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT)
[ 3.580785] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.580792] pc : dcache_clean_poc+0x20/0x38
[ 3.580806] lr : arch_sync_dma_for_device+0x24/0x30
[ 3.580813] sp : ffff800080e7b970
[ 3.580815] x29: ffff800080e7b970 x28: ffff0cc6c0fd0010 x27: ffff0cc6c0fd0010
[ 3.580823] x26: ffff0cc6c25d1000 x25: ffff0cc6c25d1480 x24: ffffb9dc0508af48
[ 3.580830] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000001
[ 3.580837] x20: ffffc1ffc0000000 x19: ffff0cc6c03a9520 x18: ffffb9dc0601a958
[ 3.580845] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0000000000000002
[ 3.580852] x14: 0000000000000001 x13: 000000000006931f x12: 0000000000000001
[ 3.580859] x11: ffff800080e7b890 x10: ffff0cc6c03bdff8 x9 : ffff0cc6c25d1469
[ 3.580867] x8 : ffff0cc6c002b704 x7 : 0000000000000000 x6 : 0000000000000001
[ 3.580873] x5 : ffffb9dc0473bdb0 x4 : fffffe32d7000000 x3 : 000000000000003f
[ 3.580880] x2 : 0000000000000040 x1 : ffff801000000000 x0 : ffff801000000000
[ 3.580889] Call trace:
[ 3.580891] dcache_clean_poc+0x20/0x38
[ 3.580897] iommu_dma_sync_sg_for_device+0xd4/0x100
[ 3.580906] __dma_sync_sg_for_device+0x28/0x4c
[ 3.580913] spi_transfer_one_message+0x3a8/0x700
[ 3.580921] __spi_pump_transfer_message+0x1f0/0x504
[ 3.580928] __spi_sync+0x2a0/0x3c4
[ 3.580933] spi_sync_locked+0x10/0x1c
[ 3.580939] receive_n_bytes+0xc0/0x118
[ 3.580947] do_cros_ec_pkt_xfer_spi+0x3c0/0x530
[ 3.580951] cros_ec_xfer_high_pri_work+0x20/0x34
[ 3.580956] kthread_worker_fn+0xcc/0x184
[ 3.580964] kthread+0x118/0x11c
[ 3.580970] ret_from_fork+0x10/0x20
[ 3.580980] Code: d2800082 9ac32042 d1000443 8a230000 (d50b7a20)
[ 3.580983] ---[ end trace 0000000000000000 ]---
On Fri, May 31, 2024 at 5:37 PM Nícolas F. R. A. Prado
<[email protected]> wrote:
> On Fri, May 31, 2024 at 12:44:31PM +0300, Andy Shevchenko wrote:
> > There's an unreliable code to handle DMA mappings on unidirection transfers.
> > This series does two things:
> > - it reverts the seemingly unnecessary change
> > - it reworks dummy SG list handling
> >
> > There is no need to backport that AFAIU, but no harm to apply for v6.10 aka
> > the current release cycle. Guys, please test these.
> >
> > Andy Shevchenko (2):
> > spi: Revert "Check if transfer is mapped before calling DMA sync APIs"
> > spi: Do not rely on the SG table and respective API implementations
> Hi Andy,
>
> applying either of these patches causes issues. See the traces for each one
> below. This was tested on top of next-20240531, which works fine.
Oh, thank you very much for prompt testing! Can you test just the
second one without the revert?
So, your patch seems needed for the sync API calls, while I considered
only mapping APIs.
--
With Best Regards,
Andy Shevchenko
On Fri, May 31, 2024 at 6:46 PM Andy Shevchenko
<[email protected]> wrote:
> On Fri, May 31, 2024 at 5:37 PM Nícolas F. R. A. Prado
> <[email protected]> wrote:
> > On Fri, May 31, 2024 at 12:44:31PM +0300, Andy Shevchenko wrote:
..
> > applying either of these patches causes issues. See the traces for each one
> > below. This was tested on top of next-20240531, which works fine.
>
> Oh, thank you very much for prompt testing! Can you test just the
> second one without the revert?
Ah, you wrote "either", so it seems you have tried that already.
> So, your patch seems needed for the sync API calls, while I considered
> only mapping APIs.
--
With Best Regards,
Andy Shevchenko
On Fri, May 31, 2024 at 06:51:46PM +0300, Andy Shevchenko wrote:
> On Fri, May 31, 2024 at 6:46 PM Andy Shevchenko
> <[email protected]> wrote:
> > On Fri, May 31, 2024 at 5:37 PM Nícolas F. R. A. Prado
> > <[email protected]> wrote:
> > > On Fri, May 31, 2024 at 12:44:31PM +0300, Andy Shevchenko wrote:
>
> ...
>
> > > applying either of these patches causes issues. See the traces for each one
> > > below. This was tested on top of next-20240531, which works fine.
> >
> > Oh, thank you very much for prompt testing! Can you test just the
> > second one without the revert?
>
> Ah, you wrote "either", so it seems you have tried that already.
Yes exactly. Both patches are troublesome. Patch 2 causes a slightly different
null pointer dereference, in "dcache_clean_poc+0x20/0x38", as the stack trace I
posted shows.
Thanks,
Nícolas
On Fri, May 31, 2024 at 8:14 PM Nícolas F. R. A. Prado
<[email protected]> wrote:
>
> On Fri, May 31, 2024 at 06:51:46PM +0300, Andy Shevchenko wrote:
> > On Fri, May 31, 2024 at 6:46 PM Andy Shevchenko
> > <[email protected]> wrote:
> > > On Fri, May 31, 2024 at 5:37 PM Nícolas F. R. A. Prado
> > > <[email protected]> wrote:
> > > > On Fri, May 31, 2024 at 12:44:31PM +0300, Andy Shevchenko wrote:
...
> > > > applying either of these patches causes issues. See the traces for each one
> > > > below. This was tested on top of next-20240531, which works fine.
> > >
> > > Oh, thank you very much for prompt testing! Can you test just the
> > > second one without the revert?
> >
> > Ah, you wrote "either", so it seems you have tried that already.
>
> Yes exactly. Both patches are troublesome. Patch 2 causes a slightly different
> null pointer dereference, in "dcache_clean_poc+0x20/0x38", as the stack trace I
> posted shows.
I have sent a new series where the last patch has a massive rework of
the cur_msg_mapped flag. Would be nice to see if it passes your tests.
The main idea there is to actually move to per transfer flag(s) from
per message one.
--
With Best Regards,
Andy Shevchenko
On Sat, Jun 01, 2024 at 12:45:30AM +0300, Andy Shevchenko wrote:
> On Fri, May 31, 2024 at 8:14 PM Nícolas F. R. A. Prado
> <[email protected]> wrote:
> >
> > On Fri, May 31, 2024 at 06:51:46PM +0300, Andy Shevchenko wrote:
> > > On Fri, May 31, 2024 at 6:46 PM Andy Shevchenko
> > > <[email protected]> wrote:
> > > > On Fri, May 31, 2024 at 5:37 PM Nícolas F. R. A. Prado
> > > > <[email protected]> wrote:
> > > > > On Fri, May 31, 2024 at 12:44:31PM +0300, Andy Shevchenko wrote:
>
> ...
>
> > > > > applying either of these patches causes issues. See the traces for each one
> > > > > below. This was tested on top of next-20240531, which works fine.
> > > >
> > > > Oh, thank you very much for prompt testing! Can you test just the
> > > > second one without the revert?
> > >
> > > Ah, you wrote "either", so it seems you have tried that already.
> >
> > Yes exactly. Both patches are troublesome. Patch 2 causes a slightly different
> > null pointer dereference, in "dcache_clean_poc+0x20/0x38", as the stack trace I
> > posted shows.
>
> I have sent a new series where the last patch has a massive rework of
> the cur_msg_mapped flag. Would be nice to see if it passes your tests.
> The main idea there is to actually move to per transfer flag(s) from
> per message one.
Ah great, I really felt that the flag should've been per transfer, so thank you
for making that change. I'll do some testing on Monday.
Thanks,
Nícolas
On Sat, Jun 01, 2024 at 12:45:30AM +0300, Andy Shevchenko wrote:
> I have sent a new series where the last patch has a massive rework of
> the cur_msg_mapped flag. Would be nice to see if it passes your tests.
> The main idea there is to actually move to per transfer flag(s) from
> per message one.
That feels like a sensible cleanup but also a bit much for a fix with
all the driver updates...
On Mon, Jun 03, 2024 at 01:26:36PM +0100, Mark Brown wrote:
> On Sat, Jun 01, 2024 at 12:45:30AM +0300, Andy Shevchenko wrote:
>
> > I have sent a new series where the last patch has a massive rework of
> > the cur_msg_mapped flag. Would be nice to see if it passes your tests.
> > The main idea there is to actually move to per transfer flag(s) from
> > per message one.
>
> That feels like a sensible cleanup but also a bit much for a fix with
> all the driver updates...
The current state of linux-next works fine, so there's nothing to fix. That
series is really a cleanup series, since the fix merged wasn't the cleanest
solution possible. (I'll be testing it shortly and posting the feedback there)
Thanks,
N?colas
On Mon, Jun 03, 2024 at 01:37:02PM -0400, N?colas F. R. A. Prado wrote:
> On Mon, Jun 03, 2024 at 01:26:36PM +0100, Mark Brown wrote:
> > That feels like a sensible cleanup but also a bit much for a fix with
> > all the driver updates...
> The current state of linux-next works fine, so there's nothing to fix. That
> series is really a cleanup series, since the fix merged wasn't the cleanest
> solution possible. (I'll be testing it shortly and posting the feedback there)
Ah, good - I'd lost context and thought there were still some issues
needed fixing.