When unloading the spi-mt65xx kernel module during an ongoing spi-mem
operation the kernel will Oops shortly after unloading the module.
This is because wait_for_completion_timeout was still running and
returning into the no longer loaded module:
Internal error: Oops: 0000000096000005 [#1] SMP
Modules linked in: [many, but spi-mt65xx is no longer there]
CPU: 0 PID: 2578 Comm: block Tainted: G W O 6.3.0-next-20230428+ #0
Hardware name: Bananapi BPI-R3 (DT)
pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __lock_acquire+0x18c/0x20e8
lr : __lock_acquire+0x9b8/0x20e8
sp : ffffffc009ec3400
x29: ffffffc009ec3400 x28: 0000000000000001 x27: 0000000000000004
x26: ffffff80082888c8 x25: 0000000000000000 x24: 0000000000000000
x23: ffffffc009609da8 x22: ffffff8008288000 x21: ffffff8008288968
x20: 00000000000003c2 x19: ffffff8008be7990 x18: 00000000000002af
x17: 0000000000000000 x16: 0000000000000000 x15: ffffffc008d78970
x14: 000000000000080d x13: 00000000000002af x12: 00000000ffffffea
x11: 00000000ffffefff x10: ffffffc008dd0970 x9 : ffffffc008d78918
x8 : 0000000000017fe8 x7 : 0000000000000001 x6 : 0000000000000000
x5 : ffffff807fb53910 x4 : 0000000000000000 x3 : 0000000000000027
x2 : 0000000000000027 x1 : 0000000000000000 x0 : 00000000000c03c2
Call trace:
__lock_acquire+0x18c/0x20e8
lock_acquire+0x100/0x2a4
_raw_spin_lock_irq+0x58/0x74
__wait_for_common+0xe0/0x1b4
wait_for_completion_timeout+0x1c/0x24
0xffffffc000acc8a4 <--- used to be mtk_spi_transfer_wait
spi_mem_exec_op+0x390/0x3ec
spi_mem_no_dirmap_read+0x6c/0x88
spi_mem_dirmap_read+0xcc/0x12c
spinand_read_page+0xf8/0x1dc
spinand_mtd_read+0x1b4/0x2fc
mtd_read_oob_std+0x58/0x7c
mtd_read_oob+0x8c/0x148
mtd_read+0x50/0x6c
...
Prevent this by completing in mtk_spi_remove if needed.
Fixes: 9f763fd20da7 ("spi: mediatek: add spi memory support for ipm design")
Signed-off-by: Daniel Golle <[email protected]>
---
A short grep revealed that many if not most other SPI drivers may need
the same fix. However, I can impossibly test all of them, so let's
start with this one.
drivers/spi/spi-mt65xx.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/spi/spi-mt65xx.c b/drivers/spi/spi-mt65xx.c
index 21c321f437667..d7432e2219d85 100644
--- a/drivers/spi/spi-mt65xx.c
+++ b/drivers/spi/spi-mt65xx.c
@@ -1275,6 +1275,9 @@ static int mtk_spi_remove(struct platform_device *pdev)
struct mtk_spi *mdata = spi_master_get_devdata(master);
int ret;
+ if (mdata->use_spimem && !completion_done(&mdata->spimem_done))
+ complete(&mdata->spimem_done);
+
ret = pm_runtime_resume_and_get(&pdev->dev);
if (ret < 0)
return ret;
--
2.40.1
On Mon, 01 May 2023 19:33:14 +0100, Daniel Golle wrote:
> When unloading the spi-mt65xx kernel module during an ongoing spi-mem
> operation the kernel will Oops shortly after unloading the module.
> This is because wait_for_completion_timeout was still running and
> returning into the no longer loaded module:
>
> Internal error: Oops: 0000000096000005 [#1] SMP
> Modules linked in: [many, but spi-mt65xx is no longer there]
> CPU: 0 PID: 2578 Comm: block Tainted: G W O 6.3.0-next-20230428+ #0
> Hardware name: Bananapi BPI-R3 (DT)
> pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __lock_acquire+0x18c/0x20e8
> lr : __lock_acquire+0x9b8/0x20e8
> sp : ffffffc009ec3400
> x29: ffffffc009ec3400 x28: 0000000000000001 x27: 0000000000000004
> x26: ffffff80082888c8 x25: 0000000000000000 x24: 0000000000000000
> x23: ffffffc009609da8 x22: ffffff8008288000 x21: ffffff8008288968
> x20: 00000000000003c2 x19: ffffff8008be7990 x18: 00000000000002af
> x17: 0000000000000000 x16: 0000000000000000 x15: ffffffc008d78970
> x14: 000000000000080d x13: 00000000000002af x12: 00000000ffffffea
> x11: 00000000ffffefff x10: ffffffc008dd0970 x9 : ffffffc008d78918
> x8 : 0000000000017fe8 x7 : 0000000000000001 x6 : 0000000000000000
> x5 : ffffff807fb53910 x4 : 0000000000000000 x3 : 0000000000000027
> x2 : 0000000000000027 x1 : 0000000000000000 x0 : 00000000000c03c2
> Call trace:
> __lock_acquire+0x18c/0x20e8
> lock_acquire+0x100/0x2a4
> _raw_spin_lock_irq+0x58/0x74
> __wait_for_common+0xe0/0x1b4
> wait_for_completion_timeout+0x1c/0x24
> 0xffffffc000acc8a4 <--- used to be mtk_spi_transfer_wait
> spi_mem_exec_op+0x390/0x3ec
> spi_mem_no_dirmap_read+0x6c/0x88
> spi_mem_dirmap_read+0xcc/0x12c
> spinand_read_page+0xf8/0x1dc
> spinand_mtd_read+0x1b4/0x2fc
> mtd_read_oob_std+0x58/0x7c
> mtd_read_oob+0x8c/0x148
> mtd_read+0x50/0x6c
> ...
>
> [...]
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
Thanks!
[1/1] spi: mt65xx: make sure operations completed before unloading
commit: 4be47a5d59cbc9396a6ffd327913eb4c8d67a32f
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
Hi Daniel,
On Mon, May 1, 2023 at 8:37 PM Daniel Golle <[email protected]> wrote:
> When unloading the spi-mt65xx kernel module during an ongoing spi-mem
> operation the kernel will Oops shortly after unloading the module.
> This is because wait_for_completion_timeout was still running and
> returning into the no longer loaded module:
>
> Internal error: Oops: 0000000096000005 [#1] SMP
> Modules linked in: [many, but spi-mt65xx is no longer there]
> CPU: 0 PID: 2578 Comm: block Tainted: G W O 6.3.0-next-20230428+ #0
> Hardware name: Bananapi BPI-R3 (DT)
> pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __lock_acquire+0x18c/0x20e8
> lr : __lock_acquire+0x9b8/0x20e8
> sp : ffffffc009ec3400
> x29: ffffffc009ec3400 x28: 0000000000000001 x27: 0000000000000004
> x26: ffffff80082888c8 x25: 0000000000000000 x24: 0000000000000000
> x23: ffffffc009609da8 x22: ffffff8008288000 x21: ffffff8008288968
> x20: 00000000000003c2 x19: ffffff8008be7990 x18: 00000000000002af
> x17: 0000000000000000 x16: 0000000000000000 x15: ffffffc008d78970
> x14: 000000000000080d x13: 00000000000002af x12: 00000000ffffffea
> x11: 00000000ffffefff x10: ffffffc008dd0970 x9 : ffffffc008d78918
> x8 : 0000000000017fe8 x7 : 0000000000000001 x6 : 0000000000000000
> x5 : ffffff807fb53910 x4 : 0000000000000000 x3 : 0000000000000027
> x2 : 0000000000000027 x1 : 0000000000000000 x0 : 00000000000c03c2
> Call trace:
> __lock_acquire+0x18c/0x20e8
> lock_acquire+0x100/0x2a4
> _raw_spin_lock_irq+0x58/0x74
> __wait_for_common+0xe0/0x1b4
> wait_for_completion_timeout+0x1c/0x24
> 0xffffffc000acc8a4 <--- used to be mtk_spi_transfer_wait
> spi_mem_exec_op+0x390/0x3ec
> spi_mem_no_dirmap_read+0x6c/0x88
> spi_mem_dirmap_read+0xcc/0x12c
> spinand_read_page+0xf8/0x1dc
> spinand_mtd_read+0x1b4/0x2fc
> mtd_read_oob_std+0x58/0x7c
> mtd_read_oob+0x8c/0x148
> mtd_read+0x50/0x6c
> ...
>
> Prevent this by completing in mtk_spi_remove if needed.
>
> Fixes: 9f763fd20da7 ("spi: mediatek: add spi memory support for ipm design")
> Signed-off-by: Daniel Golle <[email protected]>
Thanks for your patch, which is now commit 4be47a5d59cbc939 ("spi:
mt65xx: make sure operations completed before unloading") in spi/for-next.
> --- a/drivers/spi/spi-mt65xx.c
> +++ b/drivers/spi/spi-mt65xx.c
> @@ -1275,6 +1275,9 @@ static int mtk_spi_remove(struct platform_device *pdev)
> struct mtk_spi *mdata = spi_master_get_devdata(master);
> int ret;
>
> + if (mdata->use_spimem && !completion_done(&mdata->spimem_done))
> + complete(&mdata->spimem_done);
I'm afraid that is not sufficient, as the code that was waiting on the
completion accesses hardware registers and driver-private data after
that, and there is no guarantee all of that has completed by the time
mtk_spi_remove() finishes.
> +
> ret = pm_runtime_resume_and_get(&pdev->dev);
> if (ret < 0)
> return ret;
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds