2020-10-31 19:03:09

by Krzysztof Kozlowski

Subject: dmaengine: pl330 rare NULL pointer dereference in pl330_tasklet

Hi all,

I hit a quite rare issue with the pl330 DMA driver, difficult to reproduce
(I actually failed to do so).

It happened early in the reboot sequence:

[ OK ] Stopped target Graphical Interface.
[ OK ] Stopped target Multi-User System.
[ OK ] Stopped target RPC Port Mapper.
Stopping OpenSSH Daemonti[ 75.447904] 8<--- cut here ---
[ 75.449506] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
...
[ 75.690850] [<c0902f70>] (pl330_tasklet) from [<c034d460>] (tasklet_action_common+0x88/0x1f4)
[ 75.699340] [<c034d460>] (tasklet_action_common) from [<c03013f8>] (__do_softirq+0x108/0x428)
[ 75.707850] [<c03013f8>] (__do_softirq) from [<c034dadc>] (run_ksoftirqd+0x2c/0x4c)
[ 75.715486] [<c034dadc>] (run_ksoftirqd) from [<c036fbfc>] (smpboot_thread_fn+0x13c/0x24c)
[ 75.723693] [<c036fbfc>] (smpboot_thread_fn) from [<c036c18c>] (kthread+0x13c/0x16c)
[ 75.731390] [<c036c18c>] (kthread) from [<c03001a8>] (ret_from_fork+0x14/0x2c)
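
A note on reading the fault address: 0000000c rather than 00000000 usually
means that a member located 0xc bytes into a structure was accessed through a
NULL pointer. The sketch below only illustrates that arithmetic. The struct
example_desc layout and the member_address() helper are hypothetical and are
not the real pl330 structures; the trace does not say which member was
involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT taken from the pl330 driver. The 32-bit fields
 * stand in for the 4-byte pointers of an ARMv7 kernel, so the offsets are
 * the same when this is compiled on a 64-bit host.
 */
struct example_desc {
	uint32_t next;     /* offset 0x0 */
	uint32_t prev;     /* offset 0x4 */
	uint32_t flags;    /* offset 0x8 */
	uint32_t callback; /* offset 0xc */
};

/* The fourth 4-byte member sits at the offset reported in the oops. */
static_assert(offsetof(struct example_desc, callback) == 0xc,
	      "fourth 32-bit member is at offset 0xc");

/* Address the CPU touches when reading a member through a base pointer. */
static inline uintptr_t member_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}
```

With a NULL base, member_address(0, offsetof(struct example_desc, callback))
evaluates to 0xc, which matches the shape of the address in the report.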

Full log:
https://krzk.eu/#/builders/20/builds/954/steps/22/logs/serial0

1. Arch ARM Linux
2. multi_v7_defconfig
3. Odroid HC1, ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC
4. systemd, boot up with static IP set in kernel command line
5. No swap
6. Kernel, DTB and initramfs are downloaded with TFTP
7. NFS root (NFS client) mounted from an NFSv4 server

Since I was not able to reproduce it, obviously I did not run bisect. If
anyone has ideas, please share.

Best regards,
Krzysztof


2020-11-02 07:40:43

by Marek Szyprowski

Subject: Re: dmaengine: pl330 rare NULL pointer dereference in pl330_tasklet

Hi Krzysztof,

On 31.10.2020 20:01, Krzysztof Kozlowski wrote:
> I hit a quite rare issue with the pl330 DMA driver, difficult to reproduce
> (I actually failed to do so).
>
> It happened early in the reboot sequence:
>
> [ OK ] Stopped target Graphical Interface.
> [ OK ] Stopped target Multi-User System.
> [ OK ] Stopped target RPC Port Mapper.
> Stopping OpenSSH Daemonti[ 75.447904] 8<--- cut here ---
> [ 75.449506] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> ...
> [ 75.690850] [<c0902f70>] (pl330_tasklet) from [<c034d460>] (tasklet_action_common+0x88/0x1f4)
> [ 75.699340] [<c034d460>] (tasklet_action_common) from [<c03013f8>] (__do_softirq+0x108/0x428)
> [ 75.707850] [<c03013f8>] (__do_softirq) from [<c034dadc>] (run_ksoftirqd+0x2c/0x4c)
> [ 75.715486] [<c034dadc>] (run_ksoftirqd) from [<c036fbfc>] (smpboot_thread_fn+0x13c/0x24c)
> [ 75.723693] [<c036fbfc>] (smpboot_thread_fn) from [<c036c18c>] (kthread+0x13c/0x16c)
> [ 75.731390] [<c036c18c>] (kthread) from [<c03001a8>] (ret_from_fork+0x14/0x2c)
>
> Full log:
> https://krzk.eu/#/builders/20/builds/954/steps/22/logs/serial0
>
> 1. Arch ARM Linux
> 2. multi_v7_defconfig
> 3. Odroid HC1, ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC
> 4. systemd, boot up with static IP set in kernel command line
> 5. No swap
> 6. Kernel, DTB and initramfs are downloaded with TFTP
> 7. NFS root (NFS client) mounted from an NFSv4 server
>
> Since I was not able to reproduce it, obviously I did not run bisect. If
> anyone has ideas, please share.

Well, I've also observed it a few times. IMHO it is related to the
broken shutdown procedure of the UART in DMA mode. The usual symptom is
easy to observe: random parts of previously transmitted data get flushed
to the UART console during system shutdown. It also depends on the board
and the system used (especially the presence of systemd, which handles
the UART differently than the old sysv init). IMHO there is a kind of
use-after-free issue there, so the above pl330 stacktrace can also be
observed, depending on the timing and system load. This issue has been
there since the beginning of the DMA support. I have it on my todo list,
but its priority has been too low for me to look into it. I only briefly
checked the related code a few years ago and noticed that the UART
shutdown is not really synchronized with DMA. However, at that time I
didn't find any simple fix, so I gave up.
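
The ordering problem described above can be shown with a toy user-space
model. This is a sketch under stated assumptions, not driver code and not a
proposed fix: struct model_chan and all model_*() functions are hypothetical
names invented for illustration. The "synced" variant stands in for whatever
step would stop the channel and wait for deferred completion work before the
resources are released (the kernel has helpers of that kind, such as
tasklet_kill() and dmaengine_terminate_sync(), but whether they are
sufficient here is not established in this thread).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one DMA channel as seen by the UART driver. */
struct model_chan {
	bool work_pending;   /* deferred completion work is still queued */
	bool resources_live; /* descriptors and buffers are still allocated */
};

/*
 * Run the deferred work, if any. Returns false when the work would touch
 * state that has already been released (the use-after-free case).
 */
static bool model_run_deferred(struct model_chan *ch)
{
	if (!ch->work_pending)
		return true; /* nothing queued, nothing can go wrong */
	ch->work_pending = false;
	return ch->resources_live;
}

/* Shutdown that releases resources without waiting for queued work. */
static void model_shutdown_unsynced(struct model_chan *ch)
{
	ch->resources_live = false;
}

/* Shutdown that first cancels or drains queued work, then releases. */
static void model_shutdown_synced(struct model_chan *ch)
{
	ch->work_pending = false; /* stands in for "stop and wait" */
	ch->resources_live = false;
}
```

In this model, deferred work that runs after model_shutdown_unsynced() finds
its resources gone, while after model_shutdown_synced() there is no work left
to run. In the real system the outcome additionally depends on timing and
load, which would be consistent with the crash being rare.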

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2020-11-02 08:43:17

by Krzysztof Kozlowski

Subject: Re: dmaengine: pl330 rare NULL pointer dereference in pl330_tasklet

On Mon, Nov 02, 2020 at 08:38:14AM +0100, Marek Szyprowski wrote:
> Hi Krzysztof,
>
> On 31.10.2020 20:01, Krzysztof Kozlowski wrote:
> > I hit a quite rare issue with the pl330 DMA driver, difficult to reproduce
> > (I actually failed to do so).
> >
> > It happened early in the reboot sequence:
> >
> > [ OK ] Stopped target Graphical Interface.
> > [ OK ] Stopped target Multi-User System.
> > [ OK ] Stopped target RPC Port Mapper.
> > Stopping OpenSSH Daemonti[ 75.447904] 8<--- cut here ---
> > [ 75.449506] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> > ...
> > [ 75.690850] [<c0902f70>] (pl330_tasklet) from [<c034d460>] (tasklet_action_common+0x88/0x1f4)
> > [ 75.699340] [<c034d460>] (tasklet_action_common) from [<c03013f8>] (__do_softirq+0x108/0x428)
> > [ 75.707850] [<c03013f8>] (__do_softirq) from [<c034dadc>] (run_ksoftirqd+0x2c/0x4c)
> > [ 75.715486] [<c034dadc>] (run_ksoftirqd) from [<c036fbfc>] (smpboot_thread_fn+0x13c/0x24c)
> > [ 75.723693] [<c036fbfc>] (smpboot_thread_fn) from [<c036c18c>] (kthread+0x13c/0x16c)
> > [ 75.731390] [<c036c18c>] (kthread) from [<c03001a8>] (ret_from_fork+0x14/0x2c)
> >
> > Full log:
> > https://krzk.eu/#/builders/20/builds/954/steps/22/logs/serial0
> >
> > 1. Arch ARM Linux
> > 2. multi_v7_defconfig
> > 3. Odroid HC1, ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC
> > 4. systemd, boot up with static IP set in kernel command line
> > 5. No swap
> > 6. Kernel, DTB and initramfs are downloaded with TFTP
> > 7. NFS root (NFS client) mounted from an NFSv4 server
> >
> > Since I was not able to reproduce it, obviously I did not run bisect. If
> > anyone has ideas, please share.
>
> Well, I've also observed it a few times. IMHO it is related to the
> broken shutdown procedure of the UART in DMA mode. The usual symptom is
> easy to observe: random parts of previously transmitted data get flushed
> to the UART console during system shutdown. It also depends on the board
> and the system used (especially the presence of systemd, which handles
> the UART differently than the old sysv init). IMHO there is a kind of
> use-after-free issue there, so the above pl330 stacktrace can also be
> observed, depending on the timing and system load. This issue has been
> there since the beginning of the DMA support. I have it on my todo list,
> but its priority has been too low for me to look into it. I only briefly
> checked the related code a few years ago and noticed that the UART
> shutdown is not really synchronized with DMA. However, at that time I
> didn't find any simple fix, so I gave up.

Thanks for the explanation.

Best regards,
Krzysztof