2023-08-04 09:04:41

by Kees Cook

[permalink] [raw]
Subject: Re: pstore/ram: printk: NULL characters in pstore ramoops area

On Thu, Aug 03, 2023 at 04:34:09PM -0700, Vijay Balakrishna wrote:
> Hello,
>
> We are noticing NULL characters in ramoops/pstore memory after a warm or a
> kexec reboot [1] in our 5.10 ARM64 product kernel after moving from 5.4
> kernel.? I ruled out fs/pstore/* as the source from where NULL characters
> originate by adding debug code [2] and confirming from collected output
> [3].? Then isolated further to printk log/ring buffer area, the NULL
> characters were already present in buffer in kmsg_dump_get_buffer() when
> kmsg log lines are read.? After looking at printk merges in mainline kernel,
> I cherry-picked following which looked related to our 5.10 kernel and still
> see NULL characters.
>
> 4260e0e5510158d704898603331e5365ebe957de printk: consolidate
> kmsg_dump_get_buffer/syslog_print_all code
> 726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
> kmsg_dump_get_buffer()
> bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
> of syslog_print_all

Do you mean that you took a working v5.4 kernel and backported the above
3 commits and it starting showing the %NUL characters?

> [...]
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index ade66dbe5f39..1825972151b2 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -383,6 +383,10 @@ static int notrace ramoops_pstore_write(struct
> pstore_record *record)
> ???? size = record->size;
> ???? if (size + hlen > prz->buffer_size)
> ???? ??? size = prz->buffer_size - hlen;
> +??? if (null_char(record->buf, size))
> +??? ??? pr_crit("%s: A NULL char in record buf, size %zu\n", __func__,
> size);
> +??? else
> +??? ??? pr_crit("%s: No NULL char in record buf, size %zu\n", __func__,
> size);
> ???? persistent_ram_write(prz, record->buf, size);
> [...]
> root@localhost:~# reboot
> [ 2188.073362] systemd-shutdown[1]: Could not detach loopback /dev/loop1: Device or resource busy
> [ 2188.082272] systemd-shutdown[1]: Could not detach loopback /dev/loop0: Device or resource busy
> [ 2188.091873] watchdog: watchdog0: watchdog did not stop!
> [ 2188.099227] systemd-shutdown[1]: Failed to finalize loop devices, DM devices, ignoring.
> [ 2188.306671] reboot: Restarting system
> [ 2188.316932] ramoops: ramoops_pstore_write: A NULL char in record buf, size 88190

Well that does seem pretty definitive that it's a problem with the
printk/kmsg infrastructure: the %NULs are present in the buffer being
handed to pstore. :(

I have had a growing suspicion that there is a hard-to-find memory
corruption issue with recent printk work (seen during early-boot UBSAN
reporting), but v5.10 is pretty old, so it's probably not related.

Is the issue present in modern kernels?

-Kees

--
Kees Cook


2023-08-11 00:47:52

by Vijay Balakrishna

[permalink] [raw]
Subject: Re: pstore/ram: printk: NULL characters in pstore ramoops area

On 8/4/23 00:59, Kees Cook wrote:
> Is the issue present in modern kernels?

Yes, the issue present in mainline kernel, I can repro on a x86_64 VM with v6.5.0-rc5 kernel.

$ uname -a
Linux vijay-vm3 6.5.0-rc5-ramoops #10 SMP PREEMPT_DYNAMIC Thu Aug 10 17:24:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.5.0-rc5-ramoops root=UUID=xx ro mem=8188M ramoops.mem_address=0x2BFC00000 ramoops.mem_size=0x400000 ramoops.max_reason=5 ramoops.record_size=0x80000 console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300

(excerpt from /var/lib/systemd/pstore/dmesg-ramoops-0 after warm reboot)
<5>[ 0.000000] Linux version 6.5.0-rc5-ramoops (vijay@vijay-vm1) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #10 SMP PREEMPT_DYNAMIC Thu Aug 10 17:24:39 UTC 2023
..
<6>[ 0.463984] Calibrating delay loop (skipped), value calculated using timer frequency.. ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

(hex dump output)

00003160 61 6c 63 75 6c 61 74 65 64 20 75 73 69 6e 67 20 |alculated using |
00003170 74 69 6d 65 72 20 66 72 65 71 75 65 6e 63 79 2e |timer frequency.|
00003180 2e 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |. ..............|
00003190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000031a0 00 0a 3c 36 3e 5b 20 20 20 20 30 2e 34 36 37 39 |..<6>[ 0.4679|

2023-08-11 01:00:22

by Kees Cook

[permalink] [raw]
Subject: Re: pstore/ram: printk: NULL characters in pstore ramoops area

On Thu, Aug 10, 2023 at 04:32:58PM -0700, Vijay Balakrishna wrote:
> On 8/4/23 00:59, Kees Cook wrote:
> > Is the issue present in modern kernels?
>
> Yes, the issue present in mainline kernel, I can repro on a x86_64 VM with v6.5.0-rc5 kernel.
>
> $ uname -a
> Linux vijay-vm3 6.5.0-rc5-ramoops #10 SMP PREEMPT_DYNAMIC Thu Aug 10 17:24:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> $ cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-6.5.0-rc5-ramoops root=UUID=xx ro mem=8188M ramoops.mem_address=0x2BFC00000 ramoops.mem_size=0x400000 ramoops.max_reason=5 ramoops.record_size=0x80000 console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300
>
> (excerpt from /var/lib/systemd/pstore/dmesg-ramoops-0 after warm reboot)
> <5>[ 0.000000] Linux version 6.5.0-rc5-ramoops (vijay@vijay-vm1) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #10 SMP PREEMPT_DYNAMIC Thu Aug 10 17:24:39 UTC 2023
> ..
> <6>[ 0.463984] Calibrating delay loop (skipped), value calculated using timer frequency.. ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>
> (hex dump output)
>
> 00003160 61 6c 63 75 6c 61 74 65 64 20 75 73 69 6e 67 20 |alculated using |
> 00003170 74 69 6d 65 72 20 66 72 65 71 75 65 6e 63 79 2e |timer frequency.|
> 00003180 2e 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |. ..............|
> 00003190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> 000031a0 00 0a 3c 36 3e 5b 20 20 20 20 30 2e 34 36 37 39 |..<6>[ 0.4679|

Can you share the .config you're building with? And what are you using
to trigger an Oops? I will see if I can reproduce this...

--
Kees Cook