2020-07-21 15:22:49

by Paul Menzel

[permalink] [raw]
Subject: Start messages in message buffer missing (dmesg)

Dear Linux folks,


On two indentical Dell PowerEdge T440 with Linux 5.4.39 and systemd 242

00:00.0 Host bridge [0600]: Intel Corporation Sky Lake-E DMI3
Registers [8086:2020] (rev 07)

running `dmesg`, it’s truncated despite the buffer not yet filled.

> $ sudo dmidecode -t2
> # dmidecode 3.0
> Getting SMBIOS data from sysfs.
> SMBIOS 3.2 present.
> # SMBIOS implementations newer than version 3.0 are not
> # fully supported by this version of dmidecode.
>
> Handle 0x0200, DMI type 2, 8 bytes
> Base Board Information
> Manufacturer: Dell Inc.
> Product Name: 02KM69
> Version: A04

The message cutoffs differ.

> @sang:~$ dmesg | head
> [ 6.362201] system 00:01: [mem 0xfed12010-0xfed1201f] has been reserved
> [ 6.369102] system 00:01: [mem 0xfed1b000-0xfed1bfff] has been reserved
> [ 6.376003] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
> [ 6.376176] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
> [ 6.376319] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
> [ 6.376451] system 00:04: [mem 0xfd000000-0xfdabffff] has been reserved
> [ 6.383352] system 00:04: [mem 0xfdad0000-0xfdadffff] has been reserved
> [ 6.390251] system 00:04: [mem 0xfdb00000-0xfdffffff] has been reserved
> [ 6.397149] system 00:04: [mem 0xfe000000-0xfe00ffff] has been reserved
> [ 6.404049] system 00:04: [mem 0xfe011000-0xfe01ffff] has been reserved

> @mouches:~$ dmesg | head
> [ 6.623209] pci 0000:00:1c.4: PCI bridge to [bus 02-03]
> [ 6.628723] pci 0000:00:1c.4: bridge window [mem 0x92000000-0x928fffff]
> [ 6.635795] pci 0000:00:1c.4: bridge window [mem 0x91000000-0x91ffffff 64bit pref]
> [ 6.644046] pci 0000:04:00.0: BAR 6: assigned [mem 0x90000000-0x9003ffff pref]
> [ 6.651772] pci 0000:04:00.1: BAR 6: assigned [mem 0x90040000-0x9007ffff pref]
> [ 6.659494] pci 0000:00:1c.5: PCI bridge to [bus 04]
> [ 6.664752] pci 0000:00:1c.5: bridge window [mem 0x90000000-0x900fffff]
> [ 6.671834] pci 0000:00:1c.5: bridge window [mem 0x92e00000-0x92efffff 64bit pref]
> [ 6.680085] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
> [ 6.686566] pci_bus 0000:00: resource 5 [io 0x1000-0x3fff window]

> @sang:~$ head /dev/kmsg
> 6,862,6362201,-;system 00:01: [mem 0xfed12010-0xfed1201f] has been reserved
> SUBSYSTEM=pnp
> DEVICE=+pnp:00:01
> 6,863,6369102,-;system 00:01: [mem 0xfed1b000-0xfed1bfff] has been reserved
> SUBSYSTEM=pnp
> DEVICE=+pnp:00:01
> 7,864,6376003,-;system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
> SUBSYSTEM=pnp
> DEVICE=+pnp:00:01
> 7,865,6376176,-;pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)

> @sang:~$ cp /dev/kmsg /dev/shm/kmsg
> ^C
> @sang:~$ ls -lh /dev/shm/kmsg
> -rw-r----- 1 pmenzel pmenzel 135K Jul 21 13:53 /dev/shm/kmsg

So, the size is smaller than 128 KB + 16 * 4 KB = 192 KB. The systemd
journal contains some earlier messages.

> @sang:~$ sudo journalctl -k | head
> -- Logs begin at Wed 2020-06-03 14:31:56 CEST, end at Tue 2020-07-21 13:54:38 CEST. --
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: Normal zone: 315392 pages used for memmap
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: Normal zone: 20185088 pages, LIFO batch:63
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: Initmem setup node 1 [mem 0x0000001440000000-0x000000203fffffff]
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: On node 1 totalpages: 12582912
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: Normal zone: 196608 pages used for memmap
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: Normal zone: 12582912 pages, LIFO batch:63
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: PM-Timer IO Port: 0x508
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: Local APIC address 0xfee00000
> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: X2APIC_NMI (uid[0xffffffff] high level lint[0x1])
>
> @sang:~$ sudo journalctl -k > /dev/shm/journalctl-k.txt
> @sang:~$ ls -lh /dev/shm/journalctl-k.txt
> -rw-rw---- 1 pmenzel pmenzel 191K Jul 21 13:55 /dev/shm/journalctl-k.txt

The systemd journal output starts at the same position on both systems.

But now, trying to get the full log over serial console, the full log is
now present too on one system, when running `dmesg`. Detaching the
serial console, the behavior stays the same, that means all messages are
there. I am confused, how this could have fixed anything.

Is somebody interested into looking into this, or should a just fix the
affected system by plugging in a serial cable too?


Kind regards,

Paul


Attachments:
=?UTF-8?Q?20200721=E2=80=93mouches=E2=80=93linux=5F5=2E4=2E39=E2=80=93messages=2Etxt?= (90.48 kB)
=?UTF-8?Q?20200721=E2=80=93sang=E2=80=93linux=5F5=2E4=2E39=E2=80=93messages-full=2Etxt?= (131.40 kB)
Download all attachments

2020-08-10 22:16:06

by Paul Menzel

[permalink] [raw]
Subject: Re: Start messages in message buffer missing (dmesg)

[CC: +PRINTK maintainers]

Dear Linux folks,


Am 21.07.20 um 17:20 schrieb Paul Menzel:

> On two identical Dell PowerEdge T440 with Linux 5.4.39 and systemd 242
>
>     00:00.0 Host bridge [0600]: Intel Corporation Sky Lake-E DMI3 Registers [8086:2020] (rev 07)
>
> running `dmesg`, it’s truncated despite the buffer not yet filled.
>
>> $ sudo dmidecode -t2
>> # dmidecode 3.0
>> Getting SMBIOS data from sysfs.
>> SMBIOS 3.2 present.
>> # SMBIOS implementations newer than version 3.0 are not
>> # fully supported by this version of dmidecode.
>>
>> Handle 0x0200, DMI type 2, 8 bytes
>> Base Board Information
>>     Manufacturer: Dell Inc.
>>     Product Name: 02KM69
>>     Version: A04
>
> The message cutoffs differ.
>
>> @sang:~$ dmesg | head
>> [    6.362201] system 00:01: [mem 0xfed12010-0xfed1201f] has been reserved
>> [    6.369102] system 00:01: [mem 0xfed1b000-0xfed1bfff] has been reserved
>> [    6.376003] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
>> [    6.376176] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
>> [    6.376319] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
>> [    6.376451] system 00:04: [mem 0xfd000000-0xfdabffff] has been reserved
>> [    6.383352] system 00:04: [mem 0xfdad0000-0xfdadffff] has been reserved
>> [    6.390251] system 00:04: [mem 0xfdb00000-0xfdffffff] has been reserved
>> [    6.397149] system 00:04: [mem 0xfe000000-0xfe00ffff] has been reserved
>> [    6.404049] system 00:04: [mem 0xfe011000-0xfe01ffff] has been reserved
>
>> @mouches:~$ dmesg | head
>> [    6.623209] pci 0000:00:1c.4: PCI bridge to [bus 02-03]
>> [    6.628723] pci 0000:00:1c.4:   bridge window [mem 0x92000000-0x928fffff]
>> [    6.635795] pci 0000:00:1c.4:   bridge window [mem 0x91000000-0x91ffffff 64bit pref]
>> [    6.644046] pci 0000:04:00.0: BAR 6: assigned [mem 0x90000000-0x9003ffff pref]
>> [    6.651772] pci 0000:04:00.1: BAR 6: assigned [mem 0x90040000-0x9007ffff pref]
>> [    6.659494] pci 0000:00:1c.5: PCI bridge to [bus 04]
>> [    6.664752] pci 0000:00:1c.5:   bridge window [mem 0x90000000-0x900fffff]
>> [    6.671834] pci 0000:00:1c.5:   bridge window [mem 0x92e00000-0x92efffff 64bit pref]
>> [    6.680085] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
>> [    6.686566] pci_bus 0000:00: resource 5 [io  0x1000-0x3fff window]
>
>> @sang:~$ head /dev/kmsg
>> 6,862,6362201,-;system 00:01: [mem 0xfed12010-0xfed1201f] has been reserved
>>  SUBSYSTEM=pnp
>>  DEVICE=+pnp:00:01
>> 6,863,6369102,-;system 00:01: [mem 0xfed1b000-0xfed1bfff] has been reserved
>>  SUBSYSTEM=pnp
>>  DEVICE=+pnp:00:01
>> 7,864,6376003,-;system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
>>  SUBSYSTEM=pnp
>>  DEVICE=+pnp:00:01
>> 7,865,6376176,-;pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
>
>> @sang:~$ cp /dev/kmsg /dev/shm/kmsg
>> ^C
>> @sang:~$ ls -lh /dev/shm/kmsg
>> -rw-r----- 1 pmenzel pmenzel 135K Jul 21 13:53 /dev/shm/kmsg
>
> So, the size is smaller than 128 KB + 16 * 4 KB = 192 KB. The systemd
> journal contains some earlier messages.

This seems incorrect. See below.

>> @sang:~$ sudo journalctl -k | head
>> -- Logs begin at Wed 2020-06-03 14:31:56 CEST, end at Tue 2020-07-21 13:54:38 CEST. --
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 315392 pages used for memmap
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 20185088 pages, LIFO batch:63
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: Initmem setup node 1 [mem 0x0000001440000000-0x000000203fffffff]
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: On node 1 totalpages: 12582912
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 196608 pages used for memmap
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 12582912 pages, LIFO batch:63
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: PM-Timer IO Port: 0x508
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: Local APIC address 0xfee00000
>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: X2APIC_NMI (uid[0xffffffff] high level lint[0x1])
>>
>> @sang:~$ sudo journalctl -k > /dev/shm/journalctl-k.txt
>> @sang:~$ ls -lh /dev/shm/journalctl-k.txt
>> -rw-rw---- 1 pmenzel pmenzel 191K Jul 21 13:55 /dev/shm/journalctl-k.txt
>
> The systemd journal output starts at the same position on both systems.
>
> But now, trying to get the full log over serial console, the full log is
> now present too on one system, when running `dmesg`. Detaching the
> serial console, the behavior stays the same, that means all messages are
> there. I am confused, how this could have fixed anything.
>
> Is somebody interested into looking into this, or should a just fix the
> affected system by plugging in a serial cable too?

It turns out, that the issue with the serial console seems to have been
an exception or maybe a testing error.

The issue still happens with current Linux master
v5.8-11935-g8d3e09b43312 and the attached debug patch on top.

But first, from `init/Kconfig`:

> The increased size means that a new buffer has to be allocated and
> the original static one is unused. It makes sense only on systems
> with more CPUs. Therefore this value is used only when the sum of
> contributions is greater than the half of the default kernel ring
> buffer as defined by LOG_BUF_SHIFT. The default values are set
> so that more than 64 CPUs are needed to trigger the allocation.

As the ring buffer size defaults to 2^17 bytes = 128 KB, shouldn’t it be
64 instead instead of 16 CPUs?

Anyway, with the attached debug patch, I can confirm, that only 128 KB
are used. That could explain the lost messages. The serial log is attached.

On other systems, like my laptop with just four CPUs, despite the 128 KB
limit, the size of the stored messages is bigger.

$ grep BUF_SHIFT /boot/config-5.7.0-2-amd64
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13

But the size of `/dev/kmsg` is 301393 bytes.

So I wonder, what knobs decide, when the buffer is wrapped, and messages
are lost. I also wonder, why the systemd journal does not contain all
Linux kernel messages.

Lastly, should the default buffer size be increased from 128 KB to 512
KB as today machines should have enough room?


Kind regards,

Paul


Attachments:
0001-printk-Debug-lost-message-issue.patch (898.00 B)
mouches-linux-5.8-rc0-messages.txt (124.75 kB)
Download all attachments

2020-08-11 09:10:29

by Paul Menzel

[permalink] [raw]
Subject: Re: Start messages in message buffer missing (dmesg)

Dear Linux folks,


Am 11.08.20 um 00:13 schrieb Paul Menzel:
> Am 21.07.20 um 17:20 schrieb Paul Menzel:
>
>> On two identical Dell PowerEdge T440 with Linux 5.4.39 and systemd 242
>>
>>      00:00.0 Host bridge [0600]: Intel Corporation Sky Lake-E DMI3 Registers [8086:2020] (rev 07)
>>
>> running `dmesg`, it’s truncated despite the buffer not yet filled.
>>
>>> $ sudo dmidecode -t2
>>> # dmidecode 3.0
>>> Getting SMBIOS data from sysfs.
>>> SMBIOS 3.2 present.
>>> # SMBIOS implementations newer than version 3.0 are not
>>> # fully supported by this version of dmidecode.
>>>
>>> Handle 0x0200, DMI type 2, 8 bytes
>>> Base Board Information
>>>     Manufacturer: Dell Inc.
>>>     Product Name: 02KM69
>>>     Version: A04
>>
>> The message cutoffs differ.
>>
>>> @sang:~$ dmesg | head
>>> [    6.362201] system 00:01: [mem 0xfed12010-0xfed1201f] has been reserved
>>> [    6.369102] system 00:01: [mem 0xfed1b000-0xfed1bfff] has been reserved
>>> [    6.376003] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
>>> [    6.376176] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
>>> [    6.376319] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
>>> [    6.376451] system 00:04: [mem 0xfd000000-0xfdabffff] has been reserved
>>> [    6.383352] system 00:04: [mem 0xfdad0000-0xfdadffff] has been reserved
>>> [    6.390251] system 00:04: [mem 0xfdb00000-0xfdffffff] has been reserved
>>> [    6.397149] system 00:04: [mem 0xfe000000-0xfe00ffff] has been reserved
>>> [    6.404049] system 00:04: [mem 0xfe011000-0xfe01ffff] has been reserved
>>
>>> @mouches:~$ dmesg | head
>>> [    6.623209] pci 0000:00:1c.4: PCI bridge to [bus 02-03]
>>> [    6.628723] pci 0000:00:1c.4:   bridge window [mem 0x92000000-0x928fffff]
>>> [    6.635795] pci 0000:00:1c.4:   bridge window [mem 0x91000000-0x91ffffff 64bit pref]
>>> [    6.644046] pci 0000:04:00.0: BAR 6: assigned [mem 0x90000000-0x9003ffff pref]
>>> [    6.651772] pci 0000:04:00.1: BAR 6: assigned [mem 0x90040000-0x9007ffff pref]
>>> [    6.659494] pci 0000:00:1c.5: PCI bridge to [bus 04]
>>> [    6.664752] pci 0000:00:1c.5:   bridge window [mem 0x90000000-0x900fffff]
>>> [    6.671834] pci 0000:00:1c.5:   bridge window [mem 0x92e00000-0x92efffff 64bit pref]
>>> [    6.680085] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
>>> [    6.686566] pci_bus 0000:00: resource 5 [io  0x1000-0x3fff window]
>>
>>> @sang:~$ head /dev/kmsg
>>> 6,862,6362201,-;system 00:01: [mem 0xfed12010-0xfed1201f] has been reserved
>>>  SUBSYSTEM=pnp
>>>  DEVICE=+pnp:00:01
>>> 6,863,6369102,-;system 00:01: [mem 0xfed1b000-0xfed1bfff] has been reserved
>>>  SUBSYSTEM=pnp
>>>  DEVICE=+pnp:00:01
>>> 7,864,6376003,-;system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
>>>  SUBSYSTEM=pnp
>>>  DEVICE=+pnp:00:01
>>> 7,865,6376176,-;pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
>>
>>> @sang:~$ cp /dev/kmsg /dev/shm/kmsg
>>> ^C
>>> @sang:~$ ls -lh /dev/shm/kmsg
>>> -rw-r----- 1 pmenzel pmenzel 135K Jul 21 13:53 /dev/shm/kmsg
>>
>> So, the size is smaller than 128 KB + 16 * 4 KB = 192 KB. The systemd
>> journal contains some earlier messages.
>
> This seems incorrect. See below.
>
>>> @sang:~$ sudo journalctl -k | head
>>> -- Logs begin at Wed 2020-06-03 14:31:56 CEST, end at Tue 2020-07-21
>>> 13:54:38 CEST. --
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 315392 pages used for memmap
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 20185088 pages, LIFO batch:63
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: Initmem setup node 1 [mem 0x0000001440000000-0x000000203fffffff]
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: On node 1 totalpages: 12582912
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 196608 pages used for memmap
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel:   Normal zone: 12582912 pages, LIFO batch:63
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: PM-Timer IO Port: 0x508
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: Local APIC address 0xfee00000
>>> Jul 16 18:08:59 sang.molgen.mpg.de kernel: ACPI: X2APIC_NMI (uid[0xffffffff] high level lint[0x1])
>>>
>>> @sang:~$ sudo journalctl -k > /dev/shm/journalctl-k.txt
>>> @sang:~$ ls -lh /dev/shm/journalctl-k.txt
>>> -rw-rw---- 1 pmenzel pmenzel 191K Jul 21 13:55 /dev/shm/journalctl-k.txt
>>
>> The systemd journal output starts at the same position on both systems.
>>
>> But now, trying to get the full log over serial console, the full log
>> is now present too on one system, when running `dmesg`. Detaching the
>> serial console, the behavior stays the same, that means all messages
>> are there. I am confused, how this could have fixed anything.
>>
>> Is somebody interested into looking into this, or should a just fix
>> the affected system by plugging in a serial cable too?
>
> It turns out, that the issue with the serial console seems to have been
> an exception or maybe a testing error.
>
> The issue still happens with current Linux master
> v5.8-11935-g8d3e09b43312 and the attached debug patch on top.
>
> But first, from `init/Kconfig`:
>
>>           The increased size means that a new buffer has to be allocated and
>>           the original static one is unused. It makes sense only on systems
>>           with more CPUs. Therefore this value is used only when the sum of
>>           contributions is greater than the half of the default kernel ring
>>           buffer as defined by LOG_BUF_SHIFT. The default values are set
>>           so that more than 64 CPUs are needed to trigger the allocation.
>
> As the ring buffer size defaults to 2^17 bytes = 128 KB, shouldn’t it be
> 64 instead instead of 16 CPUs?

I sent a patch to correct that [1].

> Anyway, with the attached debug patch, I can confirm, that only 128 KB
> are used. That could explain the lost messages. The serial log is attached.

The reason for the different behavior of some of the 16 CPU systems is,
that these are two socket systems, and the Skylake systems also seem to
have a lot more PCI devices.

$ lspci | wc -l
281

> On other systems, like my laptop with just four CPUs, despite the 128 KB
> limit, the size of the stored messages is bigger.
>
>     $ grep BUF_SHIFT /boot/config-5.7.0-2-amd64
>     CONFIG_LOG_BUF_SHIFT=17
>     CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
>     CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
>
> But the size of `/dev/kmsg` is 301393 bytes.

I missesd, that I have `log_buf_len=8M` set on the laptop. (Sorry for
the noise.)

> So I wonder, what knobs decide, when the buffer is wrapped, and messages
> are lost. I also wonder, why the systemd journal does not contain all
> Linux kernel messages.
>
> Lastly, should the default buffer size be increased from 128 KB to 512
> KB as today machines should have enough room?

I am going to send a patch for that.


Kind regards,

Paul


[1]: https://lkml.org/lkml/2020/8/11/110