2019-08-15 11:49:52

by Paul Menzel

Subject: Broken/incomplete `/proc/vmcore`

Dear Linux folks,


With Linux 4.19.57 (configuration attached), after crashing the system and
booting the same kernel as the crash kernel, the resulting
`/proc/vmcore` seems to be incomplete.

GDB commands that work with `/proc/kcore` do not work with
`/proc/vmcore`; the addresses they follow are not there.

In the running system, iterating through the tasks works with the
following macros (kept in `gdb-macros.txt`):

```
macro define offsetof(type, member) ((size_t)(&((type *)0)->member))
macro define container_of(ptr,type,member) ((type *)((size_t)ptr-offsetof(type,member)))
```

### /proc/kcore ###

```
Core was generated by `BOOT_IMAGE=/boot/bzImage-4.19.57.mx64.286 root=LABEL=root ro crashkernel=512M c'.
#0 0x0000000000000000 in irq_stack_union ()
(gdb) source gdb-macros.txt
(gdb) set $t=&init_task
(gdb) print $t->tasks
$1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
(gdb) print $t->pid
$2 = 0
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) print $t->tasks
$3 = {next = 0xffff889ffbb0e340, prev = 0xffffffff82411a80 <init_task+768>}
(gdb) print $t->pid
$4 = 1
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) print $t->tasks
$5 = {next = 0xffff889ffbb530c0, prev = 0xffff889ffbb0f080}
(gdb) print $t->pid
$6 = 2
```
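For readers who want to check the pointer arithmetic outside GDB, here is a minimal Python sketch of what `container_of` does in the session above. The offset 768 is read off the `<init_task+768>` annotation and is specific to this build; the helper name and the 64-bit mask are only for illustration.

```python
# Offset of `tasks` inside `struct task_struct`, taken from the
# `<init_task+768>` annotation in the GDB output (build-specific assumption).
TASKS_OFFSET = 768

MASK64 = 0xFFFFFFFFFFFFFFFF


def container_of(member_addr: int, member_offset: int = TASKS_OFFSET) -> int:
    """Return the address of the enclosing struct, given a member's address."""
    return (member_addr - member_offset) & MASK64


# `prev` of the pid 1 task points at init_task.tasks:
print(hex(container_of(0xffffffff82411a80)))
# `next` of init_task.tasks points at the `tasks` member of the pid 1 task:
print(hex(container_of(0xffff889ffbb0f080)))
```

The list pointers refer to the embedded `tasks` member, not to the start of the `task_struct`, which is why the subtraction is needed before `->pid` can be read.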

### /proc/vmcore ###

After triggering a crash via SysRq, the values read from `/proc/vmcore` are incorrect.

```
(gdb) set $t=&init_task
(gdb) print $t->tasks
$1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
(gdb) print $t->pid
$2 = 0
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) print $t->tasks
$3 = {next = 0x0 <irq_stack_union>, prev = 0x0 <irq_stack_union>}
(gdb) print $t->pid
$4 = 0
```

We can reproduce this in a virtual machine and on a big server.


Kind regards,

Paul


Attachments:
config-4.19.57.mx64.286 (126.22 kB)

2019-08-19 19:24:07

by Bhupesh Sharma

Subject: Re: Broken/incomplete `/proc/vmcore`

Hi Paul,

On Mon, Aug 19, 2019 at 1:59 PM Paul Menzel <[email protected]> wrote:
>
> Dear Linux folks,
>
>
> Using Linux 4.19.57 (configuration attached), crashing the system, and
> starting it using the same Linux kernel as crash kernel, the available
> `/proc/vmcore` seems to be incomplete.
>
> Running GDB commands, working with `/proc/kcore`, do not work with
> `/proc/vmcore`, and the addresses are not there.
>
> In the running system, iterating through the tasks works.
>
> ```
> macro define offsetof(type, member) ((size_t)(&((type *)0)->member))
> macro define container_of(ptr,type,member) ((type *)((size_t)ptr-offsetof(type,member)))
> ```
>
> ### /proc/kcore ###
>
> ```
> Core was generated by `BOOT_IMAGE=/boot/bzImage-4.19.57.mx64.286 root=LABEL=root ro crashkernel=512M c'.
> #0 0x0000000000000000 in irq_stack_union ()
> (gdb) source gdb-macros.txt
> (gdb) set $t=&init_task
> (gdb) print $t->tasks
> $1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
> (gdb) print $t->pid
> $2 = 0
> (gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
> (gdb) print $t->tasks
> $3 = {next = 0xffff889ffbb0e340, prev = 0xffffffff82411a80 <init_task+768>}
> (gdb) print $t->pid
> $4 = 1
> (gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
> (gdb) print $t->tasks
> $5 = {next = 0xffff889ffbb530c0, prev = 0xffff889ffbb0f080}
> (gdb) print $t->pid
> $6 = 2
> ```
>
> ### /proc/vmcore ###
>
> After the crash by SysRQ trigger, values in `/proc/vmcore` are incorrect.
>
> ```
> (gdb) set $t=&init_task
> (gdb) print $t->tasks
> $1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
> (gdb) print $t->pid
> $2 = 0
> (gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
> (gdb) print $t->tasks
> $3 = {next = 0x0 <irq_stack_union>, prev = 0x0 <irq_stack_union>}
> (gdb) print $t->pid
> $4 = 0
> ```
>
> We can reproduce this in a virtual machine and on a big server.

Looking at the attached config file, it seems the underlying arch is
x86_64, but a few things are missing from your email that would help
in suggesting solutions:

1. Can you please share the bootargs provided to the kdump kernel?
2. Please share the kexec-tools version that you are using:
   $ kexec --version
3. Do you notice any specific warning/error messages on the console
   when the second (kdump) kernel executes? Better still, share a
   snippet of the second kernel's console messages; it will further
   help in suggesting debug points for this issue.

Thanks,
Bhupesh

2019-08-22 17:09:29

by Donald Buczek

Subject: Re: Broken/incomplete `/proc/vmcore`

Dear Paul,

On 8/15/19 1:36 PM, Paul Menzel wrote:
> Dear Linux folks,
>
>
> Using Linux 4.19.57 (configuration attached), crashing the system, and
> starting it using the same Linux kernel as crash kernel, the available
> `/proc/vmcore` seems to be incomplete.
>
> Running GDB commands, working with `/proc/kcore`, do not work with
> `/proc/vmcore`, and the addresses are not there.
>
> In the running system, iterating through the tasks works.
>
> ```
> macro define offsetof(type, member) ((size_t)(&((type *)0)->member))
> macro define container_of(ptr,type,member) ((type *)((size_t)ptr-offsetof(type,member)))
> ```
>
> ### /proc/kcore ###
>
> ```
> Core was generated by `BOOT_IMAGE=/boot/bzImage-4.19.57.mx64.286 root=LABEL=root ro crashkernel=512M c'.
> #0 0x0000000000000000 in irq_stack_union ()
> (gdb) source gdb-macros.txt
> (gdb) set $t=&init_task
> (gdb) print $t->tasks
> $1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
> (gdb) print $t->pid
> $2 = 0
> (gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
> (gdb) print $t->tasks
> $3 = {next = 0xffff889ffbb0e340, prev = 0xffffffff82411a80 <init_task+768>}
> (gdb) print $t->pid
> $4 = 1
> (gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
> (gdb) print $t->tasks
> $5 = {next = 0xffff889ffbb530c0, prev = 0xffff889ffbb0f080}
> (gdb) print $t->pid
> $6 = 2
> ```
>
> ### /proc/vmcore ###
>
> After the crash by SysRQ trigger, values in `/proc/vmcore` are incorrect.
>
> ```
> (gdb) set $t=&init_task
> (gdb) print $t->tasks
> $1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
> (gdb) print $t->pid
> $2 = 0
> (gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
> (gdb) print $t->tasks
> $3 = {next = 0x0 <irq_stack_union>, prev = 0x0 <irq_stack_union>}
> (gdb) print $t->pid
> $4 = 0
> ```
>
> We can reproduce this in a virtual machine and on a big server.

It is the same bug as the one described in my mail "/proc/vmcore and wrong PAGE_OFFSET". The task list can be walked if 0x0000008000000000 is subtracted from each address taken from the list:

(gdb) set $t=&init_task
(gdb) print $t->pid
$1 = 0
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) set $t=(struct task_struct *)( (char *)$t - 0x0000008000000000)
(gdb) print $t->pid
$2 = 1
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) set $t=(struct task_struct *)( (char *)$t - 0x0000008000000000)
(gdb) print $t->pid
$3 = 2

The debugger has wrongly mapped the physical memory at virtual 0xffff880000000000 instead of 0xffff888000000000, because the vmcore file says so, for as yet unknown reasons.
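
As a quick sanity check of the correction value, here is a small Python sketch of the arithmetic. It is not part of the debugging session; the constant and helper names are made up for illustration, and the helper is only meant for pointers into the direct mapping (symbol addresses such as `&init_task` were used uncorrected above).

```python
# Base at which the kernel's own pointers assume physical memory is mapped.
KERNEL_DIRECT_MAP_BASE = 0xffff888000000000
# Base at which the debugger maps it when following the vmcore file.
VMCORE_DIRECT_MAP_BASE = 0xffff880000000000

# Difference between the two mappings; this is the correction used above.
DELTA = KERNEL_DIRECT_MAP_BASE - VMCORE_DIRECT_MAP_BASE


def to_vmcore_address(kernel_ptr: int) -> int:
    """Translate a direct-map pointer stored in kernel data to where the vmcore maps it."""
    return kernel_ptr - DELTA


print(hex(DELTA))
# init_task.tasks.next from the session, translated for use with the vmcore:
print(hex(to_vmcore_address(0xffff889ffbb0f080)))
```

The difference works out to 2^39 bytes (512 GiB), which matches the value subtracted in the GDB commands.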

Donald



--
Donald Buczek
[email protected]
Tel: +49 30 8413 1433