2021-12-29 14:44:14

by Guilherme G. Piccoli

[permalink] [raw]
Subject: pstore/ramoops - why only collect a partial dmesg?

Hi Anton / Colin / Kees / Tony, I'd like to understand the rationale
behind a ramoops behavior, appreciate in advance any information/advice!

I've noticed that while using ramoops as a backend for pstore, only the
first "record_size" bytes of dmesg is collected/saved in sysfs on panic.
It is the "Part 1" of dmesg - seems this is on purpose [0], so I'm
curious on why can't we save the full dmesg split in multi-part files,
like efi-pstore for example?

If that's an interesting idea, I'm willing to try implementing that in
case there are no available patches for it already (maybe somebody
worked on it for their own usage). My idea would be to have a tuning to
enable or disable such new behavior, and we could have files like
"dmesg-ramoops-0.partX" as the partitions of the full "dmesg-ramoops-0".

Thanks in advance,


Guilherme


[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pstore/ram.c#n353


2022-01-03 23:31:36

by Luck, Tony

[permalink] [raw]
Subject: RE: pstore/ramoops - why only collect a partial dmesg?

> Hi Anton / Colin / Kees / Tony, I'd like to understand the rationale
> behind a ramoops behavior, appreciate in advance any information/advice!
>
> I've noticed that while using ramoops as a backend for pstore, only the
> first "record_size" bytes of dmesg is collected/saved in sysfs on panic.
> It is the "Part 1" of dmesg - seems this is on purpose [0], so I'm
> curious on why can't we save the full dmesg split in multi-part files,
> like efi-pstore for example?
>
> If that's an interesting idea, I'm willing to try implementing that in
> case there are no available patches for it already (maybe somebody
> worked on it for their own usage). My idea would be to have a tuning to
> enable or disable such new behavior, and we could have files like
> "dmesg-ramoops-0.partX" as the partitions of the full "dmesg-ramoops-0".

Guilherme,

The efi (and erst) backends for pstore have severe limitations on the size
of objects that can store (just a few Kbytes) so pstore breaks the dmesg
data into pieces.

I'm not super-familiar with how ramoops behaves, but maybe it allows setting
a much larger "record_size" ... so this split isn't needed?

-Tony

2022-01-04 12:18:23

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: pstore/ramoops - why only collect a partial dmesg?

On 03/01/2022 20:31, Luck, Tony wrote:
> Guilherme,
>
> The efi (and erst) backends for pstore have severe limitations on the size
> of objects that can store (just a few Kbytes) so pstore breaks the dmesg
> data into pieces.
>
> I'm not super-familiar with how ramoops behaves, but maybe it allows setting
> a much larger "record_size" ... so this split isn't needed?
>
> -Tony
>

Hi Tony, thanks a lot for your response! It makes sense indeed, but in
my case, for example, I have a "log_buf_len=4M", but cannot allocate a
4M record_size - when I try that, I can only see page_alloc spews and
pstore/ramoops doesn't work. So, I could allocate 2M and that works
fine, but I then lose half of my dmesg heh
Hence my question.

If there's no special reason, I guess would make sense to allow ramoops
to split the dmesg, what do you think?
Cheers,


Guilherme

2022-01-04 17:01:01

by Luck, Tony

[permalink] [raw]
Subject: RE: pstore/ramoops - why only collect a partial dmesg?

> Hi Tony, thanks a lot for your response! It makes sense indeed, but in
> my case, for example, I have a "log_buf_len=4M", but cannot allocate a
> 4M record_size - when I try that, I can only see page_alloc spews and
> pstore/ramoops doesn't work. So, I could allocate 2M and that works
> fine, but I then lose half of my dmesg heh
> Hence my question.
>
> If there's no special reason, I guess would make sense to allow ramoops
> to split the dmesg, what do you think?

Guilherme,

Linux is indeed somewhat reluctant to hand out allocations > 2MB. :-(

Do you really need the whole dmesg in the pstore dump? The expectation
is that systems run normally for a while. During that time console logs are
saved off to /var/log/messages.

When the system crashes, the last part (the interesting bit!) of the console
log is lost. The purpose of pstore is to save that last bit.

So while you could add code to ramoops to save multiple 2MB chunks, it
doesn't seem (to me) that it would add much value.

-Tony

2022-01-04 18:04:17

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: pstore/ramoops - why only collect a partial dmesg?

On 04/01/2022 14:00, Luck, Tony wrote:
> [...]
> Guilherme,
>
> Linux is indeed somewhat reluctant to hand out allocations > 2MB. :-(
>
> Do you really need the whole dmesg in the pstore dump? The expectation
> is that systems run normally for a while. During that time console logs are
> saved off to /var/log/messages.
>
> When the system crashes, the last part (the interesting bit!) of the console
> log is lost. The purpose of pstore is to save that last bit.
>
> So while you could add code to ramoops to save multiple 2MB chunks, it
> doesn't seem (to me) that it would add much value.
>

Thanks again Tony, for the interesting points. So, I partially agree
with you: indeed, in a normal situation we have all messages collected
by some userspace daemon, and when some issue/oops happens, we can rely
on pstore to collect the latest portion of the log buffer (2M is a
bunch!) and "merge" that with the previously collected portion, likely
saved in a /var/log/ file.

The problem is that our use case is a bit different: the idea is to rely
on pstore/ramoops to collect the most information we can in a panic
event, without the need of kdump. The latter is a pretty
comprehensive/complete approach, but requires a bunch of memory reserved
- it's a bit too much if we want just the task list, backtraces and
memory state of the system, for example. And for that...we have the
"panic_print" setting!

There lies the issue: if I set panic_print to dump all backtraces, task
info and memory state in a panic event, that information + the
panic/oops and some previous relevant stuff, does it all fit in the 2M
chunk? Likely so, but *if it doesn't fit*, we may lose _exactly_ the
most important piece, which is the panic cause.

The same way I have the "log_buf_len" tuning to determine how much size
my log buffer has, I'd like to be able to effectively collect that much
information using pstore/ramoops. Requiring that amount of space in an
efi-pstore, for example, would be indeed really crazy! But ramoops is
just a way for using some portion of the system RAM to save the log
buffer, so I feel it'd be interesting to be able to properly collect
full logs there, no matter the size of the logs. Of course, I'd like to
see that as a setting, because the current behavior is great/enough for
most of users I guess, as you pointed, and there's no need to change it
by default.

Let me know your thoughts and maybe others also have good opinions about
that!
Cheers,


Guilherme

2022-01-04 18:46:06

by Luck, Tony

[permalink] [raw]
Subject: RE: pstore/ramoops - why only collect a partial dmesg?

> There lies the issue: if I set panic_print to dump all backtraces, task
> info and memory state in a panic event, that information + the
> panic/oops and some previous relevant stuff, does it all fit in the 2M
> chunk? Likely so, but *if it doesn't fit*, we may lose _exactly_ the
> most important piece, which is the panic cause.

That does change things ... I wonder how many megabytes you need
for a big system (hundreds of cores, thousands of tasks)!

This use case does look like it could use multiple chunks in ramoops.

-Tony

2022-01-04 19:37:09

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: pstore/ramoops - why only collect a partial dmesg?

On 04/01/2022 15:46, Luck, Tony wrote:
> That does change things ... I wonder how many megabytes you need
> for a big system (hundreds of cores, thousands of tasks)!

Heheh indeed, this case would require a very big log buffer I guess! But
our setup is not so big, only 4/8 CPUs, not so much RAM and not that
many tasks expected, opposed to a big server with maybe multiple VMs,
containers, etc...

>
> This use case does look like it could use multiple chunks in ramoops.

Cool, thanks! If nobody complains or show any reason in that ramoops
shouldn't be changed to deal with multi-chunk dmesg, I'll try to come up
with something then.

Cheers,


Guilherme

2023-06-28 18:51:11

by Yuxiao Zhang

[permalink] [raw]
Subject: Re: [PATCH] pstore: ramoops: support pmsg size larger than kmalloc limitation

Thanks for the review Kees.

>And yes, Greg's questions are all good -- fixing syntax and adding size
details in the commit log would be appreciated.

Sure, will send another patch to address this.

Hi Guilherme, thanks for testing this patch.

>But when I tried to increase the record size in ramoops, I got this error: https://termbin.com/b12e

What record type are you testing? I should have mention that this patch is only for pmsg use case on ramoops (I mentioned it in the original thread but need to start a new one due to format issue).

>Also - Kees certainly knows that way better, but - are we 100% sure that
the region for pstore can be non-contiguous? For some reason, I always
thought this was a requirement

Right, that is why this patch only touches the pstore ram path, I am not sure how things will go if it is block device backed.

Thanks,
-Yuxiao Zhang

2023-06-28 19:09:49

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH] pstore: ramoops: support pmsg size larger than kmalloc limitation

On 28/06/2023 20:48, Yuxiao Zhang wrote:
> [...]
> What record type are you testing? I should have mention that this patch is only for pmsg use case on ramoops (I mentioned it in the original thread but need to start a new one due to format issue).

Oh, my bad! I'm collecting oops log, so the change is not really related
to my case. It is in the patch title, so my fault not reading that
properly heheh


>
> Right, that is why this patch only touches the pstore ram path, I am not sure how things will go if it is block device backed.

It makes sense!
Thanks,


Guilherme