2014-10-25 18:49:22

by Mathieu Desnoyers

Subject: Progress on system crash traces with LTTng using DAX and pmem

Hi Matthew, Hi Ross,

A quick follow up on my progress on using DAX and pmem with
LTTng. I've been able to successfully gather a user-space
trace into buffers mmap'd into an ext4 filesystem within
a pmem block device mounted with -o dax to bypass the page
cache. After a soft reboot, I'm able to mount the partition
again, and gather the very last data collected in the buffers
by the applications. I created a "lttng-crash" program that
extracts data from those buffers and converts the content
into a readable Common Trace Format trace. So I guess
you have a use-case for your patchsets on commodity hardware
right there. :)
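
A minimal sketch of the buffer setup described above, not the actual
LTTng code: it maps a file as a shared, writable buffer so that stores
land in the file's backing storage. The helper names and the idea of
one fixed-size file per buffer are assumptions made for illustration.
On a filesystem mounted with -o dax the mapping would bypass the page
cache; on any other filesystem the same calls work but go through it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of the file at `path` as a
 * shared read/write buffer. The file is created if missing and sized
 * to `len`; existing contents within `len` are left untouched, which
 * is what lets a later run read back what an earlier run wrote.
 * Returns NULL on failure.
 */
void *map_trace_buffer(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED: stores go to the file, not to a private copy. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return buf == MAP_FAILED ? NULL : buf;
}

/* Hypothetical helper: release a mapping obtained above. */
void unmap_trace_buffer(void *buf, size_t len)
{
    munmap(buf, len);
}
```

A recovery tool run after the reboot would map the same file again and
read the records back instead of writing new ones.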

I've been asked by my customers if DAX would work well with
mtd-ram, which they are using. Do you foresee any roadblock
with this approach?

FYI, the main reason why my customer wants to go with a
"trace into memory that survives soft reboot" approach
rather than to use things like kexec/kdump is that they
care about the amount of time it takes to reboot their
machines. They want a solution where they can extract the
detailed crash data after reboot, after the machine is
back online, rather than requiring a few minutes of offline
time to extract the crash details.

So I guess next year I'll probably be looking into
allocating the LTTng kernel tracer buffers in an mmap'd file
within an ext2/4-DAX-over-pmem/mtd-ram filesystem. It's going
to be exciting! :)

Please keep me in CC on your next patch versions. I'm willing
to spend some more time reviewing them if needed. By the way,
do you guys have a target time-frame/kernel version you aim
at for getting this work upstream?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


2014-10-27 18:48:15

by Matthew Wilcox

Subject: Re: Progress on system crash traces with LTTng using DAX and pmem

On Sat, Oct 25, 2014 at 12:51:25PM +0000, Mathieu Desnoyers wrote:
> A quick follow up on my progress on using DAX and pmem with
> LTTng. I've been able to successfully gather a user-space
> trace into buffers mmap'd into an ext4 filesystem within
> a pmem block device mounted with -o dax to bypass the page
> cache. After a soft reboot, I'm able to mount the partition
> again, and gather the very last data collected in the buffers
> by the applications. I created a "lttng-crash" program that
> extracts data from those buffers and converts the content
> into a readable Common Trace Format trace. So I guess
> you have a use-case for your patchsets on commodity hardware
> right there. :)

Sweet!

> I've been asked by my customers if DAX would work well with
> mtd-ram, which they are using. Do you foresee any roadblock
> with this approach?

Looks like we'd need to add support to mtd-blkdevs.c for DAX. I assume
they're already using one of the block-based ways to expose MTD to
filesystems, rather than jffs2/logfs/ubifs?

I'm thinking we might want to add a flag somewhere in the block_dev / bdi
that indicates whether DAX is supported. Currently we rely on whether
->direct_access is present in the block_device_operations to indicate
that, so we'd have to have two block_device_operations in mtd-blkdevs,
depending on whether direct access is supported by the underlying
MTD device. Not a show-stopper.
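
The two options can be contrasted with a small user-space sketch. All
types and names below are simplified, hypothetical stand-ins (prefixed
mock_) and not the kernel's real definitions; only the idea of testing
for a non-NULL ->direct_access comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an operations table; not the kernel struct. */
struct mock_bdev_ops {
    int (*read)(unsigned long offset, void *dst, size_t len);
    long (*direct_access)(unsigned long offset, void **kaddr); /* NULL if unsupported */
};

/* Hypothetical stand-in for an MTD device. */
struct mock_mtd {
    bool can_map_directly; /* e.g. true for a RAM-backed device */
};

static char mock_ram[4096]; /* pretend backing memory */

static int mock_read(unsigned long offset, void *dst, size_t len)
{
    if (offset > sizeof(mock_ram) || len > sizeof(mock_ram) - offset)
        return -1;
    memcpy(dst, &mock_ram[offset], len);
    return 0;
}

static long mock_direct_access(unsigned long offset, void **kaddr)
{
    if (offset >= sizeof(mock_ram))
        return -1;
    *kaddr = &mock_ram[offset];
    return (long)(sizeof(mock_ram) - offset); /* bytes addressable from here */
}

/* Option A: two tables, chosen per device; capability is implied by the table. */
const struct mock_bdev_ops mock_ops_plain = { .read = mock_read, .direct_access = NULL };
const struct mock_bdev_ops mock_ops_dax   = { .read = mock_read, .direct_access = mock_direct_access };

const struct mock_bdev_ops *mock_select_ops(const struct mock_mtd *mtd)
{
    assert(mtd != NULL);
    return mtd->can_map_directly ? &mock_ops_dax : &mock_ops_plain;
}

bool mock_dax_by_ops(const struct mock_bdev_ops *ops)
{
    return ops != NULL && ops->direct_access != NULL;
}

/* Option B: one table for every device plus an explicit capability flag. */
struct mock_bdev {
    const struct mock_bdev_ops *ops;
    bool dax_capable;
};

bool mock_dax_by_flag(const struct mock_bdev *bdev)
{
    return bdev != NULL && bdev->dax_capable;
}
```

With option B the driver keeps a single table and sets the flag per
device, at the cost of a second piece of state that has to agree with
what ->direct_access can really do.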

> Please keep me in CC on your next patch versions. I'm willing
> to spend some more time reviewing them if needed. By the way,
> do you guys have a target time-frame/kernel version you aim
> at for getting this work upstream?

We're trying to get it upstream ASAP. We've been working on it
publicly since December last year, and it's getting frustrating that
it's not upstream already. I sent a v12 a few minutes before you sent
this message ... I thought git would add you to the cc's since your
Reviewed-by is on some of the patches.

2014-10-28 10:55:19

by Kirill A. Shutemov

Subject: Re: Progress on system crash traces with LTTng using DAX and pmem

On Sat, Oct 25, 2014 at 12:51:25PM +0000, Mathieu Desnoyers wrote:
> FYI, the main reason why my customer wants to go with a
> "trace into memory that survives soft reboot" approach
> rather than to use things like kexec/kdump is that they
> care about the amount of time it takes to reboot their
> machines. They want a solution where they can extract the
> detailed crash data after reboot, after the machine is
> back online, rather than requiring a few minutes of offline
> time to extract the crash details.

IIRC, on x86 there's no guarantee that your memory content will be
preserved over reboot. BIOS is free to mess with it.

--
Kirill A. Shutemov

2014-10-30 15:01:51

by Mathieu Desnoyers

Subject: Re: Progress on system crash traces with LTTng using DAX and pmem

----- Original Message -----
> From: "Matthew Wilcox"
> To: "Mathieu Desnoyers"
> Cc: "Matthew Wilcox", "Ross Zwisler", "lttng-dev"
> Sent: Monday, October 27, 2014 2:48:09 PM
> Subject: Re: Progress on system crash traces with LTTng using DAX and pmem
>
> On Sat, Oct 25, 2014 at 12:51:25PM +0000, Mathieu Desnoyers wrote:
> > A quick follow up on my progress on using DAX and pmem with
> > LTTng. I've been able to successfully gather a user-space
> > trace into buffers mmap'd into an ext4 filesystem within
> > a pmem block device mounted with -o dax to bypass the page
> > cache. After a soft reboot, I'm able to mount the partition
> > again, and gather the very last data collected in the buffers
> > by the applications. I created a "lttng-crash" program that
> > extracts data from those buffers and converts the content
> > into a readable Common Trace Format trace. So I guess
> > you have a use-case for your patchsets on commodity hardware
> > right there. :)
>
> Sweet!
>
> > I've been asked by my customers if DAX would work well with
> > mtd-ram, which they are using. Do you foresee any roadblock
> > with this approach?
>
> Looks like we'd need to add support to mtd-blkdevs.c for DAX. I assume
> they're already using one of the block-based ways to expose MTD to
> filesystems, rather than jffs2/logfs/ubifs?

Yes, from what I understand they interact with a block device. They
are aiming at using ext2 over this block device. I'm adding Hans
Beckerus and Therry Vilmart in CC so they can describe how the mtd
device is used in their setup (which driver exactly, along with
kernel options to set it up if possible).

>
> I'm thinking we might want to add a flag somewhere in the block_dev / bdi
> that indicates whether DAX is supported. Currently we rely on whether
> ->direct_access is present in the block_device_operations to indicate
> that, so we'd have to have two block_device_operations in mtd-blkdevs,
> depending on whether direct access is supported by the underlying
> MTD device. Not a show-stopper.

Great!

>
> > Please keep me in CC on your next patch versions. I'm willing
> > to spend some more time reviewing them if needed. By the way,
> > do you guys have a target time-frame/kernel version you aim
> > at for getting this work upstream?
>
> We're trying to get it upstream ASAP. We've been working on it
> publicly since December last year, and it's getting frustrating that
> it's not upstream already. I sent a v12 a few minutes before you sent
> this message ... I thought git would add you to the cc's since your
> Reviewed-by is on some of the patches.

It appears I have not received the patches. Would it be possible for you
to set up a git tree with those patches? It would be easier for me to
try them out than to fish them from gmane. :-)

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2014-10-30 15:11:44

by Mathieu Desnoyers

Subject: Re: Progress on system crash traces with LTTng using DAX and pmem

----- Original Message -----
> From: "Kirill A. Shutemov"
> To: "Mathieu Desnoyers"
> Cc: "Matthew Wilcox", "Ross Zwisler", "lttng-dev"
> Sent: Tuesday, October 28, 2014 6:54:58 AM
> Subject: Re: Progress on system crash traces with LTTng using DAX and pmem
>
> On Sat, Oct 25, 2014 at 12:51:25PM +0000, Mathieu Desnoyers wrote:
> > FYI, the main reason why my customer wants to go with a
> > "trace into memory that survives soft reboot" approach
> > rather than to use things like kexec/kdump is that they
> > care about the amount of time it takes to reboot their
> > machines. They want a solution where they can extract the
> > detailed crash data after reboot, after the machine is
> > back online, rather than requiring a few minutes of offline
> > time to extract the crash details.
>
> IIRC, on x86 there's no guarantee that your memory content will be
> preserved over reboot. BIOS is free to mess with it.

Hi Kirill,

This is a good point.

There are a few more aspects to consider here:

- Other architectures appear to have different guarantees, for
instance ARM which, AFAIK, does not reset memory on soft
reboot (well at least for my customer's boards). So I guess
if x86 wants to be competitive, it would be good for them to
offer a similar feature,

- Already having a subset of machines supporting this is useful,
e.g. storing trace buffers and recovering them after a crash,

- Since we are in a world of dynamically upgradable BIOS, perhaps
if we can show that there is value in having a BIOS option to
specify a memory range that should not be reset on soft reboot,
BIOS vendors might be inclined to include an option for it,

- Perhaps UEFI BIOS already has some way of specifying that a
memory range should not be reset on soft reboot?

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2014-12-13 11:48:33

by Matt Fleming

Subject: Re: Progress on system crash traces with LTTng using DAX and pmem

On Thu, 30 Oct, at 03:11:36PM, Mathieu Desnoyers wrote:
>
> Hi Kirill,
>
> This is a good point.
>
> There are a few more aspects to consider here:
>
> - Other architectures appear to have different guarantees, for
> instance ARM which, AFAIK, does not reset memory on soft
> reboot (well at least for my customer's boards). So I guess
> if x86 wants to be competitive, it would be good for them to
> offer a similar feature,
>
> - Already having a subset of machines supporting this is useful,
> e.g. storing trace buffers and recovering them after a crash,
>
> - Since we are in a world of dynamically upgradable BIOS, perhaps
> if we can show that there is value in having a BIOS option to
> specify a memory range that should not be reset on soft reboot,
> BIOS vendors might be inclined to include an option for it,
>
> - Perhaps UEFI BIOS already has some way of specifying that a
> memory range should not be reset on soft reboot?

We've achieved this in the past using UEFI capsules with the
EFI_CAPSULE_PERSIST_ACROSS_RESET header flag.

Unfortunately, runtime capsule support is pretty spotty, so it's not a
general solution right now.
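
For reference, a rough sketch of what setting that flag looks like. The
struct and helper names here are illustrative assumptions; the field
order and the flag value follow the capsule header as I understand it
from the UEFI specification, so check them against the spec before
relying on them. Handing the capsule to firmware (the UpdateCapsule
runtime service) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as I understand it from the UEFI spec (assumption: verify). */
#define EFI_CAPSULE_PERSIST_ACROSS_RESET 0x00010000u

/* Illustrative layout of a 128-bit GUID. */
struct efi_guid_sketch {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
};

/* Illustrative capsule header: GUID, header size, flags, total size. */
struct efi_capsule_header_sketch {
    struct efi_guid_sketch guid;
    uint32_t header_size; /* size of this header in bytes */
    uint32_t flags;       /* bit flags, e.g. persist across reset */
    uint32_t image_size;  /* header plus payload, in bytes */
};

_Static_assert(sizeof(struct efi_capsule_header_sketch) == 28,
               "sketch header is expected to be 28 bytes");

/* Hypothetical helper: fill in a header for a payload of payload_size bytes. */
void capsule_header_init(struct efi_capsule_header_sketch *hdr,
                         const struct efi_guid_sketch *guid,
                         uint32_t payload_size, bool persist)
{
    assert(hdr != NULL && guid != NULL);
    hdr->guid = *guid;
    hdr->header_size = (uint32_t)sizeof(*hdr);
    hdr->flags = persist ? EFI_CAPSULE_PERSIST_ACROSS_RESET : 0u;
    hdr->image_size = hdr->header_size + payload_size;
}

/* Hypothetical helper: does this capsule ask to survive a reset? */
bool capsule_persists_across_reset(const struct efi_capsule_header_sketch *hdr)
{
    assert(hdr != NULL);
    return (hdr->flags & EFI_CAPSULE_PERSIST_ACROSS_RESET) != 0;
}
```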

--
Matt Fleming, Intel Open Source Technology Center