LinuxLists.cc - Using pmem from a driver exposing a memory mapping (mmap) to userspace

2015-04-28 15:35:24

Subject: Using pmem from a driver exposing a memory mapping (mmap) to userspace

Hi!

I'm currently adapting lttng-modules to use DAX and pmem.
It will allow LTTng buffers to be recovered after a kernel
crash. I've moved pretty much all struct page pointers to
page frame numbers, as I remember being told that pmem does
not have struct page.

Now I'm looking into adapting my mmap and page fault handler
implementation (based on struct page) to a page-frame number
based implementation when the ring buffer is backed by
persistent memory, which will probably not require any page
fault handler at all when backed by pmem+dax memory.

My current work is in this branch: https://github.com/compudj/lttng-modules-dev/tree/persistent-memory-buffers
(see last commits)

LTTng-modules supports both mmap() and splice(), but I plan
to only provide mmap() support for persistent memory, since
splice() really requires struct page.

Are there existing driver mmap implementations doing similar
things, or do you have recommendations on how to implement
this?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2015-04-29 07:06:59

by Boaz Harrosh

Subject: Re: Using pmem from a driver exposing a memory mapping (mmap) to userspace

On 04/28/2015 06:35 PM, Mathieu Desnoyers wrote:
> Hi!
>
> I'm currently adapting lttng-modules to use DAX and pmem.
> It will allow LTTng buffers to be recovered after a kernel
> crash. I've moved pretty much all struct page pointers to
> page frame numbers, as I remember being told that pmem does
> not have struct page.
>
> Now I'm looking into adapting my mmap and page fault handler
> implementation (based on struct page) to a page-frame number
> based implementation when the ring buffer is backed by
> persistent memory, which will probably not require any page
> fault handler at all when backed by pmem+dax memory.

There will be page faults, at least once for every combination
of application+page. Granted, there may be only one per
application+page until the application closes the file.

Your job can be simple if you use the pmem's inode. You know
how each block device is a mini file system with a single file?
Use bdev->bd_inode to get to the one inode associated with
your pmem bdev. This inode is IS_DAX(), so if you supply
your own get_block() function to the DAX handlers, you need
not duplicate any mmap code at all.

(You can also use the same DAX infrastructure for the read/write_iter
implementation)

>
> My current work is in this branch: https://github.com/compudj/lttng-modules-dev/tree/persistent-memory-buffers
> (see last commits)
>
> LTTng-modules supports both mmap() and splice(), but I plan
> to only provide mmap() support for persistent memory, since
> splice() really requires struct page.
>

No, splice works just fine. In fact, a NULL .splice_XXX vector
will use default_file_splice_read/write, which does a
copy and uses your regular read/write_iter vectors. So
leave the .splice vectors NULL and splice will be supported
through your read/write_iter interface.

> Are there existing driver mmap implementations doing similar
> things, or do you have recommendations on how to implement
> this?
>

The DAX library (fs/dax.c) does all that you need. You only need
your own translation from your device files to a chunk of pmem.

It's how I'd do it. Good luck! CC me on the patches and I'll
review them.

Cheers
Boaz

> Thanks,
> Mathieu

2015-04-30 00:25:45

by Mathieu Desnoyers

Subject: Re: Using pmem from a driver exposing a memory mapping (mmap) to userspace

----- Original Message -----
> On 04/28/2015 06:35 PM, Mathieu Desnoyers wrote:
> > Hi!
> >
> > I'm currently adapting lttng-modules to use DAX and pmem.
> > It will allow LTTng buffers to be recovered after a kernel
> > crash. I've moved pretty much all struct page pointers to
> > page frame numbers, as I remember being told that pmem does
> > not have struct page.
> >
> > Now I'm looking into adapting my mmap and page fault handler
> > implementation (based on struct page) to a page-frame number
> > based implementation when the ring buffer is backed by
> > persistent memory, which will probably not require any page
> > fault handler at all when backed by pmem+dax memory.
>
> There will be page faults, at least once for every combination
> of application+page. Granted, there may be only one per
> application+page until the application closes the file.
>
> Your job can be simple if you use the pmem's inode. You know
> how each block device is a mini file system with a single file?
> Use bdev->bd_inode to get to the one inode associated with
> your pmem bdev. This inode is IS_DAX(), so if you supply
> your own get_block() function to the DAX handlers, you need
> not duplicate any mmap code at all.
>
> (You can also use the same DAX infrastructure for the read/write_iter
> implementation)
>
> >
> > My current work is in this branch:
> > https://github.com/compudj/lttng-modules-dev/tree/persistent-memory-buffers
> > (see last commits)
> >
> > LTTng-modules supports both mmap() and splice(), but I plan
> > to only provide mmap() support for persistent memory, since
> > splice() really requires struct page.
> >
>
> No, splice works just fine. In fact, a NULL .splice_XXX vector
> will use default_file_splice_read/write, which does a
> copy and uses your regular read/write_iter vectors. So
> leave the .splice vectors NULL and splice will be supported
> through your read/write_iter interface.
>
> > Are there existing driver mmap implementations doing similar
> > things, or do you have recommendations on how to implement
> > this?
> >
>
> The DAX library (fs/dax.c) does all that you need. You only need
> your own translation from your device files to a chunk of pmem.
>
> It's how I'd do it. Good luck! CC me on the patches and I'll
> review them.

Now that I think about it a bit more, the simplest solution
would probably be to open() a file within a DAX-enabled
filesystem from our userspace daemon (lttng-consumerd)
for each buffer. Then, I could pass each file descriptor
to the kernel through a lttng-specific ioctl(), and let
the kernel use that file as a ring buffer. lttng-consumerd
could then simply mmap() that file and use it if it wants
to consume the data while it's being produced (optional).

And this file stays there after a crash/reboot, so we
can extract the buffers with a separate tool.
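
A minimal userspace sketch of this flow, written in Python for brevity
(the real consumer daemon is written in C). Everything named here is an
assumption made for illustration: BUF_SIZE, the 8-byte "bytes used"
header, and the function names are hypothetical, and hand_fd_to_tracer()
is only a placeholder for the lttng-specific ioctl(), which is not
defined in this thread. The Python process also stands in for the kernel
writer, and an ordinary file stands in for one on a DAX-enabled
filesystem. The sketch appends linearly and does not wrap around, so it
is not a real ring buffer.

```python
import mmap
import os
import struct
import tempfile

BUF_SIZE = 4 * mmap.PAGESIZE   # hypothetical size of one buffer file
HEADER = struct.Struct("<Q")   # hypothetical header: payload bytes written so far


def create_buffer_file(path, size=BUF_SIZE):
    """Create and size the file that would back one buffer; return its fd."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, size)     # new bytes read back as zero, so the header starts at 0
    return fd


def hand_fd_to_tracer(fd):
    """Placeholder for the lttng-specific ioctl(); it does nothing here.

    A real daemon would pass fd to the kernel tracer at this point.
    """
    return fd


def produce(fd, payload):
    """Stand-in for the kernel writer: append payload through a shared mapping."""
    with mmap.mmap(fd, 0, mmap.MAP_SHARED,
                   mmap.PROT_READ | mmap.PROT_WRITE) as m:
        (used,) = HEADER.unpack_from(m, 0)
        start = HEADER.size + used
        end = start + len(payload)
        if end > len(m):
            raise ValueError("buffer full (this sketch does not wrap around)")
        m[start:end] = payload
        HEADER.pack_into(m, 0, used + len(payload))
        m.flush()              # msync(): push dirty pages to the backing file


def recover(path):
    """Separate 'extraction tool': read the payload back from the file alone."""
    with open(path, "rb") as f:
        data = f.read()
    (used,) = HEADER.unpack_from(data, 0)
    return data[HEADER.size:HEADER.size + used]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "channel0_cpu0.buf")
    fd = hand_fd_to_tracer(create_buffer_file(path))
    produce(fd, b"event-1;")
    os.close(fd)
    print(recover(path))
```

The point of the design is that recover() needs nothing but the file
itself, which is what lets a separate tool extract the buffers after a
reboot.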

Thoughts?

Thanks!

Mathieu

>
> Cheers
> Boaz
>
> > Thanks,
> > Mathieu
>
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com