Date: Thu, 30 Apr 2015 00:25:45 +0000 (UTC)
From: Mathieu Desnoyers
To: Boaz Harrosh
Cc: Matthew Wilcox, LKML, Christoph Hellwig, Ross Zwisler
Message-ID: <1089567325.41151.1430353545349.JavaMail.zimbra@efficios.com>
In-Reply-To: <5540830D.7030006@plexistor.com>
Subject: Re: Using pmem from a driver exposing a memory mapping (mmap) to userspace

----- Original Message -----
> On 04/28/2015 06:35 PM, Mathieu Desnoyers wrote:
> > Hi!
> >
> > I'm currently adapting lttng-modules to use DAX and pmem.
> > It will allow LTTng buffers to be recovered after a kernel
> > crash. I've moved pretty much all struct page pointers to
> > page frame numbers, as I remember being told that pmem does
> > not have struct page.
> >
> > Now I'm looking into adapting my mmap and page fault handler
> > implementation (based on struct page) to a page-frame-number-based
> > implementation when the ring buffer is backed by
> > persistent memory, which will probably not require any page
> > fault handler at all when backed by pmem+dax memory.
>
> There will be page faults at least once for every combination
> of application+page. Sure, there may only be one per a+p
> until the application does a close on the file.
>
> Your job can be simple if you use the pmem's inode. You know
> how each block device is a mini file system with a single file.
> Use bdev->bd_inode to get to the one inode associated with
> your pmem bdev. Well, this inode is IS_DAX(), so if you supply
> your own get_block() function to the DAX handlers you need
> not duplicate any mmap code at all.
>
> (You can also use the same DAX infrastructure for the read/write_iter
> implementation)
>
> >
> > My current work is in this branch:
> > https://github.com/compudj/lttng-modules-dev/tree/persistent-memory-buffers
> > (see last commits)
> >
> > LTTng-modules supports both mmap() and splice(), but I plan
> > to only provide mmap() support for persistent memory, since
> > splice() really requires struct page.
> >
>
> No, splice just works fine. In fact a NULL .splice_XXX vector
> will use the default_file_splice_read/write which does a
> copy and uses your regular read/write_iter vectors. So
> leave the .splice NULL and it will be supported by your
> read/write_iter interface.
>
> > Are there existing driver mmap implementations doing similar
> > things, or do you have recommendations on how to implement
> > this ?
> >
>
> DAX.c lib does all that you need. You only need your own
> translation from your device files to a chunk of pmem.
>
> It's how I'd do it, good luck. CC me on the patches; I'll
> review them.

Now that I think about it a bit more, the simplest solution would
probably be to open() a file within a DAX-enabled filesystem from
our userspace daemon (lttng-consumerd) for each buffer. Then, I
could pass each file descriptor to the kernel through an
lttng-specific ioctl(), and let the kernel use that file as a
ring buffer.
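A minimal userspace sketch of the daemon side of this proposal, in C, under stated assumptions: the buffer file lives on a DAX-capable filesystem (for example ext4 mounted with -o dax), and map_buffer_file(), read_buffer_file() and the LTTNG_HYPOTHETICAL_SET_BUFFER_FD ioctl mentioned in a comment are hypothetical names, not part of lttng-modules or lttng-tools. The ioctl hand-off is only described, never called. On an ordinary filesystem the same code works through the page cache, which is enough to try out the logic.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or reopen) the file backing one trace
 * buffer, reserve its blocks, and map it MAP_SHARED so the daemon sees
 * whatever is stored into the file. Returns the mapping or NULL on
 * failure; the open file descriptor is stored in *fd_out.
 */
static void *map_buffer_file(const char *path, size_t size, int *fd_out)
{
	assert(fd_out != NULL);

	int fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return NULL;

	/* Reserve blocks now so later page faults on the mapping need not
	 * allocate them. posix_fallocate() returns an error number instead
	 * of setting errno. */
	int err = posix_fallocate(fd, 0, (off_t)size);
	if (err != 0) {
		close(fd);
		errno = err;
		return NULL;
	}

	void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return NULL;
	}

	/*
	 * Hypothetical hand-off to the tracer (not an existing LTTng ABI):
	 *   ioctl(lttng_channel_fd, LTTNG_HYPOTHETICAL_SET_BUFFER_FD, fd);
	 * The kernel would then produce into the same file this daemon maps.
	 */
	*fd_out = fd;
	return map;
}

/*
 * Hypothetical helper for a separate extraction tool run after a crash
 * and reboot: the buffer is an ordinary file, so pread() recovers it.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_buffer_file(const char *path, void *dst, size_t len)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ssize_t n = pread(fd, dst, len, 0);
	close(fd);
	return n;
}
```

Keeping each buffer in a regular file means both the optional live consumer (mmap()) and a post-crash extraction tool need nothing beyond standard file APIs; whether the kernel can safely write into such a file from tracing context is a separate question this sketch does not address.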
lttng-consumerd could then simply mmap() that file and use it if it
wants to consume the data while it's being produced (optional). And
this file stays there after a crash/reboot, so we can extract the
buffers with a separate tool.

Thoughts ?

Thanks!

Mathieu

>
> Cheers
> Boaz
>
> > Thanks,
> >
> > Mathieu
> >

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com