LinuxLists.cc - Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words

2012-10-09 08:32:20

Subject: Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words

On Fri, Sep 28, 2012 at 11:16:30PM -0400, John Stultz wrote:
> fd based interfaces vs madvise:
> In talking with Taras Glek, he pointed out that for his
> needs, the fd based interface is a little annoying, as it
> requires having to get access to tmpfs file and mmap it in,
> then instead of just referencing a pointer to the data he
> wants to mark volatile, he has to calculate the offset from
> start of the mmap and pass those file offsets to the interface.
> Instead he mentioned that using something like madvise would be
> much nicer, since they could just pass a pointer to the object
> in memory they want to make volatile and avoid the extra work.
>
> I'm not opposed to adding an madvise interface for this as
> well, but since we have an existing use case with Android's
> ashmem, I want to make sure we support this existing behavior.
> Specifically, as with ashmem, applications can be sharing
> these tmpfs fds, and so file-relative volatile ranges make
> more sense if you need to coordinate what data is volatile
> between two applications.
>
> Also, while I agree that having an madvise interface for
> volatile ranges would be nice, it does open up some more
> complex implementation issues, since with files, there is a
> fixed relationship between pages and the files' address_space
> mapping, where you can't have pages shared between different
> mappings. This makes it easy to hang the volatile-range tree
> off of the mapping (well, indirectly via a hash table). With
> general anonymous memory, pages can be shared between multiple
> processes, and as far as I understand, don't have any grouping
> structure we could use to determine if the page is in a
> volatile range or not. We would also need to determine more
> complex questions like: What are the semantics of volatility
> with copy-on-write pages? I'm hoping to investigate this
> idea more deeply soon so I can be sure whatever is pushed has
> a clear plan of how to address this idea. Further thoughts
> here would be appreciated.

Note it doesn't have to be a vs. situation. madvise could be an
additional way to interface with volatile ranges on a given fd.

That is, madvise doesn't have to mean anonymous memory. As a matter of
fact, MADV_WILLNEED/MADV_DONTNEED are usually used on mmaped files.
Similarly, there could be a way to use madvise to mark volatile ranges,
without the application having to track what memory ranges are
associated to what part of what file, which the kernel already tracks.

Mike
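
For illustration, here is a minimal sketch of the point about madvise() on
mmapped files, using only the existing MADV_WILLNEED and MADV_DONTNEED hints.
The helper name advise_mapped_file() is made up for this example. No
volatile-range advice value is used, since none exists in mainline.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first len bytes of fd and pass hints for that range using only
 * the mapped address. The kernel resolves which file pages back the range.
 * Returns 0 on success, -1 on failure.
 */
int advise_mapped_file(int fd, size_t len)
{
    assert(len > 0);

    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Hint that the range will be accessed soon. */
    int rc = madvise(base, len, MADV_WILLNEED);

    /* Hint that the range is not needed for now. */
    if (rc == 0)
        rc = madvise(base, len, MADV_DONTNEED);

    munmap(base, len);
    return rc;
}
```

The caller never computes a file offset here; the address and length alone
identify the range.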

2012-10-09 21:31:37

by John Stultz


Subject: Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words

On 10/09/2012 01:07 AM, Mike Hommey wrote:
> Note it doesn't have to be a vs. situation. madvise could be an
> additional way to interface with volatile ranges on a given fd.
>
> That is, madvise doesn't have to mean anonymous memory. As a matter of
> fact, MADV_WILLNEED/MADV_DONTNEED are usually used on mmaped files.
> Similarly, there could be a way to use madvise to mark volatile ranges,
> without the application having to track what memory ranges are
> associated to what part of what file, which the kernel already tracks.

Good point. We could add an madvise() interface, but limit it to
mmapped tmpfs files, in parallel with the fallocate() interface.

However, I would like to think through how MADV_MARK_VOLATILE could
work with purely anonymous memory before starting down that path. There
is also Neil's point: an identical kernel interface restricted to tmpfs,
offered only as a convenience to userland for converting between virtual
addresses and mmapped file offsets, may be better left to a userland
library.

thanks
-john
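
As a rough sketch of what such a userland library helper might look like,
the code below translates a pointer inside a known mapping into the file
offset that an fd-based interface would need. The struct vr_mapping type and
the vr_ptr_to_offset() function are hypothetical names invented for this
example; they are not part of the patch set or of any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for one mmap()ed region of a tmpfs file. */
struct vr_mapping {
    void  *base;        /* address returned by mmap() */
    size_t length;      /* length passed to mmap() */
    off_t  file_offset; /* file offset passed to mmap() */
};

/*
 * Translate the range [ptr, ptr + len) into a file offset.
 * Stores the offset in *out and returns 0 on success. Returns -1 if the
 * range does not lie entirely inside the mapping.
 */
int vr_ptr_to_offset(const struct vr_mapping *m, const void *ptr,
                     size_t len, off_t *out)
{
    assert(m != NULL && out != NULL);

    uintptr_t start = (uintptr_t)m->base;
    uintptr_t p = (uintptr_t)ptr;

    if (p < start || p - start > m->length)
        return -1;
    if (len > m->length - (p - start))
        return -1;

    *out = m->file_offset + (off_t)(p - start);
    return 0;
}
```

The resulting offset and length could then be handed to whatever fd-based
call ends up marking the range volatile.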

2012-10-10 00:11:08

by Minchan Kim


Subject: Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words

On Tue, Oct 09, 2012 at 02:30:03PM -0700, John Stultz wrote:
> On 10/09/2012 01:07 AM, Mike Hommey wrote:
> >Note it doesn't have to be a vs. situation. madvise could be an
> >additional way to interface with volatile ranges on a given fd.
> >
> >That is, madvise doesn't have to mean anonymous memory. As a matter of
> >fact, MADV_WILLNEED/MADV_DONTNEED are usually used on mmaped files.
> >Similarly, there could be a way to use madvise to mark volatile ranges,
> >without the application having to track what memory ranges are
> >associated to what part of what file, which the kernel already tracks.
>
> Good point. We could add an madvise() interface, but limit it to
> mmapped tmpfs files, in parallel with the fallocate() interface.
>
> However, I would like to think through how MADV_MARK_VOLATILE could
> work with purely anonymous memory before starting down that path.
> There is also Neil's point: an identical kernel interface restricted
> to tmpfs, offered only as a convenience to userland for converting
> between virtual addresses and mmapped file offsets, may be better
> left to a userland library.

How about this?

Here is the scenario I imagine for the madvise semantics.

1) Anonymous pages
Assume there is an allocator library that manages an mmapped reserved pool.
If it has lots of free memory that nobody is using, it can unmap part of the
reserved pool, but unmap isn't cheap because the kernel has to zap all the
ptes of the pages in the range. If we avoid the unmap, however, the VM will
swap out that range, which holds only garbage, when memory pressure happens.
If the allocator marks that range volatile, we can avoid the unnecessary
swap-out and even reclaim the pages without swap. The only thing the
allocator has to do is unmark the range before handing it out to the user.

2) File pages (NOT tmpfs)
We can reclaim volatile file pages easily, without giving them another
round on the LRU, even if they were accessed recently.
The difference from DONTNEED is that DONTNEED always moves pages to the
tail of the inactive LRU so they are reclaimed early, whereas the VOLATILE
semantic leaves them where they are. They are reclaimed, regardless of
recent use, once aging brings them to the tail of the LRU. The reason is
that they may be unmarked sooner or later for reuse, and we can't predict
the cost of recreating the object.

So the reclaim preference is: NORMAL < VOLATILE < DONTNEED
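
To make the allocator scenario in 1) concrete, here is a small userland
model. It does not call the kernel at all: mark_volatile(),
unmark_volatile(), and simulate_reclaim() are hypothetical stand-ins for the
proposed marking calls and for reclaim under memory pressure, and the chunk
structure is invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum chunk_state { CHUNK_IN_USE, CHUNK_FREE_VOLATILE };

struct chunk {
    enum chunk_state state;
    bool purged; /* set when simulated reclaim dropped the contents */
};

/* Stand-in for the proposed "mark volatile" call (hypothetical). */
static void mark_volatile(struct chunk *c)
{
    c->state = CHUNK_FREE_VOLATILE;
    c->purged = false;
}

/* Stand-in for "unmark"; reports whether the contents were dropped. */
static bool unmark_volatile(struct chunk *c)
{
    bool was_purged = c->purged;
    c->state = CHUNK_IN_USE;
    c->purged = false;
    return was_purged;
}

/*
 * Simulated memory pressure: only volatile chunks are dropped, and they
 * are dropped without any swap-out. Returns the number of chunks purged.
 */
size_t simulate_reclaim(struct chunk *pool, size_t n)
{
    size_t purged = 0;
    for (size_t i = 0; i < n; i++) {
        if (pool[i].state == CHUNK_FREE_VOLATILE && !pool[i].purged) {
            pool[i].purged = true;
            purged++;
        }
    }
    return purged;
}

/* The allocator marks a chunk volatile instead of unmapping it. */
void pool_free(struct chunk *c)
{
    mark_volatile(c);
}

/*
 * The allocator unmarks the chunk before handing it out. Returns true if
 * the old contents are gone, which is fine for free memory.
 */
bool pool_alloc(struct chunk *c)
{
    assert(c->state == CHUNK_FREE_VOLATILE);
    return unmark_volatile(c);
}
```

In this model, chunks that are in use are never touched by reclaim, and a
freed chunk costs neither an unmap nor a swap-out.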

>
> thanks
> -john

--
Kind regards,
Minchan Kim