2012-08-06 03:03:31

by Minchan Kim

Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM

Hi John,

On Fri, Jul 27, 2012 at 11:57:11PM -0400, John Stultz wrote:
> In an attempt to push the volatile range managment even
> deeper into the VM code, this is my first attempt at
> implementing Minchan's idea of a LRU_VOLATILE list in
> the mm core.
>
> This list sits along side the LRU_ACTIVE_ANON, _INACTIVE_ANON,
> _ACTIVE_FILE, _INACTIVE_FILE and _UNEVICTABLE lru lists.
>
> When a range is marked volatile, the pages in that range
> are moved to the LRU_VOLATILE list. Since volatile pages
> can be quickly purged, this list is the first list we
> shrink when we need to free memory.
>
> When a page is marked non-volatile, it is moved from the
> LRU_VOLATILE list to the appropriate LRU_ACTIVE_ list.

I don't think promoting the page to the active list is a good idea.
It should go to the inactive list, where it gets a chance to be
activated sooner or later if it is really touched.

>
> This patch introduces the LRU_VOLATILE list, an isvolatile
> page flag, functions to mark and unmark a single page
> as volatile, and shrinker functions to purge volatile
> pages.
>
> This is a very raw first pass, and is neither performant
> or likely bugfree. It works in my trivial testing, but
> I've not pushed it very hard yet.
>
> I wanted to send it out just to get some inital thoughts
> on the approach and any suggestions should I be going too
> far in the wrong direction.

I looked at this series and found several nitpicks about the implementation,
but I don't think this is the right stage to worry about them.

Although the naming differs from what I suggested, I think it's a good idea.
So let's talk about that first.
I will call the VOLATILE list the EReclaimable LRU list.

Its purpose is to prevent unnecessary LRU churning and to reclaim
unneeded pages quickly, so that latency-sensitive systems
don't see a big latency hit when memory pressure happens.

In the future, targets for this LRU list could include the following:

1. volatile pages in this patchset
2. ephemeral pages of tmem
3. madvise(DONTNEED)
4. fadvise(NOREUSE)
5. PG_reclaimed pages
6. clean pages, if we implement CFLRU (clean-first LRU)

So if anyone has an objection, please raise your hand
before we go any further.
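
To make the intended reclaim order concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy zone that only tracks page counts per list; every name in it (toy_lru, toy_zone, toy_shrink_zone) is hypothetical, and the relative order of the two inactive lists is an arbitrary choice for illustration. The only point it shows is that the easy-reclaim list is drained before any other list is touched.

```c
/* Toy model only: these names echo the discussion, not the kernel's real enum. */
enum toy_lru {
	TOY_LRU_EZRECLAIM,	/* volatile / easily reclaimable pages */
	TOY_LRU_INACTIVE_FILE,	/* order of the remaining lists is an assumption */
	TOY_LRU_INACTIVE_ANON,
	TOY_NR_LRU
};

struct toy_zone {
	unsigned long nr_pages[TOY_NR_LRU];	/* pages currently on each list */
};

/*
 * Take up to nr_to_reclaim pages, walking the lists in enum order so the
 * easy-reclaim list is emptied before any other list is touched.
 * Returns the number of pages actually taken.
 */
static unsigned long toy_shrink_zone(struct toy_zone *zone,
				     unsigned long nr_to_reclaim)
{
	unsigned long reclaimed = 0;
	int lru;

	for (lru = 0; lru < TOY_NR_LRU && reclaimed < nr_to_reclaim; lru++) {
		unsigned long want = nr_to_reclaim - reclaimed;
		unsigned long take = zone->nr_pages[lru] < want ?
				     zone->nr_pages[lru] : want;

		zone->nr_pages[lru] -= take;
		reclaimed += take;
	}
	return reclaimed;
}
```

With 3 pages on the easy-reclaim list and a request for 5, the sketch takes all 3 from that list first and only then takes 2 from the next list.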

>
> CC: Andrew Morton <[email protected]>
> CC: Android Kernel Team <[email protected]>
> CC: Robert Love <[email protected]>
> CC: Mel Gorman <[email protected]>
> CC: Hugh Dickins <[email protected]>
> CC: Dave Hansen <[email protected]>
> CC: Rik van Riel <[email protected]>
> CC: Dmitry Adamushko <[email protected]>
> CC: Dave Chinner <[email protected]>
> CC: Neil Brown <[email protected]>
> CC: Andrea Righi <[email protected]>
> CC: Aneesh Kumar K.V <[email protected]>
> CC: Mike Hommey <[email protected]>
> CC: Jan Kara <[email protected]>
> CC: KOSAKI Motohiro <[email protected]>
> CC: Michel Lespinasse <[email protected]>
> CC: Minchan Kim <[email protected]>
> CC: [email protected] <[email protected]>
> Signed-off-by: John Stultz <[email protected]>
> ---
> include/linux/fs.h | 1 +
> include/linux/mm_inline.h | 2 ++
> include/linux/mmzone.h | 1 +
> include/linux/page-flags.h | 3 ++
> include/linux/swap.h | 3 ++
> mm/memcontrol.c | 1 +
> mm/page_alloc.c | 1 +
> mm/swap.c | 71 +++++++++++++++++++++++++++++++++++++++++
> mm/vmscan.c | 76 +++++++++++++++++++++++++++++++++++++++++---
> 9 files changed, 155 insertions(+), 4 deletions(-)
>
--
Kind regards,
Minchan Kim


2012-08-06 15:48:01

by Dan Magenheimer

Subject: RE: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM

> From: Minchan Kim [mailto:[email protected]]
> To: John Stultz
> Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM

Hi Minchan --

Thanks for cc'ing me on this!

> Targets for the LRU list could be following as in future
>
> 1. volatile pages in this patchset.
> 2. ephemeral pages of tmem
> 3. madivse(DONTNEED)
> 4. fadvise(NOREUSE)
> 5. PG_reclaimed pages
> 6. clean pages if we write CFLRU(clean first LRU)
>
> So if any guys have objection, please raise your hands
> before further progress.

I agree that the existing shrinker mechanism is too primitive
and the kernel needs to take into account more factors in
deciding how to quickly reclaim pages from a broader set
of sources. However, I think it is important to ensure
that both the "demand" side and the "supply" side are
studied. There has to be some kind of prioritization policy
among all the RAM consumers so that a lower-priority
alloc_page doesn't cause a higher-priority "volatile" page
to be consumed. I suspect this policy will be VERY hard to
define and maintain.

Related, ephemeral pages in tmem are not truly volatile
as there is always at least one tmem data structure pointing
to it. I haven't followed this thread previously so my apologies
if it already has this, but the LRU_VOLATILE list might
need to support a per-page "garbage collection" callback.

Dan

2012-08-06 20:39:06

by John Stultz

Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM

On 08/05/2012 08:04 PM, Minchan Kim wrote:
> Hi John,
>
> On Fri, Jul 27, 2012 at 11:57:11PM -0400, John Stultz wrote:
>> In an attempt to push the volatile range managment even
>> deeper into the VM code, this is my first attempt at
>> implementing Minchan's idea of a LRU_VOLATILE list in
>> the mm core.
>>
>> This list sits along side the LRU_ACTIVE_ANON, _INACTIVE_ANON,
>> _ACTIVE_FILE, _INACTIVE_FILE and _UNEVICTABLE lru lists.
>>
>> When a range is marked volatile, the pages in that range
>> are moved to the LRU_VOLATILE list. Since volatile pages
>> can be quickly purged, this list is the first list we
>> shrink when we need to free memory.
>>
>> When a page is marked non-volatile, it is moved from the
>> LRU_VOLATILE list to the appropriate LRU_ACTIVE_ list.
> I think active list promotion is not good.
> It should go to the inactive list and they get a chance to
> activate from inactive to active sooner or later if it is
> really touched.

Ok. Thanks, I'll change it so we move to the inactive list then.
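
As a rough illustration of the agreed behaviour, here is a small userspace sketch. It is not the patch itself: demo_page, demo_lru and both helper functions are hypothetical stand-ins, and a single field stands in for real list membership. It only shows the state transitions, with unmarking always landing on the matching inactive list.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the lists named in the thread. */
enum demo_lru {
	DEMO_INACTIVE_ANON,
	DEMO_ACTIVE_ANON,
	DEMO_INACTIVE_FILE,
	DEMO_ACTIVE_FILE,
	DEMO_VOLATILE
};

struct demo_page {
	bool file_backed;	/* selects the file or anon family of lists */
	enum demo_lru lru;	/* list the page currently sits on */
};

/* Marking a page volatile moves it to the volatile list. */
static void demo_mark_volatile(struct demo_page *page)
{
	page->lru = DEMO_VOLATILE;
}

/*
 * Unmarking puts the page on the matching *inactive* list, never the
 * active one; a later reference is what would promote it.
 * Pages that are not on the volatile list are left alone.
 */
static void demo_unmark_volatile(struct demo_page *page)
{
	if (page->lru != DEMO_VOLATILE)
		return;
	page->lru = page->file_backed ? DEMO_INACTIVE_FILE : DEMO_INACTIVE_ANON;
}
```

A file-backed page that was on the active list before being marked therefore comes back on the inactive file list, and has to be touched again to earn promotion.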


>> This patch introduces the LRU_VOLATILE list, an isvolatile
>> page flag, functions to mark and unmark a single page
>> as volatile, and shrinker functions to purge volatile
>> pages.
>>
>> This is a very raw first pass, and is neither performant
>> or likely bugfree. It works in my trivial testing, but
>> I've not pushed it very hard yet.
>>
>> I wanted to send it out just to get some inital thoughts
>> on the approach and any suggestions should I be going too
>> far in the wrong direction.
> I look at this series and found several nitpicks about implemenataion
> but I think it's not a good stage about concerning it.

Although I know the design may still need significant change, I'd
still appreciate nitpicks, as they might help me better understand the
mm code and any mistakes I'm making.


> Although naming is rather differet with I suggested, I think it's good idea.
> So let's talk about it firstly.
> I will call VOLATILE list as EReclaimale LRU list.
Yea, I didn't want to call it ERECLAIMABLE since for this iteration I
was limiting the scope just to volatile pages. I'm totally fine renaming
it as the scope widens.

thanks
-john

2012-08-07 00:54:56

by Minchan Kim

Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM

On Mon, Aug 06, 2012 at 08:46:18AM -0700, Dan Magenheimer wrote:
> > From: Minchan Kim [mailto:[email protected]]
> > To: John Stultz
> > Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM
>
> Hi Minchan --
>
> Thanks for cc'ing me on this!
>
> > Targets for the LRU list could be following as in future
> >
> > 1. volatile pages in this patchset.
> > 2. ephemeral pages of tmem
> > 3. madivse(DONTNEED)
> > 4. fadvise(NOREUSE)
> > 5. PG_reclaimed pages
> > 6. clean pages if we write CFLRU(clean first LRU)
> >
> > So if any guys have objection, please raise your hands
> > before further progress.
>
> I agree that the existing shrinker mechanism is too primitive
> and the kernel needs to take into account more factors in
> deciding how to quickly reclaim pages from a broader set
> of sources. However, I think it is important to ensure
> that both the "demand" side and the "supply" side are
> studied. There has to be some kind of prioritization policy
> among all the RAM consumers so that a lower-priority
> alloc_page doesn't cause a higher-priority "volatile" page
> to be consumed. I suspect this policy will be VERY hard to
> define and maintain.

Yes, but that's another story.
At the moment, the VM doesn't consider this kind of priority-inversion problem,
except for giving more memory to privileged processes. That's very simple,
but it has worked well so far.

>
> Related, ephemeral pages in tmem are not truly volatile

John uses the term "volatile" only for his specific patch, so
I prefer Ereclaim (Easy Reclaim) to volatile.

> as there is always at least one tmem data structure pointing
> to it. I haven't followed this thread previously so my apologies
> if it already has this, but the LRU_VOLATILE list might
> need to support a per-page "garbage collection" callback.

Right. That's why this patch provides purgepage in address_space_operations.
I think zcache could attach its own address_space_operations to a page
allocated by zbud, with, for instance, a zcache_purgepage that the VM calls
when the page is reclaimed. Then zcache doesn't need a custom LRU policy (it still
needs a linked list for managing zbud pages) and can pass the decision to the VM.


>
> Dan
>

--
Kind regards,
Minchan Kim

2012-08-07 01:27:42

by Dan Magenheimer

Subject: RE: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM

> From: Minchan Kim [mailto:[email protected]]
> Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM
>
> On Mon, Aug 06, 2012 at 08:46:18AM -0700, Dan Magenheimer wrote:
> > > From: Minchan Kim [mailto:[email protected]]
> > > To: John Stultz
> > > Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM
> >
> > Hi Minchan --
> >
> > Thanks for cc'ing me on this!
> >
> > > Targets for the LRU list could be following as in future
> > >
> > > 1. volatile pages in this patchset.
> > > 2. ephemeral pages of tmem
> > > 3. madivse(DONTNEED)
> > > 4. fadvise(NOREUSE)
> > > 5. PG_reclaimed pages
> > > 6. clean pages if we write CFLRU(clean first LRU)
> > >
> > > So if any guys have objection, please raise your hands
> > > before further progress.
> >
> > I agree that the existing shrinker mechanism is too primitive
> > and the kernel needs to take into account more factors in
> > deciding how to quickly reclaim pages from a broader set
> > of sources. However, I think it is important to ensure
> > that both the "demand" side and the "supply" side are
> > studied. There has to be some kind of prioritization policy
> > among all the RAM consumers so that a lower-priority
> > alloc_page doesn't cause a higher-priority "volatile" page
> > to be consumed. I suspect this policy will be VERY hard to
> > define and maintain.
>
> Yes. It's another story.
> At the moment, VM doesn't consider such priority-inversion problem
> excpet giving the more memory to privileged processes. It's so simple
> but works well till now.

I think it is very important that both stories be
solved together. See below...

> > Related, ephemeral pages in tmem are not truly volatile
>
> "volatile" term is used by John for only his special patch so
> I like Ereclaim(Easy Reclaim) rather than volatile.

If others agree, that's fine. However, the "E" prefix is
currently used differently in common English (for example,
for e-books). Maybe "ezreclaim"?

> > as there is always at least one tmem data structure pointing
> > to it. I haven't followed this thread previously so my apologies
> > if it already has this, but the LRU_VOLATILE list might
> > need to support a per-page "garbage collection" callback.
>
> Right. That's why this patch provides purgepage in address_space_operations.
> I think zcache could attach own address_space_operations to the page
> which is allocated by zbud for instance, zcache_purgepage which is called by VM
> when the page is reclaimed. So zcache don't need custom LRU policy(but still need
> linked list for managing zbuddy) and pass the decision to the VM.

The simple VM decisions are going to need a lot more intelligence
(and data?) to drive which page to reclaim. For example, is it better
to reclaim a pageframe that contains two compressed pages of ephemeral data
or a pageframe that has one active (or inactive) file page? Such
a policy is not "Easy". ;-)

(Also, BTW, zcache pages aren't in any address space so don't have
an address_space_operations... because it is not possible to directly
address the data in a compressed page.)

2012-08-07 01:43:54

by Minchan Kim

Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM

On Mon, Aug 06, 2012 at 06:26:03PM -0700, Dan Magenheimer wrote:
> > From: Minchan Kim [mailto:[email protected]]
> > Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM
> >
> > On Mon, Aug 06, 2012 at 08:46:18AM -0700, Dan Magenheimer wrote:
> > > > From: Minchan Kim [mailto:[email protected]]
> > > > To: John Stultz
> > > > Subject: Re: [PATCH 4/5] [RFC][HACK] Add LRU_VOLATILE support to the VM
> > >
> > > Hi Minchan --
> > >
> > > Thanks for cc'ing me on this!
> > >
> > > > Targets for the LRU list could be following as in future
> > > >
> > > > 1. volatile pages in this patchset.
> > > > 2. ephemeral pages of tmem
> > > > 3. madivse(DONTNEED)
> > > > 4. fadvise(NOREUSE)
> > > > 5. PG_reclaimed pages
> > > > 6. clean pages if we write CFLRU(clean first LRU)
> > > >
> > > > So if any guys have objection, please raise your hands
> > > > before further progress.
> > >
> > > I agree that the existing shrinker mechanism is too primitive
> > > and the kernel needs to take into account more factors in
> > > deciding how to quickly reclaim pages from a broader set
> > > of sources. However, I think it is important to ensure
> > > that both the "demand" side and the "supply" side are
> > > studied. There has to be some kind of prioritization policy
> > > among all the RAM consumers so that a lower-priority
> > > alloc_page doesn't cause a higher-priority "volatile" page
> > > to be consumed. I suspect this policy will be VERY hard to
> > > define and maintain.
> >
> > Yes. It's another story.
> > At the moment, VM doesn't consider such priority-inversion problem
> > excpet giving the more memory to privileged processes. It's so simple
> > but works well till now.
>
> I think it is very important that both stories must be
> solved together. See below...
>
> > > Related, ephemeral pages in tmem are not truly volatile
> >
> > "volatile" term is used by John for only his special patch so
> > I like Ereclaim(Easy Reclaim) rather than volatile.
>
> If others agree, that's fine. However, the "E" prefix is
> currently used differently in common English (for example,
> for e-books). Maybe "ezreclaim"?

Looks better. I will use that term from now on.
Thanks!

>
> > > as there is always at least one tmem data structure pointing
> > > to it. I haven't followed this thread previously so my apologies
> > > if it already has this, but the LRU_VOLATILE list might
> > > need to support a per-page "garbage collection" callback.
> >
> > Right. That's why this patch provides purgepage in address_space_operations.
> > I think zcache could attach own address_space_operations to the page
> > which is allocated by zbud for instance, zcache_purgepage which is called by VM
> > when the page is reclaimed. So zcache don't need custom LRU policy(but still need
> > linked list for managing zbuddy) and pass the decision to the VM.
>
> The simple VM decisions are going to need a lot more intelligence
> (and data?) to drive which page to reclaim. For example, is it better
> to reclaim a pageframe that contains two compressed pages of ephemeral data
> or a pageframe that has one active (or inactive) file page? Such
> a policy is not "Easy". ;-)

I should have been clearer.
The VM just picks a page at the tail of the ezreclaim list and reclaims it.
So any rotation between the active page and the two compressed pages should be
implemented by a smart zcache, which can do anything it likes.
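
The split of responsibilities described above can be sketched in userspace roughly as follows. This is only an illustration under assumptions: sk_page, sk_page_ops, sk_list and the helpers are all hypothetical names, and the callback merely imitates the purgepage idea rather than any real kernel interface. The "VM" side applies no policy; it takes the tail page and hands it to whatever callback the owner attached.

```c
#include <stddef.h>

struct sk_page;

/* Hypothetical per-owner hook, loosely modelled on the purgepage idea. */
struct sk_page_ops {
	void (*purgepage)(struct sk_page *page);
};

struct sk_page {
	const struct sk_page_ops *ops;	/* NULL if the page has no owner hook */
	struct sk_page *prev, *next;
	int purged;			/* set by the example owner callback */
};

struct sk_list {
	struct sk_page *head, *tail;
};

/* New pages enter at the head, so the oldest page sits at the tail. */
static void sk_add_head(struct sk_list *list, struct sk_page *page)
{
	page->prev = NULL;
	page->next = list->head;
	if (list->head)
		list->head->prev = page;
	else
		list->tail = page;
	list->head = page;
}

/*
 * The "VM" side: unlink the tail page and let its owner clean up.
 * Returns the reclaimed page, or NULL if the list is empty.
 */
static struct sk_page *sk_reclaim_one(struct sk_list *list)
{
	struct sk_page *page = list->tail;

	if (!page)
		return NULL;
	list->tail = page->prev;
	if (list->tail)
		list->tail->next = NULL;
	else
		list->head = NULL;
	page->prev = page->next = NULL;

	if (page->ops && page->ops->purgepage)
		page->ops->purgepage(page);
	return page;
}

/* Example owner callback: just records that the page was purged. */
static void sk_owner_purge(struct sk_page *page)
{
	page->purged = 1;
}

static const struct sk_page_ops sk_owner_ops = {
	.purgepage = sk_owner_purge,
};
```

Anything smarter, such as deciding which compressed pages to drop or re-queueing a page the owner wants to keep, would live inside the owner's callback in this model.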

>
> (Also, BTW, zcache pages aren't in any address space so don't have
> an address_space_operations... because it is not possible to directly
> address the data in a compressed page.)

I mean we could just create a fake address_space_operations, like this:

static const struct address_space_operations zcache_aop = {
	.purgepage = zcache_purge_page,	/* hook proposed in this patch series */
};

static struct address_space zcache_address_space = {
	.a_ops = &zcache_aop,
};

/* When zcache allocates a page; the gfp mask is up to the caller. */
struct page *page = alloc_page(gfp_mask);
if (page)
	page->mapping = &zcache_address_space;

>

--
Kind regards,
Minchan Kim