2014-09-04 22:16:14

by Andrew Morton

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

On Thu, 4 Sep 2014 16:29:38 +0900 Gioh Kim <[email protected]> wrote:

> This patchset tries to solve a problem where the long-lasting page
> caches of the ext4 superblock and its journaling data disturb page
> migration.
>
> I've been testing the CMA feature on my ARM-based platform and found
> that two page caches cannot be migrated. They are the page caches of
> the superblock of an ext4 filesystem and of its journaling data.
>
> Currently ext4 reads the superblock with sb_bread(), which allocates
> a page from the movable area. The problem is that ext4 holds the page
> until the filesystem is unmounted. If the root filesystem is ext4,
> the page can never be migrated. The journaling data for the
> superblock cannot be migrated either.
>
> I introduce new APIs that allocate page cache with a specific flag
> passed as an argument. The *_gfp APIs are for users who want to set
> the page allocation flags for page cache allocation, and the
> *_unmovable APIs are for users who want to allocate page cache from
> the non-movable area.
>
> This is useful for ext4/ext3 and others that want to hold page cache
> for a long time.

Could we please have some detailed information about the real-world
effect of this patchset?

You earlier said "My test platform is an item currently selling in the
market, and I test my patch while my platform is working as if a real
user were using it."

But what were the problems which were observed in standard kernels and
what effect did this patchset have upon them? Some quantitative
measurements will really help here.

I'm trying to get an understanding of how effective and important the
change is, and whether others will see similar benefits. I'd also like to
understand how *complete* the fix is - were the problems which you
observed completely fixed, or do outstanding problems remain?

Thanks.


2014-09-05 00:37:05

by Gioh Kim

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag



On 2014-09-05 7:16 AM, Andrew Morton wrote:
> On Thu, 4 Sep 2014 16:29:38 +0900 Gioh Kim <[email protected]> wrote:
>
>> This patchset tries to solve a problem where the long-lasting page
>> caches of the ext4 superblock and its journaling data disturb page
>> migration.
>>
>> I've been testing the CMA feature on my ARM-based platform and found
>> that two page caches cannot be migrated. They are the page caches of
>> the superblock of an ext4 filesystem and of its journaling data.
>>
>> Currently ext4 reads the superblock with sb_bread(), which allocates
>> a page from the movable area. The problem is that ext4 holds the page
>> until the filesystem is unmounted. If the root filesystem is ext4,
>> the page can never be migrated. The journaling data for the
>> superblock cannot be migrated either.
>>
>> I introduce new APIs that allocate page cache with a specific flag
>> passed as an argument. The *_gfp APIs are for users who want to set
>> the page allocation flags for page cache allocation, and the
>> *_unmovable APIs are for users who want to allocate page cache from
>> the non-movable area.
>>
>> This is useful for ext4/ext3 and others that want to hold page cache
>> for a long time.
>
> Could we please have some detailed information about the real-world
> effect of this patchset?
>
> You earlier said "My test platform is an item currently selling in the
> market, and I test my patch while my platform is working as if a real
> user were using it."

OK. I'm writing up as much detail as I can.
Please feel free to ask me for more information.

My platform is a TV with 1GB of system memory and 256MB of CMA memory.
I want to be able to use the full 256MB of CMA memory.

>
> But what were the problems which were observed in standard kernels and
> what effect did this patchset have upon them? Some quantitative
> measurements will really help here.

The problem is that I cannot allocate the entire CMA memory.
Actually, the problem does not appear without Joonsoo's patch: https://lkml.org/lkml/2014/5/28/64.
Without it, the CMA area is left free and every CMA allocation succeeds.

With Joonsoo's patch applied, CMA memory is used for general allocations as the system boots up.
Therefore the superblocks of mounted filesystems and their buffer caches are allocated from CMA memory.

I have three ext4 partitions and one squashfs partition.
The squashfs filesystem has no problem; it holds its buffer caches only temporarily.

But each ext4 partition holds 2 buffer caches until it is unmounted
(one for the superblock and one for the journal), so on my platform
I found 2 or 3 pages storing those buffer caches still busy
when I tried to allocate 256MB, the entire CMA memory.
So my allocation failed.

This patchset makes the long-lasting buffer caches come from the non-CMA area.
With it, allocating the entire CMA memory always succeeds.
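
To make the usage concrete, here is a minimal sketch of what the new
APIs look like from a caller's point of view (illustrative only -- the
variable name is borrowed from ext4_fill_super(), and the exact
signatures are in the patches themselves):

    /*
     * Sketch: reading a block that stays pinned until unmount.
     * sb_bread() keeps its old behaviour (movable allocation);
     * sb_bread_unmovable() asks for the backing page from the
     * non-movable area, so it never blocks CMA or memory hotplug.
     */
    struct buffer_head *bh;

    bh = sb_bread_unmovable(sb, logical_sb_block);
    if (!bh)
            goto out_fail;
    /* bh->b_page now comes from a non-movable pageblock */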

Please tell me what I should measure quantitatively.
What I know is that every ext4 filesystem has 2 long-lasting buffer caches
that are released only when it is unmounted.

I applied this patch and tried to allocate the entire CMA memory almost
100 times, and it succeeded every time.

>
> I'm trying to get an understanding of how effective and important the
> change is, and whether others will see similar benefits. I'd also like to
> understand how *complete* the fix is - were the problems which you
> observed completely fixed, or do outstanding problems remain?

I think this patch benefits only systems that use the CMA or memory-hotplug features.
As I mentioned above, the problem does not occur without Joonsoo's patch, which allocates from the CMA area frequently.

If a system wants to use the CMA/hotplug features, I think this patch is very important.
The problem involves only a few pages, but several MB can be wasted given the alignment of allocation sizes:
if the allocation alignment is 16MB and one page in a block is busy, the whole 16MB can be wasted.
For an embedded system like a TV, 16MB is a really big issue.

I believe the problem is completely fixed by my patch.
I've tested many times over several days and reviewed the ext4 code that deals with buffer heads.
I couldn't find any other problem.

I'm sorry if my poor English is confusing.
Please ask for whatever you need.

Next week is the Korean thanksgiving holiday,
so I think I can reply on Friday.

>
> Thanks.

2014-09-05 01:14:19

by Theodore Ts'o

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

On Fri, Sep 05, 2014 at 09:37:05AM +0900, Gioh Kim wrote:
> >But what were the problems which were observed in standard kernels and
> >what effect did this patchset have upon them? Some quantitative
> >measurements will really help here.
>
> The problem is that I cannot allocate the entire CMA memory.
>
> Actually, the problem does not appear without Joonsoo's patch:
> https://lkml.org/lkml/2014/5/28/64. Without it, the CMA area is left
> free and every CMA allocation succeeds.
>
> With Joonsoo's patch applied, CMA memory is used for general
> allocations as the system boots up.

As I said earlier, I'm happy to carry this patch in the ext4 tree,
because as it turns out I could use this facility for another purpose
(to cause a few buffer cache allocations to happen with __GFP_NOFAIL).
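
For example, something along these lines -- a purely hypothetical
sketch using the patchset's gfp-taking variant, not code from any
tree:

    /* retry forever rather than fail a critical metadata read */
    bh = __getblk_gfp(sb->s_bdev, block, sb->s_blocksize,
                      __GFP_NOFAIL);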

I do have one question; I note that Joonsoo's patch dates back to May,
and yet this has not hit the mainline kernel, and I haven't seen any
discussions about this patch after May. Has there been some pushback
from the mm maintainers about Joonsoo's approach with respect to this
patch? What is the current status of that patch set?

Thanks,

- Ted

2014-09-05 01:48:09

by Joonsoo Kim

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

On Thu, Sep 04, 2014 at 09:14:19PM -0400, Theodore Ts'o wrote:
> On Fri, Sep 05, 2014 at 09:37:05AM +0900, Gioh Kim wrote:
> > >But what were the problems which were observed in standard kernels and
> > >what effect did this patchset have upon them? Some quantitative
> > >measurements will really help here.
> >
> > The problem is that I cannot allocate the entire CMA memory.
> >
> > Actually, the problem does not appear without Joonsoo's patch:
> > https://lkml.org/lkml/2014/5/28/64. Without it, the CMA area is left
> > free and every CMA allocation succeeds.
> >
> > With Joonsoo's patch applied, CMA memory is used for general
> > allocations as the system boots up.
>
> As I said earlier, I'm happy to carry this patch in the ext4 tree,
> because as it turns out I could use this facility for another purpose
> (to cause a few buffer cache allocations to happen with __GFP_NOFAIL).
>
> I do have one question; I note that Joonsoo's patch dates back to May,
> and yet this has not hit the mainline kernel, and I haven't seen any
> discussions about this patch after May. Has there been some pushback
> from the mm maintainers about Joonsoo's approach with respect to this
> patch? What is the current status of that patch set?

Hello,

That patchset is postponed, but will be continued. The reason is that
other bugs turn up frequently when that patchset is applied. I will
fix those bugs first and then re-submit the patchset. The following is
my attempt at a fix:

https://lkml.org/lkml/2014/8/26/147

Thanks.

2014-09-05 03:17:35

by Theodore Ts'o

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

Joonsoo,

Thanks for the update. I've applied Gioh's patches to the ext4 tree,
but I'd appreciate a further clarification. My understanding of the
problem you were trying to address is that with the current CMA
implementation, kswapd was getting activated too early, yes?

But it would still be a good idea to try to use non-movable memory in
preference to CMA memory; even if page migration can move the contents
of the page elsewhere, wouldn't it be better to avoid needing to do
the page migration in the first place? Given that the ext4 file
systems are getting mounted very early in the boot process, when there
should be plenty of CMA-eligible and non-CMA memory available, why was
CMA memory getting selected for the buffer cache allocations when
non-CMA memory was available?

In other words, even without Gioh's patch to force the use of non-CMA-
eligible memory, wouldn't it be better if the memory allocator
preferentially used non-CMA memory when it was available? This should
be orthogonal to whether or not kswapd gets activated, right?

Regards,

- Ted

2014-09-05 07:31:58

by Joonsoo Kim

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

On Thu, Sep 04, 2014 at 11:17:35PM -0400, Theodore Ts'o wrote:
> Joonsoo,
>
> Thanks for the update. I've applied Gioh's patches to the ext4 tree,
> but I'd appreciate a further clarification. My understanding of the
> problem you were trying to address is that with the current CMA
> implementation, kswapd was getting activated too early, yes?
>
> But it would still be a good idea to try to use non-movable memory in
> preference to CMA memory; even if page migration can move the contents
> of the page elsewhere, wouldn't it be better to avoid needing to do
> the page migration in the first place? Given that the ext4 file
> systems are getting mounted very early in the boot process, when there
> should be plenty of CMA-eligible and non-CMA memory available, why was
> CMA memory getting selected for the buffer cache allocations when
> non-CMA memory was available?
>
> In other words, even without Gioh's patch to force the use of non-CMA-
> eligible memory, wouldn't it be better if the memory allocator
> preferentially used non-CMA memory when it was available? This should
> be orthogonal to whether or not kswapd gets activated, right?

Hello,

Yes, it's similar to your understanding.

With the current CMA implementation, the page allocator doesn't return
free pages from the CMA reserved region until there are no free pages
of movable type left. Only then are free pages from the CMA reserved
region handed out, via fallback allocation. This approach is similar
to what you suggested.

In that situation, kswapd starts to reclaim because free memory is
already too low. This reclaim refills the movable-type free pages, and
the page allocator stops handing out free pages from the CMA reserved
region. So, with the above approach, the system works as if it had
only (total memory - CMA reserved memory).

CMA is intended to make the reserved memory usable for general
purposes while it isn't needed for contiguous allocations, but the
current implementation doesn't achieve that. My patch fixes this
situation by allocating free pages from the CMA reserved region
aggressively.
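
To make the ordering concrete, the current behaviour is roughly the
following (a deliberately simplified sketch of the fallback logic in
mm/page_alloc.c, not the actual code -- the helper names here are made
up):

    /*
     * Simplified: movable allocations are served from the
     * MIGRATE_MOVABLE free lists first; MIGRATE_CMA pageblocks are
     * only touched as a fallback once the movable lists are empty.
     */
    static struct page *rmqueue_movable(struct zone *zone,
                                        unsigned int order)
    {
            struct page *page;

            page = rmqueue_smallest(zone, order, MIGRATE_MOVABLE);
            if (!page)
                    /* movable lists empty: only now dip into CMA */
                    page = rmqueue_smallest(zone, order, MIGRATE_CMA);
            return page;
    }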

I also tested another approach, allocating free pages from the CMA
reserved region as late as possible, which is also similar to your
suggestion, and it doesn't work well. When reclaim starts, too many
pages are reclaimed at once, because the LRU lists contain long runs
of pages from the CMA region and reclaiming these doesn't help kswapd.
kswapd stops reclaiming when the free page count recovers, but CMA
pages aren't counted as free pages for kswapd's purposes because they
can't be used for unmovable or reclaimable allocations. So kswapd
reclaims too many pages at once, unnecessarily.

A more detailed explanation is in the following link:
https://lkml.org/lkml/2014/5/28/57

Thanks.


2014-09-05 14:14:16

by Theodore Ts'o

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

On Fri, Sep 05, 2014 at 04:32:48PM +0900, Joonsoo Kim wrote:
> I also tested another approach, allocating free pages from the CMA
> reserved region as late as possible, which is also similar to your
> suggestion, and it doesn't work well. When reclaim starts, too many
> pages are reclaimed at once, because the LRU lists contain long runs
> of pages from the CMA region and reclaiming these doesn't help kswapd.
> kswapd stops reclaiming when the free page count recovers, but CMA
> pages aren't counted as free pages for kswapd's purposes because they
> can't be used for unmovable or reclaimable allocations. So kswapd
> reclaims too many pages at once, unnecessarily.

Have you considered putting the pages of a CMA region in a separate
zone? After all, that's what we originally did for brain-damaged
hardware that could only DMA into the low 16M of memory: we just
reserved a separate zone for it. That way, we could do zone-directed
reclaim and free pages in that zone, if that was what was actually
needed.

We could also preferentially avoid using pages from that zone unless
there was no other choice, in order to avoid needing to do that
zone-directed reclaim. Perhaps a similar solution could be used here?

- Ted

2014-09-15 01:10:18

by Joonsoo Kim

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

On Fri, Sep 05, 2014 at 10:14:16AM -0400, Theodore Ts'o wrote:
> On Fri, Sep 05, 2014 at 04:32:48PM +0900, Joonsoo Kim wrote:
> > I also tested another approach, allocating free pages from the CMA
> > reserved region as late as possible, which is also similar to your
> > suggestion, and it doesn't work well. When reclaim starts, too many
> > pages are reclaimed at once, because the LRU lists contain long runs
> > of pages from the CMA region and reclaiming these doesn't help kswapd.
> > kswapd stops reclaiming when the free page count recovers, but CMA
> > pages aren't counted as free pages for kswapd's purposes because they
> > can't be used for unmovable or reclaimable allocations. So kswapd
> > reclaims too many pages at once, unnecessarily.
>
> Have you considered putting the pages of a CMA region in a separate
> zone? After all, that's what we originally did for brain-damaged
> hardware that could only DMA into the low 16M of memory: we just
> reserved a separate zone for it. That way, we could do zone-directed
> reclaim and free pages in that zone, if that was what was actually
> needed.

Sorry for the long delay; it was a long holiday.

No, I haven't considered it. Placing the pages of a CMA region into a
separate zone sounds like a good idea. Perhaps this way we could
remove one of the migratetypes, MIGRATE_CMA, and it would be a good
long-term architecture for CMA.

I don't know the exact history, or the reason why CMA is implemented
in its current form. CCing some experts in this area.

Thanks.

2014-09-15 06:37:54

by Minchan Kim

Subject: Re: [PATCHv4 0/3] new APIs to allocate buffer-cache with user specific flag

On Mon, Sep 15, 2014 at 10:10:18AM +0900, Joonsoo Kim wrote:
> On Fri, Sep 05, 2014 at 10:14:16AM -0400, Theodore Ts'o wrote:
> > On Fri, Sep 05, 2014 at 04:32:48PM +0900, Joonsoo Kim wrote:
> > > I also tested another approach, allocating free pages from the CMA
> > > reserved region as late as possible, which is also similar to your
> > > suggestion, and it doesn't work well. When reclaim starts, too many
> > > pages are reclaimed at once, because the LRU lists contain long runs
> > > of pages from the CMA region and reclaiming these doesn't help kswapd.
> > > kswapd stops reclaiming when the free page count recovers, but CMA
> > > pages aren't counted as free pages for kswapd's purposes because they
> > > can't be used for unmovable or reclaimable allocations. So kswapd
> > > reclaims too many pages at once, unnecessarily.
> >
> > Have you considered putting the pages of a CMA region in a separate
> > zone? After all, that's what we originally did for brain-damaged
> > hardware that could only DMA into the low 16M of memory: we just
> > reserved a separate zone for it. That way, we could do zone-directed
> > reclaim and free pages in that zone, if that was what was actually
> > needed.
>
> Sorry for the long delay; it was a long holiday.
>
> No, I haven't considered it. Placing the pages of a CMA region into a
> separate zone sounds like a good idea. Perhaps this way we could
> remove one of the migratetypes, MIGRATE_CMA, and it would be a good
> long-term architecture for CMA.

IIRC, Mel suggested two options: a ZONE_MOVABLE zone and
MIGRATE_ISOLATE. The movable zone option is absolutely the better
solution if we consider the interaction with reclaim, but one problem
was that CMA had a specific requirement for memory in the middle of an
existing zone. And his concern has come true:
look at https://lkml.org/lkml/2014/5/28/64.
It starts to add more stuff to the allocator's fast path to overcome
the problem. :(

Let's rethink. We already have logic to handle overlapping nodes/zones
in compaction.c, so isn't it possible to have discrete address ranges
in a movable zone? If so, the movable zone could include the specific
ranges these horrible devices want, which could make the
allocation/reclaim logic simpler than it is now while adding overhead
only to the slow path (i.e., the linear pfn-scanning logic over a
zone, as in compaction).

>
> I don't know the exact history, or the reason why CMA is implemented
> in its current form. CCing some experts in this area.
>
> Thanks.

--
Kind regards,
Minchan Kim