2012-11-29 06:55:26

by Lin Feng

[permalink] [raw]
Subject: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

Hi all,

We encounter a "Resource temporarily unavailable" fail while trying
to offline a memory section in a movable zone. We found that there are
some pages can't be migrated. The offline operation fails in function
migrate_page_move_mapping() returning -EAGAIN till timeout because
the if assertion 'page_count(page) != 1' fails.
I wonder in the case 'page_count(page) != 1', should we always wait
(return -EAGAING)? Or in other words, can we do something here for
migration if we know where the pages from?

And finally found that such pages are used by /sbin/multipathd in the form
of aio ring_pages. Besides once increment introduced by the offline calling
chain, another increment is added by aio_setup_ring() via callling
get_userpages(), it won't decrease until we call aio_free_ring().

The dump_page info in the offline context is showed as following:
page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)

The multipathd seems never going to release the ring_pages until we reboot the box.
Furthermore, if some guy makes app which only calls io_setup() but never calls
io_destroy() for the reason that he has to keep the io_setup() for a long time
or just forgets to or even on purpose that we can't expect.
So I think the mm-hotplug framwork should get the capability to deal with such
situation. And should we consider adding migration support for such pages?

However I don't know if there are any other kinds of such particular pages in
current kernel/Linux system. If unluckily there are many apparently it's hard to
handle them all, just adding migrate support for aio ring_pages is insufficient.

But if luckily can we use the private field of page struct to track the
ring_pages[] pointer so that we can retrieve the user when migrate?
Doing so another problem occurs, how to distinguish such special pages?
Use pageflag may cause an impact on current pageflag layout, add new pageflag
item also seems to be impossible.

I'm not sure what way is the right approach, seeking for help.
Any comments are extremely needed, thanks :)

Thanks,
linfeng


2012-11-29 23:39:33

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Thu, 29 Nov 2012 14:54:58 +0800
Lin Feng <[email protected]> wrote:

> Hi all,
>
> We encounter a "Resource temporarily unavailable" fail while trying
> to offline a memory section in a movable zone. We found that there are
> some pages can't be migrated. The offline operation fails in function
> migrate_page_move_mapping() returning -EAGAIN till timeout because
> the if assertion 'page_count(page) != 1' fails.
> I wonder in the case 'page_count(page) != 1', should we always wait
> (return -EAGAING)? Or in other words, can we do something here for
> migration if we know where the pages from?
>
> And finally found that such pages are used by /sbin/multipathd in the form
> of aio ring_pages. Besides once increment introduced by the offline calling
> chain, another increment is added by aio_setup_ring() via callling
> get_userpages(), it won't decrease until we call aio_free_ring().
>
> The dump_page info in the offline context is showed as following:
> page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
>
> The multipathd seems never going to release the ring_pages until we reboot the box.
> Furthermore, if some guy makes app which only calls io_setup() but never calls
> io_destroy() for the reason that he has to keep the io_setup() for a long time
> or just forgets to or even on purpose that we can't expect.
> So I think the mm-hotplug framwork should get the capability to deal with such
> situation. And should we consider adding migration support for such pages?
>
> However I don't know if there are any other kinds of such particular pages in
> current kernel/Linux system. If unluckily there are many apparently it's hard to
> handle them all, just adding migrate support for aio ring_pages is insufficient.
>
> But if luckily can we use the private field of page struct to track the
> ring_pages[] pointer so that we can retrieve the user when migrate?
> Doing so another problem occurs, how to distinguish such special pages?
> Use pageflag may cause an impact on current pageflag layout, add new pageflag
> item also seems to be impossible.
>
> I'm not sure what way is the right approach, seeking for help.
> Any comments are extremely needed, thanks :)

Tricky.

I expect the same problem would occur with pages which are under
O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
periods, but the durations could still be lengthy (seconds).

Worse is a futex page, which could easily remain pinned indefinitely.

The best I can think of is to make changes in or around
get_user_pages(), to steal the pages from userspace and replace them
with non-movable ones before pinning them. The performance cost of
something like this would surely be unacceptable for direct-io, but
maybe OK for the aio ring and futexes.

2012-11-30 00:04:45

by Zach Brown

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.

In the aio case it seems like it could be taught to populate the mapping
with non-movable pages to begin with. It's calling get_user_pages() a
few lines after instantiating the mapping itself with do_mmap_pgoff().

- z

2012-11-30 03:40:03

by Lin Feng

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

Hi Zach,

Thanks for your advice. So agree, I will look into it to lead aio
to use non-movable pages.


Thanks,
linfeng
On 11/30/2012 08:04 AM, Zach Brown wrote:
>> The best I can think of is to make changes in or around
>> get_user_pages(), to steal the pages from userspace and replace them
>> with non-movable ones before pinning them. The performance cost of
>> something like this would surely be unacceptable for direct-io, but
>> maybe OK for the aio ring and futexes.
>
> In the aio case it seems like it could be taught to populate the mapping
> with non-movable pages to begin with. It's calling get_user_pages() a
> few lines after instantiating the mapping itself with do_mmap_pgoff().
>
> - z
>

--
--------------------------------------------------
Lin Feng
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road,
Nanjing, 210012, China
PHONE:+86-25-86630566-8557
COINS:7998-8557
FAX:+86-25-83317685
MAIL:[email protected]
--------------------------------------------------

2012-11-30 03:42:24

by Lin Feng

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

hi Andrew,

On 11/30/2012 07:39 AM, Andrew Morton wrote:
> Tricky.
>
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
seem maybe not a problem for the moment.
>
> Worse is a futex page, which could easily remain pinned indefinitely.
>
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
thanks for your advice.
I want to limit the impact as little as possible, as mentioned above,
direct-io seems not a problem, we needn't touch them. Maybe we can
just change the use of get_user_pages()(in or around) such as aio
ring pages. I will try to find a way to do this.

Thanks,
linfeng

2012-11-30 05:57:59

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <[email protected]> wrote:

> hi Andrew,
>
> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > Tricky.
> >
> > I expect the same problem would occur with pages which are under
> > O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> > periods, but the durations could still be lengthy (seconds).
> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
> seem maybe not a problem for the moment.
> >
> > Worse is a futex page, which could easily remain pinned indefinitely.
> >
> > The best I can think of is to make changes in or around
> > get_user_pages(), to steal the pages from userspace and replace them
> > with non-movable ones before pinning them. The performance cost of
> > something like this would surely be unacceptable for direct-io, but
> > maybe OK for the aio ring and futexes.
> thanks for your advice.
> I want to limit the impact as little as possible, as mentioned above,
> direct-io seems not a problem, we needn't touch them. Maybe we can
> just change the use of get_user_pages()(in or around) such as aio
> ring pages. I will try to find a way to do this.

What about futexes?

2012-11-30 07:01:45

by Lin Feng

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined



On 11/30/2012 01:57 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <[email protected]> wrote:
>
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
>> seem maybe not a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them. The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> thanks for your advice.
>> I want to limit the impact as little as possible, as mentioned above,
>> direct-io seems not a problem, we needn't touch them. Maybe we can
>> just change the use of get_user_pages()(in or around) such as aio
>> ring pages. I will try to find a way to do this.
>
> What about futexes?
hi Andrew,

Yes, better to find an approach to solve them all.

But I'm worried about that if we just confine get_user_pages() to use
none-movable pages, it will drain the none-movable pages soon. Because
there are many places using get_user_pages() such as some drivers.

IMHO in most cases get_user_pages() callers should release the pages soon,
so pages allocated from movable zone should be OK. But I'm not sure if
we get such rule upon get_user_pages().
And in other cases we specify get_user_pages() to allocate pages from
none-movable zone.

So could we add a zone-alloc flags when we call get_user_pages()?

Thanks,
linfeng

>

2012-11-30 07:13:47

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

(2012/11/30 14:57), Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <[email protected]> wrote:
>
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
>> seem maybe not a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them. The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> thanks for your advice.
>> I want to limit the impact as little as possible, as mentioned above,
>> direct-io seems not a problem, we needn't touch them. Maybe we can
>> just change the use of get_user_pages()(in or around) such as aio
>> ring pages. I will try to find a way to do this.
>
> What about futexes?
>

IIUC, futex's key is now a pair of (mm,address) or (inode, pgoff).
Then, get_user_page() in futex.c will release the page by put_page().
'struct page' is just touched by get_futex_key() to obtain page->mapping info.

Thanks,
-Kame




2012-11-30 07:55:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <[email protected]> wrote:

>
>
> On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <[email protected]> wrote:
> >
> >> hi Andrew,
> >>
> >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> >>> Tricky.
> >>>
> >>> I expect the same problem would occur with pages which are under
> >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> >>> periods, but the durations could still be lengthy (seconds).
> >> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
> >> seem maybe not a problem for the moment.
> >>>
> >>> Worse is a futex page, which could easily remain pinned indefinitely.
> >>>
> >>> The best I can think of is to make changes in or around
> >>> get_user_pages(), to steal the pages from userspace and replace them
> >>> with non-movable ones before pinning them. The performance cost of
> >>> something like this would surely be unacceptable for direct-io, but
> >>> maybe OK for the aio ring and futexes.
> >> thanks for your advice.
> >> I want to limit the impact as little as possible, as mentioned above,
> >> direct-io seems not a problem, we needn't touch them. Maybe we can
> >> just change the use of get_user_pages()(in or around) such as aio
> >> ring pages. I will try to find a way to do this.
> >
> > What about futexes?
> hi Andrew,
>
> Yes, better to find an approach to solve them all.
>
> But I'm worried about that if we just confine get_user_pages() to use
> none-movable pages, it will drain the none-movable pages soon. Because
> there are many places using get_user_pages() such as some drivers.

Obviously we shouldn't change get_user_pages() for all callers.

> IMHO in most cases get_user_pages() callers should release the pages soon,
> so pages allocated from movable zone should be OK. But I'm not sure if
> we get such rule upon get_user_pages().
> And in other cases we specify get_user_pages() to allocate pages from
> none-movable zone.
>
> So could we add a zone-alloc flags when we call get_user_pages()?

Well, that's a fairly low-level implementation detail. A more typical
approach would be to add a new get_user_pages_non_movable() or such.
That would probably have the same signature as get_user_pages(), with
one additional argument. Then get_user_pages() becomes a one-line
wrapper which passes in a particular value of that argument.

But that means we'd also have to add get_user_pages_fast_non_movable()
and things might become a bit stupid. A better approach might be to
add a new library function which callers can use before (or after?)
calling get_user_pages[_fast]().

Unsure. It's the sort of thing where one has to dive in and try a few
things.

2012-11-30 08:00:54

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Fri, 30 Nov 2012 16:13:16 +0900 Kamezawa Hiroyuki <[email protected]> wrote:

> > What about futexes?
> >
>
> IIUC, futex's key is now a pair of (mm,address) or (inode, pgoff).
> Then, get_user_page() in futex.c will release the page by put_page().
> 'struct page' is just touched by get_futex_key() to obtain page->mapping info.

Ah yes, that page is unpinned before syscall return.

grep -rl get_user_pages .

Gad.

These should be audited. The great majority will be simple and OK,
but drivers/media, drivers/infiniband and net/rds could be problematic.

2012-11-30 10:29:51

by Lin Feng

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined



On 11/30/2012 03:55 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <[email protected]> wrote:
>
>>
>>
>> On 11/30/2012 01:57 PM, Andrew Morton wrote:
>>> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <[email protected]> wrote:
>>>
>>>> hi Andrew,
>>>>
>>>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>>>> Tricky.
>>>>>
>>>>> I expect the same problem would occur with pages which are under
>>>>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
>>>>> periods, but the durations could still be lengthy (seconds).
>>>> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
>>>> seem maybe not a problem for the moment.
>>>>>
>>>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>>>
>>>>> The best I can think of is to make changes in or around
>>>>> get_user_pages(), to steal the pages from userspace and replace them
>>>>> with non-movable ones before pinning them. The performance cost of
>>>>> something like this would surely be unacceptable for direct-io, but
>>>>> maybe OK for the aio ring and futexes.
>>>> thanks for your advice.
>>>> I want to limit the impact as little as possible, as mentioned above,
>>>> direct-io seems not a problem, we needn't touch them. Maybe we can
>>>> just change the use of get_user_pages()(in or around) such as aio
>>>> ring pages. I will try to find a way to do this.
>>>
>>> What about futexes?
>> hi Andrew,
>>
>> Yes, better to find an approach to solve them all.
>>
>> But I'm worried about that if we just confine get_user_pages() to use
>> none-movable pages, it will drain the none-movable pages soon. Because
>> there are many places using get_user_pages() such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
>> IMHO in most cases get_user_pages() callers should release the pages soon,
>> so pages allocated from movable zone should be OK. But I'm not sure if
>> we get such rule upon get_user_pages().
>> And in other cases we specify get_user_pages() to allocate pages from
>> none-movable zone.
>>
>> So could we add a zone-alloc flags when we call get_user_pages()?
>
> Well, that's a fairly low-level implementation detail. A more typical
> approach would be to add a new get_user_pages_non_movable() or such.
> That would probably have the same signature as get_user_pages(), with
> one additional argument. Then get_user_pages() becomes a one-line
> wrapper which passes in a particular value of that argument.
>
> But that means we'd also have to add get_user_pages_fast_non_movable()
> and things might become a bit stupid. A better approach might be to
hi Andrew,

Thanks for your patient reply.

What I can think out is like following:
inline int generic_get_user_pages(..., int movable_flag)
{
if (0 == movable_flag)
return get_user_pages();
else if (1 == movable_flag)
return get_user_pages_non_movable();
}
Yes, that seems to add a lot of duplicated codes.

> add a new library function which callers can use before (or after?)
> calling get_user_pages[_fast]().
Sorry, I'm not quite understand what "library function" function means..
Does it means a function aids get_user_pages() or totally wraps/replaces
get_user_pages(), or none of above?

Thanks,
linfeng
>
> Unsure. It's the sort of thing where one has to dive in and try a few
> things.
ah, maybe more complicated than as I can expect..
>

2012-11-30 10:48:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Fri, 30 Nov 2012 18:29:30 +0800 Lin Feng <[email protected]> wrote:

> > add a new library function which callers can use before (or after?)
> > calling get_user_pages[_fast]().
> Sorry, I'm not quite understand what "library function" function means..
> Does it means a function aids get_user_pages() or totally wraps/replaces
> get_user_pages(), or none of above?

"library function" is terminology for a general facility which
the core kernel makes available to other parts of the kernel.
get_user_pages() is a library function, as are the functions in lib/,
etc. "grep EXPORT_SYMBOL ./*/*.c"

2012-11-30 10:57:22

by Mel Gorman

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Thu, Nov 29, 2012 at 03:39:30PM -0800, Andrew Morton wrote:
> On Thu, 29 Nov 2012 14:54:58 +0800
> Lin Feng <[email protected]> wrote:
>
> > Hi all,
> >
> > We encounter a "Resource temporarily unavailable" fail while trying
> > to offline a memory section in a movable zone. We found that there are
> > some pages can't be migrated. The offline operation fails in function
> > migrate_page_move_mapping() returning -EAGAIN till timeout because
> > the if assertion 'page_count(page) != 1' fails.
> > I wonder in the case 'page_count(page) != 1', should we always wait
> > (return -EAGAING)? Or in other words, can we do something here for
> > migration if we know where the pages from?
> >
> > And finally found that such pages are used by /sbin/multipathd in the form
> > of aio ring_pages. Besides once increment introduced by the offline calling
> > chain, another increment is added by aio_setup_ring() via callling
> > get_userpages(), it won't decrease until we call aio_free_ring().
> >
> > The dump_page info in the offline context is showed as following:
> > page:ffffea0011e69140 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1d
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fb0480 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1c
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011fbaa80 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1a
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> > page:ffffea0011ff21c0 count:2 mapcount:0 mapping:ffff8801d6949881 index:0x7fc4b6d1b
> > page flags: 0x30000000018081d(locked|referenced|uptodate|dirty|swapbacked|unevictable)
> >
> > The multipathd seems never going to release the ring_pages until we reboot the box.
> > Furthermore, if some guy makes app which only calls io_setup() but never calls
> > io_destroy() for the reason that he has to keep the io_setup() for a long time
> > or just forgets to or even on purpose that we can't expect.
> > So I think the mm-hotplug framwork should get the capability to deal with such
> > situation. And should we consider adding migration support for such pages?
> >
> > However I don't know if there are any other kinds of such particular pages in
> > current kernel/Linux system. If unluckily there are many apparently it's hard to
> > handle them all, just adding migrate support for aio ring_pages is insufficient.
> >
> > But if luckily can we use the private field of page struct to track the
> > ring_pages[] pointer so that we can retrieve the user when migrate?
> > Doing so another problem occurs, how to distinguish such special pages?
> > Use pageflag may cause an impact on current pageflag layout, add new pageflag
> > item also seems to be impossible.
> >
> > I'm not sure what way is the right approach, seeking for help.
> > Any comments are extremely needed, thanks :)
>
> Tricky.
>
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
>
> Worse is a futex page, which could easily remain pinned indefinitely.
>
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them. The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
>

If this happens then it would be preferred if this only happened for
ZONE_MOVABLE. If it generally happens it means we're going to have a lot
more MIGRATE_UNMOVABLE pageblocks and a lot more fragmentation leading
to lower THP availability. For THP, we're ok if some pageblocks are
temporarily unavailable or even unavailable for long periods of time,
we can cope with that but we (or I at least) do not want to lower THP
availability on systems that do not care about ZONE_MOVABLE or node hot-plug.

--
Mel Gorman
SUSE Labs

2012-11-30 11:01:06

by Mel Gorman

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Thu, Nov 29, 2012 at 11:55:02PM -0800, Andrew Morton wrote:
> On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng <[email protected]> wrote:
>
> >
> >
> > On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng <[email protected]> wrote:
> > >
> > >> hi Andrew,
> > >>
> > >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > >>> Tricky.
> > >>>
> > >>> I expect the same problem would occur with pages which are under
> > >>> O_DIRECT I/O. Obviously O_DIRECT pages won't be pinned for such long
> > >>> periods, but the durations could still be lengthy (seconds).
> > >> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages
> > >> seem maybe not a problem for the moment.
> > >>>
> > >>> Worse is a futex page, which could easily remain pinned indefinitely.
> > >>>
> > >>> The best I can think of is to make changes in or around
> > >>> get_user_pages(), to steal the pages from userspace and replace them
> > >>> with non-movable ones before pinning them. The performance cost of
> > >>> something like this would surely be unacceptable for direct-io, but
> > >>> maybe OK for the aio ring and futexes.
> > >> thanks for your advice.
> > >> I want to limit the impact as little as possible, as mentioned above,
> > >> direct-io seems not a problem, we needn't touch them. Maybe we can
> > >> just change the use of get_user_pages()(in or around) such as aio
> > >> ring pages. I will try to find a way to do this.
> > >
> > > What about futexes?
> > hi Andrew,
> >
> > Yes, better to find an approach to solve them all.
> >
> > But I'm worried about that if we just confine get_user_pages() to use
> > none-movable pages, it will drain the none-movable pages soon. Because
> > there are many places using get_user_pages() such as some drivers.
>
> Obviously we shouldn't change get_user_pages() for all callers.
>
> > IMHO in most cases get_user_pages() callers should release the pages soon,
> > so pages allocated from movable zone should be OK. But I'm not sure if
> > we get such rule upon get_user_pages().
> > And in other cases we specify get_user_pages() to allocate pages from
> > none-movable zone.
> >
> > So could we add a zone-alloc flags when we call get_user_pages()?
>
> Well, that's a fairly low-level implementation detail. A more typical
> approach would be to add a new get_user_pages_non_movable() or such.
> That would probably have the same signature as get_user_pages(), with
> one additional argument. Then get_user_pages() becomes a one-line
> wrapper which passes in a particular value of that argument.
>

That is going in the direction that all pinned pages become MIGRATE_UNMOVABLE
allocations. That will impact THP availability by increasing the number
of MIGRATE_UNMOVABLE blocks that exist and it would hit every user --
not just those that care about ZONE_MOVABLE.

I'm likely to NAK such a patch if it's only about node hot-remove because
it's much more of a corner case than wanting to use THP.

I would prefer if get_user_pages() checked if the page it was about to
pin was in ZONE_MOVABLE and if so, migrate it at that point before it's
pinned. It'll be expensive but will guarantee ZONE_MOVABLE availability
if that's what they want. The CMA people might also want to take
advantage of this if the page happened to be in the MIGRATE_CMA
pageblock.

--
Mel Gorman
SUSE Labs

2012-11-30 15:24:28

by Domenico Andreoli

[permalink] [raw]
Subject: Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

On Thu, Nov 29, 2012 at 02:54:58PM +0800, Lin Feng wrote:
> Hi all,

Hi Lin,

> We encounter a "Resource temporarily unavailable" fail while trying
> to offline a memory section in a movable zone. We found that there are
> some pages can't be migrated. The offline operation fails in function
> migrate_page_move_mapping() returning -EAGAIN till timeout because
> the if assertion 'page_count(page) != 1' fails.

is this something that worked before? if yes (then it's a regression)
do you know with which kernel?

Thanks,
Domenico