2022-05-14 02:14:36

by syzbot

[permalink] [raw]
Subject: [syzbot] WARNING in follow_hugetlb_page

Hello,

syzbot found the following issue on:

HEAD commit: 1e1b28b936ae Add linux-next specific files for 20220513
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1480e611f00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e4eb3c0c4b289571
dashboard link: https://syzkaller.appspot.com/bug?extid=acf65ca584991f3cc447
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

------------[ cut here ]------------
WARNING: CPU: 1 PID: 6653 at mm/hugetlb.c:6250 follow_hugetlb_page+0x1326/0x1c80 mm/hugetlb.c:6250
Modules linked in:
CPU: 1 PID: 6653 Comm: syz-executor.1 Not tainted 5.18.0-rc6-next-20220513-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:follow_hugetlb_page+0x1326/0x1c80 mm/hugetlb.c:6250
Code: 89 44 24 70 e8 2b 1d b7 ff 48 8b 44 24 70 48 85 c0 0f 84 f1 07 00 00 e8 88 1a b7 ff 48 83 ed 01 e9 09 fb ff ff e8 7a 1a b7 ff <0f> 0b 48 8b 7c 24 30 bb f4 ff ff ff e8 69 74 b8 07 4c 8b a4 24 b8
RSP: 0018:ffffc9000d04f7e0 EFLAGS: 00010212
RAX: 00000000000030f2 RBX: ffff88807eb336c0 RCX: ffffc90003331000
RDX: 0000000000040000 RSI: ffffffff81c38f76 RDI: 0000000000000003
RBP: ffffea0001fd8080 R08: 0000000000000000 R09: 0000000000000003
R10: ffffffff81b128fb R11: 0000000000000057 R12: 0000000000000002
R13: ffff88807eb336c0 R14: ffff88807eb33d80 R15: 0000000000000001
FS: 00007f59bc2a1700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f59bb0d6720 CR3: 000000001eaf8000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__get_user_pages+0x27a/0xfa0 mm/gup.c:1146
__get_user_pages_locked mm/gup.c:1365 [inline]
__gup_longterm_locked+0x1d5/0xfe0 mm/gup.c:1985
pin_user_pages+0x8e/0xe0 mm/gup.c:3118
io_sqe_buffer_register+0x254/0x1710 fs/io_uring.c:10537
io_sqe_buffers_register.cold+0x28e/0x443 fs/io_uring.c:10664
__io_uring_register fs/io_uring.c:12682 [inline]
__do_sys_io_uring_register+0xd21/0x1930 fs/io_uring.c:12816
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f59bb0890e9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f59bc2a1168 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00007f59bb19bf60 RCX: 00007f59bb0890e9
RDX: 0000000020000080 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007f59bb0e308d R08: 0000000000000000 R09: 0000000000000000
R10: 1000000000000239 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd53b6ab6f R14: 00007f59bc2a1300 R15: 0000000000022000
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2022-05-14 02:19:39

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

syzbot has found a reproducer for the following issue on:

HEAD commit: 1e1b28b936ae Add linux-next specific files for 20220513
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=174ae715f00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e4eb3c0c4b289571
dashboard link: https://syzkaller.appspot.com/bug?extid=acf65ca584991f3cc447
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11531766f00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16ce5a9ef00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

------------[ cut here ]------------
WARNING: CPU: 1 PID: 3611 at mm/hugetlb.c:6250 follow_hugetlb_page+0x1326/0x1c80 mm/hugetlb.c:6250
Modules linked in:
CPU: 1 PID: 3611 Comm: syz-executor603 Not tainted 5.18.0-rc6-next-20220513-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:follow_hugetlb_page+0x1326/0x1c80 mm/hugetlb.c:6250
Code: 89 44 24 70 e8 2b 1d b7 ff 48 8b 44 24 70 48 85 c0 0f 84 f1 07 00 00 e8 88 1a b7 ff 48 83 ed 01 e9 09 fb ff ff e8 7a 1a b7 ff <0f> 0b 48 8b 7c 24 30 bb f4 ff ff ff e8 69 74 b8 07 4c 8b a4 24 b8
RSP: 0018:ffffc90002f6f7e0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88801bdd4e00 RCX: 0000000000000000
RDX: ffff88801e4e3a80 RSI: ffffffff81c38f76 RDI: 0000000000000003
RBP: ffffea0001fe8680 R08: 0000000000000000 R09: 0000000000000003
R10: ffffffff81b128fb R11: 0000000000000008 R12: 000000000000001a
R13: ffff88801bdd4e00 R14: ffff88801bdd5600 R15: 0000000000000019
FS: 0000555556ad2300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000458 CR3: 000000001e850000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__get_user_pages+0x27a/0xfa0 mm/gup.c:1146
__get_user_pages_locked mm/gup.c:1365 [inline]
__gup_longterm_locked+0x1d5/0xfe0 mm/gup.c:1985
pin_user_pages+0x8e/0xe0 mm/gup.c:3118
io_sqe_buffer_register+0x254/0x1710 fs/io_uring.c:10537
io_sqe_buffers_register.cold+0x28e/0x443 fs/io_uring.c:10664
__io_uring_register fs/io_uring.c:12682 [inline]
__do_sys_io_uring_register+0xd21/0x1930 fs/io_uring.c:12816
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f5f42760cc9
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffc3407aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f5f42760cc9
RDX: 0000000020000380 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f5f42724e70 R08: 0000000010000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 00007f5f42724f00
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>


2022-05-14 04:25:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Fri, 13 May 2022 16:54:06 -0700 Minchan Kim <[email protected]> wrote:

> > > >> has been there over a month so I guess it's something else. Does
> > > >> someone have the time to bisect?
> > > >
> > > > I can recreate in my 'easy to debug' environment, so I can bisect in
> > > > parallel with other things I need to do today.
> > > >
> > >
> > > I isolated this to Minchan Kim's "mm: fix is_pinnable_page against on cma
> > > page". Yes, the fat finger fix is in next-20220513.
> > >
> > > I don't have time to analyze right now, but can confirm that in the
> > > reproducer is_pinnable_page is returning false after this change when it
> > > previously returned true.
> >
> > OK, thanks, I dropped mm-fix-is_pinnable_page-against-on-cma-page.patch
>
> Seems like bug of the patch v5 due to change of this
>
> if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE))
>
> The migration type is not bit type so it shold be
>
> if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
>

argh, I meant to check that change but the grey cell died.

I'll bring it back, with

--- a/include/linux/mm.h~mm-fix-is_pinnable_page-against-on-cma-page-fix
+++ a/include/linux/mm.h
@@ -1635,7 +1635,7 @@ static inline bool is_pinnable_page(stru
int __mt = get_pageblock_migratetype(page);
int mt = __READ_ONCE(__mt);

- if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE))
+ if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
return false;
#endif

_


2022-05-14 04:25:45

by John Hubbard

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/13/22 16:54, Minchan Kim wrote:
>>> I isolated this to Minchan Kim's "mm: fix is_pinnable_page against on cma
>>> page". Yes, the fat finger fix is in next-20220513.
>>>
>>> I don't have time to analyze right now, but can confirm that in the
>>> reproducer is_pinnable_page is returning false after this change when it
>>> previously returned true.
>>
>> OK, thanks, I dropped mm-fix-is_pinnable_page-against-on-cma-page.patch
>
> Seems like bug of the patch v5 due to change of this
>
> if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE))
>
> The migration type is not bit type so it shold be
>
> if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
>

Sorry for leading you astray by recommending the bitwise OR, Minchan.
I overlooked that point even though it was right in front of me.


thanks,
--
John Hubbard
NVIDIA


2022-05-14 04:25:49

by John Hubbard

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/13/22 17:26, Minchan Kim wrote:
> Anything else further can we get insight from the warning?
>
> For example, pin_user_pages going on against a hugetlb page
> which are concurrently running alloc_contig_range(it's
> exported function so anyone can call randomly) so
> alloc_contig_range changes pageblock type as MIGRATE_ISOLATE
> under us so the hit at the warning?

Well, yes. First of all, the comments above the warning that fired have
gone a little bit stale: they claim that we can only hit the warning if
the page refcount overflows. However, we almost certainly got here via:

try_grab_folio()
/*
* Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
* right zone, so fail and let the caller fall back to the slow
* path.
*/
if (unlikely((flags & FOLL_LONGTERM) &&
!is_pinnable_page(page))) /* which we just changed */
return NULL;

...and now I'm starting to think that this warning might fire even with
the corrected check for MIGRATE_CMA || MIGRATE_ISOLATE. Because
try_grab_folio() didn't always have this early exit and it is starting
to look wrong.

Simply attempting to pin a non-pinnable huge page would hit this
warning. Adding additional reasons that a page is not pinnable (which
the patch does) could make this more likely to fire.

I need to look at this a little more closely, it is making me wonder
whether the is_pinnable_page() check is a problem in this path. The
comment in try_grab_folio() indicates that the early return is a hack
(it assumes that the caller is in the gup fast path), and maybe the hack
is just wrong here--I think we're actually on the slow gup path. Not
good.

Mike, any thoughts here?



thanks,
--
John Hubbard
NVIDIA

2022-05-14 04:26:23

by Minchan Kim

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Fri, May 13, 2022 at 05:09:11PM -0700, John Hubbard wrote:
> On 5/13/22 16:54, Minchan Kim wrote:
> > > > I isolated this to Minchan Kim's "mm: fix is_pinnable_page against on cma
> > > > page". Yes, the fat finger fix is in next-20220513.
> > > >
> > > > I don't have time to analyze right now, but can confirm that in the
> > > > reproducer is_pinnable_page is returning false after this change when it
> > > > previously returned true.
> > >
> > > OK, thanks, I dropped mm-fix-is_pinnable_page-against-on-cma-page.patch
> >
> > Seems like bug of the patch v5 due to change of this
> >
> > if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE))
> >
> > The migration type is not bit type so it shold be
> >
> > if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
> >
>
> Sorry for leading you astray by recommending the bitwise OR, Minchan.
> I overlooked that point even though it was right in front of me.

No worry, John.

Anything else further can we get insight from the warning?

For example, pin_user_pages going on against a hugetlb page
which are concurrently running alloc_contig_range(it's
exported function so anyone can call randomly) so
alloc_contig_range changes pageblock type as MIGRATE_ISOLATE
under us so the hit at the warning?

2022-05-14 04:28:01

by John Hubbard

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/13/22 17:56, John Hubbard wrote:
> On 5/13/22 17:26, Minchan Kim wrote:
>> Anything else further can we get insight from the warning?
>>
>> For example, pin_user_pages going on against a hugetlb page
>> which are concurrently running alloc_contig_range(it's
>> exported function so anyone can call randomly) so
>> alloc_contig_range changes pageblock type as MIGRATE_ISOLATE
>> under us so the hit at the warning?
>
> Well, yes. First of all, the comments above the warning that fired have
> gone a little bit stale: they claim that we can only hit the warning if
> the page refcount overflows. However, we almost certainly got here via:
>
> try_grab_folio()
>     /*
>      * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>      * right zone, so fail and let the caller fall back to the slow
>      * path.
>      */
>     if (unlikely((flags & FOLL_LONGTERM) &&
>              !is_pinnable_page(page))) /* which we just changed */

Specifically, the recent patch effectively acted as an error injection
test, by forcing is_pinnable_page() to always return true (if CONFIG_CMA
is defined). Because: MIGRATE_CMA|MIGRATE_ISOLATE == 7, which will match
any of the MIGRATE_* enums when checked with bitwise AND.

I suspect this particular error path has not been exercised much, or if
it has, not reported here anyway. Until now.


thanks,
--
John Hubbard
NVIDIA

2022-05-17 13:59:22

by Mike Kravetz

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/13/22 17:56, John Hubbard wrote:
> On 5/13/22 17:26, Minchan Kim wrote:
>> Anything else further can we get insight from the warning?
>>
>> For example, pin_user_pages going on against a hugetlb page
>> which are concurrently running alloc_contig_range(it's
>> exported function so anyone can call randomly) so
>> alloc_contig_range changes pageblock type as MIGRATE_ISOLATE
>> under us so the hit at the warning?
>
> Well, yes. First of all, the comments above the warning that fired have
> gone a little bit stale: they claim that we can only hit the warning if
> the page refcount overflows. However, we almost certainly got here via:

Yes, the comment is stale.

John, you added that comment with commit 3faa52c03f44. At that time,
the code was doing a try_grab_page(), and this routine does not check
for pinnable page in any manner. So, the comment was accurate at that
time.

Later, that code was modified (for performance reasons) in commit 0fa5bc4023c18 to do a single try_grab_compound_page() instead of multiple
calls to try_grab_page(). At the time it was not noticed that
try_grab_compound_page had a check for 'pinnable' when try_grab_page did
not. So, I think this commit actually changed the behavior.

>
> try_grab_folio()
>     /*
>      * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>      * right zone, so fail and let the caller fall back to the slow
>      * path.
>      */
>     if (unlikely((flags & FOLL_LONGTERM) &&
>              !is_pinnable_page(page))) /* which we just changed */
>         return NULL;
>
> ...and now I'm starting to think that this warning might fire even with
> the corrected check for MIGRATE_CMA || MIGRATE_ISOLATE. Because
> try_grab_folio() didn't always have this early exit and it is starting
> to look wrong.
>
> Simply attempting to pin a non-pinnable huge page would hit this
> warning. Adding additional reasons that a page is not pinnable (which
> the patch does) could make this more likely to fire.

Yes, that is correct. One could easily allocate a hugetlb page from
CMA and trigger this warning.

>
> I need to look at this a little more closely, it is making me wonder
> whether the is_pinnable_page() check is a problem in this path. The
> comment in try_grab_folio() indicates that the early return is a hack
> (it assumes that the caller is in the gup fast path), and maybe the hack
> is just wrong here--I think we're actually on the slow gup path. Not
> good.
>
> Mike, any thoughts here?
>

Do you know why try_grab_compound_page(now try_grab_folio) checks for
pinnable when try_grab_page does not?

Then I guess the next question is 'Should we allow pinning of hugetlb pages
in these areas?'. My first thought would be no. But, recall it was 'allowed'
until that commit which changed try_grab_page to try_grab_compound_page.
In the 'common' case of compaction, we do not attempt to migrate/move hugetlb
pages (last time I looked), so long term pinning should not be an issue.
However, for things like memory offline or alloc_contig_range() we want to
migrate hugetlb pages, so this would be an issue there.

At a minimum, I think the warning should go.

More thoughts?
--
Mike Kravetz

2022-05-18 07:17:09

by John Hubbard

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/16/22 20:37, Mike Kravetz wrote:

> Later, that code was modified (for performance reasons) in commit 0fa5bc4023c18 to do a single try_grab_compound_page() instead of multiple
> calls to try_grab_page(). At the time it was not noticed that
> try_grab_compound_page had a check for 'pinnable' when try_grab_page did
> not. So, I think this commit actually changed the behavior.

This makes me unhappy with the try_grab_page() situation: the name
doesn't give you any hint that it now has extra policy in there.
Furthermore, the policy is unaware of all of the call sites and
is already getting it wrong.

After looking through the call sites, I'm leaning slightly toward pulling
is_pinnable_page() out, and letting the call sites decide if they want
or need to apply that policy.

Just because try_grab_folio() is a choke point, does not mean that it
is the right place for the is_pinnable_page() checks. And while it's
nice to avoid scattering is_pinnable_page() all over the place, it's
more important to *not* have it when it is not correct.

There is a counter-argument, though, which goes something like,
"try_grab_folio(FOLL_PIN) is never ever correct if the page is not
pinnable". But that's not as clear cut as one might think. See below.

>
>>
>> try_grab_folio()
>>     /*
>>      * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>>      * right zone, so fail and let the caller fall back to the slow
>>      * path.
>>      */
>>     if (unlikely((flags & FOLL_LONGTERM) &&
>>              !is_pinnable_page(page))) /* which we just changed */
>>         return NULL;
>>
>> ...and now I'm starting to think that this warning might fire even with
>> the corrected check for MIGRATE_CMA || MIGRATE_ISOLATE. Because
>> try_grab_folio() didn't always have this early exit and it is starting
>> to look wrong.
>>
>> Simply attempting to pin a non-pinnable huge page would hit this
>> warning. Adding additional reasons that a page is not pinnable (which
>> the patch does) could make this more likely to fire.
>
> Yes, that is correct. One could easily allocate a hugetlb page from
> CMA and trigger this warning.
>

Thanks for confirming that!

>>
>> I need to look at this a little more closely, it is making me wonder
>> whether the is_pinnable_page() check is a problem in this path. The
>> comment in try_grab_folio() indicates that the early return is a hack
>> (it assumes that the caller is in the gup fast path), and maybe the hack
>> is just wrong here--I think we're actually on the slow gup path. Not
>> good.
>>
>> Mike, any thoughts here?
>>
>
> Do you know why try_grab_compound_page(now try_grab_folio) checks for
> pinnable when try_grab_page does not?

I think it's just an oversight. The CMA and migrate-out logic was sort of
retrofitted and I think it's just incomplete, yet.

>
> Then I guess the next question is 'Should we allow pinning of hugetlb pages
> in these areas?'. My first thought would be no. But, recall it was 'allowed'
> until that commit which changed try_grab_page to try_grab_compound_page.
> In the 'common' case of compaction, we do not attempt to migrate/move hugetlb
> pages (last time I looked), so long term pinning should not be an issue.
> However, for things like memory offline or alloc_contig_range() we want to
> migrate hugetlb pages, so this would be an issue there.
>
> At a minimum, I think the warning should go.

Agreed. That, and either a) bring try_grab_folio() and try_grab_page() into
the same behavior with respect to checking for pinnable, or b) lifting
is_pinnable_page() out of try_grab_folio() and letting the callers decide, as
mentioned above.

Your point above makes it seem like the flexibility of (b) might be better...

thanks,
--
John Hubbard
NVIDIA



2022-05-22 11:57:25

by Minchan Kim

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Sat, May 21, 2022 at 06:46:27PM +0200, David Hildenbrand wrote:

< snip >

> >> The general rules are:
> >>
> >> ZONE_MOVABLE: nobody is allowed to place unmovable allocations there; it
> >> could prevent memory offlining/unplug.
> >>
> >> CMA: nobody *but the designated owner* is allowed to place unmovable
> >> memory there; it could prevent the actual owner to allocate contiguous
> >> memory.
> >
> > I am confused what's the meaning of designated owner and actuall owner
> > in your context.
>
> designated==actual here. I just wanted to distinguish from someone
> current temporary owner of the page ("allocated it via a movable
> allocation") but the actual designated owner (e.g., hugetlb CMA)
>
> The page/memory owner terminology is just confusing. Let's rephrase to:
> only the CMA area owner is allowed to place unmovable allocations there.

Yeah, the CMA area owner is much better.

>
> >
> > What I thought about the issue based on you explanation:
> >
> > HugeTLB allocates its page by two types of allocation
> >
> > 1. alloc_pages(GFP_MOVABLE)
> >
> > It could allocate the hugetlb page from CMA area but longterm pin
> > should migrate them out of cma before the pinning so allowing
> > the pinning on the page is no problem and current code works like
> > that.
> >
> > check_and_migrate_movable_pages
> >
>
> Yes.
>
> > 2. cma_alloc
> >
> > The cma_alloc is used only for *gigantic page* and the hugetlbfs
> > is the very owner of the page. IOW, if the hugetlbfs was succeeded
> > to allocate the gigantic page by cma_alloc, there is no other
> > owner to be able to claim the page any longer so it's fine to
> > allow longterm pinning againt the gingantic page but current.
> > However, current code doesn't work like that due to
> > is_pinnable_page. IOW, hugetlbfs need a way to distinguish
> > whether the page owner is hugetlbfs or not.
> >
> > Are we on same page?
>
> Yes, exactly. What I wanted to express is: for huge pages we have to
> make a smarter decision because there are cases where we want to
> migrate, and cases where we don't want to migrate.

Sure, maybe hugetlbfs could squeeze a bit in one of subpage of the
CMA compound page. "I am CMA allocated but allow to pinned for longterm"

Thanks.

2022-05-22 15:24:31

by Minchan Kim

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Fri, May 20, 2022 at 05:04:22PM -0700, Mike Kravetz wrote:
> On 5/20/22 16:43, Minchan Kim wrote:
> > On Fri, May 20, 2022 at 04:31:31PM -0700, Mike Kravetz wrote:
> >> On 5/20/22 15:56, John Hubbard wrote:
> >>> On 5/20/22 15:19, Minchan Kim wrote:
> >>>> The memory offline would be an issue so we shouldn't allow pinning of any
> >>>> pages in *movable zone*.
> >>>>
> >>>> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
> >>>> problem to allow pinning on those area. The matter is what target range
> >>>> on alloc_contig_range is backed by CMA or movable zone and usecases.
> >>>>
> >>>> IOW, movable zone should be never allowed. But CMA case, if pages
> >>>> are used by normal process memory instead of hugeTLB, we shouldn't
> >>>> allow longterm pinning since someone can claim those memory suddenly.
> >>>> However, we are fine to allow longterm pinning if the CMA memory
> >>>> already claimed and mapped at userspace(hugeTLB case IIUC).
> >>>>
> >>>
> >>> From Mike's comments and yours, plus a rather quick reading of some
> >>> CMA-related code in mm/hugetlb.c (free_gigantic_page(), alloc_gigantic_pages()), the following seems true:
> >>>
> >>> a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()
> >>>
> >>> b) while hugetlbfs is using those CMA-allocated pages, it is debatable
> >>> whether those pages should be allowed to be long term pinned. That's
> >>> because there are two cases:
> >>>
> >>> ??? Case 1: pages are longterm pinned, then released, all while
> >>> ??????????? owned by hugetlbfs. No problem.
> >>>
> >>> ??? Case 2: pages are longterm pinned, but then hugetlbfs releases the
> >>> ??????????? pages entirely (via unmounting hugetlbfs, I presume). In
> >>> ??????????? this case, we now have CMA page that are long-term pinned,
> >>> ??????????? and that's the state we want to avoid.
> >>
> >> I do not think case 2 can happen. A hugetlb page can only be changed back
> >> to 'normal' (buddy) pages when ref count goes to zero.
> >>
> >> It should also be noted that hugetlb code sets up the CMA area from which
> >> hugetlb pages can be allocated. This area is never unreserved/freed.
> >>
> >> I do not think there is a reason to disallow long term pinning of hugetlb
> >> pages allocated from THE hugetlb CMA area.
> >>
> >> But, I wonder if it is possible for hugetlb pages to be allocated from
> >> another (non-hugetlb) area. For example if someone sets up a huge CMA area
> >> and hugetlb allocations spill over into that area. If this is possible
> >> (still need to research), then we would not want to long term pin such
> >> hugetlb pages. We can check this in the hugetlb code to determine if
> >> long term pinning is allowed.
> >
> > I don't think it's possible because cma_alloc needs "struct cma" just
> > like handle and VM doesn't maintain any fallback list of cma chains
> > so unless someone could steal the handle somehow, there is no way to
> > claim memory others reserved for the CMA purpose.
>
> I was thinking about the case where a hugetlb page is allocated via
> __alloc_pages(). Not sure if that can fall back to a CMA area that
> someone else might have created/reserved.
>
> Unless I do not understand, normal movable memory allocations can fall
> back to CMA areas?

In the case, Yes, it would be fallback if gfp_flag was __GFP_MOVABLE.

If HugeTLB support it(I think so), pin_user_pages with FOLL_LONGTERM
will migrate the page out of movable/CMA before the longterm pinning
so IMHO, we shouldn't have the problem.

__gup_longterm_locked
__get_user_pages_locked
check_and_migrate_movable_pages

> --
> Mike Kravetz

2022-05-23 05:51:04

by John Hubbard

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/20/22 15:19, Minchan Kim wrote:
> The memory offline would be an issue so we shouldn't allow pinning of any
> pages in *movable zone*.
>
> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
> problem to allow pinning on those area. The matter is what target range
> on alloc_contig_range is backed by CMA or movable zone and usecases.
>
> IOW, movable zone should be never allowed. But CMA case, if pages
> are used by normal process memory instead of hugeTLB, we shouldn't
> allow longterm pinning since someone can claim those memory suddenly.
> However, we are fine to allow longterm pinning if the CMA memory
> already claimed and mapped at userspace(hugeTLB case IIUC).
>

From Mike's comments and yours, plus a rather quick reading of some
CMA-related code in mm/hugetlb.c (free_gigantic_page(),
alloc_gigantic_pages()), the following seems true:

a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()

b) while hugetlbfs is using those CMA-allocated pages, it is debatable
whether those pages should be allowed to be long term pinned. That's
because there are two cases:

Case 1: pages are longterm pinned, then released, all while
owned by hugetlbfs. No problem.

Case 2: pages are longterm pinned, but then hugetlbfs releases the
pages entirely (via unmounting hugetlbfs, I presume). In
this case, we now have CMA page that are long-term pinned,
and that's the state we want to avoid.

The reason it is debatable is that hugetlbfs is intended to be used
long term, itself. The expected use cases do not normally include a
lot of short term mounting and unmounting.

And whichever way that debate goes, we need to allow it to be
fixable, by not tying "is pinnable" to "using gup/pup". The caller
has the context that is needed to make that policy decision, but
gup/pup does not.

At this point, I think it's time to fix up the problems and restore
previous behavior, by choosing Case 1 behavior for now. And also
lifting the is_pinnable_page() checks up a level, as noted in my
other thread. I can do that, unless someone sees a flaw in the
reasoning.

thanks,
--
John Hubbard
NVIDIA

2022-05-23 06:10:22

by David Hildenbrand

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

>>>>>> It should also be noted that hugetlb code sets up the CMA area from which
>>>>>> hugetlb pages can be allocated. This area is never unreserved/freed.
>>>>>>
>>>>>> I do not think there is a reason to disallow long term pinning of hugetlb
>>>>>> pages allocated from THE hugetlb CMA area.
>>
>> Hm. We primarily use CMA for gigantic pages only IIRC. Ordinary huge
>> pages come via the buddy.
>>
>> Assume we allocated a (movable) 2MiB huge page ordinarily via the buddy
>> and it ended up on that CMA area by pure luck (as it's movable). If we'd
>> allow to pin it long-term, allocating a gigantic page from the
>> designated CMA area would fail.
>
> If we allow the longterm pin against the hugetlb page come via buddy,
> it should be migrated out of CMA before the longterm pinning by
> check_and_migrate_movable_pages, IIUC.

Yes.

> If so, what the allocating a giganitc page from the designated CMA area
> would fail?

Nothing I just summarized it.

>
>>
>> So we'd want to allow long-term pinning a gigantic page but we'd not
>> want to allow long-term pinning an ordinary huge page. We'd want to
>> migrate the latter away.
>
> Sure. Gigantic page was already CMA claimed page so there is no user
> in the future to claim the memory again so fine to allow longterm pin
> but ordinary huge page shouldn't be allowed since CMA owner could
> claim the memory some day.
>

Right.

>>
>>
>> The general rules are:
>>
>> ZONE_MOVABLE: nobody is allowed to place unmovable allocations there; it
>> could prevent memory offlining/unplug.
>>
>> CMA: nobody *but the designated owner* is allowed to place unmovable
>> memory there; it could prevent the actual owner to allocate contiguous
>> memory.
>
> I am confused what's the meaning of designated owner and actuall owner
> in your context.

designated==actual here. I just wanted to distinguish from someone
current temporary owner of the page ("allocated it via a movable
allocation") but the actual designated owner (e.g., hugetlb CMA)

The page/memory owner terminology is just confusing. Let's rephrase to:
only the CMA area owner is allowed to place unmovable allocations there.

>
> What I thought about the issue based on you explanation:
>
> HugeTLB allocates its page by two types of allocation
>
> 1. alloc_pages(GFP_MOVABLE)
>
> It could allocate the hugetlb page from CMA area but longterm pin
> should migrate them out of cma before the pinning so allowing
> the pinning on the page is no problem and current code works like
> that.
>
> check_and_migrate_movable_pages
>

Yes.

> 2. cma_alloc
>
> The cma_alloc is used only for *gigantic page* and the hugetlbfs
> is the very owner of the page. IOW, if the hugetlbfs was succeeded
> to allocate the gigantic page by cma_alloc, there is no other
> owner to be able to claim the page any longer so it's fine to
> allow longterm pinning againt the gingantic page but current.
> However, current code doesn't work like that due to
> is_pinnable_page. IOW, hugetlbfs need a way to distinguish
> whether the page owner is hugetlbfs or not.
>
> Are we on same page?

Yes, exactly. What I wanted to express is: for huge pages we have to
make a smarter decision because there are cases where we want to
migrate, and cases where we don't want to migrate.


--
Thanks,

David / dhildenb


2022-05-23 06:17:16

by Minchan Kim

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Fri, May 20, 2022 at 03:56:31PM -0700, John Hubbard wrote:
> On 5/20/22 15:19, Minchan Kim wrote:
> > The memory offline would be an issue so we shouldn't allow pinning of any
> > pages in *movable zone*.
> >
> > Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
> > problem to allow pinning on those area. The matter is what target range
> > on alloc_contig_range is backed by CMA or movable zone and usecases.
> >
> > IOW, movable zone should be never allowed. But CMA case, if pages
> > are used by normal process memory instead of hugeTLB, we shouldn't
> > allow longterm pinning since someone can claim those memory suddenly.
> > However, we are fine to allow longterm pinning if the CMA memory
> > already claimed and mapped at userspace(hugeTLB case IIUC).
> >
>
> From Mike's comments and yours, plus a rather quick reading of some
> CMA-related code in mm/hugetlb.c (free_gigantic_page(),
> alloc_gigantic_pages()), the following seems true:
>
> a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()
>
> b) while hugetlbfs is using those CMA-allocated pages, it is debatable
> whether those pages should be allowed to be long term pinned. That's
> because there are two cases:
>
> Case 1: pages are longterm pinned, then released, all while
> owned by hugetlbfs. No problem.
>
> Case 2: pages are longterm pinned, but then hugetlbfs releases the

Longterm pinned means the hugetlbfs page were mapped at userspace and
someone called FOLL_LONGTERM against on the page?

> pages entirely (via unmounting hugetlbfs, I presume). In

Then, how can FS unmount successfully while something is accessing
on the page of the file in FS? (I expected FS should return -EBUSY).
Does hugetlbfs have something special?


> this case, we now have CMA page that are long-term pinned,
> and that's the state we want to avoid.

2022-05-23 06:23:40

by Mike Kravetz

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/21/22 11:25, Minchan Kim wrote:
> On Sat, May 21, 2022 at 06:46:27PM +0200, David Hildenbrand wrote:
>
> < snip >
>
>>>> The general rules are:
>>>>
>>>> ZONE_MOVABLE: nobody is allowed to place unmovable allocations there; it
>>>> could prevent memory offlining/unplug.
>>>>
>>>> CMA: nobody *but the designated owner* is allowed to place unmovable
>>>> memory there; it could prevent the actual owner to allocate contiguous
>>>> memory.
>>>
>>> I am confused what's the meaning of designated owner and actuall owner
>>> in your context.
>>
>> designated==actual here. I just wanted to distinguish from someone
>> current temporary owner of the page ("allocated it via a movable
>> allocation") but the actual designated owner (e.g., hugetlb CMA)
>>
>> The page/memory owner terminology is just confusing. Let's rephrase to:
>> only the CMA area owner is allowed to place unmovable allocations there.
>
> Yeah, the CMA area owner is much better.
>
>>
>>>
>>> What I thought about the issue based on you explanation:
>>>
>>> HugeTLB allocates its page by two types of allocation
>>>
>>> 1. alloc_pages(GFP_MOVABLE)
>>>
>>> It could allocate the hugetlb page from CMA area but longterm pin
>>> should migrate them out of cma before the pinning so allowing
>>> the pinning on the page is no problem and current code works like
>>> that.
>>>
>>> check_and_migrate_movable_pages
>>>
>>
>> Yes.
>>
>>> 2. cma_alloc
>>>
>>> The cma_alloc is used only for *gigantic page* and the hugetlbfs
>>> is the very owner of the page. IOW, if the hugetlbfs was succeeded
>>> to allocate the gigantic page by cma_alloc, there is no other
>>> owner to be able to claim the page any longer so it's fine to
>>> allow longterm pinning againt the gingantic page but current.
>>> However, current code doesn't work like that due to
>>> is_pinnable_page. IOW, hugetlbfs need a way to distinguish
>>> whether the page owner is hugetlbfs or not.
>>>
>>> Are we on same page?
>>
>> Yes, exactly. What I wanted to express is: for huge pages we have to
>> make a smarter decision because there are cases where we want to
>> migrate, and cases where we don't want to migrate.
>
> Sure, maybe hugetlbfs could squeeze a bit in one of subpage of the
> CMA compound page. "I am CMA allocated but allow to pinned for longterm"
>

Thanks for all the ideas here. Yes, we already have a whole word for hugetlb
specific page flags (see hugetlb_page_flags in linux/hugetlb.h). I'm pretty
sure I even proposed a 'allocated from CMA' flag, but there was another way to
get that information. We can add such a flag to for the purpose if making a
decision about long term pinning.

BTW - It is possible that a gigantic page allocated in CMA could be demoted
(split) into smaller hugetlb pages. I 'think' we would also want to allow
long term pinning in this case.
--
Mike Kravetz

2022-05-23 06:47:19

by Mike Kravetz

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/20/22 15:56, John Hubbard wrote:
> On 5/20/22 15:19, Minchan Kim wrote:
>> The memory offline would be an issue so we shouldn't allow pinning of any
>> pages in *movable zone*.
>>
>> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
>> problem to allow pinning on those area. The matter is what target range
>> on alloc_contig_range is backed by CMA or movable zone and usecases.
>>
>> IOW, movable zone should be never allowed. But CMA case, if pages
>> are used by normal process memory instead of hugeTLB, we shouldn't
>> allow longterm pinning since someone can claim those memory suddenly.
>> However, we are fine to allow longterm pinning if the CMA memory
>> already claimed and mapped at userspace(hugeTLB case IIUC).
>>
>
> From Mike's comments and yours, plus a rather quick reading of some
> CMA-related code in mm/hugetlb.c (free_gigantic_page(), alloc_gigantic_pages()), the following seems true:
>
> a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()
>
> b) while hugetlbfs is using those CMA-allocated pages, it is debatable
> whether those pages should be allowed to be long term pinned. That's
> because there are two cases:
>
>     Case 1: pages are longterm pinned, then released, all while
>             owned by hugetlbfs. No problem.
>
>     Case 2: pages are longterm pinned, but then hugetlbfs releases the
>             pages entirely (via unmounting hugetlbfs, I presume). In
>             this case, we now have CMA page that are long-term pinned,
>             and that's the state we want to avoid.

I do not think case 2 can happen. A hugetlb page can only be changed back
to 'normal' (buddy) pages when ref count goes to zero.

It should also be noted that hugetlb code sets up the CMA area from which
hugetlb pages can be allocated. This area is never unreserved/freed.

I do not think there is a reason to disallow long term pinning of hugetlb
pages allocated from THE hugetlb CMA area.

But, I wonder if it is possible for hugetlb pages to be allocated from
another (non-hugetlb) area. For example if someone sets up a huge CMA area
and hugetlb allocations spill over into that area. If this is possible
(still need to research), then we would not want to long term pin such
hugetlb pages. We can check this in the hugetlb code to determine if
long term pinning is allowed.

>
> The reason it is debatable is that hugetlbfs is intended to be used
> long term, itself. The expected use cases do not normally include a
> lot of short term mounting and unmounting.
>
> And whichever way that debate goes, we need to allow it to be
> fixable, by not tying "is pinnable" to "using gup/pup". The caller
> has the context that is needed to make that policy decision, but
> gup/pup does not.
>
> At this point, I think it's time to fix up the problems and restore
> previous behavior, by choosing Case 1 behavior for now. And also
> lifting the is_pinnable_page() checks up a level, as noted in my
> other thread.  I can do that, unless someone sees a flaw in the
> reasoning.

Go for it.

--
Mike Kravetz

2022-05-23 07:06:01

by Mike Kravetz

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/20/22 16:43, Minchan Kim wrote:
> On Fri, May 20, 2022 at 04:31:31PM -0700, Mike Kravetz wrote:
>> On 5/20/22 15:56, John Hubbard wrote:
>>> On 5/20/22 15:19, Minchan Kim wrote:
>>>> The memory offline would be an issue so we shouldn't allow pinning of any
>>>> pages in *movable zone*.
>>>>
>>>> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
>>>> problem to allow pinning on those area. The matter is what target range
>>>> on alloc_contig_range is backed by CMA or movable zone and usecases.
>>>>
>>>> IOW, movable zone should be never allowed. But CMA case, if pages
>>>> are used by normal process memory instead of hugeTLB, we shouldn't
>>>> allow longterm pinning since someone can claim those memory suddenly.
>>>> However, we are fine to allow longterm pinning if the CMA memory
>>>> already claimed and mapped at userspace(hugeTLB case IIUC).
>>>>
>>>
>>> From Mike's comments and yours, plus a rather quick reading of some
>>> CMA-related code in mm/hugetlb.c (free_gigantic_page(), alloc_gigantic_pages()), the following seems true:
>>>
>>> a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()
>>>
>>> b) while hugetlbfs is using those CMA-allocated pages, it is debatable
>>> whether those pages should be allowed to be long term pinned. That's
>>> because there are two cases:
>>>
>>>     Case 1: pages are longterm pinned, then released, all while
>>>             owned by hugetlbfs. No problem.
>>>
>>>     Case 2: pages are longterm pinned, but then hugetlbfs releases the
>>>             pages entirely (via unmounting hugetlbfs, I presume). In
>>>             this case, we now have CMA page that are long-term pinned,
>>>             and that's the state we want to avoid.
>>
>> I do not think case 2 can happen. A hugetlb page can only be changed back
>> to 'normal' (buddy) pages when ref count goes to zero.
>>
>> It should also be noted that hugetlb code sets up the CMA area from which
>> hugetlb pages can be allocated. This area is never unreserved/freed.
>>
>> I do not think there is a reason to disallow long term pinning of hugetlb
>> pages allocated from THE hugetlb CMA area.
>>
>> But, I wonder if it is possible for hugetlb pages to be allocated from
>> another (non-hugetlb) area. For example if someone sets up a huge CMA area
>> and hugetlb allocations spill over into that area. If this is possible
>> (still need to research), then we would not want to long term pin such
>> hugetlb pages. We can check this in the hugetlb code to determine if
>> long term pinning is allowed.
>
> I don't think it's possible because cma_alloc needs "struct cma" just
> like handle and VM doesn't maintain any fallback list of cma chains
> so unless someone could steal the handle somehow, there is no way to
> claim memory others reserved for the CMA purpose.

I was thinking about the case where a hugetlb page is allocated via
__alloc_pages(). Not sure if that can fall back to a CMA area that
someone else might have created/reserved.

Unless I do not understand, normal movable memory allocations can fall
back to CMA areas?
--
Mike Kravetz

2022-05-23 07:14:44

by Minchan Kim

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Fri, May 20, 2022 at 04:31:31PM -0700, Mike Kravetz wrote:
> On 5/20/22 15:56, John Hubbard wrote:
> > On 5/20/22 15:19, Minchan Kim wrote:
> >> The memory offline would be an issue so we shouldn't allow pinning of any
> >> pages in *movable zone*.
> >>
> >> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
> >> problem to allow pinning on those area. The matter is what target range
> >> on alloc_contig_range is backed by CMA or movable zone and usecases.
> >>
> >> IOW, movable zone should be never allowed. But CMA case, if pages
> >> are used by normal process memory instead of hugeTLB, we shouldn't
> >> allow longterm pinning since someone can claim those memory suddenly.
> >> However, we are fine to allow longterm pinning if the CMA memory
> >> already claimed and mapped at userspace(hugeTLB case IIUC).
> >>
> >
> > From Mike's comments and yours, plus a rather quick reading of some
> > CMA-related code in mm/hugetlb.c (free_gigantic_page(), alloc_gigantic_pages()), the following seems true:
> >
> > a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()
> >
> > b) while hugetlbfs is using those CMA-allocated pages, it is debatable
> > whether those pages should be allowed to be long term pinned. That's
> > because there are two cases:
> >
> > ??? Case 1: pages are longterm pinned, then released, all while
> > ??????????? owned by hugetlbfs. No problem.
> >
> > ??? Case 2: pages are longterm pinned, but then hugetlbfs releases the
> > ??????????? pages entirely (via unmounting hugetlbfs, I presume). In
> > ??????????? this case, we now have CMA page that are long-term pinned,
> > ??????????? and that's the state we want to avoid.
>
> I do not think case 2 can happen. A hugetlb page can only be changed back
> to 'normal' (buddy) pages when ref count goes to zero.
>
> It should also be noted that hugetlb code sets up the CMA area from which
> hugetlb pages can be allocated. This area is never unreserved/freed.
>
> I do not think there is a reason to disallow long term pinning of hugetlb
> pages allocated from THE hugetlb CMA area.
>
> But, I wonder if it is possible for hugetlb pages to be allocated from
> another (non-hugetlb) area. For example if someone sets up a huge CMA area
> and hugetlb allocations spill over into that area. If this is possible
> (still need to research), then we would not want to long term pin such
> hugetlb pages. We can check this in the hugetlb code to determine if
> long term pinning is allowed.

I don't think it's possible because cma_alloc needs "struct cma" just
like handle and VM doesn't maintain any fallback list of cma chains
so unless someone could steal the handle somehow, there is no way to
claim memory others reserved for the CMA purpose.

2022-05-23 07:32:04

by David Hildenbrand

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 21.05.22 17:24, Minchan Kim wrote:
> On Fri, May 20, 2022 at 05:04:22PM -0700, Mike Kravetz wrote:
>> On 5/20/22 16:43, Minchan Kim wrote:
>>> On Fri, May 20, 2022 at 04:31:31PM -0700, Mike Kravetz wrote:
>>>> On 5/20/22 15:56, John Hubbard wrote:
>>>>> On 5/20/22 15:19, Minchan Kim wrote:
>>>>>> The memory offline would be an issue so we shouldn't allow pinning of any
>>>>>> pages in *movable zone*.
>>>>>>
>>>>>> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
>>>>>> problem to allow pinning on those area. The matter is what target range
>>>>>> on alloc_contig_range is backed by CMA or movable zone and usecases.
>>>>>>
>>>>>> IOW, movable zone should be never allowed. But CMA case, if pages
>>>>>> are used by normal process memory instead of hugeTLB, we shouldn't
>>>>>> allow longterm pinning since someone can claim those memory suddenly.
>>>>>> However, we are fine to allow longterm pinning if the CMA memory
>>>>>> already claimed and mapped at userspace(hugeTLB case IIUC).
>>>>>>
>>>>>
>>>>> From Mike's comments and yours, plus a rather quick reading of some
>>>>> CMA-related code in mm/hugetlb.c (free_gigantic_page(), alloc_gigantic_pages()), the following seems true:
>>>>>
>>>>> a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()
>>>>>
>>>>> b) while hugetlbfs is using those CMA-allocated pages, it is debatable
>>>>> whether those pages should be allowed to be long term pinned. That's
>>>>> because there are two cases:
>>>>>
>>>>>     Case 1: pages are longterm pinned, then released, all while
>>>>>             owned by hugetlbfs. No problem.
>>>>>
>>>>>     Case 2: pages are longterm pinned, but then hugetlbfs releases the
>>>>>             pages entirely (via unmounting hugetlbfs, I presume). In
>>>>>             this case, we now have CMA page that are long-term pinned,
>>>>>             and that's the state we want to avoid.
>>>>
>>>> I do not think case 2 can happen. A hugetlb page can only be changed back
>>>> to 'normal' (buddy) pages when ref count goes to zero.
>>>>
>>>> It should also be noted that hugetlb code sets up the CMA area from which
>>>> hugetlb pages can be allocated. This area is never unreserved/freed.
>>>>
>>>> I do not think there is a reason to disallow long term pinning of hugetlb
>>>> pages allocated from THE hugetlb CMA area.

Hm. We primarily use CMA for gigantic pages only IIRC. Ordinary huge
pages come via the buddy.

Assume we allocated a (movable) 2MiB huge page ordinarily via the buddy
and it ended up on that CMA area by pure luck (as it's movable). If we'd
allow to pin it long-term, allocating a gigantic page from the
designated CMA area would fail.

So we'd want to allow long-term pinning a gigantic page but we'd not
want to allow long-term pinning an ordinary huge page. We'd want to
migrate the latter away.


The general rules are:

ZONE_MOVABLE: nobody is allowed to place unmovable allocations there; it
could prevent memory offlining/unplug.

CMA: nobody *but the designated owner* is allowed to place unmovable
memory there; it could prevent the actual owner to allocate contiguous
memory.

As explained above, it gets a bit weird if the owner (hugetlb) deals
with different allocation types (huge vs. gigantic pages).
>> Unless I do not understand, normal movable memory allocations can fall
>> back to CMA areas?

Yes, just like ZONE_MOVABLE IIRC.

>
> In the case, Yes, it would be fallback if gfp_flag was __GFP_MOVABLE.
>
> If HugeTLB support it(I think so), pin_user_pages with FOLL_LONGTERM
> will migrate the page out of movable/CMA before the longterm pinning
> so IMHO, we shouldn't have the problem.

As explained, the tricky bit would be hitting a gigantic page that's
valid to reside permanently on the designated CMA area. IIRC, some
gigantic pages are indeed movable, but we never place them on
ZONE_MOVABLE because migration is unlikely to work in practice.


--
Thanks,

David / dhildenb


2022-05-23 07:36:54

by Minchan Kim

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Mon, May 16, 2022 at 08:37:01PM -0700, Mike Kravetz wrote:

< snip >

> > I need to look at this a little more closely, it is making me wonder
> > whether the is_pinnable_page() check is a problem in this path. The
> > comment in try_grab_folio() indicates that the early return is a hack
> > (it assumes that the caller is in the gup fast path), and maybe the hack
> > is just wrong here--I think we're actually on the slow gup path. Not
> > good.
> >
> > Mike, any thoughts here?
> >
>
> Do you know why try_grab_compound_page(now try_grab_folio) checks for
> pinnable when try_grab_page does not?
>
> Then I guess the next question is 'Should we allow pinning of hugetlb pages
> in these areas?'. My first thought would be no. But, recall it was 'allowed'
> until that commit which changed try_grab_page to try_grab_compound_page.

The reason we don't allow longterm pinning in CMA area is to improve
big contigus memory allocation sccuess ratio when someone claim the memory
space. Thus, any pages mapped at userspace given the CMA area shouldn't be
pinned with longterm. Otherwise, the cma_alloc will fail due to migration
failure.

In hugetlb case(I might miss something..), the CMA memory was already
claimed by hugeTLB and the big contiguous memory was mapped at userspace
so there is no reason to prevent longterm pinning since HugeTLB will
never claim those CMA memory until user release the memory and HugeTLB
free the range using cma_release.

> In the 'common' case of compaction, we do not attempt to migrate/move hugetlb
> pages (last time I looked), so long term pinning should not be an issue.
> However, for things like memory offline or alloc_contig_range() we want to

The memory offline would be an issue so we shouldn't allow pinning of any
pages in *movable zone*.

Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
problem to allow pinning on those area. The matter is what target range
on alloc_contig_range is backed by CMA or movable zone and usecases.

IOW, movable zone should be never allowed. But CMA case, if pages
are used by normal process memory instead of hugeTLB, we shouldn't
allow longterm pinning since someone can claim those memory suddenly.
However, we are fine to allow longterm pinning if the CMA memory
already claimed and mapped at userspace(hugeTLB case IIUC).

Please correct me if I miss something.

Thanks.

2022-05-23 07:45:35

by Minchan Kim

[permalink] [raw]
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On Sat, May 21, 2022 at 05:51:58PM +0200, David Hildenbrand wrote:
> On 21.05.22 17:24, Minchan Kim wrote:
> > On Fri, May 20, 2022 at 05:04:22PM -0700, Mike Kravetz wrote:
> >> On 5/20/22 16:43, Minchan Kim wrote:
> >>> On Fri, May 20, 2022 at 04:31:31PM -0700, Mike Kravetz wrote:
> >>>> On 5/20/22 15:56, John Hubbard wrote:
> >>>>> On 5/20/22 15:19, Minchan Kim wrote:
> >>>>>> The memory offline would be an issue so we shouldn't allow pinning of any
> >>>>>> pages in *movable zone*.
> >>>>>>
> >>>>>> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
> >>>>>> problem to allow pinning on those area. The matter is what target range
> >>>>>> on alloc_contig_range is backed by CMA or movable zone and usecases.
> >>>>>>
> >>>>>> IOW, movable zone should be never allowed. But CMA case, if pages
> >>>>>> are used by normal process memory instead of hugeTLB, we shouldn't
> >>>>>> allow longterm pinning since someone can claim those memory suddenly.
> >>>>>> However, we are fine to allow longterm pinning if the CMA memory
> >>>>>> already claimed and mapped at userspace(hugeTLB case IIUC).
> >>>>>>
> >>>>>
> >>>>> From Mike's comments and yours, plus a rather quick reading of some
> >>>>> CMA-related code in mm/hugetlb.c (free_gigantic_page(), alloc_gigantic_pages()), the following seems true:
> >>>>>
> >>>>> a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()
> >>>>>
> >>>>> b) while hugetlbfs is using those CMA-allocated pages, it is debatable
> >>>>> whether those pages should be allowed to be long term pinned. That's
> >>>>> because there are two cases:
> >>>>>
> >>>>> ??? Case 1: pages are longterm pinned, then released, all while
> >>>>> ??????????? owned by hugetlbfs. No problem.
> >>>>>
> >>>>> ??? Case 2: pages are longterm pinned, but then hugetlbfs releases the
> >>>>> ??????????? pages entirely (via unmounting hugetlbfs, I presume). In
> >>>>> ??????????? this case, we now have CMA page that are long-term pinned,
> >>>>> ??????????? and that's the state we want to avoid.
> >>>>
> >>>> I do not think case 2 can happen. A hugetlb page can only be changed back
> >>>> to 'normal' (buddy) pages when ref count goes to zero.
> >>>>
> >>>> It should also be noted that hugetlb code sets up the CMA area from which
> >>>> hugetlb pages can be allocated. This area is never unreserved/freed.
> >>>>
> >>>> I do not think there is a reason to disallow long term pinning of hugetlb
> >>>> pages allocated from THE hugetlb CMA area.
>
> Hm. We primarily use CMA for gigantic pages only IIRC. Ordinary huge
> pages come via the buddy.
>
> Assume we allocated a (movable) 2MiB huge page ordinarily via the buddy
> and it ended up on that CMA area by pure luck (as it's movable). If we'd
> allow to pin it long-term, allocating a gigantic page from the
> designated CMA area would fail.

If we allow the longterm pin against the hugetlb page come via buddy,
it should be migrated out of CMA before the longterm pinning by
check_and_migrate_movable_pages, IIUC.
If so, what the allocating a giganitc page from the designated CMA area
would fail?

>
> So we'd want to allow long-term pinning a gigantic page but we'd not
> want to allow long-term pinning an ordinary huge page. We'd want to
> migrate the latter away.

Sure. Gigantic page was already CMA claimed page so there is no user
in the future to claim the memory again so fine to allow longterm pin
but ordinary huge page shouldn't be allowed since CMA owner could
claim the memory some day.

>
>
> The general rules are:
>
> ZONE_MOVABLE: nobody is allowed to place unmovable allocations there; it
> could prevent memory offlining/unplug.
>
> CMA: nobody *but the designated owner* is allowed to place unmovable
> memory there; it could prevent the actual owner to allocate contiguous
> memory.

I am confused what's the meaning of designated owner and actuall owner
in your context.

What I thought about the issue based on you explanation:

HugeTLB allocates its page by two types of allocation

1. alloc_pages(GFP_MOVABLE)

It could allocate the hugetlb page from CMA area but longterm pin
should migrate them out of cma before the pinning so allowing
the pinning on the page is no problem and current code works like
that.

check_and_migrate_movable_pages

2. cma_alloc

The cma_alloc is used only for *gigantic page* and the hugetlbfs
is the very owner of the page. IOW, if the hugetlbfs was succeeded
to allocate the gigantic page by cma_alloc, there is no other
owner to be able to claim the page any longer so it's fine to
allow longterm pinning againt the gingantic page but current.
However, current code doesn't work like that due to
is_pinnable_page. IOW, hugetlbfs need a way to distinguish
whether the page owner is hugetlbfs or not.

Are we on same page?