2022-05-18 23:12:41

by Mike Kravetz

Subject: vma_needs_copy always true for VM_HUGETLB ?

For most non-anonymous vmas, we do not copy page tables at fork time, but
rather lazily populate the tables after fork via faults. The routine
vma_needs_copy() is used to make this decision. For VM_HUGETLB vmas, it always
returns true.

Anyone know/remember why? The code was added more than 15 years ago and
my search for why hugetlb vmas were excluded came up empty.

I do not see a reason why VM_HUGETLB is in this list. Initial testing did
not reveal any problems when I removed the VM_HUGETLB check.
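
To make the question concrete, here is a minimal user-space sketch of the
decision, not the kernel code itself. The struct, the flag values, and the
always_copy parameter are hypothetical names invented for illustration. It
also assumes that VM_PFNMAP and VM_MIXEDMAP are the other flags in the list,
and it leaves out any further conditions the real routine may check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the values are arbitrary in this model. */
#define VM_PFNMAP   0x1UL
#define VM_HUGETLB  0x2UL
#define VM_MIXEDMAP 0x4UL

/* Flags that force an eager copy today (assumed list, see lead-in). */
#define COPY_MASK_CURRENT  (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)
/* The same list with the VM_HUGETLB check removed, as in the experiment. */
#define COPY_MASK_PROPOSED (VM_PFNMAP | VM_MIXEDMAP)

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct vma_model {
	unsigned long vm_flags;
	const void *anon_vma;	/* non-NULL once the vma has anonymous pages */
};

/*
 * Return true when fork should copy the page tables of src eagerly,
 * false when the child can repopulate them lazily via faults.
 * always_copy is the set of flags that force a copy.
 */
bool vma_needs_copy_model(const struct vma_model *src,
			  unsigned long always_copy)
{
	if (src->vm_flags & always_copy)
		return true;

	/* Anonymous pages cannot be recreated by a fault: always copy. */
	if (src->anon_vma != NULL)
		return true;

	return false;
}

/* Small self-check contrasting the two masks on a shared hugetlb vma. */
void vma_needs_copy_selfcheck(void)
{
	struct vma_model shared_huge = { VM_HUGETLB, NULL };

	assert(vma_needs_copy_model(&shared_huge, COPY_MASK_CURRENT));
	assert(!vma_needs_copy_model(&shared_huge, COPY_MASK_PROPOSED));
}
```

In this model a hugetlb vma with no anon_vma is copied only because of the
flag check, while one that has an anon_vma is still copied under either mask.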

FYI - I am looking at the performance of fork and exec (unmap) of processes
with very large hugetlb mappings. Skipping the copy at fork time would
certainly speed things up. Of course, there could be some users who would
notice if hugetlb page tables are not copied at fork time. However, this
is the behavior for 'normal' mappings. I am inclined to make hugetlb be
'more normal'.
--
Mike Kravetz


2022-05-19 04:29:49

by Mike Kravetz

Subject: Re: vma_needs_copy always true for VM_HUGETLB ?

On 5/18/22 18:30, Hugh Dickins wrote:
> On Wed, 18 May 2022, Mike Kravetz wrote:
>
>> For most non-anonymous vmas, we do not copy page tables at fork time, but
>> rather lazily populate the tables after fork via faults. The routine
>> vma_needs_copy() is used to make this decision. For VM_HUGETLB vmas, it always
>> returns true.
>
> "vma_needs_copy()" is *very* recent coinage, not reached Linus yet.
>
>>
>> Anyone know/remember why? The code was added more than 15 years ago and
>> my search for why hugetlb vmas were excluded came up empty.
>>
>> I do not see a reason why VM_HUGETLB is in this list. Initial testing did
>> not reveal any problems when I removed the VM_HUGETLB check.
>>
>> FYI - I am looking at the performance of fork and exec (unmap) of processes
>> with very large hugetlb mappings. Skipping the copy at fork time would
>> certainly speed things up. Of course, there could be some users who would
>> notice if hugetlb page tables are not copied at fork time. However, this
>> is the behavior for 'normal' mappings. I am inclined to make hugetlb be
>> 'more normal'.
>
> Good question, not obvious to me either: but I've found the answer.

Thank you Hugh! You went above and beyond as usual.

> The commit was of course Nick's d992895ba2b2 ("[PATCH] Lazy page table
> copies in fork()") in 2.6.14; but it doesn't explain why VM_HUGETLB is
> there in the test, and goes on to be copied.
>
> I haven't re-read through the whole mail thread which led to that
> commit, but I think you'll find the crucial observation comes from
> Andi in https://lore.kernel.org/lkml/[email protected]/#t

Sorry that I did not find the entire thread. There were only a couple of
pieces on linux-mm, and that is the only place I looked.

>
> "Actually I disabled it for hugetlbfs (... !is_huge...vma). The reason
> is that lazy faulting for huge pages is still not in mainline."
>
> and indeed, look at the 2.6.13 or 2.6.14 mm/hugetlb.c and you find
> /*
>  * We cannot handle pagefaults against hugetlb pages at all. They cause
>  * handle_mm_fault() to try to instantiate regular-sized pages in the
>  * hugegpage VMA. do_page_fault() is supposed to trap this, so BUG is we get
>  * this far.
>  */
> static struct page *hugetlb_nopage(struct vm_area_struct *vma,
>                                    unsigned long address, int *unused)
> {
>         BUG();
>         return NULL;
> }
>
> Oh, and that pretty much still exists to this day, to cover that path
> to a fault; but 2.6.16 implemented hugetlb_no_page(), which is what
> then actually got used to satisfy a hugetlb fault.
>
> So the reason for fork copying VM_HUGETLB appears to have gone away
> in 2.6.16.

Yes, that is the likely reason. Functionality was not originally
supported, and when it was added this 'optimization' was not enabled.

> (I haven't a clue on private hugetlb mappings and reservations and
> whether anon_vma means the same on hugetlb, but you know all that.)

Yes, I believe anon_vma means the same on hugetlb for this purpose.
Although, I do need to look closer just to make sure there are not
hidden surprises.

Thanks again,
--
Mike Kravetz

2022-05-19 12:47:36

by Hugh Dickins

Subject: Re: vma_needs_copy always true for VM_HUGETLB ?

On Wed, 18 May 2022, Mike Kravetz wrote:

> For most non-anonymous vmas, we do not copy page tables at fork time, but
> rather lazily populate the tables after fork via faults. The routine
> vma_needs_copy() is used to make this decision. For VM_HUGETLB vmas, it always
> returns true.

"vma_needs_copy()" is *very* recent coinage, not reached Linus yet.

>
> Anyone know/remember why? The code was added more than 15 years ago and
> my search for why hugetlb vmas were excluded came up empty.
>
> I do not see a reason why VM_HUGETLB is in this list. Initial testing did
> not reveal any problems when I removed the VM_HUGETLB check.
>
> FYI - I am looking at the performance of fork and exec (unmap) of processes
> with very large hugetlb mappings. Skipping the copy at fork time would
> certainly speed things up. Of course, there could be some users who would
> notice if hugetlb page tables are not copied at fork time. However, this
> is the behavior for 'normal' mappings. I am inclined to make hugetlb be
> 'more normal'.

Good question, not obvious to me either: but I've found the answer.

The commit was of course Nick's d992895ba2b2 ("[PATCH] Lazy page table
copies in fork()") in 2.6.14; but it doesn't explain why VM_HUGETLB is
there in the test, and goes on to be copied.

I haven't re-read through the whole mail thread which led to that
commit, but I think you'll find the crucial observation comes from
Andi in https://lore.kernel.org/lkml/[email protected]/#t

"Actually I disabled it for hugetlbfs (... !is_huge...vma). The reason
is that lazy faulting for huge pages is still not in mainline."

and indeed, look at the 2.6.13 or 2.6.14 mm/hugetlb.c and you find
/*
 * We cannot handle pagefaults against hugetlb pages at all. They cause
 * handle_mm_fault() to try to instantiate regular-sized pages in the
 * hugegpage VMA. do_page_fault() is supposed to trap this, so BUG is we get
 * this far.
 */
static struct page *hugetlb_nopage(struct vm_area_struct *vma,
                                   unsigned long address, int *unused)
{
        BUG();
        return NULL;
}

Oh, and that pretty much still exists to this day, to cover that path
to a fault; but 2.6.16 implemented hugetlb_no_page(), which is what
then actually got used to satisfy a hugetlb fault.

So the reason for fork copying VM_HUGETLB appears to have gone away
in 2.6.16.

(I haven't a clue on private hugetlb mappings and reservations and
whether anon_vma means the same on hugetlb, but you know all that.)

Hugh