2021-06-22 03:24:03

by Peilin Ye

Subject: [PATCH] docs: x86: Remove obsolete information about x86_64 vmalloc() faulting

x86_64 vmalloc() mappings are no longer "lazily synchronized" among page
tables via page fault handling since commit 7f0a002b5a21 ("x86/mm: remove
vmalloc faulting"). Subsequently, commit 6eb82f994026 ("x86/mm:
Pre-allocate P4D/PUD pages for vmalloc area") made it unnecessary to
synchronize x86_64 vmalloc() mappings at runtime at all, lazily or
otherwise, since the corresponding P4D or PUD pages are now preallocated
during system initialization by preallocate_vmalloc_pages(). Drop the
"lazily synchronized" description to avoid confusion.

It is worth noting, however, that there is still a slight complication for
x86_32; see commit 4819e15f740e ("x86/mm/32: Bring back vmalloc faulting
on x86_32") for details.

Signed-off-by: Peilin Ye <[email protected]>
---
Hi all,

I was trying to understand vmalloc() when I saw this "lazily synchronized"
statement, which confused me for a while. Please correct me if my
understanding is wrong or out of date.

Thank you,
Peilin Ye

Documentation/x86/x86_64/mm.rst | 4 ----
1 file changed, 4 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
index ede1875719fb..9798676bb0bf 100644
--- a/Documentation/x86/x86_64/mm.rst
+++ b/Documentation/x86/x86_64/mm.rst
@@ -140,10 +140,6 @@ The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).

-vmalloc space is lazily synchronized into the different PML4/PML5 pages of
-the processes using the page fault handler, with init_top_pgt as
-reference.
-
We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
--
2.25.1


2021-07-16 06:12:18

by Peilin Ye

Subject: Re: [PATCH] docs: x86: Remove obsolete information about x86_64 vmalloc() faulting

Hi all,

> diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
> index ede1875719fb..9798676bb0bf 100644
> --- a/Documentation/x86/x86_64/mm.rst
> +++ b/Documentation/x86/x86_64/mm.rst
> @@ -140,10 +140,6 @@ The direct mapping covers all memory in the system up to the highest
> memory address (this means in some cases it can also include PCI memory
> holes).
>
> -vmalloc space is lazily synchronized into the different PML4/PML5 pages of
> -the processes using the page fault handler, with init_top_pgt as
> -reference.

This information is out-of-date, and it took me quite some time of
ftrace'ing before I figured it out... I think it would be beneficial to
update, or at least remove it.

As a proof that I understand what I am talking about, on my x86_64 box:

1. I allocated a vmalloc() area containing linear address `addr`;
2. I manually pagewalked `addr` in different page tables, including
`init_mm.pgd`;
3. The corresponding PGD entries for `addr` in the different page tables
all immediately pointed at the same PUD table (my box uses 4-level
paging), i.e. at the same physical address;
4. No "lazy synchronization" via page fault handling happened at all,
since it is the same PUD table pre-allocated by
preallocate_vmalloc_pages() during boot time.
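The observation in steps 1-4 can be illustrated with a toy user-space model. This is a sketch, not kernel code: the table sizes, the VMALLOC_PGD_IDX slot and all toy_* helpers are hypothetical. Because the vmalloc-area PGD slot points at one PUD table that is populated once at boot, every page table that copies that slot sees later vmalloc() mappings immediately, with no fault and no synchronization:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model, not kernel code. A "PGD" is an array of pointers to "PUD"
 * tables; a "PUD" holds mapping values (0 = not mapped). All sizes,
 * names and the VMALLOC_PGD_IDX slot are hypothetical.
 */
#define TOY_PTRS_PER_PGD 4
#define TOY_PTRS_PER_PUD 4
#define VMALLOC_PGD_IDX  3	/* hypothetical PGD slot covering vmalloc space */
#define TOY_NR_PROCS     3

typedef struct { uint64_t entry[TOY_PTRS_PER_PUD]; } toy_pud;
typedef struct { toy_pud *entry[TOY_PTRS_PER_PGD]; } toy_pgd;

static toy_pud vmalloc_pud;		/* the PUD table "pre-allocated at boot" */
static toy_pgd toy_init_pgd;		/* stands in for init_mm.pgd */
static toy_pgd toy_proc_pgd[TOY_NR_PROCS];	/* per-process page tables */

/* Boot: populate the vmalloc PGD slot once, loosely analogous to
 * what preallocate_vmalloc_pages() achieves. */
static void toy_preallocate(void)
{
	toy_init_pgd.entry[VMALLOC_PGD_IDX] = &vmalloc_pud;
}

/* New page table: copy the kernel PGD slot from the reference table. */
static void toy_new_pgd(toy_pgd *pgd)
{
	pgd->entry[VMALLOC_PGD_IDX] = toy_init_pgd.entry[VMALLOC_PGD_IDX];
}

/* vmalloc(): writes only below the PGD level, into the shared PUD table. */
static void toy_vmalloc_map(size_t pud_idx, uint64_t val)
{
	assert(pud_idx < TOY_PTRS_PER_PUD);
	toy_init_pgd.entry[VMALLOC_PGD_IDX]->entry[pud_idx] = val;
}

/* Page walk: returns 0 where a real CPU would take a page fault. */
static uint64_t toy_walk(const toy_pgd *pgd, size_t pud_idx)
{
	const toy_pud *pud = pgd->entry[VMALLOC_PGD_IDX];

	return pud ? pud->entry[pud_idx] : 0;
}

/* Create the per-process tables first, map later, then check each one. */
static int toy_demo_shared_pud(void)
{
	toy_preallocate();
	for (int i = 0; i < TOY_NR_PROCS; i++)
		toy_new_pgd(&toy_proc_pgd[i]);

	toy_vmalloc_map(2, 0xabc);	/* mapping created after the tables exist */

	for (int i = 0; i < TOY_NR_PROCS; i++) {
		if (toy_proc_pgd[i].entry[VMALLOC_PGD_IDX] != &vmalloc_pud)
			return 0;	/* would mean a separate PUD table */
		if (toy_walk(&toy_proc_pgd[i], 2) != 0xabc)
			return 0;	/* would mean a fault or sync is needed */
	}
	return 1;
}
```

In this model the only thing written after boot is the shared PUD table itself, so there is nothing left to propagate between page tables.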

Commit 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc
area") documented this clearly:

"""
Doing this at boot makes sure no synchronization of that area is
necessary at runtime.
"""

Should we remove this sentence, or update it? Any ideas?

Sincerely,
Peilin Ye

2021-07-19 12:35:23

by Jörg Rödel

Subject: Re: [PATCH] docs: x86: Remove obsolete information about x86_64 vmalloc() faulting

Hi,

On Fri, Jul 16, 2021 at 02:09:58AM -0400, Peilin Ye wrote:
> This information is out-of-date, and it took me quite some time of
> ftrace'ing before I figured it out... I think it would be beneficial to
> update, or at least remove it.
>
> As a proof that I understand what I am talking about, on my x86_64 box:
>
> 1. I allocated a vmalloc() area containing linear address `addr`;
> 2. I manually pagewalked `addr` in different page tables, including
> `init_mm.pgd`;
> 3. The corresponding PGD entries for `addr` in the different page tables
> all immediately pointed at the same PUD table (my box uses 4-level
> paging), i.e. at the same physical address;
> 4. No "lazy synchronization" via page fault handling happened at all,
> since it is the same PUD table pre-allocated by
> preallocate_vmalloc_pages() during boot time.

Yes, this is the story for x86-64, because all PUD/P4D pages for the vmalloc
area are pre-allocated at boot. So no faulting or synchronization needs
to happen.

On x86-32 this is a bit different. Pre-allocation of PMD/PTE pages is
not an option there (even less so when 4MB large pages with 2-level
paging come into the picture).

So what happens there is that vmalloc-related changes to init_mm.pgd
are synchronized to all page tables in the system. But this
synchronization is subject to race conditions: another CPU might
vmalloc an area below a PMD that is not fully synchronized yet.

When this happens, there is a fault, which is handled as a vmalloc()
fault on x86-32 just as before. So vmalloc faults still exist on 32-bit;
they are just less likely than they used to be.
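The race can be replayed with a toy model. This is a sketch only: the flat one-level table, the toy32_* helpers and the fault counter are hypothetical and leave out the PAE and 2-level paging details. A new entry lands in the reference table first, one page table is used before synchronization reaches it, and the access is repaired by copying the entry from the reference table, which is the role the vmalloc fault plays:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the x86-32 situation, not kernel code. Here the entries
 * covering vmalloc space are copied into every page table (there is no
 * shared, pre-allocated lower-level table), so a new entry in the
 * reference table must be propagated. All names and sizes are hypothetical.
 */
#define TOY32_PTRS	4
#define TOY32_NR_PGDS	2

typedef struct { uint32_t entry[TOY32_PTRS]; } toy32_pgd;	/* 0 = not present */

static toy32_pgd toy32_ref;			/* stands in for init_mm.pgd */
static toy32_pgd toy32_pgds[TOY32_NR_PGDS];	/* per-process page tables */
static int toy32_faults;			/* counts "vmalloc faults" */

/* vmalloc() installs a new entry in the reference table only. */
static void toy32_vmalloc_map(size_t idx, uint32_t val)
{
	assert(idx < TOY32_PTRS);
	toy32_ref.entry[idx] = val;
}

/* Synchronization step: copy the entry into one page table. Other CPUs
 * can run between these per-table steps, which is where the race lies. */
static void toy32_sync_one(toy32_pgd *pgd, size_t idx)
{
	pgd->entry[idx] = toy32_ref.entry[idx];
}

/* Access through a page table; a missing entry that exists in the
 * reference table is fixed up by copying it (the "vmalloc fault"). */
static uint32_t toy32_access(toy32_pgd *pgd, size_t idx)
{
	if (pgd->entry[idx] == 0 && toy32_ref.entry[idx] != 0) {
		toy32_faults++;
		pgd->entry[idx] = toy32_ref.entry[idx];
	}
	return pgd->entry[idx];
}

/* Replay the race: table 0 is synced, table 1 is used before its sync. */
static int toy32_demo_race(void)
{
	toy32_vmalloc_map(1, 0x1234);
	toy32_sync_one(&toy32_pgds[0], 1);	/* sync has reached table 0 only */

	uint32_t a = toy32_access(&toy32_pgds[0], 1);	/* no fault */
	uint32_t b = toy32_access(&toy32_pgds[1], 1);	/* faults, then fixed up */

	toy32_sync_one(&toy32_pgds[1], 1);	/* the late sync is harmless */
	return a == 0x1234 && b == 0x1234 && toy32_faults == 1;
}
```

Only the table that was used before the sync reached it takes a fault, which matches the point that such faults are now rare rather than gone.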

Regards,

Joerg

2021-07-20 04:52:29

by Peilin Ye

Subject: Re: [PATCH] docs: x86: Remove obsolete information about x86_64 vmalloc() faulting

Hi Joerg,

On Mon, Jul 19, 2021 at 02:34:31PM +0200, Joerg Roedel wrote:
> Yes, this is the story for x86-64, because all PUD/P4D pages for the vmalloc
> area are pre-allocated at boot. So no faulting or synchronization needs
> to happen.
>
> On x86-32 this is a bit different. Pre-allocation of PMD/PTE pages is
> not an option there (even less so when 4MB large pages with 2-level
> paging come into the picture).
>
> So what happens there is that vmalloc-related changes to init_mm.pgd
> are synchronized to all page tables in the system. But this
> synchronization is subject to race conditions: another CPU might
> vmalloc an area below a PMD that is not fully synchronized yet.
>
> When this happens, there is a fault, which is handled as a vmalloc()
> fault on x86-32 just as before. So vmalloc faults still exist on 32-bit;
> they are just less likely than they used to be.

Thanks a lot for the information! I will improve my commit message and
send a v2 soon.

I think for this patch, removing that out-of-date statement is
sufficient, since mm.rst is x86-64-specific, but maybe we should
document this behavior for x86-32 somewhere as well...

Thank you,
Peilin Ye