x86_64 vmalloc() mappings are no longer "lazily synchronized" among page
tables via page fault handling since commit 7f0a002b5a21 ("x86/mm: remove
vmalloc faulting"). Subsequently, commit 6eb82f994026 ("x86/mm:
Pre-allocate P4D/PUD pages for vmalloc area") rendered it unnecessary to
synchronize, whether lazily or not, x86_64 vmalloc() mappings at runtime,
since the corresponding P4D or PUD pages are now preallocated during
system initialization by preallocate_vmalloc_pages(). Drop the outdated
"lazily synchronized" description to avoid confusion.
It is worth noting, however, that there is still a slight complication for
x86_32; see commit 4819e15f740e ("x86/mm/32: Bring back vmalloc faulting
on x86_32") for details.
Signed-off-by: Peilin Ye <[email protected]>
---
Hi all,
I was trying to understand vmalloc() when I saw this "lazily synchronized"
statement, which confused me for a while. Please correct me if my
understanding is wrong or out of date.
Thank you,
Peilin Ye
Documentation/x86/x86_64/mm.rst | 4 ----
1 file changed, 4 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
index ede1875719fb..9798676bb0bf 100644
--- a/Documentation/x86/x86_64/mm.rst
+++ b/Documentation/x86/x86_64/mm.rst
@@ -140,10 +140,6 @@ The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).
-vmalloc space is lazily synchronized into the different PML4/PML5 pages of
-the processes using the page fault handler, with init_top_pgt as
-reference.
-
We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
--
2.25.1
Hi all,
> diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
> index ede1875719fb..9798676bb0bf 100644
> --- a/Documentation/x86/x86_64/mm.rst
> +++ b/Documentation/x86/x86_64/mm.rst
> @@ -140,10 +140,6 @@ The direct mapping covers all memory in the system up to the highest
> memory address (this means in some cases it can also include PCI memory
> holes).
>
> -vmalloc space is lazily synchronized into the different PML4/PML5 pages of
> -the processes using the page fault handler, with init_top_pgt as
> -reference.
This information is out-of-date, and it took me quite some time of
ftrace'ing before I figured it out... I think it would be beneficial to
update, or at least remove it.
As a proof that I understand what I am talking about, on my x86_64 box:
1. I allocated a vmalloc() area containing linear address `addr`;
2. I manually pagewalked `addr` in different page tables, including
`init_mm.pgd`;
3. The PGD entries for `addr` in all of these page tables immediately
pointed at the same PUD table (my box uses 4-level paging), at the
same physical address;
4. No "lazy synchronization" via page fault handling happened at all,
since it is the same PUD table pre-allocated by
preallocate_vmalloc_pages() during boot time.
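The index arithmetic behind step 2 can be sketched in user space. This is
only a minimal sketch, assuming 4-level paging and KASLR disabled, so that
the vmalloc area is the default ffffc90000000000-ffffe8ffffffffff range
listed in mm.rst. The helper names pgd_index_4l() and
vmalloc_pgd_slots_4l() are made up for illustration and are not kernel
functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values: x86_64, 4-level paging, KASLR disabled. The range is
 * the default vmalloc/ioremap area from Documentation/x86/x86_64/mm.rst.
 */
#define VMALLOC_START_4L 0xffffc90000000000ULL
#define VMALLOC_END_4L   0xffffe8ffffffffffULL
#define PGDIR_SHIFT_4L   39   /* each PGD entry covers 512 GB */
#define PTRS_PER_PGD_4L  512

/* Hypothetical helper: PGD slot that maps a given linear address. */
static unsigned int pgd_index_4l(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT_4L) & (PTRS_PER_PGD_4L - 1));
}

/* Hypothetical helper: number of PGD slots spanned by the vmalloc area. */
static unsigned int vmalloc_pgd_slots_4l(void)
{
	unsigned int first = pgd_index_4l(VMALLOC_START_4L);
	unsigned int last = pgd_index_4l(VMALLOC_END_4L);

	assert(last >= first);
	return last - first + 1;
}
```

Under these assumptions the vmalloc area spans PGD slots 402 through 465,
that is 64 entries of 512 GB each (32 TB in total). Those should be the
entries whose next-level tables preallocate_vmalloc_pages() sets up at
boot, which is why every page table already agrees on them.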
Commit 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc
area") documented this clearly:
"""
Doing this at boot makes sure no synchronization of that area is
necessary at runtime.
"""
Should we remove this sentence, or update it? Any ideas?
Sincerely,
Peilin Ye
Hi,
On Fri, Jul 16, 2021 at 02:09:58AM -0400, Peilin Ye wrote:
> This information is out-of-date, and it took me quite some time of
> ftrace'ing before I figured it out... I think it would be beneficial to
> update, or at least remove it.
>
> As a proof that I understand what I am talking about, on my x86_64 box:
>
> 1. I allocated a vmalloc() area containing linear address `addr`;
> 2. I manually pagewalked `addr` in different page tables, including
> `init_mm.pgd`;
> 3. The PGD entries for `addr` in all of these page tables immediately
> pointed at the same PUD table (my box uses 4-level paging), at the
> same physical address;
> 4. No "lazy synchronization" via page fault handling happened at all,
> since it is the same PUD table pre-allocated by
> preallocate_vmalloc_pages() during boot time.
Yes, this is the story for x86-64, because all PUD/P4D pages for the vmalloc
area are pre-allocated at boot. So no faulting or synchronization needs
to happen.
On x86-32 this is a bit different. Pre-allocation of PMD/PTE pages is
not an option there (even less when 4MB large-pages with 2-level paging
come into the picture).
So what happens there is that vmalloc-related changes to the init_mm.pgd
are synchronized to all page-tables in the system. But this
synchronization is subject to race conditions: another CPU might
vmalloc an area below a PMD which is not fully synchronized yet.
When this happens there is a fault, which is handled as a vmalloc()
fault on x86-32 just as before. So vmalloc faults still exist on 32-bit;
they are just less likely than they used to be.
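The difference between the two cases can be illustrated with a toy
user-space model. This is not kernel code, and all toy_* names are
invented for illustration. It treats each page table as a small array of
top-level slots, clones new tables from a reference table, and reports
whether a later vmalloc-style population had to install a new top-level
entry that already-cloned tables cannot see yet:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 4	/* toy number of top-level slots covering vmalloc */

struct toy_table { int unused; };	/* stands in for a next-level table */
struct toy_pgd { struct toy_table *slot[TOY_NSLOTS]; };

static struct toy_table toy_tables[TOY_NSLOTS];
static struct toy_pgd toy_init_pgd;	/* reference table, starts empty */

/* Boot-time preallocation: populate every slot of the reference table. */
static void toy_preallocate(void)
{
	for (size_t i = 0; i < TOY_NSLOTS; i++)
		toy_init_pgd.slot[i] = &toy_tables[i];
}

/* A new page table copies its entries from the reference table. */
static void toy_pgd_clone(struct toy_pgd *dst)
{
	*dst = toy_init_pgd;
}

/*
 * Populate a slot in the reference table. Returns true only if a new
 * top-level entry had to be installed, i.e. other tables now need to be
 * synchronized (or fixed up by a fault).
 */
static bool toy_vmalloc_populate(size_t slot)
{
	assert(slot < TOY_NSLOTS);
	if (toy_init_pgd.slot[slot] != NULL)
		return false;
	toy_init_pgd.slot[slot] = &toy_tables[slot];
	return true;
}

/* Does this table agree with the reference table for the given slot? */
static bool toy_pgd_in_sync(const struct toy_pgd *pgd, size_t slot)
{
	assert(slot < TOY_NSLOTS);
	return pgd->slot[slot] == toy_init_pgd.slot[slot];
}
```

Without toy_preallocate(), toy_vmalloc_populate() returns true and a
previously cloned table stays out of sync until the entry is propagated
or a fault fixes it up, which is roughly the x86-32 situation. After
toy_preallocate(), it returns false and every clone already shares the
same next-level table, which corresponds to the x86-64 case.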
Regards,
Joerg
Hi Joerg,
On Mon, Jul 19, 2021 at 02:34:31PM +0200, Joerg Roedel wrote:
> On x86-32 this is a bit different. Pre-allocation of PMD/PTE pages is
> not an option there (even less when 4MB large-pages with 2-level paging
> come into the picture).
>
> So what happens there is that vmalloc-related changes to the init_mm.pgd
> are synchronized to all page-tables in the system. But this
> synchronization is subject to race conditions: another CPU might
> vmalloc an area below a PMD which is not fully synchronized yet.
>
> When this happens there is a fault, which is handled as a vmalloc()
> fault on x86-32 just as before. So vmalloc faults still exist on 32-bit;
> they are just less likely than they used to be.
Thanks a lot for the information! I will improve my commit message and
send a v2 soon.
I think for this patch, removing that out-of-date statement is
sufficient, since mm.rst is x86-64-specific, but maybe we should
document this behavior for x86-32 somewhere as well...
Thank you,
Peilin Ye