2023-07-28 14:53:09

by Fabio M. De Francesco

[permalink] [raw]
Subject: [PATCH] Documentation/page_tables: Add info about MMU/TLB and Page Faults

Extend page_tables.rst by adding a section about the role of MMU and TLB
in translating between virtual addresses and physical page frames.
Furthermore explain the concept behind Page Faults and how the Linux
kernel handles TLB misses. Finally briefly explain how and why to disable
the page faults handler.

Cc: Andrew Morton <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---

This has been an RFC PATCH in its 2nd version for a week or so. I received
comments and suggestions on it from Jonathan Cameron (thanks!), and so it has
now been modified to a real patch. I hope that other people want to add their
comments on this document in order to further improve and extend it.

The link to the thread with the RFC PATCH v2 and the messages between Jonathan
and me start at https://lore.kernel.org/all/[email protected]/#r

Documentation/mm/page_tables.rst | 105 +++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)

diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
index 7840c1891751..6ecfd6d2f1f3 100644
--- a/Documentation/mm/page_tables.rst
+++ b/Documentation/mm/page_tables.rst
@@ -152,3 +152,108 @@ Page table handling code that wishes to be architecture-neutral, such as the
virtual memory manager, will need to be written so that it traverses all of the
currently five levels. This style should also be preferred for
architecture-specific code, so as to be robust to future changes.
+
+
+MMU, TLB, and Page Faults
+=========================
+
+The `Memory Management Unit (MMU)` is a hardware component that handles virtual
+to physical address translations. It may use relatively small caches in hardware
+called `Translation Lookaside Buffers (TLBs)` and `Page Walk Caches` to speed up
+these translations.
+
+When a process wants to access a memory location, the CPU provides a virtual
+address to the MMU, which then uses the MMU to check access permissions and
+dirty bits, and if possible it resolves the physical address and consents the
+requested type of access to the corresponding physical address.
+
+If the TLBs have not yet any recorded translations, the MMU may use the Page
+Walk Caches and complete or restart the page tables walks until a physical
+address can finally be resolved. Permissions and dirty bits are checked.
+
+In the context of a virtual memory system, like the one used by the Linux
+kernel, each page of memory has associated permission and dirty bits.
+
+The dirty bit for a page is set (i.e., turned on) when the page is written
+to. This indicates that the page has been modified since it was loaded into
+memory. It probably needs to be written on disk or other cores may need to
+be informed about previous changes before allowing further operations.
+
+If nothing prevents it, eventually the physical memory can be accessed and
+the requested operation on the physical frame is performed.
+
+There are several reasons why the MMU can't find certain translations. It
+could happen because the process is trying to access a range of memory that is
+not allowed to, or because the data is not present into RAM.
+
+When these conditions happen, the MMU triggers page faults, which are types
+of exceptions that signal the CPU to pause the current process and run a special
+function to handle the mentioned page faults.
+
+One cause of page faults is due to bugs (or maliciously crafted addresses) and
+happens when a process tries to access a range of memory that it doesn't have
+permission to. This could be because the memory is reserved for the kernel or
+for another process, or because the process is trying to write to a read-only
+section of memory. When this happens, the kernel sends a Segmentation Fault
+(SIGSEGV) signal to the process, which usually causes the process to terminate.
+
+An expected and more common cause of page faults is an optimization called "lazy
+allocation". This is a technique used by the Kernel to improve memory efficiency
+and reduce footprint. Instead of allocating physical memory to a process as soon
+as it's requested, the Kernel waits until the process actually tries to use the
+memory. This can save a significant amount of memory in cases where a process
+requests a large block but only uses a small portion of it.
+
+A related technique is called "Copy-on-Write" (CoW), where the Kernel allows
+multiple processes to share the same physical memory as long as they're only
+reading from it. If a process tries to write to the shared memory, the kernel
+triggers a page fault and allocates a separate copy of the memory for the
+process. This allows the Kernel to save memory and avoid unnecessary data
+copying and, by doing so, it reduces latency and space occupation.
+
+Now, let's see how the Linux kernel handles these page faults:
+
+1. For most architectures, `do_page_fault()` is the primary interrupt handler
+ for page faults. It delegates the actual handling of the page fault to
+ `handle_mm_fault()`. This function checks the cause of the page fault and
+ takes the appropriate action, such as loading the required page into
+ memory, granting the process the necessary permissions, or sending a
+ SIGSEGV signal to the process.
+
+2. In the specific case of the x86 architecture, the interrupt handler is
+ defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls
+ `handle_page_fault()`. This function then calls either
+ `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether
+ the fault occurred in user space or kernel space. Both of these functions
+ eventually lead to `handle_mm_fault()`, similar to the workflow in other
+ architectures.
+
+`handle_mm_fault()` (likely) ends up calling `__handle_mm_fault()` to carry
+out the actual work of allocation of the page tables. It works by using
+several functions to find the entry's offsets of the 4 - 5 layers of tables
+and allocate the tables it needs to. The functions that look for the offset
+have names like `*_offset()`, where the "*" is for pgd, p4d, pud, pmd, pte;
+instead the functions to allocate the corresponding tables, layer by layer,
+are named `*_alloc`, with the above mentioned convention to name them after
+the corresponding types of tables in the hierarchy.
+
+At the very end of the walk with allocations, if it didn't return errors,
+`__handle_mm_fault()` finally calls `handle_pte_fault()`, which via
+`do_fault()` performs one of `do_read_fault()`, `do_cow_fault()`,
+`do_shared_fault()`. "read", "cow", "shared" give hints about the reasons
+and the kind of fault it's handling.
+
+The actual implementation of the workflow is very complex. Its design allows
+Linux to handle page faults in a way that is tailored to the specific
+characteristics of each architecture, while still sharing a common overall
+structure.
+
+To conclude this brief overview from very high altitude of how Linux handles
+page faults, let's add that page faults handler can be disabled and enabled
+respectively with `pagefault_disable()` and `pagefault_enable()`.
+
+Several code path make use of the latter two functions because they need to
+disable traps into the page faults handler, mostly to prevent deadlocks.[1]
+
+[1] mm/userfaultfd: Replace kmap/kmap_atomic() with kmap_local_page()
+https://lore.kernel.org/all/[email protected]/
--
2.41.0



2023-08-03 18:43:00

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH] Documentation/page_tables: Add info about MMU/TLB and Page Faults

On venerd? 28 luglio 2023 13:53:01 CEST Fabio M. De Francesco wrote:
> Extend page_tables.rst by adding a section about the role of MMU and TLB
> in translating between virtual addresses and physical page frames.
> Furthermore explain the concept behind Page Faults and how the Linux
> kernel handles TLB misses. Finally briefly explain how and why to disable
> the page faults handler.

Hello everyone,

I'd be grateful to anyone who wanted to comment on / or formally review this
patch. At the moment I've only had comments by Jonathan Cameron on RFC v2
(https://lore.kernel.org/all/[email protected]/
#t).

Does anybody else want to contribute?

Thanks in advance,

Fabio

> Cc: Andrew Morton <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Jonathan Cameron <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Linus Walleij <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Signed-off-by: Fabio M. De Francesco <[email protected]>
> ---
>
> This has been an RFC PATCH in its 2nd version for a week or so. I received
> comments and suggestions on it from Jonathan Cameron (thanks!), and so it
has
> now been modified to a real patch. I hope that other people want to add
their
> comments on this document in order to further improve and extend it.
>
> The link to the thread with the RFC PATCH v2 and the messages between
Jonathan
> and me start at
> https://lore.kernel.org/all/[email protected]/#r
>
> Documentation/mm/page_tables.rst | 105 +++++++++++++++++++++++++++++++
> 1 file changed, 105 insertions(+)
>
> diff --git a/Documentation/mm/page_tables.rst
> b/Documentation/mm/page_tables.rst index 7840c1891751..6ecfd6d2f1f3 100644
> --- a/Documentation/mm/page_tables.rst
> +++ b/Documentation/mm/page_tables.rst
> @@ -152,3 +152,108 @@ Page table handling code that wishes to be
> architecture-neutral, such as the virtual memory manager, will need to be
> written so that it traverses all of the currently five levels. This style
> should also be preferred for
> architecture-specific code, so as to be robust to future changes.
> +
> +
> +MMU, TLB, and Page Faults
> +=========================
> +
> +The `Memory Management Unit (MMU)` is a hardware component that handles
> virtual +to physical address translations. It may use relatively small
caches
> in hardware +called `Translation Lookaside Buffers (TLBs)` and `Page Walk
> Caches` to speed up +these translations.
> +
> +When a process wants to access a memory location, the CPU provides a
virtual
> +address to the MMU, which then uses the MMU to check access permissions and
> +dirty bits, and if possible it resolves the physical address and consents
the
> +requested type of access to the corresponding physical address.
> +
> +If the TLBs have not yet any recorded translations, the MMU may use the
Page
> +Walk Caches and complete or restart the page tables walks until a physical
> +address can finally be resolved. Permissions and dirty bits are checked.
> +
> +In the context of a virtual memory system, like the one used by the Linux
> +kernel, each page of memory has associated permission and dirty bits.
> +
> +The dirty bit for a page is set (i.e., turned on) when the page is written
> +to. This indicates that the page has been modified since it was loaded into
> +memory. It probably needs to be written on disk or other cores may need to
> +be informed about previous changes before allowing further operations.
> +
> +If nothing prevents it, eventually the physical memory can be accessed and
> +the requested operation on the physical frame is performed.
> +
> +There are several reasons why the MMU can't find certain translations. It
> +could happen because the process is trying to access a range of memory that
> is +not allowed to, or because the data is not present into RAM.
> +
> +When these conditions happen, the MMU triggers page faults, which are types
> +of exceptions that signal the CPU to pause the current process and run a
> special +function to handle the mentioned page faults.
> +
> +One cause of page faults is due to bugs (or maliciously crafted addresses)
> and +happens when a process tries to access a range of memory that it
doesn't
> have +permission to. This could be because the memory is reserved for the
> kernel or +for another process, or because the process is trying to write to
> a read-only +section of memory. When this happens, the kernel sends a
> Segmentation Fault +(SIGSEGV) signal to the process, which usually causes
the
> process to terminate. +
> +An expected and more common cause of page faults is an optimization called
> "lazy +allocation". This is a technique used by the Kernel to improve memory
> efficiency +and reduce footprint. Instead of allocating physical memory to a
> process as soon +as it's requested, the Kernel waits until the process
> actually tries to use the +memory. This can save a significant amount of
> memory in cases where a process +requests a large block but only uses a
small
> portion of it.
> +
> +A related technique is called "Copy-on-Write" (CoW), where the Kernel
allows
> +multiple processes to share the same physical memory as long as they're
only
> +reading from it. If a process tries to write to the shared memory, the
kernel
> +triggers a page fault and allocates a separate copy of the memory for the
> +process. This allows the Kernel to save memory and avoid unnecessary data
> +copying and, by doing so, it reduces latency and space occupation.
> +
> +Now, let's see how the Linux kernel handles these page faults:
> +
> +1. For most architectures, `do_page_fault()` is the primary interrupt
handler
> + for page faults. It delegates the actual handling of the page fault to +
> `handle_mm_fault()`. This function checks the cause of the page fault and +
> takes the appropriate action, such as loading the required page into +
> memory, granting the process the necessary permissions, or sending a +
> SIGSEGV signal to the process.
> +
> +2. In the specific case of the x86 architecture, the interrupt handler is
> + defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls
> + `handle_page_fault()`. This function then calls either
> + `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether
> + the fault occurred in user space or kernel space. Both of these
functions
> + eventually lead to `handle_mm_fault()`, similar to the workflow in other
> + architectures.
> +
> +`handle_mm_fault()` (likely) ends up calling `__handle_mm_fault()` to carry
> +out the actual work of allocation of the page tables. It works by using
> +several functions to find the entry's offsets of the 4 - 5 layers of tables
> +and allocate the tables it needs to. The functions that look for the offset
> +have names like `*_offset()`, where the "*" is for pgd, p4d, pud, pmd, pte;
> +instead the functions to allocate the corresponding tables, layer by layer,
> +are named `*_alloc`, with the above mentioned convention to name them after
> +the corresponding types of tables in the hierarchy.
> +
> +At the very end of the walk with allocations, if it didn't return errors,
> +`__handle_mm_fault()` finally calls `handle_pte_fault()`, which via
> +`do_fault()` performs one of `do_read_fault()`, `do_cow_fault()`,
> +`do_shared_fault()`. "read", "cow", "shared" give hints about the reasons
> +and the kind of fault it's handling.
> +
> +The actual implementation of the workflow is very complex. Its design
allows
> +Linux to handle page faults in a way that is tailored to the specific
> +characteristics of each architecture, while still sharing a common overall
> +structure.
> +
> +To conclude this brief overview from very high altitude of how Linux
handles
> +page faults, let's add that page faults handler can be disabled and enabled
> +respectively with `pagefault_disable()` and `pagefault_enable()`.
> +
> +Several code path make use of the latter two functions because they need to
> +disable traps into the page faults handler, mostly to prevent deadlocks.[1]
> +
> +[1] mm/userfaultfd: Replace kmap/kmap_atomic() with kmap_local_page()
> +https://lore.kernel.org/all/[email protected]/
> --
> 2.41.0





2023-08-07 10:30:55

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH] Documentation/page_tables: Add info about MMU/TLB and Page Faults

Hi Fabio,

I'm back from vacation! Overall this documentation looks good and
in line with the reset in this section.

On Fri, Jul 28, 2023 at 2:01 PM Fabio M. De Francesco
<[email protected]> wrote:

> +One cause of page faults is due to bugs (or maliciously crafted addresses) and
> +happens when a process tries to access a range of memory that it doesn't have
> +permission to. This could be because the memory is reserved for the kernel or
> +for another process, or because the process is trying to write to a read-only
> +section of memory. When this happens, the kernel sends a Segmentation Fault
> +(SIGSEGV) signal to the process, which usually causes the process to terminate.

This "segmentation fault" (SIGSEGV reads "signal segmentation violation)
is actually a bit hard to understand for people not familiar
with the 1970ies hardware. The Wikipedia tries to explain it but gets a bit
long and confusing.
https://en.wikipedia.org/wiki/Segmentation_fault

The computers where the first Unix was developed (PDP machines) simply
named its MMU the "memory segmentation unit" so "segmentation fault"
is just a 1970ies way of saying "MMU access violation", which stuck inside
Unix and thus inside Linux. Here is the explanation:
https://wfjm.github.io/blogs/w11/2022-08-18-on-segments-and-pages.html

The binary loader would generously use the plentyful virtual memory
"segments" to split each executable into three segments when loading the binary,
still reflected in ELF binaries to this day:
https://en.wikipedia.org/wiki/Code_segment
https://en.wikipedia.org/wiki/Data_segment
https://en.wikipedia.org/wiki/.bss

Then the page table got special permissions set to each segment for read/write
etc. Other programs and the kernel memory are also in inaccessible segments,
so accessing any of the own segments in the wrong way, or another programs
segment, or an unmapped segment (virtual memory) would all result in the
SIGSEGV opaque message "segmentation fault"

I don't know how to reflect this in a good way in the documentation
though, maybe
copy/paste/edit some of my text or I can try to write something as an additional
patch if you prefer.

Yours,
Linus Walleij

2023-08-07 11:11:08

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH] Documentation/page_tables: Add info about MMU/TLB and Page Faults

Hi Fabio,

On Fri, Jul 28, 2023 at 01:53:01PM +0200, Fabio M. De Francesco wrote:
> Extend page_tables.rst by adding a section about the role of MMU and TLB
> in translating between virtual addresses and physical page frames.
> Furthermore explain the concept behind Page Faults and how the Linux
> kernel handles TLB misses. Finally briefly explain how and why to disable
> the page faults handler.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Jonathan Cameron <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Linus Walleij <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Signed-off-by: Fabio M. De Francesco <[email protected]>
> ---
>
> This has been an RFC PATCH in its 2nd version for a week or so. I received
> comments and suggestions on it from Jonathan Cameron (thanks!), and so it has
> now been modified to a real patch. I hope that other people want to add their
> comments on this document in order to further improve and extend it.
>
> The link to the thread with the RFC PATCH v2 and the messages between Jonathan
> and me start at https://lore.kernel.org/all/[email protected]/#r
>
> Documentation/mm/page_tables.rst | 105 +++++++++++++++++++++++++++++++
> 1 file changed, 105 insertions(+)
>
> diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
> index 7840c1891751..6ecfd6d2f1f3 100644
> --- a/Documentation/mm/page_tables.rst
> +++ b/Documentation/mm/page_tables.rst
> @@ -152,3 +152,108 @@ Page table handling code that wishes to be architecture-neutral, such as the
> virtual memory manager, will need to be written so that it traverses all of the
> currently five levels. This style should also be preferred for
> architecture-specific code, so as to be robust to future changes.
> +
> +
> +MMU, TLB, and Page Faults
> +=========================
> +
> +The `Memory Management Unit (MMU)` is a hardware component that handles virtual
> +to physical address translations. It may use relatively small caches in hardware
> +called `Translation Lookaside Buffers (TLBs)` and `Page Walk Caches` to speed up
> +these translations.
> +
> +When a process wants to access a memory location, the CPU provides a virtual
> +address to the MMU, which then uses the MMU to check access permissions and
> +dirty bits, and if possible it resolves the physical address and consents the
> +requested type of access to the corresponding physical address.

Essentially any access to a memory location involves the translation from
virtual to physical, not only when processes access memory.
I'd write this and the next paragraph as:

When CPU accesses a memory location, it provides a virtual address to the
MMU, which checks if there is existing translation in TLB or in the Page
Walk Caches on architectures that support them. If no translation is found,
MMU uses the page tables to determine the physical address.

During the translation process, be it a cached translation or page table
walk, MMU checks the permissions and verifies that access is allowed.
If architecture supports accessed and dirty bits in the page tables, these
bits are updated in the page tables.

> +If the TLBs have not yet any recorded translations, the MMU may use the Page
> +Walk Caches and complete or restart the page tables walks until a physical
> +address can finally be resolved. Permissions and dirty bits are checked.
> +
> +In the context of a virtual memory system, like the one used by the Linux
> +kernel, each page of memory has associated permission and dirty bits.

The permissions and dirty (and accessed) bits should be introduced before
we mention them, or we can just presume that the reader knows what they are
or will check it elsewhere :)

> +The dirty bit for a page is set (i.e., turned on) when the page is written
> +to. This indicates that the page has been modified since it was loaded into
> +memory. It probably needs to be written on disk or other cores may need to
> +be informed about previous changes before allowing further operations.

I don't think there are architectures that require synchronization on dirty
bits.

> +If nothing prevents it, eventually the physical memory can be accessed and
> +the requested operation on the physical frame is performed.
> +
> +There are several reasons why the MMU can't find certain translations. It
> +could happen because the process is trying to access a range of memory that is

here as well, it's not necessarily a process.

> +not allowed to, or because the data is not present into RAM.
> +
> +When these conditions happen, the MMU triggers page faults, which are types
> +of exceptions that signal the CPU to pause the current process and run a special

maybe current execution ^

> +function to handle the mentioned page faults.
> +
> +One cause of page faults is due to bugs (or maliciously crafted addresses) and
> +happens when a process tries to access a range of memory that it doesn't have
> +permission to. This could be because the memory is reserved for the kernel or
> +for another process, or because the process is trying to write to a read-only
> +section of memory. When this happens, the kernel sends a Segmentation Fault
> +(SIGSEGV) signal to the process, which usually causes the process to terminate.
> +
> +An expected and more common cause of page faults is an optimization called "lazy
> +allocation". This is a technique used by the Kernel to improve memory efficiency

^ no need in capital K

> +and reduce footprint. Instead of allocating physical memory to a process as soon
> +as it's requested, the Kernel waits until the process actually tries to use the
> +memory. This can save a significant amount of memory in cases where a process
> +requests a large block but only uses a small portion of it.
> +
> +A related technique is called "Copy-on-Write" (CoW), where the Kernel allows
> +multiple processes to share the same physical memory as long as they're only
> +reading from it. If a process tries to write to the shared memory, the kernel
> +triggers a page fault and allocates a separate copy of the memory for the
> +process. This allows the Kernel to save memory and avoid unnecessary data
> +copying and, by doing so, it reduces latency and space occupation.

I believe both lazy allocation and CoW descriptions better fit into Process
Addresses.

> +
> +Now, let's see how the Linux kernel handles these page faults:
> +
> +1. For most architectures, `do_page_fault()` is the primary interrupt handler
> + for page faults. It delegates the actual handling of the page fault to
> + `handle_mm_fault()`. This function checks the cause of the page fault and
> + takes the appropriate action, such as loading the required page into
> + memory, granting the process the necessary permissions, or sending a
> + SIGSEGV signal to the process.
> +
> +2. In the specific case of the x86 architecture, the interrupt handler is
> + defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls
> + `handle_page_fault()`. This function then calls either
> + `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether
> + the fault occurred in user space or kernel space. Both of these functions
> + eventually lead to `handle_mm_fault()`, similar to the workflow in other
> + architectures.

I don't think it's important how the arch-specific page fault handlers are
defined. The important bits are that there is an architecture specific
exception handler that does initial processing of the page fault and calls
handle_mm_fault() which takes care of the generic page processing.

> +`handle_mm_fault()` (likely) ends up calling `__handle_mm_fault()` to carry

The unlikely bit actually generates SIGSEGV, so it would be nice to
describe that.

> +out the actual work of allocation of the page tables. It works by using
> +several functions to find the entry's offsets of the 4 - 5 layers of tables

Better to use 'upper' instead of '4-5'. __handle_mm_fault() never gets to
pte offset.

> +and allocate the tables it needs to. The functions that look for the offset
> +have names like `*_offset()`, where the "*" is for pgd, p4d, pud, pmd, pte;
> +instead the functions to allocate the corresponding tables, layer by layer,
> +are named `*_alloc`, with the above mentioned convention to name them after
> +the corresponding types of tables in the hierarchy.

It's worth mentioning that page table walk may end at one of the upper
layers if the memory is to be mapped with PMD or PUD.

> +At the very end of the walk with allocations, if it didn't return errors,
> +`__handle_mm_fault()` finally calls `handle_pte_fault()`, which via
> +`do_fault()` performs one of `do_read_fault()`, `do_cow_fault()`,
> +`do_shared_fault()`. "read", "cow", "shared" give hints about the reasons
> +and the kind of fault it's handling.
> +
> +The actual implementation of the workflow is very complex. Its design allows
> +Linux to handle page faults in a way that is tailored to the specific
> +characteristics of each architecture, while still sharing a common overall
> +structure.
> +
> +To conclude this brief overview from very high altitude of how Linux handles
> +page faults, let's add that page faults handler can be disabled and enabled
> +respectively with `pagefault_disable()` and `pagefault_enable()`.
> +
> +Several code path make use of the latter two functions because they need to
> +disable traps into the page faults handler, mostly to prevent deadlocks.[1]

I don't think this reference is needed here

> +
> +[1] mm/userfaultfd: Replace kmap/kmap_atomic() with kmap_local_page()
> +https://lore.kernel.org/all/[email protected]/
> --
> 2.41.0
>

--
Sincerely yours,
Mike.

2023-08-09 13:38:14

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH] Documentation/page_tables: Add info about MMU/TLB and Page Faults

On lunedì 7 agosto 2023 11:40:30 CEST Linus Walleij wrote:
> Hi Fabio,
>
> I'm back from vacation! Overall this documentation looks good and
> in line with the reset in this section.
>
> On Fri, Jul 28, 2023 at 2:01 PM Fabio M. De Francesco
>
> <[email protected]> wrote:
> > +One cause of page faults is due to bugs (or maliciously crafted
addresses)
> > and +happens when a process tries to access a range of memory that it
> > doesn't have +permission to. This could be because the memory is reserved
> > for the kernel or +for another process, or because the process is trying
to
> > write to a read-only +section of memory. When this happens, the kernel
> > sends a Segmentation Fault +(SIGSEGV) signal to the process, which usually
> > causes the process to terminate.
> This "segmentation fault" (SIGSEGV reads "signal segmentation violation)
> is actually a bit hard to understand for people not familiar
> with the 1970ies hardware.

Linus,

Actually, I see a lot of "Segmentation fault (core dumped)" because I still
develop in user space.

Stupid distractions are enough to get that message printed...

#include <stdio.h>
#include <malloc.h>

int main() {
int *p1, *p2;
p2 = malloc(sizeof(int));
*p2 = 9;
printf("*p2 is %d\n", *p1);
return 0;
}

fabio@suse:/tmp> gcc -o test test.c
fabio@suse:/tmp> ./test
Segmentation fault (core dumped)

Furthermore, everybody can still type "man signal.h" (document written in
2017) and lookup the table of the POSIX signals and see that SIGSEGV is for
"Invalid memory reference.".
>
> [Snip]
>
> Other programs and the kernel memory are also in inaccessible segments,
> so accessing any of the own segments in the wrong way, or another programs
> segment, or an unmapped segment (virtual memory) would all result in the
> SIGSEGV opaque message "segmentation fault"
>
> I don't know how to reflect this in a good way in the documentation
> though, maybe
> copy/paste/edit some of my text or I can try to write something as an
> additional patch if you prefer.

I suspect that people is much more used to get more "Segmentation fault" these
days than in the 1970's (when developers probably were a little more careful
with pointers - at least this is what I have heard about this subject :-)).

BTW, please feel free to change / extend this paragraph with a follow up
patch.

Thanks for your comments,

Fabio
> Yours,
> Linus Walleij





2023-08-09 13:45:02

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH] Documentation/page_tables: Add info about MMU/TLB and Page Faults

On luned? 7 agosto 2023 12:50:10 CEST Mike Rapoport wrote:
> Hi Fabio,
>
> On Fri, Jul 28, 2023 at 01:53:01PM +0200, Fabio M. De Francesco wrote:
> > Extend page_tables.rst by adding a section about the role of MMU and TLB
> > in translating between virtual addresses and physical page frames.
> > Furthermore explain the concept behind Page Faults and how the Linux
> > kernel handles TLB misses. Finally briefly explain how and why to disable
> > the page faults handler.
> >
> > [snip]
> >
> > +MMU, TLB, and Page Faults
> > +=========================
> > +
> > +The `Memory Management Unit (MMU)` is a hardware component that handles
> > virtual +to physical address translations. It may use relatively small
> > caches in hardware +called `Translation Lookaside Buffers (TLBs)` and
`Page
> > Walk Caches` to speed up +these translations.
> > +
> > +When a process wants to access a memory location, the CPU provides a
> > virtual
> > +address to the MMU, which then uses the MMU to check access permissions
and
> > +dirty bits, and if possible it resolves the physical address and consents
> > the +requested type of access to the corresponding physical address.
>
> Essentially any access to a memory location involves the translation from
> virtual to physical, not only when processes access memory.

Mike,

I'm cutting everything from here on because I agree with your comments, so I
could just write a long list of 'I agree', 'I understand' and the like. I want
to avoid readers from the aforementioned list :-)

I think (actually, I hope) that I have understood everything correctly. I will
send a new version with the necessary corrections by the end of this week.

Thanks again for your comments and suggestions.

Fabio