v4:
- Fixed a few "subordinate clauses" (SC) cases [Alex]
- Reword in ioctl_userfaultfd.2 to use bold font for the two modes referenced,
so as to be clear on what is "both" referring to [Alex]
v3:
- Don't use "Currently", instead add "(since x.y)" mark where proper [Alex]
- Always use semantic newlines across the whole patchset [Alex]
- Use quote when possible, rather than escapes [Alex]
- Fix one missing replacement of ".BR" -> ".B" [Alex]
- Some other trivial rephrases here and there when fixing up above
v2 changes:
- Fix wordings as suggested [MikeR]
- convert ".BR" to ".B" where proper for the patchset [Alex]
- rearrange a few lines in the last two patches where they got messed up
- document more things, e.g. UFFDIO_COPY_MODE_WP; and also on how to resolve a
wr-protect page fault.
There are two features missing from the current man pages, namely:
(1) Userfaultfd Thread-ID feature
(2) Userfaultfd write protect mode
There's also a third one, which was just contributed by Axel. Axel, I think it
would be great if you could add that part too, probably after the whole
hugetlbfs/shmem minor mode reaches the Linux master branch.
Please review, thanks.
Peter Xu (4):
userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
userfaultfd.2: Add write-protect mode
ioctl_userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
ioctl_userfaultfd.2: Add write-protect mode docs
man2/ioctl_userfaultfd.2 | 89 ++++++++++++++++++++++++++++-
man2/userfaultfd.2 | 117 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 201 insertions(+), 5 deletions(-)
--
2.26.2
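Both features listed above are discovered at handshake time through the UFFDIO_API ioctl. The C sketch below is illustrative only. It assumes Linux 5.7+ uapi headers (<linux/userfaultfd.h>), and the helper names uffd_has_features() and uffd_handshake() are hypothetical, not part of any library. It assumes the kernel reports its supported feature bits back in uffdio_api.features, which is how the patches describe checking for UFFD_FEATURE_PAGEFAULT_FLAG_WP.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: true if every bit in `wanted` is set in `available`. */
static bool uffd_has_features(uint64_t available, uint64_t wanted)
{
	return (available & wanted) == wanted;
}

/*
 * Hypothetical helper: perform the UFFDIO_API handshake on `uffd`,
 * requesting the feature bits in `requested` (e.g. UFFD_FEATURE_THREAD_ID).
 * On success, stores the feature bits reported by the kernel in
 * *features_out and returns 0.  On failure, returns -1 with errno set by
 * ioctl(2) and leaves *features_out untouched.
 */
static int uffd_handshake(int uffd, uint64_t requested, uint64_t *features_out)
{
	struct uffdio_api api;

	api.api = UFFD_API;
	api.features = requested;
	api.ioctls = 0;
	if (ioctl(uffd, UFFDIO_API, &api) == -1)
		return -1;
	*features_out = api.features;
	return 0;
}
```

A caller would then test uffd_has_features(features, UFFD_FEATURE_PAGEFAULT_FLAG_WP) before registering any range with UFFDIO_REGISTER_MODE_WP.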
Write-protect mode is supported starting from Linux 5.7.
Signed-off-by: Peter Xu <[email protected]>
---
man2/userfaultfd.2 | 104 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 102 insertions(+), 2 deletions(-)
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index 555e37409..8ad4a71b5 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -78,6 +78,32 @@ all memory ranges that were registered with the object are unregistered
and unread events are flushed.
.\"
.PP
+Userfaultfd supports two modes of registration:
+.TP
+.BR UFFDIO_REGISTER_MODE_MISSING " (since 4.10)"
+When registered with
+.B UFFDIO_REGISTER_MODE_MISSING
+mode, the userspace will receive a page fault message
+when a missing page is accessed.
+The faulted thread will be stopped from execution until the page fault is
+resolved from the userspace by either an
+.B UFFDIO_COPY
+or an
+.B UFFDIO_ZEROPAGE
+ioctl.
+.TP
+.BR UFFDIO_REGISTER_MODE_WP " (since 5.7)"
+When registered with
+.B UFFDIO_REGISTER_MODE_WP
+mode, the userspace will receive a page fault message
+when a write-protected page is written.
+The faulted thread will be stopped from execution
+until the userspace un-write-protect the page using an
+.B UFFDIO_WRITEPROTECT
+ioctl.
+.PP
+Multiple modes can be enabled at the same time for the same memory range.
+.PP
Since Linux 4.14, userfaultfd page fault message can selectively embed faulting
thread ID information into the fault message.
One needs to enable this feature explicitly using the
@@ -144,6 +170,17 @@ single threaded non-cooperative userfaultfd manager implementations.
.\" and limitations remaining in 4.11
.\" Maybe it's worth adding a dedicated sub-section...
.\"
+.PP
+Starting from Linux 5.7, userfaultfd is able to do
+synchronous page dirty tracking using the new write-protection register mode.
+One should check against the feature bit
+.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
+before using this feature.
+Similar to the original userfaultfd missing mode, the write-protect mode will
+generate a userfaultfd message when the protected page is written.
+The user needs to resolve the page fault by unprotecting the faulted page and
+kicking the faulted thread to continue.
+For more information, please refer to the "Userfaultfd write-protect mode" section.
.SS Userfaultfd operation
After the userfaultfd object is created with
.BR userfaultfd (),
@@ -219,6 +256,65 @@ userfaultfd can be used only with anonymous private memory mappings.
Since Linux 4.11,
userfaultfd can be also used with hugetlbfs and shared memory mappings.
.\"
+.SS Userfaultfd write-protect mode (since 5.7)
+Since Linux 5.7, userfaultfd supports write-protect mode.
+The user needs to first check availability of this feature using
+.B UFFDIO_API
+ioctl against the feature bit
+.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
+before using this feature.
+.PP
+To register with userfaultfd write-protect mode, the user needs to initiate the
+.B UFFDIO_REGISTER
+ioctl with mode
+.B UFFDIO_REGISTER_MODE_WP
+set.
+Note that it's legal to monitor the same memory range with multiple modes.
+For example, the user can do
+.B UFFDIO_REGISTER
+with the mode set to
+.BR "UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP" .
+When there is only
+.B UFFDIO_REGISTER_MODE_WP
+registered, the userspace will
+.I not
+receive any message when a missing page is written.
+Instead, the userspace will only receive a write-protect page fault message
+when an existing but write-protected page is written.
+.PP
+After the
+.B UFFDIO_REGISTER
+ioctl completed with
+.B UFFDIO_REGISTER_MODE_WP
+mode set,
+the user can write-protect any existing memory within the range using the ioctl
+.B UFFDIO_WRITEPROTECT
+where
+.I uffdio_writeprotect.mode
+should be set to
+.BR UFFDIO_WRITEPROTECT_MODE_WP .
+.PP
+When a write-protect event happens,
+the userspace will receive a page fault message whose
+.I uffd_msg.pagefault.flags
+will have the
+.B UFFD_PAGEFAULT_FLAG_WP
+flag set.
+Note: since only writes can trigger this kind of fault,
+write-protect messages will always have the
+.B UFFD_PAGEFAULT_FLAG_WRITE
+bit set too, along with the bit
+.BR UFFD_PAGEFAULT_FLAG_WP .
+.PP
+To resolve a write-protection page fault, the user should initiate another
+.B UFFDIO_WRITEPROTECT
+ioctl, whose
+.I uffdio_writeprotect.mode
+should have the flag
+.B UFFDIO_WRITEPROTECT_MODE_WP
+cleared for the faulted page or range.
+.PP
+Write-protect mode only supports private anonymous memory.
.SS Reading from the userfaultfd structure
Each
.BR read (2)
@@ -364,8 +460,12 @@ flag (see
.BR ioctl_userfaultfd (2))
and this flag is set, this a write fault;
otherwise it is a read fault.
-.\"
-.\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
+.TP
+.B UFFD_PAGEFAULT_FLAG_WP
+If the address is in a range that was registered with the
+.B UFFDIO_REGISTER_MODE_WP
+flag and this bit is set, this is a write-protect fault;
+otherwise it is a missing-page fault.
.RE
.TP
.I pagefault.feat.pid
--
2.26.2
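As the userfaultfd.2 patch above describes, a monitor tells a write-protect fault apart from a missing-page fault by looking at UFFD_PAGEFAULT_FLAG_WP in the message flags; for missing-page faults, UFFD_PAGEFAULT_FLAG_WRITE distinguishes writes from reads. A minimal C sketch, assuming Linux 5.7+ uapi headers; the fault_kind enum and classify_fault() are hypothetical names for illustration. Note that the uapi header exposes the flags as uffd_msg.arg.pagefault.flags.

```c
#include <stdint.h>
#include <linux/userfaultfd.h>

/* Hypothetical classification of a message read from a userfaultfd. */
enum fault_kind {
	FAULT_OTHER_EVENT,	/* not a UFFD_EVENT_PAGEFAULT message */
	FAULT_WRITE_PROTECT,	/* write to a write-protected page */
	FAULT_MISSING_WRITE,	/* write to a missing page */
	FAULT_MISSING_READ,	/* read of a missing page */
};

static enum fault_kind classify_fault(const struct uffd_msg *msg)
{
	uint64_t flags;

	if (msg->event != UFFD_EVENT_PAGEFAULT)
		return FAULT_OTHER_EVENT;

	flags = msg->arg.pagefault.flags;
	/* WP faults always carry the WRITE bit too, so check WP first. */
	if (flags & UFFD_PAGEFAULT_FLAG_WP)
		return FAULT_WRITE_PROTECT;
	if (flags & UFFD_PAGEFAULT_FLAG_WRITE)
		return FAULT_MISSING_WRITE;
	return FAULT_MISSING_READ;
}
```

A FAULT_WRITE_PROTECT result would be resolved with UFFDIO_WRITEPROTECT (mode bit cleared), while the two missing-page cases would be resolved with UFFDIO_COPY or UFFDIO_ZEROPAGE.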
UFFD_FEATURE_THREAD_ID is supported in Linux 4.14.
Signed-off-by: Peter Xu <[email protected]>
---
man2/ioctl_userfaultfd.2 | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 47ae5f473..d4a8375b8 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -208,6 +208,11 @@ signal will be sent to the faulting process.
Applications using this
feature will not require the use of a userfaultfd monitor for processing
memory accesses to the regions registered with userfaultfd.
+.TP
+.BR UFFD_FEATURE_THREAD_ID " (since Linux 4.14)"
+If this feature bit is set,
+.I uffd_msg.pagefault.feat.ptid
+will be set to the faulted thread ID for each page fault message.
.PP
The returned
.I ioctls
--
2.26.2
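With UFFD_FEATURE_THREAD_ID enabled at UFFDIO_API time, the faulting thread's ID arrives in the pagefault.feat.ptid field of each page fault message. A small C sketch, assuming Linux 4.14+ uapi headers; fault_thread_id() is a hypothetical helper, and returning 0 when the ID is unavailable is a convention chosen for this sketch (0 is never a valid thread ID).

```c
#include <stdint.h>
#include <sys/types.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: return the ID of the thread that triggered a page
 * fault message, or 0 if the message is not a page fault or the
 * UFFD_FEATURE_THREAD_ID feature was not enabled on this userfaultfd.
 */
static pid_t fault_thread_id(const struct uffd_msg *msg,
			     uint64_t enabled_features)
{
	if (msg->event != UFFD_EVENT_PAGEFAULT)
		return 0;
	if (!(enabled_features & UFFD_FEATURE_THREAD_ID))
		return 0;
	return (pid_t)msg->arg.pagefault.feat.ptid;
}
```

A monitor could use the returned ID, for example, to log which thread is blocked on which faulting address.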
Userfaultfd write-protect mode is supported starting from Linux 5.7.
Signed-off-by: Peter Xu <[email protected]>
---
man2/ioctl_userfaultfd.2 | 84 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 81 insertions(+), 3 deletions(-)
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index d4a8375b8..5419687a6 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -234,6 +234,11 @@ operation is supported.
The
.B UFFDIO_UNREGISTER
operation is supported.
+.TP
+.B 1 << _UFFDIO_WRITEPROTECT
+The
+.B UFFDIO_WRITEPROTECT
+operation is supported.
.PP
This
.BR ioctl (2)
@@ -322,9 +327,6 @@ Track page faults on missing pages.
.B UFFDIO_REGISTER_MODE_WP
Track page faults on write-protected pages.
.PP
-Currently, the only supported mode is
-.BR UFFDIO_REGISTER_MODE_MISSING .
-.PP
If the operation is successful, the kernel modifies the
.I ioctls
bit-mask field to indicate which
@@ -443,6 +445,16 @@ operation:
.TP
.B UFFDIO_COPY_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution
+.TP
+.B UFFDIO_COPY_MODE_WP
+Copy the page with read-only permission.
+This allows the user to trap the next write to the page,
+which will block and generate another write-protect userfault message.
+This is only used when both
+.B UFFDIO_REGISTER_MODE_MISSING
+and
+.B UFFDIO_REGISTER_MODE_WP
+modes are enabled for the registered range.
.PP
The
.I copy
@@ -654,6 +666,72 @@ field of the
structure was not a multiple of the system page size; or
.I len
was zero; or the specified range was otherwise invalid.
+.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
+Write-protect or write-unprotect a memory range
+that was registered with userfaultfd using mode
+.BR UFFDIO_REGISTER_MODE_WP .
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_writeprotect
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_writeprotect {
+ struct uffdio_range range; /* Range to change write permission */
+ __u64 mode; /* Mode to change write permission */
+};
+.EE
+.in
+There are two mode bits that are supported in this structure:
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_WP
+When this mode bit is set, the ioctl will be a write-protect operation upon the
+memory range specified by
+.IR range .
+Otherwise it'll be a write-unprotect operation upon the specified range,
+which can be used to resolve a userfaultfd write-protect page fault.
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
+When this mode bit is set,
+do not wake up any thread that waits for page-fault resolution after the operation.
+This can be specified only if
+.B UFFDIO_WRITEPROTECT_MODE_WP
+is not specified.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+The
+.I start
+or the
+.I len
+field of the
+.I uffdio_range
+structure was not a multiple of the system page size; or
+.I len
+was zero; or the specified range was otherwise invalid.
+.TP
+.B EAGAIN
+The process was interrupted and need to retry.
+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.
+For example, the virtual address does not exist,
+or not registered with userfaultfd write-protect mode.
+.TP
+.B EFAULT
+Encountered a generic fault during processing.
.SH RETURN VALUE
See descriptions of the individual operations, above.
.SH ERRORS
--
2.26.2
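The UFFDIO_WRITEPROTECT ioctl documented in the last patch can be wrapped roughly as follows. This is a sketch assuming Linux 5.7+ uapi headers; make_wp_arg() and uffd_writeprotect() are hypothetical names. Rejecting UFFDIO_WRITEPROTECT_MODE_WP combined with UFFDIO_WRITEPROTECT_MODE_DONTWAKE up front mirrors the documented rule that DONTWAKE may be given only when WP is not.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: fill in a struct uffdio_writeprotect.
 * protect == true  -> write-protect [start, start + len)
 * protect == false -> write-unprotect it, waking blocked threads
 *                     unless dontwake is set.
 * Returns -1 with errno = EINVAL for the invalid WP + DONTWAKE combination.
 */
static int make_wp_arg(struct uffdio_writeprotect *wp, uint64_t start,
		       uint64_t len, bool protect, bool dontwake)
{
	if (protect && dontwake) {
		errno = EINVAL;
		return -1;
	}
	wp->range.start = start;
	wp->range.len = len;
	wp->mode = 0;
	if (protect)
		wp->mode |= UFFDIO_WRITEPROTECT_MODE_WP;
	if (dontwake)
		wp->mode |= UFFDIO_WRITEPROTECT_MODE_DONTWAKE;
	return 0;
}

/* Hypothetical wrapper: returns 0 on success, -1 with errno set on error. */
static int uffd_writeprotect(int uffd, uint64_t start, uint64_t len,
			     bool protect, bool dontwake)
{
	struct uffdio_writeprotect wp;

	if (make_wp_arg(&wp, start, len, protect, dontwake) == -1)
		return -1;
	return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

Resolving a write-protect fault on one page at a page-aligned address would then be uffd_writeprotect(uffd, addr, page_size, false, false), where addr and page_size are placeholders supplied by the caller.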
On Mon, Mar 22, 2021 at 06:08:44PM -0400, Peter Xu wrote:
> v4:
> - Fixed a few "subordinate clauses" (SC) cases [Alex]
> - Reword in ioctl_userfaultfd.2 to use bold font for the two modes referenced,
> so as to be clear on what is "both" referring to [Alex]
>
> v3:
> - Don't use "Currently", instead add "(since x.y)" mark where proper [Alex]
> - Always use semantic newlines across the whole patchset [Alex]
> - Use quote when possible, rather than escapes [Alex]
> - Fix one missing replacement of ".BR" -> ".B" [Alex]
> - Some other trivial rephrases here and there when fixing up above
>
> v2 changes:
> - Fix wordings as suggested [MikeR]
> - convert ".BR" to ".B" where proper for the patchset [Alex]
> - rearrange a few lines in the last two patches where they got messed up
> - document more things, e.g. UFFDIO_COPY_MODE_WP; and also on how to resolve a
> wr-protect page fault.
>
> There are two features missing from the current man pages, namely:
>
> (1) Userfaultfd Thread-ID feature
> (2) Userfaultfd write protect mode
>
> There's also a third one, which was just contributed by Axel. Axel, I think it
> would be great if you could add that part too, probably after the whole
> hugetlbfs/shmem minor mode reaches the Linux master branch.
>
> Please review, thanks.
>
> Peter Xu (4):
> userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
> userfaultfd.2: Add write-protect mode
> ioctl_userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
> ioctl_userfaultfd.2: Add write-protect mode docs
>
> man2/ioctl_userfaultfd.2 | 89 ++++++++++++++++++++++++++++-
> man2/userfaultfd.2 | 117 ++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 201 insertions(+), 5 deletions(-)
Acked-by: Mike Rapoport <[email protected]>
> --
> 2.26.2
>
>
--
Sincerely yours,
Mike.
Hi Peter,
Please see a few comments below.
Thanks,
Alex
On 3/22/21 11:08 PM, Peter Xu wrote:
> Userfaultfd write-protect mode is supported starting from Linux 5.7.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> man2/ioctl_userfaultfd.2 | 84 ++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 81 insertions(+), 3 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index d4a8375b8..5419687a6 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -234,6 +234,11 @@ operation is supported.
> The
> .B UFFDIO_UNREGISTER
> operation is supported.
> +.TP
> +.B 1 << _UFFDIO_WRITEPROTECT
> +The
> +.B UFFDIO_WRITEPROTECT
> +operation is supported.
> .PP
> This
> .BR ioctl (2)
> @@ -322,9 +327,6 @@ Track page faults on missing pages.
> .B UFFDIO_REGISTER_MODE_WP
> Track page faults on write-protected pages.
> .PP
> -Currently, the only supported mode is
> -.BR UFFDIO_REGISTER_MODE_MISSING .
> -.PP
> If the operation is successful, the kernel modifies the
> .I ioctls
> bit-mask field to indicate which
> @@ -443,6 +445,16 @@ operation:
> .TP
> .B UFFDIO_COPY_MODE_DONTWAKE
> Do not wake up the thread that waits for page-fault resolution
> +.TP
> +.B UFFDIO_COPY_MODE_WP
> +Copy the page with read-only permission.
> +This allows the user to trap the next write to the page,
> +which will block and generate another write-protect userfault message.
s/write-protect/write-protected/
?
> +This is only used when both
> +.B UFFDIO_REGISTER_MODE_MISSING
> +and
> +.B UFFDIO_REGISTER_MODE_WP
> +modes are enabled for the registered range.
> .PP
> The
> .I copy
> @@ -654,6 +666,72 @@ field of the
> structure was not a multiple of the system page size; or
> .I len
> was zero; or the specified range was otherwise invalid.
> +.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
> +Write-protect or write-unprotect a memory range
> +that was registered with userfaultfd using mode
> +.BR UFFDIO_REGISTER_MODE_WP .
> +.PP
> +The
> +.I argp
> +argument is a pointer to a
> +.I uffdio_writeprotect
> +structure as shown below:
> +.PP
> +.in +4n
> +.EX
> +struct uffdio_writeprotect {
> + struct uffdio_range range; /* Range to change write permission */
> + __u64 mode; /* Mode to change write permission */
> +};
> +.EE
> +.in
> +There are two mode bits that are supported in this structure:
> +.TP
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +When this mode bit is set, the ioctl will be a write-protect operation upon the
> +memory range specified by
> +.IR range .
> +Otherwise it'll be a write-unprotect operation upon the specified range,
> +which can be used to resolve a userfaultfd write-protect page fault.
> +.TP
> +.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
> +When this mode bit is set,
> +do not wake up any thread that waits for page-fault resolution after the operation.
> +This can be specified only if
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +is not specified.
> +.PP
> +This
> +.BR ioctl (2)
> +operation returns 0 on success.
> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the error.
> +Possible errors include:
> +.TP
> +.B EINVAL
> +The
> +.I start
> +or the
> +.I len
> +field of the
> +.I uffdio_range
> +structure was not a multiple of the system page size; or
> +.I len
> +was zero; or the specified range was otherwise invalid.
> +.TP
> +.B EAGAIN
> +The process was interrupted and need to retry.
Maybe: "The process was interrupted; retry this call."?
I don't know what other pages say about this kind of error.
> +.TP
> +.B ENOENT
> +The range specified in
> +.I range
> +is not valid.
I'm not sure how this is different from the wording above in EINVAL. An
"otherwise invalid range" was already giving EINVAL?
> +For example, the virtual address does not exist,
> +or not registered with userfaultfd write-protect mode.
> +.TP
> +.B EFAULT
> +Encountered a generic fault during processing.
What is a "generic fault"?
> .SH RETURN VALUE
> See descriptions of the individual operations, above.
> .SH ERRORS
>
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
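For the UFFDIO_COPY_MODE_WP flag discussed in the review above: when a range is registered with both UFFDIO_REGISTER_MODE_MISSING and UFFDIO_REGISTER_MODE_WP, a missing-page fault can be resolved by copying the page in read-only, so that the next write generates a write-protect fault. A sketch assuming Linux 5.7+ uapi headers; make_copy_arg() and uffd_copy() are hypothetical helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: build a UFFDIO_COPY request for [dst, dst + len). */
static struct uffdio_copy make_copy_arg(uint64_t dst, uint64_t src,
					uint64_t len, bool write_protect)
{
	struct uffdio_copy copy;

	copy.dst = dst;
	copy.src = src;
	copy.len = len;
	/* Map the new page read-only so the next write traps again. */
	copy.mode = write_protect ? UFFDIO_COPY_MODE_WP : 0;
	copy.copy = 0;	/* set by the kernel: bytes copied or negated errno */
	return copy;
}

/*
 * Hypothetical wrapper: returns 0 on success, -1 with errno set on error.
 * *copied receives the "copy" field as left by the kernel either way.
 */
static int uffd_copy(int uffd, uint64_t dst, uint64_t src, uint64_t len,
		     bool write_protect, int64_t *copied)
{
	struct uffdio_copy copy = make_copy_arg(dst, src, len, write_protect);
	int ret = ioctl(uffd, UFFDIO_COPY, &copy);

	*copied = copy.copy;
	return ret == -1 ? -1 : 0;
}
```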
On Tue, Mar 23, 2021 at 07:19:12PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
>
> Please see a few more comments below.
>
> Thanks,
>
> Alex
>
> On 3/22/21 11:08 PM, Peter Xu wrote:
> > Write-protect mode is supported starting from Linux 5.7.
> >
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > man2/userfaultfd.2 | 104 ++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 102 insertions(+), 2 deletions(-)
> >
> > diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> > index 555e37409..8ad4a71b5 100644
> > --- a/man2/userfaultfd.2
> > +++ b/man2/userfaultfd.2
> > @@ -78,6 +78,32 @@ all memory ranges that were registered with the object are unregistered
> > and unread events are flushed.
> > .\"
> > .PP
> > +Userfaultfd supports two modes of registration:
> > +.TP
> > +.BR UFFDIO_REGISTER_MODE_MISSING " (since 4.10)"
> > +When registered with
> > +.B UFFDIO_REGISTER_MODE_MISSING
> > +mode, the userspace will receive a page fault message
> > +when a missing page is accessed.
> > +The faulted thread will be stopped from execution until the page fault is
> > +resolved from the userspace by either an
> > +.B UFFDIO_COPY
> > +or an
> > +.B UFFDIO_ZEROPAGE
> > +ioctl.
> > +.TP
> > +.BR UFFDIO_REGISTER_MODE_WP " (since 5.7)"
> > +When registered with
> > +.B UFFDIO_REGISTER_MODE_WP
> > +mode, the userspace will receive a page fault message
> > +when a write-protected page is written.
> > +The faulted thread will be stopped from execution
> > +until the userspace un-write-protect the page using an
>
> Here you use un-write-protect, but in the other patch you use
> write-unprotect. Please, use a consistent wording if it's the same thing
> (if there are other similar things with different wordings in different
> pages, please fix them too, but I didn't see more of those). If there's
> already a wording for that in any page, please reuse it (I don't know of one).
I tried to look for it, and these are the only ones I got that are related:
man4/fd.4:gets the cached drive state (disk changed, write protected et al.)
man4/st.4:The drive is write-protected.
man4/st.4:An attempt was made to write or erase a write-protected tape.
man4/st.4:when the tape in the drive is write-protected.
Unfortunately, I didn't find the unprotect part. I think I'll reword it as
"write-unprotect", since "write" should be an adjective-kind prefix.
>
> > +.B UFFDIO_WRITEPROTECT
> > +ioctl.
> > +.PP
> > +Multiple modes can be enabled at the same time for the same memory range.
> > +.PP
> > Since Linux 4.14, userfaultfd page fault message can selectively embed faulting
> > thread ID information into the fault message.
> > One needs to enable this feature explicitly using the
> > @@ -144,6 +170,17 @@ single threaded non-cooperative userfaultfd manager implementations.
> > .\" and limitations remaining in 4.11
> > .\" Maybe it's worth adding a dedicated sub-section...
> > .\"
> > +.PP
> > +Starting from Linux 5.7, userfaultfd is able to do
>
> The previous paragraph uses "Since Linux 4.14". For consistency, please
> use that same wording here.
Ok.
Unless Mike speaks out, I'll still keep Mike's a-b as a credit for reviewing,
considering these changes are small.
Thanks,
--
Peter Xu
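As the userfaultfd.2 patch states, multiple modes can be enabled at the same time for the same memory range. Registering with both missing and write-protect tracking, then checking the returned ioctls bit mask for 1 << _UFFDIO_WRITEPROTECT, could look like the following. Sketch only, assuming Linux 5.7+ uapi headers; register_mode(), uffd_register_range() and can_writeprotect() are hypothetical helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: build the UFFDIO_REGISTER mode bits. */
static uint64_t register_mode(bool missing, bool wp)
{
	uint64_t mode = 0;

	if (missing)
		mode |= UFFDIO_REGISTER_MODE_MISSING;
	if (wp)
		mode |= UFFDIO_REGISTER_MODE_WP;
	return mode;
}

/*
 * Hypothetical wrapper around UFFDIO_REGISTER.  On success, returns 0 and
 * stores the kernel-reported ioctls bit mask in *ioctls_out; on failure,
 * returns -1 with errno set by ioctl(2).
 */
static int uffd_register_range(int uffd, uint64_t start, uint64_t len,
			       uint64_t mode, uint64_t *ioctls_out)
{
	struct uffdio_register reg;

	reg.range.start = start;
	reg.range.len = len;
	reg.mode = mode;
	reg.ioctls = 0;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
		return -1;
	*ioctls_out = reg.ioctls;
	return 0;
}

/* True if the registered range supports the UFFDIO_WRITEPROTECT ioctl. */
static bool can_writeprotect(uint64_t ioctls)
{
	return (ioctls & ((uint64_t)1 << _UFFDIO_WRITEPROTECT)) != 0;
}
```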
Hi Peter,
Please see a few more comments below.
Thanks,
Alex
On 3/22/21 11:08 PM, Peter Xu wrote:
> Write-protect mode is supported starting from Linux 5.7.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> man2/userfaultfd.2 | 104 ++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 102 insertions(+), 2 deletions(-)
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> index 555e37409..8ad4a71b5 100644
> --- a/man2/userfaultfd.2
> +++ b/man2/userfaultfd.2
> @@ -78,6 +78,32 @@ all memory ranges that were registered with the object are unregistered
> and unread events are flushed.
> .\"
> .PP
> +Userfaultfd supports two modes of registration:
> +.TP
> +.BR UFFDIO_REGISTER_MODE_MISSING " (since 4.10)"
> +When registered with
> +.B UFFDIO_REGISTER_MODE_MISSING
> +mode, the userspace will receive a page fault message
> +when a missing page is accessed.
> +The faulted thread will be stopped from execution until the page fault is
> +resolved from the userspace by either an
> +.B UFFDIO_COPY
> +or an
> +.B UFFDIO_ZEROPAGE
> +ioctl.
> +.TP
> +.BR UFFDIO_REGISTER_MODE_WP " (since 5.7)"
> +When registered with
> +.B UFFDIO_REGISTER_MODE_WP
> +mode, the userspace will receive a page fault message
> +when a write-protected page is written.
> +The faulted thread will be stopped from execution
> +until the userspace un-write-protect the page using an
Here you use un-write-protect, but in the other patch you use
write-unprotect. Please, use a consistent wording if it's the same
thing (if there are other similar things with different wordings in
different pages, please fix them too, but I didn't see more of those).
If there's already a wording for that in any page, please reuse it (I
don't know of one).
> +.B UFFDIO_WRITEPROTECT
> +ioctl.
> +.PP
> +Multiple modes can be enabled at the same time for the same memory range.
> +.PP
> Since Linux 4.14, userfaultfd page fault message can selectively embed faulting
> thread ID information into the fault message.
> One needs to enable this feature explicitly using the
> @@ -144,6 +170,17 @@ single threaded non-cooperative userfaultfd manager implementations.
> .\" and limitations remaining in 4.11
> .\" Maybe it's worth adding a dedicated sub-section...
> .\"
> +.PP
> +Starting from Linux 5.7, userfaultfd is able to do
The previous paragraph uses "Since Linux 4.14". For consistency,
please use that same wording here.
> +synchronous page dirty tracking using the new write-protection register mode.
> +One should check against the feature bit
> +.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
> +before using this feature.
> +Similar to the original userfaultfd missing mode, the write-protect mode will
> +generate a userfaultfd message when the protected page is written.
> +The user needs to resolve the page fault by unprotecting the faulted page and
> +kicking the faulted thread to continue.
> +For more information, please refer to the "Userfaultfd write-protect mode" section.
> .SS Userfaultfd operation
> After the userfaultfd object is created with
> .BR userfaultfd (),
> @@ -219,6 +256,65 @@ userfaultfd can be used only with anonymous private memory mappings.
> Since Linux 4.11,
> userfaultfd can be also used with hugetlbfs and shared memory mappings.
> .\"
> +.SS Userfaultfd write-protect mode (since 5.7)
> +Since Linux 5.7, userfaultfd supports write-protect mode.
> +The user needs to first check availability of this feature using
> +.B UFFDIO_API
> +ioctl against the feature bit
> +.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
> +before using this feature.
> +.PP
> +To register with userfaultfd write-protect mode, the user needs to initiate the
> +.B UFFDIO_REGISTER
> +ioctl with mode
> +.B UFFDIO_REGISTER_MODE_WP
> +set.
> +Note that it's legal to monitor the same memory range with multiple modes.
> +For example, the user can do
> +.B UFFDIO_REGISTER
> +with the mode set to
> +.BR "UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP" .
> +When there is only
> +.B UFFDIO_REGISTER_MODE_WP
> +registered, the userspace will
> +.I not
> +receive any message when a missing page is written.
> +Instead, the userspace will only receive a write-protect page fault message
> +when an existing but write-protected page is written.
> +.PP
> +After the
> +.B UFFDIO_REGISTER
> +ioctl completed with
> +.B UFFDIO_REGISTER_MODE_WP
> +mode set,
> +the user can write-protect any existing memory within the range using the ioctl
> +.B UFFDIO_WRITEPROTECT
> +where
> +.I uffdio_writeprotect.mode
> +should be set to
> +.BR UFFDIO_WRITEPROTECT_MODE_WP .
> +.PP
> +When a write-protect event happens,
> +the userspace will receive a page fault message whose
> +.I uffd_msg.pagefault.flags
> +will have the
> +.B UFFD_PAGEFAULT_FLAG_WP
> +flag set.
> +Note: since only writes can trigger this kind of fault,
> +write-protect messages will always have the
> +.B UFFD_PAGEFAULT_FLAG_WRITE
> +bit set too, along with the bit
> +.BR UFFD_PAGEFAULT_FLAG_WP .
> +.PP
> +To resolve a write-protection page fault, the user should initiate another
> +.B UFFDIO_WRITEPROTECT
> +ioctl, whose
> +.I uffdio_writeprotect.mode
> +should have the flag
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +cleared for the faulted page or range.
> +.PP
> +Write-protect mode only supports private anonymous memory.
> .SS Reading from the userfaultfd structure
> Each
> .BR read (2)
> @@ -364,8 +460,12 @@ flag (see
> .BR ioctl_userfaultfd (2))
> and this flag is set, this a write fault;
> otherwise it is a read fault.
> -.\"
> -.\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
> +.TP
> +.B UFFD_PAGEFAULT_FLAG_WP
> +If the address is in a range that was registered with the
> +.B UFFDIO_REGISTER_MODE_WP
> +flag and this bit is set, this is a write-protect fault;
> +otherwise it is a missing-page fault.
> .RE
> .TP
> .I pagefault.feat.pid
>
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
Hi, Alex,
[...]
> > +.TP
> > +.B UFFDIO_COPY_MODE_WP
> > +Copy the page with read-only permission.
> > +This allows the user to trap the next write to the page,
> > +which will block and generate another write-protect userfault message.
>
> s/write-protect/write-protected/
> ?
I think "write-protect" is the wording I want here; it is the name of
the type of the message in plain text.
[...]
> > +.B EAGAIN
> > +The process was interrupted and need to retry.
>
> Maybe: "The process was interrupted; retry this call."?
> I don't know what other pages say about this kind of error.
Frankly, I see no difference between the two. If you prefer the latter, I can
switch.
>
> > +.TP
> > +.B ENOENT
> > +The range specified in
> > +.I range
> > +is not valid.
>
> I'm not sure how this is different from the wording above in EINVAL. An
> "otherwise invalid range" was already giving EINVAL?
This can be returned when the VMA is not found (in mwriteprotect_range()):
err = -ENOENT;
dst_vma = find_dst_vma(dst_mm, start, len);
if (!dst_vma)
goto out_unlock;
I think maybe I could simply remove this entry, because from a user app
developer's point of view I'd only be interested in specific errors that I'd be able to
detect and (even better) recover from. For such an error I'd say there's not much
to do besides failing the app.
>
> > +For example, the virtual address does not exist,
> > +or not registered with userfaultfd write-protect mode.
> > +.TP
> > +.B EFAULT
> > +Encountered a generic fault during processing.
>
> What is a "generic fault"?
For example, when the copy from user space fails for some reason. See
userfaultfd_writeprotect():
if (copy_from_user(&uffdio_wp, user_uffdio_wp,
sizeof(struct uffdio_writeprotect)))
return -EFAULT;
But I didn't check other places; generally I'd return -EFAULT if I can't find
another error code with a clearer meaning.
I don't think this is really helpful to user apps either, because no user app would
act on this -EFAULT to do anything useful. How about I drop it too, if
you think the description is confusing?
Thanks,
--
Peter Xu
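Following the discussion of EAGAIN, ENOENT and EFAULT above, a caller might treat EAGAIN from UFFDIO_WRITEPROTECT as "retry this call" and every other error as fatal. The sketch below assumes Linux 5.7+ uapi headers; wp_errno_action() and wp_unprotect_retry() are hypothetical, and the bounded retry loop is a simplification: a real non-cooperative monitor may need to read pending events from the userfaultfd before a retry can succeed.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical policy for UFFDIO_WRITEPROTECT failures. */
enum wp_action { ACT_OK, ACT_RETRY, ACT_FATAL };

static enum wp_action wp_errno_action(int err)
{
	switch (err) {
	case 0:
		return ACT_OK;
	case EAGAIN:		/* interrupted; retry this call */
		return ACT_RETRY;
	default:		/* EINVAL, ENOENT, EFAULT, EBADF, ... */
		return ACT_FATAL;
	}
}

/*
 * Hypothetical helper: write-unprotect [start, start + len) and wake the
 * faulting threads, retrying up to max_attempts times on EAGAIN.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int wp_unprotect_retry(int uffd, uint64_t start, uint64_t len,
			      int max_attempts)
{
	struct uffdio_writeprotect wp;
	int i;

	if (max_attempts <= 0) {
		errno = EINVAL;
		return -1;
	}

	wp.range.start = start;
	wp.range.len = len;
	wp.mode = 0;	/* WP bit clear: unprotect; DONTWAKE clear: wake */

	for (i = 0; i < max_attempts; i++) {
		if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) == 0)
			return 0;
		if (wp_errno_action(errno) != ACT_RETRY)
			return -1;
	}
	return -1;	/* still EAGAIN after max_attempts */
}
```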
Hi Peter,
On 3/23/21 8:16 PM, Peter Xu wrote:
> On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages) wrote:
>>> +.TP
>>> +.B UFFDIO_COPY_MODE_WP
>>> +Copy the page with read-only permission.
>>> +This allows the user to trap the next write to the page,
>>> +which will block and generate another write-protect userfault message.
>>
>> s/write-protect/write-protected/
>> ?
>
> I think "write-protect" is the wording I want here; it is the name of
> the type of the message in plain text.
Okay.
>
> [...]
>
>>> +.B EAGAIN
>>> +The process was interrupted and need to retry.
>>
>> Maybe: "The process was interrupted; retry this call."?
>> I don't know what other pages say about this kind of error.
>
> Frankly, I see no difference between the two. If you prefer the latter, I can
> switch.
I understand yours, but technically it's a bit incorrect: The subject
of the sentence changes: in "The process was interrupted" it's the
process, and in "need to retry" it's [you]. By separating the sentence
into two, it's more natural. :)
>
>>
>>> +.TP
>>> +.B ENOENT
>>> +The range specified in
>>> +.I range
>>> +is not valid.
>>
>> I'm not sure how this is different from the wording above in EINVAL. An
>> "otherwise invalid range" was already giving EINVAL?
>
> This can be returned when the VMA is not found (in mwriteprotect_range()):
>
> err = -ENOENT;
> dst_vma = find_dst_vma(dst_mm, start, len);
>
> if (!dst_vma)
> goto out_unlock;
>
> I think maybe I could simply remove this entry, because from a user app
> developer's point of view I'd only be interested in specific errors that I'd be able to
> detect and (even better) recover from. For such an error I'd say there's not much
> to do besides failing the app.
If there's any possibility that the error can happen, it should be
documented, even if it's to say "Fatal error; abort!". Just try to
explain the causes and how to avoid causing them and/or possibly what to
do when they happen (abort?).
>
>>
>>> +For example, the virtual address does not exist,
>>> +or not registered with userfaultfd write-protect mode.
>>> +.TP
>>> +.B EFAULT
>>> +Encountered a generic fault during processing.
>>
>> What is a "generic fault"?
>
> For example, when the copy from user space fails for some reason. See
> userfaultfd_writeprotect():
>
> if (copy_from_user(&uffdio_wp, user_uffdio_wp,
> sizeof(struct uffdio_writeprotect)))
> return -EFAULT;
>
> But I didn't check other places; generally I'd return -EFAULT if I can't find
> another error code with a clearer meaning.
>
> I don't think this is really helpful to user apps either, because no user app would
> act on this -EFAULT to do anything useful. How about I drop it too, if
> you think the description is confusing?
Same as above.
Thanks,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
On Thu, Mar 25, 2021 at 10:32:20PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Peter,
>
> On 3/23/21 8:16 PM, Peter Xu wrote:
> > On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages) wrote:
> > > > +.TP
> > > > +.B UFFDIO_COPY_MODE_WP
> > > > +Copy the page with read-only permission.
> > > > +This allows the user to trap the next write to the page,
> > > > +which will block and generate another write-protect userfault message.
> > >
> > > s/write-protect/write-protected/
> > > ?
> >
> > I think "write-protect" is the wording I want here; it is the name of
> > the type of the message in plain text.
>
> Okay.
>
> >
> > [...]
> >
> > > > +.B EAGAIN
> > > > +The process was interrupted and need to retry.
> > >
> > > Maybe: "The process was interrupted; retry this call."?
> > > I don't know what other pages say about this kind of error.
> >
> > Frankly, I see no difference between the two. If you prefer the latter, I can
> > switch.
>
> I understand yours, but technically it's a bit incorrect: The subject of
> the sentence changes: in "The process was interrupted" it's the process, and
> in "need to retry" it's [you]. By separating the sentence into two, it's
> more natural. :)
Sure, I'll change.
>
> >
> > >
> > > > +.TP
> > > > +.B ENOENT
> > > > +The range specified in
> > > > +.I range
> > > > +is not valid.
> > >
> > > I'm not sure how this is different from the wording above in EINVAL. An
> > > "otherwise invalid range" was already giving EINVAL?
> >
> > This can be returned when the vma is not found (mwriteprotect_range()):
> >
> > 	err = -ENOENT;
> > 	dst_vma = find_dst_vma(dst_mm, start, len);
> >
> > 	if (!dst_vma)
> > 		goto out_unlock;
> >
> > I think maybe I could simply remove this entry, because from a user-app
> > developer's point of view I'd only be interested in specific errors that I'd be
> > able to detect and (even better) recover from. For an error like this, I'd say
> > there's not much to do besides failing the app.
>
> If there's any possibility that the error can happen, it should be
> documented, even if it's to say "Fatal error; abort!". Just try to explain
> the causes and how to avoid causing them and/or possibly what to do when
> they happen (abort?).
Okay. Would you mind if I keep my original wording? IMHO it does exactly
what you described as "trying to explain the causes" and so on:
.B ENOENT
The range specified in
.I range
is not valid.
For example, the virtual address does not exist,
or not registered with userfaultfd write-protect mode.
It does slightly duplicate EINVAL. But if it's the overlap between the two
errors you disagree with, rather than the wording, then what's needed is not a
rework of this patchset but a kernel patch that changes the return values to
make them consistent. I'm not against proposing such a kernel patch; I just
don't see it as strictly necessary.

In my experience working on the kernel, return values are sometimes not that
strict: it's hard to control every possible return code of a syscall/ioctl so
that everything matches the documentation. We should always try to be
accurate, but that isn't easy. It's also hard to write documentation that
matches the kernel code 100%, because that would require a full walkthrough of
every piece of kernel code the syscall/ioctl calls, to collect all the errors
and then summarize their meanings. That could be a lot of work.
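On the "explain the causes and what to do" point, here is a caller-side sketch that turns the errno from UFFDIO_WRITEPROTECT into a log hint and a retry decision. The mapping follows the descriptions proposed in this thread (EAGAIN: retry; ENOENT: range not mapped or not registered in write-protect mode; EINVAL: otherwise invalid range; EFAULT: argument could not be copied in). The struct and function names are hypothetical, and the list is not meant as a complete account of what the kernel can return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical diagnosis of an errno value from UFFDIO_WRITEPROTECT. */
struct uffd_wp_diag {
    bool retryable;    /* worth repeating the same call unchanged? */
    const char *hint;  /* short explanation for a log message */
};

static struct uffd_wp_diag uffd_wp_diagnose(int err)
{
    assert(err > 0);   /* expects a nonzero errno value */
    switch (err) {
    case EAGAIN:
        return (struct uffd_wp_diag){ true,
            "interrupted; retry this call" };
    case ENOENT:
        return (struct uffd_wp_diag){ false,
            "range not mapped, or not registered in write-protect mode" };
    case EINVAL:
        return (struct uffd_wp_diag){ false,
            "invalid range or arguments" };
    case EFAULT:
        return (struct uffd_wp_diag){ false,
            "kernel could not copy the ioctl argument" };
    default:
        /* Anything else: fall back to the generic libc message. */
        return (struct uffd_wp_diag){ false, strerror(err) };
    }
}
```

Only EAGAIN is treated as retryable; the others point at a bug in how the caller registered or described the range, so repeating the call unchanged would fail the same way.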
>
> >
> > >
> > > > +For example, the virtual address does not exist,
> > > > +or not registered with userfaultfd write-protect mode.
> > > > +.TP
> > > > +.B EFAULT
> > > > +Encountered a generic fault during processing.
> > >
> > > What is a "generic fault"?
> >
> > For example, when the user copy fails for some reason. See
> > userfaultfd_writeprotect():
> >
> > 	if (copy_from_user(&uffdio_wp, user_uffdio_wp,
> > 			   sizeof(struct uffdio_writeprotect)))
> > 		return -EFAULT;
> >
> > But I didn't check other places; generally I'd return -EFAULT if I can't find
> > another error code with a clearer meaning.
> >
> > I don't think this is really helpful to user apps either, because no user app
> > would read this -EFAULT and do anything useful with it. How about I drop it
> > too, if you think the description is confusing?
>
> Same as above.
As far as I can find, the copy_from_user() above is currently the only place
that can trigger -EFAULT. So I could either change the entry above into:
.TP
.B EFAULT
Failure on copying ioctl parameters into the kernel.
Would that be okay (before I repost)? I'd still prefer my original wording,
because I bet 90% of user-space developers won't even know what it means when
the kernel cannot copy the user parameter, or what they can do about it.
However, if you think it's proper, I'll use it.
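To make "the kernel cannot copy the user parameter" concrete: the ioctl argument is a pointer to a struct uffdio_writeprotect in the caller's memory, and EFAULT means the kernel could not read it, typically because that pointer is wrong. Since exercising userfaultfd itself may need privileges, this sketch shows the same effect with write(2) on a pipe, whose man page documents EFAULT for an inaccessible buffer. efault_demo() is a hypothetical name, and this is an analogy, not userfaultfd code.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * write(2) copies from a user buffer, much as UFFDIO_WRITEPROTECT copies
 * its argument struct. Handing it memory the kernel cannot read makes
 * that copy fail. Returns the errno observed, 0 if the write unexpectedly
 * succeeded, or -1 if pipe()/mmap() setup failed.
 */
static int efault_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* A mapping with no access rights: any kernel read from it faults. */
    void *bad = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad == MAP_FAILED) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    int result = 0;
    if (write(fds[1], bad, 1) < 0)
        result = errno;  /* expected: EFAULT */

    munmap(bad, (size_t)page);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A PROT_NONE mapping is used instead of a made-up address, so the buffer is guaranteed to be inaccessible and the call fails with EFAULT rather than a signal.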
Thanks,
--
Peter Xu
Hi Peter,
On 3/29/21 11:51 PM, Peter Xu wrote:
> On Thu, Mar 25, 2021 at 10:32:20PM +0100, Alejandro Colomar (man-pages) wrote:
>>>>> +.TP
>>>>> +.B ENOENT
>>>>> +The range specified in
>>>>> +.I range
>>>>> +is not valid.
>>>>
>>>> I'm not sure how this is different from the wording above in EINVAL. An
>>>> "otherwise invalid range" was already giving EINVAL?
>>>
>>> This can be returned when the vma is not found (mwriteprotect_range()):
>>>
>>> 	err = -ENOENT;
>>> 	dst_vma = find_dst_vma(dst_mm, start, len);
>>>
>>> 	if (!dst_vma)
>>> 		goto out_unlock;
>>>
>>> I think maybe I could simply remove this entry, because from a user-app
>>> developer's point of view I'd only be interested in specific errors that I'd be
>>> able to detect and (even better) recover from. For an error like this, I'd say
>>> there's not much to do besides failing the app.
>>
>> If there's any possibility that the error can happen, it should be
>> documented, even if it's to say "Fatal error; abort!". Just try to explain
>> the causes and how to avoid causing them and/or possibly what to do when
>> they happen (abort?).
>
> Okay. Would you mind if I keep my original wording? IMHO it does exactly
> what you described as "trying to explain the causes" and so on:
>
> .B ENOENT
> The range specified in
> .I range
> is not valid.
> For example, the virtual address does not exist,
> or not registered with userfaultfd write-protect mode.
>
> It does slightly duplicate EINVAL. But if it's the overlap between the two
> errors you disagree with, rather than the wording, then what's needed is not a
> rework of this patchset but a kernel patch that changes the return values to
> make them consistent. I'm not against proposing such a kernel patch; I just
> don't see it as strictly necessary.
>
> In my experience working on the kernel, return values are sometimes not that
> strict: it's hard to control every possible return code of a syscall/ioctl so
> that everything matches the documentation. We should always try to be
> accurate, but that isn't easy. It's also hard to write documentation that
> matches the kernel code 100%, because that would require a full walkthrough of
> every piece of kernel code the syscall/ioctl calls, to collect all the errors
> and then summarize their meanings. That could be a lot of work.
Yes, that's fine. I was only curious about the overlap, but if they do
overlap, so be it.
>>>>> +For example, the virtual address does not exist,
>>>>> +or not registered with userfaultfd write-protect mode.
>>>>> +.TP
>>>>> +.B EFAULT
>>>>> +Encountered a generic fault during processing.
>>>>
>>>> What is a "generic fault"?
>>>
>>> For example, when the user copy fails for some reason. See
>>> userfaultfd_writeprotect():
>>>
>>> 	if (copy_from_user(&uffdio_wp, user_uffdio_wp,
>>> 			   sizeof(struct uffdio_writeprotect)))
>>> 		return -EFAULT;
>>>
>>> But I didn't check other places; generally I'd return -EFAULT if I can't find
>>> another error code with a clearer meaning.
>>>
>>> I don't think this is really helpful to user apps either, because no user app
>>> would read this -EFAULT and do anything useful with it. How about I drop it
>>> too, if you think the description is confusing?
>>
>> Same as above.
>
> As far as I can find, the copy_from_user() above is currently the only place
> that can trigger -EFAULT. So I could either change the entry above into:
>
> .TP
> .B EFAULT
> Failure on copying ioctl parameters into the kernel.
>
> Would that be okay (before I repost)? I'd still prefer my original wording,
> because I bet 90% of user-space developers won't even know what it means when
> the kernel cannot copy the user parameter, or what they can do about it.
> However, if you think it's proper, I'll use it.
Okay, I'll take your original words. Maybe all this "extra" info could
go into the commit message. I'll wait for your resend with the a-b and
the minor changes :-)
Thanks,
Alex