2023-09-19 21:47:11

by Axel Rasmussen

Subject: [PATCH 00/10] userfaultfd man page updates

Various updates for userfaultfd man pages. To summarize the changes:

- Correctly / fully describe the two-step feature support handshake process.
- Describe new UFFDIO_POISON ioctl.
- Other small improvements (missing ioctls, error codes, etc).

Axel Rasmussen (10):
userfaultfd.2: briefly mention two-step feature handshake process
userfaultfd.2: reword to account for new fault resolution ioctls
userfaultfd.2: comment on feature detection in the example program
ioctl_userfaultfd.2: fix a few trivial mistakes
ioctl_userfaultfd.2: describe two-step feature handshake
ioctl_userfaultfd.2: describe missing UFFDIO_API feature flags
ioctl_userfaultfd.2: correct and update UFFDIO_API ioctl error codes
ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on
error
ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list
ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

man2/ioctl_userfaultfd.2 | 254 ++++++++++++++++++++++++++++++++-------
man2/userfaultfd.2 | 15 ++-
2 files changed, 220 insertions(+), 49 deletions(-)

--
2.42.0.459.ge4e396fd5e-goog


2023-09-19 22:37:57

by Axel Rasmussen

Subject: [PATCH 06/10] ioctl_userfaultfd.2: describe missing UFFDIO_API feature flags

Several new features have been added to the kernel recently, and the man
page wasn't updated to describe these new features. So, add in
descriptions of any missing features.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index e91a1dfc8..53b1f473f 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -204,6 +204,13 @@ If this feature bit is set,
.I uffd_msg.pagefault.feat.ptid
will be set to the faulted thread ID for each page-fault message.
.TP
+.BR UFFD_FEATURE_PAGEFAULT_FLAG_WP " (since Linux 5.10)"
+If this feature bit is set,
+userfaultfd supports write-protect faults
+for anonymous memory.
+(Note that shmem / hugetlbfs support
+is indicated by a separate feature.)
+.TP
.BR UFFD_FEATURE_MINOR_HUGETLBFS " (since Linux 5.13)"
If this feature bit is set,
the kernel supports registering userfaultfd ranges
@@ -221,6 +228,22 @@ will be set to the exact page-fault address that was reported by the hardware,
and will not mask the offset within the page.
Note that old Linux versions might indicate the exact address as well,
even though the feature bit is not set.
+.TP
+.BR UFFD_FEATURE_WP_HUGETLBFS_SHMEM " (since Linux 5.19)"
+If this feature bit is set,
+userfaultfd supports write-protect faults
+for hugetlbfs and shmem / tmpfs memory.
+.TP
+.BR UFFD_FEATURE_WP_UNPOPULATED " (since Linux 6.4)"
+If this feature bit is set,
+the kernel will handle anonymous memory the same way as file memory,
+by allowing the user to write-protect unpopulated ptes.
+.TP
+.BR UFFD_FEATURE_POISON " (since Linux 6.6)"
+If this feature bit is set,
+the kernel supports resolving faults with the
+.B UFFDIO_POISON
+ioctl.
.PP
The returned
.I ioctls
--
2.42.0.459.ge4e396fd5e-goog
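
For readers following along, here is a minimal sketch of how a program might check the write-protect feature bits described in this patch against the mask reported by UFFDIO_API. The enum mem_kind and both function names are hypothetical, made up for illustration. The fallback #define is only there for older installed headers and uses the value from the kernel UAPI header.

```c
#include <stdbool.h>
#include <linux/userfaultfd.h>

/* Fallback for older installed headers; value as in the kernel UAPI header. */
#ifndef UFFD_FEATURE_WP_HUGETLBFS_SHMEM
#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1 << 12)
#endif

/* Hypothetical classification of the memory a program wants to protect. */
enum mem_kind { MEM_ANON, MEM_SHMEM, MEM_HUGETLBFS };

/* Map a memory kind to the feature bit that advertises WP support for it. */
__u64 wp_feature_for(enum mem_kind kind)
{
    switch (kind) {
    case MEM_ANON:
        return UFFD_FEATURE_PAGEFAULT_FLAG_WP;
    case MEM_SHMEM:
    case MEM_HUGETLBFS:
        return UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
    }
    return 0;
}

/* True if the mask reported by the kernel includes the bit for this kind. */
bool wp_supported(__u64 reported, enum mem_kind kind)
{
    __u64 bit = wp_feature_for(kind);

    return bit != 0 && (reported & bit) == bit;
}
```

Here `reported` would be the `features` value the kernel wrote back after a UFFDIO_API call made with `features = 0`.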

2023-09-19 22:51:20

by Axel Rasmussen

Subject: [PATCH 03/10] userfaultfd.2: comment on feature detection in the example program

The example program doesn't depend on any extra features, so it does not
make use of the two-step feature handshake process. This is fine, but it
might set a bad example for programs which *do* depend on specific
features (e.g. they may conclude they don't need to do anything to
enable / detect them).

No need to make the example program more complicated: let's just add a
comment indicating why we do it the way we do it in the example, and
describing briefly what a more complicated program would need to do
instead.

The comment is kept rather brief; a full description of this feature
will be included in ioctl_userfaultfd.2 instead.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/userfaultfd.2 | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index 00d94e514..b2b79f61d 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -881,6 +881,13 @@ main(int argc, char *argv[])
if (uffd == \-1)
err(EXIT_FAILURE, "userfaultfd");
\&
+ /* NOTE: Two-step feature handshake is not needed here, since this
+ example doesn't require any specific features.
+
+ Programs that *do* should call UFFDIO_API twice: once with
+ `features = 0` to detect features supported by this kernel, and
+ again with the subset of features the program actually wants to
+ enable. */
uffdio_api.api = UFFD_API;
uffdio_api.features = 0;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) == \-1)
--
2.42.0.459.ge4e396fd5e-goog

2023-09-19 23:06:47

by Axel Rasmussen

Subject: [PATCH 04/10] ioctl_userfaultfd.2: fix a few trivial mistakes

- Fix missing paragraph tag. The lack of this tag yielded no blank line
in the rendered page, which is inconsistent with style elsewhere.

- The description of UFFDIO_WRITEPROTECT was a sentence fragment; the
last half of the sentence was left out by mistake. Add it in to fix
the issue.

- Move UFFDIO_WRITEPROTECT 'since' to its own line. All other ioctls
note the kernel version introduced on a separate line from the ioctl
name. Update UFFDIO_WRITEPROTECT to match the existing style.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index b5281ec4c..339adf8fe 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -31,6 +31,7 @@ using calls of the form:
ioctl(fd, cmd, argp);
.EE
.in
+.PP
In the above,
.I fd
is a file descriptor referring to a userfaultfd object,
@@ -351,6 +352,7 @@ operation is supported.
.B 1 << _UFFDIO_WRITEPROTECT
The
.B UFFDIO_WRITEPROTECT
+operation is supported.
.TP
.B 1 << _UFFDIO_ZEROPAGE
The
@@ -693,7 +695,8 @@ field of the
structure was not a multiple of the system page size; or
.I len
was zero; or the specified range was otherwise invalid.
-.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
+.SS UFFDIO_WRITEPROTECT
+(Since Linux 5.7.)
Write-protect or write-unprotect a userfaultfd-registered memory range
registered with mode
.BR UFFDIO_REGISTER_MODE_WP .
--
2.42.0.459.ge4e396fd5e-goog

2023-09-19 23:52:13

by Axel Rasmussen

Subject: [PATCH 05/10] ioctl_userfaultfd.2: describe two-step feature handshake

Fully describe how UFFDIO_API can be used to perform a two-step feature
handshake, and also note the case where this isn't necessary (programs
which don't depend on any extra features).

This lets us clean up an old FIXME asking for this to be described.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 339adf8fe..e91a1dfc8 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -83,7 +83,6 @@ struct uffdio_api {
The
.I api
field denotes the API version requested by the application.
-.PP
The kernel verifies that it can support the requested API version,
and sets the
.I features
@@ -93,6 +92,25 @@ fields to bit masks representing all the available features and the generic
.BR ioctl (2)
operations available.
.PP
+After Linux 4.11,
+applications should use the
+.I features
+field to perform a two-step handshake.
+First,
+.BR UFFDIO_API
+is called with the
+.I features
+field set to zero.
+The kernel responds by setting all supported feature bits.
+.PP
+Applications which do not require any specific features
+can begin using the userfaultfd immediately.
+Applications which do need specific features
+should call
+.BR UFFDIO_API
+again with a subset of the reported feature bits set
+to enable those features.
+.PP
Before Linux 4.11, the
.I features
field must be initialized to zero before the call to
@@ -102,24 +120,11 @@ and zero (i.e., no feature bits) is placed in the
field by the kernel upon return from
.BR ioctl (2).
.PP
-Starting from Linux 4.11, the
-.I features
-field can be used to ask whether particular features are supported
-and explicitly enable userfaultfd features that are disabled by default.
-The kernel always reports all the available features in the
-.I features
-field.
-.PP
-To enable userfaultfd features the application should set
-a bit corresponding to each feature it wants to enable in the
-.I features
-field.
-If the kernel supports all the requested features it will enable them.
-Otherwise it will zero out the returned
+If the application sets unsupported feature bits,
+the kernel will zero out the returned
.I uffdio_api
structure and return
.BR EINVAL .
-.\" FIXME add more details about feature negotiation and enablement
.PP
The following feature bits may be set:
.TP
--
2.42.0.459.ge4e396fd5e-goog
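
As an illustration only, the two steps described in this patch could be sketched roughly as below. The function names uffd_missing_features and uffd_handshake are hypothetical, and the choice of ENOTSUP for "kernel lacks a required feature" is an assumption of this sketch. It mirrors the text of the patch and has not been checked against any particular kernel version.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Which of the required feature bits did the kernel not report? */
__u64 uffd_missing_features(__u64 supported, __u64 required)
{
    return required & ~supported;
}

/*
 * Hypothetical wrapper around the two-step handshake.
 * Returns 0 on success, -1 with errno set on failure.
 */
int uffd_handshake(int uffd, __u64 required)
{
    struct uffdio_api api;

    /* Step 1: features = 0 asks the kernel to report what it supports. */
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;

    if (uffd_missing_features(api.features, required) != 0) {
        errno = ENOTSUP;    /* sketch's own choice of error code */
        return -1;
    }
    if (required == 0)
        return 0;           /* nothing to enable; use the fd as-is */

    /* Step 2: start from a clean struct and request only what is needed. */
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = required;
    return ioctl(uffd, UFFDIO_API, &api);
}
```

A caller would pass the descriptor returned by userfaultfd(2) and an OR of the UFFD_FEATURE_* bits it depends on, for example `uffd_handshake(uffd, UFFD_FEATURE_THREAD_ID)`.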

2023-09-20 00:18:29

by Axel Rasmussen

Subject: [PATCH 01/10] userfaultfd.2: briefly mention two-step feature handshake process

This process is critical for programs which depend on extra features, so
it's worth mentioning here.

Future commits will much more fully describe it in ioctl_userfaultfd.2.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/userfaultfd.2 | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index 40354065c..1b2af22f9 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -200,8 +200,9 @@ the application must enable it using the
.B UFFDIO_API
.BR ioctl (2)
operation.
-This operation allows a handshake between the kernel and user space
-to determine the API version and supported features.
+This operation allows a two-step handshake between the kernel and user space
+to determine what API version and features the kernel supports,
+and then to enable those features user space wants.
This operation must be performed before any of the other
.BR ioctl (2)
operations described below (or those operations fail with the
--
2.42.0.459.ge4e396fd5e-goog

2023-09-20 08:42:10

by Axel Rasmussen

Subject: [PATCH 08/10] ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on error

The old FIXME noted that the zeroing was done to differentiate the two
EINVAL cases. It's possible something like this was true historically,
but in current Linux we zero it in *both* EINVAL cases, so this is at
least no longer true.

After reading the code, I can't determine any clear reason why we zero
it in some cases but not in others. So, some simple advice we can give
userspace is: if an error occurs, treat the contents of the structure as
unspecified. Just re-initialize it before retrying UFFDIO_API again.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 1aa9654be..29dca1f6b 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -272,6 +272,14 @@ operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the error.
+If an error occurs,
+the kernel may zero the provided
+.I uffdio_api
+structure.
+The caller should treat its contents as unspecified,
+and reinitialize it before re-attempting another
+.B UFFDIO_API
+call.
Possible errors include:
.TP
.B EFAULT
@@ -305,14 +313,6 @@ twice,
the first time with no features set,
is explicitly allowed
as per the two-step feature detection handshake.
-.\" FIXME In the above error case, the returned 'uffdio_api' structure is
-.\" zeroed out. Why is this done? This should be explained in the manual page.
-.\"
-.\" Mike Rapoport:
-.\" In my understanding the uffdio_api
-.\" structure is zeroed to allow the caller
-.\" to distinguish the reasons for -EINVAL.
-.\"
.SS UFFDIO_REGISTER
(Since Linux 4.3.)
Register a memory address range with the userfaultfd object.
--
2.42.0.459.ge4e396fd5e-goog
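
To make the "treat it as unspecified and reinitialize" advice concrete, a caller might structure a retry along these lines. This is a sketch under assumptions: uffd_api_prepare and uffd_api_with_fallback are hypothetical names, and retrying with a smaller feature set is just one possible policy, not something the patch prescribes.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Build a fresh request; never reuse a struct left over from a failed call. */
void uffd_api_prepare(struct uffdio_api *api, __u64 features)
{
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;
    api->features = features;
}

/*
 * Hypothetical policy: try `preferred` first, then `fallback`.
 * Returns 0 on success, -1 with errno set by the last ioctl on failure.
 */
int uffd_api_with_fallback(int uffd, __u64 preferred, __u64 fallback)
{
    struct uffdio_api api;

    uffd_api_prepare(&api, preferred);
    if (ioctl(uffd, UFFDIO_API, &api) == 0)
        return 0;

    /* api is now unspecified (possibly zeroed): rebuild it from scratch. */
    uffd_api_prepare(&api, fallback);
    return ioctl(uffd, UFFDIO_API, &api);
}
```

The point is only that every attempt starts from uffd_api_prepare(), so nothing the kernel may have written into the structure on error leaks into the next request.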

2023-09-21 00:06:17

by Axel Rasmussen

Subject: [PATCH 10/10] ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

This is a new feature recently added to the kernel. So, document the new
ioctl the same way we do other UFFDIO_* ioctls.

Also note the corresponding new ioctl flag we can return in response to a
UFFDIO_REGISTER call.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 112 +++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index afe3caffc..1282f63e1 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -405,6 +405,11 @@ operation is supported.
The
.B UFFDIO_CONTINUE
operation is supported.
+.TP
+.B 1 << _UFFDIO_POISON
+The
+.B UFFDIO_POISON
+operation is supported.
.PP
This
.BR ioctl (2)
@@ -916,6 +921,113 @@ The faulting process has exited at the time of a
.B UFFDIO_CONTINUE
operation.
.\"
+.SS UFFDIO_POISON
+(Since Linux 6.6.)
+Mark an address range as "poisoned".
+Future accesses to these addresses will raise a
+.B SIGBUS
+signal.
+Unlike
+.B MADV_HWPOISON
+this works by installing page table entries,
+rather than "really" poisoning the underlying physical pages.
+This means it only affects this particular address space.
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_poison
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_poison {
+ struct uffdio_range range;
+ /* Range to install poison PTE markers in */
+ __u64 mode; /* Flags controlling the behavior of poison */
+ __s64 updated; /* Number of bytes poisoned, or negated error */
+};
+.EE
+.in
+.PP
+The following value may be bitwise ORed in
+.I mode
+to change the behavior of the
+.B UFFDIO_POISON
+operation:
+.TP
+.B UFFDIO_POISON_MODE_DONTWAKE
+Do not wake up the thread that waits for page-fault resolution.
+.PP
+The
+.I updated
+field is used by the kernel
+to return the number of bytes that were actually poisoned,
+or an error in the same manner as
+.BR UFFDIO_COPY .
+If the value returned in the
+.I updated
+field doesn't match the value that was specified in
+.IR range.len ,
+the operation fails with the error
+.BR EAGAIN .
+The
+.I updated
+field is output-only;
+it is not read by the
+.B UFFDIO_POISON
+operation.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+In this case,
+the entire area was poisoned.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EAGAIN
+The number of bytes poisoned
+(i.e., the value returned in the
+.I updated
+field)
+does not equal the value that was specified in the
+.I range.len
+field.
+.TP
+.B EINVAL
+Either
+.I range.start
+or
+.I range.len
+was not a multiple of the system page size; or
+.I range.len
+was zero; or the range specified was invalid.
+.TP
+.B EINVAL
+An invalid bit was specified in the
+.I mode
+field.
+.TP
+.B EEXIST
+One or more pages were already mapped in the given range.
+.TP
+.B ENOENT
+The faulting process has changed its virtual memory layout simultaneously with
+an outstanding
+.B UFFDIO_POISON
+operation.
+.TP
+.B ENOMEM
+Allocating memory for page table entries failed.
+.TP
+.B ESRCH
+The faulting process has exited at the time of a
+.B UFFDIO_POISON
+operation.
+.\"
.SH RETURN VALUE
See descriptions of the individual operations, above.
.SH ERRORS
--
2.42.0.459.ge4e396fd5e-goog
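
For the EAGAIN case above, a caller may want to work out how much of the range is still outstanding from the value left in `updated`. The sketch below shows only that bookkeeping. struct poison_progress and poison_advance are hypothetical and are not kernel types. Resubmitting the remainder is an assumed policy, not something the patch mandates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a range being poisoned; not a kernel type. */
struct poison_progress {
    uint64_t start;   /* next address still to be poisoned */
    uint64_t len;     /* bytes still to be poisoned */
};

/*
 * Apply the value the kernel left in `updated` after one UFFDIO_POISON call.
 * Returns false, leaving *p untouched, if `updated` is a negated error or
 * exceeds the bytes that were outstanding.
 */
bool poison_advance(struct poison_progress *p, int64_t updated)
{
    if (updated < 0 || (uint64_t)updated > p->len)
        return false;

    p->start += (uint64_t)updated;
    p->len -= (uint64_t)updated;
    return true;
}
```

After a call that fails with EAGAIN, the caller would feed `updated` into poison_advance() and, if `len` is still nonzero, issue another UFFDIO_POISON with `range.start` and `range.len` taken from the updated bookkeeping.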

2023-09-25 23:29:01

by Alejandro Colomar

Subject: Re: [PATCH 01/10] userfaultfd.2: briefly mention two-step feature handshake process

On Tue, Sep 19, 2023 at 12:01:57PM -0700, Axel Rasmussen wrote:
> This process is critical for programs which depend on extra features, so
> it's worth mentioning here.
>
> Future commits will much more fully describe it in ioctl_userfaultfd.2.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Patch applied. Thanks,
Alex

> ---
> man2/userfaultfd.2 | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> index 40354065c..1b2af22f9 100644
> --- a/man2/userfaultfd.2
> +++ b/man2/userfaultfd.2
> @@ -200,8 +200,9 @@ the application must enable it using the
> .B UFFDIO_API
> .BR ioctl (2)
> operation.
> -This operation allows a handshake between the kernel and user space
> -to determine the API version and supported features.
> +This operation allows a two-step handshake between the kernel and user space
> +to determine what API version and features the kernel supports,
> +and then to enable those features user space wants.
> This operation must be performed before any of the other
> .BR ioctl (2)
> operations described below (or those operations fail with the
> --
> 2.42.0.459.ge4e396fd5e-goog
>



2023-09-25 23:36:42

by Alejandro Colomar

Subject: Re: [PATCH 03/10] userfaultfd.2: comment on feature detection in the example program

Hi Axel,

On Tue, Sep 19, 2023 at 12:01:59PM -0700, Axel Rasmussen wrote:
> The example program doesn't depend on any extra features, so it does not
> make use of the two-step feature handshake process. This is fine, but it
> might set a bad example for programs which *do* depend on specific
> features (e.g. they may conclude they don't need to do anything to
> enable / detect them).
>
> No need to make the example program more complicated: let's just add a
> comment indicating why we do it the way we do it in the example, and
> describing briefly what a more complicated program would need to do
> instead.
>
> The comment is kept rather brief; a full description of this feature
> will be included in ioctl_userfaultfd.2 instead.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Patch applied.

Thanks,
Alex

> ---
> man2/userfaultfd.2 | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> index 00d94e514..b2b79f61d 100644
> --- a/man2/userfaultfd.2
> +++ b/man2/userfaultfd.2
> @@ -881,6 +881,13 @@ main(int argc, char *argv[])
> if (uffd == \-1)
> err(EXIT_FAILURE, "userfaultfd");
> \&
> + /* NOTE: Two-step feature handshake is not needed here, since this
> + example doesn't require any specific features.
> +
> + Programs that *do* should call UFFDIO_API twice: once with
> + `features = 0` to detect features supported by this kernel, and
> + again with the subset of features the program actually wants to
> + enable. */
> uffdio_api.api = UFFD_API;
> uffdio_api.features = 0;
> if (ioctl(uffd, UFFDIO_API, &uffdio_api) == \-1)
> --
> 2.42.0.459.ge4e396fd5e-goog
>



2023-09-25 23:40:02

by Alejandro Colomar

Subject: Re: [PATCH 04/10] ioctl_userfaultfd.2: fix a few trivial mistakes

On Tue, Sep 19, 2023 at 12:02:00PM -0700, Axel Rasmussen wrote:
> - Fix missing paragraph tag. The lack of this tag yielded no blank line
> in the rendered page, which is inconsistent with style elsewhere.
>
> - The description of UFFDIO_WRITEPROTECT was a sentence fragment; the
> last half of the sentence was left out by mistake. Add it in to fix
> the issue.
>
> - Move UFFDIO_WRITEPROTECT 'since' to its own line. All other ioctls
> note the kernel version introduced on a separate line from the ioctl
> name. Update UFFDIO_WRITEPROTECT to match the existing style.
>
> Signed-off-by: Axel Rasmussen <[email protected]>
> ---

Patch applied.

Thanks,
Alex

> man2/ioctl_userfaultfd.2 | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index b5281ec4c..339adf8fe 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -31,6 +31,7 @@ using calls of the form:
> ioctl(fd, cmd, argp);
> .EE
> .in
> +.PP
> In the above,
> .I fd
> is a file descriptor referring to a userfaultfd object,
> @@ -351,6 +352,7 @@ operation is supported.
> .B 1 << _UFFDIO_WRITEPROTECT
> The
> .B UFFDIO_WRITEPROTECT
> +operation is supported.
> .TP
> .B 1 << _UFFDIO_ZEROPAGE
> The
> @@ -693,7 +695,8 @@ field of the
> structure was not a multiple of the system page size; or
> .I len
> was zero; or the specified range was otherwise invalid.
> -.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
> +.SS UFFDIO_WRITEPROTECT
> +(Since Linux 5.7.)
> Write-protect or write-unprotect a userfaultfd-registered memory range
> registered with mode
> .BR UFFDIO_REGISTER_MODE_WP .
> --
> 2.42.0.459.ge4e396fd5e-goog
>



2023-09-25 23:45:08

by Alejandro Colomar

Subject: Re: [PATCH 05/10] ioctl_userfaultfd.2: describe two-step feature handshake

Hi Axel,

On Tue, Sep 19, 2023 at 12:02:01PM -0700, Axel Rasmussen wrote:
> Fully describe how UFFDIO_API can be used to perform a two-step feature
> handshake, and also note the case where this isn't necessary (programs
> which don't depend on any extra features).
>
> This lets us clean up an old FIXME asking for this to be described.
>
> Signed-off-by: Axel Rasmussen <[email protected]>
> ---
> man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
> 1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 339adf8fe..e91a1dfc8 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -83,7 +83,6 @@ struct uffdio_api {
> The
> .I api
> field denotes the API version requested by the application.
> -.PP
> The kernel verifies that it can support the requested API version,
> and sets the
> .I features
> @@ -93,6 +92,25 @@ fields to bit masks representing all the available features and the generic
> .BR ioctl (2)
> operations available.
> .PP
> +After Linux 4.11,

"After" to me means that you're not including 4.11. You probably mean
"Since", which would be inclusive? Or do you actually mean since 4.12?

In any case, "since" is more commonly used, so I prefer that wording,
for consistency.

Thanks,
Alex

> +applications should use the
> +.I features
> +field to perform a two-step handshake.
> +First,
> +.BR UFFDIO_API
> +is called with the
> +.I features
> +field set to zero.
> +The kernel responds by setting all supported feature bits.
> +.PP
> +Applications which do not require any specific features
> +can begin using the userfaultfd immediately.
> +Applications which do need specific features
> +should call
> +.BR UFFDIO_API
> +again with a subset of the reported feature bits set
> +to enable those features.
> +.PP
> Before Linux 4.11, the
> .I features
> field must be initialized to zero before the call to
> @@ -102,24 +120,11 @@ and zero (i.e., no feature bits) is placed in the
> field by the kernel upon return from
> .BR ioctl (2).
> .PP
> -Starting from Linux 4.11, the
> -.I features
> -field can be used to ask whether particular features are supported
> -and explicitly enable userfaultfd features that are disabled by default.
> -The kernel always reports all the available features in the
> -.I features
> -field.
> -.PP
> -To enable userfaultfd features the application should set
> -a bit corresponding to each feature it wants to enable in the
> -.I features
> -field.
> -If the kernel supports all the requested features it will enable them.
> -Otherwise it will zero out the returned
> +If the application sets unsupported feature bits,
> +the kernel will zero out the returned
> .I uffdio_api
> structure and return
> .BR EINVAL .
> -.\" FIXME add more details about feature negotiation and enablement
> .PP
> The following feature bits may be set:
> .TP
> --
> 2.42.0.459.ge4e396fd5e-goog
>



2023-09-25 23:56:38

by Alejandro Colomar

Subject: Re: [PATCH 06/10] ioctl_userfaultfd.2: describe missing UFFDIO_API feature flags

On Tue, Sep 19, 2023 at 12:02:02PM -0700, Axel Rasmussen wrote:
> Several new features have been added to the kernel recently, and the man
> page wasn't updated to describe these new features. So, add in
> descriptions of any missing features.
>
> Signed-off-by: Axel Rasmussen <[email protected]>
> ---

Patch applied.

Thanks,
Alex

> man2/ioctl_userfaultfd.2 | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index e91a1dfc8..53b1f473f 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -204,6 +204,13 @@ If this feature bit is set,
> .I uffd_msg.pagefault.feat.ptid
> will be set to the faulted thread ID for each page-fault message.
> .TP
> +.BR UFFD_FEATURE_PAGEFAULT_FLAG_WP " (since Linux 5.10)"
> +If this feature bit is set,
> +userfaultfd supports write-protect faults
> +for anonymous memory.
> +(Note that shmem / hugetlbfs support
> +is indicated by a separate feature.)
> +.TP
> .BR UFFD_FEATURE_MINOR_HUGETLBFS " (since Linux 5.13)"
> If this feature bit is set,
> the kernel supports registering userfaultfd ranges
> @@ -221,6 +228,22 @@ will be set to the exact page-fault address that was reported by the hardware,
> and will not mask the offset within the page.
> Note that old Linux versions might indicate the exact address as well,
> even though the feature bit is not set.
> +.TP
> +.BR UFFD_FEATURE_WP_HUGETLBFS_SHMEM " (since Linux 5.19)"
> +If this feature bit is set,
> +userfaultfd supports write-protect faults
> +for hugetlbfs and shmem / tmpfs memory.
> +.TP
> +.BR UFFD_FEATURE_WP_UNPOPULATED " (since Linux 6.4)"
> +If this feature bit is set,
> +the kernel will handle anonymous memory the same way as file memory,
> +by allowing the user to write-protect unpopulated ptes.
> +.TP
> +.BR UFFD_FEATURE_POISON " (since Linux 6.6)"
> +If this feature bit is set,
> +the kernel supports resolving faults with the
> +.B UFFDIO_POISON
> +ioctl.
> .PP
> The returned
> .I ioctls
> --
> 2.42.0.459.ge4e396fd5e-goog
>



2023-09-25 23:59:51

by Alejandro Colomar

Subject: Re: [PATCH 08/10] ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on error

Hi Axel,

On Tue, Sep 19, 2023 at 12:02:04PM -0700, Axel Rasmussen wrote:
> The old FIXME noted that the zeroing was done to differentiate the two
> EINVAL cases. It's possible something like this was true historically,
> but in current Linux we zero it in *both* EINVAL cases, so this is at
> least no longer true.
>
> After reading the code, I can't determine any clear reason why we zero
> it in some cases but not in others. So, some simple advice we can give
> userspace is: if an error occurs, treat the contents of the structure as
> unspecified. Just re-initialize it before retrying UFFDIO_API again.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

I can't apply this patch due to conflicts (due to not having applied two
of the previous ones). Please resend all remaining patches in following
revisions of the patch set.

The applied ones are here:

<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=contrib>

It's kind of like Linux's 'next' branch.

Cheers,
Alex

> ---
> man2/ioctl_userfaultfd.2 | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 1aa9654be..29dca1f6b 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -272,6 +272,14 @@ operation returns 0 on success.
> On error, \-1 is returned and
> .I errno
> is set to indicate the error.
> +If an error occurs,
> +the kernel may zero the provided
> +.I uffdio_api
> +structure.
> +The caller should treat its contents as unspecified,
> +and reinitialize it before re-attempting another
> +.B UFFDIO_API
> +call.
> Possible errors include:
> .TP
> .B EFAULT
> @@ -305,14 +313,6 @@ twice,
> the first time with no features set,
> is explicitly allowed
> as per the two-step feature detection handshake.
> -.\" FIXME In the above error case, the returned 'uffdio_api' structure is
> -.\" zeroed out. Why is this done? This should be explained in the manual page.
> -.\"
> -.\" Mike Rapoport:
> -.\" In my understanding the uffdio_api
> -.\" structure is zeroed to allow the caller
> -.\" to distinguish the reasons for -EINVAL.
> -.\"
> .SS UFFDIO_REGISTER
> (Since Linux 4.3.)
> Register a memory address range with the userfaultfd object.
> --
> 2.42.0.459.ge4e396fd5e-goog
>



2023-09-27 01:06:47

by Axel Rasmussen

Subject: Re: [PATCH 08/10] ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on error

On Mon, Sep 25, 2023 at 4:56 PM Alejandro Colomar <[email protected]> wrote:
>
> Hi Axel,
>
> On Tue, Sep 19, 2023 at 12:02:04PM -0700, Axel Rasmussen wrote:
> > The old FIXME noted that the zeroing was done to differentiate the two
> > EINVAL cases. It's possible something like this was true historically,
> > but in current Linux we zero it in *both* EINVAL cases, so this is at
> > least no longer true.
> >
> > After reading the code, I can't determine any clear reason why we zero
> > it in some cases but not in others. So, some simple advice we can give
> > userspace is: if an error occurs, treat the contents of the structure as
> > unspecified. Just re-initialize it before retrying UFFDIO_API again.
> >
> > Signed-off-by: Axel Rasmussen <[email protected]>
>
> I can't apply this patch due to conflicts (due to not having applied two
> of the previous ones). Please resend all remaining patches in following
> revisions of the patch set.
>
> The applied ones are here:
>
> <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=contrib>
>
> It's kind of like Linux's 'next' branch.

Thanks for the review Alex! I'll fix up the issues noted and send the
remaining few patches this week. :)

>
> Cheers,
> Alex
>
> > ---
> > man2/ioctl_userfaultfd.2 | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> > index 1aa9654be..29dca1f6b 100644
> > --- a/man2/ioctl_userfaultfd.2
> > +++ b/man2/ioctl_userfaultfd.2
> > @@ -272,6 +272,14 @@ operation returns 0 on success.
> > On error, \-1 is returned and
> > .I errno
> > is set to indicate the error.
> > +If an error occurs,
> > +the kernel may zero the provided
> > +.I uffdio_api
> > +structure.
> > +The caller should treat its contents as unspecified,
> > +and reinitialize it before re-attempting another
> > +.B UFFDIO_API
> > +call.
> > Possible errors include:
> > .TP
> > .B EFAULT
> > @@ -305,14 +313,6 @@ twice,
> > the first time with no features set,
> > is explicitly allowed
> > as per the two-step feature detection handshake.
> > -.\" FIXME In the above error case, the returned 'uffdio_api' structure is
> > -.\" zeroed out. Why is this done? This should be explained in the manual page.
> > -.\"
> > -.\" Mike Rapoport:
> > -.\" In my understanding the uffdio_api
> > -.\" structure is zeroed to allow the caller
> > -.\" to distinguish the reasons for -EINVAL.
> > -.\"
> > .SS UFFDIO_REGISTER
> > (Since Linux 4.3.)
> > Register a memory address range with the userfaultfd object.
> > --
> > 2.42.0.459.ge4e396fd5e-goog
> >

2023-10-09 08:39:56

by Mike Rapoport

Subject: Re: [PATCH 01/10] userfaultfd.2: briefly mention two-step feature handshake process

On Tue, Sep 19, 2023 at 12:01:57PM -0700, Axel Rasmussen wrote:
> This process is critical for programs which depend on extra features, so
> it's worth mentioning here.
>
> Future commits will much more fully describe it in ioctl_userfaultfd.2.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

> ---
> man2/userfaultfd.2 | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> index 40354065c..1b2af22f9 100644
> --- a/man2/userfaultfd.2
> +++ b/man2/userfaultfd.2
> @@ -200,8 +200,9 @@ the application must enable it using the
> .B UFFDIO_API
> .BR ioctl (2)
> operation.
> -This operation allows a handshake between the kernel and user space
> -to determine the API version and supported features.
> +This operation allows a two-step handshake between the kernel and user space
> +to determine what API version and features the kernel supports,
> +and then to enable those features user space wants.
> This operation must be performed before any of the other
> .BR ioctl (2)
> operations described below (or those operations fail with the
> --
> 2.42.0.459.ge4e396fd5e-goog
>
>

--
Sincerely yours,
Mike.

2023-10-09 08:42:09

by Mike Rapoport

Subject: Re: [PATCH 03/10] userfaultfd.2: comment on feature detection in the example program

On Tue, Sep 19, 2023 at 12:01:59PM -0700, Axel Rasmussen wrote:
> The example program doesn't depend on any extra features, so it does not
> make use of the two-step feature handshake process. This is fine, but it
> might set a bad example for programs which *do* depend on specific
> features (e.g. they may conclude they don't need to do anything to
> enable / detect them).
>
> No need to make the example program more complicated: let's just add a
> comment indicating why we do it the way we do it in the example, and
> describing briefly what a more complicated program would need to do
> instead.
>
> The comment is kept rather brief; a full description of this feature
> will be included in ioctl_userfaultfd.2 instead.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

> ---
> man2/userfaultfd.2 | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> index 00d94e514..b2b79f61d 100644
> --- a/man2/userfaultfd.2
> +++ b/man2/userfaultfd.2
> @@ -881,6 +881,13 @@ main(int argc, char *argv[])
> if (uffd == \-1)
> err(EXIT_FAILURE, "userfaultfd");
> \&
> + /* NOTE: Two-step feature handshake is not needed here, since this
> + example doesn't require any specific features.
> +
> + Programs that *do* should call UFFDIO_API twice: once with
> + `features = 0` to detect features supported by this kernel, and
> + again with the subset of features the program actually wants to
> + enable. */
> uffdio_api.api = UFFD_API;
> uffdio_api.features = 0;
> if (ioctl(uffd, UFFDIO_API, &uffdio_api) == \-1)
> --
> 2.42.0.459.ge4e396fd5e-goog
>
>

--
Sincerely yours,
Mike.

2023-10-09 08:43:53

by Mike Rapoport

Subject: Re: [PATCH 05/10] ioctl_userfaultfd.2: describe two-step feature handshake

On Tue, Sep 19, 2023 at 12:02:01PM -0700, Axel Rasmussen wrote:
> Fully describe how UFFDIO_API can be used to perform a two-step feature
> handshake, and also note the case where this isn't necessary (programs
> which don't depend on any extra features).
>
> This lets us clean up an old FIXME asking for this to be described.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

> ---
> man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
> 1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 339adf8fe..e91a1dfc8 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -83,7 +83,6 @@ struct uffdio_api {
> The
> .I api
> field denotes the API version requested by the application.
> -.PP
> The kernel verifies that it can support the requested API version,
> and sets the
> .I features
> @@ -93,6 +92,25 @@ fields to bit masks representing all the available features and the generic
> .BR ioctl (2)
> operations available.
> .PP
> +After Linux 4.11,
> +applications should use the
> +.I features
> +field to perform a two-step handshake.
> +First,
> +.BR UFFDIO_API
> +is called with the
> +.I features
> +field set to zero.
> +The kernel responsds by setting all supported feature bits.
> +.PP
> +Applications which do not require any specific features
> +can begin using the userfaultfd immediately.
> +Applications which do need specific features
> +should call
> +.BR UFFDIO_API
> +again with a subset of the reported feature bits set
> +to enable those features.
> +.PP
> Before Linux 4.11, the
> .I features
> field must be initialized to zero before the call to
> @@ -102,24 +120,11 @@ and zero (i.e., no feature bits) is placed in the
> field by the kernel upon return from
> .BR ioctl (2).
> .PP
> -Starting from Linux 4.11, the
> -.I features
> -field can be used to ask whether particular features are supported
> -and explicitly enable userfaultfd features that are disabled by default.
> -The kernel always reports all the available features in the
> -.I features
> -field.
> -.PP
> -To enable userfaultfd features the application should set
> -a bit corresponding to each feature it wants to enable in the
> -.I features
> -field.
> -If the kernel supports all the requested features it will enable them.
> -Otherwise it will zero out the returned
> +If the application sets unsupported feature bits,
> +the kernel will zero out the returned
> .I uffdio_api
> structure and return
> .BR EINVAL .
> -.\" FIXME add more details about feature negotiation and enablement
> .PP
> The following feature bits may be set:
> .TP
> --
> 2.42.0.459.ge4e396fd5e-goog
>
>

--
Sincerely yours,
Mike.
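
For readers following along, the second step of the handshake described in the patch above comes down to a mask computation. The following is only a minimal sketch under stated assumptions: the helper name uffd_pick_features and the required/optional split are hypothetical and not part of the patch or of any kernel API, and the two UFFDIO_API ioctl calls themselves are left out so the helper builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide which feature bits to pass to the second UFFDIO_API call.
 *
 * supported: the features mask the kernel reported after the first
 *            call (the one made with features = 0).
 * required:  bits the program cannot run without (hypothetical input).
 * optional:  bits the program can use but does not depend on
 *            (hypothetical input).
 *
 * Returns false if any required bit is missing; *request is then left
 * untouched and the caller should not attempt the second call.
 */
static bool
uffd_pick_features(uint64_t supported, uint64_t required,
                   uint64_t optional, uint64_t *request)
{
    assert(request != NULL);

    if ((supported & required) != required)
        return false;

    /* Only ask for bits the kernel reported, so the request is a
       subset of the supported mask. */
    *request = required | (optional & supported);
    return true;
}
```

A caller would pass the resulting mask in the features field of the second UFFDIO_API call. A program that needs no specific features can skip this step entirely, as the patch notes.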

2023-10-09 08:46:31

by Mike Rapoport

Subject: Re: [PATCH 06/10] ioctl_userfaultfd.2: describe missing UFFDIO_API feature flags

On Tue, Sep 19, 2023 at 12:02:02PM -0700, Axel Rasmussen wrote:
> Several new features have been added to the kernel recently, and the man
> page wasn't updated to describe these new features. So, add in
> descriptions of any missing features.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

with a small nit below

> ---
> man2/ioctl_userfaultfd.2 | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index e91a1dfc8..53b1f473f 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -204,6 +204,13 @@ If this feature bit is set,
> .I uffd_msg.pagefault.feat.ptid
> will be set to the faulted thread ID for each page-fault message.
> .TP
> +.BR UFFD_FEATURE_PAGEFAULT_FLAG_WP " (since Linux 5.10)"
> +If this feature bit is set,
> +userfaultfd supports write-protect faults
> +for anonymous memory.
> +(Note that shmem / hugetlbfs support
> +is indicated by a separate feature.)
> +.TP
> .BR UFFD_FEATURE_MINOR_HUGETLBFS " (since Linux 5.13)"
> If this feature bit is set,
> the kernel supports registering userfaultfd ranges
> @@ -221,6 +228,22 @@ will be set to the exact page-fault address that was reported by the hardware,
> and will not mask the offset within the page.
> Note that old Linux versions might indicate the exact address as well,
> even though the feature bit is not set.
> +.TP
> +.BR UFFD_FEATURE_WP_HUGETLBFS_SHMEM " (since Linux 5.19)"
> +If this feature bit is set,
> +userfaultfd supports write-protect faults
> +for hugetlbfs and shmem / tmpfs memory.
> +.TP
> +.BR UFFD_FEATURE_WP_UNPOPULATED " (since Linux 6.4)"
> +If this feature bit is set,
> +the kernel will handle anonymous memory the same way as file memory,
> +by allowing the user to write-protect unpopulated ptes.

Nit: s/ptes/page table entries/

> +.TP
> +.BR UFFD_FEATURE_POISON " (since Linux 6.6)"
> +If this feature bit is set,
> +the kernel supports resolving faults with the
> +.B UFFDIO_POISON
> +ioctl.
> .PP
> The returned
> .I ioctls
> --
> 2.42.0.459.ge4e396fd5e-goog
>
>

--
Sincerely yours,
Mike.
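
As a rough illustration of the split between the two write-protect feature bits described in the patch above, here is a small lookup sketch. The enum names and the function wp_feature_for are hypothetical and exist only for this example. The mapping reflects the wording of the patch text (anonymous memory versus hugetlbfs and shmem / tmpfs) and nothing more; it makes no claim about how the kernel combines these bits internally.

```c
#include <assert.h>

/* Hypothetical labels for the kinds of memory a range may be backed by. */
enum mem_kind { MEM_ANONYMOUS, MEM_SHMEM, MEM_HUGETLBFS };

/* Hypothetical labels standing in for the two feature flags named in
   the patch; these are not the kernel's numeric bit values. */
enum wp_feature {
    WP_VIA_PAGEFAULT_FLAG_WP,     /* UFFD_FEATURE_PAGEFAULT_FLAG_WP */
    WP_VIA_WP_HUGETLBFS_SHMEM     /* UFFD_FEATURE_WP_HUGETLBFS_SHMEM */
};

/* Return the feature flag that, per the patch text, indicates
   write-protect fault support for the given kind of memory. */
static enum wp_feature
wp_feature_for(enum mem_kind kind)
{
    switch (kind) {
    case MEM_ANONYMOUS:
        return WP_VIA_PAGEFAULT_FLAG_WP;
    case MEM_SHMEM:
    case MEM_HUGETLBFS:
        return WP_VIA_WP_HUGETLBFS_SHMEM;
    }
    assert(0 && "unknown memory kind");
    return WP_VIA_PAGEFAULT_FLAG_WP;
}
```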

2023-10-09 09:06:23

by Mike Rapoport

Subject: Re: [PATCH 08/10] ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on error

On Tue, Sep 19, 2023 at 12:02:04PM -0700, Axel Rasmussen wrote:
> The old FIXME noted that the zeroing was done to differentiate the two
> EINVAL cases. It's possible something like this was true historically,
> but in current Linux we zero it in *both* EINVAL cases, so this is at
> least no longer true.
>
> After reading the code, I can't determine any clear reason why we zero
> it in some cases but not in others. So, some simple advice we can give
> userspace is: if an error occurs, treat the contents of the structure as
> unspecified. Just re-initialize it before retrying UFFDIO_API again.

In old kernels (e.g. 4.20; I didn't check when this changed) we had
two -EINVAL cases: one when UFFDIO_API was called while

state != UFFD_STATE_WAIT_API

and another for an API version or features mismatch, and we zeroed the
uffdio_api struct only in the second case.

In the current code the first case does not exist anymore.

> Signed-off-by: Axel Rasmussen <[email protected]>

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

> ---
> man2/ioctl_userfaultfd.2 | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 1aa9654be..29dca1f6b 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -272,6 +272,14 @@ operation returns 0 on success.
> On error, \-1 is returned and
> .I errno
> is set to indicate the error.
> +If an error occurs,
> +the kernel may zero the provided
> +.I uffdio_api
> +structure.
> +The caller should treat its contents as unspecified,
> +and reinitialize it before re-attempting another
> +.B UFFDIO_API
> +call.
> Possible errors include:
> .TP
> .B EFAULT
> @@ -305,14 +313,6 @@ twice,
> the first time with no features set,
> is explicitly allowed
> as per the two-step feature detection handshake.
> -.\" FIXME In the above error case, the returned 'uffdio_api' structure is
> -.\" zeroed out. Why is this done? This should be explained in the manual page.
> -.\"
> -.\" Mike Rapoport:
> -.\" In my understanding the uffdio_api
> -.\" structure is zeroed to allow the caller
> -.\" to distinguish the reasons for -EINVAL.
> -.\"
> .SS UFFDIO_REGISTER
> (Since Linux 4.3.)
> Register a memory address range with the userfaultfd object.
> --
> 2.42.0.459.ge4e396fd5e-goog
>
>

--
Sincerely yours,
Mike.
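
The advice in the patch above (treat the structure as unspecified after an error and reinitialize it before retrying) could look roughly like this in a caller. This is a sketch, not the man page's example: struct uffdio_api_sketch is a local stand-in with the same three fields as struct uffdio_api so the code builds without kernel headers, and uffdio_api_prepare is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct uffdio_api (api, features, ioctls). */
struct uffdio_api_sketch {
    uint64_t api;
    uint64_t features;
    uint64_t ioctls;
};

/*
 * Fill in every field before each UFFDIO_API attempt, instead of
 * relying on whatever a previous failed call left behind.
 * api_version would be UFFD_API in a real program.
 */
static void
uffdio_api_prepare(struct uffdio_api_sketch *a, uint64_t api_version,
                   uint64_t features)
{
    assert(a != NULL);

    a->api = api_version;
    a->features = features;
    a->ioctls = 0;      /* output field, cleared for a clean start */
}
```

Calling the helper right before every attempt, including retries, means the caller never has to care whether the kernel zeroed the structure on the failed call.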

2023-10-09 09:11:01

by Mike Rapoport

Subject: Re: [PATCH 10/10] ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

On Tue, Sep 19, 2023 at 12:02:06PM -0700, Axel Rasmussen wrote:
> This is a new feature recently added to the kernel. So, document the new
> ioctl the same way we do other UFFDIO_* ioctls.
>
> Also note the corresponding new ioctl flag we can return in response to a
> UFFDIO_REGISTER call.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

With a small correction

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

> ---
> man2/ioctl_userfaultfd.2 | 112 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 112 insertions(+)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index afe3caffc..1282f63e1 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -405,6 +405,11 @@ operation is supported.
> The
> .B UFFDIO_CONTINUE
> operation is supported.
> +.TP
> +.B 1 << _UFFDIO_POISON
> +The
> +.B UFFDIO_POISON
> +operation is supported.
> .PP
> This
> .BR ioctl (2)
> @@ -916,6 +921,113 @@ The faulting process has exited at the time of a
> .B UFFDIO_CONTINUE
> operation.
> .\"
> +.SS UFFDIO_POISON
> +(Since Linux 6.6.)
> +Mark an address range as "poisoned".
> +Future accesses to these addresses will raise a
> +.B SIGBUS
> +signal.
> +Unlike
> +.B MADV_HWPOISON
> +this works by installing page table entries,
> +rather than "really" poisoning the underlying physical pages.
> +This means it only affects this particular address space.
> +.PP
> +The
> +.I argp
> +argument is a pointer to a
> +.I uffdio_continue

Did you mean uffdio_poison?

> +structure as shown below:
> +.PP
> +.in +4n
> +.EX
> +struct uffdio_poison {
> + struct uffdio_range range;
> + /* Range to install poison PTE markers in */
> + __u64 mode; /* Flags controlling the behavior of poison */
> + __s64 updated; /* Number of bytes poisoned, or negated error */
> +};
> +.EE
> +.in
> +.PP
> +The following value may be bitwise ORed in
> +.I mode
> +to change the behavior of the
> +.B UFFDIO_POISON
> +operation:
> +.TP
> +.B UFFDIO_POISON_MODE_DONTWAKE
> +Do not wake up the thread that waits for page-fault resolution.
> +.PP
> +The
> +.I updated
> +field is used by the kernel
> +to return the number of bytes that were actually poisoned,
> +or an error in the same manner as
> +.BR UFFDIO_COPY .
> +If the value returned in the
> +.I updated
> +field doesn't match the value that was specified in
> +.IR range.len ,
> +the operation fails with the error
> +.BR EAGAIN .
> +The
> +.I updated
> +field is output-only;
> +it is not read by the
> +.B UFFDIO_POISON
> +operation.
> +.PP
> +This
> +.BR ioctl (2)
> +operation returns 0 on success.
> +In this case,
> +the entire area was poisoned.
> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the error.
> +Possible errors include:
> +.TP
> +.B EAGAIN
> +The number of bytes mapped
> +(i.e., the value returned in the
> +.I updated
> +field)
> +does not equal the value that was specified in the
> +.I range.len
> +field.
> +.TP
> +.B EINVAL
> +Either
> +.I range.start
> +or
> +.I range.len
> +was not a multiple of the system page size; or
> +.I range.len
> +was zero; or the range specified was invalid.
> +.TP
> +.B EINVAL
> +An invalid bit was specified in the
> +.I mode
> +field.
> +.TP
> +.B EEXIST
> +One or more pages were already mapped in the given range.
> +.TP
> +.B ENOENT
> +The faulting process has changed its virtual memory layout simultaneously with
> +an outstanding
> +.B UFFDIO_POISON
> +operation.
> +.TP
> +.B ENOMEM
> +Allocating memory for page table entries failed.
> +.TP
> +.B ESRCH
> +The faulting process has exited at the time of a
> +.B UFFDIO_POISON
> +operation.
> +.\"
> .SH RETURN VALUE
> See descriptions of the individual operations, above.
> .SH ERRORS
> --
> 2.42.0.459.ge4e396fd5e-goog
>
>

--
Sincerely yours,
Mike.
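
To make the argument rules and the meaning of the updated field in the patch above a bit more concrete, here is a client-side sketch. The struct definitions are local stand-ins that mirror the layout quoted in the patch, and the names poison_range_ok and poison_classify are hypothetical. The wrap-around check is an assumption about what "the range specified was invalid" can include; the kernel performs its own validation regardless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the structures quoted in the patch. */
struct uffdio_range_sketch {
    uint64_t start;
    uint64_t len;
};

struct uffdio_poison_sketch {
    struct uffdio_range_sketch range;
    uint64_t mode;
    int64_t  updated;
};

/* Pre-check the EINVAL conditions listed in the patch: start and len
   must be multiples of the page size, and len must be nonzero. */
static bool
poison_range_ok(const struct uffdio_range_sketch *r, uint64_t page_size)
{
    assert(r != NULL && page_size != 0);

    if (r->len == 0)
        return false;
    if (r->start % page_size != 0 || r->len % page_size != 0)
        return false;
    /* Assumption: a range that wraps around the address space is
       treated as invalid. */
    if (r->start + r->len < r->start)
        return false;
    return true;
}

enum poison_result { POISON_DONE, POISON_PARTIAL, POISON_FAILED };

/* Interpret the output-only updated field after the ioctl returns. */
static enum poison_result
poison_classify(const struct uffdio_poison_sketch *p)
{
    assert(p != NULL);

    if (p->updated < 0)
        return POISON_FAILED;       /* negated error code */
    if ((uint64_t)p->updated == p->range.len)
        return POISON_DONE;         /* the entire range was poisoned */
    return POISON_PARTIAL;          /* the case reported as EAGAIN */
}
```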

2023-10-09 10:49:43

by Alejandro Colomar

Subject: Re: [PATCH 06/10] ioctl_userfaultfd.2: describe missing UFFDIO_API feature flags

Hi Mike,

On Mon, Oct 09, 2023 at 11:45:14AM +0300, Mike Rapoport wrote:
> On Tue, Sep 19, 2023 at 12:02:02PM -0700, Axel Rasmussen wrote:
> > Several new features have been added to the kernel recently, and the man
> > page wasn't updated to describe these new features. So, add in
> > descriptions of any missing features.
> >
> > Signed-off-by: Axel Rasmussen <[email protected]>
>
> Reviewed-by: Mike Rapoport (IBM) <[email protected]>
>
> with a small nit below

Thanks for the reviews!

>
> > ---

[...]

> > +by allowing the user to write-protect unpopulated ptes.
>
> Nit: s/ptes/page table entries/

I've applied the following patch:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=2afbc25a7a3b1b68b638d7542a6bead7a1960a7d>

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>



2023-10-09 10:58:42

by Alejandro Colomar

Subject: Re: [PATCH 05/10] ioctl_userfaultfd.2: describe two-step feature handshake

Hi Mike,

On Mon, Oct 09, 2023 at 11:42:47AM +0300, Mike Rapoport wrote:
> On Tue, Sep 19, 2023 at 12:02:01PM -0700, Axel Rasmussen wrote:
> > Fully describe how UFFDIO_API can be used to perform a two-step feature
> > handshake, and also note the case where this isn't necessary (programs
> > which don't depend on any extra features).
> >
> > This lets us clean up an old FIXME asking for this to be described.
> >
> > Signed-off-by: Axel Rasmussen <[email protected]>
>
> Reviewed-by: Mike Rapoport (IBM) <[email protected]>

Since v2 is unchanged, I've added this tag. Thanks for the review!

Cheers,
Alex

>
> > ---
> > man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
> > 1 file changed, 21 insertions(+), 16 deletions(-)
> >
> > diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> > index 339adf8fe..e91a1dfc8 100644
> > --- a/man2/ioctl_userfaultfd.2
> > +++ b/man2/ioctl_userfaultfd.2
> > @@ -83,7 +83,6 @@ struct uffdio_api {
> > The
> > .I api
> > field denotes the API version requested by the application.
> > -.PP
> > The kernel verifies that it can support the requested API version,
> > and sets the
> > .I features
> > @@ -93,6 +92,25 @@ fields to bit masks representing all the available features and the generic
> > .BR ioctl (2)
> > operations available.
> > .PP
> > +After Linux 4.11,
> > +applications should use the
> > +.I features
> > +field to perform a two-step handshake.
> > +First,
> > +.BR UFFDIO_API
> > +is called with the
> > +.I features
> > +field set to zero.
> > +The kernel responsds by setting all supported feature bits.

s/responsds/responds/ amended.

> > +.PP
> > +Applications which do not require any specific features
> > +can begin using the userfaultfd immediately.
> > +Applications which do need specific features
> > +should call
> > +.BR UFFDIO_API
> > +again with a subset of the reported feature bits set
> > +to enable those features.
> > +.PP
> > Before Linux 4.11, the
> > .I features
> > field must be initialized to zero before the call to
> > @@ -102,24 +120,11 @@ and zero (i.e., no feature bits) is placed in the
> > field by the kernel upon return from
> > .BR ioctl (2).
> > .PP
> > -Starting from Linux 4.11, the
> > -.I features
> > -field can be used to ask whether particular features are supported
> > -and explicitly enable userfaultfd features that are disabled by default.
> > -The kernel always reports all the available features in the
> > -.I features
> > -field.
> > -.PP
> > -To enable userfaultfd features the application should set
> > -a bit corresponding to each feature it wants to enable in the
> > -.I features
> > -field.
> > -If the kernel supports all the requested features it will enable them.
> > -Otherwise it will zero out the returned
> > +If the application sets unsupported feature bits,
> > +the kernel will zero out the returned
> > .I uffdio_api
> > structure and return
> > .BR EINVAL .
> > -.\" FIXME add more details about feature negotiation and enablement
> > .PP
> > The following feature bits may be set:
> > .TP
> > --
> > 2.42.0.459.ge4e396fd5e-goog
> >
> >
>
> --
> Sincerely yours,
> Mike.

--
<https://www.alejandro-colomar.es/>



2023-10-10 17:14:41

by Axel Rasmussen

Subject: Re: [PATCH 10/10] ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

On Mon, Oct 9, 2023 at 2:10 AM Mike Rapoport <[email protected]> wrote:
>
> On Tue, Sep 19, 2023 at 12:02:06PM -0700, Axel Rasmussen wrote:
> > This is a new feature recently added to the kernel. So, document the new
> > ioctl the same way we do other UFFDIO_* ioctls.
> >
> > Also note the corresponding new ioctl flag we can return in response to a
> > UFFDIO_REGISTER call.
> >
> > Signed-off-by: Axel Rasmussen <[email protected]>
>
> With a small correction
>
> Reviewed-by: Mike Rapoport (IBM) <[email protected]>
>
> > ---
> > man2/ioctl_userfaultfd.2 | 112 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 112 insertions(+)
> >
> > diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> > index afe3caffc..1282f63e1 100644
> > --- a/man2/ioctl_userfaultfd.2
> > +++ b/man2/ioctl_userfaultfd.2
> > @@ -405,6 +405,11 @@ operation is supported.
> > The
> > .B UFFDIO_CONTINUE
> > operation is supported.
> > +.TP
> > +.B 1 << _UFFDIO_POISON
> > +The
> > +.B UFFDIO_POISON
> > +operation is supported.
> > .PP
> > This
> > .BR ioctl (2)
> > @@ -916,6 +921,113 @@ The faulting process has exited at the time of a
> > .B UFFDIO_CONTINUE
> > operation.
> > .\"
> > +.SS UFFDIO_POISON
> > +(Since Linux 6.6.)
> > +Mark an address range as "poisoned".
> > +Future accesses to these addresses will raise a
> > +.B SIGBUS
> > +signal.
> > +Unlike
> > +.B MADV_HWPOISON
> > +this works by installing page table entries,
> > +rather than "really" poisoning the underlying physical pages.
> > +This means it only affects this particular address space.
> > +.PP
> > +The
> > +.I argp
> > +argument is a pointer to a
> > +.I uffdio_continue
>
> Did you mean uffdio_poison?

Ah, yes. :) Should have copy/pasted more carefully. I can send a v3
with this small correction.

>
> > +structure as shown below:
> > +.PP
> > +.in +4n
> > +.EX
> > +struct uffdio_poison {
> > + struct uffdio_range range;
> > + /* Range to install poison PTE markers in */
> > + __u64 mode; /* Flags controlling the behavior of poison */
> > + __s64 updated; /* Number of bytes poisoned, or negated error */
> > +};
> > +.EE
> > +.in
> > +.PP
> > +The following value may be bitwise ORed in
> > +.I mode
> > +to change the behavior of the
> > +.B UFFDIO_POISON
> > +operation:
> > +.TP
> > +.B UFFDIO_POISON_MODE_DONTWAKE
> > +Do not wake up the thread that waits for page-fault resolution.
> > +.PP
> > +The
> > +.I updated
> > +field is used by the kernel
> > +to return the number of bytes that were actually poisoned,
> > +or an error in the same manner as
> > +.BR UFFDIO_COPY .
> > +If the value returned in the
> > +.I updated
> > +field doesn't match the value that was specified in
> > +.IR range.len ,
> > +the operation fails with the error
> > +.BR EAGAIN .
> > +The
> > +.I updated
> > +field is output-only;
> > +it is not read by the
> > +.B UFFDIO_POISON
> > +operation.
> > +.PP
> > +This
> > +.BR ioctl (2)
> > +operation returns 0 on success.
> > +In this case,
> > +the entire area was poisoned.
> > +On error, \-1 is returned and
> > +.I errno
> > +is set to indicate the error.
> > +Possible errors include:
> > +.TP
> > +.B EAGAIN
> > +The number of bytes mapped
> > +(i.e., the value returned in the
> > +.I updated
> > +field)
> > +does not equal the value that was specified in the
> > +.I range.len
> > +field.
> > +.TP
> > +.B EINVAL
> > +Either
> > +.I range.start
> > +or
> > +.I range.len
> > +was not a multiple of the system page size; or
> > +.I range.len
> > +was zero; or the range specified was invalid.
> > +.TP
> > +.B EINVAL
> > +An invalid bit was specified in the
> > +.I mode
> > +field.
> > +.TP
> > +.B EEXIST
> > +One or more pages were already mapped in the given range.
> > +.TP
> > +.B ENOENT
> > +The faulting process has changed its virtual memory layout simultaneously with
> > +an outstanding
> > +.B UFFDIO_POISON
> > +operation.
> > +.TP
> > +.B ENOMEM
> > +Allocating memory for page table entries failed.
> > +.TP
> > +.B ESRCH
> > +The faulting process has exited at the time of a
> > +.B UFFDIO_POISON
> > +operation.
> > +.\"
> > .SH RETURN VALUE
> > See descriptions of the individual operations, above.
> > .SH ERRORS
> > --
> > 2.42.0.459.ge4e396fd5e-goog
> >
> >
>
> --
> Sincerely yours,
> Mike.