2023-10-03 19:46:30

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v2 0/5] userfaultfd man page updates

This series includes only the remaining patches not applied from v1, with
review comments addressed. This series is based on the "contrib" branch.

Changelog:

v1->v2:
- In patch 1 (patch 5 in v1), change "after" to "since" for consistency and to
be clear that we mean 4.11+ (inclusive).
- In patch 2 (patch 7 in v1), reorder error codes alphabetically (EINVAL then
EPERM).
- In patch 3 (patch 8 in v1), resolve conflicts with earlier review comments.

Original cover letter:

Various updates for userfaultfd man pages. To summarize the changes:

- Correctly / fully describe the two-step feature support handshake process.
- Describe new UFFDIO_POISON ioctl.
- Other small improvements (missing ioctls, error codes, etc).

Axel Rasmussen (5):
ioctl_userfaultfd.2: describe two-step feature handshake
ioctl_userfaultfd.2: correct and update UFFDIO_API ioctl error codes
ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on
error
ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list
ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

man2/ioctl_userfaultfd.2 | 226 +++++++++++++++++++++++++++++++--------
1 file changed, 181 insertions(+), 45 deletions(-)

--
2.42.0.609.gbb76f46606-goog


2023-10-03 19:46:33

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v2 4/5] ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list

The list of error codes in the man page was out of date with respect to
the current state of the kernel. Some errors were partially /
incorrectly described.

Update the error code listing, so it matches the current state of the
kernel, and correctly describes all the errors.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 2ee6a0532..95d69f773 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -388,12 +388,6 @@ On error, \-1 is returned and
.I errno
is set to indicate the error.
Possible errors include:
-.\" FIXME Is the following error list correct?
-.\"
-.TP
-.B EBUSY
-A mapping in the specified range is registered with another
-userfaultfd object.
.TP
.B EFAULT
.I argp
@@ -408,21 +402,32 @@ field; or the
field was zero.
.TP
.B EINVAL
-There is no mapping in the specified address range.
-.TP
-.B EINVAL
+The specified address range was invalid.
+More specifically,
+no mapping exists in the given range,
+or the mapping that exists there is invalid
+(e.g. unsupported type of memory),
+or the range values (
.I range.start
or
.I range.len
-is not a multiple of the system page size; or,
+) are not multiples of the relevant page size,
+or
.I range.len
-is zero; or these fields are otherwise invalid.
+is zero.
.TP
-.B EINVAL
-There as an incompatible mapping in the specified address range.
-.\" Mike Rapoport:
-.\" ENOMEM if the process is exiting and the
-.\" mm_struct has gone by the time userfault grabs it.
+.B ENOMEM
+The process is exiting,
+and its address space has already been torn down
+when userfaultfd attempts to reference it.
+.TP
+.B EPERM
+The userfaultfd would allow writing to a file backing the mapping,
+but the calling process lacks such write permissions.
+.TP
+.B EBUSY
+A mapping in the specified range is registered with another
+userfaultfd object.
.SS UFFDIO_UNREGISTER
(Since Linux 4.3.)
Unregister a memory address range from userfaultfd.
--
2.42.0.609.gbb76f46606-goog

2023-10-03 19:46:37

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v2 1/5] ioctl_userfaultfd.2: describe two-step feature handshake

Fully describe how UFFDIO_API can be used to perform a two-step feature
handshake, and also note the case where this isn't necessary (programs
which don't depend on any extra features).

This lets us clean up an old FIXME asking for this to be described.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index b5281ec4c..ef352a69d 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -82,7 +82,6 @@ struct uffdio_api {
The
.I api
field denotes the API version requested by the application.
-.PP
The kernel verifies that it can support the requested API version,
and sets the
.I features
@@ -92,6 +91,25 @@ fields to bit masks representing all the available features and the generic
.BR ioctl (2)
operations available.
.PP
+Since Linux 4.11,
+applications should use the
+.I features
+field to perform a two-step handshake.
+First,
+.BR UFFDIO_API
+is called with the
+.I features
+field set to zero.
+The kernel responsds by setting all supported feature bits.
+.PP
+Applications which do not require any specific features
+can begin using the userfaultfd immediately.
+Applications which do need specific features
+should call
+.BR UFFDIO_API
+again with a subset of the reported feature bits set
+to enable those features.
+.PP
Before Linux 4.11, the
.I features
field must be initialized to zero before the call to
@@ -101,24 +119,11 @@ and zero (i.e., no feature bits) is placed in the
field by the kernel upon return from
.BR ioctl (2).
.PP
-Starting from Linux 4.11, the
-.I features
-field can be used to ask whether particular features are supported
-and explicitly enable userfaultfd features that are disabled by default.
-The kernel always reports all the available features in the
-.I features
-field.
-.PP
-To enable userfaultfd features the application should set
-a bit corresponding to each feature it wants to enable in the
-.I features
-field.
-If the kernel supports all the requested features it will enable them.
-Otherwise it will zero out the returned
+If the application sets unsupported feature bits,
+the kernel will zero out the returned
.I uffdio_api
structure and return
.BR EINVAL .
-.\" FIXME add more details about feature negotiation and enablement
.PP
The following feature bits may be set:
.TP
--
2.42.0.609.gbb76f46606-goog

2023-10-03 19:46:40

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v2 2/5] ioctl_userfaultfd.2: correct and update UFFDIO_API ioctl error codes

First, it is not correct that repeated UFFDIO_API calls result in
EINVAL. This is true *if both calls enable features*, but in the case
where we're doing a two-step feature detection handshake, the kernel
explicitly expects 2 calls (one with no features set). So, correct this
description.

Then, some new error cases have been added to the kernel recently, and
the man page wasn't updated to note these. So, add in descriptions of
these new error cases.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index ef352a69d..28dd2fcdd 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -256,17 +256,31 @@ refers to an address that is outside the calling process's
accessible address space.
.TP
.B EINVAL
-The userfaultfd has already been enabled by a previous
-.B UFFDIO_API
-operation.
-.TP
-.B EINVAL
The API version requested in the
.I api
field is not supported by this kernel, or the
.I features
field passed to the kernel includes feature bits that are not supported
by the current kernel version.
+.TP
+.B EINVAL
+A previous
+.B UFFDIO_API
+call already enabled one or more features for this userfaultfd.
+Calling
+.B UFFDIO_API
+twice,
+the first time with no features set,
+is explicitly allowed
+as per the two-step feature detection handshake.
+.TP
+.B EPERM
+The
+.B UFFD_FEATURE_EVENT_FORK
+feature was enabled,
+but the calling process doesn't have the
+.B CAP_SYS_PTRACE
+capability.
.\" FIXME In the above error case, the returned 'uffdio_api' structure is
.\" zeroed out. Why is this done? This should be explained in the manual page.
.\"
--
2.42.0.609.gbb76f46606-goog

2023-10-03 19:46:41

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v2 3/5] ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on error

The old FIXME noted that the zeroing was done to differentiate the two
EINVAL cases. It's possible something like this was true historically,
but in current Linux we zero it in *both* EINVAL cases, so this is at
least no longer true.

After reading the code, I can't determine any clear reason why we zero
it in some cases but not in others. So, some simple advice we can give
userspace is: if an error occurs, treat the contents of the structure as
unspecified. Just re-initialize it before retrying UFFDIO_API again.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 28dd2fcdd..2ee6a0532 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -248,6 +248,14 @@ operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the error.
+If an error occurs,
+the kernel may zero the provided
+.I uffdio_api
+structure.
+The caller should treat its contents as unspecified,
+and reinitialize it before re-attempting another
+.B UFFDIO_API
+call.
Possible errors include:
.TP
.B EFAULT
@@ -281,14 +289,6 @@ feature was enabled,
but the calling process doesn't have the
.B CAP_SYS_PTRACE
capability.
-.\" FIXME In the above error case, the returned 'uffdio_api' structure is
-.\" zeroed out. Why is this done? This should be explained in the manual page.
-.\"
-.\" Mike Rapoport:
-.\" In my understanding the uffdio_api
-.\" structure is zeroed to allow the caller
-.\" to distinguish the reasons for -EINVAL.
-.\"
.SS UFFDIO_REGISTER
(Since Linux 4.3.)
Register a memory address range with the userfaultfd object.
--
2.42.0.609.gbb76f46606-goog

2023-10-03 19:46:42

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v2 5/5] ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

This is a new feature recently added to the kernel. So, document the new
ioctl the same way we do other UFFDIO_* ioctls.

Also note the corresponding new ioctl flag we can return in reponse to a
UFFDIO_REGISTER call.

Signed-off-by: Axel Rasmussen <[email protected]>
---
man2/ioctl_userfaultfd.2 | 112 +++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 95d69f773..6b6980d4a 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -380,6 +380,11 @@ operation is supported.
The
.B UFFDIO_CONTINUE
operation is supported.
+.TP
+.B 1 << _UFFDIO_POISON
+The
+.B UFFDIO_POISON
+operation is supported.
.PP
This
.BR ioctl (2)
@@ -890,6 +895,113 @@ The faulting process has exited at the time of a
.B UFFDIO_CONTINUE
operation.
.\"
+.SS UFFDIO_POISON
+(Since Linux 6.6.)
+Mark an address range as "poisoned".
+Future accesses to these addresses will raise a
+.B SIGBUS
+signal.
+Unlike
+.B MADV_HWPOISON
+this works by installing page table entries,
+rather than "really" poisoning the underlying physical pages.
+This means it only affects this particular address space.
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_continue
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_poison {
+ struct uffdio_range range;
+ /* Range to install poison PTE markers in */
+ __u64 mode; /* Flags controlling the behavior of poison */
+ __s64 updated; /* Number of bytes poisoned, or negated error */
+};
+.EE
+.in
+.PP
+The following value may be bitwise ORed in
+.I mode
+to change the behavior of the
+.B UFFDIO_POISON
+operation:
+.TP
+.B UFFDIO_POISON_MODE_DONTWAKE
+Do not wake up the thread that waits for page-fault resolution.
+.PP
+The
+.I updated
+field is used by the kernel
+to return the number of bytes that were actually poisoned,
+or an error in the same manner as
+.BR UFFDIO_COPY .
+If the value returned in the
+.I updated
+field doesn't match the value that was specified in
+.IR range.len ,
+the operation fails with the error
+.BR EAGAIN .
+The
+.I updated
+field is output-only;
+it is not read by the
+.B UFFDIO_POISON
+operation.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+In this case,
+the entire area was poisoned.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EAGAIN
+The number of bytes mapped
+(i.e., the value returned in the
+.I updated
+field)
+does not equal the value that was specified in the
+.I range.len
+field.
+.TP
+.B EINVAL
+Either
+.I range.start
+or
+.I range.len
+was not a multiple of the system page size; or
+.I range.len
+was zero; or the range specified was invalid.
+.TP
+.B EINVAL
+An invalid bit was specified in the
+.I mode
+field.
+.TP
+.B EEXIST
+One or more pages were already mapped in the given range.
+.TP
+.B ENOENT
+The faulting process has changed its virtual memory layout simultaneously with
+an outstanding
+.B UFFDIO_POISON
+operation.
+.TP
+.B ENOMEM
+Allocating memory for page table entries failed.
+.TP
+.B ESRCH
+The faulting process has exited at the time of a
+.B UFFDIO_POISON
+operation.
+.\"
.SH RETURN VALUE
See descriptions of the individual operations, above.
.SH ERRORS
--
2.42.0.609.gbb76f46606-goog

2023-10-08 21:53:14

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on error

Hi Axel,

On Tue, Oct 03, 2023 at 12:45:45PM -0700, Axel Rasmussen wrote:
> The old FIXME noted that the zeroing was done to differentiate the two
> EINVAL cases. It's possible something like this was true historically,
> but in current Linux we zero it in *both* EINVAL cases, so this is at
> least no longer true.
>
> After reading the code, I can't determine any clear reason why we zero
> it in some cases but not in others. So, some simple advice we can give
> userspace is: if an error occurs, treat the contents of the structure as
> unspecified. Just re-initialize it before retrying UFFDIO_API again.
>
> Signed-off-by: Axel Rasmussen <[email protected]>
> ---
> man2/ioctl_userfaultfd.2 | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 28dd2fcdd..2ee6a0532 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -248,6 +248,14 @@ operation returns 0 on success.
> On error, \-1 is returned and
> .I errno
> is set to indicate the error.
> +If an error occurs,
> +the kernel may zero the provided
> +.I uffdio_api
> +structure.
> +The caller should treat its contents as unspecified,
> +and reinitialize it before re-attempting another
> +.B UFFDIO_API
> +call.
> Possible errors include:
> .TP
> .B EFAULT
> @@ -281,14 +289,6 @@ feature was enabled,
> but the calling process doesn't have the
> .B CAP_SYS_PTRACE
> capability.
> -.\" FIXME In the above error case, the returned 'uffdio_api' structure is
> -.\" zeroed out. Why is this done? This should be explained in the manual page.
> -.\"
> -.\" Mike Rapoport:
> -.\" In my understanding the uffdio_api
> -.\" structure is zeroed to allow the caller
> -.\" to distinguish the reasons for -EINVAL.
> -.\"

I've added Mike to the thread in case he wants to comment.

Thanks,
Alex

> .SS UFFDIO_REGISTER
> (Since Linux 4.3.)
> Register a memory address range with the userfaultfd object.
> --
> 2.42.0.609.gbb76f46606-goog
>

--
<https://www.alejandro-colomar.es/>


Attachments:
(No filename) (2.10 kB)
signature.asc (849.00 B)
Download all attachments

2023-10-08 22:07:22

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list

Hi Axel,

On Tue, Oct 03, 2023 at 12:45:46PM -0700, Axel Rasmussen wrote:
> The list of error codes in the man page was out of date with respect to
> the current state of the kernel. Some errors were partially /
> incorrectly described.
>
> Update the error code listing, so it matches the current state of the
> kernel, and correctly describes all the errors.
>
> Signed-off-by: Axel Rasmussen <[email protected]>
> ---
> man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
> 1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 2ee6a0532..95d69f773 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -388,12 +388,6 @@ On error, \-1 is returned and
> .I errno
> is set to indicate the error.
> Possible errors include:
> -.\" FIXME Is the following error list correct?
> -.\"
> -.TP
> -.B EBUSY
> -A mapping in the specified range is registered with another
> -userfaultfd object.
> .TP
> .B EFAULT
> .I argp
> @@ -408,21 +402,32 @@ field; or the
> field was zero.
> .TP
> .B EINVAL
> -There is no mapping in the specified address range.
> -.TP
> -.B EINVAL
> +The specified address range was invalid.
> +More specifically,
> +no mapping exists in the given range,
> +or the mapping that exists there is invalid
> +(e.g. unsupported type of memory),
> +or the range values (

This produces some unwanted space. Please apply the following fix to
your patch.

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 6e954e98c..795014794 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -432,11 +432,11 @@ .SS UFFDIO_REGISTER
no mapping exists in the given range,
or the mapping that exists there is invalid
(e.g. unsupported type of memory),
-or the range values (
-.I range.start
+or the range values
+.IR ( range.start
or
-.I range.len
-) are not multiples of the relevant page size,
+.IR range.len )
+are not multiples of the relevant page size,
or
.I range.len
is zero.

> .I range.start
> or
> .I range.len
> -is not a multiple of the system page size; or,
> +) are not multiples of the relevant page size,
> +or
> .I range.len
> -is zero; or these fields are otherwise invalid.
> +is zero.
> .TP
> -.B EINVAL
> -There as an incompatible mapping in the specified address range.
> -.\" Mike Rapoport:
> -.\" ENOMEM if the process is exiting and the
> -.\" mm_struct has gone by the time userfault grabs it.
> +.B ENOMEM
> +The process is exiting,
> +and its address space has already been torn down
> +when userfaultfd attempts to reference it.
> +.TP
> +.B EPERM
> +The userfaultfd would allow writing to a file backing the mapping,
> +but the calling process lacks such write permissions.
> +.TP
> +.B EBUSY
> +A mapping in the specified range is registered with another
> +userfaultfd object.

Why would you move EBUSY to the end? Do you see any reasons to order it
that way?

Thanks,
Alex

> .SS UFFDIO_UNREGISTER
> (Since Linux 4.3.)
> Unregister a memory address range from userfaultfd.
> --
> 2.42.0.609.gbb76f46606-goog
>

--
<https://www.alejandro-colomar.es/>


Attachments:
(No filename) (3.21 kB)
signature.asc (849.00 B)
Download all attachments

2023-10-08 22:23:40

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

Hi Axel,

On Tue, Oct 03, 2023 at 12:45:47PM -0700, Axel Rasmussen wrote:
> This is a new feature recently added to the kernel. So, document the new
> ioctl the same way we do other UFFDIO_* ioctls.
>
> Also note the corresponding new ioctl flag we can return in reponse to a
> UFFDIO_REGISTER call.
>
> Signed-off-by: Axel Rasmussen <[email protected]>
> ---
> man2/ioctl_userfaultfd.2 | 112 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 112 insertions(+)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 95d69f773..6b6980d4a 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -380,6 +380,11 @@ operation is supported.
> The
> .B UFFDIO_CONTINUE
> operation is supported.
> +.TP
> +.B 1 << _UFFDIO_POISON
> +The
> +.B UFFDIO_POISON
> +operation is supported.
> .PP
> This
> .BR ioctl (2)
> @@ -890,6 +895,113 @@ The faulting process has exited at the time of a
> .B UFFDIO_CONTINUE
> operation.
> .\"
> +.SS UFFDIO_POISON
> +(Since Linux 6.6.)
> +Mark an address range as "poisoned".
> +Future accesses to these addresses will raise a
> +.B SIGBUS
> +signal.
> +Unlike
> +.B MADV_HWPOISON
> +this works by installing page table entries,
> +rather than "really" poisoning the underlying physical pages.
> +This means it only affects this particular address space.
> +.PP
> +The
> +.I argp
> +argument is a pointer to a
> +.I uffdio_continue
> +structure as shown below:
> +.PP
> +.in +4n
> +.EX
> +struct uffdio_poison {
> + struct uffdio_range range;
> + /* Range to install poison PTE markers in */
> + __u64 mode; /* Flags controlling the behavior of poison */
> + __s64 updated; /* Number of bytes poisoned, or negated error */
> +};
> +.EE
> +.in
> +.PP
> +The following value may be bitwise ORed in
> +.I mode
> +to change the behavior of the
> +.B UFFDIO_POISON
> +operation:
> +.TP
> +.B UFFDIO_POISON_MODE_DONTWAKE
> +Do not wake up the thread that waits for page-fault resolution.
> +.PP
> +The
> +.I updated
> +field is used by the kernel
> +to return the number of bytes that were actually poisoned,
> +or an error in the same manner as
> +.BR UFFDIO_COPY .
> +If the value returned in the
> +.I updated
> +field doesn't match the value that was specified in
> +.IR range.len ,
> +the operation fails with the error
> +.BR EAGAIN .
> +The
> +.I updated
> +field is output-only;
> +it is not read by the
> +.B UFFDIO_POISON
> +operation.
> +.PP
> +This
> +.BR ioctl (2)
> +operation returns 0 on success.
> +In this case,
> +the entire area was poisoned.
> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the error.
> +Possible errors include:
> +.TP
> +.B EAGAIN
> +The number of bytes mapped
> +(i.e., the value returned in the
> +.I updated
> +field)
> +does not equal the value that was specified in the
> +.I range.len
> +field.
> +.TP
> +.B EINVAL
> +Either
> +.I range.start
> +or
> +.I range.len
> +was not a multiple of the system page size; or
> +.I range.len
> +was zero; or the range specified was invalid.
> +.TP
> +.B EINVAL
> +An invalid bit was specified in the
> +.I mode
> +field.
> +.TP
> +.B EEXIST

Any reasons for this order, or should we use alphabetic order?

Thanks,
Alex

> +One or more pages were already mapped in the given range.
> +.TP
> +.B ENOENT
> +The faulting process has changed its virtual memory layout simultaneously with
> +an outstanding
> +.B UFFDIO_POISON
> +operation.
> +.TP
> +.B ENOMEM
> +Allocating memory for page table entries failed.
> +.TP
> +.B ESRCH
> +The faulting process has exited at the time of a
> +.B UFFDIO_POISON
> +operation.
> +.\"
> .SH RETURN VALUE
> See descriptions of the individual operations, above.
> .SH ERRORS
> --
> 2.42.0.609.gbb76f46606-goog
>

--
<https://www.alejandro-colomar.es/>


Attachments:
(No filename) (3.87 kB)
signature.asc (849.00 B)
Download all attachments

2023-10-08 22:24:13

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] ioctl_userfaultfd.2: describe two-step feature handshake

On Tue, Oct 03, 2023 at 12:45:43PM -0700, Axel Rasmussen wrote:
> Fully describe how UFFDIO_API can be used to perform a two-step feature
> handshake, and also note the case where this isn't necessary (programs
> which don't depend on any extra features).
>
> This lets us clean up an old FIXME asking for this to be described.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Patch applied.

Thanks,
Alex

> ---
> man2/ioctl_userfaultfd.2 | 37 +++++++++++++++++++++----------------
> 1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index b5281ec4c..ef352a69d 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -82,7 +82,6 @@ struct uffdio_api {
> The
> .I api
> field denotes the API version requested by the application.
> -.PP
> The kernel verifies that it can support the requested API version,
> and sets the
> .I features
> @@ -92,6 +91,25 @@ fields to bit masks representing all the available features and the generic
> .BR ioctl (2)
> operations available.
> .PP
> +Since Linux 4.11,
> +applications should use the
> +.I features
> +field to perform a two-step handshake.
> +First,
> +.BR UFFDIO_API
> +is called with the
> +.I features
> +field set to zero.
> +The kernel responsds by setting all supported feature bits.
> +.PP
> +Applications which do not require any specific features
> +can begin using the userfaultfd immediately.
> +Applications which do need specific features
> +should call
> +.BR UFFDIO_API
> +again with a subset of the reported feature bits set
> +to enable those features.
> +.PP
> Before Linux 4.11, the
> .I features
> field must be initialized to zero before the call to
> @@ -101,24 +119,11 @@ and zero (i.e., no feature bits) is placed in the
> field by the kernel upon return from
> .BR ioctl (2).
> .PP
> -Starting from Linux 4.11, the
> -.I features
> -field can be used to ask whether particular features are supported
> -and explicitly enable userfaultfd features that are disabled by default.
> -The kernel always reports all the available features in the
> -.I features
> -field.
> -.PP
> -To enable userfaultfd features the application should set
> -a bit corresponding to each feature it wants to enable in the
> -.I features
> -field.
> -If the kernel supports all the requested features it will enable them.
> -Otherwise it will zero out the returned
> +If the application sets unsupported feature bits,
> +the kernel will zero out the returned
> .I uffdio_api
> structure and return
> .BR EINVAL .
> -.\" FIXME add more details about feature negotiation and enablement
> .PP
> The following feature bits may be set:
> .TP
> --
> 2.42.0.609.gbb76f46606-goog
>

--
<https://www.alejandro-colomar.es/>


Attachments:
(No filename) (2.82 kB)
signature.asc (849.00 B)
Download all attachments

2023-10-08 22:25:56

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] ioctl_userfaultfd.2: correct and update UFFDIO_API ioctl error codes

Hi Axel,

On Tue, Oct 03, 2023 at 12:45:44PM -0700, Axel Rasmussen wrote:
> First, it is not correct that repeated UFFDIO_API calls result in
> EINVAL. This is true *if both calls enable features*, but in the case
> where we're doing a two-step feature detection handshake, the kernel
> explicitly expects 2 calls (one with no features set). So, correct this
> description.
>
> Then, some new error cases have been added to the kernel recently, and
> the man page wasn't updated to note these. So, add in descriptions of
> these new error cases.
>
> Signed-off-by: Axel Rasmussen <[email protected]>

Patch applied.

Thanks,
Alex

> ---
> man2/ioctl_userfaultfd.2 | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index ef352a69d..28dd2fcdd 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -256,17 +256,31 @@ refers to an address that is outside the calling process's
> accessible address space.
> .TP
> .B EINVAL
> -The userfaultfd has already been enabled by a previous
> -.B UFFDIO_API
> -operation.
> -.TP
> -.B EINVAL
> The API version requested in the
> .I api
> field is not supported by this kernel, or the
> .I features
> field passed to the kernel includes feature bits that are not supported
> by the current kernel version.
> +.TP
> +.B EINVAL
> +A previous
> +.B UFFDIO_API
> +call already enabled one or more features for this userfaultfd.
> +Calling
> +.B UFFDIO_API
> +twice,
> +the first time with no features set,
> +is explicitly allowed
> +as per the two-step feature detection handshake.
> +.TP
> +.B EPERM
> +The
> +.B UFFD_FEATURE_EVENT_FORK
> +feature was enabled,
> +but the calling process doesn't have the
> +.B CAP_SYS_PTRACE
> +capability.
> .\" FIXME In the above error case, the returned 'uffdio_api' structure is
> .\" zeroed out. Why is this done? This should be explained in the manual page.
> .\"
> --
> 2.42.0.609.gbb76f46606-goog
>

--
<https://www.alejandro-colomar.es/>


Attachments:
(No filename) (2.09 kB)
signature.asc (849.00 B)
Download all attachments

2023-10-08 22:28:15

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] userfaultfd man page updates

Hi Axel,

On Tue, Oct 03, 2023 at 12:45:42PM -0700, Axel Rasmussen wrote:
> This series includes only the remaining patches not applied from v1, with
> review comments addressed. This series is based on the "contrib" branch.

Thanks. For the next revision, I've moved the contrib branch to my
personal repo, since it's more of a personal branch. You can find it
here:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/log/?h=contrib>

Cheers,
Alex

>
> Changelog:
>
> v1->v2:
> - In patch 1 (patch 5 in v1), change "after" to "since" for consistency and to
> be clear that we mean 4.11+ (inclusive).
> - In patch 2 (patch 7 in v1), reorder error codes alphabetically (EINVAL then
> EPERM).
> - In patch 3 (patch 8 in v1), resolve conflicts with earlier review comments.
>
> Original cover letter:
>
> Various updates for userfaultfd man pages. To summarize the changes:
>
> - Correctly / fully describe the two-step feature support handshake process.
> - Describe new UFFDIO_POISON ioctl.
> - Other small improvements (missing ioctls, error codes, etc).
>
> Axel Rasmussen (5):
> ioctl_userfaultfd.2: describe two-step feature handshake
> ioctl_userfaultfd.2: correct and update UFFDIO_API ioctl error codes
> ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on
> error
> ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list
> ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl
>
> man2/ioctl_userfaultfd.2 | 226 +++++++++++++++++++++++++++++++--------
> 1 file changed, 181 insertions(+), 45 deletions(-)
>
> --
> 2.42.0.609.gbb76f46606-goog
>

--
<https://www.alejandro-colomar.es/>


Attachments:
(No filename) (1.69 kB)
signature.asc (849.00 B)
Download all attachments

2023-10-09 09:14:19

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] ioctl_userfaultfd.2: clarify the state of the uffdio_api structure on error

On Sun, Oct 08, 2023 at 11:52:56PM +0200, Alejandro Colomar wrote:
> Hi Axel,
>
> On Tue, Oct 03, 2023 at 12:45:45PM -0700, Axel Rasmussen wrote:
> > The old FIXME noted that the zeroing was done to differentiate the two
> > EINVAL cases. It's possible something like this was true historically,
> > but in current Linux we zero it in *both* EINVAL cases, so this is at
> > least no longer true.
> >
> > After reading the code, I can't determine any clear reason why we zero
> > it in some cases but not in others. So, some simple advice we can give
> > userspace is: if an error occurs, treat the contents of the structure as
> > unspecified. Just re-initialize it before retrying UFFDIO_API again.
> >
> > Signed-off-by: Axel Rasmussen <[email protected]>
> > ---
> > man2/ioctl_userfaultfd.2 | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> > index 28dd2fcdd..2ee6a0532 100644
> > --- a/man2/ioctl_userfaultfd.2
> > +++ b/man2/ioctl_userfaultfd.2
> > @@ -248,6 +248,14 @@ operation returns 0 on success.
> > On error, \-1 is returned and
> > .I errno
> > is set to indicate the error.
> > +If an error occurs,
> > +the kernel may zero the provided
> > +.I uffdio_api
> > +structure.
> > +The caller should treat its contents as unspecified,
> > +and reinitialize it before re-attempting another
> > +.B UFFDIO_API
> > +call.
> > Possible errors include:
> > .TP
> > .B EFAULT
> > @@ -281,14 +289,6 @@ feature was enabled,
> > but the calling process doesn't have the
> > .B CAP_SYS_PTRACE
> > capability.
> > -.\" FIXME In the above error case, the returned 'uffdio_api' structure is
> > -.\" zeroed out. Why is this done? This should be explained in the manual page.
> > -.\"
> > -.\" Mike Rapoport:
> > -.\" In my understanding the uffdio_api
> > -.\" structure is zeroed to allow the caller
> > -.\" to distinguish the reasons for -EINVAL.
> > -.\"
>
> I've added Mike to the thread in case he wants to comment.

Thanks, Alex!

It looks like I reviewed v1 of the patchset though :)

> Thanks,
> Alex
>
> > .SS UFFDIO_REGISTER
> > (Since Linux 4.3.)
> > Register a memory address range with the userfaultfd object.
> > --
> > 2.42.0.609.gbb76f46606-goog
> >
>
> --
> <https://www.alejandro-colomar.es/>



--
Sincerely yours,
Mike.

2023-10-17 21:31:27

by Guillem Jover

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list

Hi!

On Mon, 2023-10-09 at 00:06:48 +0200, Alejandro Colomar wrote:
> This produces some unwanted space. Please apply the following fix to
> your patch.
>
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index 6e954e98c..795014794 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -432,11 +432,11 @@ .SS UFFDIO_REGISTER
> no mapping exists in the given range,
> or the mapping that exists there is invalid
> (e.g. unsupported type of memory),
> -or the range values (
> -.I range.start
> +or the range values
> +.IR ( range.start

I think you meant «.RI» here?

> or
> -.I range.len
> -) are not multiples of the relevant page size,
> +.IR range.len )
> +are not multiples of the relevant page size,
> or
> .I range.len
> is zero.
>

Regards,
Guillem

2023-10-17 21:42:55

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list

Hi Guillem!

On Tue, Oct 17, 2023 at 11:23:10PM +0200, Guillem Jover wrote:
> Hi!
>
> On Mon, 2023-10-09 at 00:06:48 +0200, Alejandro Colomar wrote:
> > This produces some unwanted space. Please apply the following fix to
> > your patch.
> >
> > diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> > index 6e954e98c..795014794 100644
> > --- a/man2/ioctl_userfaultfd.2
> > +++ b/man2/ioctl_userfaultfd.2
> > @@ -432,11 +432,11 @@ .SS UFFDIO_REGISTER
> > no mapping exists in the given range,
> > or the mapping that exists there is invalid
> > (e.g. unsupported type of memory),
> > -or the range values (
> > -.I range.start
> > +or the range values
> > +.IR ( range.start
>
> I think you meant «.RI» here?

Yup! Good catch.

Cheers,
Alex

>
> > or
> > -.I range.len
> > -) are not multiples of the relevant page size,
> > +.IR range.len )
> > +are not multiples of the relevant page size,
> > or
> > .I range.len
> > is zero.
> >
>
> Regards,
> Guillem

--
<https://www.alejandro-colomar.es/>


Attachments:
(No filename) (1.05 kB)
signature.asc (849.00 B)
Download all attachments

2023-10-17 22:15:57

by Axel Rasmussen

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] ioctl_userfaultfd.2: fix / update UFFDIO_REGISTER error code list

On Tue, Oct 17, 2023 at 2:42 PM Alejandro Colomar <[email protected]> wrote:
>
> Hi Guillem!
>
> On Tue, Oct 17, 2023 at 11:23:10PM +0200, Guillem Jover wrote:
> > Hi!
> >
> > On Mon, 2023-10-09 at 00:06:48 +0200, Alejandro Colomar wrote:
> > > This produces some unwanted space. Please apply the following fix to
> > > your patch.
> > >
> > > diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> > > index 6e954e98c..795014794 100644
> > > --- a/man2/ioctl_userfaultfd.2
> > > +++ b/man2/ioctl_userfaultfd.2
> > > @@ -432,11 +432,11 @@ .SS UFFDIO_REGISTER
> > > no mapping exists in the given range,
> > > or the mapping that exists there is invalid
> > > (e.g. unsupported type of memory),
> > > -or the range values (
> > > -.I range.start
> > > +or the range values
> > > +.IR ( range.start
> >
> > I think you meant «.RI» here?
>
> Yup! Good catch.

Thanks, I'll apply this change in a v3.

>
> Cheers,
> Alex
>
> >
> > > or
> > > -.I range.len
> > > -) are not multiples of the relevant page size,
> > > +.IR range.len )
> > > +are not multiples of the relevant page size,
> > > or
> > > .I range.len
> > > is zero.
> > >
> >
> > Regards,
> > Guillem

Regarding the -EBUSY ordering, I did it this way because that's the
order in which the conditions are checked in the code. But, I agree
that isn't very obvious / useful to any reader of the man page :) and
alphabetical order is preferred. I'll correct that in v3.

>
> --
> <https://www.alejandro-colomar.es/>

2023-10-17 22:26:22

by Axel Rasmussen

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ioctl_userfaultfd.2: document new UFFDIO_POISON ioctl

On Sun, Oct 8, 2023 at 3:23 PM Alejandro Colomar <[email protected]> wrote:
>
> Hi Axel,
>
> On Tue, Oct 03, 2023 at 12:45:47PM -0700, Axel Rasmussen wrote:
> > This is a new feature recently added to the kernel. So, document the new
> > ioctl the same way we do other UFFDIO_* ioctls.
> >
> > Also note the corresponding new ioctl flag we can return in reponse to a
> > UFFDIO_REGISTER call.
> >
> > Signed-off-by: Axel Rasmussen <[email protected]>
> > ---
> > man2/ioctl_userfaultfd.2 | 112 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 112 insertions(+)
> >
> > diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> > index 95d69f773..6b6980d4a 100644
> > --- a/man2/ioctl_userfaultfd.2
> > +++ b/man2/ioctl_userfaultfd.2
> > @@ -380,6 +380,11 @@ operation is supported.
> > The
> > .B UFFDIO_CONTINUE
> > operation is supported.
> > +.TP
> > +.B 1 << _UFFDIO_POISON
> > +The
> > +.B UFFDIO_POISON
> > +operation is supported.
> > .PP
> > This
> > .BR ioctl (2)
> > @@ -890,6 +895,113 @@ The faulting process has exited at the time of a
> > .B UFFDIO_CONTINUE
> > operation.
> > .\"
> > +.SS UFFDIO_POISON
> > +(Since Linux 6.6.)
> > +Mark an address range as "poisoned".
> > +Future accesses to these addresses will raise a
> > +.B SIGBUS
> > +signal.
> > +Unlike
> > +.B MADV_HWPOISON
> > +this works by installing page table entries,
> > +rather than "really" poisoning the underlying physical pages.
> > +This means it only affects this particular address space.
> > +.PP
> > +The
> > +.I argp
> > +argument is a pointer to a
> > +.I uffdio_continue
> > +structure as shown below:
> > +.PP
> > +.in +4n
> > +.EX
> > +struct uffdio_poison {
> > + struct uffdio_range range;
> > + /* Range to install poison PTE markers in */
> > + __u64 mode; /* Flags controlling the behavior of poison */
> > + __s64 updated; /* Number of bytes poisoned, or negated error */
> > +};
> > +.EE
> > +.in
> > +.PP
> > +The following value may be bitwise ORed in
> > +.I mode
> > +to change the behavior of the
> > +.B UFFDIO_POISON
> > +operation:
> > +.TP
> > +.B UFFDIO_POISON_MODE_DONTWAKE
> > +Do not wake up the thread that waits for page-fault resolution.
> > +.PP
> > +The
> > +.I updated
> > +field is used by the kernel
> > +to return the number of bytes that were actually poisoned,
> > +or an error in the same manner as
> > +.BR UFFDIO_COPY .
> > +If the value returned in the
> > +.I updated
> > +field doesn't match the value that was specified in
> > +.IR range.len ,
> > +the operation fails with the error
> > +.BR EAGAIN .
> > +The
> > +.I updated
> > +field is output-only;
> > +it is not read by the
> > +.B UFFDIO_POISON
> > +operation.
> > +.PP
> > +This
> > +.BR ioctl (2)
> > +operation returns 0 on success.
> > +In this case,
> > +the entire area was poisoned.
> > +On error, \-1 is returned and
> > +.I errno
> > +is set to indicate the error.
> > +Possible errors include:
> > +.TP
> > +.B EAGAIN
> > +The number of bytes mapped
> > +(i.e., the value returned in the
> > +.I updated
> > +field)
> > +does not equal the value that was specified in the
> > +.I range.len
> > +field.
> > +.TP
> > +.B EINVAL
> > +Either
> > +.I range.start
> > +or
> > +.I range.len
> > +was not a multiple of the system page size; or
> > +.I range.len
> > +was zero; or the range specified was invalid.
> > +.TP
> > +.B EINVAL
> > +An invalid bit was specified in the
> > +.I mode
> > +field.
> > +.TP
> > +.B EEXIST
>
> Any reasons for this order, or should we use alphabetic order?

This is the order the conditions are checked in code, but I agree
alphabetic order is better. :) I'll send a v3.

>
> Thanks,
> Alex
>
> > +One or more pages were already mapped in the given range.
> > +.TP
> > +.B ENOENT
> > +The faulting process has changed its virtual memory layout simultaneously with
> > +an outstanding
> > +.B UFFDIO_POISON
> > +operation.
> > +.TP
> > +.B ENOMEM
> > +Allocating memory for page table entries failed.
> > +.TP
> > +.B ESRCH
> > +The faulting process has exited at the time of a
> > +.B UFFDIO_POISON
> > +operation.
> > +.\"
> > .SH RETURN VALUE
> > See descriptions of the individual operations, above.
> > .SH ERRORS
> > --
> > 2.42.0.609.gbb76f46606-goog
> >
>
> --
> <https://www.alejandro-colomar.es/>