2017-07-27 06:27:10

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

In the non-cooperative userfaultfd case, the process exit may race with
outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
instead of -EINVAL when mm is already gone will allow uffd monitor to
distinguish this case from other error conditions.


Cc: [email protected]
Fixes: 96333187ab162 ("userfaultfd_copy: return -ENOSPC in case mm has gone")

Signed-off-by: Mike Rapoport <[email protected]>
---

Unfortunately, I've overlooked userfaultfd_zeropage when I updated
userfaultd_copy :(

fs/userfaultfd.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index cadcd12a3d35..2d8c2d848668 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1643,6 +1643,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
uffdio_zeropage.range.len);
mmput(ctx->mm);
+ } else {
+ return -ENOSPC;
}
if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
return -EFAULT;
--
2.7.4


2017-07-31 12:22:08

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Thu 27-07-17 09:26:59, Mike Rapoport wrote:
> In the non-cooperative userfaultfd case, the process exit may race with
> outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
> instead of -EINVAL when mm is already gone will allow uffd monitor to
> distinguish this case from other error conditions.

Normally we tend to return ESRCH in such case. ENOSPC sounds rather
confusing...

> Cc: [email protected]
> Fixes: 96333187ab162 ("userfaultfd_copy: return -ENOSPC in case mm has gone")
>
> Signed-off-by: Mike Rapoport <[email protected]>
> ---
>
> Unfortunately, I've overlooked userfaultfd_zeropage when I updated
> userfaultd_copy :(
>
> fs/userfaultfd.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index cadcd12a3d35..2d8c2d848668 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1643,6 +1643,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
> ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
> uffdio_zeropage.range.len);
> mmput(ctx->mm);
> + } else {
> + return -ENOSPC;
> }
> if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
> return -EFAULT;
> --
> 2.7.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Michal Hocko
SUSE Labs

2017-07-31 13:32:58

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Mon, Jul 31, 2017 at 02:22:04PM +0200, Michal Hocko wrote:
> On Thu 27-07-17 09:26:59, Mike Rapoport wrote:
> > In the non-cooperative userfaultfd case, the process exit may race with
> > outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
> > instead of -EINVAL when mm is already gone will allow uffd monitor to
> > distinguish this case from other error conditions.
>
> Normally we tend to return ESRCH in such case. ENOSPC sounds rather
> confusing...

This is in sync and consistent with the retval for UFFDIO_COPY upstream:

if (mmget_not_zero(ctx->mm)) {
ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
uffdio_copy.len);
mmput(ctx->mm);
} else {
return -ENOSPC;
}

If you preferred ESRCH I certainly wouldn't have been against, but we
should have discussed it before it was upstream. All it matters is
it's documented in the great manpage that was written for it as quoted
below.

+.TP
+.B ENOENT
+(Since Linux 4.11)
+The faulting process has changed
+its virtual memory layout simultaneously with outstanding
+.I UFFDIO_COPY
+operation.
+.TP
+.B ENOSPC
+(Since Linux 4.11)
+The faulting process has exited at the time of
+.I UFFDIO_COPY
+operation.

To change it now, we would need to involve manpage and other code
changes.

Thanks,
Andrea

2017-07-31 13:36:35

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Mon, Jul 31, 2017 at 02:22:04PM +0200, Michal Hocko wrote:
> On Thu 27-07-17 09:26:59, Mike Rapoport wrote:
> > In the non-cooperative userfaultfd case, the process exit may race with
> > outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
> > instead of -EINVAL when mm is already gone will allow uffd monitor to
> > distinguish this case from other error conditions.
>
> Normally we tend to return ESRCH in such case. ENOSPC sounds rather
> confusing...

Well, I don't remember why I used ENOSPC in userfault_copy at the first
place, but if we are to keep it userfaultfd_zeropage should return the same
error...

> > Cc: [email protected]
> > Fixes: 96333187ab162 ("userfaultfd_copy: return -ENOSPC in case mm has gone")
> >
> > Signed-off-by: Mike Rapoport <[email protected]>
> > ---
> >
> > Unfortunately, I've overlooked userfaultfd_zeropage when I updated
> > userfaultd_copy :(
> >
> > fs/userfaultfd.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index cadcd12a3d35..2d8c2d848668 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1643,6 +1643,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
> > ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
> > uffdio_zeropage.range.len);
> > mmput(ctx->mm);
> > + } else {
> > + return -ENOSPC;
> > }
> > if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
> > return -EFAULT;
> > --
> > 2.7.4
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to [email protected]. For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
> --
> Michal Hocko
> SUSE Labs
>

--
Sincerely yours,
Mike.

2017-07-31 13:45:13

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Mon 31-07-17 15:32:47, Andrea Arcangeli wrote:
> On Mon, Jul 31, 2017 at 02:22:04PM +0200, Michal Hocko wrote:
> > On Thu 27-07-17 09:26:59, Mike Rapoport wrote:
> > > In the non-cooperative userfaultfd case, the process exit may race with
> > > outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
> > > instead of -EINVAL when mm is already gone will allow uffd monitor to
> > > distinguish this case from other error conditions.
> >
> > Normally we tend to return ESRCH in such case. ENOSPC sounds rather
> > confusing...
>
> This is in sync and consistent with the retval for UFFDIO_COPY upstream:
>
> if (mmget_not_zero(ctx->mm)) {
> ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
> uffdio_copy.len);
> mmput(ctx->mm);
> } else {
> return -ENOSPC;
> }
>
> If you preferred ESRCH I certainly wouldn't have been against, but we
> should have discussed it before it was upstream. All it matters is
> it's documented in the great manpage that was written for it as quoted
> below.

OK, I wasn't aware of this.

> +.TP
> +.B ENOENT
> +(Since Linux 4.11)
> +The faulting process has changed
> +its virtual memory layout simultaneously with outstanding
> +.I UFFDIO_COPY
> +operation.
> +.TP
> +.B ENOSPC
> +(Since Linux 4.11)
> +The faulting process has exited at the time of
> +.I UFFDIO_COPY
> +operation.
>
> To change it now, we would need to involve manpage and other code
> changes.

Well, ESRCH is more appropriate so I would rather change it sooner than
later. But if we are going to risk user space breakage then this is not
worth the risk. I expected there are very few users of this API
currently so maybe it won't be a big disaster?

Anyway, at least this is documented so I will leave the decision to you.
--
Michal Hocko
SUSE Labs

2017-08-02 12:34:51

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Mon, Jul 31, 2017 at 03:45:08PM +0200, Michal Hocko wrote:
> On Mon 31-07-17 15:32:47, Andrea Arcangeli wrote:
> > On Mon, Jul 31, 2017 at 02:22:04PM +0200, Michal Hocko wrote:
> > > On Thu 27-07-17 09:26:59, Mike Rapoport wrote:
> > > > In the non-cooperative userfaultfd case, the process exit may race with
> > > > outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
> > > > instead of -EINVAL when mm is already gone will allow uffd monitor to
> > > > distinguish this case from other error conditions.
> > >
> > > Normally we tend to return ESRCH in such case. ENOSPC sounds rather
> > > confusing...
> >
> > This is in sync and consistent with the retval for UFFDIO_COPY upstream:
> >
> > if (mmget_not_zero(ctx->mm)) {
> > ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
> > uffdio_copy.len);
> > mmput(ctx->mm);
> > } else {
> > return -ENOSPC;
> > }
> >
> > If you preferred ESRCH I certainly wouldn't have been against, but we
> > should have discussed it before it was upstream. All it matters is
> > it's documented in the great manpage that was written for it as quoted
> > below.
>
> OK, I wasn't aware of this.
>
> > +.TP
> > +.B ENOENT
> > +(Since Linux 4.11)
> > +The faulting process has changed
> > +its virtual memory layout simultaneously with outstanding
> > +.I UFFDIO_COPY
> > +operation.
> > +.TP
> > +.B ENOSPC
> > +(Since Linux 4.11)
> > +The faulting process has exited at the time of
> > +.I UFFDIO_COPY
> > +operation.
> >
> > To change it now, we would need to involve manpage and other code
> > changes.
>
> Well, ESRCH is more appropriate so I would rather change it sooner than
> later. But if we are going to risk user space breakage then this is not
> worth the risk. I expected there are very few users of this API
> currently so maybe it won't be a big disaster?

I surely can take care of CRIU, but I don't know if QEMU or certain
database application that uses userfaultfd rely on this API, not mentioning
there maybe other unknown users.

Andrea, what do you think?

> Anyway, at least this is documented so I will leave the decision to you.
> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

--
Sincerely yours,
Mike.

2017-08-02 13:21:52

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

* Mike Rapoport ([email protected]) wrote:
> On Mon, Jul 31, 2017 at 03:45:08PM +0200, Michal Hocko wrote:
> > On Mon 31-07-17 15:32:47, Andrea Arcangeli wrote:
> > > On Mon, Jul 31, 2017 at 02:22:04PM +0200, Michal Hocko wrote:
> > > > On Thu 27-07-17 09:26:59, Mike Rapoport wrote:
> > > > > In the non-cooperative userfaultfd case, the process exit may race with
> > > > > outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
> > > > > instead of -EINVAL when mm is already gone will allow uffd monitor to
> > > > > distinguish this case from other error conditions.
> > > >
> > > > Normally we tend to return ESRCH in such case. ENOSPC sounds rather
> > > > confusing...
> > >
> > > This is in sync and consistent with the retval for UFFDIO_COPY upstream:
> > >
> > > if (mmget_not_zero(ctx->mm)) {
> > > ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
> > > uffdio_copy.len);
> > > mmput(ctx->mm);
> > > } else {
> > > return -ENOSPC;
> > > }
> > >
> > > If you preferred ESRCH I certainly wouldn't have been against, but we
> > > should have discussed it before it was upstream. All it matters is
> > > it's documented in the great manpage that was written for it as quoted
> > > below.
> >
> > OK, I wasn't aware of this.
> >
> > > +.TP
> > > +.B ENOENT
> > > +(Since Linux 4.11)
> > > +The faulting process has changed
> > > +its virtual memory layout simultaneously with outstanding
> > > +.I UFFDIO_COPY
> > > +operation.
> > > +.TP
> > > +.B ENOSPC
> > > +(Since Linux 4.11)
> > > +The faulting process has exited at the time of
> > > +.I UFFDIO_COPY
> > > +operation.
> > >
> > > To change it now, we would need to involve manpage and other code
> > > changes.
> >
> > Well, ESRCH is more appropriate so I would rather change it sooner than
> > later. But if we are going to risk user space breakage then this is not
> > worth the risk. I expected there are very few users of this API
> > currently so maybe it won't be a big disaster?
>
> I surely can take care of CRIU, but I don't know if QEMU or certain
> database application that uses userfaultfd rely on this API, not mentioning
> there maybe other unknown users.
>
> Andrea, what do you think?

QEMU doesn't care about the errno value, it just reports it.

Dave

> > Anyway, at least this is documented so I will leave the decision to you.
> > --
> > Michal Hocko
> > SUSE Labs
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to [email protected]. For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
> >
>
> --
> Sincerely yours,
> Mike.
>
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2017-08-02 15:55:26

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Wed, Aug 02, 2017 at 03:34:41PM +0300, Mike Rapoport wrote:
> I surely can take care of CRIU, but I don't know if QEMU or certain
> database application that uses userfaultfd rely on this API, not mentioning
> there maybe other unknown users.
>
> Andrea, what do you think?

The manpage would need updates, from v4.11 to v4.13 -ENOSPC, from v4.1
-ESRCH and I don't see the benefit and it just looks confusion for
nothing, but if somebody feel strongly about it and does the work (and
risks to take the blame if something breaks...) I wouldn't be against
it, it won't make much of a difference anyway.

The reason I don't see any benefit in code readability is that I don't
see ESRCH as an obviously better retval, because if you grep for ESRCH
you'll see it's a failure to find a process with a certain pid, it is
an obvious retval when you're dealing with processes and pids, but we
never search pids and in fact the pid and the process may be already
gone but we still won't return ESRCH. UFFDIO_COPY never takes a pid as
parameter anywhere so why to return ESRCH? ENOSPC shall be interpreted
"no memory avail to copy anything", ESRCH as far as I can tell, could
be as unexpected as ENOSPC is you don't specify a pid as parameter to
the kernel.

If the mm_users is already zero and the mm is gone it means the
process is gone too, that is true, but the process could be gone
already and we could still obtain the mm_users and run UFFDIO_COPY if
there's async I/O pending or something. There's no association between
process/pid being still alive and the need to run a UFFDIO_COPY and
succeed at it.

Not ever dealing with pids and processes is why not even ESRCH is an
obvious perfect match for such an error, and this is why I think such
a change now would add no tangible pros and only short term cons.

Thanks,
Andrea

2017-08-02 16:22:54

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Wed 02-08-17 17:55:22, Andrea Arcangeli wrote:
> On Wed, Aug 02, 2017 at 03:34:41PM +0300, Mike Rapoport wrote:
> > I surely can take care of CRIU, but I don't know if QEMU or certain
> > database application that uses userfaultfd rely on this API, not mentioning
> > there maybe other unknown users.
> >
> > Andrea, what do you think?
>
> The manpage would need updates, from v4.11 to v4.13 -ENOSPC, from v4.1
> -ESRCH and I don't see the benefit and it just looks confusion for
> nothing, but if somebody feel strongly about it and does the work (and
> risks to take the blame if something breaks...) I wouldn't be against
> it, it won't make much of a difference anyway.
>
> The reason I don't see any benefit in code readability is that I don't
> see ESRCH as an obviously better retval, because if you grep for ESRCH
> you'll see it's a failure to find a process with a certain pid, it is
> an obvious retval when you're dealing with processes and pids, but we
> never search pids and in fact the pid and the process may be already
> gone but we still won't return ESRCH. UFFDIO_COPY never takes a pid as
> parameter anywhere so why to return ESRCH?

ESRCH refers to "no such process". Strictly speaking userfaultfd code is
about a mm which is gone but that is a mere detail. In fact the owner of
the mm is gone as well. You might not refer to the process by its pid
but you are surely refer to a process via its address space. That's why
I think this error code is more appropriate.

But as I've said, this might be really risky to change. My impression
was that userfaultfd is not widely used yet and those can be fixed
easily but if that is not the case then we have to live with the current
ENOSPC.
--
Michal Hocko
SUSE Labs

2017-08-02 16:40:08

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Wed, Aug 02, 2017 at 06:22:49PM +0200, Michal Hocko wrote:
> ESRCH refers to "no such process". Strictly speaking userfaultfd code is
> about a mm which is gone but that is a mere detail. In fact the owner of

Well this whole issue about which retval, is about a mere detail in
the first place, so I don't think you can discount all other mere
details as irrelevant in the evaluation of a change to solve a mere
detail.

> But as I've said, this might be really risky to change. My impression
> was that userfaultfd is not widely used yet and those can be fixed
> easily but if that is not the case then we have to live with the current
> ENOSPC.

The only change would be for userfaultfd non cooperative mode, and
CRIU is the main user of that. So I think it is up to Mike to decide,
I'm fine either ways. I certainly agree ESRCH could be a slightly
better fit, I only wanted to clarify it's not a 100% match either.

2017-08-03 17:25:57

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Wed, Aug 02, 2017 at 06:40:01PM +0200, Andrea Arcangeli wrote:
> On Wed, Aug 02, 2017 at 06:22:49PM +0200, Michal Hocko wrote:
> > ESRCH refers to "no such process". Strictly speaking userfaultfd code is
> > about a mm which is gone but that is a mere detail. In fact the owner of
>
> Well this whole issue about which retval, is about a mere detail in
> the first place, so I don't think you can discount all other mere
> details as irrelevant in the evaluation of a change to solve a mere
> detail.
>
> > But as I've said, this might be really risky to change. My impression
> > was that userfaultfd is not widely used yet and those can be fixed
> > easily but if that is not the case then we have to live with the current
> > ENOSPC.
>
> The only change would be for userfaultfd non cooperative mode, and
> CRIU is the main user of that. So I think it is up to Mike to decide,
> I'm fine either ways. I certainly agree ESRCH could be a slightly
> better fit, I only wanted to clarify it's not a 100% match either.

I'm Ok with updating the code and the man page as long as Michal takes the
blame if anything but CRIU breaks :)

Now, seriously, I believe there are not many users of non-cooperative uffd
if at all and it is very unlikely anybody has it in production.

I'll send a patch with s/ENOSPC/ESRCH in the next few days.

--
Sincerely yours,
Mike.

2017-08-03 21:25:26

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

On Thu, Aug 03, 2017 at 08:24:43PM +0300, Mike Rapoport wrote:
> Now, seriously, I believe there are not many users of non-cooperative uffd
> if at all and it is very unlikely anybody has it in production.
>
> I'll send a patch with s/ENOSPC/ESRCH in the next few days.

Ok.

Some more thought on this one, enterprise kernels have been shipped
matching the v4.11-v4.12 upstream kernel ABI and I've no time machine
to alter the kABI on those installs.

If you go ahead with the change, the safest would be that you keep
handling -ENOSPC and -ESRCH equally in CRIU code, so there will be no
risk of regression in the short term if somebody is playing with an
upstream CRIU. The alternative would be add uname -r knowledge.

Once it's upstream, I can fixup so further kernel updates will go in
sync. I obviously can't make changes that affects the kABI until it's
upstream and shipped in a official release so things will be out of
sync for a while (and the risk of somebody using ancient kernels will
persist for the mid term).