From: Nicolas Geoffray <[email protected]>
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
A private mapping created after the memfd file that gets sealed with
F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
children and parent share the same memory, even though the mapping is
private.
The reason for this is due to the code below:
static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
{
struct shmem_inode_info *info = SHMEM_I(file_inode(file));
if (info->seals & F_SEAL_FUTURE_WRITE) {
/*
* New PROT_WRITE and MAP_SHARED mmaps are not allowed when
* "future write" seal active.
*/
if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
return -EPERM;
/*
* Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
* read-only mapping, take care to not allow mprotect to revert
* protections.
*/
vma->vm_flags &= ~(VM_MAYWRITE);
}
...
}
And for the mm to know if a mapping is copy-on-write:
static inline bool is_cow_mapping(vm_flags_t flags)
{
return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
The patch fixes the issue by making the mprotect revert protection
happen only for shared mappings. For private mappings, using mprotect
will have no effect on the seal behavior.
Cc: [email protected]
Signed-off-by: Nicolas Geoffray <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
Google bug: 143833776
mm/shmem.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 447fd575587c..6ac5e867ef13 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2214,11 +2214,14 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
return -EPERM;
/*
- * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
- * read-only mapping, take care to not allow mprotect to revert
- * protections.
+ * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+ * MAP_SHARED and read-only, take care to not allow mprotect to
+ * revert protections on such mappings. Do this only for shared
+ * mappings. For private mappings, don't need to mask VM_MAYWRITE
+ * as we still want them to be COW-writable.
*/
- vma->vm_flags &= ~(VM_MAYWRITE);
+ if (vma->vm_flags & VM_SHARED)
+ vma->vm_flags &= ~(VM_MAYWRITE);
}
file_accessed(file);
--
2.24.0.rc1.363.gb1bccd3e3d-goog
On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <[email protected]> wrote:
> F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> A private mapping created after the memfd file that gets sealed with
> F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> children and parent share the same memory, even though the mapping is
> private.
That sounds fairly serious. Should this be backported into -stable kernels?
On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <[email protected]> wrote:
>
> > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > A private mapping created after the memfd file that gets sealed with
> > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > children and parent share the same memory, even though the mapping is
> > private.
>
> That sounds fairly serious. Should this be backported into -stable kernels?
Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
unless we are Ok with stable automatically picking it up (I believe the
stable folks "auto select" fixes which should detect this is a fix since I
have said it is a fix in the subject line).
thanks,
- Joel
On Thu, 7 Nov 2019 21:06:14 -0500 Joel Fernandes <[email protected]> wrote:
> On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> > On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <[email protected]> wrote:
> >
> > > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > > A private mapping created after the memfd file that gets sealed with
> > > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > > children and parent share the same memory, even though the mapping is
> > > private.
> >
> > That sounds fairly serious. Should this be backported into -stable kernels?
>
> Yes, it should be.
I added
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Cc: <[email protected]>
> The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
> v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
> unless we are Ok with stable automatically picking it up (I believe the
> stable folks "auto select" fixes which should detect this is a fix since I
> have said it is a fix in the subject line).
The Cc:stable tag should trigger the appropriate actions, assisted by
the Fixes:. I doubt if "fix" in the Subject has much effect.
> - * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
> - * read-only mapping, take care to not allow mprotect to revert
> - * protections.
> + * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
> + * MAP_SHARED and read-only, take care to not allow mprotect to
> + * revert protections on such mappings. Do this only for shared
> + * mappings. For private mappings, don't need to mask VM_MAYWRITE
This adds an > 80 char line.
On Thu, Nov 07, 2019 at 09:06:14PM -0500, Joel Fernandes wrote:
> On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> > On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <[email protected]> wrote:
> >
> > > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > > A private mapping created after the memfd file that gets sealed with
> > > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > > children and parent share the same memory, even though the mapping is
> > > private.
> >
> > That sounds fairly serious. Should this be backported into -stable kernels?
>
> Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
> v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
> unless we are Ok with stable automatically picking it up (I believe the
> stable folks "auto select" fixes which should detect this is a fix since I
> have said it is a fix in the subject line).
Never rely on "auto select" to pick up a patch for stable if you already
know it should go to stable. Just mark it as such, or tell stable@vger
after the fact.
thanks,
greg k-h
On Fri, Nov 08, 2019 at 07:37:15AM +0100, Greg KH wrote:
> On Thu, Nov 07, 2019 at 09:06:14PM -0500, Joel Fernandes wrote:
> > On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> > > On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <[email protected]> wrote:
> > >
> > > > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > > > A private mapping created after the memfd file that gets sealed with
> > > > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > > > children and parent share the same memory, even though the mapping is
> > > > private.
> > >
> > > That sounds fairly serious. Should this be backported into -stable kernels?
> >
> > Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
> > v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
> > unless we are Ok with stable automatically picking it up (I believe the
> > stable folks "auto select" fixes which should detect this is a fix since I
> > have said it is a fix in the subject line).
>
> Never rely on "auto select" to pick up a patch for stable if you already
> know it should go to stable. Just mark it as such, or tell stable@vger
> after the fact.
Sure, agreed.
Thanks Andrew for adding the tags!
thanks,
- Joel
On Thu, Nov 07, 2019 at 10:33:08PM -0800, Christoph Hellwig wrote:
> > - * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
> > - * read-only mapping, take care to not allow mprotect to revert
> > - * protections.
> > + * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
> > + * MAP_SHARED and read-only, take care to not allow mprotect to
> > + * revert protections on such mappings. Do this only for shared
> > + * mappings. For private mappings, don't need to mask VM_MAYWRITE
>
> This adds an > 80 char line.
Oh, true. Sorry. Andrew I hate to ask you but since you took the patch
already, could you just the comment for the character limit in the one
you applied?
thanks,
- Joel