2016-11-24 10:55:49

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 0/7] overlayfs: fix ro/rw fd data inconsistecies

A file is opened for read-only, opened read-write (resulting in a copy up)
and modified. The data read back from the the read-only fd will be stale
in this case (the read-only file descriptor still refers to the lower,
unmodified file).

This patchset fixes issues related to this corner case.

The VFS impact is minimal and performance in the non-corner cases shouldn't
suffer.
---
Miklos Szeredi (7):
vfs: allow overlayfs to intercept file ops
vfs: export filp_clone_open()
mm: ovl: copy-up on MAP_SHARED
ovl: add infrastructure for intercepting file ops
ovl: intercept read_iter
ovl: intercept mmap
ovl: intercept fsync

fs/internal.h | 1 -
fs/open.c | 2 +-
fs/overlayfs/inode.c | 225 +++++++++++++++++++++++++++++++++++++++++++++++
fs/overlayfs/overlayfs.h | 2 +
fs/overlayfs/super.c | 1 +
include/linux/fs.h | 1 +
mm/util.c | 22 +++++
7 files changed, 252 insertions(+), 2 deletions(-)

--
2.5.5


2016-11-24 10:55:52

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 0/7] overlayfs: fix ro/rw fd data inconsistecies

A file is opened for read-only, opened read-write (resulting in a copy up)
and modified. The data read back from the the read-only fd will be stale
in this case (the read-only file descriptor still refers to the lower,
unmodified file).

This patchset fixes issues related to this corner case.

The VFS impact is minimal and performance in the non-corner cases shouldn't
suffer.
---
Miklos Szeredi (7):
vfs: allow overlayfs to intercept file ops
vfs: export filp_clone_open()
mm: ovl: copy-up on MAP_SHARED
ovl: add infrastructure for intercepting file ops
ovl: intercept read_iter
ovl: intercept mmap
ovl: intercept fsync

fs/internal.h | 1 -
fs/open.c | 2 +-
fs/overlayfs/inode.c | 225 +++++++++++++++++++++++++++++++++++++++++++++++
fs/overlayfs/overlayfs.h | 2 +
fs/overlayfs/super.c | 1 +
include/linux/fs.h | 1 +
mm/util.c | 22 +++++
7 files changed, 252 insertions(+), 2 deletions(-)

--
2.5.5

2016-11-24 10:56:03

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 5/7] ovl: intercept read_iter

...in order to handle the corner case when the file is copyied up after
being opened read-only.

Can be verified with the following script:

- 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
cd /
rm -rf /tmp/ovl-rorw-test
mkdir /tmp/ovl-rorw-test
cd /tmp/ovl-rorw-test
mkdir -p mnt lower upper work
echo baba > lower/foo
mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work mnt
exec 3< mnt/foo
echo bubu > mnt/foo
cat <&3
exec 3>&-
umount mnt
- 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -

Correct output is "bubu", incorrect output is "baba".

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/overlayfs/inode.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 3e26615c4697..09c6f99bd5db 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -12,6 +12,7 @@
#include <linux/xattr.h>
#include <linux/posix_acl.h>
#include <linux/module.h>
+#include <linux/file.h>
#include <linux/hashtable.h>
#include "overlayfs.h"

@@ -368,6 +369,29 @@ void ovl_cleanup_fops_htable(void)
__ofop->orig_fops->call; \
})

+static bool ovl_file_is_lower(struct file *file)
+{
+ return !OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry));
+}
+
+static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct file *file = iocb->ki_filp;
+ ssize_t ret;
+
+ if (likely(ovl_file_is_lower(file)))
+ return OVL_CALL_REAL_FOP(file, read_iter(iocb, to));
+
+ file = filp_clone_open(file);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ ret = vfs_iter_read(file, to, &iocb->ki_pos);
+ fput(file);
+
+ return ret;
+}
+
static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
{
struct ovl_fops *ofop;
@@ -404,8 +428,11 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
ofop->magic = OVL_FOPS_MAGIC;
ofop->orig_fops = fops_get(orig);

+ /* Intercept these: */
+ if (orig->read_iter)
+ ofop->fops.read_iter = ovl_read_iter;
+
/* These will need to be intercepted: */
- ofop->fops.read_iter = orig->read_iter;
ofop->fops.mmap = orig->mmap;
ofop->fops.fsync = orig->fsync;

--
2.5.5

2016-11-24 10:55:57

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 1/7] vfs: allow overlayfs to intercept file ops

Almost all of the time overlayfs just wants to let underlying filesystems
handle file operations on regular files.

There is a rare corner case, namely when a file residing a read-only
(lower) layer is:

- opened for O_RDONLY
- opened for O_WRONLY
- contents modified
- contents read back though O_RDONLY fd

Currently overlayfs gives inconsistent result in this case, since the
O_RDONLY file refers to the lower, unmodified file, even after the copy up
has happened.

This happens very rarely, so

a) we want the normal cases (when no copy up happens) still go as fast as
possible;

b) we can let the corner case go slow.

The proposed solution is to allow overlayfs to intercept file operations if
it wants (O_RDONLY open of file on lower layer), but still let the file be
owned by the underlying layer. This means everything in filp, including
the private_data, is controlled by the underlying layer (as it has been
until now) with the exception of f_op, which comes from overlayfs.

This patch doesn't change the actual behavior, it just adds the vfs change
necessary and then unconditionally sets the f_op back to the underlying
file's ops.

For non-overlayfs, this doesn't change anything.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/open.c | 2 +-
fs/overlayfs/inode.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index 8a0254b56780..453ef5581389 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -734,7 +734,7 @@ static int do_dentry_open(struct file *f,
if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
f->f_mode |= FMODE_ATOMIC_POS;

- f->f_op = fops_get(inode->i_fop);
+ f->f_op = fops_get(f->f_path.dentry->d_inode->i_fop);
if (unlikely(WARN_ON(!f->f_op))) {
error = -ENODEV;
goto cleanup_all;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index a10e948d24fa..1981a5514f51 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/xattr.h>
#include <linux/posix_acl.h>
+#include <linux/module.h>
#include "overlayfs.h"

static int ovl_copy_up_truncate(struct dentry *dentry)
@@ -333,6 +334,22 @@ static const struct inode_operations ovl_symlink_inode_operations = {
.update_time = ovl_update_time,
};

+static int ovl_open(struct inode *inode, struct file *file)
+{
+ int ret = 0;
+
+ /* Want fops from real inode */
+ replace_fops(file, inode->i_fop);
+ if (file->f_op->open)
+ ret = file->f_op->open(inode, file);
+
+ return ret;
+}
+
+static const struct file_operations ovl_file_operations = {
+ .open = ovl_open,
+};
+
static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev)
{
inode->i_ino = get_next_ino();
@@ -345,6 +362,7 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev)
switch (mode & S_IFMT) {
case S_IFREG:
inode->i_op = &ovl_file_inode_operations;
+ inode->i_fop = &ovl_file_operations;
break;

case S_IFDIR:
--
2.5.5

2016-11-24 10:56:25

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 4/7] ovl: add infrastructure for intercepting file ops

Overlayfs needs to intercept a few file operations in order to properly
handle the following corner case:

- file X is in overlayfs;

- X resides on a lower (read-only) layer
the lower file is L;

- X is opened with O_RDONLY ->
L is opened -> 'rofd';

- X is opened with O_WRONLY ->
this results in L being copied up to the upper layer file U
U is opened -> 'rwfd';

- write to 'rwfd' modifying U;

- read from 'rofd' gets data from unmodified L;

While this is a rare occurrence, it has been known to cause problems. To
prevent such inconsistencies, allow intercepting read operations and fix
them up in the unlikely case that it's needed.

This patch adds an ovl_fops structure that is going to contain the pieces
necessary for intercepting file operations:

a) a new file_operations structure, based on the original but changing
those operations that need to be intercepted;

b) a pointer to the origin f_op. Intercepted operations will normally
just call the original, unless the file was copied up;

c) a hash table entry to hook this up for reuse in subsequent opens.

The hash table is small (32 buckets) since there will only be a few
different file operations used (basically the number of different
filesystems used as lower layer of overlay). The key to the has table is
the original fops pointer.

Insertion is done under a global mutex. There will only be one insertion
event for each unique underlying file_operations structure, so this is
going to be rare.

The hash table is accessed using rcu, so this will add minimal overhead to
opens. The override fops are removed only at module exit time, so we don't
even have to be worry about read-side rcu locking.

There are a few assumption this scheme makes about file operation
structures supplied by filesystems:

1) the lifetime of the structures is equal to the lifetime of the
filesystem module; ensure this by holding a ref to the
sb->s_type->owner (normal filesystems don't fill in f_ops->owner).
This means that once a filesystem has been used as lower layer of
overlayfs its module cannot be removed until the overlay module itself
is removed. I don't believe this will be a problem.

2) Filesystems should not make use of f_op in any way except possibly
replacing it in ->open. Overlayfs itself breaks this assumption, but
recursion is handled by ->d_real so this is not an issue. Add a
->magic field to the override fops structure to verify that the
filesystem didn't mess with file->f_op.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/overlayfs/inode.c | 144 ++++++++++++++++++++++++++++++++++++++++++++++-
fs/overlayfs/overlayfs.h | 2 +
fs/overlayfs/super.c | 1 +
3 files changed, 146 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1981a5514f51..3e26615c4697 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -12,6 +12,7 @@
#include <linux/xattr.h>
#include <linux/posix_acl.h>
#include <linux/module.h>
+#include <linux/hashtable.h>
#include "overlayfs.h"

static int ovl_copy_up_truncate(struct dentry *dentry)
@@ -334,16 +335,157 @@ static const struct inode_operations ovl_symlink_inode_operations = {
.update_time = ovl_update_time,
};

+static DEFINE_READ_MOSTLY_HASHTABLE(ovl_fops_htable, 5);
+static DEFINE_MUTEX(ovl_fops_mutex);
+
+#define OVL_FOPS_MAGIC 0x73706f462a6c764f
+
+struct ovl_fops {
+ struct module *owner;
+ struct file_operations fops;
+ u64 magic;
+ const struct file_operations *orig_fops;
+ struct hlist_node entry;
+};
+
+void ovl_cleanup_fops_htable(void)
+{
+ int bkt;
+ struct hlist_node *tmp;
+ struct ovl_fops *ofop;
+
+ hash_for_each_safe(ovl_fops_htable, bkt, tmp, ofop, entry) {
+ module_put(ofop->owner);
+ fops_put(ofop->orig_fops);
+ kfree(ofop);
+ }
+}
+
+#define OVL_CALL_REAL_FOP(file, call) \
+ ({ struct ovl_fops *__ofop = \
+ container_of(file->f_op, struct ovl_fops, fops); \
+ WARN_ON(__ofop->magic != OVL_FOPS_MAGIC) ? -EIO : \
+ __ofop->orig_fops->call; \
+ })
+
+static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
+{
+ struct ovl_fops *ofop;
+
+ hash_for_each_possible_rcu(ovl_fops_htable, ofop, entry, (long) orig) {
+ if (ofop->orig_fops == orig)
+ return ofop;
+ }
+ return NULL;
+}
+
+static struct ovl_fops *ovl_fops_get(struct file *file)
+{
+ const struct file_operations *orig = file->f_op;
+ struct ovl_fops *ofop = ovl_fops_find(orig);
+
+ if (likely(ofop))
+ return ofop;
+
+ mutex_lock(&ovl_fops_mutex);
+ ofop = ovl_fops_find(orig);
+ if (ofop)
+ goto out_unlock;
+
+ ofop = kzalloc(sizeof(struct ovl_fops), GFP_KERNEL);
+ if (ofop) {
+ /*
+ * FS don't usually fill in fops->owner, so grab ref to
+ * filesystem's module as well.
+ */
+ ofop->owner = file_inode(file)->i_sb->s_type->owner;
+ __module_get(ofop->owner);
+
+ ofop->magic = OVL_FOPS_MAGIC;
+ ofop->orig_fops = fops_get(orig);
+
+ /* These will need to be intercepted: */
+ ofop->fops.read_iter = orig->read_iter;
+ ofop->fops.mmap = orig->mmap;
+ ofop->fops.fsync = orig->fsync;
+
+ /*
+ * These should be intercepted, but they are very unlikely to be
+ * a problem in practice. Leave them alone for now.
+ */
+ ofop->fops.copy_file_range = orig->copy_file_range;
+ ofop->fops.clone_file_range = orig->clone_file_range;
+ ofop->fops.dedupe_file_range = orig->dedupe_file_range;
+
+ /* Don't intercept these: */
+ ofop->fops.llseek = orig->llseek;
+ ofop->fops.unlocked_ioctl = orig->unlocked_ioctl;
+ ofop->fops.compat_ioctl = orig->compat_ioctl;
+ ofop->fops.flush = orig->flush;
+ ofop->fops.release = orig->release;
+ ofop->fops.get_unmapped_area = orig->get_unmapped_area;
+ ofop->fops.check_flags = orig->check_flags;
+
+ /* splice_read should be generic_file_splice_read */
+ WARN_ON(orig->splice_read != generic_file_splice_read);
+ ofop->fops.splice_read = generic_file_splice_read;
+
+ /* These make no sense for "normal" files: */
+ WARN_ON(orig->read);
+ WARN_ON(orig->iterate);
+ WARN_ON(orig->iterate_shared);
+ WARN_ON(orig->poll);
+ WARN_ON(orig->fasync);
+ WARN_ON(orig->show_fdinfo);
+
+ /*
+ * Don't add those which are unneeded for O_RDONLY:
+ *
+ * write
+ * write_iter
+ * splice_write
+ * sendpage
+ * fallocate
+ *
+ * Locking operations are already intercepted by vfs for ovl:
+ *
+ * lock
+ * flock
+ * setlease
+ */
+
+ hash_add_rcu(ovl_fops_htable, &ofop->entry, (long) orig);
+ }
+out_unlock:
+ mutex_unlock(&ovl_fops_mutex);
+
+ return ofop;
+}
+
static int ovl_open(struct inode *inode, struct file *file)
{
int ret = 0;
+ struct ovl_fops *ofop;
+ bool isupper = OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry));

/* Want fops from real inode */
replace_fops(file, inode->i_fop);
if (file->f_op->open)
ret = file->f_op->open(inode, file);

- return ret;
+ /* No need to override fops for upper */
+ if (isupper || ret)
+ return ret;
+
+ ofop = ovl_fops_get(file);
+ if (unlikely(!ofop)) {
+ if (file->f_op->release)
+ file->f_op->release(inode, file);
+ return -ENOMEM;
+ }
+ replace_fops(file, &ofop->fops);
+
+ return 0;
}

static const struct file_operations ovl_file_operations = {
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index bdda37fa3f67..abc00f4ac247 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -209,6 +209,8 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
to->i_ctime = from->i_ctime;
}

+void ovl_cleanup_fops_htable(void);
+
/* dir.c */
extern const struct inode_operations ovl_dir_inode_operations;
struct dentry *ovl_lookup_temp(struct dentry *workdir, struct dentry *dentry);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index e847cc354599..89cbb11eef58 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -960,6 +960,7 @@ static int __init ovl_init(void)
static void __exit ovl_exit(void)
{
unregister_filesystem(&ovl_fs_type);
+ ovl_cleanup_fops_htable();
}

module_init(ovl_init);
--
2.5.5

2016-11-24 10:56:51

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 3/7] mm: ovl: copy-up on MAP_SHARED

A corner case of a corner case is when

- file opened for O_RDONLY
- which is then memory mapped SHARED
- file opened for O_WRONLY
- contents modified
- contents read back though the shared mapping

Unfortunately it looks very difficult to do anything about the established
shared map after the file is copied up. Instead when a read-only file is
mapped shared overlayfs copies up the file before actually doing the map.
This may result in unnecessary copy-ups (but so may copy-up on open(O_RDWR)
for exampe).

We can revisit this later if it turns out to be a performance problem in
real life.

Signed-off-by: Miklos Szeredi <[email protected]>
---
mm/util.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/mm/util.c b/mm/util.c
index 1a41553db866..09a179f92d18 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -300,6 +300,28 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,

ret = security_mmap_file(file, prot, flag);
if (!ret) {
+ /*
+ * Special treatment for overlayfs:
+ *
+ * Take MAP_SHARED/PROT_READ as hint about future writes to the
+ * file (through another file descriptor). Caller might not
+ * have had such an intent, but we hope MAP_PRIVATE will be used
+ * in most such cases.
+ *
+ * If we don't copy up now and the file is modified, it becomes
+ * really difficult to change the mapping to match that of the
+ * file's content later.
+ *
+ * Copy up needs to be done without mmap_sem since it takes vfs
+ * locks which would potentially deadlock under mmap_sem.
+ */
+ if ((flag & MAP_SHARED) && !(prot & PROT_WRITE)) {
+ void *p = d_real(file->f_path.dentry, NULL, O_WRONLY);
+
+ if (IS_ERR(p))
+ return PTR_ERR(p);
+ }
+
if (down_write_killable(&mm->mmap_sem))
return -EINTR;
ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
--
2.5.5

2016-11-24 11:03:25

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 7/7] ovl: intercept fsync

Fsync is really an inode operation (AFAICS) so a doing it on a O_RDONLY
file descriptor should flush any data written through an O_WRONLY file
descriptor for example.

To make this work correctly in case the file is copied up after being
opened for read intercept the fsync fop, similarly to read_iter and mmap.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/overlayfs/inode.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 25b31c6ebe9e..f522cb350f3c 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -412,6 +412,24 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
return file->f_op->mmap(file, vma);
}

+static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
+{
+ int ret;
+
+ if (likely(ovl_file_is_lower(file))) {
+ return OVL_CALL_REAL_FOP(file,
+ fsync(file, start, end, datasync));
+ }
+ file = filp_clone_open(file);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ ret = vfs_fsync_range(file, start, end, datasync);
+ fput(file);
+
+ return ret;
+}
+
static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
{
struct ovl_fops *ofop;
@@ -453,9 +471,8 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
ofop->fops.read_iter = ovl_read_iter;
if (orig->mmap)
ofop->fops.mmap = ovl_mmap;
-
- /* These will need to be intercepted: */
- ofop->fops.fsync = orig->fsync;
+ if (orig->fsync)
+ ofop->fops.fsync = ovl_fsync;

/*
* These should be intercepted, but they are very unlikely to be
--
2.5.5

2016-11-24 11:03:30

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 6/7] ovl: intercept mmap

... in order to handle the corner case when the file is copied up after
being opened read-only and mapped shared.

Can be verified with the following script:

- 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
cd /
rm -rf /tmp/ovl-rorw-test
mkdir /tmp/ovl-rorw-test
cd /tmp/ovl-rorw-test
cat << EOF > rorw-map.c
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <err.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
int rofd, rwfd;
int ret;
char buf[4];
char *addr;

rofd = open(argv[1], O_RDONLY);
if (rofd == -1)
err(1, "ro open");

addr = mmap(NULL, 4, PROT_READ, MAP_SHARED, rofd, 0);
if (addr == MAP_FAILED)
err(1, "ro mmap");

if (memcmp(addr, "bubu", 4) == 0)
errx(1, "identical startup data");

rwfd = open(argv[1], O_WRONLY);
if (rwfd == -1)
err(1, "rw open");

ret = write(rwfd, "bubu", 4);
if (ret == -1)
err(1, "write");
if (ret < 4)
errx(1, "short write");

if (memcmp(addr, "bubu", 4) != 0)
errx(1, "bad mmap data");

ret = read(rofd, buf, 4);
if (ret == -1)
err(1, "read");
if (ret < 4)
errx(1, "short read");
if (memcmp(buf, "bubu", 4) != 0)
errx(1, "bad read data");

return 0;
}
EOF
gcc -o rorw-map rorw-map.c
mkdir -p mnt lower upper work
echo baba > lower/foo
mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work mnt
./rorw-map mnt/foo
umount mnt
- 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -

No output means success, "rorw-map: bad mmap data" means failure.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/overlayfs/inode.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 09c6f99bd5db..25b31c6ebe9e 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/xattr.h>
#include <linux/posix_acl.h>
+#include <linux/mm.h>
#include <linux/module.h>
#include <linux/file.h>
#include <linux/hashtable.h>
@@ -392,6 +393,25 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to)
return ret;
}

+static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ if (likely(ovl_file_is_lower(file)))
+ return OVL_CALL_REAL_FOP(file, mmap(file, vma));
+
+ file = filp_clone_open(file);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ fput(vma->vm_file);
+ /* transfer ref: */
+ vma->vm_file = file;
+
+ if (!file->f_op->mmap)
+ return -ENODEV;
+
+ return file->f_op->mmap(file, vma);
+}
+
static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
{
struct ovl_fops *ofop;
@@ -431,9 +451,10 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
/* Intercept these: */
if (orig->read_iter)
ofop->fops.read_iter = ovl_read_iter;
+ if (orig->mmap)
+ ofop->fops.mmap = ovl_mmap;

/* These will need to be intercepted: */
- ofop->fops.mmap = orig->mmap;
ofop->fops.fsync = orig->fsync;

/*
--
2.5.5

2016-11-24 11:04:10

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH 2/7] vfs: export filp_clone_open()

Already exported to modules, just needs to move declaration from
"fs/internal.h" to <linux/fs.h>.

This will be used by overlayfs to clone a read-only file that has been
copied up after being opened. The clone will refer to the copied-up uppper
file instead of the original, lower file.

Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/internal.h | 1 -
include/linux/fs.h | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/internal.h b/fs/internal.h
index f4da3341b4a3..361ba1c12698 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -112,7 +112,6 @@ extern long do_handle_open(int mountdirfd,
struct file_handle __user *ufh, int open_flag);
extern int open_check_o_direct(struct file *f);
extern int vfs_open(const struct path *, struct file *, const struct cred *);
-extern struct file *filp_clone_open(struct file *);

/*
* inode.c
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 87a32f999812..e3bc970214e1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2355,6 +2355,7 @@ extern struct file *filp_open(const char *, int, umode_t);
extern struct file *file_open_root(struct dentry *, struct vfsmount *,
const char *, int, umode_t);
extern struct file * dentry_open(const struct path *, int, const struct cred *);
+extern struct file *filp_clone_open(struct file *);
extern int filp_close(struct file *, fl_owner_t id);

extern struct filename *getname_flags(const char __user *, int, int *);
--
2.5.5

2016-11-24 11:53:35

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 4/7] ovl: add infrastructure for intercepting file ops

On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
> Overlayfs needs to intercept a few file operations in order to properly
> handle the following corner case:
>
> - file X is in overlayfs;
>
> - X resides on a lower (read-only) layer
> the lower file is L;
>
> - X is opened with O_RDONLY ->
> L is opened -> 'rofd';
>
> - X is opened with O_WRONLY ->
> this results in L being copied up to the upper layer file U
> U is opened -> 'rwfd';
>
> - write to 'rwfd' modifying U;
>
> - read from 'rofd' gets data from unmodified L;
>
> While this is a rare occurrence, it has been known to cause problems. To
> prevent such inconsistencies, allow intercepting read operations and fix
> them up in the unlikely case that it's needed.
>
> This patch adds an ovl_fops structure that is going to contain the pieces
> necessary for intercepting file operations:
>
> a) a new file_operations structure, based on the original but changing
> those operations that need to be intercepted;
>
> b) a pointer to the origin f_op. Intercepted operations will normally
> just call the original, unless the file was copied up;
>
> c) a hash table entry to hook this up for reuse in subsequent opens.
>
> The hash table is small (32 buckets) since there will only be a few
> different file operations used (basically the number of different
> filesystems used as lower layer of overlay). The key to the has table is
> the original fops pointer.
>
> Insertion is done under a global mutex. There will only be one insertion
> event for each unique underlying file_operations structure, so this is
> going to be rare.
>
> The hash table is accessed using rcu, so this will add minimal overhead to
> opens. The override fops are removed only at module exit time, so we don't
> even have to be worry about read-side rcu locking.
>
> There are a few assumption this scheme makes about file operation
> structures supplied by filesystems:
>
> 1) the lifetime of the structures is equal to the lifetime of the
> filesystem module; ensure this by holding a ref to the
> sb->s_type->owner (normal filesystems don't fill in f_ops->owner).
> This means that once a filesystem has been used as lower layer of
> overlayfs its module cannot be removed until the overlay module itself
> is removed. I don't believe this will be a problem.
>
> 2) Filesystems should not make use of f_op in any way except possibly
> replacing it in ->open. Overlayfs itself breaks this assumption, but
> recursion is handled by ->d_real so this is not an issue. Add a
> ->magic field to the override fops structure to verify that the
> filesystem didn't mess with file->f_op.
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/overlayfs/inode.c | 144 ++++++++++++++++++++++++++++++++++++++++++++++-
> fs/overlayfs/overlayfs.h | 2 +
> fs/overlayfs/super.c | 1 +
> 3 files changed, 146 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 1981a5514f51..3e26615c4697 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -12,6 +12,7 @@
> #include <linux/xattr.h>
> #include <linux/posix_acl.h>
> #include <linux/module.h>
> +#include <linux/hashtable.h>
> #include "overlayfs.h"
>
> static int ovl_copy_up_truncate(struct dentry *dentry)
> @@ -334,16 +335,157 @@ static const struct inode_operations ovl_symlink_inode_operations = {
> .update_time = ovl_update_time,
> };
>
> +static DEFINE_READ_MOSTLY_HASHTABLE(ovl_fops_htable, 5);
> +static DEFINE_MUTEX(ovl_fops_mutex);
> +
> +#define OVL_FOPS_MAGIC 0x73706f462a6c764f
> +
> +struct ovl_fops {
> + struct module *owner;
> + struct file_operations fops;
> + u64 magic;
> + const struct file_operations *orig_fops;
> + struct hlist_node entry;
> +};
> +
> +void ovl_cleanup_fops_htable(void)
> +{
> + int bkt;
> + struct hlist_node *tmp;
> + struct ovl_fops *ofop;
> +
> + hash_for_each_safe(ovl_fops_htable, bkt, tmp, ofop, entry) {
> + module_put(ofop->owner);
> + fops_put(ofop->orig_fops);
> + kfree(ofop);
> + }
> +}
> +
> +#define OVL_CALL_REAL_FOP(file, call) \
> + ({ struct ovl_fops *__ofop = \
> + container_of(file->f_op, struct ovl_fops, fops); \
> + WARN_ON(__ofop->magic != OVL_FOPS_MAGIC) ? -EIO : \
> + __ofop->orig_fops->call; \
> + })
> +
> +static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
> +{
> + struct ovl_fops *ofop;
> +
> + hash_for_each_possible_rcu(ovl_fops_htable, ofop, entry, (long) orig) {
> + if (ofop->orig_fops == orig)
> + return ofop;
> + }
> + return NULL;
> +}
> +
> +static struct ovl_fops *ovl_fops_get(struct file *file)
> +{
> + const struct file_operations *orig = file->f_op;
> + struct ovl_fops *ofop = ovl_fops_find(orig);
> +
> + if (likely(ofop))
> + return ofop;
> +
> + mutex_lock(&ovl_fops_mutex);
> + ofop = ovl_fops_find(orig);
> + if (ofop)
> + goto out_unlock;
> +
> + ofop = kzalloc(sizeof(struct ovl_fops), GFP_KERNEL);

Suggestion:
if (!ofop) goto out_unlock;
and rid of all this nested chunk


> + if (ofop) {
> + /*
> + * FS don't usually fill in fops->owner, so grab ref to
> + * filesystem's module as well.
> + */
> + ofop->owner = file_inode(file)->i_sb->s_type->owner;
> + __module_get(ofop->owner);
> +
> + ofop->magic = OVL_FOPS_MAGIC;
> + ofop->orig_fops = fops_get(orig);
> +
> + /* These will need to be intercepted: */
> + ofop->fops.read_iter = orig->read_iter;
> + ofop->fops.mmap = orig->mmap;
> + ofop->fops.fsync = orig->fsync;
> +
> + /*
> + * These should be intercepted, but they are very unlikely to be
> + * a problem in practice. Leave them alone for now.

It could also be handled in vfs helpers.
Since these ops all start with establishing that src and dest are on
the same sb,
then the cost of copy up of src is the cost of clone_file_range from
lower to upper,
so it is probably worth to copy up src and leave those fops alone.

> + */
> + ofop->fops.copy_file_range = orig->copy_file_range;
> + ofop->fops.clone_file_range = orig->clone_file_range;
> + ofop->fops.dedupe_file_range = orig->dedupe_file_range;
> +
> + /* Don't intercept these: */
> + ofop->fops.llseek = orig->llseek;
> + ofop->fops.unlocked_ioctl = orig->unlocked_ioctl;
> + ofop->fops.compat_ioctl = orig->compat_ioctl;
> + ofop->fops.flush = orig->flush;
> + ofop->fops.release = orig->release;
> + ofop->fops.get_unmapped_area = orig->get_unmapped_area;
> + ofop->fops.check_flags = orig->check_flags;
> +
> + /* splice_read should be generic_file_splice_read */
> + WARN_ON(orig->splice_read != generic_file_splice_read);
> + ofop->fops.splice_read = generic_file_splice_read;
> +
> + /* These make no sense for "normal" files: */
> + WARN_ON(orig->read);
> + WARN_ON(orig->iterate);
> + WARN_ON(orig->iterate_shared);
> + WARN_ON(orig->poll);
> + WARN_ON(orig->fasync);
> + WARN_ON(orig->show_fdinfo);
> +
> + /*
> + * Don't add those which are unneeded for O_RDONLY:
> + *
> + * write
> + * write_iter
> + * splice_write
> + * sendpage
> + * fallocate
> + *
> + * Locking operations are already intercepted by vfs for ovl:
> + *
> + * lock
> + * flock
> + * setlease
> + */
> +
> + hash_add_rcu(ovl_fops_htable, &ofop->entry, (long) orig);
> + }
> +out_unlock:
> + mutex_unlock(&ovl_fops_mutex);
> +
> + return ofop;
> +}
> +
> static int ovl_open(struct inode *inode, struct file *file)
> {
> int ret = 0;
> + struct ovl_fops *ofop;
> + bool isupper = OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry));
>
> /* Want fops from real inode */
> replace_fops(file, inode->i_fop);
> if (file->f_op->open)
> ret = file->f_op->open(inode, file);
>
> - return ret;
> + /* No need to override fops for upper */
> + if (isupper || ret)
> + return ret;
> +
> + ofop = ovl_fops_get(file);
> + if (unlikely(!ofop)) {
> + if (file->f_op->release)
> + file->f_op->release(inode, file);
> + return -ENOMEM;
> + }
> + replace_fops(file, &ofop->fops);
> +
> + return 0;
> }
>
> static const struct file_operations ovl_file_operations = {
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index bdda37fa3f67..abc00f4ac247 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -209,6 +209,8 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
> to->i_ctime = from->i_ctime;
> }
>
> +void ovl_cleanup_fops_htable(void);
> +
> /* dir.c */
> extern const struct inode_operations ovl_dir_inode_operations;
> struct dentry *ovl_lookup_temp(struct dentry *workdir, struct dentry *dentry);
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index e847cc354599..89cbb11eef58 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -960,6 +960,7 @@ static int __init ovl_init(void)
> static void __exit ovl_exit(void)
> {
> unregister_filesystem(&ovl_fs_type);
> + ovl_cleanup_fops_htable();
> }
>
> module_init(ovl_init);
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-11-24 12:03:43

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 4/7] ovl: add infrastructure for intercepting file ops

On Thu, Nov 24, 2016 at 12:52 PM, Amir Goldstein <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:

>> + /*
>> + * These should be intercepted, but they are very unlikely to be
>> + * a problem in practice. Leave them alone for now.
>
> It could also be handled in vfs helpers.
> Since these ops all start with establishing that src and dest are on
> the same sb,
> then the cost of copy up of src is the cost of clone_file_range from
> lower to upper,
> so it is probably worth to copy up src and leave those fops alone.
>
>> + */
>> + ofop->fops.copy_file_range = orig->copy_file_range;
>> + ofop->fops.clone_file_range = orig->clone_file_range;
>> + ofop->fops.dedupe_file_range = orig->dedupe_file_range;

Not sure I understand. Why should we copy up src? Copy up is the
problem not the solution.

Thanks,
Miklos

2016-11-24 13:13:01

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 4/7] ovl: add infrastructure for intercepting file ops

On Thu, Nov 24, 2016 at 2:03 PM, Miklos Szeredi <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 12:52 PM, Amir Goldstein <[email protected]> wrote:
>> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
>
>>> + /*
>>> + * These should be intercepted, but they are very unlikely to be
>>> + * a problem in practice. Leave them alone for now.
>>
>> It could also be handled in vfs helpers.
>> Since these ops all start with establishing that src and dest are on
>> the same sb,
>> then the cost of copy up of src is the cost of clone_file_range from
>> lower to upper,
>> so it is probably worth to copy up src and leave those fops alone.
>>
>>> + */
>>> + ofop->fops.copy_file_range = orig->copy_file_range;
>>> + ofop->fops.clone_file_range = orig->clone_file_range;
>>> + ofop->fops.dedupe_file_range = orig->dedupe_file_range;
>
> Not sure I understand. Why should we copy up src? Copy up is the
> problem not the solution.
>

Maybe the idea is ill conceived, but the reasoning is:
To avoid the corner case of cloning from a stale lower src,
call d_real() in vfs helpers to always copy up src before cloning from it
and pass the correct file onwards.

It would have been crazy if we suggested the same for read_iter(),
so why it may make sense for clone?
because once you got that far into the vfs_clone_range() helper,
with src from lower and dst from upper, it means they are on the same sb
that supports clone, so it is likely that copy up is going to use clone and
be relatively cheap.

Pretty twisted. I agree... and not sure if this logic holds for copy_range as
well. Anyway, that's what I meant.

Amir.

2016-11-24 13:25:05

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 6/7] ovl: intercept mmap

On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
> ... in order to handle the corner case when the file is copied up after
> being opened read-only and mapped shared.
>
> Can be verified with the following script:
>
> - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
> cd /
> rm -rf /tmp/ovl-rorw-test
> mkdir /tmp/ovl-rorw-test
> cd /tmp/ovl-rorw-test
> cat << EOF > rorw-map.c
> #include <fcntl.h>
> #include <unistd.h>
> #include <string.h>
> #include <err.h>
> #include <sys/mman.h>
>
> int main(int argc, char *argv[])
> {
> int rofd, rwfd;
> int ret;
> char buf[4];
> char *addr;
>
> rofd = open(argv[1], O_RDONLY);
> if (rofd == -1)
> err(1, "ro open");
>
> addr = mmap(NULL, 4, PROT_READ, MAP_SHARED, rofd, 0);
> if (addr == MAP_FAILED)
> err(1, "ro mmap");
>
> if (memcmp(addr, "bubu", 4) == 0)
> errx(1, "identical startup data");
>
> rwfd = open(argv[1], O_WRONLY);
> if (rwfd == -1)
> err(1, "rw open");
>
> ret = write(rwfd, "bubu", 4);
> if (ret == -1)
> err(1, "write");
> if (ret < 4)
> errx(1, "short write");
>
> if (memcmp(addr, "bubu", 4) != 0)
> errx(1, "bad mmap data");
>
> ret = read(rofd, buf, 4);
> if (ret == -1)
> err(1, "read");
> if (ret < 4)
> errx(1, "short read");
> if (memcmp(buf, "bubu", 4) != 0)
> errx(1, "bad read data");
>
> return 0;
> }
> EOF


Good timing :-)
I just started working on an xfstest this morning to bring this corner
case into attention.
Once I get the xfs_io commands in order, you could use them for a
shorter commit message.
Unless you like it that way...

> gcc -o rorw-map rorw-map.c
> mkdir -p mnt lower upper work
> echo baba > lower/foo
> mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work mnt
> ./rorw-map mnt/foo
> umount mnt
> - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
>
> No output means success, "rorw-map: bad mmap data" means failure.
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/overlayfs/inode.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 09c6f99bd5db..25b31c6ebe9e 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -11,6 +11,7 @@
> #include <linux/slab.h>
> #include <linux/xattr.h>
> #include <linux/posix_acl.h>
> +#include <linux/mm.h>
> #include <linux/module.h>
> #include <linux/file.h>
> #include <linux/hashtable.h>
> @@ -392,6 +393,25 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to)
> return ret;
> }
>
> +static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + if (likely(ovl_file_is_lower(file)))
> + return OVL_CALL_REAL_FOP(file, mmap(file, vma));
> +
> + file = filp_clone_open(file);
> + if (IS_ERR(file))
> + return PTR_ERR(file);
> +
> + fput(vma->vm_file);
> + /* transfer ref: */
> + vma->vm_file = file;
> +
> + if (!file->f_op->mmap)
> + return -ENODEV;
> +
> + return file->f_op->mmap(file, vma);
> +}
> +
> static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
> {
> struct ovl_fops *ofop;
> @@ -431,9 +451,10 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
> /* Intercept these: */
> if (orig->read_iter)
> ofop->fops.read_iter = ovl_read_iter;
> + if (orig->mmap)
> + ofop->fops.mmap = ovl_mmap;
>
> /* These will need to be intercepted: */
> - ofop->fops.mmap = orig->mmap;
> ofop->fops.fsync = orig->fsync;
>
> /*
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-11-24 13:51:25

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 4/7] ovl: add infrastructure for intercepting file ops

On Thu, Nov 24, 2016 at 2:12 PM, Amir Goldstein <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 2:03 PM, Miklos Szeredi <[email protected]> wrote:
>> On Thu, Nov 24, 2016 at 12:52 PM, Amir Goldstein <[email protected]> wrote:
>>> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
>>
>>>> + /*
>>>> + * These should be intercepted, but they are very unlikely to be
>>>> + * a problem in practice. Leave them alone for now.
>>>
>>> It could also be handled in vfs helpers.
>>> Since these ops all start with establishing that src and dest are on
>>> the same sb,
>>> then the cost of copy up of src is the cost of clone_file_range from
>>> lower to upper,
>>> so it is probably worth to copy up src and leave those fops alone.
>>>
>>>> + */
>>>> + ofop->fops.copy_file_range = orig->copy_file_range;
>>>> + ofop->fops.clone_file_range = orig->clone_file_range;
>>>> + ofop->fops.dedupe_file_range = orig->dedupe_file_range;
>>
>> Not sure I understand. Why should we copy up src? Copy up is the
>> problem not the solution.
>>
>
> Maybe the idea is ill conceived, but the reasoning is:
> To avoid the corner case of cloning from a stale lower src,
> call d_real() in vfs helpers to always copy up src before cloning from it
> and pass the correct file onwards.

Which correct file? src is still the wrong one after calling d_real.
We need to clone-open src, just like we do in ovl_read_iter to get the
correct file. But then what's the use of copying it up beforehand?

We could move the whole logic into the vfs, but I don't really see the point.

I left these ops alone because there is some confusion in there about
getting the f_op from the source or the destination file. And while
it doesn't matter normally (all regular files have the same f_op,
regardless of open flags) it does matter for overlayfs intercept,
because overriding fops in the the dest file would mean additional
complexity and resources). That could easily be fixed in the vfs:
calling src->f_ops->foo works equally well, but I simply didn't want
to bother with this. We can return to it later.

Thanks,
Miklos

2016-11-24 14:09:24

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 4/7] ovl: add infrastructure for intercepting file ops

On Thu, Nov 24, 2016 at 3:51 PM, Miklos Szeredi <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 2:12 PM, Amir Goldstein <[email protected]> wrote:
>> On Thu, Nov 24, 2016 at 2:03 PM, Miklos Szeredi <[email protected]> wrote:
>>> On Thu, Nov 24, 2016 at 12:52 PM, Amir Goldstein <[email protected]> wrote:
>>>> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
>>>
>>>>> + /*
>>>>> + * These should be intercepted, but they are very unlikely to be
>>>>> + * a problem in practice. Leave them alone for now.
>>>>
>>>> It could also be handled in vfs helpers.
>>>> Since these ops all start with establishing that src and dest are on
>>>> the same sb,
>>>> then the cost of copy up of src is the cost of clone_file_range from
>>>> lower to upper,
>>>> so it is probably worth to copy up src and leave those fops alone.
>>>>
>>>>> + */
>>>>> + ofop->fops.copy_file_range = orig->copy_file_range;
>>>>> + ofop->fops.clone_file_range = orig->clone_file_range;
>>>>> + ofop->fops.dedupe_file_range = orig->dedupe_file_range;
>>>
>>> Not sure I understand. Why should we copy up src? Copy up is the
>>> problem not the solution.
>>>
>>
>> Maybe the idea is ill conceived, but the reasoning is:
>> To avoid the corner case of cloning from a stale lower src,
>> call d_real() in vfs helpers to always copy up src before cloning from it
>> and pass the correct file onwards.
>
> Which correct file? src is still the wrong one after calling d_real.
> We need to clone-open src, just like we do in ovl_read_iter to get the
> correct file. But then what's the use of copying it up beforehand?
>
> We could move the whole logic into the vfs, but I don't really see the point.
>
> I left these ops alone because there is some confusion in there about
> getting the f_op from the source or the destination file.

Yes, I saw that. Shouldn't be a problem to always use src->f_ops-> IMO.

Could you please push this work to a topic branch to make it easier for me to
pull and test?

Thanks,
Amir.

> And while
> it doesn't matter normally (all regular files have the same f_op,
> regardless of open flags) it does matter for overlayfs intercept,
> because overriding fops in the the dest file would mean additional
> complexity and resources). That could easily be fixed in the vfs:
> calling src->f_ops->foo works equally well, but I simply didn't want
> to bother with this. We can return to it later.
>
> Thanks,
> Miklos

2016-11-24 14:14:32

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 0/7] overlayfs: fix ro/rw fd data inconsistecies

On Thu, Nov 24, 2016 at 11:55 AM, Miklos Szeredi <[email protected]> wrote:
> A file is opened for read-only, opened read-write (resulting in a copy up)
> and modified. The data read back from the the read-only fd will be stale
> in this case (the read-only file descriptor still refers to the lower,
> unmodified file).
>
> This patchset fixes issues related to this corner case.
>
> The VFS impact is minimal and performance in the non-corner cases shouldn't
> suffer.

Git tree is here:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #overlayfs-rorw

Thanks,
Miklos

> ---
> Miklos Szeredi (7):
> vfs: allow overlayfs to intercept file ops
> vfs: export filp_clone_open()
> mm: ovl: copy-up on MAP_SHARED
> ovl: add infrastructure for intercepting file ops
> ovl: intercept read_iter
> ovl: intercept mmap
> ovl: intercept fsync
>
> fs/internal.h | 1 -
> fs/open.c | 2 +-
> fs/overlayfs/inode.c | 225 +++++++++++++++++++++++++++++++++++++++++++++++
> fs/overlayfs/overlayfs.h | 2 +
> fs/overlayfs/super.c | 1 +
> include/linux/fs.h | 1 +
> mm/util.c | 22 +++++
> 7 files changed, 252 insertions(+), 2 deletions(-)
>
> --
> 2.5.5
>

2016-11-24 18:03:30

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 6/7] ovl: intercept mmap

On Thu, Nov 24, 2016 at 3:25 PM, Amir Goldstein <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
>> ... in order to handle the corner case when the file is copied up after
>> being opened read-only and mapped shared.
>>
>> Can be verified with the following script:
>>
>> - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
>> cd /
>> rm -rf /tmp/ovl-rorw-test
>> mkdir /tmp/ovl-rorw-test
>> cd /tmp/ovl-rorw-test
>> cat << EOF > rorw-map.c
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <string.h>
>> #include <err.h>
>> #include <sys/mman.h>
>>
>> int main(int argc, char *argv[])
>> {
>> int rofd, rwfd;
>> int ret;
>> char buf[4];
>> char *addr;
>>
>> rofd = open(argv[1], O_RDONLY);
>> if (rofd == -1)
>> err(1, "ro open");
>>
>> addr = mmap(NULL, 4, PROT_READ, MAP_SHARED, rofd, 0);
>> if (addr == MAP_FAILED)
>> err(1, "ro mmap");
>>
>> if (memcmp(addr, "bubu", 4) == 0)
>> errx(1, "identical startup data");
>>
>> rwfd = open(argv[1], O_WRONLY);
>> if (rwfd == -1)
>> err(1, "rw open");
>>
>> ret = write(rwfd, "bubu", 4);
>> if (ret == -1)
>> err(1, "write");
>> if (ret < 4)
>> errx(1, "short write");
>>
>> if (memcmp(addr, "bubu", 4) != 0)
>> errx(1, "bad mmap data");
>>
>> ret = read(rofd, buf, 4);
>> if (ret == -1)
>> err(1, "read");
>> if (ret < 4)
>> errx(1, "short read");
>> if (memcmp(buf, "bubu", 4) != 0)
>> errx(1, "bad read data");
>>
>> return 0;
>> }
>> EOF
>
>
> Good timing :-)
> I just started working on an xfstest this morning to bring this corner
> case into attention.
> Once I get the xfs_io commands in order, you could use them for a
> shorter commit message.
> Unless you like it that way...
>
>> gcc -o rorw-map rorw-map.c
>> mkdir -p mnt lower upper work
>> echo baba > lower/foo
>> mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work mnt
>> ./rorw-map mnt/foo

Well, there is a bug/feature is xfs_io that puts it in a spin when an
open command is
used from command line, so I will need to fix that before sumbitting
the xfstest, but for
documentation purpose, you can use the following command in place of rorw-map:

$ xfs_io -r -c "mmap -r 0 4" -c "open foo" -c "pwrite -S 0x61 0 4" -c
"mread -v 0 4" foo |tail -n 1
00000000: 62 61 62 61 baba

Naturally, 'baba' is bad. 'aaaa' is good.

>> umount mnt
>> - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
>>
>> No output means success, "rorw-map: bad mmap data" means failure.
>>
>> Signed-off-by: Miklos Szeredi <[email protected]>
>> ---
>> fs/overlayfs/inode.c | 23 ++++++++++++++++++++++-
>> 1 file changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
>> index 09c6f99bd5db..25b31c6ebe9e 100644
>> --- a/fs/overlayfs/inode.c
>> +++ b/fs/overlayfs/inode.c
>> @@ -11,6 +11,7 @@
>> #include <linux/slab.h>
>> #include <linux/xattr.h>
>> #include <linux/posix_acl.h>
>> +#include <linux/mm.h>
>> #include <linux/module.h>
>> #include <linux/file.h>
>> #include <linux/hashtable.h>
>> @@ -392,6 +393,25 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> return ret;
>> }
>>
>> +static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> + if (likely(ovl_file_is_lower(file)))
>> + return OVL_CALL_REAL_FOP(file, mmap(file, vma));
>> +
>> + file = filp_clone_open(file);
>> + if (IS_ERR(file))
>> + return PTR_ERR(file);
>> +
>> + fput(vma->vm_file);
>> + /* transfer ref: */
>> + vma->vm_file = file;
>> +
>> + if (!file->f_op->mmap)
>> + return -ENODEV;
>> +
>> + return file->f_op->mmap(file, vma);
>> +}
>> +
>> static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
>> {
>> struct ovl_fops *ofop;
>> @@ -431,9 +451,10 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
>> /* Intercept these: */
>> if (orig->read_iter)
>> ofop->fops.read_iter = ovl_read_iter;
>> + if (orig->mmap)
>> + ofop->fops.mmap = ovl_mmap;
>>
>> /* These will need to be intercepted: */
>> - ofop->fops.mmap = orig->mmap;
>> ofop->fops.fsync = orig->fsync;
>>
>> /*
>> --
>> 2.5.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-11-24 19:06:22

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 6/7] ovl: intercept mmap

On Thu, Nov 24, 2016 at 8:03 PM, Amir Goldstein <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 3:25 PM, Amir Goldstein <[email protected]> wrote:
>> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
>>> ... in order to handle the corner case when the file is copied up after
>>> being opened read-only and mapped shared.
>>>
>>> Can be verified with the following script:
>>>
>>> - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
>>> cd /
>>> rm -rf /tmp/ovl-rorw-test
>>> mkdir /tmp/ovl-rorw-test
>>> cd /tmp/ovl-rorw-test
>>> cat << EOF > rorw-map.c
>>> #include <fcntl.h>
>>> #include <unistd.h>
>>> #include <string.h>
>>> #include <err.h>
>>> #include <sys/mman.h>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>> int rofd, rwfd;
>>> int ret;
>>> char buf[4];
>>> char *addr;
>>>
>>> rofd = open(argv[1], O_RDONLY);
>>> if (rofd == -1)
>>> err(1, "ro open");
>>>
>>> addr = mmap(NULL, 4, PROT_READ, MAP_SHARED, rofd, 0);
>>> if (addr == MAP_FAILED)
>>> err(1, "ro mmap");
>>>
>>> if (memcmp(addr, "bubu", 4) == 0)
>>> errx(1, "identical startup data");
>>>
>>> rwfd = open(argv[1], O_WRONLY);
>>> if (rwfd == -1)
>>> err(1, "rw open");
>>>
>>> ret = write(rwfd, "bubu", 4);
>>> if (ret == -1)
>>> err(1, "write");
>>> if (ret < 4)
>>> errx(1, "short write");
>>>
>>> if (memcmp(addr, "bubu", 4) != 0)
>>> errx(1, "bad mmap data");
>>>
>>> ret = read(rofd, buf, 4);
>>> if (ret == -1)
>>> err(1, "read");
>>> if (ret < 4)
>>> errx(1, "short read");
>>> if (memcmp(buf, "bubu", 4) != 0)
>>> errx(1, "bad read data");
>>>
>>> return 0;
>>> }
>>> EOF
>>
>>
>> Good timing :-)
>> I just started working on an xfstest this morning to bring this corner
>> case into attention.
>> Once I get the xfs_io commands in order, you could use them for a
>> shorter commit message.
>> Unless you like it that way...
>>
>>> gcc -o rorw-map rorw-map.c
>>> mkdir -p mnt lower upper work
>>> echo baba > lower/foo
>>> mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work mnt
>>> ./rorw-map mnt/foo
>
> Well, there is a bug/feature is xfs_io that puts it in a spin when an
> open command is
> used from command line, so I will need to fix that before sumbitting
> the xfstest, but for
> documentation purpose, you can use the following command in place of rorw-map:
>
> $ xfs_io -r -c "mmap -r 0 4" -c "open foo" -c "pwrite -S 0x61 0 4" -c
> "mread -v 0 4" foo |tail -n 1
> 00000000: 62 61 62 61 baba
>
> Naturally, 'baba' is bad. 'aaaa' is good.
>

Or maybe it was meant to be used this way, which does not blow up:

$ xfs_io << EOF
open -r foo
mmap -r 0 4
open foo
pwrite -S 0x61 0 4
mread -v 0 4
EOF

>>> umount mnt
>>> - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
>>>
>>> No output means success, "rorw-map: bad mmap data" means failure.
>>>
>>> Signed-off-by: Miklos Szeredi <[email protected]>
>>> ---
>>> fs/overlayfs/inode.c | 23 ++++++++++++++++++++++-
>>> 1 file changed, 22 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
>>> index 09c6f99bd5db..25b31c6ebe9e 100644
>>> --- a/fs/overlayfs/inode.c
>>> +++ b/fs/overlayfs/inode.c
>>> @@ -11,6 +11,7 @@
>>> #include <linux/slab.h>
>>> #include <linux/xattr.h>
>>> #include <linux/posix_acl.h>
>>> +#include <linux/mm.h>
>>> #include <linux/module.h>
>>> #include <linux/file.h>
>>> #include <linux/hashtable.h>
>>> @@ -392,6 +393,25 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>> return ret;
>>> }
>>>
>>> +static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
>>> +{
>>> + if (likely(ovl_file_is_lower(file)))
>>> + return OVL_CALL_REAL_FOP(file, mmap(file, vma));
>>> +
>>> + file = filp_clone_open(file);
>>> + if (IS_ERR(file))
>>> + return PTR_ERR(file);
>>> +
>>> + fput(vma->vm_file);
>>> + /* transfer ref: */
>>> + vma->vm_file = file;
>>> +
>>> + if (!file->f_op->mmap)
>>> + return -ENODEV;
>>> +
>>> + return file->f_op->mmap(file, vma);
>>> +}
>>> +
>>> static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
>>> {
>>> struct ovl_fops *ofop;
>>> @@ -431,9 +451,10 @@ static struct ovl_fops *ovl_fops_get(struct file *file)
>>> /* Intercept these: */
>>> if (orig->read_iter)
>>> ofop->fops.read_iter = ovl_read_iter;
>>> + if (orig->mmap)
>>> + ofop->fops.mmap = ovl_mmap;
>>>
>>> /* These will need to be intercepted: */
>>> - ofop->fops.mmap = orig->mmap;
>>> ofop->fops.fsync = orig->fsync;
>>>
>>> /*
>>> --
>>> 2.5.5
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-11-24 20:50:53

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 0/7] overlayfs: fix ro/rw fd data inconsistecies

On Thu, Nov 24, 2016 at 4:14 PM, Miklos Szeredi <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 11:55 AM, Miklos Szeredi <[email protected]> wrote:
>> A file is opened for read-only, opened read-write (resulting in a copy up)
>> and modified. The data read back from the the read-only fd will be stale
>> in this case (the read-only file descriptor still refers to the lower,
>> unmodified file).
>>
>> This patchset fixes issues related to this corner case.
>>
>> The VFS impact is minimal and performance in the non-corner cases shouldn't
>> suffer.
>
> Git tree is here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git #overlayfs-rorw
>

passed my tests.
I now run all my tests (including unionmount-testsuite) on a setup of
overlayfs over xfs with reflink:
- xfstests - from my github overlayfs-devel branch
-g quick and overlay/*
including new test 016 to check ro/rw fd data inconsistencies
- pjdfstest - from my github overlayfs-devel branch
including acl/xattr tests
- unionmount-testsuite - from my github overlayfs-devel branch
without recycling (--ov), with mount recycle -(-ov=0) and with mount
recycle and
layers (--ov=10)

Tested-by: Amir Goldstein <[email protected]>

2016-11-25 05:23:00

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 4/7] ovl: add infrastructure for intercepting file ops

On Thu, Nov 24, 2016 at 4:08 PM, Amir Goldstein <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 3:51 PM, Miklos Szeredi <[email protected]> wrote:
>> On Thu, Nov 24, 2016 at 2:12 PM, Amir Goldstein <[email protected]> wrote:
>>> On Thu, Nov 24, 2016 at 2:03 PM, Miklos Szeredi <[email protected]> wrote:
>>>> On Thu, Nov 24, 2016 at 12:52 PM, Amir Goldstein <[email protected]> wrote:
>>>>> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
>>>>
>>>>>> + /*
>>>>>> + * These should be intercepted, but they are very unlikely to be
>>>>>> + * a problem in practice. Leave them alone for now.
>>>>>
>>>>> It could also be handled in vfs helpers.
>>>>> Since these ops all start with establishing that src and dest are on
>>>>> the same sb,
>>>>> then the cost of copy up of src is the cost of clone_file_range from
>>>>> lower to upper,
>>>>> so it is probably worth to copy up src and leave those fops alone.
>>>>>
>>>>>> + */
>>>>>> + ofop->fops.copy_file_range = orig->copy_file_range;
>>>>>> + ofop->fops.clone_file_range = orig->clone_file_range;
>>>>>> + ofop->fops.dedupe_file_range = orig->dedupe_file_range;
>>>>
>>>> Not sure I understand. Why should we copy up src? Copy up is the
>>>> problem not the solution.
>>>>
>>>
>>> Maybe the idea is ill conceived, but the reasoning is:
>>> To avoid the corner case of cloning from a stale lower src,
>>> call d_real() in vfs helpers to always copy up src before cloning from it
>>> and pass the correct file onwards.
>>
>> Which correct file? src is still the wrong one after calling d_real.
>> We need to clone-open src, just like we do in ovl_read_iter to get the
>> correct file. But then what's the use of copying it up beforehand?
>>
>> We could move the whole logic into the vfs, but I don't really see the point.
>>

Here is a relevant use case (creating several clones),
although not directly related to ro/rw inconsistency, which
justified putting the logic in vfs.

X is a file in lower
lower is different fs then upper
upper supports clone/dedup/copy_range

for i in `seq 1 100`; do cp --reflink=auto X X${i}; done

With current code the src and destination files are on the same
mount (test in ioctl_file_clone), but not on the same sb (test in
vfs_clone_file_range), so cp will fall back to 100 expensive data copies.

*If* instead we d_real() and clone-open src in start of vfs_clone_file_range
*after* verifying the dest file ops support clone, then we will get only one
expensive copy up and 100 cheap clones, so its a big win.

And for the case of src and dst inodes already on the same sb, we can
skip d_real() to avoid possible unneeded copy up, although a clone up
is going to be cheap anyway.

The so called worst case is that this was a one time clone (to X1),
but the cost in this case is not huge - 1 data copy up of X and 1 clone
X->X1 instead of just 1 data copy X->X1, so the difference is negligible.

Now it's true that this is heuristic, but arguably a good one.

Amir.

2016-11-25 21:20:21

by kernel test robot

[permalink] [raw]
Subject: [mm] 68ab21008a: BUG:unable_to_handle_kernel

FYI, we noticed the following commit:

commit 68ab21008a656abeb1fe2c7117a67eeab4d68ded ("mm: ovl: copy-up on MAP_SHARED")
url: https://github.com/0day-ci/linux/commits/Miklos-Szeredi/overlayfs-fix-ro-rw-fd-data-inconsistecies/20161124-233654
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu IvyBridge -m 360M

caused below changes:


+------------------------------------------+------------+------------+
| | b45dbaab96 | 68ab21008a |
+------------------------------------------+------------+------------+
| boot_successes | 4 | 0 |
| boot_failures | 0 | 5 |
| BUG:unable_to_handle_kernel | 0 | 5 |
| Oops | 0 | 5 |
| RIP:vm_mmap_pgoff | 0 | 5 |
| calltrace:SyS_mmap_pgoff | 0 | 5 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 5 |
+------------------------------------------+------------+------------+



[ 5.829350] job=/lkp/scheduled/vm-ivb41-yocto-ia32-22/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-68ab21008a656abeb1fe2c7117a67eeab4d68ded-20161126-104135-1r99nk0-0.yaml
[ 5.829350] run-job /lkp/scheduled/vm-ivb41-yocto-ia32-22/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-68ab21008a656abeb1fe2c7117a67eeab4d68ded-20161126-104135-1r99nk0-0.yaml
[ 5.829350] /bin/busybox wget -q http://inn:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/scheduled/vm-ivb41-yocto-ia32-22/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-68ab21008a656abeb1fe2c7117a67eeab4d68ded-20161126-104135-1r99nk0-0.yaml&job_state=running -O /dev/null
[ 16.146778] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 16.150092] IP: [<ffffffff811eb290>] vm_mmap_pgoff+0x63/0xe7
[ 16.152858] PGD 71bf067
[ 16.153453] PUD 688b067
PMD 0
[ 16.154890]
[ 16.156053] Oops: 0000 [#1] SMP
[ 16.157251] Modules linked in:
[ 16.158444] CPU: 0 PID: 1483 Comm: trinity Not tainted 4.9.0-rc4-00032-g68ab210 #1
[ 16.160691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 16.162901] task: ffff880006cc1c00 task.stack: ffffc9000027c000
[ 16.164145] RIP: 0010:[<ffffffff811eb290>] [<ffffffff811eb290>] vm_mmap_pgoff+0x63/0xe7
[ 16.166525] RSP: 0000:ffffc9000027fe90 EFLAGS: 00010246
[ 16.167620] RAX: ffff880006cc1c00 RBX: 0000000000000000 RCX: 0000000000000001
[ 16.169186] RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
[ 16.170761] RBP: ffffc9000027fed8 R08: 0000000000000021 R09: 0000000000000000
[ 16.172283] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 16.173878] R13: 0000000000001000 R14: ffff880006cc8800 R15: 0000000000000000
[ 16.175332] FS: 0000000000000000(0000) GS:ffff880016000000(0063) knlGS:0000000009af2840
[ 16.177455] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 16.178827] CR2: 0000000000000018 CR3: 00000000071b4000 CR4: 00000000001406f0
[ 16.180266] Stack:
[ 16.181094] 0000000000000000 0000000000001000 0000000000000000 ffff880004b6ff00
[ 16.183365] 0000000000001000 0000000000000021 0000000000001000 0000000000000000
[ 16.185548] 0000000000000000 ffffc9000027ff30 ffffffff812036f8 0000000600270000
[ 16.187824] Call Trace:
[ 16.188656] [<ffffffff812036f8>] SyS_mmap_pgoff+0x184/0x1a9
[ 16.189918] [<ffffffff810019e8>] do_int80_syscall_32+0x64/0xbf
[ 16.191138] [<ffffffff81ae3938>] entry_INT80_compat+0x38/0x50
[ 16.192397] Code: 03 00 00 75 21 49 83 c6 68 4c 89 45 b8 49 c7 c5 fc ff ff ff 4c 89 f7 e8 c0 4c 8f 00 85 c0 4c 8b 45 b8 75 79 eb 37 f6 c1 02 75 da <48> 8b 47 18 8b 10 0f ba e2 1a 73 19 48 8b 48 60 4c 89 45 b8 ba
[ 16.201562] RIP [<ffffffff811eb290>] vm_mmap_pgoff+0x63/0xe7
[ 16.203254] RSP <ffffc9000027fe90>
[ 16.204501] CR2: 0000000000000018
[ 16.206063] ---[ end trace 910d3120db07d449 ]---
[ 16.226016] Kernel panic - not syncing: Fatal exception
[ 16.227115] Kernel Offset: disabled

Elapsed time: 20


To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Kernel Test Robot


Attachments:
(No filename) (4.52 kB)
config-4.9.0-rc4-00032-g68ab210 (99.32 kB)
job-script (3.55 kB)
dmesg.xz (12.42 kB)
Download all attachments