LinuxLists.cc - [PATCH 0/7] overlayfs: fix ro/rw fd data inconsistecies

2016-11-24 10:55:49

Subject: [PATCH 0/7] overlayfs: fix ro/rw fd data inconsistecies

Consider a file that is opened read-only, then opened read-write (resulting
in a copy up) and modified. Data read back through the read-only fd will be
stale in this case, because the read-only file descriptor still refers to
the lower, unmodified file.

This patchset fixes issues related to this corner case.
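
The corner case described above can be observed from user space. The following is a minimal sketch, not part of the patchset: the helper name ro_fd_sees_write() is hypothetical, and `path` is assumed to name an existing, non-empty regular file that the caller is allowed to modify. On an overlay mount where the file still lives only in the lower layer, a kernel without this series would be expected to return 0 (stale); on an ordinary filesystem it returns 1.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patchset).
 * Returns 1 if a read-only fd opened before the write observes the new
 * data, 0 if it still returns the old data (the stale case), and -1 on
 * any I/O error.
 */
int ro_fd_sees_write(const char *path)
{
    char before, after, replacement;
    int ro_fd, rw_fd, result = -1;

    ro_fd = open(path, O_RDONLY);
    if (ro_fd < 0)
        return -1;

    if (pread(ro_fd, &before, 1, 0) != 1)
        goto out_ro;

    /* On overlayfs, opening a lower file for writing triggers copy up. */
    rw_fd = open(path, O_RDWR);
    if (rw_fd < 0)
        goto out_ro;

    /* Pick a byte guaranteed to differ from the current first byte. */
    replacement = (before == 'X') ? 'Y' : 'X';
    if (pwrite(rw_fd, &replacement, 1, 0) != 1)
        goto out_rw;

    /* Re-read through the fd that was opened before the copy up. */
    if (pread(ro_fd, &after, 1, 0) != 1)
        goto out_rw;

    result = (after == replacement) ? 1 : 0;

out_rw:
    close(rw_fd);
out_ro:
    close(ro_fd);
    return result;
}
```

Note that the helper overwrites the first byte of the file, so it should only be pointed at a scratch file.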

The VFS impact is minimal and performance in the non-corner cases shouldn't
suffer.
---
Miklos Szeredi (7):
  vfs: allow overlayfs to intercept file ops
  vfs: export filp_clone_open()
  mm: ovl: copy-up on MAP_SHARED
  ovl: add infrastructure for intercepting file ops
  ovl: intercept read_iter
  ovl: intercept mmap
  ovl: intercept fsync

 fs/internal.h            |   1 -
 fs/open.c                |   2 +-
 fs/overlayfs/inode.c     | 225 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h |   2 +
 fs/overlayfs/super.c     |   1 +
 include/linux/fs.h       |   1 +
 mm/util.c                |  22 +++++
 7 files changed, 252 insertions(+), 2 deletions(-)

--
2.5.5

2016-11-24 10:55:52

by Miklos Szeredi

On Thu, Nov 24, 2016 at 4:08 PM, Amir Goldstein <[email protected]> wrote:
> On Thu, Nov 24, 2016 at 3:51 PM, Miklos Szeredi <[email protected]> wrote:
>> On Thu, Nov 24, 2016 at 2:12 PM, Amir Goldstein <[email protected]> wrote:
>>> On Thu, Nov 24, 2016 at 2:03 PM, Miklos Szeredi <[email protected]> wrote:
>>>> On Thu, Nov 24, 2016 at 12:52 PM, Amir Goldstein <[email protected]> wrote:
>>>>> On Thu, Nov 24, 2016 at 12:55 PM, Miklos Szeredi <[email protected]> wrote:
>>>>
>>>>>> + /*
>>>>>> + * These should be intercepted, but they are very unlikely to be
>>>>>> + * a problem in practice. Leave them alone for now.
>>>>>
>>>>> It could also be handled in vfs helpers.
>>>>> Since these ops all start with establishing that src and dest are on
>>>>> the same sb,
>>>>> then the cost of copy up of src is the cost of clone_file_range from
>>>>> lower to upper,
>>>>> so it is probably worth to copy up src and leave those fops alone.
>>>>>
>>>>>> + */
>>>>>> + ofop->fops.copy_file_range = orig->copy_file_range;
>>>>>> + ofop->fops.clone_file_range = orig->clone_file_range;
>>>>>> + ofop->fops.dedupe_file_range = orig->dedupe_file_range;
>>>>
>>>> Not sure I understand. Why should we copy up src? Copy up is the
>>>> problem not the solution.
>>>>
>>>
>>> Maybe the idea is ill conceived, but the reasoning is:
>>> To avoid the corner case of cloning from a stale lower src,
>>> call d_real() in vfs helpers to always copy up src before cloning from it
>>> and pass the correct file onwards.
>>
>> Which correct file? src is still the wrong one after calling d_real.
>> We need to clone-open src, just like we do in ovl_read_iter to get the
>> correct file. But then what's the use of copying it up beforehand?
>>
>> We could move the whole logic into the vfs, but I don't really see the point.
>>

Here is a relevant use case (creating several clones) that,
although not directly related to the ro/rw inconsistency,
justifies putting the logic in the vfs.

X is a file in lower
lower is a different fs than upper
upper supports clone/dedup/copy_range

for i in `seq 1 100`; do cp --reflink=auto X X${i}; done

With current code the src and destination files are on the same
mount (test in ioctl_file_clone), but not on the same sb (test in
vfs_clone_file_range), so cp will fall back to 100 expensive data copies.
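
The fallback described above can be sketched from the user-space side. This is only an illustration loosely modelled on what "cp --reflink=auto" does, not its actual implementation: the helper name clone_or_copy() is hypothetical, and it assumes both descriptors refer to regular files, with dst_fd opened for writing. FICLONE is the clone ioctl from <linux/fs.h>; in the scenario above the kernel is expected to refuse it (typically with EXDEV), which sends every iteration down the expensive path.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE */

/*
 * Hypothetical helper: try to reflink src_fd into dst_fd and fall back
 * to a plain data copy when the clone is refused.
 * Returns 0 if cloned, 1 if copied, -1 on error.
 */
int clone_or_copy(int src_fd, int dst_fd)
{
    char buf[65536];
    ssize_t n;

    /* Cheap path: share extents instead of copying data. */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 0;

    /* Expensive path: copy all data from the beginning. */
    if (lseek(src_fd, 0, SEEK_SET) < 0 || lseek(dst_fd, 0, SEEK_SET) < 0)
        return -1;
    if (ftruncate(dst_fd, 0) < 0)
        return -1;

    while ((n = read(src_fd, buf, sizeof(buf))) > 0) {
        char *p = buf;

        /* write() may be partial, so loop until the chunk is out. */
        while (n > 0) {
            ssize_t w = write(dst_fd, p, (size_t)n);

            if (w <= 0)
                return -1;
            p += w;
            n -= w;
        }
    }
    return n < 0 ? -1 : 1;
}
```

A return value of 1 for each of the 100 destinations corresponds to the "100 expensive data copies" case.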

*If* instead we call d_real() and clone-open src at the start of
vfs_clone_file_range, *after* verifying that the dest file ops support clone,
then we get only one expensive copy up and 100 cheap clones, so it's a big win.

And for the case of src and dst inodes already on the same sb, we can
skip d_real() to avoid possible unneeded copy up, although a clone up
is going to be cheap anyway.

The so-called worst case is that this was a one-time clone (to X1),
but the cost in this case is not huge: 1 data copy up of X and 1 clone
X->X1 instead of just 1 data copy X->X1, so the difference is negligible.
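
The counts in this argument can be written down as a tiny model. This is purely illustrative: the struct and function names are hypothetical, and it assumes X has not been copied up yet and that every clone succeeds once src is on upper.

```c
/* Hypothetical cost model for making n reflink copies of a lower file. */
struct copy_cost {
    unsigned data_copies;   /* expensive: full data copies */
    unsigned clones;        /* cheap: extent-sharing clones */
};

/* Current behaviour: every cp falls back to a full data copy. */
struct copy_cost cost_current(unsigned n)
{
    struct copy_cost c = { n, 0 };
    return c;
}

/* Proposed heuristic: one copy up of the source, then n clones. */
struct copy_cost cost_proposed(unsigned n)
{
    struct copy_cost c = { n ? 1u : 0u, n };
    return c;
}
```

For n = 100 this gives 100 data copies versus 1 data copy plus 100 clones; for n = 1 it gives 1 data copy versus 1 data copy plus 1 clone, which is the worst case mentioned above.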

Now it's true that this is heuristic, but arguably a good one.

Amir.

2016-11-25 21:20:21

by kernel test robot

Subject: [mm] 68ab21008a: BUG:unable_to_handle_kernel

FYI, we noticed the following commit:

commit 68ab21008a656abeb1fe2c7117a67eeab4d68ded ("mm: ovl: copy-up on MAP_SHARED")
url: https://github.com/0day-ci/linux/commits/Miklos-Szeredi/overlayfs-fix-ro-rw-fd-data-inconsistecies/20161124-233654
base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/

on test machine: qemu-system-x86_64 -enable-kvm -cpu IvyBridge -m 360M

caused below changes:

+------------------------------------------+------------+------------+
|                                          | b45dbaab96 | 68ab21008a |
+------------------------------------------+------------+------------+
| boot_successes | 4 | 0 |
| boot_failures | 0 | 5 |
| BUG:unable_to_handle_kernel | 0 | 5 |
| Oops | 0 | 5 |
| RIP:vm_mmap_pgoff | 0 | 5 |
| calltrace:SyS_mmap_pgoff | 0 | 5 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 5 |
+------------------------------------------+------------+------------+

[ 5.829350] job=/lkp/scheduled/vm-ivb41-yocto-ia32-22/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-68ab21008a656abeb1fe2c7117a67eeab4d68ded-20161126-104135-1r99nk0-0.yaml
[ 5.829350] run-job /lkp/scheduled/vm-ivb41-yocto-ia32-22/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-68ab21008a656abeb1fe2c7117a67eeab4d68ded-20161126-104135-1r99nk0-0.yaml
[ 5.829350] /bin/busybox wget -q http://inn:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/scheduled/vm-ivb41-yocto-ia32-22/trinity-300s-yocto-tiny-i386-2016-04-22.cgz-68ab21008a656abeb1fe2c7117a67eeab4d68ded-20161126-104135-1r99nk0-0.yaml&job_state=running -O /dev/null
[ 16.146778] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 16.150092] IP: [<ffffffff811eb290>] vm_mmap_pgoff+0x63/0xe7
[ 16.152858] PGD 71bf067
[ 16.153453] PUD 688b067
PMD 0
[ 16.154890]
[ 16.156053] Oops: 0000 [#1] SMP
[ 16.157251] Modules linked in:
[ 16.158444] CPU: 0 PID: 1483 Comm: trinity Not tainted 4.9.0-rc4-00032-g68ab210 #1
[ 16.160691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 16.162901] task: ffff880006cc1c00 task.stack: ffffc9000027c000
[ 16.164145] RIP: 0010:[<ffffffff811eb290>] [<ffffffff811eb290>] vm_mmap_pgoff+0x63/0xe7
[ 16.166525] RSP: 0000:ffffc9000027fe90 EFLAGS: 00010246
[ 16.167620] RAX: ffff880006cc1c00 RBX: 0000000000000000 RCX: 0000000000000001
[ 16.169186] RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
[ 16.170761] RBP: ffffc9000027fed8 R08: 0000000000000021 R09: 0000000000000000
[ 16.172283] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 16.173878] R13: 0000000000001000 R14: ffff880006cc8800 R15: 0000000000000000
[ 16.175332] FS: 0000000000000000(0000) GS:ffff880016000000(0063) knlGS:0000000009af2840
[ 16.177455] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 16.178827] CR2: 0000000000000018 CR3: 00000000071b4000 CR4: 00000000001406f0
[ 16.180266] Stack:
[ 16.181094] 0000000000000000 0000000000001000 0000000000000000 ffff880004b6ff00
[ 16.183365] 0000000000001000 0000000000000021 0000000000001000 0000000000000000
[ 16.185548] 0000000000000000 ffffc9000027ff30 ffffffff812036f8 0000000600270000
[ 16.187824] Call Trace:
[ 16.188656] [<ffffffff812036f8>] SyS_mmap_pgoff+0x184/0x1a9
[ 16.189918] [<ffffffff810019e8>] do_int80_syscall_32+0x64/0xbf
[ 16.191138] [<ffffffff81ae3938>] entry_INT80_compat+0x38/0x50
[ 16.192397] Code: 03 00 00 75 21 49 83 c6 68 4c 89 45 b8 49 c7 c5 fc ff ff ff 4c 89 f7 e8 c0 4c 8f 00 85 c0 4c 8b 45 b8 75 79 eb 37 f6 c1 02 75 da <48> 8b 47 18 8b 10 0f ba e2 1a 73 19 48 8b 48 60 4c 89 45 b8 ba
[ 16.201562] RIP [<ffffffff811eb290>] vm_mmap_pgoff+0x63/0xe7
[ 16.203254] RSP <ffffc9000027fe90>
[ 16.204501] CR2: 0000000000000018
[ 16.206063] ---[ end trace 910d3120db07d449 ]---
[ 16.226016] Kernel panic - not syncing: Fatal exception
[ 16.227115] Kernel Offset: disabled

Elapsed time: 20
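
One possible reading of the register dump above, offered as an interpretation and not something stated in the report: if the registers still hold the x86-64 arguments of vm_mmap_pgoff(file, addr, len, prot, flag, pgoff), then RDI (file) is 0, RDX (len) is 0x1000, RCX (prot) is 1 and R08 (flag) is 0x21, which on x86 equals MAP_SHARED | MAP_ANONYMOUS. The faulting address 0x18 would then be a field offset from a NULL file pointer. Under that assumption, a call like the following hypothetical helper would enter the same path with no backing file:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of shared anonymous memory.
 * No file backs the mapping, so the kernel's mmap path is entered
 * without a struct file. Returns the mapping, or MAP_FAILED on error.
 */
void *map_shared_anon(size_t len)
{
    return mmap(NULL, len, PROT_READ, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
```

On a kernel without the reported problem this simply returns a valid mapping that can be released with munmap().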

To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email

Thanks,
Kernel Test Robot

Attachments:

(No filename) (4.52 kB)
config-4.9.0-rc4-00032-g68ab210 (99.32 kB)
job-script (3.55 kB)
dmesg.xz (12.42 kB)