Greetings,
be83bbf80682 "mmap: introduce sane default mmap limits" introduced a
problem with mapping /proc/vmcore if it is bigger than 2gb. This
breaks s390 kernel zfcpdump. But it should be a general problem.
Please consider the following one-liner fix, if it makes sense.
Vasily Gorbik (1):
procfs: fix mmap() for /proc/vmcore
fs/proc/inode.c | 1 +
1 file changed, 1 insertion(+)
--
⢀⡵⣤⡴⣅ 2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition
Procfs does not set max file size (s_maxbytes) and ends up with default
value of
MAX_NON_LFS ((1UL<<31) - 1)
Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
introduced "pgoff" limits checks for mmap, considering fs max file size
as upper limit, which made it impossible to mmap /proc/vmcore file
contents above 2Gb (/proc/vmcore appears to be the only procfs file
supporting mmap).
Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.
Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Signed-off-by: Vasily Gorbik <[email protected]>
---
fs/proc/inode.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2cf3b74391ca..de1b3b1da883 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -502,6 +502,7 @@ int proc_fill_super(struct super_block *s, void *data, int silent)
/* User space would break if executables or devices appear on proc */
s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
+ s->s_maxbytes = MAX_LFS_FILESIZE;
s->s_blocksize = 1024;
s->s_blocksize_bits = 10;
s->s_magic = PROC_SUPER_MAGIC;
--
⢀⡵⣤⡴⣅ 2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition
On Fri, May 18, 2018 at 8:15 PM Vasily Gorbik <[email protected]> wrote:
> Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
> introduced "pgoff" limits checks for mmap, considering fs max file size
> as upper limit, which made it impossible to mmap /proc/vmcore file
> contents above 2Gb (/proc/vmcore appears to be the only procfs file
> supporting mmap).
> Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.
Ugh. /proc is where a lot of problems *have* been.
Admittedly not as much as random drivers, but still. If proc doesn't set
s_maxbytes, then we should not raise it here magically, there might be
various /proc files that get offset accounting wrong.
I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
rather than open up all proc files to issues with 4G+ offsets.
Linus
On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
[email protected]> wrote:
> I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> rather than open up all proc files to issues with 4G+ offsets.
Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
ask you how that ever worked for that file, but it's not there, the
s_maxbyte checks are only in lseek and in do_splice().
So apparently we protect against llseek + read/write, but we don't protect
against pread64/pwrite64 having offset overflows..
That's crazy. That makes all the s_maxbytes protection much less effective
than it should be. Filesystems that don't get the 64-bit case right will
screw up pread64 and friends.
Al, I'm missing something. Did we always have this gaping hole where we
didn't actually check s_maxbytes against read/write, only
generic_file_llseek? Apparently.
Linus
On Fri, May 18, 2018 at 08:33:37PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
> [email protected]> wrote:
>
> > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> > rather than open up all proc files to issues with 4G+ offsets.
>
> Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
> ask you how that ever worked for that file, but it's not there, the
> s_maxbyte checks are only in lseek and in do_splice().
>
> So apparently we protect against llseek + read/write, but we don't protect
> against pread64/pwrite64 having offset overflows..
>
> That's crazy. That makes all the s_maxbytes protection much less effective
> than it should be. Filesystems that don't get the 64-bit case right will
> screw up pread64 and friends.
>
> Al, I'm missing something. Did we always have this gaping hole where we
> didn't actually check s_maxbytes against read/write, only
> generic_file_llseek? Apparently.
Not quite. The things like
if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
return 0;
iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
protect most of the regular files (see mm/filemap.c). And for devices (which is
where the majority of crap ->read()/->write() is) it's obviously not applicable -
->s_maxbytes of *what*?
On Fri, May 18, 2018 at 9:12 PM Linus Torvalds <
[email protected]> wrote:
> (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
> would only be used as a "fill in inode value for regular files for this
> superblock").
Actually, maybe we could just make rw_verify_area() use the new
file_mmap_size_max() function.
We'd just remove the "mmap" part of the name, and have all IO (and all
mmap) use that limit.
That doesn't sound like a horrible idea to try for 4.18. Hmm?
Linus
On Fri, May 18, 2018 at 8:43 PM Al Viro <[email protected]> wrote:
> Not quite. The things like
> if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> return 0;
> iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> protect most of the regular files (see mm/filemap.c). And for devices
(which is
> where the majority of crap ->read()/->write() is) it's obviously not
applicable -
> ->s_maxbytes of *what*?
Yeah that "s_maxbytes of what" is I think the real issue. We should never
have made s_maxbytes be super-block specific: we should have made it be
per-inode, and then have inode_init_always() initialize it using something
like the file_mmap_size_max() logic.
(So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
would only be used as a "fill in inode value for regular files for this
superblock").
Then we could actually protect read/write properly, because many of the
nasty bugs have been in character device drivers.
Oh well. It would still be a good thing to do some day, I suspect, but it's
clearly not the case now, and so s_maxbytes actually has much less coverage
than I was hoping for.
(And thus also the problems with /proc/vmcore - it never saw s_maxbytes
limits before).
Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously
means that my objections against Vasily's patch are mostly invalid. Even if
/proc does use "generic_file_llseek()" a lot and that should limit things
to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up
the offset.
I'd still prefer to limit the damage to just "vmcore".
Something like the below COMPLETELY UNTESTED patch? Vasily?
Linus
On Fri, May 18, 2018 at 09:12:56PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2018 at 8:43 PM Al Viro <[email protected]> wrote:
>
> > Not quite. The things like
> > if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> > return 0;
> > iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> > protect most of the regular files (see mm/filemap.c). And for devices
> (which is
> > where the majority of crap ->read()/->write() is) it's obviously not
> applicable -
> > ->s_maxbytes of *what*?
>
> Yeah that "s_maxbytes of what" is I think the real issue. We should never
> have made s_maxbytes be super-block specific: we should have made it be
> per-inode, and then have inode_init_always() initialize it using something
> like the file_mmap_size_max() logic.
>
> (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
> would only be used as a "fill in inode value for regular files for this
> superblock").
>
> Then we could actually protect read/write properly, because many of the
> nasty bugs have been in character device drivers.
>
> Oh well. It would still be a good thing to do some day, I suspect, but it's
> clearly not the case now, and so s_maxbytes actually has much less coverage
> than I was hoping for.
>
> (And thus also the problems with /proc/vmcore - it never saw s_maxbytes
> limits before).
>
> Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously
> means that my objections against Vasily's patch are mostly invalid. Even if
> /proc does use "generic_file_llseek()" a lot and that should limit things
> to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up
> the offset.
>
> I'd still prefer to limit the damage to just "vmcore".
>
> Something like the below COMPLETELY UNTESTED patch? Vasily?
Would work, but file_mmap_size_max first checks
if (S_ISREG(inode->i_mode))
return inode->i_sb->s_maxbytes;
before
if (file->f_mode & FMODE_UNSIGNED_OFFSET)
return 0;
so, as it is this patch does not fix the issue.
> Linus
>
> fs/proc/vmcore.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index a45f0af22a60..83278c547127 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
> }
> #endif
>
> +/* Mark vmcore as being able and willing to do 64-bit mmaps */
> +static int vmcore_open(struct inode *inode, struct file *file)
> +{
> + file->f_mode |= FMODE_UNSIGNED_OFFSET;
> + return 0;
> +}
> +
> static const struct file_operations proc_vmcore_operations = {
> + .open = vmcore_open,
> .read = read_vmcore,
> .llseek = default_llseek,
> .mmap = mmap_vmcore,
On Sat, May 19, 2018 at 4:39 AM Vasily Gorbik <[email protected]> wrote:
> Would work, but file_mmap_size_max first checks
> if (S_ISREG(inode->i_mode))
> return inode->i_sb->s_maxbytes;
> before
> if (file->f_mode & FMODE_UNSIGNED_OFFSET)
> return 0;
> so, as it is this patch does not fix the issue.
Right you are.
We could easily reverse the order of those two checks, but since clearly
the s_maxbytes case hadn't gotten as much coverage as I thought it had,
let's just simplify file_mmap_size_max() instead, and just make the limit
be MAX_LFS_FILESIZE for regular files too, the way we already do for block
devices (instead of trying to figure out the size of the block device).
That way we won't be introducing changes to s_maxbytes late in the rc
series, and we'll only affect the new logic.
Plus regular files was never what that logic really was introduced to
protect against anyway. It's the random ad-hoc mmap implementations in
drivers that it really aimed to protect.
So for now, I've committed the minimal fixlet that doesn't affect any
existing code (just the new max file size logic), and maybe this can be
revisited for 4.18.
Linus