Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753653AbdDLI6O (ORCPT ); Wed, 12 Apr 2017 04:58:14 -0400 Received: from mail-wr0-f196.google.com ([209.85.128.196]:35797 "EHLO mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753045AbdDLI6L (ORCPT ); Wed, 12 Apr 2017 04:58:11 -0400 MIME-Version: 1.0 In-Reply-To: <1491951118-30678-1-git-send-email-mike.kravetz@oracle.com> References: <1491951118-30678-1-git-send-email-mike.kravetz@oracle.com> From: Vegard Nossum Date: Wed, 12 Apr 2017 10:58:08 +0200 Message-ID: Subject: Re: [PATCH] hugetlbfs: fix offset overflow in huegtlbfs mmap To: Mike Kravetz Cc: Linux Memory Management List , LKML , Dmitry Vyukov , Hillf Danton , Michal Hocko , "Kirill A . Shutemov" , Andrey Ryabinin , Naoya Horiguchi , Andrew Morton Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4221 Lines: 112 On 12 April 2017 at 00:51, Mike Kravetz wrote: > If mmap() maps a file, it can be passed an offset into the file at > which the mapping is to start. Offset could be a negative value when > represented as a loff_t. The offset plus length will be used to > update the file size (i_size) which is also a loff_t. Validate the > value of offset and offset + length to make sure they do not overflow > and appear as negative. > > Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call > region_abort if region_chg fails") applied. Prior to this commit, the > overflow would still occur but we would luckily return ENOMEM. > To reproduce: > mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL); > > Resulted in, > kernel BUG at mm/hugetlb.c:742! > Call Trace: > hugetlbfs_evict_inode+0x80/0xa0 > ? hugetlbfs_setattr+0x3c0/0x3c0 > evict+0x24a/0x620 > iput+0x48f/0x8c0 > dentry_unlink_inode+0x31f/0x4d0 > __dentry_kill+0x292/0x5e0 > dput+0x730/0x830 > __fput+0x438/0x720 > ____fput+0x1a/0x20 > task_work_run+0xfe/0x180 > exit_to_usermode_loop+0x133/0x150 > syscall_return_slowpath+0x184/0x1c0 > entry_SYSCALL_64_fastpath+0xab/0xad > > Reported-by: Vegard Nossum Please use if possible :-) > Signed-off-by: Mike Kravetz > --- > fs/hugetlbfs/inode.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > index 7163fe0..dde8613 100644 > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -136,17 +136,26 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) > vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND; > vma->vm_ops = &hugetlb_vm_ops; > > + /* > + * Offset passed to mmap (before page shift) could have been > + * negative when represented as a (l)off_t. > + */ > + if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0) > + return -EINVAL; > + This is strictly speaking undefined behaviour in C and would get flagged by e.g. UBSAN. The kernel does compile with -fno-strict-overflow when supported, though, so maybe it's more of a theoretical issue. Another thing: wouldn't we want to detect all truncations, not just the ones that happen to end up negative? For example (with -fno-strict-overflow), (0x12345678 << 12) == 0x45678000, which is still a positive integer, but obviously truncated. We can easily avoid the UB by moving the cast out (since ->vm_pgoff is unsigned and unsigned shifts are always defined IIRC), but that still doesn't reliably detect the positive-result truncation/overflow. > if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT)) > return -EINVAL; > > vma_len = (loff_t)(vma->vm_end - vma->vm_start); > + len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); > + /* check for overflow */ > + if (len < vma_len) > + return -EINVAL; Also strictly speaking UB. You can avoid it by casting vma_len to unsigned and dropping the loff_t cast, but it's admittedly somewhat verbose. There also isn't an "unsigned loff_t" AFAIK, but don't we have some helpers to safely check for overflows? Surely this isn't the only place that does loff_t arithmetic. > > inode_lock(inode); > file_accessed(file); > > ret = -ENOMEM; > - len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); > - > if (hugetlb_reserve_pages(inode, > vma->vm_pgoff >> huge_page_order(h), > len >> huge_page_shift(h), vma, > @@ -155,7 +164,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) > > ret = 0; > if (vma->vm_flags & VM_WRITE && inode->i_size < len) > - inode->i_size = len; > + i_size_write(inode, len); > out: > inode_unlock(inode); This hunk seems a bit out of place in the sense that I don't see how it relates to the overflow checking. I think this either belongs in a separate patch or it deserves a mention in the changelog. Vegard