Received: by 10.213.65.68 with SMTP id h4csp1924275imn; Thu, 29 Mar 2018 13:44:18 -0700 (PDT) X-Google-Smtp-Source: AIpwx4+pFMTFAcQmOmcGHfF9wiJRGWK1VwMP7qZug0F9jaxg5s4EDlTmgPrC873sDECQ7Iksk37z X-Received: by 2002:a17:902:740c:: with SMTP id g12-v6mr10040629pll.406.1522356258495; Thu, 29 Mar 2018 13:44:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522356258; cv=none; d=google.com; s=arc-20160816; b=Q3SbcX0swd5lTwDF5ZhYckwMrbjaA4c30D4B8+i9+EVxFPdasS35E+XfTLz31iaoxL 53oqB5AtxNNh+RCtWHQwMwKRIPChgc1o2xTsEPGltJbPyQJZTP7zogAiS18BWdNogH41 T3iQdjjLnTqhkU4/NOrKFsYZZVueFI6EfTcpe69J/jrCJ8FsWeZQAvy2A0at6yOe/Cby IrQEhRVklal6sjkRLqNRHfYqXnG14R6Nbl5vTM9zr3177gdsijDQX+eGz3ex1r1PvJO0 wY57bXsNomWABclLQvo2h18BHN/LAnRuDRgAnhHdFwbHFZaY/LN0lGKP0KrFUvntQ7xL bg5Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature :arc-authentication-results; bh=M4THepXuFer0HHCaZdH9lr3cTT/1YvyCWhObqZ2G0dA=; b=FhiwF3jPtGgeBp0LouM0jgO+HkiHKqY+Zw8X1k2kS0USpoWdDF0i6gV4whLfUKuljz Cv7mVTHIzyBBaq+0foI1yiGvFpJG+WzisQ5WZqqcAgvfXFk+zuCXl5RtzMz4XaEFDrEV gEWyhNyKGXpN6CUqBUZKz1JoQa4gHBiEj9LPUdPNNqYQcRpa/3rktC/ou4NPdoFKUKZq z5aF3XQi5k8Dzlg+pF7mmRRqrDtBZJWdnhRMFh5bm0OTnMgeHWkSpQWBxKOdi/3eX7Ic g28Ae5eg2A00ivR6E6T9lxdcvJ9/obsiZnK/mHeMNuqA2oi2h8MApKpu4uzSCzKTjBWg UG3A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2017-10-26 header.b=YadJpX6+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l1si4470934pga.536.2018.03.29.13.44.04; Thu, 29 Mar 2018 13:44:18 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2017-10-26 header.b=YadJpX6+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752197AbeC2Um0 (ORCPT + 99 others); Thu, 29 Mar 2018 16:42:26 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:44982 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750971AbeC2UmX (ORCPT ); Thu, 29 Mar 2018 16:42:23 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w2TKXDXb011198; Thu, 29 Mar 2018 20:42:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=corp-2017-10-26; bh=M4THepXuFer0HHCaZdH9lr3cTT/1YvyCWhObqZ2G0dA=; b=YadJpX6+iAxFeBHAe/K7T1E84BhY3WO5GoY0ag7YQPDoUdC7ZXGJaSdpMJOWQZw80HXc mW9jrHTzD5tZS2hIclYhoLSJt1qZbf7+6wVBKEEad2gWG3LgWPdHd+ZAlTS9Q6fR8Tvg zgjMxr71OkoggJWddlcDnRDV1TACVM5+R71zOrlfMu4HOm3XqrIkQi8oc87iGr66iwrs RuWeXCDeMjGX8C+CTJzmjyqzU7s3qHbm6VAE1Aq0BDygRZatlYwlNB6gc2zwJQT8HfoQ TsWkaYSFU0PrD3Dlr1Z7fa0SHy/dmqP8Bh0n+T8BdT4zQVEVQUyIdtVkz/Tolmrpn7MK xA== Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp2130.oracle.com with ESMTP id 2h179j01ab-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 29 Mar 2018 20:42:13 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w2TKgC8h013111 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 29 Mar 2018 20:42:12 GMT Received: from abhmp0017.oracle.com (abhmp0017.oracle.com [141.146.116.23]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w2TKgBDw001389; Thu, 29 Mar 2018 20:42:11 GMT Received: from [192.168.1.164] (/98.246.252.205) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 29 Mar 2018 13:42:11 -0700 Subject: Re: [PATCH 0/1] fix regression in hugetlbfs overflow checking To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Michal Hocko , Yisheng Xie , "Kirill A . Shutemov" , Nic Losby , Dan Rue , Andrew Morton References: <20180329041656.19691-1-mike.kravetz@oracle.com> From: Mike Kravetz Message-ID: <91db8046-0504-4851-4bdb-28bfe7d4c24f@oracle.com> Date: Thu, 29 Mar 2018 13:42:09 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <20180329041656.19691-1-mike.kravetz@oracle.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8847 signatures=668697 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1803290208 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/28/2018 09:16 PM, Mike Kravetz wrote: > Commit 63489f8e8211 ("hugetlbfs: check for pgoff value overflow") > introduced a regression in 32 bit kernels. When creating the mask > to check vm_pgoff, it incorrectly specified that the size of a loff_t > was the size of a long. This prevents mapping hugetlbfs files at > offsets greater than 4GB on 32 bit kernels. Well, the kbuild test robot found issues with that as well. :( I stepped back and did some analysis on what really needs to be checked WRT arguments causing overflow in the hugetlbfs mmap routine. For my reference more than anything, here are type sizes to be concerned with on 32 and 64 bit systems: Data or type 32 bit system 64 bit system ------------ ------------- ------------- vm_pgoff 32 64 size_t 32 64 loff_t 64 64 huge_page_index 32 64 There are three areas of concern: 1) Will the page offset value passed in vm_pgoff overflow a loff_t? On 64 bit systems, vm_off is 64 bits and loff_t is 64 bits. When hugetlbfs mmap is entered via the normal mmap path, the file offset is PAGE_SHIFT'ed right and put into vm_pgoff. However, the remap_file_pages system call allows a user to specify a page offset which is put directly into vm_pgoff. Converting from a page offset to byte offset is required to get offsets within the underlying hugetlbfs file. In both cases, the value in vm_pgoff when converted to a byte offset could overflow a loff_t. On 32 bit systems, vm_pgoff is 32 bits and loff_t is 64 bits. Therefore, when converting to a byte address we will never overflow a loff_t. In addition, on 32 bit systems it is perfectly valid for vm_pgoff to be as big as ULONG_MAX. This allows for hugetlbfs files greater than 4GB in size. 2) Does vm_pgoff when converted to bytes plus the length of the mapping overflow a loff_t? On 64 bit systems, we have validated that vm_pgoff will not overflow a loff_t when converted to bytes. But, it is still possible for this value plus length to overflow a loff_t. On 32 bit systems, vm_pgoff is a 32 bits and can be at most ULONG_MAX. Converting this value to bytes is a PAGE_SHIFT left into a 64 bit loff_t. length is a 32 bits and can be at most LONG_MAX. Adding these two values can not overflow a 64 bit loff_t. 3) Can vm_pgoff and (vm_pgoff + length) be represented as huge page offsets with a signed long? The hugetlbfs reservation management code uses longs for huge page offsets into the underlying file. On 64 bit systems, the checks in 1) and 2) have ensured that both values can be represented by a loff_t which is a signed 64 bit value. These values will be shifted right by 'huge page shift' Therefore, they can certainly be represented by a signed long. On 32 bit systems pg_off can be at most ULONG_MAX. This value will be right shifted 'huge_page_shift - PAGE_SHIFT' bits. So, as long as huge_page_shift is one or more greater than PAGE_SHIFT this value can be represented with a signed long. Adding the maximum length value (LONG_MAX) the maximum pg_off byte converted value, would result in one more significant bit being set. For example assuming PAGE_SHIFT = 12: 0x0ffffffff000 + 0x00ffffffff = 0x1000ffffefff. To represent this as a huge page index, we right shift 'huge_page_shift - PAGE_SHIFT' bits. As long as huge_page_shift is 2 or more greater than PAGE_SHIFT, this value can be represented with a signed long. If we make the following assumptions: - PAGE_SHIFT will be 2 or more less than BITS_PER_LONG - huge_page_shift will be 2 or more greater than PAGE_SHIFT then we need to make to make following checks in the code. 1) Check the upper PAGE_SHIFT+1 bits of vm_pgoff on 64 bit systems. Explicitly disable on 32 bit systems. /* * Mask used when checking the page offset value passed in via system * calls. This value will be converted to a loff_t which is signed. * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the * value. The extra bit (- 1 in the shift value) is to take the sign * bit into account. */ #define PGOFF_LOFFT_MAX \ (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1))) ... /* * page based offset in vm_pgoff could be sufficiently large to * overflow a loff_t when converted to byte offset. This can * only happen on architectures where sizeof(loff_t) == * sizeof(unsigned long). So, only check in those instances. */ if (sizeof(unsigned long) == sizeof(loff_t)) { if (vma->vm_pgoff & PGOFF_LOFFT_MAX) return -EINVAL; } 2) Convert vm_pgoff and length to loff_t byte offset. Add together and check for overflow. This is unnecessary on 32 bit systems as described above, but likely not worth the conditional code. vma_len = (loff_t)(vma->vm_end - vma->vm_start); len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); /* check for overflow */ if (len < vma_len) return -EINVAL; 3) After making the above two checks, pg_off and (pg_off + length) can be represented as huge page offsets with a signed long. So, no checks needed. I know this is long and a bore to read. However, I wanted to get some feedback before sending another patch. I have attempted to fix this several times and seem to always over look some detail. Hopefully, this will help others to think of additional concerns. -- Mike Kravetz