Subject: Re: [RFC PATCH] mm: align anon mmap for THP
To: Mike Kravetz, "Kirill A. Shutemov", linux_lkml_grp@oracle.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hugh Dickins, Michal Hocko, Dan Williams, Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton
References: <20190111201003.19755-1-mike.kravetz@oracle.com> <20190111215506.jmp2s5end2vlzhvb@black.fi.intel.com> <50c6abdc-b906-d16a-2f8f-8647b3d129aa@oracle.com>
From: Steven Sistare
Organization: Oracle Corporation
Message-ID: <7d1ccbc3-7dad-99de-1b15-77bb1196f9a3@oracle.com>
Date: Mon, 14 Jan 2019 14:26:26 -0500
In-Reply-To: <50c6abdc-b906-d16a-2f8f-8647b3d129aa@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/14/2019 1:54 PM, Mike Kravetz wrote:
> On 1/14/19 7:35 AM, Steven Sistare wrote:
>> On 1/11/2019 6:28 PM, Mike Kravetz wrote:
>>> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
>>>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>>>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>>>>> to get an address returned by mmap() suitably aligned for THP. It seems
>>>>> that if mmap is asking for a mapping length greater than huge page
>>>>> size, it should align the returned address to huge page size.
>>
>> A better heuristic would be to return an aligned address if the length
>> is a multiple of the huge page size. The gap (if any) between the end of
>> the previous VMA and the start of this VMA would be filled by subsequent
>> smaller mmap requests. The new behavior would need to become part of the
>> mmap interface definition so apps can rely on it and omit their hoop-jumping
>> code.
>
> Yes, the heuristic really should be 'length is a multiple of the huge page
> size'. As you mention, this would still leave gaps. I need to look closer
> but this may not be any worse than the trick of mapping an area with rounded
> up length and then unmapping pages at the beginning.
>
> When I sent this out, the thought in the back of my mind was that this doesn't
> really matter unless there is some type of alignment guarantee. Otherwise,
> user space code needs to continue employing their code to check/force alignment.
> Making matters somewhat worse is that I do not believe there is a C interface to
> query huge page size. I thought there was discussion about adding one, but I
> cannot find it.

Right. Solaris provides getpagesizes().

>> Personally I would like to see a new MAP_ALIGN flag and treat the addr
>> argument as the alignment (like Solaris), but I am told that adding flags
>> is problematic because old kernels accept undefined flag bits from userland
>> without complaint, so their behavior would change.
>
> Well, a flag would clearly define desired behavior.
>
> As others have mentioned, there are mechanisms in place that allow user
> space code to get the alignment it wants. However, it is at the expense of
> an additional system call or two. Perhaps the question is, "Is it worth
> defining new behavior to eliminate this overhead?".
>
> One other thing to consider is that at mmap time, we likely do not know if
> the vma will/can use THP. We would know if system wide THP configuration
> is set to never or always. However, I 'think' the default for most distros
> is madvise. Therefore, it is not until a subsequent madvise call that we
> know THP will be employed. If the application code will need to make this
> separate madvise call, then perhaps it is not too much to expect that it
> take explicit action to optimally align the mapping.

True. It is annoying to write the extra code, but the power user will do it.
The heuristic alignment would primarily benefit applications that are not as
carefully optimized.

- Steve
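
For readers unfamiliar with the "hoop-jumping" discussed in this thread, the
following is a minimal user-space sketch of the over-map-and-trim trick,
followed by the separate madvise() call. It is an illustration only, not code
from the patch or from any participant. The helper name mmap_aligned is
hypothetical, and the sketch assumes Linux, that len and align are both
multiples of the base page size, and that align is a power of two (for
example 2 MiB, a common THP size on x86-64, which should be checked on the
target system rather than assumed).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not a library or kernel API): return an anonymous
 * private mapping of len bytes whose start address is a multiple of align.
 * Assumes align is a power of two and that len and align are multiples of
 * the base page size. Returns MAP_FAILED on error.
 */
static void *mmap_aligned(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate by align so an aligned start always fits inside. */
    size_t padded = len + align;
    if (padded < len)
        return MAP_FAILED;              /* size_t overflow */

    char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return MAP_FAILED;

    /* Round the start address up to the next multiple of align. */
    uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    size_t head = start - (uintptr_t)raw;   /* slack before the aligned start */
    size_t tail = padded - head - len;      /* slack after the aligned region */

    /* Trim the slack: these are the extra system calls the thread mentions. */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)start + len, tail);

#ifdef MADV_HUGEPAGE
    /* Advisory only; failure (e.g. THP not configured) is ignored here. */
    (void)madvise((void *)start, len, MADV_HUGEPAGE);
#endif
    return (void *)start;
}
```

A caller would use it as `void *p = mmap_aligned(4 * align, align);` and later
release the region with `munmap(p, 4 * align)`. On kernels that expose it, the
THP size can reportedly be read from
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size instead of hard-coding it;
treat that path as an assumption to verify on your system.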