Subject: Re: [RFC PATCH] mm: align anon mmap for THP
From: Mike Kravetz
Date: Mon, 14 Jan 2019 10:54:45 -0800
To: Steven Sistare, "Kirill A. Shutemov", linux_lkml_grp@oracle.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hugh Dickins, Michal Hocko, Dan Williams, Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton
Message-ID: <50c6abdc-b906-d16a-2f8f-8647b3d129aa@oracle.com>
References: <20190111201003.19755-1-mike.kravetz@oracle.com> <20190111215506.jmp2s5end2vlzhvb@black.fi.intel.com>

On 1/14/19 7:35 AM, Steven Sistare wrote:
> On 1/11/2019 6:28 PM, Mike Kravetz wrote:
>> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
>>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>>>> to get an address returned by mmap() suitably aligned for THP. It seems
>>>> that if mmap is asking for a mapping length greater than huge page
>>>> size, it should align the returned address to huge page size.
>
> A better heuristic would be to return an aligned address if the length
> is a multiple of the huge page size. The gap (if any) between the end of
> the previous VMA and the start of this VMA would be filled by subsequent
> smaller mmap requests. The new behavior would need to become part of the
> mmap interface definition so apps can rely on it and omit their hoop-jumping
> code.

Yes, the heuristic really should be 'length is a multiple of the huge page
size'. As you mention, this would still leave gaps. I need to look closer,
but this may not be any worse than the trick of mapping an area with a
rounded-up length and then unmapping pages at the beginning.

When I sent this out, the thought in the back of my mind was that this
doesn't really matter unless there is some type of alignment guarantee.
Otherwise, user space code needs to continue employing its own code to
check/force alignment. Making matters somewhat worse, I do not believe
there is a C interface to query the huge page size. I thought there was
discussion about adding one, but I cannot find it.

> Personally I would like to see a new MAP_ALIGN flag and treat the addr
> argument as the alignment (like Solaris), but I am told that adding flags
> is problematic because old kernels accept undefined flag bits from userland
> without complaint, so their behavior would change.

Well, a flag would clearly define the desired behavior. As others have
mentioned, there are mechanisms in place that allow user space code to get
the alignment it wants. However, this comes at the expense of an additional
system call or two.
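For reference, here is a minimal user space sketch of those 'hoops', in C.
It assumes Linux with glibc and reads the THP size from
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size, falling back to 2 MiB
if that file cannot be read. The helper names (thp_size,
wants_thp_alignment, mmap_aligned, thp_mmap) are hypothetical, not an
existing library API. It over-allocates by one huge page and trims the
excess with munmap, which is where the extra system calls come from, and
then opts in with madvise(MADV_HUGEPAGE).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: PMD-sized THP size as reported by sysfs, or
 * 2 MiB (the x86-64 value) if the file cannot be read. */
static size_t thp_size(void)
{
    unsigned long sz = 0;
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", "r");

    if (f) {
        if (fscanf(f, "%lu", &sz) != 1)
            sz = 0;
        fclose(f);
    }
    return sz ? (size_t)sz : (size_t)2 << 20;
}

/* The heuristic from this thread: align only when the length is a
 * nonzero multiple of the huge page size. */
static int wants_thp_alignment(size_t len, size_t hp)
{
    return len != 0 && len % hp == 0;
}

/* Map len + align bytes, then unmap the unaligned head and the unused
 * tail so the surviving range [ret, ret + len) starts on an 'align'
 * boundary. 'align' must be a power of two and a multiple of the base
 * page size; 'len' must be a multiple of the base page size. */
static void *mmap_aligned(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t maplen = len + align;
    char *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    uintptr_t start = (uintptr_t)p;
    uintptr_t aligned = (start + align - 1) & ~((uintptr_t)align - 1);
    size_t head = aligned - start;   /* bytes before the boundary */
    size_t tail = align - head;      /* bytes past aligned + len */

    if (head)
        munmap(p, head);                     /* extra system call #1 */
    if (tail)
        munmap((char *)aligned + len, tail); /* extra system call #2 */
    return (void *)aligned;
}

/* Full user space sequence: align if the heuristic says so, then opt
 * in with MADV_HUGEPAGE, which THP "madvise" mode requires. */
static void *thp_mmap(size_t len)
{
    size_t hp = thp_size();
    void *p;

    if (wants_thp_alignment(len, hp)) {
        p = mmap_aligned(len, hp);
    } else {
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            p = NULL;
    }
    /* A failure here (e.g. EINVAL without CONFIG_TRANSPARENT_HUGEPAGE)
     * only means no THP; the mapping itself is still usable. */
    if (p)
        (void)madvise(p, len, MADV_HUGEPAGE);
    return p;
}
```

With this, thp_mmap(4 * thp_size()) returns a region aligned to the huge
page size. The two munmap calls are the overhead that an in-kernel
alignment policy or a MAP_ALIGN-style flag would remove; the madvise call
would still be needed when THP is in madvise mode.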
Perhaps the question is, "Is it worth defining new behavior to eliminate
this overhead?"

One other thing to consider is that at mmap time, we likely do not know if
the vma will/can use THP. We would know if the system-wide THP configuration
is set to never or always. However, I 'think' the default for most distros
is madvise. Therefore, it is not until a subsequent madvise call that we
know THP will be employed. If the application will need to make this
separate madvise call anyway, then perhaps it is not too much to expect
that it take explicit action to optimally align the mapping.

-- 
Mike Kravetz