Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP
From: Nitin Gupta
Date: Wed, 31 Jan 2018 17:09:48 -0800
To: Mel Gorman
Cc: Zi Yan, Michal Hocko, Nitin Gupta, steven.sistare@oracle.com,
    Andrew Morton, Ingo Molnar, Nadav Amit, Minchan Kim,
    "Kirill A. Shutemov", Peter Zijlstra, Vegard Nossum,
    "Levin, Alexander", Mike Rapoport, Hillf Danton, Shaohua Li,
    Anshuman Khandual, Andrea Arcangeli, David Rientjes, Rik van Riel,
    Jan Kara, Dave Jiang, Jérôme Glisse, Matthew Wilcox, Ross Zwisler,
    Hugh Dickins, Tobin C Harding, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
In-Reply-To: <20180125211303.rbfeg7ultwr6hpd3@suse.de>
References: <1516318444-30868-1-git-send-email-nitingupta910@gmail.com>
    <20180119124957.GA6584@dhcp22.suse.cz>
    <59F98618-C49F-48A8-BCA1-A8F717888BAA@cs.rutgers.edu>
    <4d7ce874-9771-ad5f-c064-52a46fc37689@oracle.com>
    <20180125211303.rbfeg7ultwr6hpd3@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/25/2018 01:13 PM, Mel Gorman wrote:
> On Thu, Jan 25, 2018 at 11:41:03AM -0800, Nitin Gupta wrote:
>>>> It's not really about memory scarcity but a more efficient use of it.
>>>> Applications may want hugepage benefits without requiring any changes to
>>>> app code which is what THP is supposed to provide, while still avoiding
>>>> memory bloat.
>>>>
>>> I read these links and find that there are mainly two complaints:
>>> 1. THP causes latency spikes, because direct compaction slows down THP allocation,
>>> 2. THP bloats memory footprint when jemalloc uses MADV_DONTNEED to return memory ranges smaller than
>>>    THP size and fails because of THP.
>>>
>>> The first complaint is not related to this patch.
>>
>> I'm trying to address many different THP issues and memory bloat is
>> first among them.
>
> Expecting userspace to get this right is probably going to go sideways.
> It'll be screwed up and be sub-optimal or have odd semantics for existing
> madvise flags. The fact is that an application may not even know if it's
> going to be sparsely using memory in advance if it's a computation load
> modelling from unknown input data.
>
> I suggest you read the old Talluri paper "Surpassing the TLB Performance
> of Superpages with Less Operating System Support" and pay attention to
> Section 4. There it discusses a page reservation scheme whereby on fault
> a naturally aligned set of base pages is reserved and only one correctly
> placed base page is inserted at the faulting address. It was tied into
> a hypothetical piece of hardware that doesn't exist to give best-effort
> support for superpages so it does not directly help you but the initial
> idea is sound. There are holes in the paper from today's perspective but
> it was written in the 90's.
>
> From there, read "Transparent operating system support for superpages"
> by Navarro, particularly chapter 4 paying attention to the parts where
> it talks about opportunism and promotion threshold.
>
> Superficially, it goes like this
>
> 1. On fault, reserve a THP in the allocator and use one base page that
>    is correctly-aligned for the faulting address. By correctly-aligned,
>    I mean that you use a base page whose offset would be naturally contiguous
>    if it ever was part of a huge page.
> 2. On subsequent faults, attempt to use a base page that is naturally
>    aligned to be a THP
> 3. When a "threshold" of base pages are inserted, allocate the remaining
>    pages and promote it to a THP
> 4. If there is memory pressure, spill "reserved" pages into the main
>    allocation pool and lose the opportunity to promote (which will need
>    khugepaged to recover)
>
> By definition, a promotion threshold of 1 would be the existing scheme
> of allocating a THP on the first fault and some users will want that. It
> also should be the default to avoid unexpected overhead. For workloads
> where memory is being sparsely addressed and the increased overhead of
> THP is unwelcome then the threshold should be tuned higher with a maximum
> possible value of HPAGE_PMD_NR.
>
> It's non-trivial to do this because at minimum a page fault has to check
> if there is a potential promotion candidate by checking the PTEs around
> the faulting address searching for a correctly-aligned base page that is
> already inserted. If there is, then check if the correctly aligned base
> page for the current faulting address is free and if so use it. It'll
> also then need to check the remaining PTEs to see if the promotion
> threshold has been reached and if so, promote it to a THP (or else teach
> khugepaged to do an in-place promotion if possible). In other words,
> implementing the promotion threshold is both hard and not free.
>
> However, if it did exist then the only tunable would be the "promotion
> threshold" and applications would not need any special awareness of their
> address space.
>

I went through both references you mentioned and I really like the idea of
reservation-based hugepage allocation. Navarro also extends the idea to allow
multiple hugepage sizes to be used (as supported by the underlying hardware),
which was next in order of what I wanted to do in THP.

So, please ignore this patch; I will work towards implementing the ideas in
these papers.

Thanks for the feedback.

Nitin
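
For illustration only, the four-step reservation scheme quoted above can be modelled as a small userspace simulation. This is a rough sketch under stated assumptions, not kernel code and not the patch discussed in this thread: the `Region` class, its method names, and the returned strings are all hypothetical, and HPAGE_PMD_NR is assumed to be 512 (x86-64 with 4 KiB base pages and 2 MiB huge pages). The model tracks a single naturally aligned huge-page-sized region and identifies each base page by its offset within that region, so "correctly aligned" placement is implicit.

```python
# Assumption: 512 base pages per PMD-sized huge page (x86-64, 4 KiB pages).
HPAGE_PMD_NR = 512


class Region:
    """Hypothetical model of one naturally aligned huge-page-sized region."""

    def __init__(self, promotion_threshold=1):
        if not 1 <= promotion_threshold <= HPAGE_PMD_NR:
            raise ValueError("threshold must be in 1..HPAGE_PMD_NR")
        self.promotion_threshold = promotion_threshold
        self.mapped = set()    # offsets of base pages already inserted
        self.reserved = False  # True while a huge-page reservation is held
        self.spilled = False   # True once memory pressure took the reservation
        self.promoted = False  # True once the region is backed by a huge page

    def fault(self, offset):
        """Handle a page fault at base-page `offset` within the region."""
        if not 0 <= offset < HPAGE_PMD_NR:
            raise ValueError("offset outside the region")
        if self.promoted or offset in self.mapped:
            return "already-mapped"
        # Step 1: the first fault reserves the whole aligned region.
        if not self.mapped and not self.spilled:
            self.reserved = True
        # Steps 1 and 2: insert only the base page for this offset.
        self.mapped.add(offset)
        # Step 3: at the threshold, fill in the rest and promote.
        if self.reserved and len(self.mapped) >= self.promotion_threshold:
            self.mapped = set(range(HPAGE_PMD_NR))
            self.reserved = False
            self.promoted = True
            return "promoted"
        return "base-page"

    def memory_pressure(self):
        """Step 4: give unused reserved pages back; promotion is lost."""
        if self.reserved:
            self.reserved = False
            self.spilled = True

    def resident_pages(self):
        """Base pages actually backing the region."""
        return len(self.mapped)


if __name__ == "__main__":
    sparse = Region(promotion_threshold=64)
    for off in (0, 100, 200):
        sparse.fault(off)
    print(sparse.promoted, sparse.resident_pages())
```

In this model a threshold of 1 reproduces the existing behaviour (the first fault promotes and all 512 pages become resident), while a higher threshold keeps a sparsely touched region at a handful of base pages. The model deliberately omits the PTE scan that the quoted text identifies as the expensive part of a real implementation.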