Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP
To: Michal Hocko, Nitin Gupta
Cc: steven.sistare@oracle.com, Andrew Morton, Ingo Molnar, Mel Gorman, Nadav Amit, Minchan Kim, "Kirill A. Shutemov", Peter Zijlstra, Vegard Nossum, "Levin, Alexander (Sasha Levin)", Mike Rapoport, Hillf Danton, Shaohua Li, Anshuman Khandual, Andrea Arcangeli, David Rientjes, Rik van Riel, Jan Kara, Dave Jiang, Jérôme Glisse, Matthew Wilcox, Ross Zwisler, Hugh Dickins, Tobin C Harding, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <1516318444-30868-1-git-send-email-nitingupta910@gmail.com> <20180119124957.GA6584@dhcp22.suse.cz>
From: Nitin Gupta
Date: Fri, 19 Jan 2018 12:59:17 -0800
In-Reply-To: <20180119124957.GA6584@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/19/18 4:49 AM, Michal Hocko wrote:
> On Thu 18-01-18 15:33:16, Nitin Gupta wrote:
>> From: Nitin Gupta
>>
>> Currently, if the THP enabled policy is "always", or the mode
>> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
>> is allocated on a page fault if the pud or pmd is empty. This
>> yields the best VA translation performance, but increases memory
>> consumption if some small page ranges within the huge page are
>> never accessed.
>
> Yes, this is true but hardly unexpected for MADV_HUGEPAGE or THP always
> users.
>

Yes, allocating a hugepage on first touch is the current behavior in the
two cases above. However, I see problems with it. First, THP=always is
often too aggressive and wasteful to be useful for realistic workloads.
Second, with THP=madvise, users may want to back only the active parts of
a memory region with hugepages while avoiding aggressive hugepage
allocation on first touch; or they may really want the current behavior.
There is currently no way to choose between the two. With this patch,
users would be able to pick the behavior they want by passing hints to
the kernel through madvise() calls with MADV_HUGEPAGE and MADV_DONTNEED.

>> An alternate behavior for such page faults is to install a
>> hugepage only when a region is actually found to be (almost)
>> fully mapped and active. This is a compromise between
>> translation performance and memory consumption. Currently there
>> is no way for an application to choose this compromise for the
>> page fault conditions above.
>
> Is that really true? We have /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
> This is not reflected during the PF of course but you can control the
> behavior there as well. Either by the global setting or a per proces
> prctl.
>

I think this part of the patch description needs some rewording: this
patch changes *only* the page fault behavior.
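The khugepaged knob Michal points to can be inspected from userspace. The C sketch below is illustrative only: it assumes a Linux system built with THP and sysfs mounted at /sys, and the helper names (read_first_line, active_thp_mode, within_max_ptes_none, show_thp_settings) are hypothetical. It reads the active policy from /sys/kernel/mm/transparent_hugepage/enabled and the max_ptes_none value, and models the max_ptes_none test in simplified form: a PMD-sized range (512 PTEs with 4 KiB base pages and 2 MiB hugepages, an x86-64 assumption) remains a collapse candidate only while its number of empty PTEs does not exceed max_ptes_none. The model ignores khugepaged's other checks (swap entries, shared pages, page references, and so on).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Real sysfs paths; reading them assumes a kernel built with THP. */
#define THP_ENABLED_PATH "/sys/kernel/mm/transparent_hugepage/enabled"
#define MAX_PTES_NONE_PATH \
    "/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none"

/* Read the first line of a file into buf with the newline stripped.
 * Returns 0 on success, -1 if the file is missing or empty. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* The "enabled" file lists every policy and brackets the active one,
 * e.g. "always [madvise] never". Copy the bracketed word into out.
 * Returns 0 on success, -1 if no bracketed word fits in out. */
static int active_thp_mode(const char *line, char *out, size_t len)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open + 1, ']') : NULL;
    if (!close)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n >= len)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Simplified model of the max_ptes_none test: a PMD-sized range with
 * `present` populated PTEs out of `ptes_per_pmd` stays a collapse
 * candidate only if its empty PTEs do not exceed max_ptes_none. */
static int within_max_ptes_none(int present, int ptes_per_pmd,
                                int max_ptes_none)
{
    return (ptes_per_pmd - present) <= max_ptes_none;
}

/* Print the active THP policy and max_ptes_none, if they are readable. */
static void show_thp_settings(void)
{
    char line[128], mode[16];

    if (read_first_line(THP_ENABLED_PATH, line, sizeof line) == 0 &&
        active_thp_mode(line, mode, sizeof mode) == 0)
        printf("THP enabled policy: %s\n", mode);
    if (read_first_line(MAX_PTES_NONE_PATH, line, sizeof line) == 0)
        printf("khugepaged max_ptes_none: %d\n", atoi(line));
}
```

What show_thp_settings() prints depends entirely on the machine's configuration. Note that this knob only affects khugepaged's later collapse decisions, not what happens at page fault time.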
Once pages are installed, khugepaged does its job as usual, using
max_ptes_none and other config values. I'm not trying to change any
khugepaged behavior here.

>> With this change, whenever an application issues MADV_DONTNEED on a
>> memory region, the region is marked as "space-efficient". For such
>> regions, a hugepage is not immediately allocated on first write.
>
> Kirill didn't like it in the previous version and I do not like this
> either. You are adding a very subtle side effect which might completely
> unexpected. Consider userspace memory allocator which uses MADV_DONTNEED
> to free up unused memory. Now you have put it out of THP usage
> basically.
>

Userspace may want a region to be considered by khugepaged while opting
out of hugepage allocation on first touch. Requiring userspace memory
allocators to track and reclaim the unused parts of a hugepage allocated
by THP does not seem right when the kernel can use simple userspace hints
to avoid allocating the extra memory in the first place.

I agree that this patch adds a subtle side effect which may take some
applications by surprise. However, I often see the opposite too: for many
workloads, disabling THP is the first advice given, because the aggressive
allocation of hugepages on first touch is unexpected and too wasteful.
For example:

1) Disabling THP for TokuDB (storage engine for MySQL, MariaDB)
http://www.chriscalender.com/disabling-transparent-hugepages-for-tokudb/

2) Disabling THP for MongoDB
https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/

3) Disabling THP for Couchbase Server
https://blog.couchbase.com/often-overlooked-linux-os-tweaks/

4) Redis
http://antirez.com/news/84

> If the memory is used really scarce then we have MADV_NOHUGEPAGE.
>

It's not really about memory scarcity but about using memory more
efficiently. Applications may want the benefits of hugepages without any
changes to their code, which is what THP is supposed to provide, while
still avoiding memory bloat.

-Nitin
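The allocator pattern discussed above (hint a region with MADV_HUGEPAGE, later give part of it back with MADV_DONTNEED) can be sketched in C. This is an illustration under stated assumptions, not code from the patch: it assumes PMD-sized hugepages are 2 MiB as on x86-64, and map_hpage_aligned and hint_then_release are hypothetical helper names. On a mainline kernel the MADV_DONTNEED call simply frees the sub-range. Under the v2 patch being discussed, the same call would additionally mark the region "space-efficient", so that first writes there would no longer immediately get a hugepage. That side effect is exactly what Michal objects to for allocators like this one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: PMD-sized hugepages are 2 MiB, as on x86-64. */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/* Map `len` bytes of private anonymous memory starting on a HPAGE_SIZE
 * boundary so a PMD-sized hugepage could back it. Returns NULL on failure. */
static void *map_hpage_aligned(size_t len)
{
    /* The munmap() calls below need page-aligned lengths. */
    assert(len % (size_t)sysconf(_SC_PAGESIZE) == 0);

    size_t span = len + HPAGE_SIZE;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
    size_t head = start - (uintptr_t)raw;
    size_t tail = span - head - len;

    /* Trim the unaligned head and the unused tail of the oversized mapping. */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)start + len, tail);
    return (void *)start;
}

/* Hint that the whole region should use hugepages, then give back
 * [off, off + drop_len) the way a userspace allocator frees memory.
 * Returns 0 on success, -1 with errno set otherwise. */
static int hint_then_release(void *addr, size_t len, size_t off,
                             size_t drop_len)
{
    /* MADV_HUGEPAGE fails with EINVAL on kernels built without THP;
     * the hint is optional, so treat that case as non-fatal. */
    if (madvise(addr, len, MADV_HUGEPAGE) != 0 && errno != EINVAL)
        return -1;
    return madvise((char *)addr + off, drop_len, MADV_DONTNEED);
}
```

After hint_then_release() on a private anonymous mapping, the dropped sub-range reads back as zero on the next access. The rest of the region keeps its contents, even if a hugepage was backing it, because the kernel splits the huge mapping before zapping the partial range.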