Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP
To: Zi Yan
Cc: Michal Hocko, Nitin Gupta, steven.sistare@oracle.com, Andrew Morton,
 Ingo Molnar, Mel Gorman, Nadav Amit, Minchan Kim, "Kirill A. Shutemov",
 Peter Zijlstra, Vegard Nossum, "Levin, Alexander", Mike Rapoport,
 Hillf Danton, Shaohua Li, Anshuman Khandual, Andrea Arcangeli,
 David Rientjes, Rik van Riel, Jan Kara, Dave Jiang, Jérôme Glisse,
 Matthew Wilcox, Ross Zwisler, Hugh Dickins, Tobin C Harding,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <1516318444-30868-1-git-send-email-nitingupta910@gmail.com>
 <20180119124957.GA6584@dhcp22.suse.cz>
 <59F98618-C49F-48A8-BCA1-A8F717888BAA@cs.rutgers.edu>
From: Nitin Gupta
Message-ID: <4d7ce874-9771-ad5f-c064-52a46fc37689@oracle.com>
Date: Thu, 25 Jan 2018 11:41:03 -0800
In-Reply-To: <59F98618-C49F-48A8-BCA1-A8F717888BAA@cs.rutgers.edu>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/24/2018 04:47 PM, Zi Yan wrote:
>>>> With this change, whenever an application issues MADV_DONTNEED on a
>>>> memory region, the region is marked as "space-efficient". For such
>>>> regions, a hugepage is not immediately allocated on first write.
>>> Kirill didn't like it in the previous version and I do not like this
>>> either. You are adding a very subtle side effect which might be
>>> completely unexpected. Consider a userspace memory allocator which uses
>>> MADV_DONTNEED to free up unused memory. Now you have put it out of THP
>>> usage basically.
>>>
>> Userspace may want a region to be considered by khugepaged while opting
>> out of hugepage allocation on first touch. Asking userspace memory
>> allocators to track and reclaim unused parts of a THP-allocated
>> hugepage does not seem right, as the kernel can use simple userspace
>> hints to avoid allocating extra memory in the first place.
>>
>> I agree that this patch is adding a subtle side effect which may take
>> some applications by surprise. However, I often see the opposite too:
>> for many workloads, disabling THP is the first advice, as this aggressive
>> allocation of hugepages on first touch is unexpected and too
>> wasteful. For example:
>>
>> 1) Disabling THP for TokuDB (Storage engine for MySQL, MariaDB)
>> http://www.chriscalender.com/disabling-transparent-hugepages-for-tokudb/
>>
>> 2) Disable THP on MongoDB
>> https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
>>
>> 3) Disable THP for Couchbase Server
>> https://blog.couchbase.com/often-overlooked-linux-os-tweaks/
>>
>> 4) Redis
>> http://antirez.com/news/84
>>
>>
>>> If the memory is used really scarce then we have MADV_NOHUGEPAGE.
>>>
>> It's not really about memory scarcity but a more efficient use of it.
>> Applications may want hugepage benefits without requiring any changes to
>> app code, which is what THP is supposed to provide, while still avoiding
>> memory bloat.
>>
> I read these links and find that there are mainly two complaints:
> 1. THP causes latency spikes, because direct compaction slows down THP allocation,
> 2. THP bloats memory footprint when jemalloc uses MADV_DONTNEED to return memory ranges smaller than
> THP size and fails because of THP.
>
> The first complaint is not related to this patch.

I'm trying to address many different THP issues and memory bloat is
first among them.

> For the second one, at least with recent kernels, MADV_DONTNEED splits THPs and returns the memory range you
> specified in madvise(). Am I missing anything?
>

Yes, MADV_DONTNEED splits THPs and releases the requested range, but this
does not solve the issue of the aggressive alloc-hugepage-on-first-touch
policy of THP=madvise on MADV_HUGEPAGE regions. Sure, some workloads may
prefer that policy, but for applications that don't, this patch gives them
an option to hint to the kernel to go for gradual hugepage promotion via
khugepaged only (and not on first touch).

It's not good if an application has to track which parts of its
(implicitly allocated) hugepage are in use and which sub-parts are free
so that it can issue MADV_DONTNEED calls on them. This approach really
does not make THP "transparent" and requires a lot of mm tracking code in
userspace.

Nitin
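
To make the bookkeeping concern above concrete, here is a rough userspace
sketch of what an allocator has to do today on a MADV_HUGEPAGE region. It
is only an illustration under stated assumptions, not code from the patch:
the helper names map_thp_region() and release_subrange() are made up, and
only mmap(), madvise(), MADV_HUGEPAGE and MADV_DONTNEED are existing
interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map an anonymous region and request THP for it. */
static void *map_thp_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Advisory hint only; it may fail (for example on kernels built
     * without THP), so the result is deliberately ignored here. */
    (void)madvise(p, len, MADV_HUGEPAGE);
#endif
    return p;
}

/* Hypothetical helper: hand an unused, page-aligned sub-range back to
 * the kernel. The caller must already know that [off, off + len) is
 * free, which is exactly the tracking burden discussed above. */
static int release_subrange(void *base, size_t off, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    assert(off % page == 0 && len % page == 0);
    return madvise((char *)base + off, len, MADV_DONTNEED);
}
```

An allocator would have to call something like release_subrange() for
every free run inside a hugepage-backed region, which means keeping
per-page in-use state for memory it never asked to have populated.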