Received: by 2002:a25:7ec1:0:0:0:0:0 with SMTP id z184csp2646335ybc; Mon, 25 Nov 2019 01:41:31 -0800 (PST) X-Google-Smtp-Source: APXvYqxJ2ule7kh6NWLcDsMD4Wl0ad3EEf8f91jv/bgod8ZPQKttO+TyElCjXM8gX7/uvCnYtViU X-Received: by 2002:a17:906:c7d0:: with SMTP id dc16mr36300493ejb.73.1574674890969; Mon, 25 Nov 2019 01:41:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1574674890; cv=none; d=google.com; s=arc-20160816; b=pMCDgDWTmZ/3VPjYcbOds7/FBddH5J9mwsoW1Yz20iYIyQ+ZjP0Sr51Sw01VzQtg8P 3TKkGI2R2yMpjzNceAVnZAlZHVWp/dF4zjEr1MGfWSGunXvWMUEaoBumR8ueEF9ZOpYJ Yu5HzBqPWQ8rV4Y2hiDo9W4PB/Hc03VdK/8g19QP12mGMBjnuCZ2BgwcFXddPkyKe3RX De+UrT73gJ1OEhLRsCk52HaivXzf5DZgmqAWFnYltvMCo0Bf3fUgkVL5AbKqyHa5GTh7 CCER9nU0liyq9Hr0jn0qBH0i75rl+0xP6uoSJBSm4Zq6j1Zkls828TdKvOW8uAXqnSpC VMFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Lltk+ABKwMzN4Q3qE+TFYH7hNAzWoOTyrBpQNHINGWg=; b=hbqr2sG1P9qtBBMtBrYAsdh3iTnQuV71w+6NNz8vE33+gmuJsluOzPkHZwRVRzAaX1 lWiquLl8YrpqIdvbjraveCpFot9G3dE5JMqaxYsNm8yjGcjGvPVMLbSqSDft23rThbCc I1mjeiqLnX6yf05TPTkqAT5znVKQlgHHePgzfeHSBMseeE7dM9brCn4jkcrfmEXZ4mG0 e1PC8nPjFWYH80vkw/8g7VyK6ICOS8gmAkYnT64/DNs/R+rjXRIeQqL1F+smnFAYiDUd JdLzxq4YcxkN6h+JzXehXN2tosxKUr9xYXrIj3V9qNcRjGBi6/Wi0JFDZQnIk6Y2V7ij W3vQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@shutemov-name.20150623.gappssmtp.com header.s=20150623 header.b=pOGsDGS1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id lw2si4094330ejb.440.2019.11.25.01.41.05; Mon, 25 Nov 2019 01:41:30 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@shutemov-name.20150623.gappssmtp.com header.s=20150623 header.b=pOGsDGS1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727372AbfKYJgF (ORCPT + 99 others); Mon, 25 Nov 2019 04:36:05 -0500 Received: from mail-lj1-f195.google.com ([209.85.208.195]:39694 "EHLO mail-lj1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727247AbfKYJgF (ORCPT ); Mon, 25 Nov 2019 04:36:05 -0500 Received: by mail-lj1-f195.google.com with SMTP id e10so5827217ljj.6 for ; Mon, 25 Nov 2019 01:36:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=Lltk+ABKwMzN4Q3qE+TFYH7hNAzWoOTyrBpQNHINGWg=; b=pOGsDGS1zDV3Ji0L/1c0M+QXhoBqekXbOPlVWITZTZtMhTIefNFnDfdvq1VmiXPfbf teEowcVmzNnApS8P56bu43c43VqftJ+caG5ytLqALGFHSUZCOIRlV1A8/+RVEIBwuAEI cu/kBnw3wE2btTRTTWCz9VSZjeRAOeuskHgL4D1BbzdpssJDQyIeRK3FaZEIlFgoGOIm nzKJABybFIShwvGdy9JUFpJioZ0tQ5huegi1TvIpCsQUuX2J5pKSTnZHluPLlqg2Rs8j EL12ZUlwcre5plySoqzI3zcopFpHXNFi1WncaBkmxw0V9KkOWTGLJ/yCYwkp7e/3wyJ0 w27Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=Lltk+ABKwMzN4Q3qE+TFYH7hNAzWoOTyrBpQNHINGWg=; b=ml6SJW77wGMi81tYf6Jqlxs6zZSYHCPN6WUfgl06vAPoLqoot0mBPfuVf4kQOF6Iq/ oQnXI6KZc71O/QGLYSR5/sz1vsEMdHFzO+HDfocZIqhfXqhtziI6xlMfniXPxcj5xtPv QN1g1Za/YLSj9Kn6diODZmAWlaO2bYFVclHd/CRspVaunGIcJwJXLeHG7c5Fc2yQ493+ QcIY/JneqxMGnuJf8m5f/SJxFK57sF+ExvvXc0SGOAEWVNM6EbBhz8jRTwZXA8LoBGp0 wIFhjQAYPPPgg4yq19n5c8TsHtLAcDr+odRKx9yZG1ylsAZyc/yeTsBs/02yns6z38Le swMQ== X-Gm-Message-State: APjAAAWvkcUz3hECstuFT/uAjZf7zRyen0r3Pi1irgUX/cn7vtgbW2qO PYCgSR8eBDDE/uLNJEkSz/J2Uw== X-Received: by 2002:a2e:84d0:: with SMTP id q16mr10206471ljh.48.1574674563554; Mon, 25 Nov 2019 01:36:03 -0800 (PST) Received: from box.localdomain ([86.57.175.117]) by smtp.gmail.com with ESMTPSA id u67sm3581116lja.78.2019.11.25.01.36.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Nov 2019 01:36:02 -0800 (PST) Received: by box.localdomain (Postfix, from userid 1000) id BBB581032C4; Mon, 25 Nov 2019 12:36:11 +0300 (+03) Date: Mon, 25 Nov 2019 12:36:11 +0300 From: "Kirill A. Shutemov" To: Yang Shi Cc: hughd@google.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH] mm: shmem: allow split THP when truncating THP partially Message-ID: <20191125093611.hlamtyo4hvefwibi@box> References: <1574471132-55639-1-git-send-email-yang.shi@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1574471132-55639-1-git-send-email-yang.shi@linux.alibaba.com> User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Nov 23, 2019 at 09:05:32AM +0800, Yang Shi wrote: > Currently when truncating shmem file, if the range is partial of THP > (start or end is in the middle of THP), the pages actually will just get > cleared rather than being freed unless the range cover the whole THP. > Even though all the subpages are truncated (randomly or sequentially), > the THP may still be kept in page cache. This might be fine for some > usecases which prefer preserving THP. > > But, when doing balloon inflation in QEMU, QEMU actually does hole punch > or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used. > So, when using shmem THP as memory backend QEMU inflation actually doesn't > work as expected since it doesn't free memory. But, the inflation > usecase really needs get the memory freed. Anonymous THP will not get > freed right away too but it will be freed eventually when all subpages are > unmapped, but shmem THP would still stay in page cache. > > To protect the usecases which may prefer preserving THP, introduce a > new fallocate mode: FALLOC_FL_SPLIT_HPAGE, which means spltting THP is > preferred behavior if truncating partial THP. This mode just makes > sense to tmpfs for the time being. We need to clarify interaction with khugepaged. This implementation doesn't do anything to prevent khugepaged from collapsing the range back to THP just after the split. > @@ -976,8 +1022,31 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, > } > unlock_page(page); > } > +rescan_split: > pagevec_remove_exceptionals(&pvec); > pagevec_release(&pvec); > + > + if (split && PageTransCompound(page)) { > + /* The THP may get freed under us */ > + if (!get_page_unless_zero(compound_head(page))) > + goto rescan_out; > + > + lock_page(page); > + > + /* > + * The extra pins from page cache lookup have been > + * released by pagevec_release(). > + */ > + if (!split_huge_page(page)) { > + unlock_page(page); > + put_page(page); > + /* Re-look up page cache from current index */ > + goto again; > + } > + unlock_page(page); > + put_page(page); > + } > +rescan_out: > index++; > } Doing get_page_unless_zero() just after you've dropped the pin for the page looks very suboptimal. -- Kirill A. Shutemov