Date: Thu, 27 Feb 2020 20:04:21 -0800 (PST)
From: Hugh Dickins
To: "Kirill A. Shutemov"
Cc: Hugh Dickins, Andrew Morton, Yang Shi, Alexander Duyck, "Michael S. Tsirkin", David Hildenbrand, "Kirill A. Shutemov", Matthew Wilcox, Andrea Arcangeli, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] huge tmpfs: try to split_huge_page() when punching hole
In-Reply-To: <20200227084704.aolem5nktpricrzo@box>

On Thu, 27 Feb 2020, Kirill A. Shutemov wrote:
> On Wed, Feb 26, 2020 at 08:06:33PM -0800, Hugh Dickins wrote:
> > Yang Shi writes:
> >
> > Currently, when truncating a shmem file, if the range is partly in a THP
> > (start or end is in the middle of THP), the pages actually will just get
> > cleared rather than being freed, unless the range covers the whole THP.
> > Even though all the subpages are truncated (randomly or sequentially),
> > the THP may still be kept in page cache.
> >
> > This might be fine for some usecases which prefer preserving THP, but
> > balloon inflation is handled in base page size. So when using shmem THP
> > as memory backend, QEMU inflation actually doesn't work as expected since
> > it doesn't free memory. But the inflation usecase really needs to get
> > the memory freed. (Anonymous THP will also not get freed right away,
> > but will be freed eventually when all subpages are unmapped: whereas
> > shmem THP still stays in page cache.)
> >
> > Split THP right away when doing partial hole punch, and if split fails
> > just clear the page so that read of the punched area will return zeroes.
> >
> > Hugh Dickins adds:
> >
> > Our earlier "team of pages" huge tmpfs implementation worked in the way
> > that Yang Shi proposes; and we have been using this patch to continue to
> > split the huge page when hole-punched or truncated, since converting over
> > to the compound page implementation. Although huge tmpfs gives out huge
> > pages when available, if the user specifically asks to truncate or punch
> > a hole (perhaps to free memory, perhaps to reduce the memcg charge), then
> > the filesystem should do so as best it can, splitting the huge page.
>
> I'm still uncomfortable with proposition to use truncate or punch a hole
> operations to manage memory footprint. These operations are about managing
> storage footprint, not memory. This happens to be the same for tmpfs.

I'd slightly reword that as "These operations are mainly about managing
storage footprint. This happens to be the same as memory for tmpfs."
and then happily agree with it.

>
> I wonder if we should consider limiting the behaviour to the operation
> that explicitly combines memory and storage managing: MADV_REMOVE.

I'd strongly oppose letting MADV_REMOVE diverge from FALLOC_FL_PUNCH_HOLE:
if it came down to that, I would prefer to revert this patch.

> This way we can avoid future misunderstandings with THP backed by a real
> filesystem.

It's good to consider the implications for hole-punch on a persistent
filesystem cached with THPs (or lower order compound pages); but I
disagree that they should behave differently from this patch.

The hole-punch is fundamentally directed at freeing up the storage, yes;
but its page cache must also be removed, otherwise you have the user
writing into cache which is not backed by storage, and potentially losing
the data later.

So a hole must be punched in the compound page in that case too: in fact,
it's then much more important that split_huge_page() succeeds - not
obvious what the fallback should be if it fails (perhaps in that case the
compound page must be kept, but all its pmds removed, and info on holes
kept in spare fields of the compound page, to prevent writes and write
faults without calling back into the filesystem: soluble, but more work
than tmpfs needs today) (and perhaps when that extra work is done, we
would choose to rely on it rather than immediately splitting; but it will
involve discounting the holes).

Hugh
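
For anyone wanting to poke at the two interfaces compared in this thread from userspace, here is a minimal sketch. It is not part of the patch. The helper names, the "punch-demo" memfd name, and the file and hole sizes are all made up for illustration. It assumes a Linux system with glibc 2.27 or later, and it uses memfd_create() only as a convenient way to get a shmem-backed file. Whether that file is actually backed by huge pages depends on the system's shmem THP settings, so the sketch checks only the user-visible contract: a range punched with fallocate(FALLOC_FL_PUNCH_HOLE) or with madvise(MADV_REMOVE) reads back as zeroes while the file size stays the same.

```c
#define _GNU_SOURCE             /* for fallocate(), memfd_create(), FALLOC_FL_* */
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative sizes (assumptions): the hole is smaller than, and not
 * aligned to, a 2 MiB huge page, so it is a "partial" punch. */
#define FILE_SIZE (4UL << 20)   /* 4 MiB */
#define HOLE_OFF  (64UL << 10)  /* 64 KiB into the file */
#define HOLE_LEN  (128UL << 10) /* 128 KiB long */

/* Create a shmem-backed file and fill it with 0xAA. Returns fd, or -1. */
static int make_shmem_file(void)
{
    int fd = memfd_create("punch-demo", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, FILE_SIZE) < 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(p, 0xAA, FILE_SIZE);
    munmap(p, FILE_SIZE);
    return fd;
}

/* Return 1 if [off, off + len) reads as all zeroes, 0 if not, -1 on error. */
static int range_is_zero(int fd, off_t off, size_t len)
{
    unsigned char buf[4096];

    while (len > 0) {
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t n = pread(fd, buf, want, off);
        if (n <= 0)
            return -1;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0)
                return 0;
        off += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Punch the hole through the file interface. Returns 0 on success. */
static int punch_with_fallocate(int fd)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     HOLE_OFF, HOLE_LEN);
}

/* Punch the same hole through a shared writable mapping. Returns 0 on success. */
static int punch_with_madvise(int fd)
{
    char *p = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    int ret = madvise(p + HOLE_OFF, HOLE_LEN, MADV_REMOVE);
    munmap(p, FILE_SIZE);
    return ret;
}
```

A caller would create a file with make_shmem_file(), punch it with either helper, and then compare range_is_zero() on the hole and on the data either side of it. Both helpers are expected to give the same result on shmem, which is the equivalence argued for above. The sketch does not observe whether the kernel split a huge page or merely cleared part of it; watching the system's shmem counters before and after would be one way to look at that.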