Message-Id: <02e18c90-196e-409e-b2ac-822aceea8891@www.fastmail.com>
References: <20220310140911.50924-1-chao.p.peng@linux.intel.com>
 <20220310140911.50924-5-chao.p.peng@linux.intel.com>
Date: Thu, 07 Apr 2022 10:09:55 -0700
From: "Andy Lutomirski"
To: "Sean Christopherson", "Chao Peng"
Cc: "kvm list", "Linux Kernel Mailing List", linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, "Linux API", qemu-devel@nongnu.org,
 "Paolo Bonzini", "Jonathan Corbet", "Vitaly Kuznetsov", "Wanpeng Li",
 "Jim Mattson", "Joerg Roedel", "Thomas Gleixner", "Ingo Molnar",
 "Borislav Petkov", "the arch/x86 maintainers", "H. Peter Anvin",
 "Hugh Dickins", "Jeff Layton", "J . Bruce Fields", "Andrew Morton",
 "Mike Rapoport", "Steven Price", "Maciej S . Szmigiero",
 "Vlastimil Babka", "Vishal Annapurve", "Yu Zhang", "Kirill A. Shutemov",
 "Nakajima, Jun", "Dave Hansen", "Andi Kleen", "David Hildenbrand"
Subject: Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 7, 2022, at 9:05 AM, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
>> Since page migration / swapping is not supported yet, MFD_INACCESSIBLE
>> memory behave like longterm pinned pages and thus should be accounted to
>> mm->pinned_vm and be restricted by RLIMIT_MEMLOCK.
>>
>> Signed-off-by: Chao Peng
>> ---
>>  mm/shmem.c | 25 ++++++++++++++++++++++++-
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 7b43e274c9a2..ae46fb96494b 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -915,14 +915,17 @@ static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t end)
>>  static void notify_invalidate_page(struct inode *inode, struct folio *folio,
>>  				    pgoff_t start, pgoff_t end)
>>  {
>> -#ifdef CONFIG_MEMFILE_NOTIFIER
>>  	struct shmem_inode_info *info = SHMEM_I(inode);
>>
>> +#ifdef CONFIG_MEMFILE_NOTIFIER
>>  	start = max(start, folio->index);
>>  	end = min(end, folio->index + folio_nr_pages(folio));
>>
>>  	memfile_notifier_invalidate(&info->memfile_notifiers, start, end);
>>  #endif
>> +
>> +	if (info->xflags & SHM_F_INACCESSIBLE)
>> +		atomic64_sub(end - start, &current->mm->pinned_vm);
>
> As Vishal's to-be-posted selftest discovered, this is broken as current->mm may
> be NULL.  Or it may be a completely different mm, e.g. AFAICT there's nothing that
> prevents a different process from punching hole in the shmem backing.
>

How about just not charging the mm in the first place?  There's
precedent: ramfs and hugetlbfs (at least sometimes; I've lost track of
the current status).

In any case, for an administrator to try to assemble the various rlimits
into a coherent policy is, and always has been, quite messy.  ISTM cgroup
limits, which can actually add across processes usefully, are much
better.

So, aside from the fact that these fds aren't in a filesystem and are
thus available by default, I'm not convinced that this accounting is
useful or necessary.

Maybe we could just have some switch required to enable creation of
private memory in the first place, and anyone who flips that switch
without configuring cgroups is subject to DoS.
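
For reference, Sean's "completely different mm" case can be sketched from
userspace.  This is only an illustration under stated assumptions: it uses
an ordinary memfd (MFD_INACCESSIBLE exists only in this series), and the
helper name pages_zeroed_by_child() is made up for the example.  The parent
populates the pages, a forked child (which has its own mm) punches them
out, and the parent then reads zeros back.  In the patch as posted, the
atomic64_sub() for such a hole would land on whichever task did the punch,
not on the mm that was charged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>     /* fallocate(), FALLOC_FL_PUNCH_HOLE */
#include <stdlib.h>    /* malloc(), free() */
#include <string.h>    /* memset() */
#include <sys/mman.h>  /* memfd_create() */
#include <sys/wait.h>  /* waitpid() */
#include <unistd.h>    /* fork(), pread(), pwrite(), sysconf() */

/*
 * Hypothetical helper (not from the patch): the parent fills npages pages
 * of a memfd, a forked child punches out the first hole_pages of them, and
 * the parent counts how many pages now read back as all zeros.
 * Returns that count, or -1 on error.
 */
static long pages_zeroed_by_child(long npages, long hole_pages)
{
	long psz = sysconf(_SC_PAGESIZE);
	long i, j, zeroed = -1;
	unsigned char *buf;
	int fd, status;
	pid_t pid;

	assert(npages > 0 && hole_pages > 0 && hole_pages <= npages);

	buf = malloc(psz);
	fd = memfd_create("punch-demo", MFD_CLOEXEC);
	if (!buf || fd < 0)
		goto out;

	/* Parent: allocate every page by writing non-zero data to it. */
	memset(buf, 0xaa, psz);
	for (i = 0; i < npages; i++)
		if (pwrite(fd, buf, psz, i * psz) != psz)
			goto out;

	pid = fork();
	if (pid < 0)
		goto out;
	if (pid == 0) {
		/* Child: a task with a different mm frees the parent's pages. */
		int ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
				    0, hole_pages * psz);
		_exit(ret ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		goto out;

	/* Parent: punched pages read back as zeros, the rest keep their data. */
	for (i = 0, zeroed = 0; i < npages; i++) {
		if (pread(fd, buf, psz, i * psz) != psz) {
			zeroed = -1;
			goto out;
		}
		for (j = 0; j < psz && buf[j] == 0; j++)
			;
		zeroed += (j == psz);
	}
out:
	if (fd >= 0)
		close(fd);
	free(buf);
	return zeroed;
}
```

Calling pages_zeroed_by_child(8, 3) from a test main() exercises the case
where the child discards three of the parent's eight pages.  The
current->mm == NULL case (a kernel thread doing the invalidation) cannot be
shown from userspace and is not covered by this sketch.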