Date: Wed, 29 Jul 2020 12:21:11 -0700 (PDT)
From: Hugh Dickins
To: Pengfei Li
cc: akpm@linux-foundation.org, bmt@zurich.ibm.com, dledford@redhat.com,
    willy@infradead.org, vbabka@suse.cz, kirill.shutemov@linux.intel.com,
    jgg@ziepe.ca, alex.williamson@redhat.com, cohuck@redhat.com,
    daniel.m.jordan@oracle.com, dbueso@suse.de, jglisse@redhat.com,
    jhubbard@nvidia.com, ldufour@linux.ibm.com, Liam.Howlett@oracle.com,
    peterz@infradead.org, cl@linux.com, jack@suse.cz, rientjes@google.com,
    walken@google.com, hughd@google.com, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/2] mm, util: account_locked_vm() does not hold mmap_lock
In-Reply-To: <20200726080224.205470-2-fly@kernel.page>
References: <20200726080224.205470-1-fly@kernel.page> <20200726080224.205470-2-fly@kernel.page>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 26 Jul 2020, Pengfei Li wrote:

> Since mm->locked_vm is already an atomic counter, account_locked_vm()
> does not need to hold mmap_lock.

I am worried that this patch, already added to mmotm, along with its
1/2 making locked_vm an atomic64, might be rushed into v5.9 with just
that two-line commit description, and no discussion at all.

locked_vm belongs fundamentally to mm/mlock.c, and the lock to guard
it is mmap_lock; and mlock() has some complicated stuff to do under
that lock while it decides how to adjust locked_vm.

It is very easy to convert an unsigned long to an atomic64_t, but
"atomic read, check limit and do stuff, atomic add" does not give
the same guarantee as holding the right lock around it all.

(At the very least, __account_locked_vm() in 1/2 should be changed to
replace its atomic64_add by an atomic64_cmpxchg, to enforce the limit
that it just checked.  But that will be no more than lipstick on a pig,
when the right lock that everyone else agrees upon is not being held.)

Now, it can be argued that our locked_vm and pinned_vm maintenance
is so random and deficient, and too difficult to keep right across
a sprawl of drivers, that we should just be grateful for those that
do volunteer to subject themselves to RLIMIT_MEMLOCK limitation,
and never mind if it's a little racy.

And it may well be that all those who have made considerable efforts
in the past to improve the situation, have more interesting things to
devote their time to, and would prefer not to get dragged back here.

But let's at least give this a little more visibility, and hope to hear
opinions one way or the other from those who care.

Hugh

>
> Signed-off-by: Pengfei Li
> ---
>  drivers/vfio/vfio_iommu_type1.c |  8 ++------
>  mm/util.c                       | 15 +++------------
>  2 files changed, 5 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 78013be07fe7..53818fce78a6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -376,12 +376,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  	if (!mm)
>  		return -ESRCH; /* process exited */
>
> -	ret = mmap_write_lock_killable(mm);
> -	if (!ret) {
> -		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
> -					  dma->lock_cap);
> -		mmap_write_unlock(mm);
> -	}
> +	ret = __account_locked_vm(mm, abs(npage), npage > 0,
> +				  dma->task, dma->lock_cap);
>
>  	if (async)
>  		mmput(mm);
> diff --git a/mm/util.c b/mm/util.c
> index 473add0dc275..320fdd537aea 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -424,8 +424,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
>   * @task:        task used to check RLIMIT_MEMLOCK
>   * @bypass_rlim: %true if checking RLIMIT_MEMLOCK should be skipped
>   *
> - * Assumes @task and @mm are valid (i.e. at least one reference on each), and
> - * that mmap_lock is held as writer.
> + * Assumes @task and @mm are valid (i.e. at least one reference on each).
>   *
>   * Return:
>   * * 0       on success
> @@ -437,8 +436,6 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  	unsigned long locked_vm, limit;
>  	int ret = 0;
>
> -	mmap_assert_write_locked(mm);
> -
>  	locked_vm = atomic64_read(&mm->locked_vm);
>  	if (inc) {
>  		if (!bypass_rlim) {
> @@ -476,17 +473,11 @@ EXPORT_SYMBOL_GPL(__account_locked_vm);
>   */
>  int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc)
>  {
> -	int ret;
> -
>  	if (pages == 0 || !mm)
>  		return 0;
>
> -	mmap_write_lock(mm);
> -	ret = __account_locked_vm(mm, pages, inc, current,
> -				  capable(CAP_IPC_LOCK));
> -	mmap_write_unlock(mm);
> -
> -	return ret;
> +	return __account_locked_vm(mm, pages, inc,
> +				   current, capable(CAP_IPC_LOCK));
>  }
>  EXPORT_SYMBOL_GPL(account_locked_vm);
>
> --
> 2.26.2
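
Editorial illustration, not part of the original mail: the reply above
contrasts "atomic read, check limit and do stuff, atomic add" with an
atomic64_cmpxchg that enforces the limit just checked. A minimal
user-space sketch of the two shapes follows, assuming C11 <stdatomic.h>
in place of the kernel's atomic64_t API. The names locked_vm_demo,
charge_check_then_add() and charge_cmpxchg() are hypothetical and do
not appear in the patch. As the reply notes, even the second shape does
not give what holding mmap_lock gives, since mlock() makes other
decisions under that lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for mm->locked_vm; not the kernel's type. */
static _Atomic uint64_t locked_vm_demo;

/*
 * Racy shape: the limit is checked against a value that may be stale
 * by the time the add lands, so two concurrent callers can both pass
 * the check and together exceed the limit.
 */
static bool charge_check_then_add(uint64_t pages, uint64_t limit)
{
	uint64_t cur = atomic_load(&locked_vm_demo);

	if (cur + pages > limit)	/* overflow ignored in this sketch */
		return false;
	atomic_fetch_add(&locked_vm_demo, pages);
	return true;
}

/*
 * Compare-and-exchange shape: the new value is stored only if the
 * counter still holds the value the limit was checked against;
 * otherwise the check is repeated against the fresh value.
 */
static bool charge_cmpxchg(uint64_t pages, uint64_t limit)
{
	uint64_t cur = atomic_load(&locked_vm_demo);

	do {
		if (cur + pages > limit)
			return false;
		/* On failure, cur is updated to the current counter value. */
	} while (!atomic_compare_exchange_weak(&locked_vm_demo, &cur,
					       cur + pages));
	return true;
}

/* Single-threaded sanity check of the limit logic only. */
static void charge_selfcheck(void)
{
	atomic_store(&locked_vm_demo, 0);
	assert(charge_check_then_add(3, 10));
	assert(charge_cmpxchg(7, 10));
	assert(!charge_cmpxchg(1, 10));
}
```

Run single-threaded, both helpers behave the same; the difference only
shows up when several threads charge the counter concurrently, where
only the compare-and-exchange loop keeps the counter at or below the
limit.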