From: Suren Baghdasaryan
Date: Tue, 17 Jan 2023 17:06:57 -0800
Subject: Re: [PATCH 28/41] mm: introduce lock_vma_under_rcu to be used from arch-specific code
To: Michal Hocko
Cc: akpm@linux-foundation.org, michel@lespinasse.org, jglisse@google.com,
    vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net,
    dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com,
    peterz@infradead.org, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com,
    paulmck@kernel.org, luto@kernel.org, songliubraving@fb.com,
    peterx@redhat.com, david@redhat.com, dhowells@redhat.com,
    hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev,
    punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com,
    rientjes@google.com, axelrasmussen@google.com, joelaf@google.com,
    minchan@google.com, jannh@google.com, shakeelb@google.com,
    tatashin@google.com, edumazet@google.com, gthelen@google.com,
    gurua@google.com, arjunroy@google.com, soheil@google.com,
    hughlynch@google.com, leewalsh@google.com, posk@google.com,
    linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
    linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230109205336.3665937-1-surenb@google.com>
    <20230109205336.3665937-29-surenb@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 17, 2023 at 7:47 AM Michal Hocko wrote:
>
> On Mon 09-01-23 12:53:23, Suren Baghdasaryan wrote:
> > Introduce lock_vma_under_rcu function to lookup and lock a VMA during
> > page fault handling. When VMA is not found, can't be locked or changes
> > after being locked, the function returns NULL. The lookup is performed
> > under RCU protection to prevent the found VMA from being destroyed before
> > the VMA lock is acquired. VMA lock statistics are updated according to
> > the results.
> > For now only anonymous VMAs can be searched this way. In other cases the
> > function returns NULL.
>
> Could you describe why only anonymous vmas are handled at this stage and
> what (roughly) has to be done to support other vmas? lock_vma_under_rcu
> doesn't seem to have any anonymous vma specific requirements AFAICS.

TBH I haven't spent too much time looking into file-backed page faults
yet, but a couple of tasks I can think of are:
- Ensure that all vma->vm_ops->fault() handlers do not rely on
  mmap_lock being read-locked;
- vma->vm_file freeing, like VMA freeing, will need to be done after an
  RCU grace period since page fault handlers use it.
  This will require some caution because simply adding it into
  __vm_area_free() called via call_rcu() will cause the corresponding
  fops->release() to be called asynchronously. I had to solve this issue
  with the out-of-tree SPF implementation when an asynchronously called
  snd_pcm_release() was problematic.
I'm sure I'm missing more potential issues and maybe Matthew and
Michel can pinpoint more things to resolve here?

>
> Also isn't lock_vma_under_rcu effectively find_read_lock_vma? Not that
> the naming is really the most important part but the rcu locking is
> internal to the function so why should we spread this implementation
> detail to the world...

I wanted the name to indicate that the lookup is done with no locks
held. But I'm open to suggestions.

>
> > Signed-off-by: Suren Baghdasaryan
> > ---
> >  include/linux/mm.h |  3 +++
> >  mm/memory.c        | 51 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 54 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index c464fc8a514c..d0fddf6a1de9 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -687,6 +687,9 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma)
> >  				      vma);
> >  }
> >
> > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > +					  unsigned long address);
> > +
> >  #else /* CONFIG_PER_VMA_LOCK */
> >
> >  static inline void vma_init_lock(struct vm_area_struct *vma) {}
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 9ece18548db1..a658e26d965d 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5242,6 +5242,57 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >  }
> >  EXPORT_SYMBOL_GPL(handle_mm_fault);
> >
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +/*
> > + * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
> > + * stable and not isolated. If the VMA is not found or is being modified the
> > + * function returns NULL.
> > + */
> > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > +					  unsigned long address)
> > +{
> > +	MA_STATE(mas, &mm->mm_mt, address, address);
> > +	struct vm_area_struct *vma, *validate;
> > +
> > +	rcu_read_lock();
> > +	vma = mas_walk(&mas);
> > +retry:
> > +	if (!vma)
> > +		goto inval;
> > +
> > +	/* Only anonymous vmas are supported for now */
> > +	if (!vma_is_anonymous(vma))
> > +		goto inval;
> > +
> > +	if (!vma_read_trylock(vma))
> > +		goto inval;
> > +
> > +	/* Check since vm_start/vm_end might change before we lock the VMA */
> > +	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
> > +		vma_read_unlock(vma);
> > +		goto inval;
> > +	}
> > +
> > +	/* Check if the VMA got isolated after we found it */
> > +	mas.index = address;
> > +	validate = mas_walk(&mas);
> > +	if (validate != vma) {
> > +		vma_read_unlock(vma);
> > +		count_vm_vma_lock_event(VMA_LOCK_MISS);
> > +		/* The area was replaced with another one. */
> > +		vma = validate;
> > +		goto retry;
> > +	}
> > +
> > +	rcu_read_unlock();
> > +	return vma;
> > +inval:
> > +	rcu_read_unlock();
> > +	count_vm_vma_lock_event(VMA_LOCK_ABORT);
> > +	return NULL;
> > +}
> > +#endif /* CONFIG_PER_VMA_LOCK */
> > +
> >  #ifndef __PAGETABLE_P4D_FOLDED
> >  /*
> >   * Allocate p4d page table.
> > --
> > 2.39.0
>
> --
> Michal Hocko
> SUSE Labs
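
For anyone following the thread without the kernel tree at hand, the
lookup -> trylock -> range check -> re-lookup sequence in the quoted patch
can be approximated in plain userspace C. This is only a rough sketch under
loud assumptions: struct region, struct region_table, table_lookup(),
region_read_trylock() and lock_region_speculative() are all hypothetical
names made up for illustration, a linear array scan stands in for the maple
tree walk, and there is no RCU at all, so nothing here protects a region
from being freed concurrently. It shows the control flow, not the kernel
implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: [start, end) plus a tiny reader lock. */
struct region {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool anonymous;
	atomic_int readers;	/* -1: write-locked, >= 0: reader count */
};

#define MAX_REGIONS 8

/* Hypothetical stand-in for the per-mm tree; a flat array for simplicity. */
struct region_table {
	struct region *slots[MAX_REGIONS];
	size_t count;
};

/* Find the region covering addr, or NULL. Stands in for the tree walk. */
static struct region *table_lookup(const struct region_table *t,
				   unsigned long addr)
{
	for (size_t i = 0; i < t->count; i++) {
		struct region *r = t->slots[i];

		if (r && addr >= r->start && addr < r->end)
			return r;
	}
	return NULL;
}

/* Take a read reference unless a writer holds the region. Never blocks. */
static bool region_read_trylock(struct region *r)
{
	int cur = atomic_load(&r->readers);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->readers, &cur, cur + 1))
			return true;
	}
	return false;
}

static void region_read_unlock(struct region *r)
{
	int prev = atomic_fetch_sub(&r->readers, 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Look up and read-lock the region covering addr without holding any
 * table-wide lock. Returns NULL if the region is missing, not anonymous,
 * write-locked, or no longer covers addr once locked. If a second lookup
 * finds a different region, drop the lock and retry with that one.
 */
static struct region *lock_region_speculative(struct region_table *t,
					      unsigned long addr)
{
	struct region *r = table_lookup(t, addr);

	while (r) {
		struct region *validate;

		if (!r->anonymous)
			return NULL;
		if (!region_read_trylock(r))
			return NULL;
		/* Bounds may have changed between lookup and lock. */
		if (addr < r->start || addr >= r->end) {
			region_read_unlock(r);
			return NULL;
		}
		/* Confirm the table still maps addr to this region. */
		validate = table_lookup(t, addr);
		if (validate == r)
			return r;
		region_read_unlock(r);
		r = validate;
	}
	return NULL;
}
```

A caller that gets a non-NULL region back is expected to call
region_read_unlock() when done, and to fall back to a slower, fully locked
path on NULL, which is roughly how I read the intent of the NULL return in
the patch description above.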