Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp1013067rwe; Thu, 1 Sep 2022 11:01:39 -0700 (PDT) X-Google-Smtp-Source: AA6agR4tgT8PF//Ltx5dcUJGkxnEtYtQTb1PHEBGDXRWYQYqmS8U3Tn9RlU9l61EppVCLTlgvPbM X-Received: by 2002:a17:90a:f416:b0:1fd:c729:bd0b with SMTP id ch22-20020a17090af41600b001fdc729bd0bmr314649pjb.171.1662055299141; Thu, 01 Sep 2022 11:01:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1662055299; cv=none; d=google.com; s=arc-20160816; b=Mtayql29leSOnD/Q0op5XP08z/RKtbN2tEIvPC6jVhqH+KDBWrIDGuqozC7pLafckw YLN99B/ioV/537hPaaSTx+hRjXef21S4YZaPabxFYKL+YiAPms6QavXFSYhoQ/2lqrNR v0VMPwOPxT5vsNVBKOoyjfeHyhYl2TdhFjQpEDTI5mq40CMo+DgTOBm1jR6qiAl/LO1s uaLcWB7IIroxuz/c91HcS4PCrxyW0BQaiChOxQWj/GqTBId1iT5Mi/l62sEYiHN2DxYU 1/iD/xCerrhXRrSW8nUVeSr+4vS9kAQ+0VJ4Hkh0Oj4ft4eJVG0K2iK87j9PIfOkCSy1 Tj6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=eIMt0K4Dq/+E4lj9SQ/fnMIDQQYvNY/8MPu6+maRPr0=; b=TTiW3rjsiu4j4/ah2peIfrzdQhdoB0bLAZtkxLJjbNpVYTJBh0GErtMa/YfKUg/Ui4 IoQeBs2QVQspeNxEC8U9vsxWefBkt38tCX6SBi+qnBD3iLMt93+j4fKI5XcrZ9I5TlYU 02sFuL14CpGCjdxCEadKVlsJACbcd5MAnK+S5012LfaXMpxEHZw0AmnIR0PM+mpjKPIg Ehuci5Zdrx67TwczGGerVxujFaoTamHKPoHHc1+nvV42xqbe4acxZg+pQUcXNzu23Z49 5M6Kn8F4lYCGNTTVlAkFzje0YeMUtrE4uJ9xJPXfqcmT/WqN8w2Ezt+nWoHbcIm3JvPt 7M0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="OULzLK//"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l186-20020a6388c3000000b0042c3dd6e3ecsi7994565pgd.527.2022.09.01.11.01.23; Thu, 01 Sep 2022 11:01:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="OULzLK//"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234850AbiIARgf (ORCPT + 99 others); Thu, 1 Sep 2022 13:36:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233250AbiIARfj (ORCPT ); Thu, 1 Sep 2022 13:35:39 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4851548C83 for ; Thu, 1 Sep 2022 10:35:35 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-335ff2ef600so235531877b3.18 for ; Thu, 01 Sep 2022 10:35:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date; bh=eIMt0K4Dq/+E4lj9SQ/fnMIDQQYvNY/8MPu6+maRPr0=; b=OULzLK//lObTRMGAKryDz/AAItdyiK5yqhZqIIxu+sL1Dieokt2jidmnz3C7/IkvFZ m9XlR/LP/dWBo2UxnXyYn7qdVCzxgLbSFD2CG9EOYY1gruRvre30gd8M27oZrFHurKvO KqLi6QjNq7f2EcLrXhJZo+HJMeLkP9RVRxlBnBeo9kXpz7L5PtGPdJWGZy8ptDnDibsD BU++Zg9dqZVio0ctwQpkNxICzBLOlMkeBzfVJBfO5ajkBVunOWr6F+myD0p5P9YWiss6 hO75UDowCTjrDDdzDEi22CFFlp+R1gx4mTcBiehvp3OGDbyQbHz6/QfMRzJJk1Rc6B1v GPRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date; bh=eIMt0K4Dq/+E4lj9SQ/fnMIDQQYvNY/8MPu6+maRPr0=; b=0zh6HWvC2VGSaNYS+u3GT4M/2kZGOpRSKuMSbgcUcD8L3uSNs4uNahp+lGZLR7xfYt BZHDJR8tdLoH8vqG0ZGUhcim/l9hCsy86OPa3WCBp/z1TUVdMN2Aq0Kk5uhm0ZwLERjd u8bfFg9mQDX2RnHFFTXWBFLSKXNdewiAbLhOiCcFmvZ8IdS5yOskyXbGDYHMYsY15nRt 1VnG4FR0j7+7cDD8N17T0eeqr3v3Z/kirrs88z6ZGYxOmF8zzY5JhL+EVl8N9eeh8jyM rxj8FPV0ToXokgr/R2B/mT7xQbvGEWa3iTTG6/WWkxnlRUK+RkSld+W4016Sb6uMptJ1 Ojeg== X-Gm-Message-State: ACgBeo1K1g8kGrvcd7DT3Q4oKlteb0vbiYXLpWYOq+XuRVkXzYd9Psnz 7nwOzo3aN7kPGKmPdeePRDfpmlydypOHeQHx2gK2rVoszIK6VRCsjt5vsYu1AkL4XJ2IlXG+qNg pz1ke6bdFRu31o0dDH/7Q X-Received: from surenb-desktop.mtv.corp.google.com ([2620:15c:211:200:1bfc:e7ee:6530:4449]) (user=surenb job=sendgmr) by 2002:a0d:ff86:0:b0:341:5844:5527 with SMTP id p128-20020a0dff86000000b0034158445527mr14770867ywf.504.1662053734514; Thu, 01 Sep 2022 10:35:34 -0700 (PDT) Date: Thu, 1 Sep 2022 10:34:53 -0700 In-Reply-To: <20220901173516.702122-1-surenb@google.com> Mime-Version: 1.0 References: <20220901173516.702122-1-surenb@google.com> X-Mailer: git-send-email 2.37.2.789.g6183377224-goog Message-ID: <20220901173516.702122-6-surenb@google.com> Subject: [RFC PATCH RESEND 05/28] mm: add per-VMA lock and helper functions to control it From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@suse.de, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, rientjes@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, surenb@google.com, kernel-team@android.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-ccpol: medium X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Introduce a per-VMA rw_semaphore to be used during page fault handling instead of mmap_lock. Because there are cases when multiple VMAs need to be exclusively locked during VMA tree modifications, instead of the usual lock/unlock patter we mark a VMA as locked by taking per-VMA lock exclusively and setting vma->lock_seq to the current mm->lock_seq. When mmap_write_lock holder is done with all modifications and drops mmap_lock, it will increment mm->lock_seq, effectively unlocking all VMAs marked as locked. Signed-off-by: Suren Baghdasaryan --- include/linux/mm.h | 78 +++++++++++++++++++++++++++++++++++++++ include/linux/mm_types.h | 7 ++++ include/linux/mmap_lock.h | 13 +++++++ kernel/fork.c | 4 ++ mm/init-mm.c | 3 ++ 5 files changed, 105 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 7d322a979455..476bf936c5f0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -611,6 +611,83 @@ struct vm_operations_struct { unsigned long addr); }; +#ifdef CONFIG_PER_VMA_LOCK +static inline void vma_init_lock(struct vm_area_struct *vma) +{ + init_rwsem(&vma->lock); + vma->vm_lock_seq = -1; +} + +static inline void vma_mark_locked(struct vm_area_struct *vma) +{ + int mm_lock_seq; + + mmap_assert_write_locked(vma->vm_mm); + + /* + * current task is holding mmap_write_lock, both vma->vm_lock_seq and + * mm->mm_lock_seq can't be concurrently modified. + */ + mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq); + if (vma->vm_lock_seq == mm_lock_seq) + return; + + down_write(&vma->lock); + vma->vm_lock_seq = mm_lock_seq; + up_write(&vma->lock); +} + +static inline bool vma_read_trylock(struct vm_area_struct *vma) +{ + if (unlikely(down_read_trylock(&vma->lock) == 0)) + return false; + + /* + * Overflow might produce false locked result but it's not critical. + * False unlocked result is critical but is impossible because we + * modify and check vma->vm_lock_seq under vma->lock protection and + * mm->mm_lock_seq modification invalidates all existing locks. + */ + if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq)) { + up_read(&vma->lock); + return false; + } + return true; +} + +static inline void vma_read_unlock(struct vm_area_struct *vma) +{ + up_read(&vma->lock); +} + +static inline void vma_assert_locked(struct vm_area_struct *vma) +{ + lockdep_assert_held(&vma->lock); + VM_BUG_ON_VMA(!rwsem_is_locked(&vma->lock), vma); +} + +static inline void vma_assert_write_locked(struct vm_area_struct *vma, int pos) +{ + mmap_assert_write_locked(vma->vm_mm); + /* + * current task is holding mmap_write_lock, both vma->vm_lock_seq and + * mm->mm_lock_seq can't be concurrently modified. + */ + VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma); +} + +#else /* CONFIG_PER_VMA_LOCK */ + +static inline void vma_init_lock(struct vm_area_struct *vma) {} +static inline void vma_mark_locked(struct vm_area_struct *vma) {} +static inline bool vma_read_trylock(struct vm_area_struct *vma) + { return false; } +static inline void vma_read_unlock(struct vm_area_struct *vma) {} +static inline void vma_assert_locked(struct vm_area_struct *vma) {} +static inline void vma_assert_write_locked(struct vm_area_struct *vma, int pos) {} + +#endif /* CONFIG_PER_VMA_LOCK */ + static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm) { static const struct vm_operations_struct dummy_vm_ops = {}; @@ -619,6 +696,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm) vma->vm_mm = mm; vma->vm_ops = &dummy_vm_ops; INIT_LIST_HEAD(&vma->anon_vma_chain); + vma_init_lock(vma); } static inline void vma_set_anonymous(struct vm_area_struct *vma) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index bed25ef7c994..6a03f59c1e78 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -486,6 +486,10 @@ struct vm_area_struct { struct mempolicy *vm_policy; /* NUMA policy for the VMA */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_PER_VMA_LOCK + struct rw_semaphore lock; + int vm_lock_seq; +#endif } __randomize_layout; struct kioctx_table; @@ -567,6 +571,9 @@ struct mm_struct { * init_mm.mmlist, and are protected * by mmlist_lock */ +#ifdef CONFIG_PER_VMA_LOCK + int mm_lock_seq; +#endif unsigned long hiwater_rss; /* High-watermark of RSS usage */ diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index e49ba91bb1f0..a391ae226564 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -72,6 +72,17 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm) VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); } +#ifdef CONFIG_PER_VMA_LOCK +static inline void vma_mark_unlocked_all(struct mm_struct *mm) +{ + mmap_assert_write_locked(mm); + /* No races during update due to exclusive mmap_lock being held */ + WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1); +} +#else +static inline void vma_mark_unlocked_all(struct mm_struct *mm) {} +#endif + static inline void mmap_init_lock(struct mm_struct *mm) { init_rwsem(&mm->mmap_lock); @@ -114,12 +125,14 @@ static inline bool mmap_write_trylock(struct mm_struct *mm) static inline void mmap_write_unlock(struct mm_struct *mm) { __mmap_lock_trace_released(mm, true); + vma_mark_unlocked_all(mm); up_write(&mm->mmap_lock); } static inline void mmap_write_downgrade(struct mm_struct *mm) { __mmap_lock_trace_acquire_returned(mm, false, true); + vma_mark_unlocked_all(mm); downgrade_write(&mm->mmap_lock); } diff --git a/kernel/fork.c b/kernel/fork.c index 614872438393..bfab31ecd11e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -475,6 +475,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig) */ *new = data_race(*orig); INIT_LIST_HEAD(&new->anon_vma_chain); + vma_init_lock(new); new->vm_next = new->vm_prev = NULL; dup_anon_vma_name(orig, new); } @@ -1130,6 +1131,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, seqcount_init(&mm->write_protect_seq); mmap_init_lock(mm); INIT_LIST_HEAD(&mm->mmlist); +#ifdef CONFIG_PER_VMA_LOCK + WRITE_ONCE(mm->mm_lock_seq, 0); +#endif mm_pgtables_bytes_init(mm); mm->map_count = 0; mm->locked_vm = 0; diff --git a/mm/init-mm.c b/mm/init-mm.c index fbe7844d0912..8399f90d631c 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -37,6 +37,9 @@ struct mm_struct init_mm = { .page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock), .arg_lock = __SPIN_LOCK_UNLOCKED(init_mm.arg_lock), .mmlist = LIST_HEAD_INIT(init_mm.mmlist), +#ifdef CONFIG_PER_VMA_LOCK + .mm_lock_seq = 0, +#endif .user_ns = &init_user_ns, .cpu_bitmap = CPU_BITS_NONE, #ifdef CONFIG_IOMMU_SVA -- 2.37.2.789.g6183377224-goog