From: Laurent Dufour <ldufour@linux.ibm.com>
To: akpm@linux-foundation.org, mhocko@kernel.org, peterz@infradead.org,
	kirill@shutemov.name, ak@linux.intel.com, dave@stgolabs.net,
	jack@suse.cz, Matthew Wilcox, aneesh.kumar@linux.ibm.com,
	benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
	Thomas Gleixner, Ingo Molnar, hpa@zytor.com, Will Deacon,
	Sergey Senozhatsky, sergey.senozhatsky.work@gmail.com,
	Andrea Arcangeli, Alexei Starovoitov, kemi.wang@intel.com,
	Daniel Jordan, David Rientjes, Jerome Glisse, Ganesh Mahendran,
	Minchan Kim, Punit Agrawal, vinayak menon, Yang Shi, zhong jiang,
	Haiyan Song, Balbir Singh, sj38.park@gmail.com, Michel Lespinasse,
	Mike Rapoport
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	haren@linux.vnet.ibm.com, npiggin@gmail.com,
	paulmck@linux.vnet.ibm.com, Tim Chen, linuxppc-dev@lists.ozlabs.org,
	x86@kernel.org
Subject: [PATCH v12 19/31] mm: protect the RB tree with a sequence lock
Date: Tue, 16 Apr 2019 15:45:10 +0200
Message-Id: <20190416134522.17540-20-ldufour@linux.ibm.com>
In-Reply-To: <20190416134522.17540-1-ldufour@linux.ibm.com>
References: <20190416134522.17540-1-ldufour@linux.ibm.com>

Introduce a per mm_struct seqlock, the mm_seq field, to protect the
changes made to the MM RB tree. This allows the RB tree to be walked
without grabbing the mmap_sem and, once the walk is done, to double
check that the sequence counter was stable during the walk.

The mm seqlock is held while inserting and removing entries in the MM
RB tree. Later in this series, it will be checked when looking for a
VMA without holding the mmap_sem.
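For context, the read side that later patches build on follows the
standard seqlock retry pattern: sample the counter, walk the tree
without the mmap_sem, and retry if a writer ran in between. A minimal
sketch, assuming a hypothetical lockless lookup helper __find_vma_rb()
(the helper name is illustrative, not part of this patch):

	struct vm_area_struct *vma;
	unsigned int seq;

	do {
		/* Snapshot the write sequence before walking. */
		seq = read_seqbegin(&mm->mm_seq);
		/* Hypothetical lockless walk of mm->mm_rb. */
		vma = __find_vma_rb(mm, addr);
		/* Retry if the tree was modified during the walk. */
	} while (read_seqretry(&mm->mm_seq, seq));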
This is based on the initial work from Peter Zijlstra:
https://lore.kernel.org/linux-mm/20100104182813.479668508@chello.nl/

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 include/linux/mm_types.h |  3 +++
 kernel/fork.c            |  3 +++
 mm/init-mm.c             |  3 +++
 mm/mmap.c                | 48 +++++++++++++++++++++++++++++++---------
 4 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e78f72eb2576..24b3f8ce9e42 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -358,6 +358,9 @@ struct mm_struct {
 	struct {
 		struct vm_area_struct *mmap;		/* list of VMAs */
 		struct rb_root mm_rb;
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+		seqlock_t mm_seq;
+#endif
 		u64 vmacache_seqnum;                   /* per-thread vmacache */
 #ifdef CONFIG_MMU
 		unsigned long (*get_unmapped_area) (struct file *filp,
diff --git a/kernel/fork.c b/kernel/fork.c
index 2992d2c95256..3a1739197ebc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1008,6 +1008,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm->mmap = NULL;
 	mm->mm_rb = RB_ROOT;
 	mm->vmacache_seqnum = 0;
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	seqlock_init(&mm->mm_seq);
+#endif
 	atomic_set(&mm->mm_users, 1);
 	atomic_set(&mm->mm_count, 1);
 	init_rwsem(&mm->mmap_sem);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index a787a319211e..69346b883a4e 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -27,6 +27,9 @@
  */
 struct mm_struct init_mm = {
 	.mm_rb		= RB_ROOT,
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	.mm_seq		= __SEQLOCK_UNLOCKED(init_mm.mm_seq),
+#endif
 	.pgd		= swapper_pg_dir,
 	.mm_users	= ATOMIC_INIT(2),
 	.mm_count	= ATOMIC_INIT(1),
diff --git a/mm/mmap.c b/mm/mmap.c
index 13460b38b0fb..f7f6027a7dff 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -170,6 +170,24 @@ void unlink_file_vma(struct vm_area_struct *vma)
 	}
 }
 
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+static inline void mm_write_seqlock(struct mm_struct *mm)
+{
+	write_seqlock(&mm->mm_seq);
+}
+static inline void mm_write_sequnlock(struct mm_struct *mm)
+{
+	write_sequnlock(&mm->mm_seq);
+}
+#else
+static inline void mm_write_seqlock(struct mm_struct *mm)
+{
+}
+static inline void mm_write_sequnlock(struct mm_struct *mm)
+{
+}
+#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
+
 /*
  * Close a vm structure and free it, returning the next.
  */
@@ -445,26 +463,32 @@ static void vma_gap_update(struct vm_area_struct *vma)
 }
 
 static inline void vma_rb_insert(struct vm_area_struct *vma,
-				 struct rb_root *root)
+				 struct mm_struct *mm)
 {
+	struct rb_root *root = &mm->mm_rb;
+
 	/* All rb_subtree_gap values must be consistent prior to insertion */
 	validate_mm_rb(root, NULL);
 
 	rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
 }
 
-static void __vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root)
+static void __vma_rb_erase(struct vm_area_struct *vma, struct mm_struct *mm)
 {
+	struct rb_root *root = &mm->mm_rb;
+
 	/*
 	 * Note rb_erase_augmented is a fairly large inline function,
 	 * so make sure we instantiate it only once with our desired
 	 * augmented rbtree callbacks.
 	 */
+	mm_write_seqlock(mm);
 	rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
+	mm_write_sequnlock(mm);	/* wmb */
 }
 
 static __always_inline void vma_rb_erase_ignore(struct vm_area_struct *vma,
-						struct rb_root *root,
+						struct mm_struct *mm,
 						struct vm_area_struct *ignore)
 {
 	/*
@@ -472,21 +496,21 @@ static __always_inline void vma_rb_erase_ignore(struct vm_area_struct *vma,
 	 * with the possible exception of the "next" vma being erased if
 	 * next->vm_start was reduced.
 	 */
-	validate_mm_rb(root, ignore);
+	validate_mm_rb(&mm->mm_rb, ignore);
 
-	__vma_rb_erase(vma, root);
+	__vma_rb_erase(vma, mm);
 }
 
 static __always_inline void vma_rb_erase(struct vm_area_struct *vma,
-					 struct rb_root *root)
+					 struct mm_struct *mm)
 {
 	/*
 	 * All rb_subtree_gap values must be consistent prior to erase,
 	 * with the possible exception of the vma being erased.
 	 */
-	validate_mm_rb(root, vma);
+	validate_mm_rb(&mm->mm_rb, vma);
 
-	__vma_rb_erase(vma, root);
+	__vma_rb_erase(vma, mm);
 }
 
 /*
@@ -601,10 +625,12 @@ void __vma_link_rb(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * immediately update the gap to the correct value. Finally we
 	 * rebalance the rbtree after all augmented values have been set.
 	 */
+	mm_write_seqlock(mm);
 	rb_link_node(&vma->vm_rb, rb_parent, rb_link);
 	vma->rb_subtree_gap = 0;
 	vma_gap_update(vma);
-	vma_rb_insert(vma, &mm->mm_rb);
+	vma_rb_insert(vma, mm);
+	mm_write_sequnlock(mm);
 }
 
 static void __vma_link_file(struct vm_area_struct *vma)
@@ -680,7 +706,7 @@ static __always_inline void __vma_unlink_common(struct mm_struct *mm,
 {
 	struct vm_area_struct *next;
 
-	vma_rb_erase_ignore(vma, &mm->mm_rb, ignore);
+	vma_rb_erase_ignore(vma, mm, ignore);
 	next = vma->vm_next;
 	if (has_prev)
 		prev->vm_next = next;
@@ -2674,7 +2700,7 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
 	insertion_point = (prev ? &prev->vm_next : &mm->mmap);
 	vma->vm_prev = NULL;
 	do {
-		vma_rb_erase(vma, &mm->mm_rb);
+		vma_rb_erase(vma, mm);
 		mm->map_count--;
 		tail_vma = vma;
 		vma = vma->vm_next;
-- 
2.21.0