Message-ID: <1624be86-4c17-46e5-fafc-eb8afb7b9b4a@linux.ibm.com>
Date: Tue, 6 Sep 2022 16:26:54 +0200
Subject: Re: [RFC PATCH RESEND 06/28] mm: mark VMA as locked whenever vma->vm_flags are modified
From: Laurent Dufour
To: Suren Baghdasaryan, akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com,
 vbabka@suse.cz, hannes@cmpxchg.org, mgorman@suse.de, dave@stgolabs.net,
 willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org,
 laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org,
 songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
 dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
 kent.overstreet@linux.dev, rientjes@google.com, axelrasmussen@google.com,
 joelaf@google.com, minchan@google.com, kernel-team@android.com,
 linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org
References: <20220901173516.702122-1-surenb@google.com> <20220901173516.702122-7-surenb@google.com>
In-Reply-To: <20220901173516.702122-7-surenb@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/09/2022 at 19:34, Suren Baghdasaryan wrote:
> VMA flag modifications should be done under VMA lock to prevent concurrent
> page fault handling in that area.
>
> Signed-off-by: Suren Baghdasaryan
> ---
>  fs/proc/task_mmu.c | 1 +
>  fs/userfaultfd.c   | 6 ++++++
>  mm/madvise.c       | 1 +
>  mm/mlock.c         | 2 ++
>  mm/mmap.c          | 1 +
>  mm/mprotect.c      | 1 +
>  6 files changed, 12 insertions(+)

There are also a few vm_flags changes made in driver code, for instance:

*** arch/x86/kernel/cpu/sgx/driver.c:
sgx_mmap[98]       vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;

*** arch/x86/kernel/cpu/sgx/virt.c:
sgx_vepc_mmap[108] vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;

*** drivers/dax/device.c:
dax_mmap[311]      vma->vm_flags |= VM_HUGEPAGE;

I guess these changes to vm_flags should be protected as well, or at least
checked one by one.
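
To make the driver case concrete, here is a minimal userspace sketch of what
"protecting" such an update could look like. Everything in it is a mock: the
struct, the flag values, mock_vma_mark_locked() and the mock_vma_set_flags()
helper are stand-ins invented for illustration, not kernel definitions and not
part of this series. The only point is the ordering, i.e. mark the VMA as
locked before the flags are written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for this mock; NOT the kernel's definitions. */
#define MOCK_VM_PFNMAP     0x1UL
#define MOCK_VM_DONTEXPAND 0x2UL
#define MOCK_VM_DONTDUMP   0x4UL
#define MOCK_VM_IO         0x8UL

/* Mock of the two VMA fields this sketch cares about. */
struct mock_vma {
	unsigned long vm_flags;
	bool marked_locked;	/* records that the mock lock call happened */
};

/* Mock stand-in for vma_mark_locked() from the series under review. */
static void mock_vma_mark_locked(struct mock_vma *vma)
{
	assert(vma != NULL);
	vma->marked_locked = true;
}

/* Hypothetical helper: always mark the VMA locked before touching vm_flags. */
static void mock_vma_set_flags(struct mock_vma *vma, unsigned long flags)
{
	mock_vma_mark_locked(vma);
	vma->vm_flags |= flags;
}

/* Mock driver mmap handler, loosely modelled on the sgx_mmap() line above. */
static int mock_driver_mmap(struct mock_vma *vma)
{
	mock_vma_set_flags(vma, MOCK_VM_PFNMAP | MOCK_VM_DONTEXPAND |
				MOCK_VM_DONTDUMP | MOCK_VM_IO);
	return 0;
}
```

Whether each driver really needs this depends on whether the VMA is already
visible to page fault handling at that point, which is why the call sites
would have to be checked one by one.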
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 4e0023643f8b..ceffa5c2c650 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1285,6 +1285,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
>  			for (vma = mm->mmap; vma; vma = vma->vm_next) {
>  				if (!(vma->vm_flags & VM_SOFTDIRTY))
>  					continue;
> +				vma_mark_locked(vma);
>  				vma->vm_flags &= ~VM_SOFTDIRTY;
>  				vma_set_page_prot(vma);
>  			}
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 175de70e3adf..fe557b3d1c07 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -620,6 +620,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
>  		mmap_write_lock(mm);
>  		for (vma = mm->mmap; vma; vma = vma->vm_next)
>  			if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
> +				vma_mark_locked(vma);
>  				vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>  				vma->vm_flags &= ~__VM_UFFD_FLAGS;
>  			}
> @@ -653,6 +654,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
>
>  	octx = vma->vm_userfaultfd_ctx.ctx;
>  	if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
> +		vma_mark_locked(vma);
>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>  		vma->vm_flags &= ~__VM_UFFD_FLAGS;
>  		return 0;
> @@ -734,6 +736,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
>  		atomic_inc(&ctx->mmap_changing);
>  	} else {
>  		/* Drop uffd context if remap feature not enabled */
> +		vma_mark_locked(vma);
>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>  		vma->vm_flags &= ~__VM_UFFD_FLAGS;
>  	}
> @@ -891,6 +894,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
>  			vma = prev;
>  		else
>  			prev = vma;
> +		vma_mark_locked(vma);
>  		vma->vm_flags = new_flags;
>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>  	}
> @@ -1449,6 +1453,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  		 * the next vma was merged into the current one and
>  		 * the current one has not been updated yet.
>  		 */
> +		vma_mark_locked(vma);
>  		vma->vm_flags = new_flags;
>  		vma->vm_userfaultfd_ctx.ctx = ctx;
>
> @@ -1630,6 +1635,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  		 * the next vma was merged into the current one and
>  		 * the current one has not been updated yet.
>  		 */
> +		vma_mark_locked(vma);
>  		vma->vm_flags = new_flags;
>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 5f0f0948a50e..a173f0025abd 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -181,6 +181,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
>  	/*
>  	 * vm_flags is protected by the mmap_lock held in write mode.
>  	 */
> +	vma_mark_locked(vma);
>  	vma->vm_flags = new_flags;
>  	if (!vma->vm_file) {
>  		error = replace_anon_vma_name(vma, anon_name);
> diff --git a/mm/mlock.c b/mm/mlock.c
> index b14e929084cc..f62e1a4d05f2 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -380,6 +380,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
>  	 */
>  	if (newflags & VM_LOCKED)
>  		newflags |= VM_IO;
> +	vma_mark_locked(vma);
>  	WRITE_ONCE(vma->vm_flags, newflags);
>
>  	lru_add_drain();
> @@ -456,6 +457,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
>
>  	if ((newflags & VM_LOCKED) && (oldflags & VM_LOCKED)) {
>  		/* No work to do, and mlocking twice would be wrong */
> +		vma_mark_locked(vma);
>  		vma->vm_flags = newflags;
>  	} else {
>  		mlock_vma_pages_range(vma, start, end, newflags);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 693e6776be39..f89c9b058105 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1818,6 +1818,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  out:
>  	perf_event_mmap(vma);
>
> +	vma_mark_locked(vma);
>  	vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
>  	if (vm_flags & VM_LOCKED) {
>  		if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||

I guess this has no real impact, but vma_mark_locked(vma) could be called
only in the case where the vm_flags field is actually touched.
Something like this:

 	vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
 	if (vm_flags & VM_LOCKED) {
 		if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
 					is_vm_hugetlb_page(vma) ||
-					vma == get_gate_vma(current->mm))
+					vma == get_gate_vma(current->mm)) {
+			vma_mark_locked(vma);
 			vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
-		else
+		} else
 			mm->locked_vm += (len >> PAGE_SHIFT);
 	}

> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index bc6bddd156ca..df47fc21b0e4 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -621,6 +621,7 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  	 * vm_flags and vm_page_prot are protected by the mmap_lock
>  	 * held in write mode.
>  	 */
> +	vma_mark_locked(vma);
>  	vma->vm_flags = newflags;
>  	/*
>  	 * We want to check manually if we can change individual PTEs writable
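
Coming back to the mmap_region() remark, the idea of marking the VMA only on
the path that modifies vm_flags can be exercised in a small userspace mock.
This is only a sketch under assumptions: the types, the flag values and
mock_mmap_region_tail() are invented for illustration, and the real condition
(vma_is_dax(), is_vm_hugetlb_page(), the gate VMA check) is collapsed into a
single stand-in MOCK_VM_SPECIAL test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for this mock; NOT the kernel's definitions. */
#define MOCK_VM_LOCKED  0x1UL
#define MOCK_VM_SPECIAL 0x2UL

struct mock_vma {
	unsigned long vm_flags;
	bool marked_locked;	/* records that the mock lock call happened */
};

struct mock_mm {
	unsigned long locked_vm;	/* pages accounted as locked */
};

/* Mock stand-in for vma_mark_locked() from the series under review. */
static void mock_vma_mark_locked(struct mock_vma *vma)
{
	assert(vma != NULL);
	vma->marked_locked = true;
}

/*
 * Simplified model of the end of mmap_region(): the VMA is marked locked
 * only on the branch that rewrites vma->vm_flags. The accounting branch
 * leaves vm_flags alone, so it does not mark the VMA.
 */
static void mock_mmap_region_tail(struct mock_mm *mm, struct mock_vma *vma,
				  unsigned long vm_flags,
				  unsigned long nr_pages)
{
	if (!(vm_flags & MOCK_VM_LOCKED))
		return;

	if (vm_flags & MOCK_VM_SPECIAL) {
		mock_vma_mark_locked(vma);
		vma->vm_flags &= ~MOCK_VM_LOCKED;
	} else {
		mm->locked_vm += nr_pages;
	}
}
```

With this shape, a VMA that only goes through the locked_vm accounting is
never marked, which is the small saving the suggestion is after.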