References: <20230705171213.2843068-1-surenb@google.com>
 <20230705171213.2843068-2-surenb@google.com>
 <10c8fe17-fa9b-bf34-cb88-c758e07c9d72@redhat.com>
In-Reply-To: <10c8fe17-fa9b-bf34-cb88-c758e07c9d72@redhat.com>
From: Suren Baghdasaryan
Date: Wed, 5 Jul 2023 10:23:52 -0700
Subject: Re: [PATCH v3 1/2] fork: lock VMAs of the parent process when forking
To: David Hildenbrand
Cc: akpm@linux-foundation.org, jirislaby@kernel.org, jacobly.alt@gmail.com,
 holger@applied-asynchrony.com, hdegoede@redhat.com, michel@lespinasse.org,
 jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
 mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
 liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
 paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org,
 songliubraving@fb.com, peterx@redhat.com, dhowells@redhat.com,
 hughd@google.com, bigeasy@linutronix.de,
 kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com,
 peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com,
 axelrasmussen@google.com, joelaf@google.com, minchan@google.com,
 rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com,
 edumazet@google.com, gthelen@google.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, stable@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 5, 2023 at 10:14 AM David Hildenbrand wrote:
>
> On 05.07.23 19:12, Suren Baghdasaryan wrote:
> > When forking a child process, parent write-protects an anonymous page
> > and COW-shares it with the child being forked using copy_present_pte().
> > Parent's TLB is flushed right before we drop the parent's mmap_lock in
> > dup_mmap(). If we get a write-fault before that TLB flush in the parent,
> > and we end up replacing that anonymous page in the parent process in
> > do_wp_page() (because, COW-shared with the child), this might lead to
> > some stale writable TLB entries targeting the wrong (old) page.
> > Similar issue happened in the past with userfaultfd (see flush_tlb_page()
> > call inside do_wp_page()).
> > Lock VMAs of the parent process when forking a child, which prevents
> > concurrent page faults during fork operation and avoids this issue.
> > This fix can potentially regress some fork-heavy workloads. Kernel build
> > time did not show noticeable regression on a 56-core machine while a
> > stress test mapping 10000 VMAs and forking 5000 times in a tight loop
> > shows ~5% regression. If such fork time regression is unacceptable,
> > disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
> > optimizations are possible if this regression proves to be problematic.
> >
> > Suggested-by: David Hildenbrand
> > Reported-by: Jiri Slaby
> > Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
> > Reported-by: Holger Hoffstätte
> > Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
> > Reported-by: Jacob Young
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
> > Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Suren Baghdasaryan
> > ---
> >  kernel/fork.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index b85814e614a5..403bc2b72301 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> >                 retval = -EINTR;
> >                 goto fail_uprobe_end;
> >         }
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +       /* Disallow any page faults before calling flush_cache_dup_mm */
> > +       for_each_vma(old_vmi, mpnt)
> > +               vma_start_write(mpnt);
> > +       vma_iter_init(&old_vmi, oldmm, 0);
> > +#endif
> >         flush_cache_dup_mm(oldmm);
> >         uprobe_dup_mmap(oldmm, mm);
> >         /*
>
> The old version was most probably fine as well, but this certainly looks
> even safer.
>
> Acked-by: David Hildenbrand

Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
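
For anyone who wants to check the fork-time regression mentioned in the commit message, here is a rough user-space sketch of that kind of stress test. The program actually used for the ~5% figure is not shown in this thread, so this is only an assumption about its shape: map_vmas() and fork_loop() are hypothetical helper names, and alternating page protections is just one way to keep neighbouring anonymous mappings from being merged into a single VMA.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `count` one-page anonymous mappings.
 * Neighbouring mappings get different protections so that the kernel
 * cannot merge them, leaving roughly one VMA per mapping.
 * Returns the number of mappings that were created.
 */
size_t map_vmas(size_t count)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t i;

	assert(page > 0);
	for (i = 0; i < count; i++) {
		int prot = (i & 1) ? PROT_READ : (PROT_READ | PROT_WRITE);
		void *p = mmap(NULL, (size_t)page, prot,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			break;
	}
	return i;
}

/*
 * Hypothetical helper: fork `iterations` times in a tight loop.
 * Each child exits immediately and the parent reaps it before the
 * next fork, so the cost is dominated by duplicating the parent's mm.
 * Returns the number of fork/wait cycles that completed.
 */
int fork_loop(int iterations)
{
	int done = 0;
	int status;
	int i;

	for (i = 0; i < iterations; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0)
			_exit(0);
		if (waitpid(pid, &status, 0) != pid)
			break;
		done++;
	}
	return done;
}
```

Calling map_vmas(10000) followed by fork_loop(5000) from main(), timed with and without CONFIG_PER_VMA_LOCK, would approximate the numbers quoted above; absolute results will depend on the machine and kernel configuration.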