From: Linus Torvalds
Date: Sun, 6 Nov 2022 14:34:51 -0800
Subject: Re: mm: delay rmap removal until after TLB flush
To: Hugh Dickins
Cc: Johannes Weiner, Stephen Rothwell, Alexander Gordeev, Peter Zijlstra, Will Deacon, Aneesh Kumar, Nick Piggin, Heiko Carstens, Vasily Gorbik, Christian Borntraeger, Sven Schnelle, Nadav Amit, Jann Horn, John Hubbard, X86 ML, Matthew Wilcox, Andrew Morton, kernel list, Linux-MM, Andrea Arcangeli, Kirill A. Shutemov, Joerg Roedel, Uros Bizjak, Alistair Popple, linux-arch
In-Reply-To: <8a1e97c9-bd5-7473-6da8-2aa75198fbe8@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

[ Editing down to just the bare-bones problem cases ]

On Sun, Nov 6, 2022 at 1:06 PM Hugh Dickins wrote:
>
> anon_vma (bad)
> --------------
>
> See folio_lock_anon_vma_read(): folio_mapped() plays a key role in
> establishing the continued validity of an anon_vma. See comments
> above folio_get_anon_vma(), some by me but most by PeterZ IIRC.
>
> I believe what has happened is that your patchset has, very intentionally,
> kept the page as "folio_mapped" until after free_pgtables() does its
> unlink_anon_vmas(); but that is telling folio_lock_anon_vma_read()
> that the anon_vma is safe to use when actually it has been freed.
> (It looked like a page table when I peeped at it.)
>
> I'm not certain, but I think that you made page_zap_pte_rmap() handle
> anon as well as file, just for the righteous additional simplification;
> but I'm afraid that (without opening a huge anon_vma refcounting can of
> worms) that unification has to be reverted, and anon left to go the
> same old way it did before.

Indeed. I made them separate initially, just because the only case that mattered for the dirty bit was the file-mapped case. But then the two functions ended up being basically identical, so I unified them again.

But the anonvma lifetime issue looks very real, and so doing the "delay rmap only for file mappings" seems sane.

In fact, I wonder if we should delay it only for *dirty* file mappings, since it doesn't matter for the clean case. Hmm.
I already threw away my branch (since Andrew picked the patches up), so a question for Andrew: do you want me to re-do the branch entirely, or do you want me to just send you an incremental patch?

To keep the changes minimal, I'd drop the 're-unification' patch, and then make small updates to the zap_pte_range() code to keep the anon (and possibly non-dirty) case synchronous.

And btw, this one is interesting: for anonymous (and non-dirty file-mapped) pages, we actually can end up delaying the final page free (and the rmap zapping) all the way to "tlb_finish_mmu()". Normally we still have the vmas all available, but yes, free_pgtables() can and does happen before the final TLB flush.

The file-mapped dirty case doesn't have that issue - not just because it doesn't have an anonvma at all, but because it also does that "force_flush" thing that just means that the page freeing never gets delayed that far in the first place.

> mm-unstable (bad)
> -----------------
> Aside from that PageAnon issue, mm-unstable is in an understandably bad
> state because you could not have foreseen my subpages_mapcount addition
> to page_remove_rmap(). page_zap_pte_rmap() now needs to handle the
> PageCompound (but not the "compound") case too. I rushed you and akpm
> an emergency patch for that on Friday night, but you, let's say, had
> reservations about it. So I haven't posted it, and while the PageAnon
> issue remains, I think your patchset has to be removed from mm-unstable
> and linux-next anyway.

So I think I'm fine with your patch; I just want to move the memcg accounting to outside of it.

I can re-do my series on top of mm-unstable, I guess. That's probably the easiest way to handle this all.

Andrew - can you remove those patches again, and I'll create a new series for you?

Linus