Date: Tue, 20 Jun 2023 20:52:38 -0300
From: Jason Gunthorpe
To: Hugh Dickins
Cc: Andrew Morton, Gerald Schaefer, Vasily Gorbik, Mike Kravetz,
    Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
    Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple, Ralph Campbell,
    Ira Weiny, Steven Price, SeongJae Park, Lorenzo Stoakes, Huang Ying,
    Naoya Horiguchi, Christophe Leroy, Zack Rusin, Axel Rasmussen,
    Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
    Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King,
    "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V",
    Heiko Carstens, Christian Borntraeger, Claudio Imbrenda,
    Alexander Gordeev, Jann Horn, Vishal Moola, Vlastimil Babka,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
    <5cd9f442-61da-4c3d-eca-b7f44d22aa5f@google.com>
    <2ad8b6cf-692a-ff89-ecc-586c20c5e07f@google.com>
In-Reply-To: <2ad8b6cf-692a-ff89-ecc-586c20c5e07f@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 20, 2023 at 12:54:25PM -0700, Hugh Dickins wrote:
> On Tue, 20 Jun 2023, Jason Gunthorpe wrote:
> > On Tue, Jun 20, 2023 at 12:47:54AM -0700, Hugh Dickins wrote:
> > > Add powerpc-specific pte_free_defer(), to call pte_free() via call_rcu().
> > > pte_free_defer() will be called inside khugepaged's retract_page_tables()
> > > loop, where allocating extra memory cannot be relied upon. This precedes
> > > the generic version to avoid build breakage from incompatible pgtable_t.
> > >
> > > This is awkward because the struct page contains only one rcu_head, but
> > > that page may be shared between PTE_FRAG_NR pagetables, each wanting to
> > > use the rcu_head at the same time: account concurrent deferrals with a
> > > heightened refcount, only the first making use of the rcu_head, but
> > > re-deferring if more deferrals arrived during its grace period.
> >
> > You didn't answer my question why we can't just move the rcu to the
> > actual free page?
>
> I thought that I had answered it, perhaps not to your satisfaction:
>
> https://lore.kernel.org/linux-mm/9130acb-193-6fdd-f8df-75766e663978@google.com/
>
> My conclusion then was:
> Not very good reasons: good enough, or can you supply a better patch?

Oh, I guess I didn't read that email as answering the question..

I was saying to make pte_fragment_free() unconditionally do the RCU. It
is the only thing that uses the page->rcu_head, and it means PPC would
double RCU the final free on the TLB path, but that is probably OK for
now.

This means pte_free_defer() won't do anything special on PPC, as PPC
will always RCU free these things; this addresses the defer concern too,
I think. Overall it is easier to reason about.

I looked at fixing the TLB stuff to avoid the double RCU but quickly got
scared: ppc is using a kmem_cache to allocate other page table sizes, so
there is not a reliable struct page to get an rcu_head from. This looks
like the main challenge for ppc... We'd have to teach the tlb code not
to do its own RCU stuff for table levels that the arch is already RCU
freeing - and that won't get us to full RCU freeing on PPC.
Anyhow, this is a full version of what I was thinking:

diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 20652daa1d7e3a..b5dcd0f27fc115 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -106,6 +106,21 @@ pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel)
 	return __alloc_for_ptecache(mm, kernel);
 }
 
+static void pgtable_free_cb(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+
+	pgtable_pte_page_dtor(page);
+	__free_page(page);
+}
+
+static void pgtable_free_cb_kernel(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+
+	__free_page(page);
+}
+
 void pte_fragment_free(unsigned long *table, int kernel)
 {
 	struct page *page = virt_to_page(table);
@@ -115,8 +130,13 @@ void pte_fragment_free(unsigned long *table, int kernel)
 	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
 	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
+		/*
+		 * Always RCU free pagetable memory. rcu_head overlaps with lru
+		 * which is no longer in use by the time the table is freed.
+		 */
 		if (!kernel)
-			pgtable_pte_page_dtor(page);
-		__free_page(page);
+			call_rcu(&page->rcu_head, pgtable_free_cb);
+		else
+			call_rcu(&page->rcu_head, pgtable_free_cb_kernel);
 	}
 }