From: Ryan Roberts
To: David Hildenbrand, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Russell King, Catalin Marinas, Will Deacon, Dinh Nguyen, Michael Ellerman, Nicholas Piggin, Christophe Leroy, "Aneesh Kumar K.V", "Naveen N. Rao", Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Christian Borntraeger, Sven Schnelle, "David S. Miller", linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [PATCH v1 09/11] mm/memory: optimize fork() with PTE-mapped THP
Date: Tue, 23 Jan 2024 12:28:25 +0000
Message-ID: <40112a27-eddb-4c1a-a859-a34e202e6564@arm.com>
In-Reply-To: <31a0661e-fa69-419c-9936-98bfe168d5a7@redhat.com>
References: <20240122194200.381241-1-david@redhat.com> <20240122194200.381241-10-david@redhat.com> <63be0c3c-bf34-4cbb-b47b-7c9be0e65058@arm.com> <31a0661e-fa69-419c-9936-98bfe168d5a7@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 23/01/2024 12:19, David Hildenbrand wrote:
> [...]
>
>>
>> I wrote some documentation for this (based on Matthew's docs for set_ptes() in
>> my version. Perhaps it makes sense to add it here, given this is overridable by
>> the arch.
>>
>> /**
>>   * wrprotect_ptes - Write protect a consecutive set of pages.
>>   * @mm: Address space that the pages are mapped into.
>>   * @addr: Address of first page to write protect.
>>   * @ptep: Page table pointer for the first entry.
>>   * @nr: Number of pages to write protect.
>>   *
>>   * May be overridden by the architecture, else implemented as a loop over
>>   * ptep_set_wrprotect().
>>   *
>>   * Context: The caller holds the page table lock. The PTEs are all in the same
>>   * PMD.
>>   */
>>
>
> I could have sworn I had a documentation at some point. Let me add some, thanks.
>
> [...]
>
>>> +
>>> +    /*
>>> +     * If we likely have to copy, just don't bother with batching. Make
>>> +     * sure that the common "small folio" case stays as fast as possible
>>> +     * by keeping the batching logic separate.
>>> +     */
>>> +    if (unlikely(!*prealloc && folio_test_large(folio) && max_nr != 1)) {
>>> +        nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr);
>>> +        if (folio_test_anon(folio)) {
>>> +            folio_ref_add(folio, nr);
>>> +            if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page,
>>> +                                  nr, src_vma))) {
>>
>> What happens if its not the first page of the batch that fails here? Aren't you
>> signalling that you need a prealloc'ed page for the wrong pte? Shouldn't you
>> still batch copy all the way up to the failing page first? Perhaps it all comes
>> out in the wash and these events are so infrequent that we don't care about the
>> lost batching opportunity?
>
> I assume you mean the weird corner case that some folio pages in the range have
> PAE set, others don't -- and the folio maybe pinned.
>
> In that case, we fallback to individual pages, and might have preallocated a
> page although we wouldn't have to preallocate one for processing the next page
> (that doesn't have PAE set).
>
> It should all work, although not optimized to the extreme, and as it's a corner
> case, we don't particularly care. Hopefully, in the future we'll only have a
> single PAE flag per folio.
>
> Or am I missing something?

No, your explanation makes sense. Just wanted to check this all definitely worked,
because the flow is slightly different to my previous version that was doing
try_dup_rmap page-by-page.

>
>>
>>> +                folio_ref_sub(folio, nr);
>>> +                return -EAGAIN;
>>> +            }
>>> +            rss[MM_ANONPAGES] += nr;
>>> +            VM_WARN_ON_FOLIO(PageAnonExclusive(page), folio);
>>> +        } else {
>>> +            folio_ref_add(folio, nr);
>>
>> Perhaps hoist this out to immediately after folio_pte_batch() since you're
>> calling it on both branches?
>
> Makes sense, thanks.
>