Received: by 2002:a05:6358:7058:b0:131:369:b2a3 with SMTP id 24csp6644700rwp; Tue, 18 Jul 2023 03:53:24 -0700 (PDT) X-Google-Smtp-Source: APBJJlHlDm70OfDdvxdIT+4E1cZxu7mXtzlAWWhRo2UWbYH2ukkjbVgJIUx8aZaMqSy6WOSFv1B5 X-Received: by 2002:a05:6512:4016:b0:4f4:d071:be48 with SMTP id br22-20020a056512401600b004f4d071be48mr1503304lfb.14.1689677604388; Tue, 18 Jul 2023 03:53:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689677604; cv=none; d=google.com; s=arc-20160816; b=bEZgCL9p6GgakHeuAl6FUqaW0cV4IdVz8nkcJfUuuFz/X9teiCLcCETc8YJJ8uz3iH VTte0gA2pDUojgT4+HMAd7Rggo2uSQK1yZ+PUOxCMLacKYU2So+n5oner6RKkqkLdT6C 6GL801LnL425O4JVk2XAtqdSwpQmmdtm2ja1mYtjncVBMfpmaq1UX9+U1hvNz8VYsX1j 4ZREF7gDM4xRbxJfEmsAFiZIj3myMX/h0/KnJUprkiT+SwnkQR/FL4+n/2BgLYcmSDdg RRcA4e/MfB/FGyCGKDwqANg9Fls05gR6PCl+PcD7GbxMw+qnQk9AhtxaDGVs2S/MEOHW UTuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=Z8LRS8Zyl8jRX9FEmAU4P8S/gmJzvPYSBAkjI/m59xI=; fh=YTYxLdpNknYaB7rufKNPv+ijfsnV7XxADLC3JrguFzQ=; b=bSekXchn54q8fLs1POS0pG5PS3Te2nkbe5gS0IKf5ihkTKqpPqbLwSOHF/cCaHEJr0 qC/Tx89L3BOLLRZVAArdDOBpQ20WvR/avR4Tfs+oO3aeEW7WGmQ3+JXhnrBrdoRBk8QL xWTUcZ3jMPDg0B5+aI+3jpbvtdI0kKdaBWIeYuse4Gtfc34ecqS3mcOEJw72iCIIcn7W Qgq0LnNbZspkdOMI1LDqM4exdSt2YYtsWzkpInYEH6WpkUGyx2BiGPsBVstUysn+gOdS SbJjshYajJOWp5PFinVrWX8O1A3yh1HiqeQCeorHSiytZ4YPl4+IYWRPlkDlKf2P2SgE 9bEw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gx3-20020a170906f1c300b00991b7749bd5si1081527ejb.779.2023.07.18.03.53.00; Tue, 18 Jul 2023 03:53:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231446AbjGRKgP (ORCPT + 99 others); Tue, 18 Jul 2023 06:36:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229583AbjGRKgO (ORCPT ); Tue, 18 Jul 2023 06:36:14 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 81BAD1B6 for ; Tue, 18 Jul 2023 03:36:13 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B3D622F4; Tue, 18 Jul 2023 03:36:56 -0700 (PDT) Received: from [10.1.34.52] (C02Z41KALVDN.cambridge.arm.com [10.1.34.52]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id F153A3F67D; Tue, 18 Jul 2023 03:36:10 -0700 (PDT) Message-ID: Date: Tue, 18 Jul 2023 11:36:09 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance To: Hugh Dickins Cc: Yu Zhao , Matthew Wilcox , Andrew Morton , "Kirill A. Shutemov" , Yin Fengwei , David Hildenbrand , Catalin Marinas , Will Deacon , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20230714160407.4142030-1-ryan.roberts@arm.com> <20230714161733.4144503-3-ryan.roberts@arm.com> <432490d1-8d1e-1742-295a-d6e60a054ab6@arm.com> <5df787a0-8e69-2472-cdd6-f96a3f7dfaaf@arm.com> <8bdfd8d8-5662-4615-86dc-d60259bd16d@google.com> From: Ryan Roberts In-Reply-To: <8bdfd8d8-5662-4615-86dc-d60259bd16d@google.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 18/07/2023 00:37, Hugh Dickins wrote: > On Mon, 17 Jul 2023, Ryan Roberts wrote: > >>>>>> +static int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio) >>>>>> +{ >>>>>> + int i; >>>>>> + gfp_t gfp; >>>>>> + pte_t *pte; >>>>>> + unsigned long addr; >>>>>> + struct vm_area_struct *vma = vmf->vma; >>>>>> + int prefer = anon_folio_order(vma); >>>>>> + int orders[] = { >>>>>> + prefer, >>>>>> + prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0, >>>>>> + 0, >>>>>> + }; >>>>>> + >>>>>> + *folio = NULL; >>>>>> + >>>>>> + if (vmf_orig_pte_uffd_wp(vmf)) >>>>>> + goto fallback; >>>>>> + >>>>>> + for (i = 0; orders[i]; i++) { >>>>>> + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]); >>>>>> + if (addr >= vma->vm_start && >>>>>> + addr + (PAGE_SIZE << orders[i]) <= vma->vm_end) >>>>>> + break; >>>>>> + } >>>>>> + >>>>>> + if (!orders[i]) >>>>>> + goto fallback; >>>>>> + >>>>>> + pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); >>>>>> + if (!pte) >>>>>> + return -EAGAIN; >>>>> >>>>> It would be a bug if this happens. So probably -EINVAL? >>>> >>>> Not sure what you mean? Hugh Dickins' series that went into v6.5-rc1 makes it >>>> possible for pte_offset_map() to fail (if I understood correctly) and we have to >>>> handle this. The intent is that we will return from the fault without making any >>>> change, then we will refault and try again. >>> >>> Thanks for checking that -- it's very relevant. One detail is that >>> that series doesn't affect anon. IOW, collapsing PTEs into a PMD can't >>> happen while we are holding mmap_lock for read here, and therefore, >>> the race that could cause pte_offset_map() on shmem/file PTEs to fail >>> doesn't apply here. >> >> But Hugh's patches have changed do_anonymous_page() to handle failure from >> pte_offset_map_lock(). So I was just following that pattern. If this really >> can't happen, then I'd rather WARN/BUG on it, and simplify alloc_anon_folio()'s >> prototype to just return a `struct folio *` (and if it's null that means ENOMEM). >> >> Hugh, perhaps you can comment? > > I agree with your use of -EAGAIN there: I find it better to allow for the > possibility, than to go to great effort persuading that it's impossible; > especially because what's possible tomorrow may differ from today. > > And notice that, before my changes, there used to be a pmd_trans_unstable() > check above, implying that it is possible for it to fail (for more reasons > than corruption causing pmd_bad()) - one scenario would be that the > pte_alloc() above succeeded *because* someone else had managed to insert > a huge pmd there already (maybe we have MMF_DISABLE_THP but they did not). > > But I see from later mail that Yu Zhao now agrees with your -EAGAIN too, > so we are all on the same folio. Thanks for the explanation. I think we are all now agreed that the error case needs handling and -EAGAIN is the correct code. > > Hugh > > p.s. while giving opinions, I'm one of those against using "THP" for > large but not pmd-mappable folios; and was glad to see Matthew arguing > the same way when considering THP_SWPOUT in another thread today. Honestly, I don't have an opinion either way on this (probably because I don't have the full context and history of THP like many of you do). So given there is a fair bit of opposition to FLEXIBLE_THP, I'll change it back to LARGE_ANON_FOLIO (and move it out of the THP sub-menu) in the next revision.