Message-ID: <2de0617e-d1d7-49ec-9cb8-206eaf37caed@arm.com>
Date: Tue, 5 Dec 2023 09:34:23 +0000
Subject: Re: [PATCH v8 00/10] Multi-size THP for anonymous memory
To: Andrew Morton
Cc: Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard,
    David Rientjes, Vlastimil Babka, Hugh Dickins, Kefeng Wang,
    Barry Song <21cnbao@gmail.com>, Alistair Popple, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
References: <20231204102027.57185-1-ryan.roberts@arm.com>
    <20231204113039.42510c23455026e40c5e2a56@linux-foundation.org>
From: Ryan Roberts
In-Reply-To: <20231204113039.42510c23455026e40c5e2a56@linux-foundation.org>

On 04/12/2023 19:30, Andrew Morton wrote:
> On Mon, 4 Dec 2023 10:20:17 +0000 Ryan Roberts wrote:
>
>> Hi All,
>>
>>
>> Prerequisites
>> =============
>>
>> Some work items identified as being prerequisites are listed on page 3 at [9].
>> The summary is:
>>
>> | item                          | status                  |
>> |:------------------------------|:------------------------|
>> | mlock                         | In mainline (v6.7)      |
>> | madvise                       | In mainline (v6.6)      |
>> | compaction                    | v1 posted [10]          |
>> | numa balancing                | Investigated: see below |
>> | user-triggered page migration | In mainline (v6.7)      |
>> | khugepaged collapse           | In mainline (NOP)       |
>
> What does "prerequisites" mean here?  Won't compile without?  Kernel
> crashes without?  Nice-to-have-after?  Please expand on this.

Short answer: It's supposed to mean things that either need to be done to
prevent the mm from regressing (both correctness and performance) when
multi-size THP is present but disabled, or things that need to be done to make
the mm robust (but not necessarily optimally performant) when multi-size THP is
enabled. But in reality, all of the things on the list could be reclassified as
"nice-to-have-after", IMHO; their absence will cause neither compilation nor
runtime errors.

Longer answer: When I first started looking at this, I was advised that there
were likely a number of corners which made assumptions about large folios always
being PMD-sized, and which, if not found and fixed, could lead to stability
issues. At the time I was also pursuing a strategy of multi-size THP being a
compile-time feature with no runtime control, so I decided it was important for
multi-size THP to not effectively disable other features (e.g. various madvise
ops used to ignore PTE-mapped large folios). This list represents all the things
that I could find based on code review, as well as things suggested by others,
and in the end, they all fall into that last category of "PTE-mapped large
folios effectively disable existing features".

But given we now have runtime controls to opt in to multi-size THP, I'm not sure
we need to classify these as prerequisites. I didn't want to make that decision
unilaterally, though, given this list has previously been discussed and agreed
by others.

It's also worth noting that in the case of compaction, that's already a problem
for large folios in the page cache; large folios will be skipped.

>
> I looked at [9], but access is denied.

Sorry about that; it's owned by David Rientjes so I can't fix that for you.
It's a PDF of a slide with the following table:

+-------------------------------+------------------------------------------------------------------------+--------------+--------------------+
| Item                          | Description                                                            | Assignee     | Status             |
+-------------------------------+------------------------------------------------------------------------+--------------+--------------------+
| mlock                         | Large, pte-mapped folios are ignored when mlock is requested.          | Yin, Fengwei | In mainline (v6.7) |
|                               | Code comment for mlock_vma_folio() says "...filter out pte mappings    |              |                    |
|                               | of THPs which cannot be consistently counted: a pte mapping of the     |              |                    |
|                               | THP head cannot be distinguished by the page alone."                   |              |                    |
| madvise                       | MADV_COLD, MADV_PAGEOUT, MADV_FREE: For large folios, code assumes     | Yin, Fengwei | In mainline (v6.6) |
|                               | exclusive only if mapcount==1, else skips remainder of operation.      |              |                    |
|                               | For large, pte-mapped folios, exclusive folios can have mapcount       |              |                    |
|                               | upto nr_pages and still be exclusive. Even better; don't split         |              |                    |
|                               | the folio if it fits entirely within the range.                        |              |                    |
| compaction                    | Raised at LSFMM: Compaction skips non-order-0 pages.                   | Zi Yan       | v1 posted          |
|                               | Already problem for page-cache pages today.                            |              |                    |
| numa balancing                | Large, pte-mapped folios are ignored by numa-balancing code. Commit    | John Hubbard | Investigated:      |
|                               | comment (e81c480): "We're going to have THP mapped with PTEs. It       |              | Not prerequisite   |
|                               | will confuse numabalancing. Let's skip them for now."                  |              |                    |
| user-triggered page migration | mm/migrate.c (migrate_pages syscall) We don't want to migrate folio    | Kefeng Wang  | In mainline (v6.7) |
|                               | that is shared.                                                        |              |                    |
| khugepaged collapse           | collapse small-sized THP to PMD-sized THP in khugepaged/MADV_COLLAPSE. | Ryan Roberts | In mainline (NOP)  |
|                               | Kirill thinks khugepage should already be able to collapse             |              |                    |
|                               | small large folios to PMD-sized THP; verification required.            |              |                    |
+-------------------------------+------------------------------------------------------------------------+--------------+--------------------+

Thanks,
Ryan

>
>> [9] https://drive.google.com/file/d/1GnfYFpr7_c1kA41liRUW5YtCb8Cj18Ud/view?usp=sharing&resourcekey=0-U1Mj3-RhLD1JV6EThpyPyA
>
>