From: Ryan Roberts
To: Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard,
    David Rientjes, Vlastimil Babka, Hugh Dickins, Kefeng Wang,
    Alistair Popple, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Date: Tue, 5 Dec 2023 09:50:14 +0000
Subject: Re: [PATCH v8 03/10] mm: thp: Introduce multi-size THP sysfs interface
Message-ID: <8adbde1c-970b-4a26-81b0-91b913c4850b@arm.com>
References: <20231204102027.57185-1-ryan.roberts@arm.com>
    <20231204102027.57185-4-ryan.roberts@arm.com>

On 05/12/2023 04:21, Barry Song wrote:
> On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts wrote:
>>
>> In preparation for adding support for anonymous multi-size THP,
>> introduce new sysfs structure that will be used to control the new
>> behaviours. A new directory is added under transparent_hugepage for each
>> supported THP size, and contains an `enabled` file, which can be set to
>> "inherit" (to inherit the global setting), "always", "madvise" or
>> "never". For now, the kernel still only supports PMD-sized anonymous
>> THP, so only 1 directory is populated.
>>
>> The first half of the change converts transhuge_vma_suitable() and
>> hugepage_vma_check() so that they take a bitfield of orders for which
>> the user wants to determine support, and the functions filter out all
>> the orders that can't be supported, given the current sysfs
>> configuration and the VMA dimensions. If there is only 1 order set in
>> the input then the output can continue to be treated like a boolean;
>> this is the case for most call sites. The resulting functions are
>> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
>> respectively.
>>
>> The second half of the change implements the new sysfs interface. It has
>> been done so that each supported THP size has a `struct thpsize`, which
>> describes the relevant metadata and is itself a kobject. This is pretty
>> minimal for now, but should make it easy to add new per-thpsize files to
>> the interface if needed in future (e.g. per-size defrag). Rather than
>> keep the `enabled` state directly in the struct thpsize, I've elected to
>> directly encode it into huge_anon_orders_[always|madvise|inherit]
>> bitfields since this reduces the amount of work required in
>> thp_vma_allowable_orders() which is called for every page fault.
>>
>> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
>> commit, for details of how the new sysfs interface works.
>>
>> Signed-off-by: Ryan Roberts
>
> Reviewed-by: Barry Song

Thanks!

>
>> -khugepaged will be automatically started when
>> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
>> -be automatically shutdown if it's set to "never".
>> +khugepaged will be automatically started when one or more hugepage
>> +sizes are enabled (either by directly setting "always" or "madvise",
>> +or by setting "inherit" while the top-level enabled is set to "always"
>> +or "madvise"), and it'll be automatically shutdown when the last
>> +hugepage size is disabled (either by directly setting "never", or by
>> +setting "inherit" while the top-level enabled is set to "never").
>>
>> Khugepaged controls
>> -------------------
>>
>> +.. note::
>> +   khugepaged currently only searches for opportunities to collapse to
>> +   PMD-sized THP and no attempt is made to collapse to other THP
>> +   sizes.
>
> For small-size THP, collapse is probably a bad idea. we like a one-shot
> try in Android especially we are using a 64KB and less large folio size. if
> PF succeeds in getting large folios, we map large folios, otherwise we
> give up as those memories can be quite unstably swapped-out, swapped-in
> and madvised to be DONTNEED.
>
> too many compactions will increase power consumption and decrease UI
> response.

Understood; that's very useful information for the Android context. Multiple
people have made comments about eventually needing khugepaged (or something
similar) support in the server context though to async collapse to contpte
size. Actually one suggestion was a user space daemon that scans and collapses
with MADV_COLLAPSE. I suspect the key will be to ensure whatever solution we
go for is flexible and can be enabled/disabled/configured for the different
environments.

>
> Thanks
> Barry
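
For anyone following the thread, the per-size `enabled` semantics quoted above can be modelled roughly as below. This is a toy Python sketch, not the kernel code: the class and method names (`ThpSettings`, `set_enabled`, `allowable_orders`, `khugepaged_should_run`) are hypothetical, `PMD_ORDER = 9` assumes 4K base pages, and the filtering on VMA dimensions that the commit message mentions is deliberately left out. It only illustrates how three bitfields of orders (always, madvise, inherit) can answer "which of these orders are allowed?" with a few mask operations.

```python
# Toy model of the per-size THP "enabled" state described in the commit
# message. Not kernel code; all names here are hypothetical.

PMD_ORDER = 9  # assumption: 4K base pages, so a PMD-sized THP is order 9


class ThpSettings:
    def __init__(self, global_enabled="always"):
        # Top-level transparent_hugepage/enabled: "always", "madvise" or "never".
        self.global_enabled = global_enabled
        # One bit per order, mirroring the always/madvise/inherit bitfields.
        self.always = 0
        self.madvise = 0
        self.inherit = 0

    def set_enabled(self, order, value):
        """Model a write to the per-size `enabled` file for one order."""
        if value not in ("always", "madvise", "inherit", "never"):
            raise ValueError(value)
        bit = 1 << order
        # An order is in at most one bitfield; "never" leaves it in none.
        self.always &= ~bit
        self.madvise &= ~bit
        self.inherit &= ~bit
        if value == "always":
            self.always |= bit
        elif value == "madvise":
            self.madvise |= bit
        elif value == "inherit":
            self.inherit |= bit

    def allowable_orders(self, orders, vma_madvised):
        """Filter a bitfield of candidate orders down to the enabled ones.

        The VMA-dimension checks from the real function are omitted.
        """
        mask = self.always
        if vma_madvised:
            mask |= self.madvise
        if self.global_enabled == "always" or (
            self.global_enabled == "madvise" and vma_madvised
        ):
            mask |= self.inherit
        return orders & mask

    def khugepaged_should_run(self):
        """True when at least one size is enabled, directly or via inherit."""
        if self.always or self.madvise:
            return True
        return bool(self.inherit) and self.global_enabled in ("always", "madvise")
```

With a single bit set in `orders`, the return value of `allowable_orders()` is either zero or that same bit, which is why such call sites can keep treating the result like a boolean.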