Message-ID: <075826b4-2df8-4460-a8f2-c0581d098cff@arm.com>
Date: Tue, 5 Dec 2023 10:50:15 +0000
Subject: Re: [PATCH v8 03/10] mm: thp: Introduce multi-size THP sysfs interface
To: David Hildenbrand, Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao, Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes, Vlastimil Babka, Hugh Dickins, Kefeng Wang, Alistair Popple, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
References: <20231204102027.57185-1-ryan.roberts@arm.com> <20231204102027.57185-4-ryan.roberts@arm.com> <8adbde1c-970b-4a26-81b0-91b913c4850b@arm.com>
From: Ryan Roberts
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/12/2023 09:57, David Hildenbrand wrote:
> On 05.12.23 10:50, Ryan Roberts wrote:
>> On 05/12/2023 04:21, Barry Song wrote:
>>> On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts wrote:
>>>>
>>>> In preparation for adding support for anonymous multi-size THP,
>>>> introduce new sysfs structure that will be used to control the new
>>>> behaviours. A new directory is added under transparent_hugepage for each
>>>> supported THP size, and contains an `enabled` file, which can be set to
>>>> "inherit" (to inherit the global setting), "always", "madvise" or
>>>> "never". For now, the kernel still only supports PMD-sized anonymous
>>>> THP, so only 1 directory is populated.
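As an aside on the `enabled` semantics described in the paragraph above, here is a minimal sketch of how a per-size value resolves against the top-level setting. It is an illustration only, not code from the patch: the names `effective_enabled`, `TOP_LEVEL_VALUES` and `PER_SIZE_VALUES` are hypothetical, and the only facts taken from the commit message are the four per-size values and that "inherit" means "use the global setting".

```python
# Illustrative model only; the kernel implements this in C with bitfields.
# All names below are hypothetical and do not appear in the patch.
TOP_LEVEL_VALUES = {"always", "madvise", "never"}
PER_SIZE_VALUES = TOP_LEVEL_VALUES | {"inherit"}


def effective_enabled(per_size: str, top_level: str) -> str:
    """Return the setting that actually applies to one THP size."""
    if per_size not in PER_SIZE_VALUES:
        raise ValueError(f"invalid per-size value: {per_size!r}")
    if top_level not in TOP_LEVEL_VALUES:
        raise ValueError(f"invalid top-level value: {top_level!r}")
    # "inherit" defers to the top-level setting; anything else overrides it.
    return top_level if per_size == "inherit" else per_size
```

For example, a size set to "inherit" follows the top-level file whenever that file changes, while a size set to "never" stays disabled regardless of the top-level value.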
>>>>
>>>> The first half of the change converts transhuge_vma_suitable() and
>>>> hugepage_vma_check() so that they take a bitfield of orders for which
>>>> the user wants to determine support, and the functions filter out all
>>>> the orders that can't be supported, given the current sysfs
>>>> configuration and the VMA dimensions. If there is only 1 order set in
>>>> the input then the output can continue to be treated like a boolean;
>>>> this is the case for most call sites. The resulting functions are
>>>> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
>>>> respectively.
>>>>
>>>> The second half of the change implements the new sysfs interface. It has
>>>> been done so that each supported THP size has a `struct thpsize`, which
>>>> describes the relevant metadata and is itself a kobject. This is pretty
>>>> minimal for now, but should make it easy to add new per-thpsize files to
>>>> the interface if needed in future (e.g. per-size defrag). Rather than
>>>> keep the `enabled` state directly in the struct thpsize, I've elected to
>>>> directly encode it into huge_anon_orders_[always|madvise|inherit]
>>>> bitfields since this reduces the amount of work required in
>>>> thp_vma_allowable_orders() which is called for every page fault.
>>>>
>>>> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
>>>> commit, for details of how the new sysfs interface works.
>>>>
>>>> Signed-off-by: Ryan Roberts
>>>
>>> Reviewed-by: Barry Song
>>
>> Thanks!
>>
>>>
>>>> -khugepaged will be automatically started when
>>>> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
>>>> -be automatically shutdown if it's set to "never".
>>>> +khugepaged will be automatically started when one or more hugepage
>>>> +sizes are enabled (either by directly setting "always" or "madvise",
>>>> +or by setting "inherit" while the top-level enabled is set to "always"
>>>> +or "madvise"), and it'll be automatically shutdown when the last
>>>> +hugepage size is disabled (either by directly setting "never", or by
>>>> +setting "inherit" while the top-level enabled is set to "never").
>>>>
>>>>  Khugepaged controls
>>>>  -------------------
>>>>
>>>> +.. note::
>>>> +   khugepaged currently only searches for opportunities to collapse to
>>>> +   PMD-sized THP and no attempt is made to collapse to other THP
>>>> +   sizes.
>>>
>>> For small-size THP, collapse is probably a bad idea. we like a one-shot
>>> try in Android especially we are using a 64KB and less large folio size. if
>>> PF succeeds in getting large folios, we map large folios, otherwise we
>>> give up as those memories can be quite unstably swapped-out, swapped-in
>>> and madvised to be DONTNEED.
>>>
>>> too many compactions will increase power consumption and decrease UI
>>> response.
>>
>> Understood; that's very useful information for the Android context. Multiple
>> people have made comments about eventually needing khugepaged (or something
>> similar) support in the server context though to async collapse to contpte size.
>> Actually one suggestion was a user space daemon that scans and collapses with
>> MADV_COLLAPSE. I suspect the key will be to ensure whatever solution we go for
>> is flexible and can be enabled/disabled/configured for the different
>> environments.
>
> There certainly is interest for 2 MiB THP on arm64 64k where the THP size would
> normally be 512 MiB. In that scenario, khugepaged makes perfect sense.

Indeed