Message-ID: <0da1c63b-5cc3-7fc9-1fb4-fdc385539bbc@suse.cz>
Date: Tue, 10 May 2022 09:35:11 +0200
To: Yang Shi
Cc: "Kirill A.
 Shutemov", Miaohe Lin, Song Liu, Rik van Riel, Matthew Wilcox, Zi Yan,
 Theodore Ts'o, Andrew Morton, Linux MM, Linux FS-devel Mailing List,
 Linux Kernel Mailing List
References: <20220404200250.321455-1-shy828301@gmail.com> <627a71f8-e879-69a5-ceb3-fc8d29d2f7f1@suse.cz>
From: Vlastimil Babka
Subject: Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/9/22 22:34, Yang Shi wrote:
> On Mon, May 9, 2022 at 9:05 AM Vlastimil Babka wrote:
>>
>> On 4/4/22 22:02, Yang Shi wrote:
>> >  include/linux/huge_mm.h        | 14 ++++++++++++
>> >  include/linux/khugepaged.h     | 59 ++++++++++++---------------------------------------
>> >  include/linux/sched/coredump.h |  3 ++-
>> >  kernel/fork.c                  |  4 +---
>> >  mm/huge_memory.c               | 15 ++++---------
>> >  mm/khugepaged.c                | 76 +++++++++++++++++++++++++++++++++++++-----------------------------
>> >  mm/mmap.c                      | 14 ++++++++----
>> >  mm/shmem.c                     | 12 -----------
>> >  8 files changed, 88 insertions(+), 109 deletions(-)
>>
>> Resending my general feedback from mm-commits thread to include the
>> public ML's:
>>
>> There's modestly less lines in the end, some duplicate code removed,
>> special casing in shmem.c removed, that's all good as it is. Also patch 8/8
>> become quite boring in v3, no need to change individual filesystems and also
>> no hook in fault path, just the common mmap path. So I would just handle
>> patch 6 differently as I just replied to it, and acked the rest.
>>
>> That said it's still unfortunately rather a mess of functions that have
>> similar names. transhuge_vma_enabled(vma), hugepage_vma_check(vma),
>> transparent_hugepage_active(vma), transhuge_vma_suitable(vma, addr)?
>> So maybe still some space for further cleanups. But the series is fine as it
>> is so we don't have to wait for it now.
>
> Yeah, I agree that we do have a lot of THP checks. Will find some time to
> look into it deeper later.

Thanks.

>>
>> We could also consider that the tracking of which mm is to be scanned is
>> modelled after ksm which has its own madvise flag, but also no "always"
>> mode. What if for THP we only tracked actual THP madvised mm's, and in
>> "always" mode just scanned all vm's, would that allow ripping out some code
>> perhaps, while not adding too many unnecessary scans? If some processes are
>
> Do you mean add all mm(s) to the scan list unconditionally? I don't
> think it will scale.

It might be interesting to find out how many mm's (percentage of all mm's)
are typically in the list with "always" enabled. I wouldn't be surprised if
it was nearly all of them. Having at least one large enough anonymous area
sounds like something all processes would have these days?

>> being scanned without any effect, maybe track success separately, and scan
>> them less frequently etc. That could be ultimately more efficient than
>> painfully tracking just *eligibility* for scanning in "always" mode?
>
> Sounds like we need a couple of different lists, for example, inactive
> and active? And promote or demote mm(s) between the two lists? TBH I
> don't see too many benefits at the moment. Or I misunderstood you?

Yeah, something like that. It would of course require finding out whether
khugepaged is consuming too much CPU uselessly these days, while not
processing fast enough the mm's where it succeeds more often.
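
To make the active/inactive idea a bit more concrete, here is a minimal
userspace sketch of the bookkeeping only. It is not kernel code and not
what khugepaged does today: struct mm_scan_state, the helper names and both
thresholds are hypothetical, picked just for illustration. An mm that keeps
producing no collapses is demoted and then scanned only every Nth round; a
single successful collapse promotes it back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mm scan bookkeeping; not an existing kernel structure. */
enum scan_list { SCAN_ACTIVE, SCAN_INACTIVE };

struct mm_scan_state {
	enum scan_list list;	/* list the mm currently sits on */
	unsigned int misses;	/* consecutive scans that collapsed nothing */
	unsigned int skipped;	/* rounds skipped since the last scan */
};

/* Assumed tunables, chosen arbitrarily for the sketch. */
#define DEMOTE_AFTER_MISSES	3
#define INACTIVE_SCAN_PERIOD	8

/* Every mm starts on the active list with clean counters. */
void mm_scan_init(struct mm_scan_state *s)
{
	assert(s != NULL);
	s->list = SCAN_ACTIVE;
	s->misses = 0;
	s->skipped = 0;
}

/*
 * Called once per scan round: active mm's are always due, inactive
 * mm's only on every INACTIVE_SCAN_PERIOD-th round.
 */
bool mm_scan_due(struct mm_scan_state *s)
{
	assert(s != NULL);
	if (s->list == SCAN_ACTIVE)
		return true;
	if (++s->skipped >= INACTIVE_SCAN_PERIOD) {
		s->skipped = 0;
		return true;
	}
	return false;
}

/* Record a scan result: promote on success, demote after repeated misses. */
void mm_scan_record(struct mm_scan_state *s, unsigned int collapsed)
{
	assert(s != NULL);
	if (collapsed > 0) {
		s->misses = 0;
		s->list = SCAN_ACTIVE;
		return;
	}
	if (++s->misses >= DEMOTE_AFTER_MISSES) {
		s->list = SCAN_INACTIVE;
		s->skipped = 0;
	}
}
```

A scan loop would call mm_scan_due() for each mm in a round and, for those
that are due, mm_scan_record() with the number of collapses achieved.
Whether this pays off depends on the measurement mentioned above, i.e. how
much CPU is currently spent on mm's that never collapse anything.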
>>
>> Even more radical thing to consider (maybe that's a LSF/MM level topic, too
>> bad :) is that we scan pagetables in ksm, khugepaged, numa balancing, soon
>> in MGLRU, and I probably forgot something else. Maybe time to think about
>> unifying those scanners?
>
> We do have pagewalk (walk_page_range()) which is used by a couple of
> mm stuff, for example, mlock, mempolicy, mprotect, etc. I'm not sure
> whether it is feasible for khugepaged, ksm, etc, or not since I didn't
> look that hard. But I agree it should be worth looking at.

pagewalk is a framework to simplify writing code that processes page tables
for a given one-off task, yeah. But this would be something a bit different,
e.g. a kernel thread that does the sum of what khugepaged/ksm/etc do. Numa
balancing uses task_work instead of kthread so that would require
consideration on which mechanism the unified daemon would use.

>>
>>