Date: Mon, 7 Sep 2020 09:20:14 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: Zi Yan, linux-mm@kvack.org, Rik van Riel, "Kirill A . Shutemov",
    Matthew Wilcox, Shakeel Butt, Yang Shi, David Nellans,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64
Message-ID: <20200907072014.GD30144@dhcp22.suse.cz>
In-Reply-To: <20200904211045.GA567128@carbon.DHCP.thefacebook.com>

On Fri 04-09-20 14:10:45, Roman Gushchin wrote:
> On Fri, Sep 04, 2020 at 09:42:07AM +0200, Michal Hocko wrote:
[...]
> > An explicit opt-in sounds much more appropriate to me as well. If we go
> > with a specific API then I would not make it 1GB pages specific. Why
> > cannot we have an explicit interface to "defragment" address space
> > range into large pages and the kernel would use large pages where
> > appropriate? Or is the additional copying prohibitively expensive?
>
> Can you, please, elaborate a bit more here? It seems like madvise(MADV_HUGEPAGE)
> provides something similar to what you're describing, but there are a lot
> of details here, so I'm probably missing something.

MADV_HUGEPAGE controls a preference for THP to be used for a particular
address range. So it looks similar, but historically it controls page
fault behavior as well, and its effect depends on the global setup.

I had something much simpler in mind: effectively an API to invoke
khugepaged-like functionality synchronously from the calling context on
a specific address range. It could be more aggressive than the regular
khugepaged and even create 1G pages (or THPs as large as the page tables
can handle on the particular arch, for that matter).

As this would be an explicit call by userspace, we would not have to
worry about the resulting latency. The default khugepaged is in a harder
position because it has no understanding of the target address space and
cannot make any cost/benefit evaluation, so it has to be more
conservative.

-- 
Michal Hocko
SUSE Labs
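
For illustration only, and not part of the proposal: a minimal Python sketch of the existing mechanism discussed above, namely reading the global THP setup and applying MADV_HUGEPAGE as a per-range preference. The helper names (parse_thp_mode, read_thp_mode, map_with_thp_hint) are hypothetical. The sketch assumes Linux and Python 3.8 or later, where mmap.madvise() is available. It only expresses a hint: nothing here collapses pages synchronously, which is the gap the suggested interface would fill.

```python
import mmap
import re

# Global THP switch; the active mode is shown in brackets,
# e.g. "always [madvise] never".
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs text, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path=THP_ENABLED_PATH):
    """Read the global THP mode; None if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_thp_mode(f.read())
    except OSError:
        return None


def map_with_thp_hint(length):
    """Map private anonymous memory and request THP for the whole range.

    Returns (mapping, hint_accepted). The hint is only a preference:
    whether huge pages are actually used depends on the global setup
    and on what the kernel can allocate at fault or khugepaged time.
    """
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    accepted = False
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # absent on some platforms
    if advice is not None:
        try:
            buf.madvise(advice)
            accepted = True
        except OSError:
            # e.g. a kernel built without THP support rejects the advice
            accepted = False
    return buf, accepted


if __name__ == "__main__":
    print("global THP mode:", read_thp_mode())
    region, ok = map_with_thp_hint(4 * 1024 * 1024)
    print("MADV_HUGEPAGE accepted:", ok)
    region.close()
```

Even when the hint is accepted, the caller has no way to know when, or whether, the range ends up backed by huge pages. That is the difference from a synchronous, explicit collapse request.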