Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp1123467pxk; Fri, 4 Sep 2020 00:45:40 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxhOHFwDaLM86g7Rz7YvYOafluj/fcVqg6yqg3jlVevOqvWvv4LfAXLKuLzR0uuYh5uiVud X-Received: by 2002:a17:906:2717:: with SMTP id z23mr6105462ejc.19.1599205540562; Fri, 04 Sep 2020 00:45:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1599205540; cv=none; d=google.com; s=arc-20160816; b=By1wtmXCj2CFO26CxQOPttrUyt7SLHpUtHiHM39Y678L+maUuljyVFB2myoaOWLMxs djJNHOl2FpOQddRdU9dhCjx3aqhwpYvOODkzXJpjcWvbhqqxKkuvH+QmSvTHt/Rjti7D 3jmCOoOxzSDyW/kGCel3zvk6ptcWg8NfqloiTKjBLMBpk6SlWksrwKuMTMTAButsMqPW daBaTambQgsqlzBpb9XZoL54fFykSjYXlxW+ZOIu/RYTQ+MlEJPJutuhuZp6+emNWWxL 7UfqCKGS7t5N75AdTMdxNOD1Cz+5u2M7oHMFAU/AMo5B62t18Oyuy/MgCmyqq2NVOJW0 PU+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=2VB7Dl0bH8RraFc4ZVoXTV6rbFh+Om+XVULtfG2M5tk=; b=gQMs+qjvk3HdYiHZ4U93CeX/FrflMVFg66Cnw5IMdWnXICt9Vpc7GmUnvsIJvR+Yz9 gxnWkzUl/lR0l3CsYQrEMrVyefEt5EFFDGZDpeistszK4SaLSm06dWYHi8ohPNxG5rT9 peY4E2eURMW2d6itdBT6wiKmUr2LE7XbYu65O+sr/s3/pgsw8g+CB3Zpx0UfQ01HU114 2h4MnzqXVde7WUD5iBBpAEFDuJAdQOUgf/WL/hyU5Zpfn+qLt/CH86HUBavUyfgaaI60 3SAyknJSmmWthbEQuFG86ahQVe/WIagk4IbEHBb+tN3LysyYgN2f6xFuC0OMOKuXf0yT nsag== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y22si3714427ejc.453.2020.09.04.00.45.17; Fri, 04 Sep 2020 00:45:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729664AbgIDHmN (ORCPT + 99 others); Fri, 4 Sep 2020 03:42:13 -0400 Received: from mx2.suse.de ([195.135.220.15]:41222 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726127AbgIDHmK (ORCPT ); Fri, 4 Sep 2020 03:42:10 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id BBDB1ABC1; Fri, 4 Sep 2020 07:42:09 +0000 (UTC) Date: Fri, 4 Sep 2020 09:42:07 +0200 From: Michal Hocko To: Roman Gushchin Cc: Zi Yan , linux-mm@kvack.org, Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64 Message-ID: <20200904074207.GC15277@dhcp22.suse.cz> References: <20200902180628.4052244-1-zi.yan@sent.com> <20200903073254.GP4617@dhcp22.suse.cz> <20200903162527.GF60440@carbon.dhcp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200903162527.GF60440@carbon.dhcp.thefacebook.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 03-09-20 09:25:27, Roman Gushchin wrote: > On Thu, Sep 03, 2020 at 09:32:54AM +0200, Michal Hocko wrote: > > On Wed 02-09-20 14:06:12, Zi Yan wrote: > > > From: Zi Yan > > > > > > Hi all, > > > > > > This patchset adds support for 1GB THP on x86_64. It is on top of > > > v5.9-rc2-mmots-2020-08-25-21-13. > > > > > > 1GB THP is more flexible for reducing translation overhead and increasing the > > > performance of applications with large memory footprint without application > > > changes compared to hugetlb. > > > > Please be more specific about usecases. This better have some strong > > ones because THP code is complex enough already to add on top solely > > based on a generic TLB pressure easing. > > Hello, Michal! > > We at Facebook are using 1 GB hugetlbfs pages and are getting noticeable > performance wins on some workloads. Let me clarify. I am not questioning 1GB (or large) pages in general. I believe it is quite clear that there are usecases which hugely benefit from them. I am mostly asking for the transparent part of it which traditionally means that userspace mostly doesn't have to care and get them. 2MB THPs have established certain expectations mostly a really aggressive pro-active instanciation. This has bitten us many times and create a "you need to disable THP to fix your problem whatever that is" cargo cult. I hope we do not want to repeat that mistake here again. > Historically we allocated gigantic pages at the boot time, but recently moved > to cma-based dynamic approach. Still, hugetlbfs interface requires more management > than we would like to do. 1 GB THP seems to be a better alternative. So I definitely > see it as a very useful feature. > > Given the cost of an allocation, I'm slightly skeptical about an automatic > heuristics-based approach, but if an application can explicitly mark target areas > with madvise(), I don't see why it wouldn't work. An explicit opt-in sounds much more appropriate to me as well. If we go with a specific API then I would not make it 1GB pages specific. Why cannot we have an explicit interface to "defragment" address space range into large pages and the kernel would use large pages where appropriate? Or is the additional copying prohibitively expensive? -- Michal Hocko SUSE Labs