From: Muchun Song
Date: Thu, 10 Feb 2022 15:45:06 +0800
Subject: Re: [PATCH v7 0/5] Free the 2nd vmemmap page associated with each HugeTLB page
To: Mike Kravetz
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Michal Hocko, Matthew Wilcox, Jonathan Corbet, Xiongchun duan, Fam Zheng, Muchun Song, Qi Zheng, Linux Doc Mailing List, LKML, Linux Memory Management List, "Song Bao Hua (Barry Song)", Barry Song <21cnbao@gmail.com>, "Bodeddula, Balasubramaniam", Jue Wang
References: <20211101031651.75851-1-songmuchun@bytedance.com> <35c5217d-eb8f-6f70-544a-a3e8bd009a46@oracle.com> <20211123190952.7d1e0cac2d72acacd2df016c@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 10, 2022 at 6:49 AM Mike Kravetz wrote:
>
> On 2/8/22 23:44, Muchun Song wrote:
> > On Wed, Jan 26, 2022 at 4:04 PM Muchun Song wrote:
> >>
> >> On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton wrote:
> >>>
> >>> On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song wrote:
> >>>
> >>>> On Wed, Nov 10, 2021 at 2:18 PM Muchun Song wrote:
> >>>>>
> >>>>> On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz wrote:
> >>>>>>
> >>>>>> On 11/8/21 12:16 AM, Muchun Song wrote:
> >>>>>>> On Mon, Nov 1, 2021 at 11:22 AM Muchun Song wrote:
> >>>>>>>>
> >>>>>>>> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> >>>>>>>> significantly. It further reduces the overhead of struct page by 12.5% for
> >>>>>>>> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> >>>>>>>> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Ping guys. Does anyone have any comments or suggestions
> >>>>>>> on this series?
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>
> >>>>>> I did look over the series earlier. I have no issue with the hugetlb and
> >>>>>> vmemmap modifications as they are enhancements to the existing
> >>>>>> optimizations. My primary concern is the (small) increased overhead
> >>>>>> for the helpers as outlined in your cover letter. Since these helpers
> >>>>>> are not limited to hugetlb and used throughout the kernel, I would
> >>>>>> really like to get comments from others with a better understanding of
> >>>>>> the potential impact.
> >>>>>
> >>>>> Thanks Mike. I'd like to hear others' comments about this as well.
> >>>>> From my point of view, maybe the (small) overhead is acceptable
> >>>>> since it only affects the head page, however Matthew Wilcox's folio
> >>>>> series could reduce this situation as well.
> >>>
> >>> I think Mike was inviting you to run some tests to quantify the
> >>> overhead ;)
> >>
> >> Hi Andrew,
> >>
> >> Sorry for the late reply.
> >>
> >> Specific overhead figures are already in the cover letter. Also,
> >> I did some other tests, e.g. kernel compilation, sysbench.
> >> I didn't see any regressions.
> >
> > The overhead is introduced by page_fixed_fake_head() which
> > has an "if" statement and an access to a possible cold cache line.
> > I think the main overhead is from the latter. However, probabilistically,
> > only 1/64 of the pages need to do the latter. And
> > page_fixed_fake_head() is already simple (I mean the overhead
> > is small enough) and many performance bottlenecks in mm are
> > not in compound_head(). This also matches the tests I did.
> > I didn't see any regressions after enabling this feature.
> >
> > I knew Mike's concern is the increased overhead to use cases
> > beyond HugeTLB. If we really want to avoid the access to
> > a possible cold cache line, we can introduce a new page
> > flag like PG_hugetlb and test if it is set in the page->flags,
> > if so, then return the real head page struct. Then
> > page_fixed_fake_head() looks like below.
> >
> > static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
> > {
> >         if (!hugetlb_free_vmemmap_enabled())
> >                 return page;
> >
> >         if (test_bit(PG_hugetlb, &page->flags)) {
> >                 unsigned long head = READ_ONCE(page[1].compound_head);
> >
> >                 if (likely(head & 1))
> >                         return (const struct page *)(head - 1);
> >         }
> >         return page;
> > }
> >
> > But I don't think it's worth doing this.
> >
> > Hi Mike and Andrew,
> >
> > Since these helpers are not limited to hugetlb and used throughout the
> > kernel, I would really like to get comments from others with a better
> > understanding of the potential impact. Do you have any appropriate
> > reviewers to invite?
> >
>
> I think the appropriate people are already on Cc as they provided input on
> the original vmemmap optimization series.
>
> The question that needs to be answered is simple enough: Is the savings of
> one vmemmap page per hugetlb page worth the extra minimal overhead in
> compound_head()? Like most things, this depends on workload.
>
> One thing to note is that compound_head() overhead is only introduced if
> hugetlb vmemmap freeing is enabled. Correct?

Definitely correct.

> During the original vmemmap
> optimization discussions, people thought it important that this be 'opt in'.
> I do not know if distros will enable this by default. But, perhaps the
> potential overhead can be thought of as just part of 'opting in' for
> vmemmap optimizations.

I agree. Does anyone else have a different opinion?

Thanks.
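
For anyone checking the figures quoted in this thread (12.5%, 2GB per 1TB, and 1/64), here is a minimal sketch of the arithmetic. It assumes a 4KB base page and a 64-byte struct page, which is typical for x86-64 but is not stated in the thread. The macro and function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants (typical x86-64 values; not stated in the thread). */
#define BASE_PAGE_SIZE   4096ULL        /* 4KB base page */
#define STRUCT_PAGE_SIZE 64ULL          /* assumed sizeof(struct page) */
#define HUGE_PAGE_SIZE   (2ULL << 20)   /* 2MB HugeTLB page */

/* vmemmap pages needed to hold the struct pages of one 2MB HugeTLB page. */
uint64_t vmemmap_pages_per_hugepage(void)
{
	uint64_t base_pages = HUGE_PAGE_SIZE / BASE_PAGE_SIZE;  /* 512 */

	return base_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;  /* 8 */
}

/* Share of that vmemmap, in percent, gained by freeing one more page. */
double extra_saving_percent(void)
{
	return 100.0 / (double)vmemmap_pages_per_hugepage();
}

/* Bytes gained over total_bytes of HugeTLB memory: one base page per HugeTLB page. */
uint64_t extra_bytes_saved(uint64_t total_bytes)
{
	return total_bytes / HUGE_PAGE_SIZE * BASE_PAGE_SIZE;
}

/* struct pages per vmemmap page; only the first one in each is page-aligned. */
uint64_t struct_pages_per_vmemmap_page(void)
{
	return BASE_PAGE_SIZE / STRUCT_PAGE_SIZE;
}

/* Compare the computed values with the figures quoted in the thread. */
void check_thread_figures(void)
{
	assert(vmemmap_pages_per_hugepage() == 8);
	assert(extra_saving_percent() == 12.5);
	assert(extra_bytes_saved(1ULL << 40) == (2ULL << 30));  /* 1TB -> 2GB */
	assert(struct_pages_per_vmemmap_page() == 64);
}
```

Under those assumptions, 512 struct pages of 64 bytes each fill 8 vmemmap pages per 2MB HugeTLB page, so freeing one more page is 1/8 = 12.5%. One 4KB page per 2MB HugeTLB page adds up to 2GB per 1TB. The 1/64 figure is consistent with only one struct page in every 64 sitting at a page-aligned address.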