Date: Wed, 11 Jul 2018 14:49:47 +0200
From: Michal Hocko
To: Mike Kravetz
Cc: Cannon Matthews, Andrew Morton, Nadia Yvette Chambers,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, andreslc@google.com,
	pfeiner@google.com, dmatlack@google.com, gthelen@google.com
Subject: Re: [PATCH] mm: hugetlb: don't zero 1GiB bootmem pages.
Message-ID: <20180711124947.GB20172@dhcp22.suse.cz>
References: <20180710184903.68239-1-cannonmatthews@google.com>
User-Agent: Mutt/1.10.0 (2018-05-17)

On Tue 10-07-18 13:46:57, Mike Kravetz wrote:
> On 07/10/2018 11:49 AM, Cannon Matthews wrote:
> > When using 1GiB pages during early boot, use the new
> > memblock_virt_alloc_try_nid_raw() function to allocate memory without
> > zeroing it. Zeroing out hundreds or thousands of GiB in a single core
> > memset() call is very slow, and can make early boot last upwards of
> > 20-30 minutes on multi-TiB machines.
> >
> > To be safe, still zero the first sizeof(struct huge_bootmem_page)
> > bytes, since this is used as temporary storage for this info until
> > gather_bootmem_prealloc() processes them later.
> >
> > The rest of the memory does not need to be zeroed, as the hugetlb
> > pages are always zeroed on page fault.
> >
> > Tested: Booted with ~3800 1G pages, and it booted successfully in
> > roughly the same amount of time as with 0, as opposed to the 25+
> > minutes it would take before.
> >
>
> Nice improvement!
> > Signed-off-by: Cannon Matthews
> > ---
> >  mm/hugetlb.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 3612fbb32e9d..c93a2c77e881 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2101,7 +2101,7 @@ int __alloc_bootmem_huge_page(struct hstate *h)
> >  	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
> >  		void *addr;
> >
> > -		addr = memblock_virt_alloc_try_nid_nopanic(
> > +		addr = memblock_virt_alloc_try_nid_raw(
> >  				huge_page_size(h), huge_page_size(h),
> >  				0, BOOTMEM_ALLOC_ACCESSIBLE, node);
> >  		if (addr) {
> > @@ -2109,7 +2109,12 @@ int __alloc_bootmem_huge_page(struct hstate *h)
> >  			 * Use the beginning of the huge page to store the
> >  			 * huge_bootmem_page struct (until gather_bootmem
> >  			 * puts them into the mem_map).
> > +			 *
> > +			 * memblock_virt_alloc_try_nid_raw returns non-zero'd
> > +			 * memory so zero out just enough for this struct, the
> > +			 * rest will be zero'd on page fault.
> >  			 */
> > +			memset(addr, 0, sizeof(struct huge_bootmem_page));
>
> This forced me to look at the usage of huge_bootmem_page. It is defined as:
>
> struct huge_bootmem_page {
> 	struct list_head list;
> 	struct hstate *hstate;
> #ifdef CONFIG_HIGHMEM
> 	phys_addr_t phys;
> #endif
> };
>
> The list and hstate fields are set immediately after allocating the memory
> block here and elsewhere. However, I can't find any code that sets phys.
> Although, it is potentially used in gather_bootmem_prealloc(). It appears
> powerpc used this field at one time, but no longer does.
>
> Am I missing something?

If yes, then I am missing it as well. phys is a cool name to grep for...

Anyway, does it really make any sense to allow gigantic pages on HIGHMEM
systems in the first place?

-- 
Michal Hocko
SUSE Labs
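[Editorial note, not part of the original mail: the allocate-raw-then-zero-only-the-header pattern discussed in this thread can be modeled in plain userspace C. The alloc_raw() and demo() helpers below are hypothetical stand-ins for illustration only, not kernel APIs; the struct layout is a simplified sketch of the one quoted above.]

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the kernel types quoted in the thread;
 * names follow mm/hugetlb.c but the layout here is illustrative only. */
struct list_head { struct list_head *next, *prev; };
struct hstate;                           /* opaque for this sketch */

struct huge_bootmem_page {
	struct list_head list;
	struct hstate *hstate;
};

/* Hypothetical stand-in for memblock_virt_alloc_try_nid_raw(): returns
 * memory whose contents are NOT guaranteed to be zero. */
static void *alloc_raw(size_t size)
{
	unsigned char *p = malloc(size);
	if (p)
		memset(p, 0xaa, size);   /* simulate uninitialized contents */
	return p;
}

/* Returns 0 if the pattern behaves as the patch describes: only the
 * header used as temporary storage is zeroed, the tail is untouched. */
static int demo(void)
{
	size_t page_size = 1 << 20;      /* pretend 1 MiB "huge page" */
	unsigned char *addr = alloc_raw(page_size);
	struct huge_bootmem_page *m;
	int ok;

	if (!addr)
		return -1;

	/* The patch's key point: zero just enough for the bookkeeping
	 * struct; the rest of the page is zeroed later on page fault. */
	memset(addr, 0, sizeof(struct huge_bootmem_page));

	m = (struct huge_bootmem_page *)addr;
	ok = (m->hstate == NULL) &&              /* header is clean    */
	     (addr[sizeof(*m)] == 0xaa);         /* tail left as-is    */

	free(addr);
	return ok ? 0 : -1;
}
```

The saving in the real patch comes from skipping the full-page memset for every 1GiB page at boot; this sketch only demonstrates why zeroing sizeof(struct huge_bootmem_page) bytes is sufficient for the temporary bookkeeping.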