From: Pavel Tatashin
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	kirill.shutemov@linux.intel.com, mhocko@suse.com, linux-mm@kvack.org,
	dan.j.williams@intel.com, jack@suse.cz, jglisse@redhat.com,
	jrdr.linux@gmail.com, bhe@redhat.com, gregkh@linuxfoundation.org,
	vbabka@suse.cz, richard.weiyang@gmail.com, dave.hansen@intel.com,
	rientjes@google.com, mingo@kernel.org, osalvador@techadventures.net,
	pasha.tatashin@oracle.com, abdhalee@linux.vnet.ibm.com,
	mpe@ellerman.id.au
Subject: [PATCH v5 4/5] mm/sparse: add new sparse_init_nid() and sparse_init()
Date: Thu, 12 Jul 2018 16:37:29 -0400
Message-Id: <20180712203730.8703-5-pasha.tatashin@oracle.com>
In-Reply-To: <20180712203730.8703-1-pasha.tatashin@oracle.com>
References: <20180712203730.8703-1-pasha.tatashin@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

sparse_init() needs to temporarily allocate two large buffers: usemap_map
and map_map.
Baoquan He has identified that these buffers are so large that Linux is
not bootable on small memory machines, such as a kdump boot. The buffers
are especially large when CONFIG_X86_5LEVEL is set, as they are scaled to
the maximum physical memory size.

Baoquan provided a fix that reduces the sizes of these buffers, but it is
much better to get rid of them entirely.

Add a new way to initialize sparse memory: sparse_init_nid(), which only
operates within one memory node, and thus allocates memory either in one
large contiguous block or section by section. This eliminates the need
for temporary buffers.

To simplify bisecting and review, temporarily name the new sparse_init()
new_sparse_init(). The next patch enables the new interface and removes
the old code.

Signed-off-by: Pavel Tatashin
---
 mm/sparse.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 01c616342909..4087b94afddf 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,6 +200,11 @@ static inline int next_present_section_nr(int section_nr)
 	      (section_nr <= __highest_present_section_nr));	\
 	     section_nr = next_present_section_nr(section_nr))
 
+static inline unsigned long first_present_section_nr(void)
+{
+	return next_present_section_nr(-1);
+}
+
 /*
  * Record how many memory sections are marked as present
  * during system bootup.
@@ -668,6 +673,86 @@ void __init sparse_init(void)
 	memblock_free_early(__pa(usemap_map), size);
 }
 
+/*
+ * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
+ * And number of present sections in this node is map_count.
+ */
+static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
+				   unsigned long pnum_end,
+				   unsigned long map_count)
+{
+	unsigned long pnum, usemap_longs, *usemap;
+	struct page *map;
+
+	usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
+	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
+							  usemap_size() *
+							  map_count);
+	if (!usemap) {
+		pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
+		goto failed;
+	}
+	sparse_buffer_init(map_count * section_map_size(), nid);
+	for_each_present_section_nr(pnum_begin, pnum) {
+		if (pnum >= pnum_end)
+			break;
+
+		map = sparse_mem_map_populate(pnum, nid, NULL);
+		if (!map) {
+			pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
+			       __func__, nid);
+			pnum_begin = pnum;
+			goto failed;
+		}
+		check_usemap_section_nr(nid, usemap);
+		sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap);
+		usemap += usemap_longs;
+	}
+	sparse_buffer_fini();
+	return;
+failed:
+	/* We failed to allocate, mark all the following pnums as not present */
+	for_each_present_section_nr(pnum_begin, pnum) {
+		struct mem_section *ms;
+
+		if (pnum >= pnum_end)
+			break;
+		ms = __nr_to_section(pnum);
+		ms->section_mem_map = 0;
+	}
+}
+
+/*
+ * Allocate the accumulated non-linear sections, allocate a mem_map
+ * for each and record the physical to section mapping.
+ */
+void __init new_sparse_init(void)
+{
+	unsigned long pnum_begin = first_present_section_nr();
+	int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
+	unsigned long pnum_end, map_count = 1;
+
+	/* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
+	set_pageblock_order();
+
+	for_each_present_section_nr(pnum_begin + 1, pnum_end) {
+		int nid = sparse_early_nid(__nr_to_section(pnum_end));
+
+		if (nid == nid_begin) {
+			map_count++;
+			continue;
+		}
+		/* Init node with sections in range [pnum_begin, pnum_end) */
+		sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+		nid_begin = nid;
+		pnum_begin = pnum_end;
+		map_count = 1;
+	}
+	/* cover the last node */
+	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+	vmemmap_populate_print_last();
+}
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 
 /* Mark all memory sections within the pfn range as online */
-- 
2.18.0
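
For readers who want to try the grouping logic outside the kernel, here is a
minimal user-space sketch of the loop in new_sparse_init(). It is not kernel
code and makes several assumptions: the section table is a plain int array
holding a node id per section, NOT_PRESENT marks holes, and struct node_range,
present_from() and group_by_node() are hypothetical names invented for this
illustration. Each recorded range stands in for one sparse_init_nid() call.
Unlike the kernel loop, the end of the last range is the array length.

```c
#include <assert.h>
#include <stddef.h>

#define NOT_PRESENT (-1)	/* hypothetical marker for a hole */

/* Hypothetical record of one per-node call: sections in [begin, end). */
struct node_range {
	int nid;
	size_t begin;		/* first present section of the run */
	size_t end;		/* first section of the next run, or n */
	size_t map_count;	/* present sections in [begin, end) */
};

/* Return the first present section at or after start, or n if none. */
static size_t present_from(const int *nid_of, size_t n, size_t start)
{
	while (start < n && nid_of[start] == NOT_PRESENT)
		start++;
	return start;
}

/*
 * Walk the present sections in order and record one range per run of
 * sections that share a node id. Returns the number of ranges stored.
 */
static size_t group_by_node(const int *nid_of, size_t n,
			    struct node_range *out, size_t out_cap)
{
	size_t begin = present_from(nid_of, n, 0);
	size_t end, count = 0, map_count = 1;
	int nid_begin;

	if (begin == n)
		return 0;	/* no present sections at all */
	nid_begin = nid_of[begin];

	for (end = present_from(nid_of, n, begin + 1); end < n;
	     end = present_from(nid_of, n, end + 1)) {
		if (nid_of[end] == nid_begin) {
			map_count++;
			continue;
		}
		/* Node changed: close the range [begin, end). */
		assert(count < out_cap);
		out[count++] = (struct node_range){ nid_begin, begin, end, map_count };
		nid_begin = nid_of[end];
		begin = end;
		map_count = 1;
	}
	/* Cover the last node; end == n here. */
	assert(count < out_cap);
	out[count++] = (struct node_range){ nid_begin, begin, end, map_count };
	return count;
}
```

Calling group_by_node() on a table such as { 0, 0, -1, 0, 1, 1, -1, 2 } shows
why map_count is tracked separately from the range width: holes inside a
node's range widen [begin, end) without adding sections that need a memory
map.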