From: Mike Rapoport
To: Richard Henderson, Ivan Kokshaysky
Cc: Michal Hocko, linux-alpha, linux-mm, lkml, Mike Rapoport
Subject: [PATCH v2] alpha: switch to NO_BOOTMEM
Date: Sat, 30 Jun 2018 18:13:30 +0300
X-Mailer: git-send-email 2.7.4
Message-Id: <1530371610-22174-1-git-send-email-rppt@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List:
linux-kernel@vger.kernel.org

Replace bootmem allocator with memblock and enable use of NO_BOOTMEM like
on most other architectures.

Alpha gets the description of the physical memory from the firmware as an
array of memory clusters. Each cluster that is not reserved by the firmware
is added to memblock.memory.

Once memblock.memory is set up, we reserve the kernel and initrd pages
with memblock_reserve().

Since we don't need the bootmem bitmap anymore, the code that finds an
appropriate place for it is removed.

The conversion does not take care of NUMA support, which has been marked
broken for more than 10 years now.

Signed-off-by: Mike Rapoport
---
v2: describe the conversion as per Michal's request

Tested with qemu-system-alpha. I've added some tweaks to sys_dp264 to force
memory split for testing with CONFIG_DISCONTIGMEM=y.

The allyesconfig build requires an update to the DEFERRED_STRUCT_PAGE_INIT
dependencies [1], which is already in the -mm tree.

[1] https://lkml.org/lkml/2018/6/29/353

 arch/alpha/Kconfig                |   2 +
 arch/alpha/kernel/core_irongate.c |   4 +-
 arch/alpha/kernel/setup.c         |  98 ++++-----------------------------
 arch/alpha/mm/numa.c              | 113 +++++---------------------------------
 4 files changed, 29 insertions(+), 188 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 04a4a138ed13..040692a8d433 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -30,6 +30,8 @@ config ALPHA
 	select ODD_RT_SIGACTION
 	select OLD_SIGSUSPEND
 	select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
+	select HAVE_MEMBLOCK
+	select NO_BOOTMEM
 	help
 	  The Alpha is a 64-bit general-purpose processor designed and
 	  marketed by the Digital Equipment Corporation of blessed memory,
diff --git a/arch/alpha/kernel/core_irongate.c b/arch/alpha/kernel/core_irongate.c
index aec757250e07..f70986683fc6 100644
--- a/arch/alpha/kernel/core_irongate.c
+++ b/arch/alpha/kernel/core_irongate.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -241,8 +242,7 @@ albacore_init_arch(void)
 				       size / 1024);
 		}
 #endif
-		reserve_bootmem_node(NODE_DATA(0), pci_mem, memtop -
-				pci_mem, BOOTMEM_DEFAULT);
+		memblock_reserve(pci_mem, memtop - pci_mem);
 		printk("irongate_init_arch: temporarily reserving "
 			"region %08lx-%08lx for PCI\n", pci_mem, memtop - 1);
 	}
diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index 5576f7646fb6..4f0d94471bc9 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -312,9 +313,7 @@ setup_memory(void *kernel_end)
 {
 	struct memclust_struct * cluster;
 	struct memdesc_struct * memdesc;
-	unsigned long start_kernel_pfn, end_kernel_pfn;
-	unsigned long bootmap_size, bootmap_pages, bootmap_start;
-	unsigned long start, end;
+	unsigned long kernel_size;
 	unsigned long i;

 	/* Find free clusters, and init and free the bootmem accordingly. */
@@ -322,6 +321,8 @@ setup_memory(void *kernel_end)
 	  (hwrpb->mddt_offset + (unsigned long) hwrpb);

 	for_each_mem_cluster(memdesc, cluster, i) {
+		unsigned long end;
+
 		printk("memcluster %lu, usage %01lx, start %8lu, end %8lu\n",
 		       i, cluster->usage, cluster->start_pfn,
 		       cluster->start_pfn + cluster->numpages);
@@ -335,6 +336,9 @@ setup_memory(void *kernel_end)
 		end = cluster->start_pfn + cluster->numpages;
 		if (end > max_low_pfn)
 			max_low_pfn = end;
+
+		memblock_add(PFN_PHYS(cluster->start_pfn),
+			     cluster->numpages << PAGE_SHIFT);
 	}

 	/*
@@ -363,87 +367,9 @@ setup_memory(void *kernel_end)
 		max_low_pfn = mem_size_limit;
 	}

-	/* Find the bounds of kernel memory. */
-	start_kernel_pfn = PFN_DOWN(KERNEL_START_PHYS);
-	end_kernel_pfn = PFN_UP(virt_to_phys(kernel_end));
-	bootmap_start = -1;
-
- try_again:
-	if (max_low_pfn <= end_kernel_pfn)
-		panic("not enough memory to boot");
-
-	/* We need to know how many physically contiguous pages
-	   we'll need for the bootmap. */
-	bootmap_pages = bootmem_bootmap_pages(max_low_pfn);
-
-	/* Now find a good region where to allocate the bootmap. */
-	for_each_mem_cluster(memdesc, cluster, i) {
-		if (cluster->usage & 3)
-			continue;
-
-		start = cluster->start_pfn;
-		end = start + cluster->numpages;
-		if (start >= max_low_pfn)
-			continue;
-		if (end > max_low_pfn)
-			end = max_low_pfn;
-		if (start < start_kernel_pfn) {
-			if (end > end_kernel_pfn
-			    && end - end_kernel_pfn >= bootmap_pages) {
-				bootmap_start = end_kernel_pfn;
-				break;
-			} else if (end > start_kernel_pfn)
-				end = start_kernel_pfn;
-		} else if (start < end_kernel_pfn)
-			start = end_kernel_pfn;
-		if (end - start >= bootmap_pages) {
-			bootmap_start = start;
-			break;
-		}
-	}
-
-	if (bootmap_start == ~0UL) {
-		max_low_pfn >>= 1;
-		goto try_again;
-	}
-
-	/* Allocate the bootmap and mark the whole MM as reserved. */
-	bootmap_size = init_bootmem(bootmap_start, max_low_pfn);
-
-	/* Mark the free regions. */
-	for_each_mem_cluster(memdesc, cluster, i) {
-		if (cluster->usage & 3)
-			continue;
-
-		start = cluster->start_pfn;
-		end = cluster->start_pfn + cluster->numpages;
-		if (start >= max_low_pfn)
-			continue;
-		if (end > max_low_pfn)
-			end = max_low_pfn;
-		if (start < start_kernel_pfn) {
-			if (end > end_kernel_pfn) {
-				free_bootmem(PFN_PHYS(start),
-					     (PFN_PHYS(start_kernel_pfn)
-					      - PFN_PHYS(start)));
-				printk("freeing pages %ld:%ld\n",
-				       start, start_kernel_pfn);
-				start = end_kernel_pfn;
-			} else if (end > start_kernel_pfn)
-				end = start_kernel_pfn;
-		} else if (start < end_kernel_pfn)
-			start = end_kernel_pfn;
-		if (start >= end)
-			continue;
-
-		free_bootmem(PFN_PHYS(start), PFN_PHYS(end) - PFN_PHYS(start));
-		printk("freeing pages %ld:%ld\n", start, end);
-	}
-
-	/* Reserve the bootmap memory. */
-	reserve_bootmem(PFN_PHYS(bootmap_start), bootmap_size,
-			BOOTMEM_DEFAULT);
-	printk("reserving pages %ld:%ld\n", bootmap_start, bootmap_start+PFN_UP(bootmap_size));
+	/* Reserve the kernel memory. */
+	kernel_size = virt_to_phys(kernel_end) - KERNEL_START_PHYS;
+	memblock_reserve(KERNEL_START_PHYS, kernel_size);

 #ifdef CONFIG_BLK_DEV_INITRD
 	initrd_start = INITRD_START;
@@ -459,8 +385,8 @@ setup_memory(void *kernel_end)
 				       initrd_end,
 				       phys_to_virt(PFN_PHYS(max_low_pfn)));
 		} else {
-			reserve_bootmem(virt_to_phys((void *)initrd_start),
-					INITRD_SIZE, BOOTMEM_DEFAULT);
+			memblock_reserve(virt_to_phys((void *)initrd_start),
+					INITRD_SIZE);
 		}
 	}
 #endif /* CONFIG_BLK_DEV_INITRD */
diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
index a9e86475f169..26cd925d19b1 100644
--- a/arch/alpha/mm/numa.c
+++ b/arch/alpha/mm/numa.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -59,12 +60,10 @@ setup_memory_node(int nid, void *kernel_end)
 	struct memclust_struct * cluster;
 	struct memdesc_struct * memdesc;
 	unsigned long start_kernel_pfn, end_kernel_pfn;
-	unsigned long bootmap_size, bootmap_pages, bootmap_start;
 	unsigned long start, end;
 	unsigned long node_pfn_start, node_pfn_end;
 	unsigned long node_min_pfn, node_max_pfn;
 	int i;
-	unsigned long node_datasz = PFN_UP(sizeof(pg_data_t));
 	int show_init = 0;

 	/* Find the bounds of current node */
@@ -134,24 +133,14 @@ setup_memory_node(int nid, void *kernel_end)
 	/* Cute trick to make sure our local node data is on local memory */
 	node_data[nid] = (pg_data_t *)(__va(node_min_pfn << PAGE_SHIFT));
 #endif
-	/* Quasi-mark the pg_data_t as in-use */
-	node_min_pfn += node_datasz;
-	if (node_min_pfn >= node_max_pfn) {
-		printk(" not enough mem to reserve NODE_DATA");
-		return;
-	}
-	NODE_DATA(nid)->bdata = &bootmem_node_data[nid];
-
 	printk(" Detected node memory: start %8lu, end %8lu\n",
 	       node_min_pfn, node_max_pfn);

 	DBGDCONT(" DISCONTIG: node_data[%d] is at 0x%p\n", nid, NODE_DATA(nid));
-	DBGDCONT(" DISCONTIG: NODE_DATA(%d)->bdata is at 0x%p\n", nid, NODE_DATA(nid)->bdata);

 	/* Find the bounds of kernel memory. */
 	start_kernel_pfn = PFN_DOWN(KERNEL_START_PHYS);
 	end_kernel_pfn = PFN_UP(virt_to_phys(kernel_end));
-	bootmap_start = -1;

 	if (!nid && (node_max_pfn < end_kernel_pfn || node_min_pfn > start_kernel_pfn))
 		panic("kernel loaded out of ram");
@@ -161,89 +150,11 @@ setup_memory_node(int nid, void *kernel_end)
 	   has much larger alignment than 8Mb, so it's safe. */
 	node_min_pfn &= ~((1UL << (MAX_ORDER-1))-1);

-	/* We need to know how many physically contiguous pages
-	   we'll need for the bootmap. */
-	bootmap_pages = bootmem_bootmap_pages(node_max_pfn-node_min_pfn);
-
-	/* Now find a good region where to allocate the bootmap. */
-	for_each_mem_cluster(memdesc, cluster, i) {
-		if (cluster->usage & 3)
-			continue;
-
-		start = cluster->start_pfn;
-		end = start + cluster->numpages;
-
-		if (start >= node_max_pfn || end <= node_min_pfn)
-			continue;
-
-		if (end > node_max_pfn)
-			end = node_max_pfn;
-		if (start < node_min_pfn)
-			start = node_min_pfn;
-
-		if (start < start_kernel_pfn) {
-			if (end > end_kernel_pfn
-			    && end - end_kernel_pfn >= bootmap_pages) {
-				bootmap_start = end_kernel_pfn;
-				break;
-			} else if (end > start_kernel_pfn)
-				end = start_kernel_pfn;
-		} else if (start < end_kernel_pfn)
-			start = end_kernel_pfn;
-		if (end - start >= bootmap_pages) {
-			bootmap_start = start;
-			break;
-		}
-	}
-
-	if (bootmap_start == -1)
-		panic("couldn't find a contiguous place for the bootmap");
-
-	/* Allocate the bootmap and mark the whole MM as reserved. */
-	bootmap_size = init_bootmem_node(NODE_DATA(nid), bootmap_start,
-					 node_min_pfn, node_max_pfn);
-	DBGDCONT(" bootmap_start %lu, bootmap_size %lu, bootmap_pages %lu\n",
-		 bootmap_start, bootmap_size, bootmap_pages);
+	memblock_add(PFN_PHYS(node_min_pfn),
+		     (node_max_pfn - node_min_pfn) << PAGE_SHIFT);

-	/* Mark the free regions. */
-	for_each_mem_cluster(memdesc, cluster, i) {
-		if (cluster->usage & 3)
-			continue;
-
-		start = cluster->start_pfn;
-		end = cluster->start_pfn + cluster->numpages;
-
-		if (start >= node_max_pfn || end <= node_min_pfn)
-			continue;
-
-		if (end > node_max_pfn)
-			end = node_max_pfn;
-		if (start < node_min_pfn)
-			start = node_min_pfn;
-
-		if (start < start_kernel_pfn) {
-			if (end > end_kernel_pfn) {
-				free_bootmem_node(NODE_DATA(nid), PFN_PHYS(start),
-						  (PFN_PHYS(start_kernel_pfn)
-						   - PFN_PHYS(start)));
-				printk(" freeing pages %ld:%ld\n",
-				       start, start_kernel_pfn);
-				start = end_kernel_pfn;
-			} else if (end > start_kernel_pfn)
-				end = start_kernel_pfn;
-		} else if (start < end_kernel_pfn)
-			start = end_kernel_pfn;
-		if (start >= end)
-			continue;
-
-		free_bootmem_node(NODE_DATA(nid), PFN_PHYS(start), PFN_PHYS(end) - PFN_PHYS(start));
-		printk(" freeing pages %ld:%ld\n", start, end);
-	}
-
-	/* Reserve the bootmap memory. */
-	reserve_bootmem_node(NODE_DATA(nid), PFN_PHYS(bootmap_start),
-			     bootmap_size, BOOTMEM_DEFAULT);
-	printk(" reserving pages %ld:%ld\n", bootmap_start, bootmap_start+PFN_UP(bootmap_size));
+	NODE_DATA(nid)->node_start_pfn = node_min_pfn;
+	NODE_DATA(nid)->node_present_pages = node_max_pfn - node_min_pfn;

 	node_set_online(nid);
 }
@@ -251,6 +162,7 @@ setup_memory_node(int nid, void *kernel_end)
 void __init
 setup_memory(void *kernel_end)
 {
+	unsigned long kernel_size;
 	int nid;

 	show_mem_layout();
@@ -262,6 +174,9 @@ setup_memory(void *kernel_end)
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		setup_memory_node(nid, kernel_end);

+	kernel_size = virt_to_phys(kernel_end) - KERNEL_START_PHYS;
+	memblock_reserve(KERNEL_START_PHYS, kernel_size);
+
 #ifdef CONFIG_BLK_DEV_INITRD
 	initrd_start = INITRD_START;
 	if (initrd_start) {
@@ -279,9 +194,8 @@ setup_memory(void *kernel_end)
 				       phys_to_virt(PFN_PHYS(max_low_pfn)));
 		} else {
 			nid = kvaddr_to_nid(initrd_start);
-			reserve_bootmem_node(NODE_DATA(nid),
-					     virt_to_phys((void *)initrd_start),
-					     INITRD_SIZE, BOOTMEM_DEFAULT);
+			memblock_reserve(virt_to_phys((void *)initrd_start),
+					 INITRD_SIZE);
 		}
 	}
 #endif /* CONFIG_BLK_DEV_INITRD */
@@ -303,9 +217,8 @@ void __init paging_init(void)
 	dma_local_pfn = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;

 	for_each_online_node(nid) {
-		bootmem_data_t *bdata = &bootmem_node_data[nid];
-		unsigned long start_pfn = bdata->node_min_pfn;
-		unsigned long end_pfn = bdata->node_low_pfn;
+		unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+		unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_present_pages;

 		if (dma_local_pfn >= end_pfn - start_pfn)
 			zones_size[ZONE_DMA] = end_pfn - start_pfn;
-- 
2.7.4
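
For readers who want to see the flow from the commit message in isolation, here is a rough user-space model of it: usable firmware clusters are recorded as memory regions, and the kernel image is then recorded as a reserved region. This is only a sketch and not the kernel memblock API. Every toy_* type and function and every TOY_* constant is hypothetical, the 8 KiB page size is an assumption, and the real memblock additionally sorts and merges regions, which is left out here.

```c
/*
 * User-space model only; this is NOT the kernel memblock API.
 * Every toy_* name and TOY_* constant below is hypothetical.
 */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REGIONS	16
#define TOY_PAGE_SHIFT	13	/* assumption: 8 KiB pages */

struct toy_region {
	unsigned long base;
	unsigned long size;
};

struct toy_memblock {
	struct toy_region memory[TOY_MAX_REGIONS];
	size_t nr_memory;
	struct toy_region reserved[TOY_MAX_REGIONS];
	size_t nr_reserved;
};

/* Simplified stand-in for a firmware memory cluster descriptor. */
struct toy_cluster {
	unsigned long start_pfn;
	unsigned long numpages;
	unsigned long usage;	/* either low bit set: not usable */
};

/* Record a range of usable physical memory (no merging or sorting). */
static void toy_add(struct toy_memblock *mb, unsigned long base,
		    unsigned long size)
{
	assert(mb->nr_memory < TOY_MAX_REGIONS);
	mb->memory[mb->nr_memory].base = base;
	mb->memory[mb->nr_memory].size = size;
	mb->nr_memory++;
}

/* Record a range that must not be handed out by the allocator. */
static void toy_reserve(struct toy_memblock *mb, unsigned long base,
			unsigned long size)
{
	assert(mb->nr_reserved < TOY_MAX_REGIONS);
	mb->reserved[mb->nr_reserved].base = base;
	mb->reserved[mb->nr_reserved].size = size;
	mb->nr_reserved++;
}

/*
 * Add every usable cluster, then reserve the kernel image.
 * Returns the highest usable page frame number seen.
 */
static unsigned long toy_setup_memory(struct toy_memblock *mb,
				      const struct toy_cluster *clusters,
				      size_t nr_clusters,
				      unsigned long kernel_start_phys,
				      unsigned long kernel_end_phys)
{
	unsigned long max_pfn = 0;
	size_t i;

	for (i = 0; i < nr_clusters; i++) {
		const struct toy_cluster *c = &clusters[i];
		unsigned long end = c->start_pfn + c->numpages;

		if (c->usage & 3)
			continue;	/* same test as the removed bootmem loops */
		if (end > max_pfn)
			max_pfn = end;
		toy_add(mb, c->start_pfn << TOY_PAGE_SHIFT,
			c->numpages << TOY_PAGE_SHIFT);
	}

	toy_reserve(mb, kernel_start_phys,
		    kernel_end_phys - kernel_start_phys);
	return max_pfn;
}
```

A caller would zero-initialise a struct toy_memblock, pass the cluster array together with the physical start and end of the kernel image to toy_setup_memory(), and then inspect the memory[] and reserved[] arrays. There is no bitmap to place and no retry loop, which is what allows the patch to delete the bootmap search.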