Date: Thu, 12 Jul 2018 10:28:42 +0300
From: Mike Rapoport
To: Matt Turner, Richard Henderson, Ivan Kokshaysky
Cc: Michal Hocko, linux-alpha, linux-mm, lkml
Subject: Re: [PATCH v2] alpha: switch to NO_BOOTMEM
References: <1530371610-22174-1-git-send-email-rppt@linux.vnet.ibm.com> <20180704124446.GF4352@rapoport-lnx>
In-Reply-To: <20180704124446.GF4352@rapoport-lnx>
Message-Id: <20180712072842.GC4422@rapoport-lnx>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Mailing-List: linux-kernel@vger.kernel.org

(added Matt Turner, sorry, should have done it from the beginning)

Any comments on this?

> On Sat, Jun 30, 2018 at 06:13:30PM +0300, Mike Rapoport wrote:
> Replace bootmem allocator with memblock and enable use of NO_BOOTMEM like
> on most other architectures.
>
> Alpha gets the description of the physical memory from the firmware as an
> array of memory clusters. Each cluster that is not reserved by the firmware
> is added to memblock.memory.
>
> Once the memblock.memory is set up, we reserve the kernel and initrd pages
> with memblock reserve.
>
> Since we don't need the bootmem bitmap anymore, the code that finds an
> appropriate place is removed.
>
> The conversion does not take care of NUMA support which is marked broken
> for more than 10 years now.
>
> Signed-off-by: Mike Rapoport
> ---
> v2: describe the conversion as per Michal's request
>
> Tested with qemu-system-alpha. I've added some tweaks to sys_dp264 to force
> memory split for testing with CONFIG_DISCONTIGMEM=y
>
> The allyesconfig build requires update to DEFERRED_STRUCT_PAGE_INIT
> dependencies [1] which is already in -mm tree.
>
> [1] https://lkml.org/lkml/2018/6/29/353
>
>  arch/alpha/Kconfig                |   2 +
>  arch/alpha/kernel/core_irongate.c |   4 +-
>  arch/alpha/kernel/setup.c         |  98 ++++-----------------------------
>  arch/alpha/mm/numa.c              | 113 +++++---------------------------------
>  4 files changed, 29 insertions(+), 188 deletions(-)
>
> diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
> index 04a4a138ed13..040692a8d433 100644
> --- a/arch/alpha/Kconfig
> +++ b/arch/alpha/Kconfig
> @@ -30,6 +30,8 @@ config ALPHA
>  	select ODD_RT_SIGACTION
>  	select OLD_SIGSUSPEND
>  	select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
> +	select HAVE_MEMBLOCK
> +	select NO_BOOTMEM
>  	help
>  	  The Alpha is a 64-bit general-purpose processor designed and
>  	  marketed by the Digital Equipment Corporation of blessed memory,
> diff --git a/arch/alpha/kernel/core_irongate.c b/arch/alpha/kernel/core_irongate.c
> index aec757250e07..f70986683fc6 100644
> --- a/arch/alpha/kernel/core_irongate.c
> +++ b/arch/alpha/kernel/core_irongate.c
> @@ -21,6 +21,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -241,8 +242,7 @@ albacore_init_arch(void)
>  				       size / 1024);
>  		}
>  #endif
> -		reserve_bootmem_node(NODE_DATA(0), pci_mem, memtop -
> -				pci_mem, BOOTMEM_DEFAULT);
> +		memblock_reserve(pci_mem, memtop - pci_mem);
>  		printk("irongate_init_arch: temporarily reserving "
>  			"region %08lx-%08lx for PCI\n", pci_mem, memtop - 1);
>  	}
> diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
> index 5576f7646fb6..4f0d94471bc9 100644
> --- a/arch/alpha/kernel/setup.c
> +++ b/arch/alpha/kernel/setup.c
> @@ -30,6 +30,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -312,9 +313,7 @@ setup_memory(void *kernel_end)
>  {
>  	struct memclust_struct * cluster;
>  	struct memdesc_struct * memdesc;
> -	unsigned long start_kernel_pfn, end_kernel_pfn;
> -	unsigned long bootmap_size, bootmap_pages, bootmap_start;
> -	unsigned long start, end;
> +	unsigned long kernel_size;
>  	unsigned long i;
>
>  	/* Find free clusters, and init and free the bootmem accordingly. */
> @@ -322,6 +321,8 @@ setup_memory(void *kernel_end)
>  		(hwrpb->mddt_offset + (unsigned long) hwrpb);
>
>  	for_each_mem_cluster(memdesc, cluster, i) {
> +		unsigned long end;
> +
>  		printk("memcluster %lu, usage %01lx, start %8lu, end %8lu\n",
>  		       i, cluster->usage, cluster->start_pfn,
>  		       cluster->start_pfn + cluster->numpages);
> @@ -335,6 +336,9 @@ setup_memory(void *kernel_end)
>  		end = cluster->start_pfn + cluster->numpages;
>  		if (end > max_low_pfn)
>  			max_low_pfn = end;
> +
> +		memblock_add(PFN_PHYS(cluster->start_pfn),
> +			     cluster->numpages << PAGE_SHIFT);
>  	}
>
>  	/*
> @@ -363,87 +367,9 @@ setup_memory(void *kernel_end)
>  		max_low_pfn = mem_size_limit;
>  	}
>
> -	/* Find the bounds of kernel memory. */
> -	start_kernel_pfn = PFN_DOWN(KERNEL_START_PHYS);
> -	end_kernel_pfn = PFN_UP(virt_to_phys(kernel_end));
> -	bootmap_start = -1;
> -
> - try_again:
> -	if (max_low_pfn <= end_kernel_pfn)
> -		panic("not enough memory to boot");
> -
> -	/* We need to know how many physically contiguous pages
> -	   we'll need for the bootmap. */
> -	bootmap_pages = bootmem_bootmap_pages(max_low_pfn);
> -
> -	/* Now find a good region where to allocate the bootmap. */
> -	for_each_mem_cluster(memdesc, cluster, i) {
> -		if (cluster->usage & 3)
> -			continue;
> -
> -		start = cluster->start_pfn;
> -		end = start + cluster->numpages;
> -		if (start >= max_low_pfn)
> -			continue;
> -		if (end > max_low_pfn)
> -			end = max_low_pfn;
> -		if (start < start_kernel_pfn) {
> -			if (end > end_kernel_pfn
> -			    && end - end_kernel_pfn >= bootmap_pages) {
> -				bootmap_start = end_kernel_pfn;
> -				break;
> -			} else if (end > start_kernel_pfn)
> -				end = start_kernel_pfn;
> -		} else if (start < end_kernel_pfn)
> -			start = end_kernel_pfn;
> -		if (end - start >= bootmap_pages) {
> -			bootmap_start = start;
> -			break;
> -		}
> -	}
> -
> -	if (bootmap_start == ~0UL) {
> -		max_low_pfn >>= 1;
> -		goto try_again;
> -	}
> -
> -	/* Allocate the bootmap and mark the whole MM as reserved. */
> -	bootmap_size = init_bootmem(bootmap_start, max_low_pfn);
> -
> -	/* Mark the free regions. */
> -	for_each_mem_cluster(memdesc, cluster, i) {
> -		if (cluster->usage & 3)
> -			continue;
> -
> -		start = cluster->start_pfn;
> -		end = cluster->start_pfn + cluster->numpages;
> -		if (start >= max_low_pfn)
> -			continue;
> -		if (end > max_low_pfn)
> -			end = max_low_pfn;
> -		if (start < start_kernel_pfn) {
> -			if (end > end_kernel_pfn) {
> -				free_bootmem(PFN_PHYS(start),
> -					     (PFN_PHYS(start_kernel_pfn)
> -					      - PFN_PHYS(start)));
> -				printk("freeing pages %ld:%ld\n",
> -				       start, start_kernel_pfn);
> -				start = end_kernel_pfn;
> -			} else if (end > start_kernel_pfn)
> -				end = start_kernel_pfn;
> -		} else if (start < end_kernel_pfn)
> -			start = end_kernel_pfn;
> -		if (start >= end)
> -			continue;
> -
> -		free_bootmem(PFN_PHYS(start), PFN_PHYS(end) - PFN_PHYS(start));
> -		printk("freeing pages %ld:%ld\n", start, end);
> -	}
> -
> -	/* Reserve the bootmap memory. */
> -	reserve_bootmem(PFN_PHYS(bootmap_start), bootmap_size,
> -			BOOTMEM_DEFAULT);
> -	printk("reserving pages %ld:%ld\n", bootmap_start, bootmap_start+PFN_UP(bootmap_size));
> +	/* Reserve the kernel memory. */
> +	kernel_size = virt_to_phys(kernel_end) - KERNEL_START_PHYS;
> +	memblock_reserve(KERNEL_START_PHYS, kernel_size);
>
>  #ifdef CONFIG_BLK_DEV_INITRD
>  	initrd_start = INITRD_START;
> @@ -459,8 +385,8 @@ setup_memory(void *kernel_end)
>  				       initrd_end,
>  				       phys_to_virt(PFN_PHYS(max_low_pfn)));
>  		} else {
> -			reserve_bootmem(virt_to_phys((void *)initrd_start),
> -					INITRD_SIZE, BOOTMEM_DEFAULT);
> +			memblock_reserve(virt_to_phys((void *)initrd_start),
> +					INITRD_SIZE);
>  		}
>  	}
>  #endif /* CONFIG_BLK_DEV_INITRD */
> diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
> index a9e86475f169..26cd925d19b1 100644
> --- a/arch/alpha/mm/numa.c
> +++ b/arch/alpha/mm/numa.c
> @@ -11,6 +11,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -59,12 +60,10 @@ setup_memory_node(int nid, void *kernel_end)
>  	struct memclust_struct * cluster;
>  	struct memdesc_struct * memdesc;
>  	unsigned long start_kernel_pfn, end_kernel_pfn;
> -	unsigned long bootmap_size, bootmap_pages, bootmap_start;
>  	unsigned long start, end;
>  	unsigned long node_pfn_start, node_pfn_end;
>  	unsigned long node_min_pfn, node_max_pfn;
>  	int i;
> -	unsigned long node_datasz = PFN_UP(sizeof(pg_data_t));
>  	int show_init = 0;
>
>  	/* Find the bounds of current node */
> @@ -134,24 +133,14 @@ setup_memory_node(int nid, void *kernel_end)
>  	/* Cute trick to make sure our local node data is on local memory */
>  	node_data[nid] = (pg_data_t *)(__va(node_min_pfn << PAGE_SHIFT));
>  #endif
> -	/* Quasi-mark the pg_data_t as in-use */
> -	node_min_pfn += node_datasz;
> -	if (node_min_pfn >= node_max_pfn) {
> -		printk(" not enough mem to reserve NODE_DATA");
> -		return;
> -	}
> -	NODE_DATA(nid)->bdata = &bootmem_node_data[nid];
> -
>  	printk(" Detected node memory: start %8lu, end %8lu\n",
>  	       node_min_pfn, node_max_pfn);
>
>  	DBGDCONT(" DISCONTIG: node_data[%d] is at 0x%p\n", nid, NODE_DATA(nid));
> -	DBGDCONT(" DISCONTIG: NODE_DATA(%d)->bdata is at 0x%p\n", nid, NODE_DATA(nid)->bdata);
>
>  	/* Find the bounds of kernel memory. */
>  	start_kernel_pfn = PFN_DOWN(KERNEL_START_PHYS);
>  	end_kernel_pfn = PFN_UP(virt_to_phys(kernel_end));
> -	bootmap_start = -1;
>
>  	if (!nid && (node_max_pfn < end_kernel_pfn || node_min_pfn > start_kernel_pfn))
>  		panic("kernel loaded out of ram");
> @@ -161,89 +150,11 @@ setup_memory_node(int nid, void *kernel_end)
>  	   has much larger alignment than 8Mb, so it's safe. */
>  	node_min_pfn &= ~((1UL << (MAX_ORDER-1))-1);
>
> -	/* We need to know how many physically contiguous pages
> -	   we'll need for the bootmap. */
> -	bootmap_pages = bootmem_bootmap_pages(node_max_pfn-node_min_pfn);
> -
> -	/* Now find a good region where to allocate the bootmap. */
> -	for_each_mem_cluster(memdesc, cluster, i) {
> -		if (cluster->usage & 3)
> -			continue;
> -
> -		start = cluster->start_pfn;
> -		end = start + cluster->numpages;
> -
> -		if (start >= node_max_pfn || end <= node_min_pfn)
> -			continue;
> -
> -		if (end > node_max_pfn)
> -			end = node_max_pfn;
> -		if (start < node_min_pfn)
> -			start = node_min_pfn;
> -
> -		if (start < start_kernel_pfn) {
> -			if (end > end_kernel_pfn
> -			    && end - end_kernel_pfn >= bootmap_pages) {
> -				bootmap_start = end_kernel_pfn;
> -				break;
> -			} else if (end > start_kernel_pfn)
> -				end = start_kernel_pfn;
> -		} else if (start < end_kernel_pfn)
> -			start = end_kernel_pfn;
> -		if (end - start >= bootmap_pages) {
> -			bootmap_start = start;
> -			break;
> -		}
> -	}
> -
> -	if (bootmap_start == -1)
> -		panic("couldn't find a contiguous place for the bootmap");
> -
> -	/* Allocate the bootmap and mark the whole MM as reserved. */
> -	bootmap_size = init_bootmem_node(NODE_DATA(nid), bootmap_start,
> -					 node_min_pfn, node_max_pfn);
> -	DBGDCONT(" bootmap_start %lu, bootmap_size %lu, bootmap_pages %lu\n",
> -		 bootmap_start, bootmap_size, bootmap_pages);
> +	memblock_add(PFN_PHYS(node_min_pfn),
> +		     (node_max_pfn - node_min_pfn) << PAGE_SHIFT);
>
> -	/* Mark the free regions. */
> -	for_each_mem_cluster(memdesc, cluster, i) {
> -		if (cluster->usage & 3)
> -			continue;
> -
> -		start = cluster->start_pfn;
> -		end = cluster->start_pfn + cluster->numpages;
> -
> -		if (start >= node_max_pfn || end <= node_min_pfn)
> -			continue;
> -
> -		if (end > node_max_pfn)
> -			end = node_max_pfn;
> -		if (start < node_min_pfn)
> -			start = node_min_pfn;
> -
> -		if (start < start_kernel_pfn) {
> -			if (end > end_kernel_pfn) {
> -				free_bootmem_node(NODE_DATA(nid), PFN_PHYS(start),
> -						  (PFN_PHYS(start_kernel_pfn)
> -						   - PFN_PHYS(start)));
> -				printk(" freeing pages %ld:%ld\n",
> -				       start, start_kernel_pfn);
> -				start = end_kernel_pfn;
> -			} else if (end > start_kernel_pfn)
> -				end = start_kernel_pfn;
> -		} else if (start < end_kernel_pfn)
> -			start = end_kernel_pfn;
> -		if (start >= end)
> -			continue;
> -
> -		free_bootmem_node(NODE_DATA(nid), PFN_PHYS(start), PFN_PHYS(end) - PFN_PHYS(start));
> -		printk(" freeing pages %ld:%ld\n", start, end);
> -	}
> -
> -	/* Reserve the bootmap memory. */
> -	reserve_bootmem_node(NODE_DATA(nid), PFN_PHYS(bootmap_start),
> -			     bootmap_size, BOOTMEM_DEFAULT);
> -	printk(" reserving pages %ld:%ld\n", bootmap_start, bootmap_start+PFN_UP(bootmap_size));
> +	NODE_DATA(nid)->node_start_pfn = node_min_pfn;
> +	NODE_DATA(nid)->node_present_pages = node_max_pfn - node_min_pfn;
>
>  	node_set_online(nid);
>  }
> @@ -251,6 +162,7 @@ setup_memory_node(int nid, void *kernel_end)
>  void __init
>  setup_memory(void *kernel_end)
>  {
> +	unsigned long kernel_size;
>  	int nid;
>
>  	show_mem_layout();
> @@ -262,6 +174,9 @@ setup_memory(void *kernel_end)
>  	for (nid = 0; nid < MAX_NUMNODES; nid++)
>  		setup_memory_node(nid, kernel_end);
>
> +	kernel_size = virt_to_phys(kernel_end) - KERNEL_START_PHYS;
> +	memblock_reserve(KERNEL_START_PHYS, kernel_size);
> +
>  #ifdef CONFIG_BLK_DEV_INITRD
>  	initrd_start = INITRD_START;
>  	if (initrd_start) {
> @@ -279,9 +194,8 @@ setup_memory(void *kernel_end)
>  				       phys_to_virt(PFN_PHYS(max_low_pfn)));
>  		} else {
>  			nid = kvaddr_to_nid(initrd_start);
> -			reserve_bootmem_node(NODE_DATA(nid),
> -					     virt_to_phys((void *)initrd_start),
> -					     INITRD_SIZE, BOOTMEM_DEFAULT);
> +			memblock_reserve(virt_to_phys((void *)initrd_start),
> +					 INITRD_SIZE);
>  		}
>  	}
>  #endif /* CONFIG_BLK_DEV_INITRD */
> @@ -303,9 +217,8 @@ void __init paging_init(void)
>  	dma_local_pfn = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
>
>  	for_each_online_node(nid) {
> -		bootmem_data_t *bdata = &bootmem_node_data[nid];
> -		unsigned long start_pfn = bdata->node_min_pfn;
> -		unsigned long end_pfn = bdata->node_low_pfn;
> +		unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
> +		unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_present_pages;
>
>  		if (dma_local_pfn >= end_pfn - start_pfn)
>  			zones_size[ZONE_DMA] = end_pfn - start_pfn;
> --
> 2.7.4

--
Sincerely yours,
Mike.
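
For readers who want the flow from the quoted description in isolation, here is a minimal user-space sketch: clusters not reserved by the firmware are recorded as memory, and the kernel image range is then recorded as reserved. It is not kernel code and does not use the real memblock API. Every toy_* name, the fixed-size arrays and the TOY_PAGE_SHIFT value of 13 are assumptions made for illustration only. The `usage & 3` test mirrors the check visible in the removed bootmem loops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's memblock. */
#define TOY_PAGE_SHIFT  13	/* assumed page shift (8 KiB pages) */
#define TOY_MAX_REGIONS 16

struct toy_region {
	unsigned long base;	/* physical start address in bytes */
	unsigned long size;	/* length in bytes */
};

struct toy_memblock {
	struct toy_region memory[TOY_MAX_REGIONS];	/* usable RAM */
	struct toy_region reserved[TOY_MAX_REGIONS];	/* must not be handed out */
	size_t nr_memory;
	size_t nr_reserved;
};

/* Simplified view of a firmware memory cluster descriptor. */
struct toy_cluster {
	unsigned long start_pfn;
	unsigned long numpages;
	unsigned long usage;	/* low two bits set: reserved by firmware */
};

/* Append one region to a region array; no merging or sorting is done. */
void toy_add(struct toy_region *arr, size_t *nr,
	     unsigned long base, unsigned long size)
{
	assert(*nr < TOY_MAX_REGIONS);
	arr[*nr].base = base;
	arr[*nr].size = size;
	(*nr)++;
}

/*
 * Mirror of the converted setup_memory() flow:
 *   1. every cluster the firmware did not reserve becomes a memory region;
 *   2. the kernel image [kernel_start, kernel_end) becomes a reserved region.
 */
void toy_setup_memory(struct toy_memblock *mb,
		      const struct toy_cluster *clusters, size_t n,
		      unsigned long kernel_start, unsigned long kernel_end)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (clusters[i].usage & 3)
			continue;	/* reserved by firmware, skip */
		toy_add(mb->memory, &mb->nr_memory,
			clusters[i].start_pfn << TOY_PAGE_SHIFT,
			clusters[i].numpages << TOY_PAGE_SHIFT);
	}

	toy_add(mb->reserved, &mb->nr_reserved,
		kernel_start, kernel_end - kernel_start);
}
```

Compared with the removed code there is no search for a contiguous hole to hold a bitmap: the region lists are the only bookkeeping, which is why the "find a good region" loops could be dropped.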