Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp3916538ybz; Mon, 4 May 2020 12:05:50 -0700 (PDT) X-Google-Smtp-Source: APiQypLVjEbzbyZ8r4Gd2+lcU53BHqvElh34UVwUby7D2g5c5nRpJSbbtQgifXDFuZzwMsVcKrMk X-Received: by 2002:a50:a624:: with SMTP id d33mr16499958edc.58.1588619149863; Mon, 04 May 2020 12:05:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588619149; cv=none; d=google.com; s=arc-20160816; b=saFSOyYHx0qUxdRgJiklBriaCLPfA4blyfN58P24+5bg9TljiduC1tSSAfuLCVj7WD 1zUsqwXQFlj207Xrh5E43vW8mpSmXjQfvTr2urqLuHCLgMqdW4BGNSfWzs8zHegwH0Jc r5cImBPqlwdnCBlU8kJqzeltOP10NMXPk28ZtPrbmjwT/hUP8YWjF74wqcbv2+eQdxEa V3z/GG9Ik3Nc2f/U9IYaSca959lhEO/2TRpglU980C9mMdlPrdXfvuZPKpo9IZjwDmQG C2RXr7MZiNOCOcEILJufbZSVRmZanCXejxEoW01eUya++rCuFslaIk5YSO7Mj67PCIRV EctA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=3DVF+lgsiekXA9h6tv6LV4F597LFQ9My0fRuyYxe868=; b=JDRAzWBGGS8WFh98vq451rsX9CQYnQ9VCmMr/mf3mrQ0v/PfAX3rs8brRh0/0Hg/hW Dsvrh2Y/iNhU4aZL/7p9El3oTJQCvK7ISRes8TsLPJjgoKXnpyHCQ2YQwpk0ihI5IiB+ 0yAtosH2l8Xe30yNS+ty+uaqXVjrDYUHoXL7RhjWQYTwr77xpQ3KjbpICH1YRJavSDRe bqD/SOQZ+AC+TDt5ukjLdU9LdU04vjDV8lG67jkQe15ae+lE+R4M/oYxyM7ySUCmN1GL bPWo48LKavwRU2ckWfHywneZZdUHLtq1HdjkbKsv1Z/d9gXbFKzJlav0/LUkrCg78LHD wP4Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Mw5Q1f0a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id l4si7276807edl.169.2020.05.04.12.05.26; Mon, 04 May 2020 12:05:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Mw5Q1f0a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727105AbgEDTDB (ORCPT + 99 others); Mon, 4 May 2020 15:03:01 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:41892 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725981AbgEDTDA (ORCPT ); Mon, 4 May 2020 15:03:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1588618979; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3DVF+lgsiekXA9h6tv6LV4F597LFQ9My0fRuyYxe868=; b=Mw5Q1f0aFhApTfcSWjc50ACpaYMVANO9tKsVHRRTVAf8NhX5tzTRP6hcaZZqVVfmxyr6yq eW8WwihDwZQMGWolQ3o36XBuu1GtZi72MmTm0QNK3PC6+7uWbWwXmcBNnyz62WIIbmFe04 111rToSDSqM32D5sbN+NqKQ80BtlptY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-325-Puc1_toyMROoH1ne7lS_cw-1; Mon, 04 May 2020 15:02:54 -0400 X-MC-Unique: Puc1_toyMROoH1ne7lS_cw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 350C71840400; Mon, 4 May 2020 19:02:52 +0000 (UTC) Received: from t480s.redhat.com (ovpn-114-235.ams2.redhat.com [10.36.114.235]) by smtp.corp.redhat.com (Postfix) with ESMTP id F06C7385; Mon, 4 May 2020 19:02:48 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-nvdimm@lists.01.org, kexec@lists.infradead.org, Vishal Verma , Dave Jiang , Pavel Tatashin , David Hildenbrand , Andrew Morton , Michal Hocko , Pankaj Gupta , Wei Yang , Baoquan He , Dave Hansen , Eric Biederman , Dan Williams Subject: [PATCH v3 1/3] mm/memory_hotplug: Introduce add_memory_device_managed() Date: Mon, 4 May 2020 21:02:25 +0200 Message-Id: <20200504190227.18269-2-david@redhat.com> In-Reply-To: <20200504190227.18269-1-david@redhat.com> References: <20200504190227.18269-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Some device drivers rely on memory they managed to not get added to the initial (firmware) memmap as system RAM - so it's not used as initial system RAM by the kernel and the driver is under control. While this is t= he case during cold boot and after a reboot, kexec is not aware of that and might add such memory to the initial (firmware) memmap of the kexec kerne= l. We need ways to teach kernel and userspace that this system ram is different. For example, dax/kmem allows to decide at runtime if persistent memory is to be used as system ram. Another future user is virtio-mem, which has to coordinate with its hypervisor to deal with inaccessible parts within memory resources. We want to let users in the kernel (esp. kexec) but also user space (esp. kexec-tools) know that this memory has different semantics and needs to be handled differently: 1. Don't create entries in /sys/firmware/memmap/ 2. Name the memory resource "System RAM ($DRIVER)" (exposed via /proc/iomem) ($DRIVER might be "kmem", "virtio_mem"). 3. Flag the memory resource IORESOURCE_MEM_DRIVER_MANAGED /sys/firmware/memmap/ [1] represents the "raw firmware-provided memory ma= p" because "on most architectures that firmware-provided memory map is modified afterwards by the kernel itself". The primary user is kexec on x86-64. Since commit d96ae5309165 ("memory-hotplug: create /sys/firmware/memmap entry for new memory"), we add all hotplugged memory to that firmware memmap - which makes perfect sense for traditiona= l memory hotplug on x86-64, where real HW will also add hotplugged DIMMs to the firmware memmap. We replicate what the "raw firmware-provided memory map" looks like after hot(un)plug. To keep things simple, let the user provide the full resource name instead of only the driver name - this way, we don't have to manually allocate/craft strings for memory resources. Also use the resource name to make decisions, to avoid passing additional flags. In case the name isn't "System RAM", it's special. We don't have to worry about firmware_map_remove() on the removal path. I= f there is no entry, it will simply return with -EINVAL. We'll adapt dax/kmem in a follow-up patch. [1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-m= emmap Cc: Andrew Morton Cc: Michal Hocko Cc: Pankaj Gupta Cc: Wei Yang Cc: Baoquan He Cc: Dave Hansen Cc: Eric Biederman Cc: Pavel Tatashin Cc: Dan Williams Signed-off-by: David Hildenbrand --- include/linux/ioport.h | 1 + include/linux/memory_hotplug.h | 2 ++ mm/memory_hotplug.c | 62 +++++++++++++++++++++++++++++++--- 3 files changed, 61 insertions(+), 4 deletions(-) diff --git a/include/linux/ioport.h b/include/linux/ioport.h index a9b9170b5dd2..cc9a5b4593ca 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -103,6 +103,7 @@ struct resource { #define IORESOURCE_MEM_32BIT (3<<3) #define IORESOURCE_MEM_SHADOWABLE (1<<5) /* dup: IORESOURCE_SHADOWABLE *= / #define IORESOURCE_MEM_EXPANSIONROM (1<<6) +#define IORESOURCE_MEM_DRIVER_MANAGED (1<<7) =20 /* PnP I/O specific bits (IORESOURCE_BITS) */ #define IORESOURCE_IO_16BIT_ADDR (1<<0) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplu= g.h index 7dca9cd6076b..fee7fab5d706 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -342,6 +342,8 @@ extern void __ref free_area_init_core_hotplug(int nid= ); extern int __add_memory(int nid, u64 start, u64 size); extern int add_memory(int nid, u64 start, u64 size); extern int add_memory_resource(int nid, struct resource *resource); +extern int add_memory_driver_managed(int nid, u64 start, u64 size, + const char *resource_name); extern void move_pfn_range_to_zone(struct zone *zone, unsigned long star= t_pfn, unsigned long nr_pages, struct vmem_altmap *altmap); extern void remove_pfn_range_from_zone(struct zone *zone, diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index ad54349a2550..c4d5c45820d0 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -98,11 +98,14 @@ void mem_hotplug_done(void) u64 max_mem_size =3D U64_MAX; =20 /* add this memory to iomem resource */ -static struct resource *register_memory_resource(u64 start, u64 size) +static struct resource *register_memory_resource(u64 start, u64 size, + const char *resource_name) { struct resource *res; unsigned long flags =3D IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; - char *resource_name =3D "System RAM"; + + if (strcmp(resource_name, "System RAM")) + flags |=3D IORESOURCE_MEM_DRIVER_MANAGED; =20 /* * Make sure value parsed from 'mem=3D' only restricts memory adding @@ -1057,7 +1060,8 @@ int __ref add_memory_resource(int nid, struct resou= rce *res) BUG_ON(ret); =20 /* create new memmap entry */ - firmware_map_add_hotplug(start, start + size, "System RAM"); + if (!strcmp(res->name, "System RAM")) + firmware_map_add_hotplug(start, start + size, "System RAM"); =20 /* device_online() will take the lock when calling online_pages() */ mem_hotplug_done(); @@ -1083,7 +1087,7 @@ int __ref __add_memory(int nid, u64 start, u64 size= ) struct resource *res; int ret; =20 - res =3D register_memory_resource(start, size); + res =3D register_memory_resource(start, size, "System RAM"); if (IS_ERR(res)) return PTR_ERR(res); =20 @@ -1105,6 +1109,56 @@ int add_memory(int nid, u64 start, u64 size) } EXPORT_SYMBOL_GPL(add_memory); =20 +/* + * Add special, driver-managed memory to the system as system RAM. Such + * memory is not exposed via the raw firmware-provided memmap as system + * RAM, instead, it is detected and added by a driver - during cold boot= , + * after a reboot, and after kexec. + * + * Reasons why this memory should not be used for the initial memmap of = a + * kexec kernel or for placing kexec images: + * - The booting kernel is in charge of determining how this memory will= be + * used (e.g., use persistent memory as system RAM) + * - Coordination with a hypervisor is required before this memory + * can be used (e.g., inaccessible parts). + * + * For this memory, no entries in /sys/firmware/memmap ("raw firmware-pr= ovided + * memory map") are created. Also, the created memory resource is flagge= d + * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-ca= se + * this memory as well (esp., not place kexec images onto it). + * + * The resource_name (visible via /proc/iomem) has to have the format + * "System RAM ($DRIVER)". + */ +int add_memory_driver_managed(int nid, u64 start, u64 size, + const char *resource_name) +{ + struct resource *res; + int rc; + + if (!resource_name || + strstr(resource_name, "System RAM (") !=3D resource_name || + resource_name[strlen(resource_name) - 1] !=3D ')') + return -EINVAL; + + lock_device_hotplug(); + + res =3D register_memory_resource(start, size, resource_name); + if (IS_ERR(res)) { + rc =3D PTR_ERR(res); + goto out_unlock; + } + + rc =3D add_memory_resource(nid, res); + if (rc < 0) + release_memory_resource(res); + +out_unlock: + unlock_device_hotplug(); + return rc; +} +EXPORT_SYMBOL_GPL(add_memory_driver_managed); + #ifdef CONFIG_MEMORY_HOTREMOVE /* * Confirm all pages in a range [start, end) belong to the same zone (sk= ipping --=20 2.25.3