From: Andy Lutomirski
Date: Thu, 2 Apr 2015 12:08:41 -0700
Subject: Re: [tip:x86/pmem] x86/mm: Add support for the non-standard protected e820 type
To: axboe@fb.com, Boaz Harrosh, Dan Williams, "H. Peter Anvin", Andy Lutomirski, Jens Axboe, "linux-kernel@vger.kernel.org", Thomas Gleixner, Borislav Petkov, Andrew Morton, Linus Torvalds, Christoph Hellwig, Ross Zwisler, Ingo Molnar, Matthew Wilcox, keith.busch@intel.com
Cc: "linux-tip-commits@vger.kernel.org"

On Thu, Apr 2, 2015 at 5:31 AM, tip-bot for Christoph Hellwig wrote:
> Commit-ID: ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
> Gitweb: http://git.kernel.org/tip/ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
> Author: Christoph Hellwig
> AuthorDate: Wed, 1 Apr 2015 09:12:18 +0200
> Committer: Ingo Molnar
> CommitDate: Wed, 1 Apr 2015 17:02:43 +0200
>
> x86/mm: Add support for the non-standard protected e820 type
>
> Various recent BIOSes support NVDIMMs or ADR using a
> non-standard e820 memory type, and Intel supplied reference
> Linux code using this type to various vendors.
>
> Wire this e820 table type up to export platform devices for the
> pmem driver so that we can use it in Linux.

This scares me a bit. Do we know that the upcoming ACPI 6.0
enumeration mechanism *won't* use e820 type 12? If it will, what
stops a non-legacy device from being incorrectly claimed as a
legacy device?
--Andy

>
> Based on earlier work from:
>
>	Dave Jiang
>	Dan Williams
>
> Includes fixes for NUMA regions from Boaz Harrosh.
>
> Tested-by: Ross Zwisler
> Signed-off-by: Christoph Hellwig
> Acked-by: Dan Williams
> Cc: Andrew Morton
> Cc: Andy Lutomirski
> Cc: Boaz Harrosh
> Cc: Borislav Petkov
> Cc: H. Peter Anvin
> Cc: Jens Axboe
> Cc: Jens Axboe
> Cc: Keith Busch
> Cc: Linus Torvalds
> Cc: Matthew Wilcox
> Cc: Thomas Gleixner
> Cc: linux-nvdimm@ml01.01.org
> Link: http://lkml.kernel.org/r/1427872339-6688-2-git-send-email-hch@lst.de
> [ Minor cleanups. ]
> Signed-off-by: Ingo Molnar
> ---
>  Documentation/kernel-parameters.txt |  6 +++++
>  arch/x86/Kconfig                    | 10 +++++++
>  arch/x86/include/uapi/asm/e820.h    | 10 +++++++
>  arch/x86/kernel/Makefile            |  1 +
>  arch/x86/kernel/e820.c              | 26 +++++++++++++-----
>  arch/x86/kernel/pmem.c              | 53 +++++++++++++++++++++++++++++++++++++
>  6 files changed, 100 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index bfcb1a6..c87122d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1965,6 +1965,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			or
>  			memmap=0x10000$0x18690000
>
> +	memmap=nn[KMG]!ss[KMG]
> +			[KNL,X86] Mark specific memory as protected.
> +			Region of memory to be used, from ss to ss+nn.
> +			The memory region may be marked as e820 type 12 (0xc)
> +			and is NVDIMM or ADR memory.
> +
>  	memory_corruption_check=0/1 [X86]
>  			Some BIOSes seem to corrupt the first 64k of
>  			memory when doing things like suspend/resume.
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b7d31ca..9e3bcd6 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1430,6 +1430,16 @@ config ILLEGAL_POINTER_VALUE
>
>  source "mm/Kconfig"
>
> +config X86_PMEM_LEGACY
> +	bool "Support non-standard NVDIMMs and ADR protected memory"
> +	help
> +	  Treat memory marked using the non-standard e820 type of 12 as used
> +	  by the Intel Sandy Bridge-EP reference BIOS as protected memory.
> +	  The kernel will offer these regions to the 'pmem' driver so
> +	  they can be used for persistent storage.
> +
> +	  Say Y if unsure.
> +
>  config HIGHPTE
>  	bool "Allocate 3rd-level pagetables from highmem"
>  	depends on HIGHMEM
> diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
> index d993e33..960a8a9 100644
> --- a/arch/x86/include/uapi/asm/e820.h
> +++ b/arch/x86/include/uapi/asm/e820.h
> @@ -33,6 +33,16 @@
>  #define E820_NVS	4
>  #define E820_UNUSABLE	5
>
> +/*
> + * This is a non-standardized way to represent ADR or NVDIMM regions that
> + * persist over a reboot. The kernel will ignore their special capabilities
> + * unless the CONFIG_X86_PMEM_LEGACY=y option is set.
> + *
> + * ( Note that older platforms also used 6 for the same type of memory,
> + *   but newer versions switched to 12 as 6 was assigned differently. Some
> + *   time they will learn... )
> + */
> +#define E820_PRAM	12
>
>  /*
>   * reserved RAM used by kernel itself
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index cdb1b70..971f18c 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -94,6 +94,7 @@ obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
>  obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
>  obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
>  obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
> +obj-$(CONFIG_X86_PMEM_LEGACY) += pmem.o
>
>  obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 46201de..11cc7d5 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -149,6 +149,9 @@ static void __init e820_print_type(u32 type)
>  	case E820_UNUSABLE:
>  		printk(KERN_CONT "unusable");
>  		break;
> +	case E820_PRAM:
> +		printk(KERN_CONT "persistent (type %u)", type);
> +		break;
>  	default:
>  		printk(KERN_CONT "type %u", type);
>  		break;
> @@ -343,7 +346,7 @@ int __init sanitize_e820_map(struct e820entry *biosmap, int max_nr_map,
>  		 * continue building up new bios map based on this
>  		 * information
>  		 */
> -		if (current_type != last_type) {
> +		if (current_type != last_type || current_type == E820_PRAM) {
>  			if (last_type != 0) {
>  				new_bios[new_bios_entry].size =
>  					change_point[chgidx]->addr - last_addr;
> @@ -688,6 +691,7 @@ void __init e820_mark_nosave_regions(unsigned long limit_pfn)
>  			register_nosave_region(pfn, PFN_UP(ei->addr));
>
>  		pfn = PFN_DOWN(ei->addr + ei->size);
> +
>  		if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
>  			register_nosave_region(PFN_UP(ei->addr), pfn);
>
> @@ -748,7 +752,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
>  /*
>   * Find the highest page frame number we have available
>   */
> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
>  {
>  	int i;
>  	unsigned long last_pfn = 0;
> @@ -759,7 +763,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>  		unsigned long start_pfn;
>  		unsigned long end_pfn;
>
> -		if (ei->type != type)
> +		/*
> +		 * Persistent memory is accounted as ram for purposes of
> +		 * establishing max_pfn and mem_map.
> +		 */
> +		if (ei->type != E820_RAM && ei->type != E820_PRAM)
>  			continue;
>
>  		start_pfn = ei->addr >> PAGE_SHIFT;
> @@ -784,12 +792,12 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>  }
>  unsigned long __init e820_end_of_ram_pfn(void)
>  {
> -	return e820_end_pfn(MAX_ARCH_PFN, E820_RAM);
> +	return e820_end_pfn(MAX_ARCH_PFN);
>  }
>
>  unsigned long __init e820_end_of_low_ram_pfn(void)
>  {
> -	return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
> +	return e820_end_pfn(1UL << (32-PAGE_SHIFT));
>  }
>
>  static void early_panic(char *msg)
> @@ -866,6 +874,9 @@ static int __init parse_memmap_one(char *p)
>  	} else if (*p == '$') {
>  		start_at = memparse(p+1, &p);
>  		e820_add_region(start_at, mem_size, E820_RESERVED);
> +	} else if (*p == '!') {
> +		start_at = memparse(p+1, &p);
> +		e820_add_region(start_at, mem_size, E820_PRAM);
>  	} else
>  		e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);
>
> @@ -907,6 +918,7 @@ static inline const char *e820_type_to_string(int e820_type)
>  	case E820_ACPI:	return "ACPI Tables";
>  	case E820_NVS:	return "ACPI Non-volatile Storage";
>  	case E820_UNUSABLE:	return "Unusable memory";
> +	case E820_PRAM:	return "Persistent RAM";
>  	default:	return "reserved";
>  	}
>  }
> @@ -940,7 +952,9 @@ void __init e820_reserve_resources(void)
>  		 * pci device BAR resource and insert them later in
>  		 * pcibios_resource_survey()
>  		 */
> -		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
> +		if (((e820.map[i].type != E820_RESERVED) &&
> +		     (e820.map[i].type != E820_PRAM)) ||
> +		     res->start < (1ULL<<20)) {
>  			res->flags |= IORESOURCE_BUSY;
>  			insert_resource(&iomem_resource, res);
>  		}
> diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
> new file mode 100644
> index 0000000..3420c87
> --- /dev/null
> +++ b/arch/x86/kernel/pmem.c
> @@ -0,0 +1,53 @@
> +/*
> + * Copyright (c) 2015, Christoph Hellwig.
> + */
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +static __init void register_pmem_device(struct resource *res)
> +{
> +	struct platform_device *pdev;
> +	int error;
> +
> +	pdev = platform_device_alloc("pmem", PLATFORM_DEVID_AUTO);
> +	if (!pdev)
> +		return;
> +
> +	error = platform_device_add_resources(pdev, res, 1);
> +	if (error)
> +		goto out_put_pdev;
> +
> +	error = platform_device_add(pdev);
> +	if (error)
> +		goto out_put_pdev;
> +	return;
> +
> +out_put_pdev:
> +	dev_warn(&pdev->dev, "failed to add 'pmem' (persistent memory) device!\n");
> +	platform_device_put(pdev);
> +}
> +
> +static __init int register_pmem_devices(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < e820.nr_map; i++) {
> +		struct e820entry *ei = &e820.map[i];
> +
> +		if (ei->type == E820_PRAM) {
> +			struct resource res = {
> +				.flags	= IORESOURCE_MEM,
> +				.start	= ei->addr,
> +				.end	= ei->addr + ei->size - 1,
> +			};
> +			register_pmem_device(&res);
> +		}
> +	}
> +
> +	return 0;
> +}
> +device_initcall(register_pmem_devices);

--
Andy Lutomirski
AMA Capital Management, LLC