From: Dan Williams
Date: Mon, 24 Apr 2017 13:52:10 -0700
Subject: Re: KASLR causes intermittent boot failures on some systems
To: Thomas Garnier
Cc: Baoquan He, Jeff Moyer, Ingo Molnar, LKML, "linux-nvdimm@lists.01.org"
References: <20170419133630.GA2311@x1> <20170420132632.GD2311@x1>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

On Mon, Apr 24, 2017 at 1:37 PM, Thomas Garnier wrote:
>
> On Thu, Apr 20, 2017 at 6:26 AM, Baoquan He wrote:
>> On 04/19/17 at 07:27am, Thomas Garnier wrote:
>>> On Wed, Apr 19, 2017 at 6:36 AM, Baoquan He wrote:
>>> > Hi all,
>>> >
>>> > I logged into Jeff's system and added debug code, but found no clue.
>>> > However, DaveY found that if he disables only the page_offset
>>> > randomization, the EFI issue is not seen on his system with KASLR
>>> > enabled. I did the same on Jeff's pmem system, with the same result:
>>> > I rebooted several times and it booted successfully every time. In
>>> > the current code __PAGE_OFFSET_BASE is not used directly anywhere,
>>> > so I don't know why it failed.
>>>
>>> Great! I still cannot repro it.
>>>
>>> >
>>> > Does anyone have any idea or hint I can try? I read the pmem code
>>> > around devm_nsio_enable/pmem_attach_disk/arch_add_memory, but have
>>> > no idea yet.
>>>
>>> I would test a couple of things:
>>> - Set page_offset_base to 0 by default and set it to
>>> __PAGE_OFFSET_BASE in kernel_randomize_memory (without randomizing
>>> it). If it crashes on a low address, it might be due to using __va or
>>> PAGE_OFFSET in general before randomization is done.
>>> - Does any change in __PAGE_OFFSET lead to a crash, or only when
>>> __PAGE_OFFSET is in a specific range? Given that you may have to
>>> reboot multiple times to get a crash, I assume that a specific range
>>> is the problem, but it might be worth checking.
>>
>> I added debug code and collected boot logs for failure cases and
>> success cases; it seems to be related to a crossing-PGD-entry issue.
>> The code change below is part of my debugging code; I added printing
>> everywhere, and abstracted just this part for a better understanding
>> of the printed information below it. The emulated pmem memory is
>> [1TB, 1TB+192G), namely [0x10000000000, 0x13000000000). Each PGD
>> entry maps 512G, i.e. 512 1G PUD entries, so if fewer than 192 PUD
>> entries are left from where 1TB lands in the direct mapping to the
>> end of that PGD entry, the range has to cross into the next PGD
>> entry, and that is when it fails. init_memory_mapping might have
>> handled the direct mapping well; I am not sure whether __add_pages
>> is OK.
>>
>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> index 5b536be..f3f8d43 100644
>> --- a/drivers/nvdimm/pmem.c
>> +++ b/drivers/nvdimm/pmem.c
>> @@ -87,6 +87,8 @@ static int read_pmem(struct page *page, unsigned int off,
>>  {
>>         int rc;
>>         void *mem = kmap_atomic(page);
>> +       pr_info("pfn:0x%lx, off=0x%lx, pmem_addr:0x%llx, len:0x%lx\n",
>> +               page_to_pfn(page), off, pmem_addr, len);
>>
>>         rc = memcpy_from_pmem(mem + off, pmem_addr, len);
>>         kunmap_atomic(mem);
>> @@ -312,6 +318,8 @@ static int pmem_attach_disk(struct device *dev,
>>         if (IS_ERR(addr))
>>                 return PTR_ERR(addr);
>>         pmem->virt_addr = addr;
>> +       pr_info("pmem->virt_addr:0x%llx, pmem->phys_addr:0x%llx, pmem->size:0x%llx\n",
>> +               pmem->virt_addr, pmem->phys_addr, pmem->size);
>>
>>         blk_queue_write_cache(q, true, true);
>>         blk_queue_make_request(q, pmem_make_request);
>>
>>
>
> Super useful. I can see that the virt_addr field can be set in three
> locations (http://lxr.free-electrons.com/source/drivers/nvdimm/pmem.c#L288).
> Can you check which one is used for the faulting addresses?
>
> Also, the two functions used (devm_memremap_pages and devm_memremap)
> seem to check whether the region intersects IORESOURCE_SYSTEM_RAM; if
> it does, the mapping is not done and the __va() is returned. I would
> be interested to know if this is what's happening. Basically, log the
> VA on these lines:
>
> - http://lxr.free-electrons.com/source/kernel/memremap.c#L307
> - http://lxr.free-electrons.com/source/kernel/memremap.c#L98
>
> This way, we can get closer to which code does not handle the PGD
> boundary correctly.
>
> Thanks!
>

When using the memmap= parameter, we're using this call by default:

	} else if (pmem_should_map_pages(dev)) {
		addr = devm_memremap_pages(dev, &nsio->res,
				&q->q_usage_counter, NULL);
		pmem->pfn_flags |= PFN_MAP;
	} else

...where we are assuming that the memmap= parameter does not specify a
range size that will exhaust all of system memory just to hold the
struct page array.
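
As a back-of-the-envelope check of that assumption (rough numbers, not
measured on these machines): a 192GB range is 192GB / 4KB = ~50 million
pages, and at 64 bytes per struct page that is roughly 3GB of metadata,
so exhaustion only becomes a concern when the memmap= range is large
relative to the RAM left over to hold that array.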
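
To make the PGD-boundary condition Baoquan describes concrete, here is
a rough userspace sketch of the arithmetic. This is only my
illustration under the usual 4-level paging layout (one PGD entry spans
512GB, i.e. 512 1GB PUD entries); the randomized base value below is
hypothetical, not taken from the failing machine:

	/*
	 * Rough illustration only: does a physical range, once mapped at a
	 * given (randomized) page_offset base, cross a PGD boundary in the
	 * direct mapping?  Assumes 4-level paging: 1GB per PUD entry,
	 * 512 PUD entries (512GB) per PGD entry.
	 */
	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define GB		(1ULL << 30)
	#define PGD_SPAN	(512 * GB)

	static bool crosses_pgd(uint64_t base, uint64_t phys, uint64_t size)
	{
		uint64_t start = base + phys;
		uint64_t end = start + size - 1;

		return (start / PGD_SPAN) != (end / PGD_SPAN);
	}

	int main(void)
	{
		uint64_t phys = 1ULL << 40;	/* emulated pmem starts at 1TB */
		uint64_t size = 192 * GB;	/* and spans 192GB */
		/* hypothetical 1GB-aligned randomized direct-map base */
		uint64_t base = 0xffff880000000000ULL + 321 * GB;

		/*
		 * 1TB is a multiple of 512GB, so here the range starts at PUD
		 * index 321 of its PGD entry; only 191 PUD entries remain,
		 * fewer than the 192 needed, so the range crosses a PGD entry.
		 */
		printf("crosses PGD boundary: %s\n",
		       crosses_pgd(base, phys, size) ? "yes" : "no");
		return 0;
	}

With a start offset of 321GB into a PGD entry, only 191 PUD entries
remain, so a 192GB range necessarily spans two PGD entries, which is
exactly the condition Baoquan correlates with the failing boots.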