Subject: Re: ARM64: kernel panics in DABT in sys_msync path
To: Will Deacon
Cc: Yury Norov, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org
References: <20170924213622.75e7r3k56tgxlezh@yury-thinkpad>
 <20170925105335.GA24042@arm.com>
 <20170925140240.vl5mvbce5lb37dxe@yury-thinkpad>
 <20170925190426.6prpcfn7lly26clm@yury-thinkpad>
 <20170926102324.GC8693@arm.com>
 <547ed590-3ab4-cc11-cbea-f587541d2b08@codeaurora.org>
 <20170926173112.GA16650@arm.com>
 <20170927155007.GA16211@arm.com>
From: Richard Ruigrok
Message-ID: <38058a06-1c8a-015c-cb9d-1c1b17a1edf3@codeaurora.org>
Date: Wed, 27 Sep 2017 12:00:06 -0600
In-Reply-To: <20170927155007.GA16211@arm.com>

On 9/27/2017 9:50 AM, Will Deacon wrote:
> On Tue, Sep 26, 2017 at 06:31:12PM +0100, Will Deacon wrote:
>> On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
>>> On 9/26/2017 4:23 AM, Will Deacon wrote:
>>>> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
>>>>> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
>>>>> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
>>>>> page I was not able to reproduce. RH also reported it here:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1491504
>>>>> Linaro reported on the RPK kernel (4.12) on Centriq2400 and ThunderX
>>>>>
>>>>>
>>>>> https://bugs.linaro.org/show_bug.cgi?id=3191
>>>>>
>>>>> https://bugs.linaro.org/show_bug.cgi?id=3068.
>>>> These two aren't the same bug (that's a forward progress issue that we're
>>>> currently working on). I don't have permission to look at the redhat one,
>>>> but is it just an RCU stall or actually the Oops reported by Yury?
>>>>
>>>>> I was able to bisect down to a specific commit.
>>>> I think we're chasing two different things here, so not sure I trust the
>>>> bisect!
>>>>
>>> The RCU stall is side effect. The issue I'm seeing has the same stack
>>> trace and same stimulus (rwtest). Following are the details.
>> FWIW, I think I've worked out what's going on here and I should have a patch
>> tomorrow.
> Diff below. I'm going to follow up with a separate thread about this,
> because the proper fix is going to be invasive. I'll keep you on cc.
>
> Out of curiosity: what version of GCC are you using to compile the kernel?

I'm using gcc-linaro-6.3.1-2017.02-x86_64_aarch64-linux-gnu

Thanks for the patch, test results to follow.

Richard

>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index bc4e92337d16..b46e54c2399b 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  /* Find an entry in the third-level page table. */
>  #define pte_index(addr)		(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
>
> -#define pte_offset_phys(dir,addr)	(pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
> +#define pte_offset_phys(dir,addr)	(pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
>  #define pte_offset_kernel(dir,addr)	((pte_t *)__va(pte_offset_phys((dir), (addr))))
>
>  #define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))

-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.