Subject: Re: [PATCH 2/2] acpi, apei: use appropriate pgprot_t to map GHES memory
To: Ingo Molnar
References: <1439591850-29002-1-git-send-email-zjzhang@codeaurora.org>
 <1439591850-29002-2-git-send-email-zjzhang@codeaurora.org>
 <20150822092429.GB18233@gmail.com>
Cc: Will Deacon, Thomas Gleixner, "H. Peter Anvin",
 linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
 Matt Fleming, Borislav Petkov, Ard Biesheuvel, Catalin Marinas,
 Matt Fleming
From: "Zhang, Jonathan Zhixiong"
Message-ID: <55DB60FA.8050406@codeaurora.org>
Date: Mon, 24 Aug 2015 11:22:50 -0700
In-Reply-To: <20150822092429.GB18233@gmail.com>

On 8/22/2015 2:24 AM, Ingo Molnar wrote:
>
> * Jonathan (Zhixiong) Zhang wrote:
>
>> From: "Jonathan (Zhixiong) Zhang"
>>
>> With ACPI APEI firmware first handling, generic hardware error
>> record is updated by firmware in GHES memory region. On an arm64
>> platform, firmware updates GHES memory region with uncached
>> access attribute, and then Linux reads stale data from cache.
>
> This paragraph *still* doesn't parse for me. It's not any English
> I can recognize: what is a 'With ACPI APEI firmware first handling'?

APEI is the ACPI Platform Error Interface; it is the part of the ACPI
spec that defines hardware error handling.
"Firmware first handling" is a term used in APEI. It describes a
mechanism in which, when a hardware error happens, firmware intercepts
and handles the error, formulates a hardware error record, writes the
record to the GHES memory region, and notifies the kernel through an
NMI/interrupt; the kernel GHES driver then reads the error record from
the GHES memory region.

>
>> With current code, GHES memory region is mapped with PAGE_KERNEL
>> based on the assumption that cache coherency of GHES memory region
>> is maintained by firmware on all platforms. This assumption is
>> not true for above mentioned arm64 platform.
>>
>> Instead GHES memory region should be mapped with page protection type
>> according to what is returned from arch_apei_get_mem_attribute().
>
> ... plus what this changelog still doesn't mention is the most important part of
> any bug fix description: how does the user notice this in practice and why does he
> care?

The changelog mentioned that Linux would read stale data from cache.
When stale data is read, the kernel reports that there is no new
hardware error when there actually is one. This may lead to further
damage in various scenarios, such as data corruption caused by error
propagation.

>
> Thanks,
>
> 	Ingo

--
Jonathan (Zhixiong) Zhang
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
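
P.S. In case it helps to see the stale-read problem concretely, it can
be modeled in a few lines of user-space C. This is only a toy sketch
under stated assumptions, not kernel code and not the patch itself:
struct toy_ghes and every function name below are made up for
illustration, and a single flag stands in for a CPU cache line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one word of "DRAM" holding a GHES status
 * value, plus one CPU cache line that may shadow it. */
struct toy_ghes {
	unsigned int dram_status;	/* what firmware writes */
	unsigned int cache_status;	/* CPU's cached copy */
	bool cache_valid;		/* is the cached copy populated? */
};

/* Firmware writes with an uncached attribute: DRAM changes, but the
 * CPU cache line is neither updated nor invalidated. */
void firmware_report_error(struct toy_ghes *g, unsigned int status)
{
	g->dram_status = status;
}

/* Read through a cacheable mapping (stands in for PAGE_KERNEL):
 * DRAM is consulted only on a cache miss. */
unsigned int read_cached(struct toy_ghes *g)
{
	if (!g->cache_valid) {
		g->cache_status = g->dram_status;
		g->cache_valid = true;
	}
	return g->cache_status;
}

/* Read through an uncached mapping: always goes to DRAM. */
unsigned int read_uncached(const struct toy_ghes *g)
{
	return g->dram_status;
}

/* Returns 1 if the cached read missed an error that firmware reported. */
int toy_demo(void)
{
	struct toy_ghes g = { 0, 0, false };

	(void)read_cached(&g);		/* kernel polls once, caching "no error" */
	firmware_report_error(&g, 1);	/* firmware records a new error */
	assert(read_uncached(&g) == 1);	/* uncached mapping sees it */
	return read_cached(&g) == 0;	/* cached mapping still sees "no error" */
}
```

In this model toy_demo() returns 1: the cacheable read keeps reporting
"no error" after firmware has written one, while the uncached read
sees the new record. Real hardware is of course more involved (cache
line eviction, speculation, and so on), so treat this purely as an
illustration of why the mapping attribute has to match what firmware
uses.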