Subject: Re: [PATCH] acpi: apei: clear error status before acknowledging the error
From: "Baicar, Tyler"
To: "Luck, Tony"
Cc: Borislav Petkov, rjw@rjwysocki.net, lenb@kernel.org, will.deacon@arm.com, james.morse@arm.com, shiju.jose@huawei.com, geliangtang@gmail.com, andriy.shevchenko@linux.intel.com, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Thu, 3 Aug 2017 16:06:22 -0600

On 7/31/2017 11:44 AM, Baicar, Tyler wrote:
> On 7/31/2017 11:00 AM, Luck, Tony wrote:
>> On Mon, Jul 31, 2017 at 10:15:27AM -0600, Baicar, Tyler wrote:
>>> I think the better thing to do in this case is still send the ack.
>>> If ghes_read_estatus() fails, then either we are unable to read the
>>> estatus or the estatus is empty/invalid.
>> Right now we silently handle that failure of ghes_read_estatus(). That
>> might be hiding some Linux bugs if we are calling ghes_proc() in cases
>> where we shouldn't.
>>
>> Perhaps we should have something like this, so if systems do start
>> acting weirdly there will be a note that we took this path:
>>
>>	rc = ghes_read_estatus(ghes, 0);
>>	if (rc) {
>>		pr_notice("surprise failure reading ghes estatus\n");
>>		goto out;
>>	}
> Thank you Tony for the feedback, I can add a print like this in the
> next version. I'll verify that rc is not -ENOENT, though, so we don't
> print it in empty scenarios, since the polled source will be hitting
> this path frequently.
>

Hi Tony,

I think I'm going to avoid adding this print. The failures are already
reported by prints in ghes_read_estatus(), so the extra message would be
redundant:

[ 133.601165] [Firmware Warn]: GHES: Failed to read error status block!
[ 133.601167] surprise failure reading GHES estatus

Thanks,
Tyler

>>
>>> If we do not send the ack, then we will be in a scenario where FW
>>> will not send any more errors.
>> We might ACK something that the firmware didn't send, which may
>> lead to other problems.
>>
>>> I think it would be better to still have the FW send the errors and
>>> kernel complain about issues with
>> But I agree with this. We should send the ACK. Luckily this doesn't
>> have a long legacy problem because the whole ACK mechanism is a new
>> thing. So we only have to worry about GHESv2 supporting BIOS.
>>
>> -Tony

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
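For illustration only, here is a minimal, self-contained C model of the flow discussed in this thread. Every name in it (struct fake_ghes, fake_read_estatus(), fake_clear_estatus(), fake_ack_error(), fake_ghes_proc()) is hypothetical. None of it is the kernel's GHES code or API. The model assumes a read returns 0 for a valid record, -ENOENT for an empty status block, and -EIO for a failed read. It captures the two points from the thread: the error status is cleared before the ack is written, and the ack is sent even when the read fails, so firmware is not left unable to report further errors.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a GHESv2 error source (not the kernel's struct ghes). */
struct fake_ghes {
	uint32_t block_status; /* nonzero when firmware has posted an error record */
	bool read_fails;       /* simulate a failed read of the status block */
	bool acked;            /* set once the (modelled) read-ack register is written */
};

/* Hypothetical read: 0 = valid record, -ENOENT = empty block, -EIO = read failure. */
static int fake_read_estatus(const struct fake_ghes *g)
{
	if (g->read_fails)
		return -EIO;
	if (g->block_status == 0)
		return -ENOENT;
	return 0;
}

/* Hypothetical clear: marks the status block as consumed. */
static void fake_clear_estatus(struct fake_ghes *g)
{
	g->block_status = 0;
}

/* Hypothetical ack: the assert encodes the ordering the patch title asks for. */
static void fake_ack_error(struct fake_ghes *g)
{
	assert(g->block_status == 0 && "status must be cleared before the ack");
	g->acked = true;
}

/*
 * Decode the record only if the read succeeded, then clear the status and
 * acknowledge on every path so firmware can keep reporting errors. No extra
 * message is printed here; the thread notes the read path already logs failures.
 */
static int fake_ghes_proc(struct fake_ghes *g, int *handled)
{
	int rc = fake_read_estatus(g);

	if (rc == 0)
		(*handled)++; /* stands in for decoding and reporting the error */

	fake_clear_estatus(g); /* clear first ... */
	fake_ack_error(g);     /* ... then ack, even if the read failed */
	return rc;
}
```

In this model a failed or empty read is never decoded but is still cleared and acknowledged. Whether acking an empty block is always harmless is exactly the concern Tony raises above ("We might ACK something that the firmware didn't send"), so treat that path as a design choice of the sketch rather than a rule.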