References: <20210722222351.GA354095@bjorn-Precision-5520>
From: Kai-Heng Feng
Date: Fri, 23 Jul 2021 15:05:12 +0800
Subject: Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
To: Christoph Hellwig
Cc: Bjorn Helgaas, Joerg Roedel,
    "open list:PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC",
    "open list:PCI SUBSYSTEM", open list, Lalithambika Krishnakumar,
    Alex Williamson, "Oliver O'Halloran", Bjorn Helgaas,
    Mika Westerberg, Lu Baolu
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 23, 2021 at 1:24 PM Christoph Hellwig wrote:
>
> On Thu, Jul 22, 2021 at 05:23:51PM -0500, Bjorn Helgaas wrote:
> > Marking both of these as "not applicable" for now because I don't
> > think we really understand what's going on.
> >
> > Apparently a DMA occurs during suspend or resume and triggers an ACS
> > violation.  I don't think such a DMA should occur in the first
> > place.
> >
> > Or maybe, since you say the problem happens right after ACS is enabled
> > during resume, we're doing the ACS enable incorrectly?  Although I
> > would think we should not be doing DMA at the same time we're enabling
> > ACS, either.
> >
> > If this really is a system firmware issue, both HP and Dell should
> > have the knowledge and equipment to figure out what's going on.
>
> DMA on resume sounds really odd.  OTOH the below mentioned case of
> a DMA during suspend seems very likely in some setups.  NVMe has the
> concept of a host memory buffer (HMB) that allows the PCIe device
> to use arbitrary host memory for internal purposes.  Combine this
> with the "Storage D3" misfeature in modern x86 platforms that force
> a slot into d3cold without consulting the driver first and you'd see
> symptoms like this.  Another case would be the NVMe equivalent of the
> AER which could lead to a completion without host activity.

The issue can also be observed on non-HMB NVMe.

>
> We now have quirks in the ACPI layer and NVMe to fully shut down the
> NVMe controllers on these messed up systems with the "Storage D3"
> misfeature which should avoid such "spurious" DMAs at the cost of
> wearing out the device much faster.

Since the issue is on S3, I think the NVMe always fully shuts down.

Kai-Heng