From: Ethan Zhao
Date: Wed, 7 Oct 2020 15:33:47 +0800
Subject: Re: [PATCH v7 0/5] Fix DPC hotplug race and enhance error handling
To: "Raj, Ashok"
Cc: Ethan Zhao, Bjorn Helgaas, Oliver, ruscur@russell.cc, Lukas Wunner,
    Andy Shevchenko, Stuart Hayes, Alexandru Gagniuc, Mika Westerberg,
    linux-pci, Linux Kernel Mailing List, Sathyanarayanan Kuppuswamy,
    Ashok Raj
References: <20201003075514.32935-1-haifeng.zhao@intel.com>
    <20201004045745.GA3207@araj-mobl1.jf.intel.com>
In-Reply-To: <20201004045745.GA3207@araj-mobl1.jf.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Raj,

On Sun, Oct 4, 2020 at 12:57 PM Raj, Ashok wrote:
>
> Hi Ethan
>
> On Sat, Oct 03, 2020 at 03:55:09AM -0400, Ethan Zhao wrote:
> > Hi,folks,
> >
> > This simple patch set fixed some serious security issues found when DPC
> > error injection and NVMe SSD hotplug brute force test were doing -- race
> > condition between DPC handler and pciehp, AER interrupt handlers, caused
> > system hang and system with DPC feature couldn't recover to normal
> > working state as expected (NVMe instance lost, mount operation hang,
> > race PCIe access caused uncorrectable errors reported alternatively etc).
>
> I think maybe picking from other commit messages to make this description in
> cover letter bit clear. The fundamental premise is that when due to error
> conditions when events are processed by both DPC handler and hotplug handling of
> DLLSC both operating on the same device object ends up with crashes.

Yep, that's right.

Thanks,
Ethan

>
> >
> > With this patch set applied, stable 5.9-rc6 on ICS (Ice Lake SP platform,
> > see
> > https://en.wikichip.org/wiki/intel/microarchitectures/ice_lake_(server))
> >
> > could pass the PCIe Gen4 NVMe SSD brute force hotplug test with any time
> > interval between hot-remove and plug-in operation tens of times without
> > any errors occur and system works normal.
> >
> > With this patch set applied, system with DPC feature could recover from
> > NON-FATAL and FATAL errors injection test and works as expected.
> >
> > System works smoothly when errors happen while hotplug is doing, no
> > uncorrectable errors found.
> >
> > Brute DPC error injection script:
> >
> > for i in {0..100}
> > do
> >         setpci -s 64:02.0 0x196.w=000a
> >         setpci -s 65:00.0 0x04.w=0544
> >         mount /dev/nvme0n1p1 /root/nvme
> >         sleep 1
> > done
> >
> > Other details see every commits description part.
> >
> > This patch set could be applied to stable 5.9-rc6/rc7 directly.
> >
> > Help to review and test.
> >
> > v2: changed according to review by Andy Shevchenko.
> > v3: changed patch 4/5 to simpler coding.
> > v4: move function pci_wait_port_outdpc() to DPC driver and its
> >     declaration to pci.h. (tip from Christoph Hellwig).
> > v5: fix building issue reported by lkp@intel.com with some config.
> > v6: move patch[3/5] as the first patch according to Lukas's suggestion.
> >     and rewrite the comment part of patch[3/5].
> > v7: change the patch[4/5], based on Bjorn's code and truth table.
> >     change the patch[5/5] about the debug output information.
> >
> > Thanks,
> > Ethan
> >
> >
> > Ethan Zhao (5):
> >   PCI/ERR: get device before call device driver to avoid NULL pointer
> >     dereference
> >   PCI/DPC: define a function to check and wait till port finish DPC
> >     handling
> >   PCI: pciehp: check and wait port status out of DPC before handling
> >     DLLSC and PDC
> >   PCI: only return true when dev io state is really changed
> >   PCI/ERR: don't mix io state not changed and no driver together
> >
> >  drivers/pci/hotplug/pciehp_hpc.c |  4 ++-
> >  drivers/pci/pci.h                | 55 +++++++++++++-------------------
> >  drivers/pci/pcie/dpc.c           | 27 ++++++++++++++++
> >  drivers/pci/pcie/err.c           | 18 +++++++++--
> >  4 files changed, 68 insertions(+), 36 deletions(-)
> >
> >
> > base-commit: a1b8638ba1320e6684aa98233c15255eb803fac7
> > --
> > 2.18.4
> >