Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp63710ybl; Wed, 22 Jan 2020 16:11:19 -0800 (PST) X-Google-Smtp-Source: APXvYqyRB/u2p99p5D13NwIHQxDK7JDMak6pq446z9GchCrRS6z53vO1IuGjsx4DXNtXzSpE6FUk X-Received: by 2002:a9d:7852:: with SMTP id c18mr8891811otm.247.1579738279213; Wed, 22 Jan 2020 16:11:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1579738279; cv=none; d=google.com; s=arc-20160816; b=nNM1BKEOf2fUjMt4wR863157YRhPUAHyLDJG7iO8wIrvYxhd9tVgWkPeseUHkYqaus 4WJNZQmkew2VIwVKUcLfMbbYlX+7sZq/+Rzn5y+8yDm3UkpuRbHWeJprwt0gQXK3hct+ jjaXlcsN8nIf/e0rpI5wWUavsv7GZIBnmFo6rRG4T8fJPEv6PpcxzCrTP53n+bh/3aMb j2/KbnBgX0WHPXvngYhXt+ryh3y6I1+zPzgi99K1L6H1NYjv1BdxQV/KwtO2p0Lwkq4h KYallU85b6DsqsRfCU+E+ODzvoPxIRXuEUF6TWD1vm2JMNY5sqCkVIls3TeLov0GNHrD C8sw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=ljR4qIScgnHlVgotRz5jvHzcfQpNvaV3xGDWeSwVuWI=; b=RH2Qk1Vfw9HLFZyF9H5WBxqu0sX07kmD4Bm4yYuD0zCVxezi9J65zUSw1+nLWgFcuY SvungmCblkjDwNSNUXgKaCnfxCEaEDrEniUgUcgm6of0qpFrvtI8surhXwe6lh0xgSu0 Gw/CtLblyuYZMBy0f1pZXDN/hCgW+pype0VkArmJICNhRyUft6wmNVomcHWwNxa5DDfL 4iLQlxodzQEUtSNaBtyXjIfxcRDSAUxooyzrZ0dI+LbumO0pYhtMHSjyVsvP1Apl51RP XKa379Pucvny2ifxnnNmT5o4p4xkecd4Wj4dmSZBG0Kejju9isG6aIwc+jceDHm2iSPj 11/g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=VlyKzP0a; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 12si37126oiz.131.2020.01.22.16.11.06; Wed, 22 Jan 2020 16:11:19 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=VlyKzP0a; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726231AbgAWAIW (ORCPT + 99 others); Wed, 22 Jan 2020 19:08:22 -0500 Received: from mail-lf1-f66.google.com ([209.85.167.66]:35179 "EHLO mail-lf1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725989AbgAWAIW (ORCPT ); Wed, 22 Jan 2020 19:08:22 -0500 Received: by mail-lf1-f66.google.com with SMTP id z18so982144lfe.2 for ; Wed, 22 Jan 2020 16:08:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ljR4qIScgnHlVgotRz5jvHzcfQpNvaV3xGDWeSwVuWI=; b=VlyKzP0aHglwI0D+fnQ6xSvab3aMMbG4BkNrmqROWoeBP1sfY1F+z/40Y6ZaJ9RRoS VMI1GSrctXZvVoV1+bCx+hEvKrzCoQ6AriVBgiSP9kt9mQkzCBExmT+f7c0T97GT4ufO twLmXNlfgnlTSMMciIYIpm4Rz/XgTauBDiaSc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ljR4qIScgnHlVgotRz5jvHzcfQpNvaV3xGDWeSwVuWI=; b=hc64MqoQkFqHlMFqnVsxe6iH8XjZDDPJJyOvr+WIGvWRKa/vc2lNxxnHnVGq8qhubp NG2VxJWe7WjDbfQdkn3twzksVr+FbpL1ptC5vZKh/U1AvbjWEoIKBT8pYZS8DlxpsfFM qcvWBH3pWGgO8FxGnyaHXDtOF4/QVqBvmJZxnMb1oJh1tCjV0qPwAAcIm2o1KoYAsKsH Fne2gv0/xng6/avJRp2z0PK14DgLfLlVpnmuktZqXftFo0LgwUnnhrkw058yLXBKacrD DXtGo2jgU2i+YOX0UaV5EGWs7Jgdq74KLUqkdrlNfYWPJg4GjDuzdi5WJKAHerQ0yQ5Y AEeQ== X-Gm-Message-State: APjAAAW/0mEmLK/6MStw9zEjOqMymmEGmn57iq40mq0LRLt5jBeM9VfT LuvkEFnYTCDlafxSy0P+UtXUY+WfjfU= X-Received: by 2002:a19:ca07:: with SMTP id a7mr3091682lfg.103.1579738098814; Wed, 22 Jan 2020 16:08:18 -0800 (PST) Received: from mail-lf1-f44.google.com (mail-lf1-f44.google.com. [209.85.167.44]) by smtp.gmail.com with ESMTPSA id p12sm90133lfc.43.2020.01.22.16.08.17 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 22 Jan 2020 16:08:18 -0800 (PST) Received: by mail-lf1-f44.google.com with SMTP id v201so935393lfa.11 for ; Wed, 22 Jan 2020 16:08:17 -0800 (PST) X-Received: by 2002:a05:6512:2035:: with SMTP id s21mr2832952lfs.99.1579738097075; Wed, 22 Jan 2020 16:08:17 -0800 (PST) MIME-Version: 1.0 References: <20200116133102.1.I9c7e72144ef639cc135ea33ef332852a6b33730f@changeid> <20200122172816.GA139285@google.com> <875zh3ukoy.fsf@nanos.tec.linutronix.de> In-Reply-To: <875zh3ukoy.fsf@nanos.tec.linutronix.de> From: Evan Green Date: Wed, 22 Jan 2020 16:07:40 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] PCI/MSI: Avoid torn updates to MSI pairs To: Thomas Gleixner Cc: Bjorn Helgaas , linux-pci@vger.kernel.org, LKML , Marc Zyngier , Christoph Hellwig , Rajat Jain Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 22, 2020 at 3:37 PM Thomas Gleixner wrote: > > Evan Green writes: > > On Wed, Jan 22, 2020 at 9:28 AM Bjorn Helgaas wrote: > >> I suspect this *is* a problem because I think disabling MSI doesn't > >> disable interrupts; it just means the device will interrupt using INTx > >> instead of MSI. And the driver is probably not prepared to handle > >> INTx. > >> > >> PCIe r5.0, sec 7.7.1.2, seems relevant: "If MSI and MSI-X are both > >> disabled, the Function requests servicing using INTx interrupts (if > >> supported)." > > Disabling MSI is not an option. Masking yes, but MSI does not have > mandatory masking. We already attempt masking on migration, which covers > only MSI-X reliably, but not all MSI incarnations. > > So I assume that problem happens on a MSI interrupt, right? > > >> Maybe the IRQ guys have ideas about how to solve this? > > Maybe :) > > > But don't we already do this in __pci_restore_msi_state(): > > pci_intx_for_msi(dev, 0); > > pci_msi_set_enable(dev, 0); > > arch_restore_msi_irqs(dev); > > > > I'd think if there were a chance for a line-based interrupt to get in > > and wedge itself, it would already be happening there. > > That's a completely different beast. It's used when resetting a device > and for other stuff like virt state migration. That's not a model for > affinity changes of a live device. Hm. Ok. > > > One other way you could avoid torn MSI writes would be to ensure that > > if you migrate IRQs across cores, you keep the same x86 vector number. > > That way the address portion would be updated, and data doesn't > > change, so there's no window. But that may not actually be feasible. > > That's not possible simply because the x86 vector space is limited. If > we would have to guarantee that then we'd end up with a max of ~220 > interrupts per system. Sufficient for your notebook, but the big iron > people would be not amused. Right, that occurred to me as well. The actual requirement isn't quite as restrictive. What you really need is the old vector to be registered on both the old CPU and the new CPU. Then once the interrupt is confirmed to have moved we could release both the old vector both CPUs, leaving only the new vector on the new CPU. In that world some SMP affinity transitions might fail, which is a bummer. To avoid that, you could first migrate to a vector that's available on both the source and destination CPUs, keeping affinity the same. Then change affinity in a separate step. Or alternatively, you could permanently designate a "transit" vector. If an interrupt fires on this vector, then we call all ISRs currently in transit between CPUs. You might end up calling ISRs that didn't actually need service, but at least that's better than missing edges. > > The real critical path here is the CPU hotplug path. > > For regular migration between two online CPUs we use the 'migrate when > the irq is actually serviced ' mechanism. That might have the same issue > on misdesigned devices which are firing the next interrupt before the > one on the flight is serviced, but I haven't seen any reports with that > symptom yet. > > But before I dig deeper into this, please provide the output of > > 'lscpci -vvv' and 'cat /proc/interrupts' > Here it is: https://pastebin.com/YyxBUvQ2 This is a Comet Lake system. It has 8 HT cores, but 4 of those cores have already been offlined. At the bottom of the paste I also included the script I used that causes a repro in a minute or two. I simply run this, then put some stress on USB. For me that stress was "join a Hangouts meeting", since that stressed both my USB webcam and USB ethernet. The script exits when xhci dies. -Evan