From: Evan Green
Date: Wed, 22 Jan 2020 10:00:59 -0800
Subject: Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
To: Rajat Jain
Cc: Bjorn Helgaas, linux-pci, Linux Kernel Mailing List
References: <20200117162444.v2.1.I9c7e72144ef639cc135ea33ef332852a6b33730f@changeid>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 22, 2020 at 3:26 AM Rajat Jain wrote:
>
> On Fri, Jan 17, 2020 at 4:26 PM Evan Green wrote:
> >
> > __pci_write_msi_msg() updates three registers in the device: address
> > high, address low, and data. On x86 systems, address low contains
> > CPU targeting info, and data contains the vector. The order of writes
> > is address, then data.
> >
> > This is problematic if an interrupt comes in after address has
> > been written, but before data is updated, and both the SMP affinity
> > and target vector are being changed. In this case, the interrupt targets
> > the wrong vector on the new CPU.
> >
> > This case is pretty easy to stumble into using xhci and CPU hotplugging.
> > Create a script that repeatedly targets interrupts at a set of cores and
> > then offlines those cores. Put some stress on USB, and then watch xhci
> > lose an interrupt and die.
>
> Do I understand it right, that even with this patch, the driver might
> still miss the same interrupt (because we are disabling the interrupt
> for that time) - the improvement this patch brings is that it will at
> least not be delivered to the wrong CPU or via a wrong vector?

In my experiments, the driver no longer misses the interrupt. XHCI is
particularly sensitive to this: if it misses one interrupt, it seems to
completely wedge the driver.

I think in my case the device pends the interrupts until MSIs are
re-enabled, because I don't see anything other than MSI for xhci in
/proc/interrupts. But I'm not sure whether other devices may fall back
to line-based interrupts for a moment, and whether that's a problem.

That said, I see we already call pci_msi_set_enable(0) whenever we set
up MSIs, presumably for this same reason of avoiding torn MSIs. So my
fix is really just doing the same thing for an additional case. And if
getting stuck in a never-to-be-handled line-based interrupt were a
problem, you'd think it would also be a problem in
pci_restore_msi_state(), where the same thing is done.

Maybe my fix is at the wrong level, and should be up in
pci_msi_domain_write_msg() instead? Though I see a lot of callers to
pci_write_msi_msg() that I worry have the same problem.

-Evan
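For illustration only, here is a minimal user-space sketch of the torn update discussed in this thread and of the "disable MSI around the write" approach. It is not kernel code: fake_dev, fake_msi_msg and every function below are hypothetical names made up for this model. It also assumes the device pends an interrupt that is raised while MSI is disabled and delivers it on re-enable, which matches what the reply above reports for xhci but may not hold for other devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two registers that matter on x86. */
struct fake_msi_msg {
	uint32_t address_lo;	/* carries the CPU targeting info */
	uint32_t data;		/* carries the vector */
};

/* Hypothetical device state; not a real kernel structure. */
struct fake_dev {
	uint32_t address_lo;
	uint32_t data;
	bool msi_enabled;
	bool irq_pending;	/* assumption: device pends while disabled */
	bool delivered;
	uint32_t seen_cpu;	/* address_lo sampled at delivery time */
	uint32_t seen_vector;	/* data sampled at delivery time */
};

/* Deliver an MSI using whatever the registers hold right now. */
static void fake_deliver(struct fake_dev *dev)
{
	assert(dev->msi_enabled);
	dev->delivered = true;
	dev->seen_cpu = dev->address_lo;
	dev->seen_vector = dev->data;
}

/* The device wants to interrupt: deliver now, or pend if MSI is off. */
static void fake_raise_irq(struct fake_dev *dev)
{
	if (dev->msi_enabled)
		fake_deliver(dev);
	else
		dev->irq_pending = true;
}

/* Toggle MSI enable; a pended interrupt fires on re-enable. */
static void fake_set_enable(struct fake_dev *dev, bool enable)
{
	dev->msi_enabled = enable;
	if (enable && dev->irq_pending) {
		dev->irq_pending = false;
		fake_deliver(dev);
	}
}

/* Address, then data, with MSI left enabled: an interrupt can tear. */
void write_msg_unguarded(struct fake_dev *dev,
			 const struct fake_msi_msg *msg, bool irq_mid_update)
{
	dev->address_lo = msg->address_lo;
	if (irq_mid_update)
		fake_raise_irq(dev);	/* sees new CPU, old vector */
	dev->data = msg->data;
}

/* Same writes, but MSI is disabled for the duration of the update. */
void write_msg_guarded(struct fake_dev *dev,
		       const struct fake_msi_msg *msg, bool irq_mid_update)
{
	bool was_enabled = dev->msi_enabled;

	fake_set_enable(dev, false);
	dev->address_lo = msg->address_lo;
	if (irq_mid_update)
		fake_raise_irq(dev);	/* pended, not delivered */
	dev->data = msg->data;
	fake_set_enable(dev, was_enabled);
}
```

In this model, moving a device from CPU 1 / vector 0x30 to CPU 2 / vector 0x41 with an interrupt arriving mid-update delivers to CPU 2 with the stale vector 0x30 in the unguarded case, and to CPU 2 with vector 0x41 in the guarded case. Whether a real device pends the interrupt while MSI is disabled is exactly the open question raised above, so the guarded result depends on that assumption.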