From: Evan Green
Date: Thu, 23 Jan 2020 14:59:25 -0800
Subject: Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
To: Thomas Gleixner
Cc: Rajat Jain, Bjorn Helgaas, linux-pci, Linux Kernel Mailing List
References: <20200117162444.v2.1.I9c7e72144ef639cc135ea33ef332852a6b33730f@changeid> <87y2tytv5i.fsf@nanos.tec.linutronix.de> <87eevqkpgn.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 23, 2020 at 12:59 PM Evan Green wrote:
>
> On Thu, Jan 23, 2020 at 10:17 AM Thomas Gleixner wrote:
> >
> > Evan,
> >
> > Thomas Gleixner writes:
> > > This is not yet debugged fully and as this is happening on MSI-X I'm not
> > > really convinced yet that your 'torn write' theory holds.
> >
> > can you please apply the debug patch below and run your test. When the
> > failure happens, stop the tracer and collect the trace.
> >
> > Another question.
> > Did you ever try to change the affinity of that
> > interrupt without hotplug rapidly while the device makes traffic? If
> > not, it would be interesting whether this leads to a failure as well.
>
> Thanks for the patch. Looks pretty familiar :)
> I ran into issues where trace_printks on offlined cores seem to
> disappear. I even made sure the cores were back online when I
> collected the trace. So your logs might not be useful. Known issue
> with the tracer?
>
> I figured I'd share my own debug chicken scratch, in case you could
> glean anything from it. The LOG entries print out timestamps (divide
> by 1000000) that you can match up back to earlier in the log (ie so
> the last XHCI MSI change occurred at 74.032501, the last interrupt
> came in at 74.032405). Forgive the mess.
>
> I also tried changing the affinity rapidly without CPU hotplug, but
> didn't see the issue, at least not in the few minutes I waited
> (normally repros easily within 1 minute). An interesting datapoint.

One additional datapoint. The intel guys suggested enabling
CONFIG_IRQ_REMAP, which does seem to eliminate the issue for me. I'm
still hoping there's a smaller fix so I don't have to add all that in.

-Evan
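
P.S. For anyone matching the LOG timestamps against the rest of the log, here is a minimal sketch of the conversion. It assumes the raw LOG values are microsecond counts, which is what dividing by 1000000 to get seconds implies. The function and variable names are made up for illustration, and the two raw values are simply the quoted times multiplied back out.

```python
US_PER_SEC = 1_000_000  # assumption: LOG entries are raw microsecond counts


def log_ts_to_seconds(raw_us: int) -> str:
    """Format a raw LOG timestamp as seconds with six fractional digits."""
    secs, frac = divmod(raw_us, US_PER_SEC)
    return f"{secs}.{frac:06d}"


# Hypothetical raw values corresponding to the two times quoted in the mail.
last_msi_change_us = 74_032_501
last_interrupt_us = 74_032_405

print("last XHCI MSI change:", log_ts_to_seconds(last_msi_change_us))
print("last interrupt:      ", log_ts_to_seconds(last_interrupt_us))

# Integer subtraction avoids floating-point rounding on the small gap.
gap_us = last_msi_change_us - last_interrupt_us
print("gap between them (us):", gap_us)
```

Working in integer microseconds rather than floats keeps the gap exact: with the two times above, the last interrupt arrived 96 microseconds before the last MSI change.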