Subject: Re: [PROBLEM] Frequently get "irq 31: nobody cared" when passing through 2x GPUs that share same pci switch via vfio
From: Matthew Ruffell
To: Alex Williamson
Cc: linux-pci@vger.kernel.org, lkml, kvm@vger.kernel.org, nathan.langford@xcelesunifiedtechnologies.com
Date: Tue, 5 Oct 2021 18:02:24 +1300
Message-ID: <2fadf33d-8487-94c2-4460-2a20fdb2ea12@canonical.com>
In-Reply-To: <20210915103235.097202d2.alex.williamson@redhat.com>
References: <20210914104301.48270518.alex.williamson@redhat.com> <9e8d0e9e-1d94-35e8-be1f-cf66916c24b2@canonical.com> <20210915103235.097202d2.alex.williamson@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Alex,

Have you had an opportunity to have a look at this a bit deeper?

On 16/09/21 4:32 am, Alex Williamson wrote:
>
> Adding debugging to the vfio-pci interrupt handler, it's correctly
> deferring the interrupt as the GPU device is not identifying itself as
> the source of the interrupt via the status register. In fact, setting
> the disable INTx bit in the GPU command register while the interrupt
> storm occurs does not stop the interrupts.
>
> The interrupt storm does seem to be related to the bus resets, but I
> can't figure out yet how multiple devices per switch factors into the
> issue. Serializing all bus resets via a mutex doesn't seem to change
> the behavior.
>
> I'm still investigating, but if anyone knows how to get access to the
> Broadcom datasheet or errata for this switch, please let me know.

We have managed to obtain a recent errata for this switch, and it doesn't
mention any interrupt storms with nested switches. What would I be looking
for in the errata? I cannot share our copy, sorry.

Is there anything that we can do to help?

Thanks,
Matthew
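
For anyone following along, the status and command bits discussed in the quoted text can also be checked from userspace while the storm is happening. The following is a minimal sketch, assuming the standard PCI config-space layout (command register at 0x04, status register at 0x06) and the sysfs `config` file. The function names and the example device address are hypothetical, and this is not the vfio-pci handler code itself.

```python
import struct

# Offsets and bit masks from the standard PCI configuration space layout
# (the same values the Linux pci_regs.h header uses for these names).
PCI_COMMAND = 0x04
PCI_STATUS = 0x06
PCI_COMMAND_INTX_DISABLE = 0x0400  # command register bit 10
PCI_STATUS_INTERRUPT = 0x0008      # status register bit 3


def decode_intx_state(config: bytes) -> dict:
    """Decode the INTx-related bits from the start of a config-space dump."""
    if len(config) < 8:
        raise ValueError("need at least the first 8 bytes of config space")
    # Command and status are adjacent little-endian 16-bit registers.
    command, status = struct.unpack_from("<HH", config, PCI_COMMAND)
    return {
        "intx_disabled": bool(command & PCI_COMMAND_INTX_DISABLE),
        "intx_pending": bool(status & PCI_STATUS_INTERRUPT),
    }


def read_config(bdf: str) -> bytes:
    """Read the config header via sysfs.

    bdf is a device address such as '0000:01:00.0' (a hypothetical example,
    not one taken from this thread).
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)
```

Calling `decode_intx_state(read_config(bdf))` for each GPU and for the switch ports during a storm would show whether any function reports a pending INTx, which is the same question the handler asks of the status register.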