References: <20200120023326.GA149019@google.com> <8409fd7ad6b83da75c914a71accf522953a460a0.camel@pengutronix.de> <20200204043825.thpbqpz3ao7zqvlh@wunner.de>
In-Reply-To: <20200204043825.thpbqpz3ao7zqvlh@wunner.de>
From: Alex Deucher
Date: Tue, 4 Feb 2020 09:47:50 -0500
Subject: Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
To: Lukas Wunner
Cc: Dave Airlie, Lucas Stach, Ben Skeggs, Karol Herbst, "Alex G.", Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter, Jan Vesely, Alex Williamson, Austin Bolen, Shyam Iyer, Sinan Kaya, Linux PCI, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 3, 2020 at 11:38 PM Lukas Wunner wrote:
>
> On Mon, Feb 03, 2020 at 04:16:36PM -0500, Alex Deucher wrote:
> > AMD has had a micro-controller on the GPU handling pcie link speeds
> > and widths dynamically (in addition to GPU clocks and voltages) for
> > about 12 years or so at this point to save power when the GPU is idle
> > and improve performance when it's required. The micro-controller
> > changes the link parameters dynamically based on load independent of
> > the driver. The driver can tweak the heuristics, or even disable the
> > dynamic changes, but by default it's enabled when the driver loads.
> > The ucode for this micro-controller is loaded by the driver so you'll
> > see fixed clocks and widths prior to the driver loading. We'd need
> > some sort of opt out I suppose for periods when the driver has enabled
> > dynamic pcie power management in the micro-controller.
>
> Note that there are *two* bits in the Link Status Register:
>
> * Link Autonomous Bandwidth Status
>   "This bit is Set by hardware to indicate that hardware has
>   autonomously changed Link speed or width, without the Port
>   transitioning through DL_Down status, for reasons other than to
>   attempt to correct unreliable Link operation. This bit must be set if
>   the Physical Layer reports a speed or width change was initiated by
>   the Downstream component that was indicated as an autonomous change."
>
> * Link Bandwidth Management Status
>   "This bit is Set by hardware to indicate that either of the
>   following has occurred without the Port transitioning through
>   DL_Down status. [...] Hardware has changed Link speed or width to
>   attempt to correct unreliable Link operation, either through an
>   LTSSM timeout or a higher level process."
>
> See PCIe Base Spec 4.0 sec 7.8.8, 7.8.7, 4.2.6.3.3.1.
>
> The two bits generate *separate* interrupts. We only enable the
> interrupt for the latter.
>
> If AMD GPUs generate a Link Bandwidth Management Interrupt upon
> autonomously changing bandwidth for power management reasons
> (instead of to correct unreliability issues), that would be a
> spec violation.
>
> So the question is, do your GPUs violate the spec in this regard?

I don't know offhand. I can ask the firmware team. That said, I
haven't seen any reports of problems with our GPUs with this code
enabled.

Alex