References: <87ee57c8fu.fsf@turner.link> <87a6ftk9qy.fsf@dmarc-none.turner.link> <87zgnp96a4.fsf@turner.link>
From: Alex Deucher
Date: Fri, 21 Jan 2022 11:45:43 -0500
Subject: Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM
To: Thorsten Leemhuis
Cc: James Turner, Alex Deucher, Lijo Lazar, regressions@lists.linux.dev, kvm@vger.kernel.org, Greg KH, "Pan, Xinhui", LKML, "amd-gfx@lists.freedesktop.org", Alex Williamson, Christian König
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 21, 2022 at 3:35 AM Thorsten Leemhuis wrote:
>
> Hi, this is your Linux kernel regression tracker speaking.
>
> On 21.01.22 03:13, James Turner wrote:
> >
> > I finished the bisection (log below). The issue was introduced in
> > f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)").
>
> FWIW, that was:
>
> > drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)
> > They are global ACPI methods, so maybe the structures
> > global in the driver. This simplified a number of things
> > in the handling of these methods.
> >
> > v2: reset the handle if verify interface fails (Lijo)
> > v3: fix compilation when ACPI is not defined.
> >
> > Reviewed-by: Lijo Lazar
> > Signed-off-by: Alex Deucher
>
> In that case we need to get those two and the maintainers for the driver
> involved by addressing them with this mail. And to make it easy for them
> here is a link and a quote from the original report:
>
> https://lore.kernel.org/all/87ee57c8fu.fsf@turner.link/

Are you ever loading the amdgpu driver in your tests? If not, I don't
see how this patch would affect anything as the driver code would
never have executed. It would appear not based on your example.

Alex

>
> ```
> > Hi,
> >
> > With newer kernels, starting with the v5.14 series, when using a MS
> > Windows 10 guest VM with PCI passthrough of an AMD Radeon Pro WX 3200
> > discrete GPU, the passed-through GPU will not run above 501 MHz, even
> > when it is under 100% load and well below the temperature limit. As a
> > result, GPU-intensive software (such as video games) runs unusably
> > slowly in the VM.
> >
> > In contrast, with older kernels, the passed-through GPU runs at up to
> > 1295 MHz (the correct hardware limit), so GPU-intensive software runs at
> > a reasonable speed in the VM.
> >
> > I've confirmed that the issue exists with the following kernel versions:
> >
> > - v5.16
> > - v5.14
> > - v5.14-rc1
> >
> > The issue does not exist with the following kernels:
> >
> > - v5.13
> > - various packaged (non-vanilla) 5.10.* Arch Linux `linux-lts` kernels
> >
> > So, the issue was introduced between v5.13 and v5.14-rc1. I'm willing to
> > bisect the commit history to narrow it down further, if that would be
> > helpful.
> >
> > The configuration details and test results are provided below. In
> > summary, for the kernels with this issue, the GPU core stays at a
> > constant 0.8 V, the GPU core clock ranges from 214 MHz to 501 MHz, and
> > the GPU memory stays at a constant 625 MHz, in the VM. For the correctly
> > working kernels, the GPU core ranges from 0.85 V to 1.0 V, the GPU core
> > clock ranges from 214 MHz to 1295 MHz, and the GPU memory stays at 1500
> > MHz, in the VM.
> >
> > Please let me know if additional information would be helpful.
> >
> > Regards,
> > James Turner
> >
> > # Configuration Details
> >
> > Hardware:
> >
> > - Dell Precision 7540 laptop
> > - CPU: Intel Core i7-9750H (x86-64)
> > - Discrete GPU: AMD Radeon Pro WX 3200
> > - The internal display is connected to the integrated GPU, and external
> >   displays are connected to the discrete GPU.
> >
> > Software:
> >
> > - KVM host: Arch Linux
> >   - self-built vanilla kernel (built using Arch Linux `PKGBUILD`
> >     modified to use vanilla kernel sources from git.kernel.org)
> >   - libvirt 1:7.10.0-2
> >   - qemu 6.2.0-2
> >
> > - KVM guest: Windows 10
> >   - GPU driver: Radeon Pro Software Version 21.Q3 (Note that I also
> >     experienced this issue with the 20.Q4 driver, using packaged
> >     (non-vanilla) Arch Linux kernels on the host, before updating to the
> >     21.Q3 driver.)
> >
> > Kernel config:
> >
> > - For v5.13, v5.14-rc1, and v5.14, I used
> >   https://github.com/archlinux/svntogit-packages/blob/89c24952adbfa645d9e1a6f12c572929f7e4e3c7/trunk/config
> >   (The build script ran `make olddefconfig` on that config file.)
> >
> > - For v5.16, I used
> >   https://github.com/archlinux/svntogit-packages/blob/94f84e1ad8a530e54aa34cadbaa76e8dcc439d10/trunk/config
> >   (The build script ran `make olddefconfig` on that config file.)
> >
> > I set up the VM with PCI passthrough according to the instructions at
> > https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF
> >
> > I'm passing through the following PCI devices to the VM, as listed by
> > `lspci -D -nn`:
> >
> >     0000:01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3200] [1002:6981]
> >     0000:01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]
> >
> > The host kernel command line includes the following relevant options:
> >
> >     intel_iommu=on vfio-pci.ids=1002:6981,1002:aae0
> >
> > to enable IOMMU and bind the `vfio-pci` driver to the PCI devices.
> >
> > My `/etc/mkinitcpio.conf` includes the following line:
> >
> >     MODULES=(vfio_pci vfio vfio_iommu_type1 vfio_virqfd i915 amdgpu)
> >
> > to load `vfio-pci` before the graphics drivers. (Note that removing
> > `i915 amdgpu` has no effect on this issue.)
> >
> > I'm using libvirt to manage the VM. The relevant portions of the XML
> > file are:
> >
> > [libvirt XML not preserved in this copy]
> >
> > # Test Results
> >
> > For testing, I used the following procedure:
> >
> > 1. Boot the host machine and log in.
> >
> > 2. Run the following commands to gather information. For all the tests,
> >    the output was identical.
> >
> >    - `cat /proc/sys/kernel/tainted` printed:
> >
> >          0
> >
> >    - `hostnamectl | grep "Operating System"` printed:
> >
> >          Operating System: Arch Linux
> >
> >    - `lspci -nnk -d 1002:6981` printed
> >
> >          01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3200] [1002:6981]
> >                  Subsystem: Dell Device [1028:0926]
> >                  Kernel driver in use: vfio-pci
> >                  Kernel modules: amdgpu
> >
> >    - `lspci -nnk -d 1002:aae0` printed
> >
> >          01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]
> >                  Subsystem: Dell Device [1028:0926]
> >                  Kernel driver in use: vfio-pci
> >                  Kernel modules: snd_hda_intel
> >
> >    - `sudo dmesg | grep -i vfio` printed the kernel command line and the
> >      following messages:
> >
> >          VFIO - User Level meta-driver version: 0.3
> >          vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
> >          vfio_pci: add [1002:6981[ffffffff:ffffffff]] class 0x000000/00000000
> >          vfio_pci: add [1002:aae0[ffffffff:ffffffff]] class 0x000000/00000000
> >          vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
> >
> > 3. Start the Windows VM using libvirt and log in. Record sensor
> >    information.
> >
> > 4. Run a graphically-intensive video game to put the GPU under load.
> >    Record sensor information.
> >
> > 5. Stop the game. Record sensor information.
> >
> > 6. Shut down the VM. Save the output of `sudo dmesg`.
> >
> > I compared the `sudo dmesg` output for v5.13 and v5.14-rc1 and didn't
> > see any relevant differences.
> >
> > Note that the issue occurs only within the guest VM. When I'm not using
> > a VM (after removing `vfio-pci.ids=1002:6981,1002:aae0` from the kernel
> > command line so that the PCI devices are bound to their normal `amdgpu`
> > and `snd_hda_intel` drivers instead of the `vfio-pci` driver), the GPU
> > operates correctly on the host.
> >
> > ## Linux v5.16 (issue present)
> >
> >     $ cat /proc/version
> >     Linux version 5.16.0-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Sun, 16 Jan 2022 01:51:08 +0000
> >
> > Before running the game:
> >
> > - GPU core: 214.0 MHz, 0.800 V, 0.0% load, 53.0 degC
> > - GPU memory: 625.0 MHz
> >
> > While running the game:
> >
> > - GPU core: 501.0 MHz, 0.800 V, 100.0% load, 54.0 degC
> > - GPU memory: 625.0 MHz
> >
> > After stopping the game:
> >
> > - GPU core: 214.0 MHz, 0.800 V, 0.0% load, 51.0 degC
> > - GPU memory: 625.0 MHz
> >
> > ## Linux v5.14 (issue present)
> >
> >     $ cat /proc/version
> >     Linux version 5.14.0-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Sun, 16 Jan 2022 03:19:35 +0000
> >
> > Before running the game:
> >
> > - GPU core: 214.0 MHz, 0.800 V, 0.0% load, 50.0 degC
> > - GPU memory: 625.0 MHz
> >
> > While running the game:
> >
> > - GPU core: 501.0 MHz, 0.800 V, 100.0% load, 54.0 degC
> > - GPU memory: 625.0 MHz
> >
> > After stopping the game:
> >
> > - GPU core: 214.0 MHz, 0.800 V, 0.0% load, 49.0 degC
> > - GPU memory: 625.0 MHz
> >
> > ## Linux v5.14-rc1 (issue present)
> >
> >     $ cat /proc/version
> >     Linux version 5.14.0-rc1-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Sun, 16 Jan 2022 18:31:35 +0000
> >
> > Before running the game:
> >
> > - GPU core: 214.0 MHz, 0.800 V, 0.0% load, 50.0 degC
> > - GPU memory: 625.0 MHz
> >
> > While running the game:
> >
> > - GPU core: 501.0 MHz, 0.800 V, 100.0% load, 54.0 degC
> > - GPU memory: 625.0 MHz
> >
> > After stopping the game:
> >
> > - GPU core: 214.0 MHz, 0.800 V, 0.0% load, 49.0 degC
> > - GPU memory: 625.0 MHz
> >
> > ## Linux v5.13 (works correctly, issue not present)
> >
> >     $ cat /proc/version
> >     Linux version 5.13.0-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Sun, 16 Jan 2022 02:39:18 +0000
> >
> > Before running the game:
> >
> > - GPU core: 214.0 MHz, 0.850 V, 0.0% load, 55.0 degC
> > - GPU memory: 1500.0 MHz
> >
> > While running the game:
> >
> > - GPU core: 1295.0 MHz, 1.000 V, 100.0% load, 67.0 degC
> > - GPU memory: 1500.0 MHz
> >
> > After stopping the game:
> >
> > - GPU core: 214.0 MHz, 0.850 V, 0.0% load, 52.0 degC
> > - GPU memory: 1500.0 MHz
>
> ```
>
> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)
>
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately
> therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply, that's in everyone's interest.
>
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt to this regression.
>
> #regzbot introduced f9b7f3703ff9
> #regzbot title drm: amdgpu: Too-low frequency limit for AMD GPU
> PCI-passed-through to Windows VM
> >
> > Would any additional information be helpful?
> >
> > git bisect start
> > # bad: [e73f0f0ee7541171d89f2e2491130c7771ba58d3] Linux 5.14-rc1
> > git bisect bad e73f0f0ee7541171d89f2e2491130c7771ba58d3
> > # good: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
> > git bisect good 62fb9874f5da54fdb243003b386128037319b219
> > # bad: [e058a84bfddc42ba356a2316f2cf1141974625c9] Merge tag 'drm-next-2021-07-01' of git://anongit.freedesktop.org/drm/drm
> > git bisect bad e058a84bfddc42ba356a2316f2cf1141974625c9
> > # good: [a6eaf3850cb171c328a8b0db6d3c79286a1eba9d] Merge tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good a6eaf3850cb171c328a8b0db6d3c79286a1eba9d
> > # good: [007b312c6f294770de01fbc0643610145012d244] Merge tag 'mac80211-next-for-net-next-2021-06-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
> > git bisect good 007b312c6f294770de01fbc0643610145012d244
> > # bad: [18703923a66aecf6f7ded0e16d22eb412ddae72f] drm/amdgpu: Fix incorrect register offsets for Sienna Cichlid
> > git bisect bad 18703923a66aecf6f7ded0e16d22eb412ddae72f
> > # good: [c99c4d0ca57c978dcc2a2f41ab8449684ea154cc] Merge tag 'amd-drm-next-5.14-2021-05-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
> > git bisect good c99c4d0ca57c978dcc2a2f41ab8449684ea154cc
> > # good: [43ed3c6c786d996a264fcde68dbb36df6f03b965] Merge tag 'drm-misc-next-2021-06-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
> > git bisect good 43ed3c6c786d996a264fcde68dbb36df6f03b965
> > # bad: [050cd3d616d96c3a04f4877842a391c0a4fdcc7a] drm/amd/display: Add support for SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616.
> > git bisect bad 050cd3d616d96c3a04f4877842a391c0a4fdcc7a
> > # good: [f43ae2d1806c2b8a0934cb4acddd3cf3750d10f8] drm/amdgpu: Fix inconsistent indenting
> > git bisect good f43ae2d1806c2b8a0934cb4acddd3cf3750d10f8
> > # good: [6566cae7aef30da8833f1fa0eb854baf33b96676] drm/amd/display: fix odm scaling
> > git bisect good 6566cae7aef30da8833f1fa0eb854baf33b96676
> > # good: [5ac1dd89df549648b67f4d5e3a01b2d653914c55] drm/amd/display/dc/dce/dmub_outbox: Convert over to kernel-doc
> > git bisect good 5ac1dd89df549648b67f4d5e3a01b2d653914c55
> > # good: [a76eb7d30f700e5bdecc72d88d2226d137b11f74] drm/amd/display/dc/dce110/dce110_hw_sequencer: Include header containing our prototypes
> > git bisect good a76eb7d30f700e5bdecc72d88d2226d137b11f74
> > # good: [dd1d82c04e111b5a864638ede8965db2fe6d8653] drm/amdgpu/swsmu/aldebaran: fix check in is_dpm_running
> > git bisect good dd1d82c04e111b5a864638ede8965db2fe6d8653
> > # bad: [f9b7f3703ff97768a8dfabd42bdb107681f1da22] drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)
> > git bisect bad f9b7f3703ff97768a8dfabd42bdb107681f1da22
> > # good: [f1688bd69ec4b07eda1657ff953daebce7cfabf6] drm/amd/amdgpu:save psp ring wptr to avoid attack
> > git bisect good f1688bd69ec4b07eda1657ff953daebce7cfabf6
> > # first bad commit: [f9b7f3703ff97768a8dfabd42bdb107681f1da22] drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)
> >
> > James
> >
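
For anyone re-checking the bisection result quoted above, here is a minimal Python sketch that summarizes a `git bisect log` pasted from a mail. It is not part of git or of the original report: `summarize_bisect_log` is a hypothetical helper, and it assumes only the `# good:`, `# bad:` and `# first bad commit:` comment lines that appear in the log in this thread.

```python
import re

# Hypothetical helper for illustration only. It relies solely on the comment
# lines that `git bisect log` prints, as seen in the quoted log.
MARK = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (first_bad, good_count, bad_count) for a bisect log.

    first_bad is a (sha, subject) tuple, or None if the log has no
    "# first bad commit:" line yet.
    """
    first_bad = None
    counts = {"good": 0, "bad": 0}
    for raw in text.splitlines():
        # Drop mail quote markers such as "> > " before matching.
        line = re.sub(r"^(?:>\s?)+", "", raw).strip()
        match = MARK.match(line)
        if not match:
            continue  # "git bisect ..." command lines and other text
        kind, sha, subject = match.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            counts[kind] += 1
    return first_bad, counts["good"], counts["bad"]


# Two steps copied from the quoted log, including the mail quote markers.
sample = """\
> > # good: [f1688bd69ec4b07eda1657ff953daebce7cfabf6] drm/amd/amdgpu:save psp ring wptr to avoid attack
> > git bisect good f1688bd69ec4b07eda1657ff953daebce7cfabf6
> > # first bad commit: [f9b7f3703ff97768a8dfabd42bdb107681f1da22] drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)
"""

print(summarize_bisect_log(sample))
```

The quote markers are stripped first so the same function works on a log copied out of a reply as well as on raw `git bisect log` output.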