From: James Turner
To: Alex Williamson
Cc: Alex Deucher, Thorsten Leemhuis, Paul Menzel, Xinhui Pan, regressions@lists.linux.dev, kvm@vger.kernel.org, Greg KH, Lijo Lazar, LKML, amd-gfx list, Alexander Deucher, Christian König
Subject: Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM
Date: Sun, 20 Mar 2022 21:26:51 -0400

>>> Right, interference from host drivers and pre-boot environments is
>>> always a concern with GPU assignment in particular. AMD GPUs have a
>>> long history of poor behavior relative to things like PCI secondary
>>> bus resets which we use to try to get devices to clean, reusable
>>> states for assignment. Here a device is being bound to a host driver
>>> that initiates some sort of power control, unbound from that driver
>>> and exposed to new drivers far beyond the scope of the kernel's
>>> regression policy.
>>> Perhaps it's possible to undo such power control
>>> when unbinding the device, but it's not necessarily a given that
>>> such a thing is possible for this device without a cold reset.
>>>
>>> IMO, it's not fair to restrict the kernel from such advancements. If
>>> the use case is within a VM, don't bind host drivers. It's difficult
>>> to make promises when dynamically switching between host and
>>> userspace drivers for devices that don't have functional reset
>>> mechanisms.

To clarify, the GPU is never bound to the `amdgpu` driver on the host.
I'm binding it to `vfio-pci` on the host at boot, specifically to avoid
issues with dynamic rebinding. To do this, I'm passing
`vfio-pci.ids=1002:6981,1002:aae0` on the kernel command line, and I've
confirmed that this option is working:

    % lspci -nnk -d 1002:6981
    01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3200] [1002:6981]
            Subsystem: Dell Device [1028:0926]
            Kernel driver in use: vfio-pci
            Kernel modules: amdgpu

    % lspci -nnk -d 1002:aae0
    01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]
            Subsystem: Dell Device [1028:0926]
            Kernel driver in use: vfio-pci
            Kernel modules: snd_hda_intel

Starting with f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures
global (v2)") this is insufficient for the GPU to work properly in the
VM, since the `amdgpu` module is calling global ACPI methods which
affect the GPU even though it's not bound to the `amdgpu` driver.

>> Additionally, operating the isolated device in a VM on a constrained
>> environment like a laptop may have other adverse side effects. The
>> driver in the guest would ideally know that this is a laptop and needs
>> to properly interact with APCI to handle power management on the
>> device. If that is not the case, the driver in the guest may end up
>> running the device out of spec with what the platform supports.
>> It's also likely to break suspend and resume, especially on systems
>> which use S0ix since the firmware will generally only turn off
>> certain power rails if all of the devices on the rails have been put
>> into the proper state. That state may vary depending on the platform
>> requirements.

FWIW, the guest Windows AMD driver can tell that it's a mobile GPU, and
as a result, the driver GUI locks various performance parameters to the
defaults.

The cooling system and power supply seem to work without issues. As the
load on the GPU increases, the fan speed increases. The GPU stays below
the critical temperature with plenty of margin, even at 100% load. The
voltage reported by the GPU adjusts with the load, and I haven't
experienced any glitches which would suggest that the GPU is not
getting enough power. I haven't tried suspend/resume.

What are the differences between a laptop and a desktop, aside from the
size of the cooling system? Could the issue reported here affect
desktop systems, too?

As for what to do about this issue: Personally, I don't mind
blacklisting `amdgpu` on my machine. My primary concerns are:

1. Other users may experience this issue and have trouble figuring out
   what's happening, or they may not even realize that they're
   experiencing significantly-lower-than-expected performance.

2. It's possible that this issue affects some machines which use an AMD
   GPU for the host and a second AMD GPU for the guest. For those
   machines, blacklisting `amdgpu` would not be an option, since that
   would disable the AMD GPU for the host.

I've tried to help with concern 1 by mentioning this issue on the Arch
Linux Wiki [1]. Another thing that would help is to print a warning
message to the kernel ring buffer when an AMD GPU is bound to
`vfio-pci` and the `amdgpu` module is loaded. (It would say something
like, "Although the device is bound to `vfio-pci`, loading the `amdgpu`
module may still affect it via ACPI.
Consider blacklisting `amdgpu` if the GPU does not behave as
expected.") I'm not sure if there's any way to address concern 2, aside
from fixing the firmware / Windows AMD driver.

I thought of one more thing I could test -- I could try a Linux guest
instead of a Windows guest to determine if the issue is due to the
firmware or the guest Windows AMD driver. Would that be helpful?

[1]: https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Too-low_frequency_limit_for_AMD_GPU_passed-through_to_virtual_machine

James
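
For illustration only, the condition described above (an AMD GPU bound to `vfio-pci` while the `amdgpu` module is loaded) can also be checked from userspace. The following is a minimal sketch, not the in-kernel warning proposed in this message. It assumes the usual sysfs layout (`/sys/bus/pci/devices/<address>/{vendor,class,driver}` and `/sys/module/amdgpu`). The function names and the `sysfs_root` parameter are hypothetical and exist only so the logic can be exercised against a fake directory tree.

```python
#!/usr/bin/env python3
"""Userspace sketch: flag AMD GPUs bound to vfio-pci while amdgpu is loaded."""
import os
import sys

AMD_VENDOR_ID = "0x1002"       # PCI vendor ID for AMD/ATI, as in 1002:6981
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 03h (display controller)


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or "" if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def amd_gpus_on_vfio(sysfs_root="/sys"):
    """List PCI addresses of AMD display devices whose driver is vfio-pci."""
    devices_dir = os.path.join(sysfs_root, "bus", "pci", "devices")
    try:
        names = sorted(os.listdir(devices_dir))
    except OSError:
        return []
    found = []
    for name in names:
        dev = os.path.join(devices_dir, name)
        if read_attr(os.path.join(dev, "vendor")).lower() != AMD_VENDOR_ID:
            continue
        dev_class = read_attr(os.path.join(dev, "class")).lower()
        if not dev_class.startswith(DISPLAY_CLASS_PREFIX):
            continue  # e.g. the HDMI audio function (class 0403) is skipped
        driver_link = os.path.join(dev, "driver")
        if not os.path.islink(driver_link):
            continue  # no driver bound
        if os.path.basename(os.readlink(driver_link)) == "vfio-pci":
            found.append(name)
    return found


def amdgpu_loaded(sysfs_root="/sys"):
    """Assumption: /sys/module/amdgpu exists whenever the module is loaded."""
    return os.path.isdir(os.path.join(sysfs_root, "module", "amdgpu"))


def check(sysfs_root="/sys"):
    """Return one warning string per affected device (empty list if none)."""
    if not amdgpu_loaded(sysfs_root):
        return []
    return [
        f"{addr}: bound to vfio-pci, but the amdgpu module is loaded and may "
        "still affect it via ACPI; consider blacklisting amdgpu if the GPU "
        "does not behave as expected"
        for addr in amd_gpus_on_vfio(sysfs_root)
    ]


if __name__ == "__main__":
    for line in check():
        print("warning:", line, file=sys.stderr)
```

Run without arguments on the host, the script prints one warning per matching device to stderr and nothing otherwise. It only detects the configuration; it cannot tell whether the ACPI calls actually changed the GPU's behavior.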