This reverts commit b516ea586d717472178e6ef1c152e85608b0ce32.
While this fixes audio for a number of users, this commit has the
sideaffect of breaking the BIOS workaround that's required to make the
GPU on the nvidia P50 work, by causing the GPU's PCI device function to
stop working after it's been set to multifunction mode.
Signed-off-by: Lyude Paul <[email protected]>
Fixes: b516ea586d71 ("PCI: Enable NVIDIA HDA controllers")
Cc: Lukas Wunner <[email protected]>
Cc: Daniel Drake <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Aaron Plattner <[email protected]>
Cc: Peter Wu <[email protected]>
Cc: Ilia Mirkin <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: Maik Freudenberg <[email protected]>
Cc: [email protected]
---
I'm not really holding my breath on this patch to being accepted:
there's a good chance there's a better solution for this (and I'm going
to continue investigating for one after sending this patch), this is
more just to start a conversation on what the proper way to fix this is.
So, I'm kind of confused about why exactly this was implemented as an
early boot quirk in the first place. If we're seeing the GPU's PCI
device, we already know the GPU is there. Shouldn't we be able to check
for the existence of the HDA device once we probe the GPU in nouveau?
This would make a lot more sense and be a lot less troublesome. I can
see that in the discussion on
https://bugs.freedesktop.org/show_bug.cgi?id=75985
That people mentioned that unloading nouveau then trying to reprobe for
the audio device didn't work, but that still doesn't explain why this
was implemented as an early quirk and not as something we just do before
nouveau is setup. Can we maybe move this somewhere a little more
sensible?
drivers/pci/quirks.c | 30 ------------------------------
include/linux/pci_ids.h | 1 -
2 files changed, 31 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 208aacf39329..c66c0ca446c4 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5011,36 +5011,6 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
PCI_CLASS_SERIAL_UNKNOWN, 8,
quirk_gpu_usb_typec_ucsi);
-/*
- * Enable the NVIDIA GPU integrated HDA controller if the BIOS left it
- * disabled. https://devtalk.nvidia.com/default/topic/1024022
- */
-static void quirk_nvidia_hda(struct pci_dev *gpu)
-{
- u8 hdr_type;
- u32 val;
-
- /* There was no integrated HDA controller before MCP89 */
- if (gpu->device < PCI_DEVICE_ID_NVIDIA_GEFORCE_320M)
- return;
-
- /* Bit 25 at offset 0x488 enables the HDA controller */
- pci_read_config_dword(gpu, 0x488, &val);
- if (val & BIT(25))
- return;
-
- pci_info(gpu, "Enabling HDA controller\n");
- pci_write_config_dword(gpu, 0x488, val | BIT(25));
-
- /* The GPU becomes a multi-function device when the HDA is enabled */
- pci_read_config_byte(gpu, PCI_HEADER_TYPE, &hdr_type);
- gpu->multifunction = !!(hdr_type & 0x80);
-}
-DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
- PCI_BASE_CLASS_DISPLAY, 16, quirk_nvidia_hda);
-DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
- PCI_BASE_CLASS_DISPLAY, 16, quirk_nvidia_hda);
-
/*
* Some IDT switches incorrectly flag an ACS Source Validation error on
* completions for config read requests even though PCIe r4.0, sec
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index c842735a4f45..f496fb619287 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -1336,7 +1336,6 @@
#define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP78S_SMBUS 0x0752
#define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP77_IDE 0x0759
#define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP73_SMBUS 0x07D8
-#define PCI_DEVICE_ID_NVIDIA_GEFORCE_320M 0x08A0
#define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP79_SMBUS 0x0AA2
#define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP89_SATA 0x0D85
--
2.21.0
On Wed, 2019-07-31 at 23:26 +0200, Karol Herbst wrote:
> On Wed, Jul 31, 2019 at 11:18 PM Lukas Wunner <[email protected]> wrote:
> > On Wed, Jul 31, 2019 at 04:19:27PM -0400, Lyude Paul wrote:
> > > While this fixes audio for a number of users, this commit has the
> > > sideaffect of breaking the BIOS workaround that's required to make the
> > > GPU on the nvidia P50 work, by causing the GPU's PCI device function to
> > > stop working after it's been set to multifunction mode.
> >
> > This is missing a reference to the commit introducing the P50 quirk,
> > which is e0547c81bfcf ("PCI: Reset Lenovo ThinkPad P50 nvgpu at boot
> > if necessary").
> >
> > Please describe in more detail how the GPU's PCI function stops working.
> > Does it respond with "all ones" when accessing MMIO?
> > Do MMIO accesses cause the system to hang?
> >
> > Could you provide lspci -vvxx output for the GPU and its associated
> > HDA controller with and without b516ea586d71?
> >
> > Does this machine have external display connectors via which audio
> > can be streamed?
> >
> >
> > > I'm not really holding my breath on this patch to being accepted:
> > > there's a good chance there's a better solution for this (and I'm going
> > > to continue investigating for one after sending this patch), this is
> > > more just to start a conversation on what the proper way to fix this is.
> >
> > Posting as an RFC might have been more appropriate then.
> >
>
> no, a revert is actually appropriate. If a commit fixes something,
> but breaks something else, it gets either reverted or fixed. If nobody
> fixes it, then revert it is.
To answer Lukas's question btw: most of the details on how things break are
back in the original commit (sorry for forgetting the reference!), there's a
_lot_ of explanation there that I'd rather not retype, so just refer back to
the commit and bug @ https://bugs.freedesktop.org/show_bug.cgi?id=75985
Additionally, there was some extra discussion providing some more detail in
the email thread that I had with Bjorn:
https://lkml.org/lkml/2019/2/12/1172
As for how this commit breaks the workaround: it seems that when we enable the
HDA controller and put the GPU into multifunction mode, the function-level
reset stops working and thus we can't reset the GPU anymore. Currently I can
see a couple of solutions (again, please feel free to suggest more!):
* Just revert the commit. We should do this if necessary, but of course I'd
much rather try finding a fix first
* Disable the HDA controller temporarily when a GPU reset is neded in
quirk_reset_lenovo_thinkpad_p50_nvgpu(), then call the function level
reset, then re-enable the HDA controller. I have no idea if this actually
works yet, but I'm about to try this on my system
* Get quirk_reset_lenovo_thinkpad_p50_nvgpu() to run before
quirk_nvidia_hda(). This would probably be fine, but we would need to
rework some stuff in the PCI subsystem (maybe it already has a way to do
this? haven't checked yet) so that we could perform an flr probe early
enough to perform the quirk
>
> > > So, I'm kind of confused about why exactly this was implemented as an
> > > early boot quirk in the first place. If we're seeing the GPU's PCI
> > > device, we already know the GPU is there. Shouldn't we be able to check
> > > for the existence of the HDA device once we probe the GPU in nouveau?
> >
> > I think a motivation to keep this generic was to make it work with
> > other drivers besides nouveau, specifically Nvidia's proprietary driver.
> > nouveau might not even be enabled.
> >
> >
> > > that still doesn't explain why this was implemented as an early quirk
> >
> > This isn't an early quirk. Those live in arch/x86/kernel/early-quirks.c.
> > This is just a PCI quirk executed on device enumeration and on resume.
> > Devices aren't necessarily enumerated only on boot, e.g. think
> > Thunderbolt.
> >
> > Thanks,
> >
> > Lukas
--
Cheers,
Lyude Paul
On Wed, Jul 31, 2019 at 04:19:27PM -0400, Lyude Paul wrote:
> While this fixes audio for a number of users, this commit has the
> sideaffect of breaking the BIOS workaround that's required to make the
> GPU on the nvidia P50 work, by causing the GPU's PCI device function to
> stop working after it's been set to multifunction mode.
This is missing a reference to the commit introducing the P50 quirk,
which is e0547c81bfcf ("PCI: Reset Lenovo ThinkPad P50 nvgpu at boot
if necessary").
Please describe in more detail how the GPU's PCI function stops working.
Does it respond with "all ones" when accessing MMIO?
Do MMIO accesses cause the system to hang?
Could you provide lspci -vvxx output for the GPU and its associated
HDA controller with and without b516ea586d71?
Does this machine have external display connectors via which audio
can be streamed?
> I'm not really holding my breath on this patch to being accepted:
> there's a good chance there's a better solution for this (and I'm going
> to continue investigating for one after sending this patch), this is
> more just to start a conversation on what the proper way to fix this is.
Posting as an RFC might have been more appropriate then.
> So, I'm kind of confused about why exactly this was implemented as an
> early boot quirk in the first place. If we're seeing the GPU's PCI
> device, we already know the GPU is there. Shouldn't we be able to check
> for the existence of the HDA device once we probe the GPU in nouveau?
I think a motivation to keep this generic was to make it work with
other drivers besides nouveau, specifically Nvidia's proprietary driver.
nouveau might not even be enabled.
> that still doesn't explain why this was implemented as an early quirk
This isn't an early quirk. Those live in arch/x86/kernel/early-quirks.c.
This is just a PCI quirk executed on device enumeration and on resume.
Devices aren't necessarily enumerated only on boot, e.g. think Thunderbolt.
Thanks,
Lukas
On Wed, Jul 31, 2019 at 11:18 PM Lukas Wunner <[email protected]> wrote:
>
> On Wed, Jul 31, 2019 at 04:19:27PM -0400, Lyude Paul wrote:
> > While this fixes audio for a number of users, this commit has the
> > sideaffect of breaking the BIOS workaround that's required to make the
> > GPU on the nvidia P50 work, by causing the GPU's PCI device function to
> > stop working after it's been set to multifunction mode.
>
> This is missing a reference to the commit introducing the P50 quirk,
> which is e0547c81bfcf ("PCI: Reset Lenovo ThinkPad P50 nvgpu at boot
> if necessary").
>
> Please describe in more detail how the GPU's PCI function stops working.
> Does it respond with "all ones" when accessing MMIO?
> Do MMIO accesses cause the system to hang?
>
> Could you provide lspci -vvxx output for the GPU and its associated
> HDA controller with and without b516ea586d71?
>
> Does this machine have external display connectors via which audio
> can be streamed?
>
>
> > I'm not really holding my breath on this patch to being accepted:
> > there's a good chance there's a better solution for this (and I'm going
> > to continue investigating for one after sending this patch), this is
> > more just to start a conversation on what the proper way to fix this is.
>
> Posting as an RFC might have been more appropriate then.
>
no, a revert is actually appropriate. If a commit fixes something,
but breaks something else, it gets either reverted or fixed. If nobody
fixes it, then revert it is.
>
> > So, I'm kind of confused about why exactly this was implemented as an
> > early boot quirk in the first place. If we're seeing the GPU's PCI
> > device, we already know the GPU is there. Shouldn't we be able to check
> > for the existence of the HDA device once we probe the GPU in nouveau?
>
> I think a motivation to keep this generic was to make it work with
> other drivers besides nouveau, specifically Nvidia's proprietary driver.
> nouveau might not even be enabled.
>
>
> > that still doesn't explain why this was implemented as an early quirk
>
> This isn't an early quirk. Those live in arch/x86/kernel/early-quirks.c.
> This is just a PCI quirk executed on device enumeration and on resume.
> Devices aren't necessarily enumerated only on boot, e.g. think Thunderbolt.
>
> Thanks,
>
> Lukas
Also, I realized after sending this that I should clarify something so there
isn't any confusion.
A bunch of people on the bug that was mentioned in b516ea586d71 ("PCI: Enable
NVIDIA HDA controllers") said that this worked perfectly for their P50
laptops. While I don't doubt that at all, it should be noted that the P50
quirk there is only present on a _very specific_ subset of P50 SKUs, so it's
quite likely that the people in that bug report just didn't have a P50 that
hits this issue. The relevant model numbers of the P50 with the flakey bioses
that require this quirk should be mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=203003
On Wed, 2019-07-31 at 16:19 -0400, Lyude Paul wrote:
> This reverts commit b516ea586d717472178e6ef1c152e85608b0ce32.
>
> While this fixes audio for a number of users, this commit has the
> sideaffect of breaking the BIOS workaround that's required to make the
> GPU on the nvidia P50 work, by causing the GPU's PCI device function to
> stop working after it's been set to multifunction mode.
>
> Signed-off-by: Lyude Paul <[email protected]>
> Fixes: b516ea586d71 ("PCI: Enable NVIDIA HDA controllers")
> Cc: Lukas Wunner <[email protected]>
> Cc: Daniel Drake <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Aaron Plattner <[email protected]>
> Cc: Peter Wu <[email protected]>
> Cc: Ilia Mirkin <[email protected]>
> Cc: Karol Herbst <[email protected]>
> Cc: Maik Freudenberg <[email protected]>
> Cc: [email protected]
> ---
>
> I'm not really holding my breath on this patch to being accepted:
> there's a good chance there's a better solution for this (and I'm going
> to continue investigating for one after sending this patch), this is
> more just to start a conversation on what the proper way to fix this is.
>
> So, I'm kind of confused about why exactly this was implemented as an
> early boot quirk in the first place. If we're seeing the GPU's PCI
> device, we already know the GPU is there. Shouldn't we be able to check
> for the existence of the HDA device once we probe the GPU in nouveau?
> This would make a lot more sense and be a lot less troublesome. I can
> see that in the discussion on
>
> https://bugs.freedesktop.org/show_bug.cgi?id=75985
>
> That people mentioned that unloading nouveau then trying to reprobe for
> the audio device didn't work, but that still doesn't explain why this
> was implemented as an early quirk and not as something we just do before
> nouveau is setup. Can we maybe move this somewhere a little more
> sensible?
>
> drivers/pci/quirks.c | 30 ------------------------------
> include/linux/pci_ids.h | 1 -
> 2 files changed, 31 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 208aacf39329..c66c0ca446c4 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5011,36 +5011,6 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA,
> PCI_ANY_ID,
> PCI_CLASS_SERIAL_UNKNOWN, 8,
> quirk_gpu_usb_typec_ucsi);
>
> -/*
> - * Enable the NVIDIA GPU integrated HDA controller if the BIOS left it
> - * disabled. https://devtalk.nvidia.com/default/topic/1024022
> - */
> -static void quirk_nvidia_hda(struct pci_dev *gpu)
> -{
> - u8 hdr_type;
> - u32 val;
> -
> - /* There was no integrated HDA controller before MCP89 */
> - if (gpu->device < PCI_DEVICE_ID_NVIDIA_GEFORCE_320M)
> - return;
> -
> - /* Bit 25 at offset 0x488 enables the HDA controller */
> - pci_read_config_dword(gpu, 0x488, &val);
> - if (val & BIT(25))
> - return;
> -
> - pci_info(gpu, "Enabling HDA controller\n");
> - pci_write_config_dword(gpu, 0x488, val | BIT(25));
> -
> - /* The GPU becomes a multi-function device when the HDA is enabled */
> - pci_read_config_byte(gpu, PCI_HEADER_TYPE, &hdr_type);
> - gpu->multifunction = !!(hdr_type & 0x80);
> -}
> -DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> - PCI_BASE_CLASS_DISPLAY, 16, quirk_nvidia_hda);
> -DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> - PCI_BASE_CLASS_DISPLAY, 16, quirk_nvidia_hda);
> -
> /*
> * Some IDT switches incorrectly flag an ACS Source Validation error on
> * completions for config read requests even though PCIe r4.0, sec
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index c842735a4f45..f496fb619287 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -1336,7 +1336,6 @@
> #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP78S_SMBUS 0x0752
> #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP77_IDE 0x0759
> #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP73_SMBUS 0x07D8
> -#define PCI_DEVICE_ID_NVIDIA_GEFORCE_320M 0x08A0
> #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP79_SMBUS 0x0AA2
> #define PCI_DEVICE_ID_NVIDIA_NFORCE_MCP89_SATA 0x0D85
>
--
Cheers,
Lyude Paul
On Wed, 2019-07-31 at 17:35 -0400, Lyude Paul wrote:
> On Wed, 2019-07-31 at 23:26 +0200, Karol Herbst wrote:
> > On Wed, Jul 31, 2019 at 11:18 PM Lukas Wunner <[email protected]> wrote:
> > > On Wed, Jul 31, 2019 at 04:19:27PM -0400, Lyude Paul wrote:
> > > > While this fixes audio for a number of users, this commit has the
> > > > sideaffect of breaking the BIOS workaround that's required to make the
> > > > GPU on the nvidia P50 work, by causing the GPU's PCI device function
> > > > to
> > > > stop working after it's been set to multifunction mode.
> > >
> > > This is missing a reference to the commit introducing the P50 quirk,
> > > which is e0547c81bfcf ("PCI: Reset Lenovo ThinkPad P50 nvgpu at boot
> > > if necessary").
> > >
> > > Please describe in more detail how the GPU's PCI function stops working.
> > > Does it respond with "all ones" when accessing MMIO?
> > > Do MMIO accesses cause the system to hang?
> > >
> > > Could you provide lspci -vvxx output for the GPU and its associated
> > > HDA controller with and without b516ea586d71?
> > >
> > > Does this machine have external display connectors via which audio
> > > can be streamed?
> > >
> > >
> > > > I'm not really holding my breath on this patch to being accepted:
> > > > there's a good chance there's a better solution for this (and I'm
> > > > going
> > > > to continue investigating for one after sending this patch), this is
> > > > more just to start a conversation on what the proper way to fix this
> > > > is.
> > >
> > > Posting as an RFC might have been more appropriate then.
> > >
> >
> > no, a revert is actually appropriate. If a commit fixes something,
> > but breaks something else, it gets either reverted or fixed. If nobody
> > fixes it, then revert it is.
>
> To answer Lukas's question btw: most of the details on how things break are
> back in the original commit (sorry for forgetting the reference!), there's a
> _lot_ of explanation there that I'd rather not retype, so just refer back to
> the commit and bug @ https://bugs.freedesktop.org/show_bug.cgi?id=75985
>
> Additionally, there was some extra discussion providing some more detail in
> the email thread that I had with Bjorn:
>
> https://lkml.org/lkml/2019/2/12/1172
>
> As for how this commit breaks the workaround: it seems that when we enable
> the
> HDA controller and put the GPU into multifunction mode, the function-level
> reset stops working and thus we can't reset the GPU anymore. Currently I can
> see a couple of solutions (again, please feel free to suggest more!):
>
> * Just revert the commit. We should do this if necessary, but of course I'd
> much rather try finding a fix first
> * Disable the HDA controller temporarily when a GPU reset is neded in
> quirk_reset_lenovo_thinkpad_p50_nvgpu(), then call the function level
> reset, then re-enable the HDA controller. I have no idea if this actually
> works yet, but I'm about to try this on my system
> * Get quirk_reset_lenovo_thinkpad_p50_nvgpu() to run before
> quirk_nvidia_hda(). This would probably be fine, but we would need to
> rework some stuff in the PCI subsystem (maybe it already has a way to do
> this? haven't checked yet) so that we could perform an flr probe early
> enough to perform the quirk
Good news! After some investigation looks like that function level reset
actually does work, just that after we put it in multifunction mode
pci_parent_bus_reset() sees multiple devices on the bus and returns -ENOTTY as
a result. So I should definitely be able to come up with a fix for this other
then reverting this :). Will send out patches soon
> > > > So, I'm kind of confused about why exactly this was implemented as an
> > > > early boot quirk in the first place. If we're seeing the GPU's PCI
> > > > device, we already know the GPU is there. Shouldn't we be able to
> > > > check
> > > > for the existence of the HDA device once we probe the GPU in nouveau?
> > >
> > > I think a motivation to keep this generic was to make it work with
> > > other drivers besides nouveau, specifically Nvidia's proprietary driver.
> > > nouveau might not even be enabled.
> > >
> > >
> > > > that still doesn't explain why this was implemented as an early quirk
> > >
> > > This isn't an early quirk. Those live in arch/x86/kernel/early-
> > > quirks.c.
> > > This is just a PCI quirk executed on device enumeration and on resume.
> > > Devices aren't necessarily enumerated only on boot, e.g. think
> > > Thunderbolt.
> > >
> > > Thanks,
> > >
> > > Lukas
--
Cheers,
Lyude Paul