2022-08-19 17:35:37

by Mario Limonciello

Subject: Re: [RFC 0/2] Stop the abuse of Linux-* _OSI strings

On 8/19/2022 10:44, Karol Herbst wrote:
> On Fri, Aug 19, 2022 at 4:25 PM Mario Limonciello
> <[email protected]> wrote:
>>
>> 3 _OSI strings were introduced in recent years that were intended
>> to workaround very specific problems found on specific systems.
>>
>> The idea was supposed to be that these quirks were only used on
>> those systems, but this proved to be a bad assumption. I've found
>> at least one system in the wild where the vendor using the _OSI
>> string doesn't match the _OSI string and neither does the use.
>>
>> So this brings a good time to review keeping those strings in the kernel.
>> There are 3 strings that were introduced:
>>
>> Linux-Dell-Video
>> -> Intended for systems with NVIDIA cards that didn't support RTD3
>> Linux-Lenovo-NV-HDMI-Audio
>> -> Intended for powering on NVIDIA HDMI device
>> Linux-HPI-Hybrid-Graphics
>> -> Intended for changing dGPU output
>>
>> AFAIK the first string is no longer relevant as nouveau now supports
>> RTD3. If that's wrong, this can be changed for the series.
>>
>
> Nouveau always supported RTD3, because that's mainly a kernel feature.
> When those were introduced we simply had a bug only hit on a few
> systems. And instead of helping us to debug this, this workaround was
> added :( We were not even asked about this.

My apologies, I was certainly part of the impetus for this W/A in the
first place while I was at my previous employer. Your comment
re-affirms to me that at least the first patch is correct.

>
> I am a bit curious about the other two though as I am not even sure
> they are needed at all as we put other work arounds in place. @Lyude
> Paul might know more about these.
>

If the other two really aren't needed anymore, then yeah we should just
tear all 3 out. If that's the direction we go, I would appreciate some
commit IDs to reference in the commit message for tearing them out so
that if they end up being backported to stable we know how far back
they should go.
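
To make the mechanism concrete, here is a minimal userspace sketch of how
an _OSI query gets answered from a string table. It is an illustration
only: `struct osi_entry`, `osi_table`, `osi_query()` and `osi_disable()`
are hypothetical names for this example, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type for this sketch; not the kernel's structure. */
struct osi_entry {
	const char *name;
	bool enabled;
};

/* The three vendor strings named in this thread, all answered for now. */
static struct osi_entry osi_table[] = {
	{ "Linux-Dell-Video", true },
	{ "Linux-Lenovo-NV-HDMI-Audio", true },
	{ "Linux-HPI-Hybrid-Graphics", true },
};

#define OSI_TABLE_LEN (sizeof(osi_table) / sizeof(osi_table[0]))

/* Return true if the OS would report the queried string as supported. */
static bool osi_query(const char *name)
{
	for (size_t i = 0; i < OSI_TABLE_LEN; i++) {
		if (strcmp(osi_table[i].name, name) == 0)
			return osi_table[i].enabled;
	}
	return false;
}

/* Stop answering a string; returns false if it is not in the table. */
static bool osi_disable(const char *name)
{
	for (size_t i = 0; i < OSI_TABLE_LEN; i++) {
		if (strcmp(osi_table[i].name, name) == 0) {
			osi_table[i].enabled = false;
			return true;
		}
	}
	return false;
}
```

Note that the lookup only sees the string, not who is asking, so firmware
from any vendor gets the same answer. Calling
osi_disable("Linux-Dell-Video") makes later queries for that string
return false while the other two keep being answered.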


2022-08-19 18:02:48

by Karol Herbst

Subject: Re: [RFC 0/2] Stop the abuse of Linux-* _OSI strings

On Fri, Aug 19, 2022 at 6:00 PM Limonciello, Mario
<[email protected]> wrote:
>
> On 8/19/2022 10:44, Karol Herbst wrote:
> > On Fri, Aug 19, 2022 at 4:25 PM Mario Limonciello
> > <[email protected]> wrote:
> >>
> >> 3 _OSI strings were introduced in recent years that were intended
> >> to workaround very specific problems found on specific systems.
> >>
> >> The idea was supposed to be that these quirks were only used on
> >> those systems, but this proved to be a bad assumption. I've found
> >> at least one system in the wild where the vendor using the _OSI
> >> string doesn't match the _OSI string and neither does the use.
> >>
> >> So this brings a good time to review keeping those strings in the kernel.
> >> There are 3 strings that were introduced:
> >>
> >> Linux-Dell-Video
> >> -> Intended for systems with NVIDIA cards that didn't support RTD3
> >> Linux-Lenovo-NV-HDMI-Audio
> >> -> Intended for powering on NVIDIA HDMI device
> >> Linux-HPI-Hybrid-Graphics
> >> -> Intended for changing dGPU output
> >>
> >> AFAIK the first string is no longer relevant as nouveau now supports
> >> RTD3. If that's wrong, this can be changed for the series.
> >>
> >
> > Nouveau always supported RTD3, because that's mainly a kernel feature.
> > When those were introduced we simply had a bug only hit on a few
> > systems. And instead of helping us to debug this, this workaround was
> > added :( We were not even asked about this.
>
> My apologies, I was certainly part of the impetus for this W/A in the
> first place while I was at my previous employer. Your comment
> re-affirms to me that at least the first patch is correct.
>

Yeah, no worries. I just hope that people in the future will
communicate such things.

Anyway, there are a few issues with the runpm stuff left, and looking
at what nvidia does in their open driver makes me wonder if we might
need a bigger overhaul of runpm. They do apply bridge/host-controller-
specific workarounds, and I suspect some of them are related here. The
workaround I came up with in nouveau can be seen in 434fdb51513bf.

But also having access to documentation/specification from what Nvidia
is doing would be quite helpful. We know that on some really new AMD
systems we run into new issues and this needs some investigation. I
simply don't have access to any laptops where this problem can be seen.

> >
> > I am a bit curious about the other two though as I am not even sure
> > they are needed at all as we put other work arounds in place. @Lyude
> > Paul might know more about these.
> >
>
> If the other two really aren't needed anymore, then yeah we should just
> tear all 3 out. If that's the direction we go, I would appreciate some
> commit IDs to reference in the commit message for tearing them out so
> that if they end up backporting to stable we know how far they should go.
>

2022-08-19 18:03:28

by Mario Limonciello

Subject: Re: [RFC 0/2] Stop the abuse of Linux-* _OSI strings

On 8/19/2022 11:37, Karol Herbst wrote:
> On Fri, Aug 19, 2022 at 6:00 PM Limonciello, Mario
> <[email protected]> wrote:
>>
>> On 8/19/2022 10:44, Karol Herbst wrote:
>>> On Fri, Aug 19, 2022 at 4:25 PM Mario Limonciello
>>> <[email protected]> wrote:
>>>>
>>>> 3 _OSI strings were introduced in recent years that were intended
>>>> to workaround very specific problems found on specific systems.
>>>>
>>>> The idea was supposed to be that these quirks were only used on
>>>> those systems, but this proved to be a bad assumption. I've found
>>>> at least one system in the wild where the vendor using the _OSI
>>>> string doesn't match the _OSI string and neither does the use.
>>>>
>>>> So this brings a good time to review keeping those strings in the kernel.
>>>> There are 3 strings that were introduced:
>>>>
>>>> Linux-Dell-Video
>>>> -> Intended for systems with NVIDIA cards that didn't support RTD3
>>>> Linux-Lenovo-NV-HDMI-Audio
>>>> -> Intended for powering on NVIDIA HDMI device
>>>> Linux-HPI-Hybrid-Graphics
>>>> -> Intended for changing dGPU output
>>>>
>>>> AFAIK the first string is no longer relevant as nouveau now supports
>>>> RTD3. If that's wrong, this can be changed for the series.
>>>>
>>>
>>> Nouveau always supported RTD3, because that's mainly a kernel feature.
>>> When those were introduced we simply had a bug only hit on a few
>>> systems. And instead of helping us to debug this, this workaround was
>>> added :( We were not even asked about this.
>>
>> My apologies, I was certainly part of the impetus for this W/A in the
>> first place while I was at my previous employer. Your comment
>> re-affirms to me that at least the first patch is correct.
>>
>
> Yeah, no worries. I just hope that people in the future will
> communicate such things.
>
> Anyway, there are a few issues with the runpm stuff left, and looking
> at what nvidia does in their open driver makes me wonder if we might
> need a bigger overhaul of runpm. They do apply bridge/host controller
> specific workarounds and I suspect some of them are related here as
> the workaround I came up with in nouveau can be seen in 434fdb51513bf.

But this overhaul shouldn't gate removing this _OSI string, or do you
think it should?

>
> But also having access to documentation/specification from what Nvidia
> is doing would be quite helpful. We know that on some really new AMD
> systems we run into new issues and this needs some investigation. I
> simply don't have access to any laptops where this problem can be seen.
>

Do you mean there are specifically remaining issues on AMD APU + NVIDIA
dGPU systems? Any public bugs by chance?

Depending on what these are I'm happy to try to help with at least
access. If we have them maybe we can try to make the right connections
to get some hardware to you, or at least remotely access it.

>>>
>>> I am a bit curious about the other two though as I am not even sure
>>> they are needed at all as we put other work arounds in place. @Lyude
>>> Paul might know more about these.
>>>
>>
>> If the other two really aren't needed anymore, then yeah we should just
>> tear all 3 out. If that's the direction we go, I would appreciate some
>> commit IDs to reference in the commit message for tearing them out so
>> that if they end up backporting to stable we know how far they should go.
>>
>

2022-08-19 18:47:53

by Karol Herbst

Subject: Re: [RFC 0/2] Stop the abuse of Linux-* _OSI strings

On Fri, Aug 19, 2022 at 6:43 PM Limonciello, Mario
<[email protected]> wrote:
>
> On 8/19/2022 11:37, Karol Herbst wrote:
> > On Fri, Aug 19, 2022 at 6:00 PM Limonciello, Mario
> > <[email protected]> wrote:
> >>
> >> On 8/19/2022 10:44, Karol Herbst wrote:
> >>> On Fri, Aug 19, 2022 at 4:25 PM Mario Limonciello
> >>> <[email protected]> wrote:
> >>>>
> >>>> 3 _OSI strings were introduced in recent years that were intended
> >>>> to workaround very specific problems found on specific systems.
> >>>>
> >>>> The idea was supposed to be that these quirks were only used on
> >>>> those systems, but this proved to be a bad assumption. I've found
> >>>> at least one system in the wild where the vendor using the _OSI
> >>>> string doesn't match the _OSI string and neither does the use.
> >>>>
> >>>> So this brings a good time to review keeping those strings in the kernel.
> >>>> There are 3 strings that were introduced:
> >>>>
> >>>> Linux-Dell-Video
> >>>> -> Intended for systems with NVIDIA cards that didn't support RTD3
> >>>> Linux-Lenovo-NV-HDMI-Audio
> >>>> -> Intended for powering on NVIDIA HDMI device
> >>>> Linux-HPI-Hybrid-Graphics
> >>>> -> Intended for changing dGPU output
> >>>>
> >>>> AFAIK the first string is no longer relevant as nouveau now supports
> >>>> RTD3. If that's wrong, this can be changed for the series.
> >>>>
> >>>
> >>> Nouveau always supported RTD3, because that's mainly a kernel feature.
> >>> When those were introduced we simply had a bug only hit on a few
> >>> systems. And instead of helping us to debug this, this workaround was
> >>> added :( We were not even asked about this.
> >>
> >> My apologies, I was certainly part of the impetus for this W/A in the
> >> first place while I was at my previous employer. Your comment
> >> re-affirms to me that at least the first patch is correct.
> >>
> >
> > Yeah, no worries. I just hope that people in the future will
> > communicate such things.
> >
> > Anyway, there are a few issues with the runpm stuff left, and looking
> > at what nvidia does in their open driver makes me wonder if we might
> > need a bigger overhaul of runpm. They do apply bridge/host controller
> > specific workarounds and I suspect some of them are related here as
> > the workaround I came up with in nouveau can be seen in 434fdb51513bf.
>
> But this overhaul shouldn't gate removing this _OSI string, or you think
> it should?
>

Hard to tell. If there are affected systems where the problem is hidden
because those _OSI strings are in place, this would be annoying, but
then we might have more pointers on what's actually broken. Anyway, we
don't need those workarounds but rather a real fix for all those issues.
And I suspect the real fix is to apply specific workarounds for specific
systems.

> >
> > But also having access to documentation/specification from what Nvidia
> > is doing would be quite helpful. We know that on some really new AMD
> > systems we run into new issues and this needs some investigation. I
> > simply don't have access to any laptops where this problem can be seen.
> >
>
> Do you mean there are specifically remaining issues on AMD APU + NVIDIA
> dGPU systems? Any public bugs by chance?
>
> Depending on what these are I'm happy to try to help with at least
> access. If we have them maybe we can try to make the right connections
> to get some hardware to you, or at least remotely access it.
>

https://gitlab.freedesktop.org/drm/nouveau/-/issues/108

There might be more, but this should be a good start.

> >>>
> >>> I am a bit curious about the other two though as I am not even sure
> >>> they are needed at all as we put other work arounds in place. @Lyude
> >>> Paul might know more about these.
> >>>
> >>
> >> If the other two really aren't needed anymore, then yeah we should just
> >> tear all 3 out. If that's the direction we go, I would appreciate some
> >> commit IDs to reference in the commit message for tearing them out so
> >> that if they end up backporting to stable we know how far they should go.
> >>
> >
>