2024-05-06 19:21:46

by Wren Turkal

[permalink] [raw]
Subject: path to landing patch to fix warm boot issue for qca6390

Krzysztof,

I am reaching out to you as you had the most important objections to the
change to fix qca6390 for the warm boot/module reload bug that I am
experiencing.

For context, the problem is that the hci_uart module will send specific
vendor specfic commands during shutdown of the hardware under most
situations. These VSCs put the bluetooth device into a non-working state
on my Dell XPS 13 9310 with qca6390 bluetooth hardware.

Zijun's proposed fix is to not send these commands when it's not
appropriate for the hardware. The vendor commands should be avoided when
the hardware does not have persistent configuration or when the device
is in setup state (indicating that is has never been setup and should
not be sent the VSCs on the shutdown path). This is what Zijun's patch
implements.

In addition, Zijun's change removes the influence of both
the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
that those flags should not influence the sending of the VSCs in the
shutdown path. If I understand KK's objections properly, this is where
his objection is stemming from. KK, is this correct?

Zijun's proposed fix can be seen here:
https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/

I'm wondering if we can resolve this impasse by splitting the change
into two changes, as follows:

1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
flags in the shutdown path.
2. Add the quirk from Zijun's patch that fixes my hardward configuration.

I'm hoping that better clearer descriptions for #1 can help get that
landed since the logic current appears to be at odds with how the
hardware works.

Also, I am happy to split the patches into the two patches, or (maybe
more ideally) just modify the commit message to better indicate the
reason the change. I just need guidance from maintainers so that
whatever work I do leads to something acceptable for y'all.

So, please help me get this done. I am just a user with broken hardware
and a fondness for Linux. I would love to help do what's needed to get
this fix landed.

Please help me get there,
Wren T
--
You're more amazing than you think!


2024-05-06 19:49:50

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Hi Wren,

On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
>
> Krzysztof,
>
> I am reaching out to you as you had the most important objections to the
> change to fix qca6390 for the warm boot/module reload bug that I am
> experiencing.
>
> For context, the problem is that the hci_uart module will send specific
> vendor specfic commands during shutdown of the hardware under most
> situations. These VSCs put the bluetooth device into a non-working state
> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>
> Zijun's proposed fix is to not send these commands when it's not
> appropriate for the hardware. The vendor commands should be avoided when
> the hardware does not have persistent configuration or when the device
> is in setup state (indicating that is has never been setup and should
> not be sent the VSCs on the shutdown path). This is what Zijun's patch
> implements.
>
> In addition, Zijun's change removes the influence of both
> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> that those flags should not influence the sending of the VSCs in the
> shutdown path. If I understand KK's objections properly, this is where
> his objection is stemming from. KK, is this correct?
>
> Zijun's proposed fix can be seen here:
> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>
> I'm wondering if we can resolve this impasse by splitting the change
> into two changes, as follows:
>
> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
> flags in the shutdown path.
> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
>
> I'm hoping that better clearer descriptions for #1 can help get that
> landed since the logic current appears to be at odds with how the
> hardware works.
>
> Also, I am happy to split the patches into the two patches, or (maybe
> more ideally) just modify the commit message to better indicate the
> reason the change. I just need guidance from maintainers so that
> whatever work I do leads to something acceptable for y'all.
>
> So, please help me get this done. I am just a user with broken hardware
> and a fondness for Linux. I would love to help do what's needed to get
> this fix landed.

Ive also objected to that change, in fact the whole shutdown sequence
is sort of bogus in my opinion and the driver shall really have some
means to find out what mode it is in when it reboots, regardless if
cold or warm boot, since otherwise we are in trouble if the user is
booting from another OS that doesn't do the expected shutdown
sequence.

--
Luiz Augusto von Dentz

2024-05-07 06:27:31

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

On 06/05/2024 21:21, Wren Turkal wrote:
> Krzysztof,
>
> I am reaching out to you as you had the most important objections to the
> change to fix qca6390 for the warm boot/module reload bug that I am
> experiencing.
>
> For context, the problem is that the hci_uart module will send specific
> vendor specfic commands during shutdown of the hardware under most
> situations. These VSCs put the bluetooth device into a non-working state
> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>
> Zijun's proposed fix is to not send these commands when it's not
> appropriate for the hardware. The vendor commands should be avoided when
> the hardware does not have persistent configuration or when the device
> is in setup state (indicating that is has never been setup and should
> not be sent the VSCs on the shutdown path). This is what Zijun's patch
> implements.
>
> In addition, Zijun's change removes the influence of both
> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> that those flags should not influence the sending of the VSCs in the
> shutdown path. If I understand KK's objections properly, this is where
> his objection is stemming from. KK, is this correct?

Yes, because this basically reverts my fix for shutdown path and
re-opens the race condition.

>
> Zijun's proposed fix can be seen here:
> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>
> I'm wondering if we can resolve this impasse by splitting the change
> into two changes, as follows:
>
> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
> flags in the shutdown path.

There was no explanation why this was removed, so it does not matter to
me whether this should be separate path or not.

Zijun, even though I asked four times, did not provide information on
which kernel this was prepared and tested, and on which hardware.

I assume Zijun did not understand original issue, thus assumes changing
the code to HCI_SETUP fixes it.


> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
>
> I'm hoping that better clearer descriptions for #1 can help get that
> landed since the logic current appears to be at odds with how the
> hardware works.
>
> Also, I am happy to split the patches into the two patches, or (maybe
> more ideally) just modify the commit message to better indicate the
> reason the change. I just need guidance from maintainers so that
> whatever work I do leads to something acceptable for y'all.
>
> So, please help me get this done. I am just a user with broken hardware
> and a fondness for Linux. I would love to help do what's needed to get
> this fix landed.

Best regards,
Krzysztof


2024-05-07 14:58:23

by Lk Sii

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

On 2024/5/7 14:27, Krzysztof Kozlowski wrote:
> On 06/05/2024 21:21, Wren Turkal wrote:
>> Krzysztof,
>>
>> I am reaching out to you as you had the most important objections to the
>> change to fix qca6390 for the warm boot/module reload bug that I am
>> experiencing.
>>
>> For context, the problem is that the hci_uart module will send specific
>> vendor specfic commands during shutdown of the hardware under most
>> situations. These VSCs put the bluetooth device into a non-working state
>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>
>> Zijun's proposed fix is to not send these commands when it's not
>> appropriate for the hardware. The vendor commands should be avoided when
>> the hardware does not have persistent configuration or when the device
>> is in setup state (indicating that is has never been setup and should
>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
>> implements.
>>
>> In addition, Zijun's change removes the influence of both
>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>> that those flags should not influence the sending of the VSCs in the
>> shutdown path. If I understand KK's objections properly, this is where
>> his objection is stemming from. KK, is this correct?
>
> Yes, because this basically reverts my fix for shutdown path and
> re-opens the race condition.
>
>>
>> Zijun's proposed fix can be seen here:
>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>
>> I'm wondering if we can resolve this impasse by splitting the change
>> into two changes, as follows:
>>
>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
>> flags in the shutdown path.
>
> There was no explanation why this was removed, so it does not matter to
> me whether this should be separate path or not.
>
> Zijun, even though I asked four times, did not provide information on
> which kernel this was prepared and tested, and on which hardware.
>
> I assume Zijun did not understand original issue, thus assumes changing
> the code to HCI_SETUP fixes it.
>
>
>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
>>
>> I'm hoping that better clearer descriptions for #1 can help get that
>> landed since the logic current appears to be at odds with how the
>> hardware works.
>>
>> Also, I am happy to split the patches into the two patches, or (maybe
>> more ideally) just modify the commit message to better indicate the
>> reason the change. I just need guidance from maintainers so that
>> whatever work I do leads to something acceptable for y'all.
>>
>> So, please help me get this done. I am just a user with broken hardware
>> and a fondness for Linux. I would love to help do what's needed to get
>> this fix landed.
>
it seems there are a new below discussion thread for the single patch
https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
you maybe go there to discuss.
thanks
> Best regards,
> Krzysztof
>
>
>


2024-05-10 19:16:29

by Wren Turkal

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
> Hi Wren,
>
> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
>>
>> Krzysztof,
>>
>> I am reaching out to you as you had the most important objections to the
>> change to fix qca6390 for the warm boot/module reload bug that I am
>> experiencing.
>>
>> For context, the problem is that the hci_uart module will send specific
>> vendor specfic commands during shutdown of the hardware under most
>> situations. These VSCs put the bluetooth device into a non-working state
>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>
>> Zijun's proposed fix is to not send these commands when it's not
>> appropriate for the hardware. The vendor commands should be avoided when
>> the hardware does not have persistent configuration or when the device
>> is in setup state (indicating that is has never been setup and should
>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
>> implements.
>>
>> In addition, Zijun's change removes the influence of both
>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>> that those flags should not influence the sending of the VSCs in the
>> shutdown path. If I understand KK's objections properly, this is where
>> his objection is stemming from. KK, is this correct?
>>
>> Zijun's proposed fix can be seen here:
>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>
>> I'm wondering if we can resolve this impasse by splitting the change
>> into two changes, as follows:
>>
>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
>> flags in the shutdown path.
>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
>>
>> I'm hoping that better clearer descriptions for #1 can help get that
>> landed since the logic current appears to be at odds with how the
>> hardware works.
>>
>> Also, I am happy to split the patches into the two patches, or (maybe
>> more ideally) just modify the commit message to better indicate the
>> reason the change. I just need guidance from maintainers so that
>> whatever work I do leads to something acceptable for y'all.
>>
>> So, please help me get this done. I am just a user with broken hardware
>> and a fondness for Linux. I would love to help do what's needed to get
>> this fix landed.
>
> Ive also objected to that change, in fact the whole shutdown sequence
> is sort of bogus in my opinion and the driver shall really have some
> means to find out what mode it is in when it reboots, regardless if
> cold or warm boot, since otherwise we are in trouble if the user is
> booting from another OS that doesn't do the expected shutdown
> sequence.

This criticism makes a ton of sense. I'm sorry I missed it before. There
were a lot of threads moving in parallel. However, I am curious. Given
that the patch improves the situation for users (like me). Is there any
way we can separate the redesign of the shutdown sequence and the UX
improvement that comes with this patch?

Here's my concern. I am happy to do the work to redesign this. However,
I don't think I have the information needed to do this since I don't
have access to the technical docs for the qca6390. I am worried that not
accepting some form of this patch is letting perfect be the enemy of the
good. And I am not sure how I personally can help with that. If you
think it's possible for me to do this without the docs for the hardware,
I am willing to give it a shot if I can get some guidance. Honestly, I
wish I had the skill to be confident about a change like this, but I don't.

Any ideas on how to move forward would be greatly appreciated.

And just to be perfectly clear, I have tested this patch on my laptop.
It greatly enhances my ability to use my hardware since I can reboot the
machine without having to make sure to power cycle the laptop. This is
not a theoretical improvement.

Thanks,
wt
--
You're more amazing than you think!

2024-05-10 19:49:15

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Hi Wren,

On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]> wrote:
>
> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
> > Hi Wren,
> >
> > On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
> >>
> >> Krzysztof,
> >>
> >> I am reaching out to you as you had the most important objections to the
> >> change to fix qca6390 for the warm boot/module reload bug that I am
> >> experiencing.
> >>
> >> For context, the problem is that the hci_uart module will send specific
> >> vendor specfic commands during shutdown of the hardware under most
> >> situations. These VSCs put the bluetooth device into a non-working state
> >> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
> >>
> >> Zijun's proposed fix is to not send these commands when it's not
> >> appropriate for the hardware. The vendor commands should be avoided when
> >> the hardware does not have persistent configuration or when the device
> >> is in setup state (indicating that is has never been setup and should
> >> not be sent the VSCs on the shutdown path). This is what Zijun's patch
> >> implements.
> >>
> >> In addition, Zijun's change removes the influence of both
> >> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> >> that those flags should not influence the sending of the VSCs in the
> >> shutdown path. If I understand KK's objections properly, this is where
> >> his objection is stemming from. KK, is this correct?
> >>
> >> Zijun's proposed fix can be seen here:
> >> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> >>
> >> I'm wondering if we can resolve this impasse by splitting the change
> >> into two changes, as follows:
> >>
> >> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
> >> flags in the shutdown path.
> >> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
> >>
> >> I'm hoping that better clearer descriptions for #1 can help get that
> >> landed since the logic current appears to be at odds with how the
> >> hardware works.
> >>
> >> Also, I am happy to split the patches into the two patches, or (maybe
> >> more ideally) just modify the commit message to better indicate the
> >> reason the change. I just need guidance from maintainers so that
> >> whatever work I do leads to something acceptable for y'all.
> >>
> >> So, please help me get this done. I am just a user with broken hardware
> >> and a fondness for Linux. I would love to help do what's needed to get
> >> this fix landed.
> >
> > Ive also objected to that change, in fact the whole shutdown sequence
> > is sort of bogus in my opinion and the driver shall really have some
> > means to find out what mode it is in when it reboots, regardless if
> > cold or warm boot, since otherwise we are in trouble if the user is
> > booting from another OS that doesn't do the expected shutdown
> > sequence.
>
> This criticism makes a ton of sense. I'm sorry I missed it before. There
> were a lot of threads moving in parallel. However, I am curious. Given
> that the patch improves the situation for users (like me). Is there any
> way we can separate the redesign of the shutdown sequence and the UX
> improvement that comes with this patch?
>
> Here's my concern. I am happy to do the work to redesign this. However,
> I don't think I have the information needed to do this since I don't
> have access to the technical docs for the qca6390. I am worried that not
> accepting some form of this patch is letting perfect be the enemy of the
> good. And I am not sure how I personally can help with that. If you
> think it's possible for me to do this without the docs for the hardware,
> I am willing to give it a shot if I can get some guidance. Honestly, I
> wish I had the skill to be confident about a change like this, but I don't.
>
> Any ideas on how to move forward would be greatly appreciated.
>
> And just to be perfectly clear, I have tested this patch on my laptop.
> It greatly enhances my ability to use my hardware since I can reboot the
> machine without having to make sure to power cycle the laptop. This is
> not a theoretical improvement.

I would really love some explanation why can't the driver know under
what mode the controller is when it gets probed, because to me we
cannot accept a driver that only works under certain condition after
the boot and in case it is really impossible, can't even power cycle
it to get it back to cold boot stage???

Also the criticism here should be directed to the vendor, how long
have we been discussing problems in the QCA driver? And the only thing
I see coming our way are work-arounds of the problems, the address not
being unique coming from the firmware itself and when provided via DT
the address is in the wrong byteorder and now that the driver must
communicate the firmware on shutdown in order to get it working
properly on the next boot.

--
Luiz Augusto von Dentz

2024-05-10 20:55:46

by Wren Turkal

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
> Hi Wren,
>
> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]> wrote:
>>
>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>>> Hi Wren,
>>>
>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
>>>>
>>>> Krzysztof,
>>>>
>>>> I am reaching out to you as you had the most important objections to the
>>>> change to fix qca6390 for the warm boot/module reload bug that I am
>>>> experiencing.
>>>>
>>>> For context, the problem is that the hci_uart module will send specific
>>>> vendor specfic commands during shutdown of the hardware under most
>>>> situations. These VSCs put the bluetooth device into a non-working state
>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>
>>>> Zijun's proposed fix is to not send these commands when it's not
>>>> appropriate for the hardware. The vendor commands should be avoided when
>>>> the hardware does not have persistent configuration or when the device
>>>> is in setup state (indicating that is has never been setup and should
>>>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
>>>> implements.
>>>>
>>>> In addition, Zijun's change removes the influence of both
>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>>>> that those flags should not influence the sending of the VSCs in the
>>>> shutdown path. If I understand KK's objections properly, this is where
>>>> his objection is stemming from. KK, is this correct?
>>>>
>>>> Zijun's proposed fix can be seen here:
>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>>
>>>> I'm wondering if we can resolve this impasse by splitting the change
>>>> into two changes, as follows:
>>>>
>>>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
>>>> flags in the shutdown path.
>>>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
>>>>
>>>> I'm hoping that better clearer descriptions for #1 can help get that
>>>> landed since the logic current appears to be at odds with how the
>>>> hardware works.
>>>>
>>>> Also, I am happy to split the patches into the two patches, or (maybe
>>>> more ideally) just modify the commit message to better indicate the
>>>> reason the change. I just need guidance from maintainers so that
>>>> whatever work I do leads to something acceptable for y'all.
>>>>
>>>> So, please help me get this done. I am just a user with broken hardware
>>>> and a fondness for Linux. I would love to help do what's needed to get
>>>> this fix landed.
>>>
>>> Ive also objected to that change, in fact the whole shutdown sequence
>>> is sort of bogus in my opinion and the driver shall really have some
>>> means to find out what mode it is in when it reboots, regardless if
>>> cold or warm boot, since otherwise we are in trouble if the user is
>>> booting from another OS that doesn't do the expected shutdown
>>> sequence.
>>
>> This criticism makes a ton of sense. I'm sorry I missed it before. There
>> were a lot of threads moving in parallel. However, I am curious. Given
>> that the patch improves the situation for users (like me). Is there any
>> way we can separate the redesign of the shutdown sequence and the UX
>> improvement that comes with this patch?
>>
>> Here's my concern. I am happy to do the work to redesign this. However,
>> I don't think I have the information needed to do this since I don't
>> have access to the technical docs for the qca6390. I am worried that not
>> accepting some form of this patch is letting perfect be the enemy of the
>> good. And I am not sure how I personally can help with that. If you
>> think it's possible for me to do this without the docs for the hardware,
>> I am willing to give it a shot if I can get some guidance. Honestly, I
>> wish I had the skill to be confident about a change like this, but I don't.
>>
>> Any ideas on how to move forward would be greatly appreciated.
>>
>> And just to be perfectly clear, I have tested this patch on my laptop.
>> It greatly enhances my ability to use my hardware since I can reboot the
>> machine without having to make sure to power cycle the laptop. This is
>> not a theoretical improvement.
>
> I would really love some explanation why can't the driver know under
> what mode the controller is when it gets probed, because to me we
> cannot accept a driver that only works under certain condition after
> the boot and in case it is really impossible, can't even power cycle
> it to get it back to cold boot stage???

This is a great technical criticism of the driver, and I think you
deserve that explanation.

However, with the driver already in the kernel, shouldn't the bias be
toward mitigating the extremely bad UX and not hold users hostage for
the bad design which has already been approved and landed in the kernel?

> Also the criticism here should be directed to the vendor, how long
> have we been discussing problems in the QCA driver? And the only thing
> I see coming our way are work-arounds of the problems, the address not
> being unique coming from the firmware itself and when provided via DT
> the address is in the wrong byteorder and now that the driver must
> communicate the firmware on shutdown in order to get it working
> properly on the next boot.

I agree that Qualcomm should get flack for this, however, the UX problem
can be mitigated with a logic fix that doesn't make the init/shutdown
design problem any worse than it currently seems to be. I mean, wouldn't
this logic have to exist somewhere even if it weren't the shutdown path?

If you are trying to use this as leverage to get Qualcomm to do a bigger
thing (redesign the init/shutdown logic), I do think that tactic
needlessly puts users in the crossfire. I can totally understand why
you'd do it. I am just suffering the crossfire in the meantime, and it
doesn't feel great.

wt
--
You're more amazing than you think!

2024-05-10 21:25:49

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Hi Wren,

On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
>
> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
> > Hi Wren,
> >
> > On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]> wrote:
> >>
> >> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
> >>> Hi Wren,
> >>>
> >>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
> >>>>
> >>>> Krzysztof,
> >>>>
> >>>> I am reaching out to you as you had the most important objections to the
> >>>> change to fix qca6390 for the warm boot/module reload bug that I am
> >>>> experiencing.
> >>>>
> >>>> For context, the problem is that the hci_uart module will send specific
> >>>> vendor specfic commands during shutdown of the hardware under most
> >>>> situations. These VSCs put the bluetooth device into a non-working state
> >>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
> >>>>
> >>>> Zijun's proposed fix is to not send these commands when it's not
> >>>> appropriate for the hardware. The vendor commands should be avoided when
> >>>> the hardware does not have persistent configuration or when the device
> >>>> is in setup state (indicating that is has never been setup and should
> >>>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
> >>>> implements.
> >>>>
> >>>> In addition, Zijun's change removes the influence of both
> >>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> >>>> that those flags should not influence the sending of the VSCs in the
> >>>> shutdown path. If I understand KK's objections properly, this is where
> >>>> his objection is stemming from. KK, is this correct?
> >>>>
> >>>> Zijun's proposed fix can be seen here:
> >>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> >>>>
> >>>> I'm wondering if we can resolve this impasse by splitting the change
> >>>> into two changes, as follows:
> >>>>
> >>>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
> >>>> flags in the shutdown path.
> >>>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
> >>>>
> >>>> I'm hoping that better clearer descriptions for #1 can help get that
> >>>> landed since the logic current appears to be at odds with how the
> >>>> hardware works.
> >>>>
> >>>> Also, I am happy to split the patches into the two patches, or (maybe
> >>>> more ideally) just modify the commit message to better indicate the
> >>>> reason the change. I just need guidance from maintainers so that
> >>>> whatever work I do leads to something acceptable for y'all.
> >>>>
> >>>> So, please help me get this done. I am just a user with broken hardware
> >>>> and a fondness for Linux. I would love to help do what's needed to get
> >>>> this fix landed.
> >>>
> >>> Ive also objected to that change, in fact the whole shutdown sequence
> >>> is sort of bogus in my opinion and the driver shall really have some
> >>> means to find out what mode it is in when it reboots, regardless if
> >>> cold or warm boot, since otherwise we are in trouble if the user is
> >>> booting from another OS that doesn't do the expected shutdown
> >>> sequence.
> >>
> >> This criticism makes a ton of sense. I'm sorry I missed it before. There
> >> were a lot of threads moving in parallel. However, I am curious. Given
> >> that the patch improves the situation for users (like me). Is there any
> >> way we can separate the redesign of the shutdown sequence and the UX
> >> improvement that comes with this patch?
> >>
> >> Here's my concern. I am happy to do the work to redesign this. However,
> >> I don't think I have the information needed to do this since I don't
> >> have access to the technical docs for the qca6390. I am worried that not
> >> accepting some form of this patch is letting perfect be the enemy of the
> >> good. And I am not sure how I personally can help with that. If you
> >> think it's possible for me to do this without the docs for the hardware,
> >> I am willing to give it a shot if I can get some guidance. Honestly, I
> >> wish I had the skill to be confident about a change like this, but I don't.
> >>
> >> Any ideas on how to move forward would be greatly appreciated.
> >>
> >> And just to be perfectly clear, I have tested this patch on my laptop.
> >> It greatly enhances my ability to use my hardware since I can reboot the
> >> machine without having to make sure to power cycle the laptop. This is
> >> not a theoretical improvement.
> >
> > I would really love some explanation why can't the driver know under
> > what mode the controller is when it gets probed, because to me we
> > cannot accept a driver that only works under certain condition after
> > the boot and in case it is really impossible, can't even power cycle
> > it to get it back to cold boot stage???
>
> This is a great technical criticism of the driver, and I think you
> deserve that explanation.
>
> However, with the driver already in the kernel, shouldn't the bias be
> toward mitigating the extremely bad UX and not hold users hostage for
> the bad design which has already been approved and landed in the kernel?
>
> > Also the criticism here should be directed to the vendor, how long
> > have we been discussing problems in the QCA driver? And the only thing
> > I see coming our way are work-arounds of the problems, the address not
> > being unique coming from the firmware itself and when provided via DT
> > the address is in the wrong byteorder and now that the driver must
> > communicate the firmware on shutdown in order to get it working
> > properly on the next boot.
>
> I agree that Qualcomm should get flack for this, however, the UX problem
> can be mitigated with a logic fix that doesn't make the init/shutdown
> design problem any worse than it currently seems to be. I mean, wouldn't
> this logic have to exist somewhere even if it weren't the shutdown path?
>
> If you are trying to use this as leverage to get Qualcomm to do a bigger
> thing (redesign the init/shutdown logic), I do think that tactic
> needlessly puts users in the crossfire. I can totally understand why
> you'd do it. I am just suffering the crossfire in the meantime, and it
> doesn't feel great.

So you prefer to risk getting a kernel crash due to UAF over Bluetooth
not working? Really? Because I haven't seen any configuration that
those changes you tested don't reintroduce the UAF, which is why I
haven't applied that change to begin with, so no I'm not holding back
to pressure Qualcomm to redesign the shutdown logic, it just these
things got entangled because I just realized the shutdown thingy is
really out of place, imo, but I'd be fine if there is a temporary fix
until someone finally decide to spend some time to really fix the
shutdown logic.


--
Luiz Augusto von Dentz

2024-05-10 22:58:40

by Paul Menzel

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Dear Luiz,


Am 10.05.24 um 23:25 schrieb Luiz Augusto von Dentz:

> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
>>
>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:

>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]> wrote:
>>>>
>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:

>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
>>>>>>
>>>>>> Krzysztof,
>>>>>>
>>>>>> I am reaching out to you as you had the most important objections to the
>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
>>>>>> experiencing.
>>>>>>
>>>>>> For context, the problem is that the hci_uart module will send specific
>>>>>> vendor specfic commands during shutdown of the hardware under most
>>>>>> situations. These VSCs put the bluetooth device into a non-working state
>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>>>
>>>>>> Zijun's proposed fix is to not send these commands when it's not
>>>>>> appropriate for the hardware. The vendor commands should be avoided when
>>>>>> the hardware does not have persistent configuration or when the device
>>>>>> is in setup state (indicating that is has never been setup and should
>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
>>>>>> implements.
>>>>>>
>>>>>> In addition, Zijun's change removes the influence of both
>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>>>>>> that those flags should not influence the sending of the VSCs in the
>>>>>> shutdown path. If I understand KK's objections properly, this is where
>>>>>> his objection is stemming from. KK, is this correct?
>>>>>>
>>>>>> Zijun's proposed fix can be seen here:
>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>>>>
>>>>>> I'm wondering if we can resolve this impasse by splitting the change
>>>>>> into two changes, as follows:
>>>>>>
>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
>>>>>> flags in the shutdown path.
>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
>>>>>>
>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
>>>>>> landed since the logic current appears to be at odds with how the
>>>>>> hardware works.
>>>>>>
>>>>>> Also, I am happy to split the patches into the two patches, or (maybe
>>>>>> more ideally) just modify the commit message to better indicate the
>>>>>> reason the change. I just need guidance from maintainers so that
>>>>>> whatever work I do leads to something acceptable for y'all.
>>>>>>
>>>>>> So, please help me get this done. I am just a user with broken hardware
>>>>>> and a fondness for Linux. I would love to help do what's needed to get
>>>>>> this fix landed.
>>>>>
>>>>> Ive also objected to that change, in fact the whole shutdown sequence
>>>>> is sort of bogus in my opinion and the driver shall really have some
>>>>> means to find out what mode it is in when it reboots, regardless if
>>>>> cold or warm boot, since otherwise we are in trouble if the user is
>>>>> booting from another OS that doesn't do the expected shutdown
>>>>> sequence.
>>>>
>>>> This criticism makes a ton of sense. I'm sorry I missed it before. There
>>>> were a lot of threads moving in parallel. However, I am curious. Given
>>>> that the patch improves the situation for users (like me). Is there any
>>>> way we can separate the redesign of the shutdown sequence and the UX
>>>> improvement that comes with this patch?
>>>>
>>>> Here's my concern. I am happy to do the work to redesign this. However,
>>>> I don't think I have the information needed to do this since I don't
>>>> have access to the technical docs for the qca6390. I am worried that not
>>>> accepting some form of this patch is letting perfect be the enemy of the
>>>> good. And I am not sure how I personally can help with that. If you
>>>> think it's possible for me to do this without the docs for the hardware,
>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
>>>> wish I had the skill to be confident about a change like this, but I don't.
>>>>
>>>> Any ideas on how to move forward would be greatly appreciated.
>>>>
>>>> And just to be perfectly clear, I have tested this patch on my laptop.
>>>> It greatly enhances my ability to use my hardware since I can reboot the
>>>> machine without having to make sure to power cycle the laptop. This is
>>>> not a theoretical improvement.
>>>
>>> I would really love some explanation why can't the driver know under
>>> what mode the controller is when it gets probed, because to me we
>>> cannot accept a driver that only works under certain condition after
>>> the boot and in case it is really impossible, can't even power cycle
>>> it to get it back to cold boot stage???
>>
>> This is a great technical criticism of the driver, and I think you
>> deserve that explanation.
>>
>> However, with the driver already in the kernel, shouldn't the bias be
>> toward mitigating the extremely bad UX and not hold users hostage for
>> the bad design which has already been approved and landed in the kernel?
>>
>>> Also the criticism here should be directed to the vendor, how long
>>> have we been discussing problems in the QCA driver? And the only thing
>>> I see coming our way are work-arounds of the problems, the address not
>>> being unique coming from the firmware itself and when provided via DT
>>> the address is in the wrong byteorder and now that the driver must
>>> communicate the firmware on shutdown in order to get it working
>>> properly on the next boot.
>>
>> I agree that Qualcomm should get flack for this, however, the UX problem
>> can be mitigated with a logic fix that doesn't make the init/shutdown
>> design problem any worse than it currently seems to be. I mean, wouldn't
>> this logic have to exist somewhere even if it weren't the shutdown path?
>>
>> If you are trying to use this as leverage to get Qualcomm to do a bigger
>> thing (redesign the init/shutdown logic), I do think that tactic
>> needlessly puts users in the crossfire. I can totally understand why
>> you'd do it. I am just suffering the crossfire in the meantime, and it
>> doesn't feel great.
>
> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
> not working? Really? Because I haven't seen any configuration that
> those changes you tested don't reintroduce the UAF, which is why I
> haven't applied that change to begin with, so no I'm not holding back
> to pressure Qualcomm to redesign the shutdown logic, it just these
> things got entangled because I just realized the shutdown thingy is
> really out of place, imo, but I'd be fine if there is a temporary fix
> until someone finally decide to spend some time to really fix the
> shutdown logic.

The no-regression policy is clear to not cause regressions for users.
Even if we had crash reports from before, fixing one bug and causing
another is unwanted to my understanding.


Kind regards,

Paul

2024-05-10 23:21:10

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Hi Paul,

On Fri, May 10, 2024 at 6:57 PM Paul Menzel <[email protected]> wrote:
>
> Dear Luiz,
>
>
> Am 10.05.24 um 23:25 schrieb Luiz Augusto von Dentz:
>
> > On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
> >>
> >> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>
> >>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]> wrote:
> >>>>
> >>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>
> >>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
> >>>>>>
> >>>>>> Krzysztof,
> >>>>>>
> >>>>>> I am reaching out to you as you had the most important objections to the
> >>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
> >>>>>> experiencing.
> >>>>>>
> >>>>>> For context, the problem is that the hci_uart module will send specific
> >>>>>> vendor specfic commands during shutdown of the hardware under most
> >>>>>> situations. These VSCs put the bluetooth device into a non-working state
> >>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
> >>>>>>
> >>>>>> Zijun's proposed fix is to not send these commands when it's not
> >>>>>> appropriate for the hardware. The vendor commands should be avoided when
> >>>>>> the hardware does not have persistent configuration or when the device
> >>>>>> is in setup state (indicating that is has never been setup and should
> >>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
> >>>>>> implements.
> >>>>>>
> >>>>>> In addition, Zijun's change removes the influence of both
> >>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> >>>>>> that those flags should not influence the sending of the VSCs in the
> >>>>>> shutdown path. If I understand KK's objections properly, this is where
> >>>>>> his objection is stemming from. KK, is this correct?
> >>>>>>
> >>>>>> Zijun's proposed fix can be seen here:
> >>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> >>>>>>
> >>>>>> I'm wondering if we can resolve this impasse by splitting the change
> >>>>>> into two changes, as follows:
> >>>>>>
> >>>>>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
> >>>>>> flags in the shutdown path.
> >>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
> >>>>>>
> >>>>>> I'm hoping that better clearer descriptions for #1 can help get that
> >>>>>> landed since the logic current appears to be at odds with how the
> >>>>>> hardware works.
> >>>>>>
> >>>>>> Also, I am happy to split the patches into the two patches, or (maybe
> >>>>>> more ideally) just modify the commit message to better indicate the
> >>>>>> reason the change. I just need guidance from maintainers so that
> >>>>>> whatever work I do leads to something acceptable for y'all.
> >>>>>>
> >>>>>> So, please help me get this done. I am just a user with broken hardware
> >>>>>> and a fondness for Linux. I would love to help do what's needed to get
> >>>>>> this fix landed.
> >>>>>
> >>>>> Ive also objected to that change, in fact the whole shutdown sequence
> >>>>> is sort of bogus in my opinion and the driver shall really have some
> >>>>> means to find out what mode it is in when it reboots, regardless if
> >>>>> cold or warm boot, since otherwise we are in trouble if the user is
> >>>>> booting from another OS that doesn't do the expected shutdown
> >>>>> sequence.
> >>>>
> >>>> This criticism makes a ton of sense. I'm sorry I missed it before. There
> >>>> were a lot of threads moving in parallel. However, I am curious. Given
> >>>> that the patch improves the situation for users (like me). Is there any
> >>>> way we can separate the redesign of the shutdown sequence and the UX
> >>>> improvement that comes with this patch?
> >>>>
> >>>> Here's my concern. I am happy to do the work to redesign this. However,
> >>>> I don't think I have the information needed to do this since I don't
> >>>> have access to the technical docs for the qca6390. I am worried that not
> >>>> accepting some form of this patch is letting perfect be the enemy of the
> >>>> good. And I am not sure how I personally can help with that. If you
> >>>> think it's possible for me to do this without the docs for the hardware,
> >>>> I am willing to give it a shot if I can get some guidance. Honestly, I
> >>>> wish I had the skill to be confident about a change like this, but I don't.
> >>>>
> >>>> Any ideas on how to move forward would be greatly appreciated.
> >>>>
> >>>> And just to be perfectly clear, I have tested this patch on my laptop.
> >>>> It greatly enhances my ability to use my hardware since I can reboot the
> >>>> machine without having to make sure to power cycle the laptop. This is
> >>>> not a theoretical improvement.
> >>>
> >>> I would really love some explanation why can't the driver know under
> >>> what mode the controller is when it gets probed, because to me we
> >>> cannot accept a driver that only works under certain condition after
> >>> the boot and in case it is really impossible, can't even power cycle
> >>> it to get it back to cold boot stage???
> >>
> >> This is a great technical criticism of the driver, and I think you
> >> deserve that explanation.
> >>
> >> However, with the driver already in the kernel, shouldn't the bias be
> >> toward mitigating the extremely bad UX and not hold users hostage for
> >> the bad design which has already been approved and landed in the kernel?
> >>
> >>> Also the criticism here should be directed to the vendor, how long
> >>> have we been discussing problems in the QCA driver? And the only thing
> >>> I see coming our way are work-arounds of the problems, the address not
> >>> being unique coming from the firmware itself and when provided via DT
> >>> the address is in the wrong byteorder and now that the driver must
> >>> communicate the firmware on shutdown in order to get it working
> >>> properly on the next boot.
> >>
> >> I agree that Qualcomm should get flack for this, however, the UX problem
> >> can be mitigated with a logic fix that doesn't make the init/shutdown
> >> design problem any worse than it currently seems to be. I mean, wouldn't
> >> this logic have to exist somewhere even if it weren't the shutdown path?
> >>
> >> If you are trying to use this as leverage to get Qualcomm to do a bigger
> >> thing (redesign the init/shutdown logic), I do think that tactic
> >> needlessly puts users in the crossfire. I can totally understand why
> >> you'd do it. I am just suffering the crossfire in the meantime, and it
> >> doesn't feel great.
> >
> > So you prefer to risk getting a kernel crash due to UAF over Bluetooth
> > not working? Really? Because I haven't seen any configuration that
> > those changes you tested don't reintroduce the UAF, which is why I
> > haven't applied that change to begin with, so no I'm not holding back
> > to pressure Qualcomm to redesign the shutdown logic, it just these
> > things got entangled because I just realized the shutdown thingy is
> > really out of place, imo, but I'd be fine if there is a temporary fix
> > until someone finally decide to spend some time to really fix the
> > shutdown logic.
>
> The no-regression policy is clear to not cause regressions for users.
> Even if we had crash reports from before, fixing one bug and causing
> another is unwanted to my understanding.

So you are also suggesting that a system crash is less of a problem
then a regression? I'm not opposed to having a fix to the regression,
but it needs to come without the reintroduction of a system crash,
period. The no-regression policy only applies as long as it doesn't
cause even bigger problems like a system crash or a security problem
for example, otherwise we need to find a better option, which in my
opinion is to rework the driver shutdown logic, if I had the hardware
and could experiment with I would just attempt to detect the hardware
mode at probe and if that fails force a reset and start over since the
setup stage exists exactly for the driver to have exclusive access to
the controller it can in theory power cycle the controller.

>
> Kind regards,
>
> Paul



--
Luiz Augusto von Dentz

2024-05-10 23:33:53

by Wren Turkal

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
> Hi Wren,
>
> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
>>
>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>>> Hi Wren,
>>>
>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]> wrote:
>>>>
>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>>>>> Hi Wren,
>>>>>
>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]> wrote:
>>>>>>
>>>>>> Krzysztof,
>>>>>>
>>>>>> I am reaching out to you as you had the most important objections to the
>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
>>>>>> experiencing.
>>>>>>
>>>>>> For context, the problem is that the hci_uart module will send specific
>>>>>> vendor specfic commands during shutdown of the hardware under most
>>>>>> situations. These VSCs put the bluetooth device into a non-working state
>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>>>
>>>>>> Zijun's proposed fix is to not send these commands when it's not
>>>>>> appropriate for the hardware. The vendor commands should be avoided when
>>>>>> the hardware does not have persistent configuration or when the device
>>>>>> is in setup state (indicating that is has never been setup and should
>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's patch
>>>>>> implements.
>>>>>>
>>>>>> In addition, Zijun's change removes the influence of both
>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>>>>>> that those flags should not influence the sending of the VSCs in the
>>>>>> shutdown path. If I understand KK's objections properly, this is where
>>>>>> his objection is stemming from. KK, is this correct?
>>>>>>
>>>>>> Zijun's proposed fix can be seen here:
>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>>>>
>>>>>> I'm wondering if we can resolve this impasse by splitting the change
>>>>>> into two changes, as follows:
>>>>>>
>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and HCI_RUNNING
>>>>>> flags in the shutdown path.
>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward configuration.
>>>>>>
>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
>>>>>> landed since the logic current appears to be at odds with how the
>>>>>> hardware works.
>>>>>>
>>>>>> Also, I am happy to split the patches into the two patches, or (maybe
>>>>>> more ideally) just modify the commit message to better indicate the
>>>>>> reason the change. I just need guidance from maintainers so that
>>>>>> whatever work I do leads to something acceptable for y'all.
>>>>>>
>>>>>> So, please help me get this done. I am just a user with broken hardware
>>>>>> and a fondness for Linux. I would love to help do what's needed to get
>>>>>> this fix landed.
>>>>>
>>>>> Ive also objected to that change, in fact the whole shutdown sequence
>>>>> is sort of bogus in my opinion and the driver shall really have some
>>>>> means to find out what mode it is in when it reboots, regardless if
>>>>> cold or warm boot, since otherwise we are in trouble if the user is
>>>>> booting from another OS that doesn't do the expected shutdown
>>>>> sequence.
>>>>
>>>> This criticism makes a ton of sense. I'm sorry I missed it before. There
>>>> were a lot of threads moving in parallel. However, I am curious. Given
>>>> that the patch improves the situation for users (like me). Is there any
>>>> way we can separate the redesign of the shutdown sequence and the UX
>>>> improvement that comes with this patch?
>>>>
>>>> Here's my concern. I am happy to do the work to redesign this. However,
>>>> I don't think I have the information needed to do this since I don't
>>>> have access to the technical docs for the qca6390. I am worried that not
>>>> accepting some form of this patch is letting perfect be the enemy of the
>>>> good. And I am not sure how I personally can help with that. If you
>>>> think it's possible for me to do this without the docs for the hardware,
>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
>>>> wish I had the skill to be confident about a change like this, but I don't.
>>>>
>>>> Any ideas on how to move forward would be greatly appreciated.
>>>>
>>>> And just to be perfectly clear, I have tested this patch on my laptop.
>>>> It greatly enhances my ability to use my hardware since I can reboot the
>>>> machine without having to make sure to power cycle the laptop. This is
>>>> not a theoretical improvement.
>>>
>>> I would really love some explanation why can't the driver know under
>>> what mode the controller is when it gets probed, because to me we
>>> cannot accept a driver that only works under certain condition after
>>> the boot and in case it is really impossible, can't even power cycle
>>> it to get it back to cold boot stage???
>>
>> This is a great technical criticism of the driver, and I think you
>> deserve that explanation.
>>
>> However, with the driver already in the kernel, shouldn't the bias be
>> toward mitigating the extremely bad UX and not hold users hostage for
>> the bad design which has already been approved and landed in the kernel?
>>
>>> Also the criticism here should be directed to the vendor, how long
>>> have we been discussing problems in the QCA driver? And the only thing
>>> I see coming our way are work-arounds of the problems, the address not
>>> being unique coming from the firmware itself and when provided via DT
>>> the address is in the wrong byteorder and now that the driver must
>>> communicate the firmware on shutdown in order to get it working
>>> properly on the next boot.
>>
>> I agree that Qualcomm should get flack for this, however, the UX problem
>> can be mitigated with a logic fix that doesn't make the init/shutdown
>> design problem any worse than it currently seems to be. I mean, wouldn't
>> this logic have to exist somewhere even if it weren't the shutdown path?
>>
>> If you are trying to use this as leverage to get Qualcomm to do a bigger
>> thing (redesign the init/shutdown logic), I do think that tactic
>> needlessly puts users in the crossfire. I can totally understand why
>> you'd do it. I am just suffering the crossfire in the meantime, and it
>> doesn't feel great.
>
> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
> not working? Really? Because I haven't seen any configuration that
> those changes you tested don't reintroduce the UAF, which is why I
> haven't applied that change to begin with, so no I'm not holding back
> to pressure Qualcomm to redesign the shutdown logic, it just these
> things got entangled because I just realized the shutdown thingy is
> really out of place, imo, but I'd be fine if there is a temporary fix
> until someone finally decide to spend some time to really fix the
> shutdown logic.

Luiz, I'm sorry. I do not want a crash instead. I didn't understand that
the solution I proposed (i.e. adding Zijun's logic without removing KK's
logic) would introduce a new crash opportunity. I previously thought you
were saying one of the following things:
1. The crash opportunity already existed due the init/shudown sequences.
2. The crash opportunity already existed due the init/shudown sequences
when removing KK's logic.

If it was #1, I was hoping that adding the logic would make the risk no
worse.

If it was #2, I was hoping that my suggestion of adding Zijun's logic
without removing KK's logic might represent an acceptable middleground
for a temporary fix that would "correct" the logic without introducing a
new crash opportunity.

I feel like I may not be clear about what I mean by adding Zijun's logic
and not removing KK's logic. Maybe something like this diff:

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 2f7ae38d85eb..fcac44ae7898 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
!test_bit(HCI_RUNNING, &hdev->flags))
return;

+ if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
&hdev->quirks) ||
+ hci_dev_test_flag(hdev, HCI_SETUP))
+ return;
+
serdev_device_write_flush(serdev);
ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
sizeof(ibs_wake_cmd));

I think this diff is mangled due to using Thunderbird, but I hope this
helps convey what I was asking about.

If I am understanding you correctly now, you are saying that simply
introducing Zijun's logic (without removing KK's logic) will introduce a
new crash opportunity. Is that correct?

Thanks,
wt
--
You're more amazing than you think!

2024-05-11 06:27:37

by Lk Sii

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390



On 2024/5/11 07:33, Wren Turkal wrote:
> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
>> Hi Wren,
>>
>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
>>>
>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>>>> Hi Wren,
>>>>
>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]>
>>>> wrote:
>>>>>
>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>>>>>> Hi Wren,
>>>>>>
>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Krzysztof,
>>>>>>>
>>>>>>> I am reaching out to you as you had the most important objections
>>>>>>> to the
>>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
>>>>>>> experiencing.
>>>>>>>
>>>>>>> For context, the problem is that the hci_uart module will send
>>>>>>> specific
>>>>>>> vendor specfic commands during shutdown of the hardware under most
>>>>>>> situations. These VSCs put the bluetooth device into a
>>>>>>> non-working state
>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>>>>
>>>>>>> Zijun's proposed fix is to not send these commands when it's not
>>>>>>> appropriate for the hardware. The vendor commands should be
>>>>>>> avoided when
>>>>>>> the hardware does not have persistent configuration or when the
>>>>>>> device
>>>>>>> is in setup state (indicating that is has never been setup and
>>>>>>> should
>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's
>>>>>>> patch
>>>>>>> implements.
>>>>>>>
>>>>>>> In addition, Zijun's change removes the influence of both
>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>>>>>>> that those flags should not influence the sending of the VSCs in the
>>>>>>> shutdown path. If I understand KK's objections properly, this is
>>>>>>> where
>>>>>>> his objection is stemming from. KK, is this correct?
>>>>>>>
>>>>>>> Zijun's proposed fix can be seen here:
>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>>>>>
>>>>>>> I'm wondering if we can resolve this impasse by splitting the change
>>>>>>> into two changes, as follows:
>>>>>>>
>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and
>>>>>>> HCI_RUNNING
>>>>>>> flags in the shutdown path.
>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward
>>>>>>> configuration.
>>>>>>>
>>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
>>>>>>> landed since the logic current appears to be at odds with how the
>>>>>>> hardware works.
>>>>>>>
>>>>>>> Also, I am happy to split the patches into the two patches, or
>>>>>>> (maybe
>>>>>>> more ideally) just modify the commit message to better indicate the
>>>>>>> reason the change. I just need guidance from maintainers so that
>>>>>>> whatever work I do leads to something acceptable for y'all.
>>>>>>>
>>>>>>> So, please help me get this done. I am just a user with broken
>>>>>>> hardware
>>>>>>> and a fondness for Linux. I would love to help do what's needed
>>>>>>> to get
>>>>>>> this fix landed.
>>>>>>
>>>>>> Ive also objected to that change, in fact the whole shutdown sequence
>>>>>> is sort of bogus in my opinion and the driver shall really have some
>>>>>> means to find out what mode it is in when it reboots, regardless if
>>>>>> cold or warm boot, since otherwise we are in trouble if the user is
>>>>>> booting from another OS that doesn't do the expected shutdown
>>>>>> sequence.
>>>>>
>>>>> This criticism makes a ton of sense. I'm sorry I missed it before.
>>>>> There
>>>>> were a lot of threads moving in parallel. However, I am curious. Given
>>>>> that the patch improves the situation for users (like me). Is there
>>>>> any
>>>>> way we can separate the redesign of the shutdown sequence and the UX
>>>>> improvement that comes with this patch?
>>>>>
>>>>> Here's my concern. I am happy to do the work to redesign this.
>>>>> However,
>>>>> I don't think I have the information needed to do this since I don't
>>>>> have access to the technical docs for the qca6390. I am worried
>>>>> that not
>>>>> accepting some form of this patch is letting perfect be the enemy
>>>>> of the
>>>>> good. And I am not sure how I personally can help with that. If you
>>>>> think it's possible for me to do this without the docs for the
>>>>> hardware,
>>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
>>>>> wish I had the skill to be confident about a change like this, but
>>>>> I don't.
>>>>>
>>>>> Any ideas on how to move forward would be greatly appreciated.
>>>>>
>>>>> And just to be perfectly clear, I have tested this patch on my laptop.
>>>>> It greatly enhances my ability to use my hardware since I can
>>>>> reboot the
>>>>> machine without having to make sure to power cycle the laptop. This is
>>>>> not a theoretical improvement.
>>>>
>>>> I would really love some explanation why can't the driver know under
>>>> what mode the controller is when it gets probed, because to me we
>>>> cannot accept a driver that only works under certain condition after
>>>> the boot and in case it is really impossible, can't even power cycle
>>>> it to get it back to cold boot stage???
>>>
>>> This is a great technical criticism of the driver, and I think you
>>> deserve that explanation.
>>>
>>> However, with the driver already in the kernel, shouldn't the bias be
>>> toward mitigating the extremely bad UX and not hold users hostage for
>>> the bad design which has already been approved and landed in the kernel?
>>>
>>>> Also the criticism here should be directed to the vendor, how long
>>>> have we been discussing problems in the QCA driver? And the only thing
>>>> I see coming our way are work-arounds of the problems, the address not
>>>> being unique coming from the firmware itself and when provided via DT
>>>> the address is in the wrong byteorder and now that the driver must
>>>> communicate the firmware on shutdown in order to get it working
>>>> properly on the next boot.
>>>
>>> I agree that Qualcomm should get flack for this, however, the UX problem
>>> can be mitigated with a logic fix that doesn't make the init/shutdown
>>> design problem any worse than it currently seems to be. I mean, wouldn't
>>> this logic have to exist somewhere even if it weren't the shutdown path?
>>>
>>> If you are trying to use this as leverage to get Qualcomm to do a bigger
>>> thing (redesign the init/shutdown logic), I do think that tactic
>>> needlessly puts users in the crossfire. I can totally understand why
>>> you'd do it. I am just suffering the crossfire in the meantime, and it
>>> doesn't feel great.
>>
>> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
>> not working? Really? Because I haven't seen any configuration that
>> those changes you tested don't reintroduce the UAF, which is why I
>> haven't applied that change to begin with, so no I'm not holding back
>> to pressure Qualcomm to redesign the shutdown logic, it just these
>> things got entangled because I just realized the shutdown thingy is
>> really out of place, imo, but I'd be fine if there is a temporary fix
>> until someone finally decide to spend some time to really fix the
>> shutdown logic.
>
> Luiz, I'm sorry. I do not want a crash instead. I didn't understand that
> the solution I proposed (i.e. adding Zijun's logic without removing KK's
> logic) would introduce a new crash opportunity. I previously thought you
> were saying one of the following things:
> 1. The crash opportunity already existed due the init/shudown sequences.
> 2. The crash opportunity already existed due the init/shudown sequences
> when removing KK's logic.
>
> If it was #1, I was hoping that adding the logic would make the risk no
> worse.
>
> If it was #2, I was hoping that my suggestion of adding Zijun's logic
> without removing KK's logic might represent an acceptable middleground
> for a temporary fix that would "correct" the logic without introducing a
> new crash opportunity.
>
> I feel like I may not be clear about what I mean by adding Zijun's logic
> and not removing KK's logic. Maybe something like this diff:
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 2f7ae38d85eb..fcac44ae7898 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
>                      !test_bit(HCI_RUNNING, &hdev->flags))
>                          return;
>
> +               if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
> &hdev->quirks) ||
> +                   hci_dev_test_flag(hdev, HCI_SETUP))
> +                       return;
> +
>                  serdev_device_write_flush(serdev);
>                  ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>                                                sizeof(ibs_wake_cmd));
>
> I think this diff is mangled due to using Thunderbird, but I hope this
> helps convey what I was asking about.
>
> If I am understanding you correctly now, you are saying that simply
> introducing Zijun's logic (without removing KK's logic) will introduce a
> new crash opportunity. Is that correct?
>

as Zijun declared. i believe Zijun's change will solve both this
reported regression issue and the use-after-free(crash) issue.

> Thanks,
> wt


2024-05-13 20:13:57

by Wren Turkal

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

On 5/10/24 11:25 PM, Lk Sii wrote:
> On 2024/5/11 07:33, Wren Turkal wrote:
>> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
>>> Hi Wren,
>>>
>>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
>>>>
>>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>>>>> Hi Wren,
>>>>>
>>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>>>>>>> Hi Wren,
>>>>>>>
>>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Krzysztof,
>>>>>>>>
>>>>>>>> I am reaching out to you as you had the most important objections
>>>>>>>> to the
>>>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
>>>>>>>> experiencing.
>>>>>>>>
>>>>>>>> For context, the problem is that the hci_uart module will send
>>>>>>>> specific
>>>>>>>> vendor specfic commands during shutdown of the hardware under most
>>>>>>>> situations. These VSCs put the bluetooth device into a
>>>>>>>> non-working state
>>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>>>>>
>>>>>>>> Zijun's proposed fix is to not send these commands when it's not
>>>>>>>> appropriate for the hardware. The vendor commands should be
>>>>>>>> avoided when
>>>>>>>> the hardware does not have persistent configuration or when the
>>>>>>>> device
>>>>>>>> is in setup state (indicating that is has never been setup and
>>>>>>>> should
>>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's
>>>>>>>> patch
>>>>>>>> implements.
>>>>>>>>
>>>>>>>> In addition, Zijun's change removes the influence of both
>>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>>>>>>>> that those flags should not influence the sending of the VSCs in the
>>>>>>>> shutdown path. If I understand KK's objections properly, this is
>>>>>>>> where
>>>>>>>> his objection is stemming from. KK, is this correct?
>>>>>>>>
>>>>>>>> Zijun's proposed fix can be seen here:
>>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>>>>>>
>>>>>>>> I'm wondering if we can resolve this impasse by splitting the change
>>>>>>>> into two changes, as follows:
>>>>>>>>
>>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and
>>>>>>>> HCI_RUNNING
>>>>>>>> flags in the shutdown path.
>>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward
>>>>>>>> configuration.
>>>>>>>>
>>>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
>>>>>>>> landed since the logic current appears to be at odds with how the
>>>>>>>> hardware works.
>>>>>>>>
>>>>>>>> Also, I am happy to split the patches into the two patches, or
>>>>>>>> (maybe
>>>>>>>> more ideally) just modify the commit message to better indicate the
>>>>>>>> reason the change. I just need guidance from maintainers so that
>>>>>>>> whatever work I do leads to something acceptable for y'all.
>>>>>>>>
>>>>>>>> So, please help me get this done. I am just a user with broken
>>>>>>>> hardware
>>>>>>>> and a fondness for Linux. I would love to help do what's needed
>>>>>>>> to get
>>>>>>>> this fix landed.
>>>>>>>
>>>>>>> Ive also objected to that change, in fact the whole shutdown sequence
>>>>>>> is sort of bogus in my opinion and the driver shall really have some
>>>>>>> means to find out what mode it is in when it reboots, regardless if
>>>>>>> cold or warm boot, since otherwise we are in trouble if the user is
>>>>>>> booting from another OS that doesn't do the expected shutdown
>>>>>>> sequence.
>>>>>>
>>>>>> This criticism makes a ton of sense. I'm sorry I missed it before.
>>>>>> There
>>>>>> were a lot of threads moving in parallel. However, I am curious. Given
>>>>>> that the patch improves the situation for users (like me). Is there
>>>>>> any
>>>>>> way we can separate the redesign of the shutdown sequence and the UX
>>>>>> improvement that comes with this patch?
>>>>>>
>>>>>> Here's my concern. I am happy to do the work to redesign this.
>>>>>> However,
>>>>>> I don't think I have the information needed to do this since I don't
>>>>>> have access to the technical docs for the qca6390. I am worried
>>>>>> that not
>>>>>> accepting some form of this patch is letting perfect be the enemy
>>>>>> of the
>>>>>> good. And I am not sure how I personally can help with that. If you
>>>>>> think it's possible for me to do this without the docs for the
>>>>>> hardware,
>>>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
>>>>>> wish I had the skill to be confident about a change like this, but
>>>>>> I don't.
>>>>>>
>>>>>> Any ideas on how to move forward would be greatly appreciated.
>>>>>>
>>>>>> And just to be perfectly clear, I have tested this patch on my laptop.
>>>>>> It greatly enhances my ability to use my hardware since I can
>>>>>> reboot the
>>>>>> machine without having to make sure to power cycle the laptop. This is
>>>>>> not a theoretical improvement.
>>>>>
>>>>> I would really love some explanation why can't the driver know under
>>>>> what mode the controller is when it gets probed, because to me we
>>>>> cannot accept a driver that only works under certain condition after
>>>>> the boot and in case it is really impossible, can't even power cycle
>>>>> it to get it back to cold boot stage???
>>>>
>>>> This is a great technical criticism of the driver, and I think you
>>>> deserve that explanation.
>>>>
>>>> However, with the driver already in the kernel, shouldn't the bias be
>>>> toward mitigating the extremely bad UX and not hold users hostage for
>>>> the bad design which has already been approved and landed in the kernel?
>>>>
>>>>> Also the criticism here should be directed to the vendor, how long
>>>>> have we been discussing problems in the QCA driver? And the only thing
>>>>> I see coming our way are work-arounds of the problems, the address not
>>>>> being unique coming from the firmware itself and when provided via DT
>>>>> the address is in the wrong byteorder and now that the driver must
>>>>> communicate the firmware on shutdown in order to get it working
>>>>> properly on the next boot.
>>>>
>>>> I agree that Qualcomm should get flack for this, however, the UX problem
>>>> can be mitigated with a logic fix that doesn't make the init/shutdown
>>>> design problem any worse than it currently seems to be. I mean, wouldn't
>>>> this logic have to exist somewhere even if it weren't the shutdown path?
>>>>
>>>> If you are trying to use this as leverage to get Qualcomm to do a bigger
>>>> thing (redesign the init/shutdown logic), I do think that tactic
>>>> needlessly puts users in the crossfire. I can totally understand why
>>>> you'd do it. I am just suffering the crossfire in the meantime, and it
>>>> doesn't feel great.
>>>
>>> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
>>> not working? Really? Because I haven't seen any configuration that
>>> those changes you tested don't reintroduce the UAF, which is why I
>>> haven't applied that change to begin with, so no I'm not holding back
>>> to pressure Qualcomm to redesign the shutdown logic, it just these
>>> things got entangled because I just realized the shutdown thingy is
>>> really out of place, imo, but I'd be fine if there is a temporary fix
>>> until someone finally decide to spend some time to really fix the
>>> shutdown logic.
>>
>> Luiz, I'm sorry. I do not want a crash instead. I didn't understand that
>> the solution I proposed (i.e. adding Zijun's logic without removing KK's
>> logic) would introduce a new crash opportunity. I previously thought you
>> were saying one of the following things:
>> 1. The crash opportunity already existed due the init/shudown sequences.
>> 2. The crash opportunity already existed due the init/shudown sequences
>> when removing KK's logic.
>>
>> If it was #1, I was hoping that adding the logic would make the risk no
>> worse.
>>
>> If it was #2, I was hoping that my suggestion of adding Zijun's logic
>> without removing KK's logic might represent an acceptable middleground
>> for a temporary fix that would "correct" the logic without introducing a
>> new crash opportunity.
>>
>> I feel like I may not be clear about what I mean by adding Zijun's logic
>> and not removing KK's logic. Maybe something like this diff:
>>
>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>> index 2f7ae38d85eb..fcac44ae7898 100644
>> --- a/drivers/bluetooth/hci_qca.c
>> +++ b/drivers/bluetooth/hci_qca.c
>> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
>>                      !test_bit(HCI_RUNNING, &hdev->flags))
>>                          return;
>>
>> +               if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
>> &hdev->quirks) ||
>> +                   hci_dev_test_flag(hdev, HCI_SETUP))
>> +                       return;
>> +
>>                  serdev_device_write_flush(serdev);
>>                  ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>>                                                sizeof(ibs_wake_cmd));
>>
>> I think this diff is mangled due to using Thunderbird, but I hope this
>> helps convey what I was asking about.
>>
>> If I am understanding you correctly now, you are saying that simply
>> introducing Zijun's logic (without removing KK's logic) will introduce a
>> new crash opportunity. Is that correct?
>>
>
> as Zijun declared. i believe Zijun's change will solve both this
> reported regression issue and the use-after-free(crash) issue.

I did see Zijun's claim of that. However, I think that both KK and Luiz
are not convinced by the explanation. Also, if that explanation does
convince KK and Luiz, I think that the explanation needs to be added to
the commit message.

I'm hoping that Luiz will at least respond to the middleground I
proposed as a workaround.

Luiz and KK, I also want y'all to know that I really appreciate your
striving to make this work well and taking the time to explain some of
it to me. I know I must seem like a pest at this point, and I am sorry
for that. I do wish I had more ability to propose a workable fix, both
for the reason that I am personally being hurt by this problem and the
fact that I think it reflects badly on Linux to have an issue like this
exposed to users.

wt
--
You're more amazing than you think!

2024-05-13 20:46:39

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Hi Wren,

On Mon, May 13, 2024 at 4:13 PM Wren Turkal <[email protected]> wrote:
>
> On 5/10/24 11:25 PM, Lk Sii wrote:
> > On 2024/5/11 07:33, Wren Turkal wrote:
> >> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
> >>> Hi Wren,
> >>>
> >>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
> >>>>
> >>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
> >>>>> Hi Wren,
> >>>>>
> >>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]>
> >>>>> wrote:
> >>>>>>
> >>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
> >>>>>>> Hi Wren,
> >>>>>>>
> >>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Krzysztof,
> >>>>>>>>
> >>>>>>>> I am reaching out to you as you had the most important objections
> >>>>>>>> to the
> >>>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
> >>>>>>>> experiencing.
> >>>>>>>>
> >>>>>>>> For context, the problem is that the hci_uart module will send
> >>>>>>>> specific
> >>>>>>>> vendor specfic commands during shutdown of the hardware under most
> >>>>>>>> situations. These VSCs put the bluetooth device into a
> >>>>>>>> non-working state
> >>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
> >>>>>>>>
> >>>>>>>> Zijun's proposed fix is to not send these commands when it's not
> >>>>>>>> appropriate for the hardware. The vendor commands should be
> >>>>>>>> avoided when
> >>>>>>>> the hardware does not have persistent configuration or when the
> >>>>>>>> device
> >>>>>>>> is in setup state (indicating that is has never been setup and
> >>>>>>>> should
> >>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's
> >>>>>>>> patch
> >>>>>>>> implements.
> >>>>>>>>
> >>>>>>>> In addition, Zijun's change removes the influence of both
> >>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
> >>>>>>>> that those flags should not influence the sending of the VSCs in the
> >>>>>>>> shutdown path. If I understand KK's objections properly, this is
> >>>>>>>> where
> >>>>>>>> his objection is stemming from. KK, is this correct?
> >>>>>>>>
> >>>>>>>> Zijun's proposed fix can be seen here:
> >>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
> >>>>>>>>
> >>>>>>>> I'm wondering if we can resolve this impasse by splitting the change
> >>>>>>>> into two changes, as follows:
> >>>>>>>>
> >>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and
> >>>>>>>> HCI_RUNNING
> >>>>>>>> flags in the shutdown path.
> >>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward
> >>>>>>>> configuration.
> >>>>>>>>
> >>>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
> >>>>>>>> landed since the logic current appears to be at odds with how the
> >>>>>>>> hardware works.
> >>>>>>>>
> >>>>>>>> Also, I am happy to split the patches into the two patches, or
> >>>>>>>> (maybe
> >>>>>>>> more ideally) just modify the commit message to better indicate the
> >>>>>>>> reason the change. I just need guidance from maintainers so that
> >>>>>>>> whatever work I do leads to something acceptable for y'all.
> >>>>>>>>
> >>>>>>>> So, please help me get this done. I am just a user with broken
> >>>>>>>> hardware
> >>>>>>>> and a fondness for Linux. I would love to help do what's needed
> >>>>>>>> to get
> >>>>>>>> this fix landed.
> >>>>>>>
> >>>>>>> Ive also objected to that change, in fact the whole shutdown sequence
> >>>>>>> is sort of bogus in my opinion and the driver shall really have some
> >>>>>>> means to find out what mode it is in when it reboots, regardless if
> >>>>>>> cold or warm boot, since otherwise we are in trouble if the user is
> >>>>>>> booting from another OS that doesn't do the expected shutdown
> >>>>>>> sequence.
> >>>>>>
> >>>>>> This criticism makes a ton of sense. I'm sorry I missed it before.
> >>>>>> There
> >>>>>> were a lot of threads moving in parallel. However, I am curious. Given
> >>>>>> that the patch improves the situation for users (like me). Is there
> >>>>>> any
> >>>>>> way we can separate the redesign of the shutdown sequence and the UX
> >>>>>> improvement that comes with this patch?
> >>>>>>
> >>>>>> Here's my concern. I am happy to do the work to redesign this.
> >>>>>> However,
> >>>>>> I don't think I have the information needed to do this since I don't
> >>>>>> have access to the technical docs for the qca6390. I am worried
> >>>>>> that not
> >>>>>> accepting some form of this patch is letting perfect be the enemy
> >>>>>> of the
> >>>>>> good. And I am not sure how I personally can help with that. If you
> >>>>>> think it's possible for me to do this without the docs for the
> >>>>>> hardware,
> >>>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
> >>>>>> wish I had the skill to be confident about a change like this, but
> >>>>>> I don't.
> >>>>>>
> >>>>>> Any ideas on how to move forward would be greatly appreciated.
> >>>>>>
> >>>>>> And just to be perfectly clear, I have tested this patch on my laptop.
> >>>>>> It greatly enhances my ability to use my hardware since I can
> >>>>>> reboot the
> >>>>>> machine without having to make sure to power cycle the laptop. This is
> >>>>>> not a theoretical improvement.
> >>>>>
> >>>>> I would really love some explanation why can't the driver know under
> >>>>> what mode the controller is when it gets probed, because to me we
> >>>>> cannot accept a driver that only works under certain condition after
> >>>>> the boot and in case it is really impossible, can't even power cycle
> >>>>> it to get it back to cold boot stage???
> >>>>
> >>>> This is a great technical criticism of the driver, and I think you
> >>>> deserve that explanation.
> >>>>
> >>>> However, with the driver already in the kernel, shouldn't the bias be
> >>>> toward mitigating the extremely bad UX and not hold users hostage for
> >>>> the bad design which has already been approved and landed in the kernel?
> >>>>
> >>>>> Also the criticism here should be directed to the vendor, how long
> >>>>> have we been discussing problems in the QCA driver? And the only thing
> >>>>> I see coming our way are work-arounds of the problems, the address not
> >>>>> being unique coming from the firmware itself and when provided via DT
> >>>>> the address is in the wrong byteorder and now that the driver must
> >>>>> communicate the firmware on shutdown in order to get it working
> >>>>> properly on the next boot.
> >>>>
> >>>> I agree that Qualcomm should get flack for this, however, the UX problem
> >>>> can be mitigated with a logic fix that doesn't make the init/shutdown
> >>>> design problem any worse than it currently seems to be. I mean, wouldn't
> >>>> this logic have to exist somewhere even if it weren't the shutdown path?
> >>>>
> >>>> If you are trying to use this as leverage to get Qualcomm to do a bigger
> >>>> thing (redesign the init/shutdown logic), I do think that tactic
> >>>> needlessly puts users in the crossfire. I can totally understand why
> >>>> you'd do it. I am just suffering the crossfire in the meantime, and it
> >>>> doesn't feel great.
> >>>
> >>> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
> >>> not working? Really? Because I haven't seen any configuration that
> >>> those changes you tested don't reintroduce the UAF, which is why I
> >>> haven't applied that change to begin with, so no I'm not holding back
> >>> to pressure Qualcomm to redesign the shutdown logic, it just these
> >>> things got entangled because I just realized the shutdown thingy is
> >>> really out of place, imo, but I'd be fine if there is a temporary fix
> >>> until someone finally decide to spend some time to really fix the
> >>> shutdown logic.
> >>
> >> Luiz, I'm sorry. I do not want a crash instead. I didn't understand that
> >> the solution I proposed (i.e. adding Zijun's logic without removing KK's
> >> logic) would introduce a new crash opportunity. I previously thought you
> >> were saying one of the following things:
> >> 1. The crash opportunity already existed due the init/shudown sequences.
> >> 2. The crash opportunity already existed due the init/shudown sequences
> >> when removing KK's logic.
> >>
> >> If it was #1, I was hoping that adding the logic would make the risk no
> >> worse.
> >>
> >> If it was #2, I was hoping that my suggestion of adding Zijun's logic
> >> without removing KK's logic might represent an acceptable middleground
> >> for a temporary fix that would "correct" the logic without introducing a
> >> new crash opportunity.
> >>
> >> I feel like I may not be clear about what I mean by adding Zijun's logic
> >> and not removing KK's logic. Maybe something like this diff:
> >>
> >> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> >> index 2f7ae38d85eb..fcac44ae7898 100644
> >> --- a/drivers/bluetooth/hci_qca.c
> >> +++ b/drivers/bluetooth/hci_qca.c
> >> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
> >> !test_bit(HCI_RUNNING, &hdev->flags))
> >> return;
> >>
> >> + if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
> >> &hdev->quirks) ||
> >> + hci_dev_test_flag(hdev, HCI_SETUP))
> >> + return;
> >> +
> >> serdev_device_write_flush(serdev);
> >> ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
> >> sizeof(ibs_wake_cmd));
> >>
> >> I think this diff is mangled due to using Thunderbird, but I hope this
> >> helps convey what I was asking about.
> >>
> >> If I am understanding you correctly now, you are saying that simply
> >> introducing Zijun's logic (without removing KK's logic) will introduce a
> >> new crash opportunity. Is that correct?
> >>
> >
> > as Zijun declared. i believe Zijun's change will solve both this
> > reported regression issue and the use-after-free(crash) issue.
>
> I did see Zijun's claim of that. However, I think that both KK and Luiz
> are not convinced by the explanation. Also, if that explanation does
> convince KK and Luiz, I think that the explanation needs to be added to
> the commit message.
>
> I'm hoping that Luiz will at least respond to the middleground I
> proposed as a workaround.

I recall suggesting using HCI_UART_PROTO_READY instead since that
tells when serdev_device_close has been run:

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 0c9c9ee56592..bbbc86d4932a 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -2455,8 +2455,8 @@ static void qca_serdev_shutdown(struct device *dev)
const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };

if (qcadev->btsoc_type == QCA_QCA6390) {
- if (test_bit(QCA_BT_OFF, &qca->flags) ||
- !test_bit(HCI_RUNNING, &hdev->flags))
+ /* Check if serdev_device_close() has already been called. */
+ if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
return;

serdev_device_write_flush(serdev);


--
Luiz Augusto von Dentz

2024-05-14 22:18:13

by Wren Turkal

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Hey Luiz,

On 5/13/24 1:46 PM, Luiz Augusto von Dentz wrote:
> Hi Wren,
>
> On Mon, May 13, 2024 at 4:13 PM Wren Turkal <[email protected]> wrote:
>>
>> On 5/10/24 11:25 PM, Lk Sii wrote:
>>> On 2024/5/11 07:33, Wren Turkal wrote:
>>>> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
>>>>> Hi Wren,
>>>>>
>>>>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]> wrote:
>>>>>>
>>>>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>>>>>>> Hi Wren,
>>>>>>>
>>>>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>>>>>>>>> Hi Wren,
>>>>>>>>>
>>>>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Krzysztof,
>>>>>>>>>>
>>>>>>>>>> I am reaching out to you as you had the most important objections
>>>>>>>>>> to the
>>>>>>>>>> change to fix qca6390 for the warm boot/module reload bug that I am
>>>>>>>>>> experiencing.
>>>>>>>>>>
>>>>>>>>>> For context, the problem is that the hci_uart module will send
>>>>>>>>>> specific
>>>>>>>>>> vendor specfic commands during shutdown of the hardware under most
>>>>>>>>>> situations. These VSCs put the bluetooth device into a
>>>>>>>>>> non-working state
>>>>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>>>>>>>
>>>>>>>>>> Zijun's proposed fix is to not send these commands when it's not
>>>>>>>>>> appropriate for the hardware. The vendor commands should be
>>>>>>>>>> avoided when
>>>>>>>>>> the hardware does not have persistent configuration or when the
>>>>>>>>>> device
>>>>>>>>>> is in setup state (indicating that is has never been setup and
>>>>>>>>>> should
>>>>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's
>>>>>>>>>> patch
>>>>>>>>>> implements.
>>>>>>>>>>
>>>>>>>>>> In addition, Zijun's change removes the influence of both
>>>>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun asserts
>>>>>>>>>> that those flags should not influence the sending of the VSCs in the
>>>>>>>>>> shutdown path. If I understand KK's objections properly, this is
>>>>>>>>>> where
>>>>>>>>>> his objection is stemming from. KK, is this correct?
>>>>>>>>>>
>>>>>>>>>> Zijun's proposed fix can be seen here:
>>>>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>>>>>>>>
>>>>>>>>>> I'm wondering if we can resolve this impasse by splitting the change
>>>>>>>>>> into two changes, as follows:
>>>>>>>>>>
>>>>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and
>>>>>>>>>> HCI_RUNNING
>>>>>>>>>> flags in the shutdown path.
>>>>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward
>>>>>>>>>> configuration.
>>>>>>>>>>
>>>>>>>>>> I'm hoping that better clearer descriptions for #1 can help get that
>>>>>>>>>> landed since the logic current appears to be at odds with how the
>>>>>>>>>> hardware works.
>>>>>>>>>>
>>>>>>>>>> Also, I am happy to split the patches into the two patches, or
>>>>>>>>>> (maybe
>>>>>>>>>> more ideally) just modify the commit message to better indicate the
>>>>>>>>>> reason the change. I just need guidance from maintainers so that
>>>>>>>>>> whatever work I do leads to something acceptable for y'all.
>>>>>>>>>>
>>>>>>>>>> So, please help me get this done. I am just a user with broken
>>>>>>>>>> hardware
>>>>>>>>>> and a fondness for Linux. I would love to help do what's needed
>>>>>>>>>> to get
>>>>>>>>>> this fix landed.
>>>>>>>>>
>>>>>>>>> Ive also objected to that change, in fact the whole shutdown sequence
>>>>>>>>> is sort of bogus in my opinion and the driver shall really have some
>>>>>>>>> means to find out what mode it is in when it reboots, regardless if
>>>>>>>>> cold or warm boot, since otherwise we are in trouble if the user is
>>>>>>>>> booting from another OS that doesn't do the expected shutdown
>>>>>>>>> sequence.
>>>>>>>>
>>>>>>>> This criticism makes a ton of sense. I'm sorry I missed it before.
>>>>>>>> There
>>>>>>>> were a lot of threads moving in parallel. However, I am curious. Given
>>>>>>>> that the patch improves the situation for users (like me). Is there
>>>>>>>> any
>>>>>>>> way we can separate the redesign of the shutdown sequence and the UX
>>>>>>>> improvement that comes with this patch?
>>>>>>>>
>>>>>>>> Here's my concern. I am happy to do the work to redesign this.
>>>>>>>> However,
>>>>>>>> I don't think I have the information needed to do this since I don't
>>>>>>>> have access to the technical docs for the qca6390. I am worried
>>>>>>>> that not
>>>>>>>> accepting some form of this patch is letting perfect be the enemy
>>>>>>>> of the
>>>>>>>> good. And I am not sure how I personally can help with that. If you
>>>>>>>> think it's possible for me to do this without the docs for the
>>>>>>>> hardware,
>>>>>>>> I am willing to give it a shot if I can get some guidance. Honestly, I
>>>>>>>> wish I had the skill to be confident about a change like this, but
>>>>>>>> I don't.
>>>>>>>>
>>>>>>>> Any ideas on how to move forward would be greatly appreciated.
>>>>>>>>
>>>>>>>> And just to be perfectly clear, I have tested this patch on my laptop.
>>>>>>>> It greatly enhances my ability to use my hardware since I can
>>>>>>>> reboot the
>>>>>>>> machine without having to make sure to power cycle the laptop. This is
>>>>>>>> not a theoretical improvement.
>>>>>>>
>>>>>>> I would really love some explanation why can't the driver know under
>>>>>>> what mode the controller is when it gets probed, because to me we
>>>>>>> cannot accept a driver that only works under certain condition after
>>>>>>> the boot and in case it is really impossible, can't even power cycle
>>>>>>> it to get it back to cold boot stage???
>>>>>>
>>>>>> This is a great technical criticism of the driver, and I think you
>>>>>> deserve that explanation.
>>>>>>
>>>>>> However, with the driver already in the kernel, shouldn't the bias be
>>>>>> toward mitigating the extremely bad UX and not hold users hostage for
>>>>>> the bad design which has already been approved and landed in the kernel?
>>>>>>
>>>>>>> Also the criticism here should be directed to the vendor, how long
>>>>>>> have we been discussing problems in the QCA driver? And the only thing
>>>>>>> I see coming our way are work-arounds of the problems, the address not
>>>>>>> being unique coming from the firmware itself and when provided via DT
>>>>>>> the address is in the wrong byteorder and now that the driver must
>>>>>>> communicate the firmware on shutdown in order to get it working
>>>>>>> properly on the next boot.
>>>>>>
>>>>>> I agree that Qualcomm should get flack for this, however, the UX problem
>>>>>> can be mitigated with a logic fix that doesn't make the init/shutdown
>>>>>> design problem any worse than it currently seems to be. I mean, wouldn't
>>>>>> this logic have to exist somewhere even if it weren't the shutdown path?
>>>>>>
>>>>>> If you are trying to use this as leverage to get Qualcomm to do a bigger
>>>>>> thing (redesign the init/shutdown logic), I do think that tactic
>>>>>> needlessly puts users in the crossfire. I can totally understand why
>>>>>> you'd do it. I am just suffering the crossfire in the meantime, and it
>>>>>> doesn't feel great.
>>>>>
>>>>> So you prefer to risk getting a kernel crash due to UAF over Bluetooth
>>>>> not working? Really? Because I haven't seen any configuration that
>>>>> those changes you tested don't reintroduce the UAF, which is why I
>>>>> haven't applied that change to begin with, so no I'm not holding back
>>>>> to pressure Qualcomm to redesign the shutdown logic, it just these
>>>>> things got entangled because I just realized the shutdown thingy is
>>>>> really out of place, imo, but I'd be fine if there is a temporary fix
>>>>> until someone finally decide to spend some time to really fix the
>>>>> shutdown logic.
>>>>
>>>> Luiz, I'm sorry. I do not want a crash instead. I didn't understand that
>>>> the solution I proposed (i.e. adding Zijun's logic without removing KK's
>>>> logic) would introduce a new crash opportunity. I previously thought you
>>>> were saying one of the following things:
>>>> 1. The crash opportunity already existed due the init/shudown sequences.
>>>> 2. The crash opportunity already existed due the init/shudown sequences
>>>> when removing KK's logic.
>>>>
>>>> If it was #1, I was hoping that adding the logic would make the risk no
>>>> worse.
>>>>
>>>> If it was #2, I was hoping that my suggestion of adding Zijun's logic
>>>> without removing KK's logic might represent an acceptable middleground
>>>> for a temporary fix that would "correct" the logic without introducing a
>>>> new crash opportunity.
>>>>
>>>> I feel like I may not be clear about what I mean by adding Zijun's logic
>>>> and not removing KK's logic. Maybe something like this diff:
>>>>
>>>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>>>> index 2f7ae38d85eb..fcac44ae7898 100644
>>>> --- a/drivers/bluetooth/hci_qca.c
>>>> +++ b/drivers/bluetooth/hci_qca.c
>>>> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct device *dev)
>>>> !test_bit(HCI_RUNNING, &hdev->flags))
>>>> return;
>>>>
>>>> + if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
>>>> &hdev->quirks) ||
>>>> + hci_dev_test_flag(hdev, HCI_SETUP))
>>>> + return;
>>>> +
>>>> serdev_device_write_flush(serdev);
>>>> ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>>>> sizeof(ibs_wake_cmd));
>>>>
>>>> I think this diff is mangled due to using Thunderbird, but I hope this
>>>> helps convey what I was asking about.
>>>>
>>>> If I am understanding you correctly now, you are saying that simply
>>>> introducing Zijun's logic (without removing KK's logic) will introduce a
>>>> new crash opportunity. Is that correct?
>>>>
>>>
>>> as Zijun declared. i believe Zijun's change will solve both this
>>> reported regression issue and the use-after-free(crash) issue.
>>
>> I did see Zijun's claim of that. However, I think that both KK and Luiz
>> are not convinced by the explanation. Also, if that explanation does
>> convince KK and Luiz, I think that the explanation needs to be added to
>> the commit message.
>>
>> I'm hoping that Luiz will at least respond to the middleground I
>> proposed as a workaround.
>
> I recall suggesting using HCI_UART_PROTO_READY instead since that
> tells when serdev_device_close has been run:
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 0c9c9ee56592..bbbc86d4932a 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -2455,8 +2455,8 @@ static void qca_serdev_shutdown(struct device *dev)
> const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };
>
> if (qcadev->btsoc_type == QCA_QCA6390) {
> - if (test_bit(QCA_BT_OFF, &qca->flags) ||
> - !test_bit(HCI_RUNNING, &hdev->flags))
> + /* Check if serdev_device_close() has already been called. */
> + if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
> return;
>
> serdev_device_write_flush(serdev);

I am testing this right now. I will let you know how it goes.

Thanks,
wt

--
You're more amazing than you think!

2024-05-15 05:11:07

by Wren Turkal

[permalink] [raw]
Subject: Re: path to landing patch to fix warm boot issue for qca6390

Luiz,

On 5/14/24 3:17 PM, Wren Turkal wrote:
> Hey Luiz,
>
> On 5/13/24 1:46 PM, Luiz Augusto von Dentz wrote:
>> Hi Wren,
>>
>> On Mon, May 13, 2024 at 4:13 PM Wren Turkal <[email protected]> wrote:
>>>
>>> On 5/10/24 11:25 PM, Lk Sii wrote:
>>>> On 2024/5/11 07:33, Wren Turkal wrote:
>>>>> On 5/10/24 2:25 PM, Luiz Augusto von Dentz wrote:
>>>>>> Hi Wren,
>>>>>>
>>>>>> On Fri, May 10, 2024 at 4:54 PM Wren Turkal <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 5/10/24 12:48 PM, Luiz Augusto von Dentz wrote:
>>>>>>>> Hi Wren,
>>>>>>>>
>>>>>>>> On Fri, May 10, 2024 at 3:14 PM Wren Turkal <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 5/6/24 12:49 PM, Luiz Augusto von Dentz wrote:
>>>>>>>>>> Hi Wren,
>>>>>>>>>>
>>>>>>>>>> On Mon, May 6, 2024 at 3:24 PM Wren Turkal <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Krzysztof,
>>>>>>>>>>>
>>>>>>>>>>> I am reaching out to you as you had the most important
>>>>>>>>>>> objections
>>>>>>>>>>> to the
>>>>>>>>>>> change to fix qca6390 for the warm boot/module reload bug
>>>>>>>>>>> that I am
>>>>>>>>>>> experiencing.
>>>>>>>>>>>
>>>>>>>>>>> For context, the problem is that the hci_uart module will send
>>>>>>>>>>> specific
>>>>>>>>>>> vendor specfic commands during shutdown of the hardware under
>>>>>>>>>>> most
>>>>>>>>>>> situations. These VSCs put the bluetooth device into a
>>>>>>>>>>> non-working state
>>>>>>>>>>> on my Dell XPS 13 9310 with qca6390 bluetooth hardware.
>>>>>>>>>>>
>>>>>>>>>>> Zijun's proposed fix is to not send these commands when it's not
>>>>>>>>>>> appropriate for the hardware. The vendor commands should be
>>>>>>>>>>> avoided when
>>>>>>>>>>> the hardware does not have persistent configuration or when the
>>>>>>>>>>> device
>>>>>>>>>>> is in setup state (indicating that is has never been setup and
>>>>>>>>>>> should
>>>>>>>>>>> not be sent the VSCs on the shutdown path). This is what Zijun's
>>>>>>>>>>> patch
>>>>>>>>>>> implements.
>>>>>>>>>>>
>>>>>>>>>>> In addition, Zijun's change removes the influence of both
>>>>>>>>>>> the QCA_BT_OFF qca flag and and HCI_RUNNING hdev flag. Zijun
>>>>>>>>>>> asserts
>>>>>>>>>>> that those flags should not influence the sending of the VSCs
>>>>>>>>>>> in the
>>>>>>>>>>> shutdown path. If I understand KK's objections properly, this is
>>>>>>>>>>> where
>>>>>>>>>>> his objection is stemming from. KK, is this correct?
>>>>>>>>>>>
>>>>>>>>>>> Zijun's proposed fix can be seen here:
>>>>>>>>>>> https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/
>>>>>>>>>>>
>>>>>>>>>>> I'm wondering if we can resolve this impasse by splitting the
>>>>>>>>>>> change
>>>>>>>>>>> into two changes, as follows:
>>>>>>>>>>>
>>>>>>>>>>> 1. Change that removes the influence of the QCA_BT_OFF and
>>>>>>>>>>> HCI_RUNNING
>>>>>>>>>>> flags in the shutdown path.
>>>>>>>>>>> 2. Add the quirk from Zijun's patch that fixes my hardward
>>>>>>>>>>> configuration.
>>>>>>>>>>>
>>>>>>>>>>> I'm hoping that better clearer descriptions for #1 can help
>>>>>>>>>>> get that
>>>>>>>>>>> landed since the logic current appears to be at odds with how
>>>>>>>>>>> the
>>>>>>>>>>> hardware works.
>>>>>>>>>>>
>>>>>>>>>>> Also, I am happy to split the patches into the two patches, or
>>>>>>>>>>> (maybe
>>>>>>>>>>> more ideally) just modify the commit message to better
>>>>>>>>>>> indicate the
>>>>>>>>>>> reason the change. I just need guidance from maintainers so that
>>>>>>>>>>> whatever work I do leads to something acceptable for y'all.
>>>>>>>>>>>
>>>>>>>>>>> So, please help me get this done. I am just a user with broken
>>>>>>>>>>> hardware
>>>>>>>>>>> and a fondness for Linux. I would love to help do what's needed
>>>>>>>>>>> to get
>>>>>>>>>>> this fix landed.
>>>>>>>>>>
>>>>>>>>>> Ive also objected to that change, in fact the whole shutdown
>>>>>>>>>> sequence
>>>>>>>>>> is sort of bogus in my opinion and the driver shall really
>>>>>>>>>> have some
>>>>>>>>>> means to find out what mode it is in when it reboots,
>>>>>>>>>> regardless if
>>>>>>>>>> cold or warm boot, since otherwise we are in trouble if the
>>>>>>>>>> user is
>>>>>>>>>> booting from another OS that doesn't do the expected shutdown
>>>>>>>>>> sequence.
>>>>>>>>>
>>>>>>>>> This criticism makes a ton of sense. I'm sorry I missed it before.
>>>>>>>>> There
>>>>>>>>> were a lot of threads moving in parallel. However, I am
>>>>>>>>> curious. Given
>>>>>>>>> that the patch improves the situation for users (like me). Is
>>>>>>>>> there
>>>>>>>>> any
>>>>>>>>> way we can separate the redesign of the shutdown sequence and
>>>>>>>>> the UX
>>>>>>>>> improvement that comes with this patch?
>>>>>>>>>
>>>>>>>>> Here's my concern. I am happy to do the work to redesign this.
>>>>>>>>> However,
>>>>>>>>> I don't think I have the information needed to do this since I
>>>>>>>>> don't
>>>>>>>>> have access to the technical docs for the qca6390. I am worried
>>>>>>>>> that not
>>>>>>>>> accepting some form of this patch is letting perfect be the enemy
>>>>>>>>> of the
>>>>>>>>> good. And I am not sure how I personally can help with that. If
>>>>>>>>> you
>>>>>>>>> think it's possible for me to do this without the docs for the
>>>>>>>>> hardware,
>>>>>>>>> I am willing to give it a shot if I can get some guidance.
>>>>>>>>> Honestly, I
>>>>>>>>> wish I had the skill to be confident about a change like this, but
>>>>>>>>> I don't.
>>>>>>>>>
>>>>>>>>> Any ideas on how to move forward would be greatly appreciated.
>>>>>>>>>
>>>>>>>>> And just to be perfectly clear, I have tested this patch on my
>>>>>>>>> laptop.
>>>>>>>>> It greatly enhances my ability to use my hardware since I can
>>>>>>>>> reboot the
>>>>>>>>> machine without having to make sure to power cycle the laptop.
>>>>>>>>> This is
>>>>>>>>> not a theoretical improvement.
>>>>>>>>
>>>>>>>> I would really love some explanation why can't the driver know
>>>>>>>> under
>>>>>>>> what mode the controller is when it gets probed, because to me we
>>>>>>>> cannot accept a driver that only works under certain condition
>>>>>>>> after
>>>>>>>> the boot and in case it is really impossible, can't even power
>>>>>>>> cycle
>>>>>>>> it to get it back to cold boot stage???
>>>>>>>
>>>>>>> This is a great technical criticism of the driver, and I think you
>>>>>>> deserve that explanation.
>>>>>>>
>>>>>>> However, with the driver already in the kernel, shouldn't the
>>>>>>> bias be
>>>>>>> toward mitigating the extremely bad UX and not hold users hostage
>>>>>>> for
>>>>>>> the bad design which has already been approved and landed in the
>>>>>>> kernel?
>>>>>>>
>>>>>>>> Also the criticism here should be directed to the vendor, how long
>>>>>>>> have we been discussing problems in the QCA driver? And the only
>>>>>>>> thing
>>>>>>>> I see coming our way are work-arounds of the problems, the
>>>>>>>> address not
>>>>>>>> being unique coming from the firmware itself and when provided
>>>>>>>> via DT
>>>>>>>> the address is in the wrong byteorder and now that the driver must
>>>>>>>> communicate the firmware on shutdown in order to get it working
>>>>>>>> properly on the next boot.
>>>>>>>
>>>>>>> I agree that Qualcomm should get flack for this, however, the UX
>>>>>>> problem
>>>>>>> can be mitigated with a logic fix that doesn't make the
>>>>>>> init/shutdown
>>>>>>> design problem any worse than it currently seems to be. I mean,
>>>>>>> wouldn't
>>>>>>> this logic have to exist somewhere even if it weren't the
>>>>>>> shutdown path?
>>>>>>>
>>>>>>> If you are trying to use this as leverage to get Qualcomm to do a
>>>>>>> bigger
>>>>>>> thing (redesign the init/shutdown logic), I do think that tactic
>>>>>>> needlessly puts users in the crossfire. I can totally understand why
>>>>>>> you'd do it. I am just suffering the crossfire in the meantime,
>>>>>>> and it
>>>>>>> doesn't feel great.
>>>>>>
>>>>>> So you prefer to risk getting a kernel crash due to UAF over
>>>>>> Bluetooth
>>>>>> not working? Really? Because I haven't seen any configuration that
>>>>>> those changes you tested don't reintroduce the UAF, which is why I
>>>>>> haven't applied that change to begin with, so no I'm not holding back
>>>>>> to pressure Qualcomm to redesign the shutdown logic, it just these
>>>>>> things got entangled because I just realized the shutdown thingy is
>>>>>> really out of place, imo, but I'd be fine if there is a temporary fix
>>>>>> until someone finally decide to spend some time to really fix the
>>>>>> shutdown logic.
>>>>>
>>>>> Luiz, I'm sorry. I do not want a crash instead. I didn't understand
>>>>> that
>>>>> the solution I proposed (i.e. adding Zijun's logic without removing
>>>>> KK's
>>>>> logic) would introduce a new crash opportunity. I previously
>>>>> thought you
>>>>> were saying one of the following things:
>>>>> 1. The crash opportunity already existed due the init/shudown
>>>>> sequences.
>>>>> 2. The crash opportunity already existed due the init/shudown
>>>>> sequences
>>>>> when removing KK's logic.
>>>>>
>>>>> If it was #1, I was hoping that adding the logic would make the
>>>>> risk no
>>>>> worse.
>>>>>
>>>>> If it was #2, I was hoping that my suggestion of adding Zijun's logic
>>>>> without removing KK's logic might represent an acceptable middleground
>>>>> for a temporary fix that would "correct" the logic without
>>>>> introducing a
>>>>> new crash opportunity.
>>>>>
>>>>> I feel like I may not be clear about what I mean by adding Zijun's
>>>>> logic
>>>>> and not removing KK's logic. Maybe something like this diff:
>>>>>
>>>>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>>>>> index 2f7ae38d85eb..fcac44ae7898 100644
>>>>> --- a/drivers/bluetooth/hci_qca.c
>>>>> +++ b/drivers/bluetooth/hci_qca.c
>>>>> @@ -2456,6 +2456,10 @@ static void qca_serdev_shutdown(struct
>>>>> device *dev)
>>>>>                        !test_bit(HCI_RUNNING, &hdev->flags))
>>>>>                            return;
>>>>>
>>>>> +               if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP,
>>>>> &hdev->quirks) ||
>>>>> +                   hci_dev_test_flag(hdev, HCI_SETUP))
>>>>> +                       return;
>>>>> +
>>>>>                    serdev_device_write_flush(serdev);
>>>>>                    ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
>>>>>
>>>>> sizeof(ibs_wake_cmd));
>>>>>
>>>>> I think this diff is mangled due to using Thunderbird, but I hope this
>>>>> helps convey what I was asking about.
>>>>>
>>>>> If I am understanding you correctly now, you are saying that simply
>>>>> introducing Zijun's logic (without removing KK's logic) will
>>>>> introduce a
>>>>> new crash opportunity. Is that correct?
>>>>>
>>>>
>>>> as Zijun declared. i believe Zijun's change will solve both this
>>>> reported regression issue and the use-after-free(crash) issue.
>>>
>>> I did see Zijun's claim of that. However, I think that both KK and Luiz
>>> are not convinced by the explanation. Also, if that explanation does
>>> convince KK and Luiz, I think that the explanation needs to be added to
>>> the commit message.
>>>
>>> I'm hoping that Luiz will at least respond to the middleground I
>>> proposed as a workaround.
>>
>> I recall suggesting using HCI_UART_PROTO_READY instead since that
>> tells when serdev_device_close has been run:
>>
>> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
>> index 0c9c9ee56592..bbbc86d4932a 100644
>> --- a/drivers/bluetooth/hci_qca.c
>> +++ b/drivers/bluetooth/hci_qca.c
>> @@ -2455,8 +2455,8 @@ static void qca_serdev_shutdown(struct device *dev)
>>          const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };
>>
>>          if (qcadev->btsoc_type == QCA_QCA6390) {
>> -               if (test_bit(QCA_BT_OFF, &qca->flags) ||
>> -                   !test_bit(HCI_RUNNING, &hdev->flags))
>> +               /* Check if serdev_device_close() has already been
>> called. */
>> +               if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
>>                          return;
>>
>>                  serdev_device_write_flush(serdev);
>
> I am testing this right now. I will let you know how it goes.

I applied your patch and tested a number of scenarios. Here are the
results. Apologies for any bad line wraps. I guess I haven't managed to
tame Thunderbird yet:

warm boot:
unpatched kernel to patched kernel : doesn't work
patched kernel to patched kernel : works

patched kernel tests:
starting from cold boot:
works
disable/re-enable bt by plasma UI:
works
restart bluetooth service:
works
unload/re-load hci_uart and bluetooth service:
doesn't work after reload; can't enable BT
I think this worked with Zijun's commit, but I'm not 100% sure.
disable bluetooth service, warm boot, start bluetooth service:
fails to connect to devices

What follows are logs for the non-working cases that only involve the
patched kernel:

logs for unload/re-load hci_uart and bluetooth service test when
reloading the module:
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: HCI UART driver
ver 2.3
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: HCI UART
protocol H4 registered
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: HCI UART
protocol QCA registered
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: setting
up ROME/QCA6390
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA
Product ID :0x00000010
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA SOC
Version :0x400a0200
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA ROM
Version :0x00000200
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA Patch
Version:0x00003ac0
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA
controller version 0x02000200
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA
Downloading qca/htbtfw20.tlv
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA
Failed to send TLV segment (-110)
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: QCA
Failed to download patch (-110)
May 14 21:42:34 braindead.localdomain kernel: Bluetooth: hci0: Retry BT
power ON:0
May 14 21:42:37 braindead.localdomain kernel: Bluetooth: hci0: command
0xfc00 tx timeout
May 14 21:42:37 braindead.localdomain kernel: Bluetooth: hci0: Reading
QCA version information failed (-110)
May 14 21:42:37 braindead.localdomain kernel: Bluetooth: hci0: Retry BT
power ON:1
May 14 21:42:39 braindead.localdomain kernel: Bluetooth: hci0: command
0xfc00 tx timeout
May 14 21:42:39 braindead.localdomain kernel: Bluetooth: hci0: Reading
QCA version information failed (-110)
May 14 21:42:39 braindead.localdomain kernel: Bluetooth: hci0: Retry BT
power ON:2
May 14 21:42:41 braindead.localdomain kernel: Bluetooth: hci0: command
0xfc00 tx timeout
May 14 21:42:41 braindead.localdomain kernel: Bluetooth: hci0: Reading
QCA version information failed (-110)


logs for disable bluetooth service, warm boot, start bluetooth service
manually test starting at the point of starting bluetooth service:
May 14 21:45:53 braindead.localdomain kernel: Bluetooth: hci0:
hci_devcd_init Return:-95
May 14 21:46:59 braindead.localdomain kernel: Bluetooth: BNEP (Ethernet
Emulation) ver 1.3
May 14 21:46:59 braindead.localdomain kernel: Bluetooth: BNEP filters:
protocol multicast
May 14 21:46:59 braindead.localdomain kernel: Bluetooth: BNEP socket
layer initialized
May 14 21:46:59 braindead.localdomain kernel: Bluetooth: MGMT ver 1.22
May 14 21:46:59 braindead.localdomain kernel: Bluetooth: RFCOMM TTY
layer initialized
May 14 21:46:59 braindead.localdomain kernel: Bluetooth: RFCOMM socket
layer initialized
May 14 21:46:59 braindead.localdomain kernel: Bluetooth: RFCOMM ver 1.11
May 14 21:47:01 braindead.localdomain kernel: Bluetooth: hci0: Opcode
0x0c03 failed: -110
May 14 21:47:01 braindead.localdomain kernel: Bluetooth: hci0:
mem_dump_status: 2
May 14 21:47:03 braindead.localdomain kernel: Bluetooth: hci0: Opcode
0x0c03 failed: -110
May 14 21:47:13 braindead.localdomain kernel: Bluetooth: hci0: Opcode
0x0c03 failed: -110

So, I noticed that your patch doesn't include the quirk conditional from
Zijun's commit. I am assuming that that was intentional. However, I
wanted to point it out in case that was somehow material to the results.

Please let me know what else I can do to help. And again, I really
appreciate your effort on this.

wt
--
You're more amazing than you think!