2024-01-22 12:18:20

by Andrei Volkov

[permalink] [raw]
Subject: regression in c7f59461f5a78: Bluetooth: Fix a refcnt underflow problem for hci_conn

Hello,

Lately we've bumped with regression introduced by commit:

 c7f59461f5a78 ("Bluetooth: Fix a refcnt underflow problem for
hci_conn", 2023-10-04)

The regression related with adding "hci_conn_ssp_enabled()" check in
"hci_io_capa_request_evt()" handler, and broke pairing process initiated
by the external device.

Precisely, some ext. devices, like any phone equipped with Android ver <
14 (we have not latest one, so we didn't check), always send "IO
Capability Request" before "Read Remote Extended Features" command, as
consequence the flag "HCI_CONN_SSP_ENABLED" not yet activated at the
time of "hci_io_capa_request_evt()" execution  and
"hci_conn_ssp_enabled()" always returns false in time of the pairing.

As a result, pairing always fails. The quick and dirty fix is revert the
ssp check introduced in the subj. commit, like below:

--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -5329,7 +5329,7 @@ static void hci_io_capa_request_evt(struct hci_dev
*hdev, void *data,
        hci_dev_lock(hdev);

        conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
-       if (!conn || !hci_conn_ssp_enabled(conn))
+       if (!conn)
                goto unlock;

        hci_conn_hold(conn);


However, a more thorough and correct fix requires discussion and
testing. Therefore, I would like to get any comments/suggestion from the
community before doing this.

Regards
Andrey VOLKOV



2024-01-22 12:54:15

by bluez.test.bot

[permalink] [raw]
Subject: RE: regression in c7f59461f5a78: Bluetooth: Fix a refcnt underflow problem for hci_conn

This is an automated email and please do not reply to this email.

Dear Submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
While preparing the CI tests, the patches you submitted couldn't be applied to the current HEAD of the repository.

----- Output -----

error: corrupt patch at line 4
hint: Use 'git am --show-current-patch' to see the failed patch

Please resolve the issue and submit the patches again.


---
Regards,
Linux Bluetooth

2024-01-22 14:04:06

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: regression in c7f59461f5a78: Bluetooth: Fix a refcnt underflow problem for hci_conn

Hi Andrei,

On Mon, Jan 22, 2024 at 7:18 AM Andrei Volkov
<[email protected]> wrote:
>
> Hello,
>
> Lately we've bumped with regression introduced by commit:
>
> c7f59461f5a78 ("Bluetooth: Fix a refcnt underflow problem for
> hci_conn", 2023-10-04)
>
> The regression related with adding "hci_conn_ssp_enabled()" check in
> "hci_io_capa_request_evt()" handler, and broke pairing process initiated
> by the external device.
>
> Precisely, some ext. devices, like any phone equipped with Android ver <
> 14 (we have not latest one, so we didn't check), always send "IO
> Capability Request" before "Read Remote Extended Features" command, as
> consequence the flag "HCI_CONN_SSP_ENABLED" not yet activated at the
> time of "hci_io_capa_request_evt()" execution and
> "hci_conn_ssp_enabled()" always returns false in time of the pairing.
>
> As a result, pairing always fails. The quick and dirty fix is revert the
> ssp check introduced in the subj. commit, like below:
>
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -5329,7 +5329,7 @@ static void hci_io_capa_request_evt(struct hci_dev
> *hdev, void *data,
> hci_dev_lock(hdev);
>
> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
> - if (!conn || !hci_conn_ssp_enabled(conn))
> + if (!conn)
> goto unlock;
>
> hci_conn_hold(conn);
>
>
> However, a more thorough and correct fix requires discussion and
> testing. Therefore, I would like to get any comments/suggestion from the
> community before doing this.

I think we need to do something like this, so we consider only the
local SSP flag when evaluating if we should proceed to respond or just
drop:

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 6130c969f361..a15924db83d9 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -5327,9 +5327,12 @@ static void hci_io_capa_request_evt(struct
hci_dev *hdev, void *data,
hci_dev_lock(hdev);

conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
- if (!conn || !hci_conn_ssp_enabled(conn))
+ if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED))
goto unlock;

+ /* Assume remote supports SSP since it has triggered this event */
+ set_bit(HCI_CONN_SSP_ENABLED, &conn->flags);
+
hci_conn_hold(conn);

if (!hci_dev_test_flag(hdev, HCI_MGMT))


> Regards
> Andrey VOLKOV
>
>


--
Luiz Augusto von Dentz

2024-01-22 14:46:00

by Andrei Volkov

[permalink] [raw]
Subject: Re: regression in c7f59461f5a78: Bluetooth: Fix a refcnt underflow problem for hci_conn

Hi Luiz,

Wouldn't it be better to add a 'yet-another' flag as an addition to
HCI_CONN_SSP_ENABLED, and clear it unconditionally at the top of
'hci_remote_ext_features_evt', before

"        conn = hci_conn_hash_lookup_handle "

check?

In this case broken ext_features response (with ev->status != 0 or conn
== NULL) will not indirectly enable the SSP feature.

Regards
Andrei VOLKOV

Le 22/01/2024 à 15:02, Luiz Augusto von Dentz a écrit :

> Hi Andrei,
>
> On Mon, Jan 22, 2024 at 7:18 AM Andrei Volkov
> <[email protected]> wrote:
>> Hello,
>>
>> Lately we've bumped with regression introduced by commit:
>>
>> c7f59461f5a78 ("Bluetooth: Fix a refcnt underflow problem for
>> hci_conn", 2023-10-04)
>>
>> The regression related with adding "hci_conn_ssp_enabled()" check in
>> "hci_io_capa_request_evt()" handler, and broke pairing process initiated
>> by the external device.
>>
>> Precisely, some ext. devices, like any phone equipped with Android ver <
>> 14 (we have not latest one, so we didn't check), always send "IO
>> Capability Request" before "Read Remote Extended Features" command, as
>> consequence the flag "HCI_CONN_SSP_ENABLED" not yet activated at the
>> time of "hci_io_capa_request_evt()" execution and
>> "hci_conn_ssp_enabled()" always returns false in time of the pairing.
>>
>> As a result, pairing always fails. The quick and dirty fix is revert the
>> ssp check introduced in the subj. commit, like below:
>>
>> --- a/net/bluetooth/hci_event.c
>> +++ b/net/bluetooth/hci_event.c
>> @@ -5329,7 +5329,7 @@ static void hci_io_capa_request_evt(struct hci_dev
>> *hdev, void *data,
>> hci_dev_lock(hdev);
>>
>> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
>> - if (!conn || !hci_conn_ssp_enabled(conn))
>> + if (!conn)
>> goto unlock;
>>
>> hci_conn_hold(conn);
>>
>>
>> However, a more thorough and correct fix requires discussion and
>> testing. Therefore, I would like to get any comments/suggestion from the
>> community before doing this.
> I think we need to do something like this, so we consider only the
> local SSP flag when evaluating if we should proceed to respond or just
> drop:
>
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 6130c969f361..a15924db83d9 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -5327,9 +5327,12 @@ static void hci_io_capa_request_evt(struct
> hci_dev *hdev, void *data,
> hci_dev_lock(hdev);
>
> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
> - if (!conn || !hci_conn_ssp_enabled(conn))
> + if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED))
> goto unlock;
>
> + /* Assume remote supports SSP since it has triggered this event */
> + set_bit(HCI_CONN_SSP_ENABLED, &conn->flags);
> +
> hci_conn_hold(conn);
>
> if (!hci_dev_test_flag(hdev, HCI_MGMT))
>
>
>> Regards
>> Andrey VOLKOV
>>
>>
>

2024-01-22 16:07:17

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: regression in c7f59461f5a78: Bluetooth: Fix a refcnt underflow problem for hci_conn

Hi Andrei,

On Mon, Jan 22, 2024 at 9:45 AM Andrei Volkov
<[email protected]> wrote:
>
> Hi Luiz,
>
> Wouldn't it be better to add a 'yet-another' flag as an addition to
> HCI_CONN_SSP_ENABLED, and clear it unconditionally at the top of
> 'hci_remote_ext_features_evt', before
>
> " conn = hci_conn_hash_lookup_handle "
>
> check?
>
> In this case broken ext_features response (with ev->status != 0 or conn
> == NULL) will not indirectly enable the SSP feature.

HCI_CONN_SSP_ENABLED is already at conn level, besides that it is too
late to clear it if the SSP procedure has already taken place and on
top of it I don't want to change the code too much and risk having
more regressions like this.

Btw, something tells me that Android is not actually doing the
HCI_OP_READ_REMOTE_EXT_FEATURES since we do have CI testing SSP and
this change hasn't cause us any problems, do you know what command it
uses? Perhaps it tries SSP right-away or does it cache the previous
response?

> Regards
> Andrei VOLKOV
>
> Le 22/01/2024 à 15:02, Luiz Augusto von Dentz a écrit :
>
> > Hi Andrei,
> >
> > On Mon, Jan 22, 2024 at 7:18 AM Andrei Volkov
> > <[email protected]> wrote:
> >> Hello,
> >>
> >> Lately we've bumped with regression introduced by commit:
> >>
> >> c7f59461f5a78 ("Bluetooth: Fix a refcnt underflow problem for
> >> hci_conn", 2023-10-04)
> >>
> >> The regression related with adding "hci_conn_ssp_enabled()" check in
> >> "hci_io_capa_request_evt()" handler, and broke pairing process initiated
> >> by the external device.
> >>
> >> Precisely, some ext. devices, like any phone equipped with Android ver <
> >> 14 (we have not latest one, so we didn't check), always send "IO
> >> Capability Request" before "Read Remote Extended Features" command, as
> >> consequence the flag "HCI_CONN_SSP_ENABLED" not yet activated at the
> >> time of "hci_io_capa_request_evt()" execution and
> >> "hci_conn_ssp_enabled()" always returns false in time of the pairing.
> >>
> >> As a result, pairing always fails. The quick and dirty fix is revert the
> >> ssp check introduced in the subj. commit, like below:
> >>
> >> --- a/net/bluetooth/hci_event.c
> >> +++ b/net/bluetooth/hci_event.c
> >> @@ -5329,7 +5329,7 @@ static void hci_io_capa_request_evt(struct hci_dev
> >> *hdev, void *data,
> >> hci_dev_lock(hdev);
> >>
> >> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
> >> - if (!conn || !hci_conn_ssp_enabled(conn))
> >> + if (!conn)
> >> goto unlock;
> >>
> >> hci_conn_hold(conn);
> >>
> >>
> >> However, a more thorough and correct fix requires discussion and
> >> testing. Therefore, I would like to get any comments/suggestion from the
> >> community before doing this.
> > I think we need to do something like this, so we consider only the
> > local SSP flag when evaluating if we should proceed to respond or just
> > drop:
> >
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index 6130c969f361..a15924db83d9 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -5327,9 +5327,12 @@ static void hci_io_capa_request_evt(struct
> > hci_dev *hdev, void *data,
> > hci_dev_lock(hdev);
> >
> > conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
> > - if (!conn || !hci_conn_ssp_enabled(conn))
> > + if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED))
> > goto unlock;
> >
> > + /* Assume remote supports SSP since it has triggered this event */
> > + set_bit(HCI_CONN_SSP_ENABLED, &conn->flags);
> > +
> > hci_conn_hold(conn);
> >
> > if (!hci_dev_test_flag(hdev, HCI_MGMT))
> >
> >
> >> Regards
> >> Andrey VOLKOV
> >>
> >>
> >



--
Luiz Augusto von Dentz

2024-01-22 17:06:49

by Andrei Volkov

[permalink] [raw]
Subject: Re: regression in c7f59461f5a78: Bluetooth: Fix a refcnt underflow problem for hci_conn

Hi Luiz,

Le 22/01/2024 à 16:10, Luiz Augusto von Dentz a écrit :

> Hi Andrei,
>
> On Mon, Jan 22, 2024 at 9:45 AM Andrei Volkov
> <[email protected]> wrote:
>> Hi Luiz,
>>
>> Wouldn't it be better to add a 'yet-another' flag as an addition to
>> HCI_CONN_SSP_ENABLED, and clear it unconditionally at the top of
>> 'hci_remote_ext_features_evt', before
>>
>> " conn = hci_conn_hash_lookup_handle"
>>
>> check?
>>
>> In this case broken ext_features response (with ev->status != 0 or conn
>> == NULL) will not indirectly enable the SSP feature.
> HCI_CONN_SSP_ENABLED is already at conn level, besides that it is too
> late to clear it if the SSP procedure has already taken place and on
> top of it I don't want to change the code too much and risk having
> more regressions like this.
>
> Btw, something tells me that Android is not actually doing the
> HCI_OP_READ_REMOTE_EXT_FEATURES since we do have CI testing SSP and
> this change hasn't cause us any problems, do you know what command it
> uses? Perhaps it tries SSP right-away or does it cache the previous
> response?
>
The real problem is that since the ext. device does not receive a
response to the IO request (cmd 0x31) from BlueZ, it refuses to continue
the pairing. But with proposed reverting the ssp check, we are probably
coming back to the initial href problem.

Btw, for me it's unclear how the additional check in
"hci_io_capa_request_evt" helps with href undderrun. It looks like the
original FSM implementation is missing something, and the fix actually
is masking the problem.

Regards
Andrei VOLKOV

>> Regards
>> Andrei VOLKOV
>>
>> Le 22/01/2024 à 15:02, Luiz Augusto von Dentz a écrit :
>>
>>> Hi Andrei,
>>>
>>> On Mon, Jan 22, 2024 at 7:18 AM Andrei Volkov
>>> <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> Lately we've bumped with regression introduced by commit:
>>>>
>>>> c7f59461f5a78 ("Bluetooth: Fix a refcnt underflow problem for
>>>> hci_conn", 2023-10-04)
>>>>
>>>> The regression related with adding "hci_conn_ssp_enabled()" check in
>>>> "hci_io_capa_request_evt()" handler, and broke pairing process initiated
>>>> by the external device.
>>>>
>>>> Precisely, some ext. devices, like any phone equipped with Android ver <
>>>> 14 (we have not latest one, so we didn't check), always send "IO
>>>> Capability Request" before "Read Remote Extended Features" command, as
>>>> consequence the flag "HCI_CONN_SSP_ENABLED" not yet activated at the
>>>> time of "hci_io_capa_request_evt()" execution and
>>>> "hci_conn_ssp_enabled()" always returns false in time of the pairing.
>>>>
>>>> As a result, pairing always fails. The quick and dirty fix is revert the
>>>> ssp check introduced in the subj. commit, like below:
>>>>
>>>> --- a/net/bluetooth/hci_event.c
>>>> +++ b/net/bluetooth/hci_event.c
>>>> @@ -5329,7 +5329,7 @@ static void hci_io_capa_request_evt(struct hci_dev
>>>> *hdev, void *data,
>>>> hci_dev_lock(hdev);
>>>>
>>>> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
>>>> - if (!conn || !hci_conn_ssp_enabled(conn))
>>>> + if (!conn)
>>>> goto unlock;
>>>>
>>>> hci_conn_hold(conn);
>>>>
>>>>
>>>> However, a more thorough and correct fix requires discussion and
>>>> testing. Therefore, I would like to get any comments/suggestion from the
>>>> community before doing this.
>>> I think we need to do something like this, so we consider only the
>>> local SSP flag when evaluating if we should proceed to respond or just
>>> drop:
>>>
>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>>> index 6130c969f361..a15924db83d9 100644
>>> --- a/net/bluetooth/hci_event.c
>>> +++ b/net/bluetooth/hci_event.c
>>> @@ -5327,9 +5327,12 @@ static void hci_io_capa_request_evt(struct
>>> hci_dev *hdev, void *data,
>>> hci_dev_lock(hdev);
>>>
>>> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
>>> - if (!conn || !hci_conn_ssp_enabled(conn))
>>> + if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED))
>>> goto unlock;
>>>
>>> + /* Assume remote supports SSP since it has triggered this event */
>>> + set_bit(HCI_CONN_SSP_ENABLED, &conn->flags);
>>> +
>>> hci_conn_hold(conn);
>>>
>>> if (!hci_dev_test_flag(hdev, HCI_MGMT))
>>>
>>>
>>>> Regards
>>>> Andrey VOLKOV
>>>>
>>>>
>
>

2024-01-22 17:30:47

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: regression in c7f59461f5a78: Bluetooth: Fix a refcnt underflow problem for hci_conn

Hi Andrei,

On Mon, Jan 22, 2024 at 10:50 AM Andrei Volkov
<[email protected]> wrote:
>
> Hi Luiz,
>
> Le 22/01/2024 à 16:10, Luiz Augusto von Dentz a écrit :
>
> > Hi Andrei,
> >
> > On Mon, Jan 22, 2024 at 9:45 AM Andrei Volkov
> > <[email protected]> wrote:
> >> Hi Luiz,
> >>
> >> Wouldn't it be better to add a 'yet-another' flag as an addition to
> >> HCI_CONN_SSP_ENABLED, and clear it unconditionally at the top of
> >> 'hci_remote_ext_features_evt', before
> >>
> >> " conn = hci_conn_hash_lookup_handle"
> >>
> >> check?
> >>
> >> In this case broken ext_features response (with ev->status != 0 or conn
> >> == NULL) will not indirectly enable the SSP feature.
> > HCI_CONN_SSP_ENABLED is already at conn level, besides that it is too
> > late to clear it if the SSP procedure has already taken place and on
> > top of it I don't want to change the code too much and risk having
> > more regressions like this.
> >
> > Btw, something tells me that Android is not actually doing the
> > HCI_OP_READ_REMOTE_EXT_FEATURES since we do have CI testing SSP and
> > this change hasn't cause us any problems, do you know what command it
> > uses? Perhaps it tries SSP right-away or does it cache the previous
> > response?
> >
> The real problem is that since the ext. device does not receive a
> response to the IO request (cmd 0x31) from BlueZ, it refuses to continue
> the pairing. But with proposed reverting the ssp check, we are probably
> coming back to the initial href problem.
>
> Btw, for me it's unclear how the additional check in
> "hci_io_capa_request_evt" helps with href undderrun. It looks like the
> original FSM implementation is missing something, and the fix actually
> is masking the problem.

That is a totally different problem, most likely related to
conn->disc_work not being cancelled properly or something, so this
probably was only masking it for some reason but it is not a proper
fix, anyway it is still a good idea to check if hdev has SSP enabled
in any case, since it can be enabled/disabled at runtime thus why I'm
not completely reverting it.

If the conn_timeout problem comes back then we need to investigate
what is really causing that, but there is a high chance this has been
fixed since we have been reworking some of the code paths related to
hci_conn_del, etc.

> Regards
> Andrei VOLKOV
>
> >> Regards
> >> Andrei VOLKOV
> >>
> >> Le 22/01/2024 à 15:02, Luiz Augusto von Dentz a écrit :
> >>
> >>> Hi Andrei,
> >>>
> >>> On Mon, Jan 22, 2024 at 7:18 AM Andrei Volkov
> >>> <[email protected]> wrote:
> >>>> Hello,
> >>>>
> >>>> Lately we've bumped with regression introduced by commit:
> >>>>
> >>>> c7f59461f5a78 ("Bluetooth: Fix a refcnt underflow problem for
> >>>> hci_conn", 2023-10-04)
> >>>>
> >>>> The regression related with adding "hci_conn_ssp_enabled()" check in
> >>>> "hci_io_capa_request_evt()" handler, and broke pairing process initiated
> >>>> by the external device.
> >>>>
> >>>> Precisely, some ext. devices, like any phone equipped with Android ver <
> >>>> 14 (we have not latest one, so we didn't check), always send "IO
> >>>> Capability Request" before "Read Remote Extended Features" command, as
> >>>> consequence the flag "HCI_CONN_SSP_ENABLED" not yet activated at the
> >>>> time of "hci_io_capa_request_evt()" execution and
> >>>> "hci_conn_ssp_enabled()" always returns false in time of the pairing.
> >>>>
> >>>> As a result, pairing always fails. The quick and dirty fix is revert the
> >>>> ssp check introduced in the subj. commit, like below:
> >>>>
> >>>> --- a/net/bluetooth/hci_event.c
> >>>> +++ b/net/bluetooth/hci_event.c
> >>>> @@ -5329,7 +5329,7 @@ static void hci_io_capa_request_evt(struct hci_dev
> >>>> *hdev, void *data,
> >>>> hci_dev_lock(hdev);
> >>>>
> >>>> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
> >>>> - if (!conn || !hci_conn_ssp_enabled(conn))
> >>>> + if (!conn)
> >>>> goto unlock;
> >>>>
> >>>> hci_conn_hold(conn);
> >>>>
> >>>>
> >>>> However, a more thorough and correct fix requires discussion and
> >>>> testing. Therefore, I would like to get any comments/suggestion from the
> >>>> community before doing this.
> >>> I think we need to do something like this, so we consider only the
> >>> local SSP flag when evaluating if we should proceed to respond or just
> >>> drop:
> >>>
> >>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> >>> index 6130c969f361..a15924db83d9 100644
> >>> --- a/net/bluetooth/hci_event.c
> >>> +++ b/net/bluetooth/hci_event.c
> >>> @@ -5327,9 +5327,12 @@ static void hci_io_capa_request_evt(struct
> >>> hci_dev *hdev, void *data,
> >>> hci_dev_lock(hdev);
> >>>
> >>> conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
> >>> - if (!conn || !hci_conn_ssp_enabled(conn))
> >>> + if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED))
> >>> goto unlock;
> >>>
> >>> + /* Assume remote supports SSP since it has triggered this event */
> >>> + set_bit(HCI_CONN_SSP_ENABLED, &conn->flags);
> >>> +
> >>> hci_conn_hold(conn);
> >>>
> >>> if (!hci_dev_test_flag(hdev, HCI_MGMT))
> >>>
> >>>
> >>>> Regards
> >>>> Andrey VOLKOV
> >>>>
> >>>>
> >
> >



--
Luiz Augusto von Dentz