2019-11-10 20:16:11

by Aurelien Jarno

Subject: BlueZ/mesh: RX not working after daemon restart (with workaround)

Hi all,

On my system (Raspberry Pi 3), the RX path no longer works after a
restart of the bluetooth-meshd daemon. I have tracked that down to the
fact that the receive callbacks are set up before the HCI is fully
initialized. In other words, BT_HCI_CMD_LE_SET_SCAN_PARAMETERS is
sent before BT_HCI_CMD_RESET, and the callback that sends
BT_HCI_CMD_LE_SET_SCAN_ENABLE is never called. This is timing dependent
and probably not reproducible on all hardware.

I have worked around the issue by adding a small delay between the HCI
initialization and the call to node_attach_io_all():

diff --git a/mesh/mesh.c b/mesh/mesh.c
index 9b2b2073b..1c06060f9 100644
--- a/mesh/mesh.c
+++ b/mesh/mesh.c
@@ -167,6 +167,10 @@ bool mesh_init(const char *config_dir, enum mesh_io_type type, void *opts)
 	mesh_io_get_caps(mesh.io, &caps);
 	mesh.max_filters = caps.max_num_filters;
 
+	for (int i = 0 ; i < 100 ; i++) {
+		l_main_iterate(10);
+	}
+
 	node_attach_io_all(mesh.io);
 
 	return true;

I guess there is a better way to do this, either by waiting for the HCI
to be fully initialized before calling node_attach_io_all() or by using
a callback instead. However, I do not know the codebase well enough to
fix that properly.
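
As a rough illustration of the "wait for the HCI" idea, the fixed 100
iterations could become a bounded wait on a readiness flag. The sketch
below is a self-contained model only: struct fake_loop,
fake_loop_iterate() and wait_for_hci() are hypothetical names, and the
fake loop merely stands in for one pass of the real main loop. It is
not BlueZ or ELL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the main loop plus controller state. */
struct fake_loop {
	unsigned int pending_events;	/* events left before init completes */
	bool hci_ready;			/* set once the last init event is handled */
	unsigned int iterations;	/* how many times the loop was iterated */
};

/* Model of one main-loop pass: handles at most one pending event. */
static void fake_loop_iterate(struct fake_loop *loop)
{
	assert(loop != NULL);

	loop->iterations++;
	if (loop->pending_events > 0 && --loop->pending_events == 0)
		loop->hci_ready = true;
}

/*
 * Iterate until the HCI reports ready or the budget runs out.
 * Returns true only if the HCI became ready, so the caller can
 * decide whether attaching the nodes is safe.
 */
static bool wait_for_hci(struct fake_loop *loop, unsigned int max_iterations)
{
	while (!loop->hci_ready && max_iterations-- > 0)
		fake_loop_iterate(loop);

	return loop->hci_ready;
}
```

With such a helper the wait stops as soon as the flag is set, and a
false return value makes a timeout visible instead of silently
continuing with an uninitialized controller.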

Aurelien

--
Aurelien Jarno GPG: 4096R/1DDD8C9B
[email protected] http://www.aurel32.net


2019-11-10 21:07:03

by Steve Brown

Subject: Re: BlueZ/mesh: RX not working after daemon restart (with workaround)

On Sun, 2019-11-10 at 21:08 +0100, Aurelien Jarno wrote:
> Hi all,
>
> On my system (Raspberry PI 3), the RX path doesn't work anymore
> following a restart of the bluetooth-meshd daemon. I have tracked
> down
> that to the fact that the receive callbacks are setup before the HCI
> is
> fully initialized. Said otherwise, BT_HCI_CMD_LE_SET_SCAN_PARAMETERS
> is
> called before BT_HCI_CMD_RESET and the callback calling
> BT_HCI_CMD_LE_SET_SCAN_ENABLE is not called. This timing dependent
> and
> probably not reproducible on all hardware.
>
> I have workarounded the issue by adding a small delay between the HCI
> initialization and the call to node_attach_io_all():
>
> diff --git a/mesh/mesh.c b/mesh/mesh.c
> index 9b2b2073b..1c06060f9 100644
> --- a/mesh/mesh.c
> +++ b/mesh/mesh.c
> @@ -167,6 +167,10 @@ bool mesh_init(const char *config_dir, enum
> mesh_io_type type, void *opts)
> mesh_io_get_caps(mesh.io, &caps);
> mesh.max_filters = caps.max_num_filters;
>
> + for (int i = 0 ; i < 100 ; i++) {
> + l_main_iterate(10);
> + }
> +
> node_attach_io_all(mesh.io);
>
> return true;
>
> I guess there is a better way to do that by waiting for the HCI to be
> fully initialized before calling node_attach_io_all() or by using a
> callback instead. However I do not know the codebase good enough to
> fix
> that properly.
>
> Aurelien
>
I've experienced something similar on my rpi3. I found that on restart,
discover-unprovisioned stopped working.

In my case, it appears that meshd assumes that scanning has already
been enabled whenever there are existing nodes. As a result, calls from
mesh-cfgclient to discover additional unprovisioned nodes skip the HCI
scan enable at mesh/mesh-io-generic.c:736.

If meshd is restarted with preexisting nodes, scanning is still assumed
to already be enabled, but it's not. This breaks discover-unprovisioned
for me.

I suspect this is a symptom of a deeper problem where
mesh/mesh-config-json.c:load_node doesn't completely reestablish the
node state that existed when the node was originally added.

Steve

2019-11-10 21:40:48

by Aurelien Jarno

Subject: Re: BlueZ/mesh: RX not working after daemon restart (with workaround)

Hi,

On 2019-11-10 13:59, Steve Brown wrote:
> On Sun, 2019-11-10 at 21:08 +0100, Aurelien Jarno wrote:
> > Hi all,
> >
> > On my system (Raspberry PI 3), the RX path doesn't work anymore
> > following a restart of the bluetooth-meshd daemon. I have tracked
> > down
> > that to the fact that the receive callbacks are setup before the HCI
> > is
> > fully initialized. Said otherwise, BT_HCI_CMD_LE_SET_SCAN_PARAMETERS
> > is
> > called before BT_HCI_CMD_RESET and the callback calling
> > BT_HCI_CMD_LE_SET_SCAN_ENABLE is not called. This timing dependent
> > and
> > probably not reproducible on all hardware.
> >
> > I have workarounded the issue by adding a small delay between the HCI
> > initialization and the call to node_attach_io_all():
> >
> > diff --git a/mesh/mesh.c b/mesh/mesh.c
> > index 9b2b2073b..1c06060f9 100644
> > --- a/mesh/mesh.c
> > +++ b/mesh/mesh.c
> > @@ -167,6 +167,10 @@ bool mesh_init(const char *config_dir, enum
> > mesh_io_type type, void *opts)
> > mesh_io_get_caps(mesh.io, &caps);
> > mesh.max_filters = caps.max_num_filters;
> >
> > + for (int i = 0 ; i < 100 ; i++) {
> > + l_main_iterate(10);
> > + }
> > +
> > node_attach_io_all(mesh.io);
> >
> > return true;
> >
> > I guess there is a better way to do that by waiting for the HCI to be
> > fully initialized before calling node_attach_io_all() or by using a
> > callback instead. However I do not know the codebase good enough to
> > fix
> > that properly.
> >
> > Aurelien
> >
> I've experienced something similar on my rpi3. I found that on restart,
> discover-unprovisioned stopped working.

I observe the same thing here.

> In my case, it appears that meshd assumes that if there are existing
> nodes, scanning has been enabled. Thus, calls from mesh-cfgclient to
> discover additional unprovisioned nodes do not need another hci scan
> enable at mesh/mesh-io-generic.c:736.
>
> If meshd is restarted with preexisting nodes, scanning is still assumed
> to already be enabled, but it's not. This breaks discover-unprovisioned
> for me.

Yes, I think this is exactly my problem. If there are existing nodes,
recv_register is called before the HCI is configured and pvt->rx_regs is
filled at mesh/mesh-io-generic.c:738. This means that, later on,
scanning is assumed to be enabled. However, the call to bt_hci_send with
BT_HCI_CMD_LE_SET_SCAN_PARAMETERS fails because the HCI is not yet
initialized, so the callback set_recv_scan_enable(), which is supposed
to enable scanning, is never called.

So when loading a node, scanning is assumed to be enabled, but in
practice it is not.
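
The sequence described above can be modelled in a few lines. This is a
simplified sketch under stated assumptions, not the code in
mesh/mesh-io-generic.c: struct model_io, model_send_scan_params() and
model_recv_register() are hypothetical names, and rx_reg_count merely
stands in for the length of pvt->rx_regs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the I/O layer state. */
struct model_io {
	bool hci_ready;		/* true once controller init has finished */
	bool scan_enabled;	/* what the controller is actually doing */
	size_t rx_reg_count;	/* stands in for the rx_regs queue length */
};

/*
 * Stand-in for sending LE Set Scan Parameters. While the HCI is not
 * ready the send fails, so the completion callback that would enable
 * scanning never runs.
 */
static bool model_send_scan_params(struct model_io *io)
{
	if (!io->hci_ready)
		return false;

	io->scan_enabled = true;
	return true;
}

/*
 * Scanning is requested only for the first registration, and the
 * registration is recorded whether or not the request went through.
 */
static void model_recv_register(struct model_io *io)
{
	bool first;

	assert(io != NULL);

	first = (io->rx_reg_count == 0);
	io->rx_reg_count++;

	if (first)
		(void) model_send_scan_params(io);
}
```

In this model, registering once while hci_ready is false and once more
after it becomes true leaves rx_reg_count at 2 and scan_enabled still
false, which matches the observed behaviour: the second registration
sees a non-empty queue and never retries the scan setup.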

I believe my workaround should work on your system (maybe after
adjusting the number of iterations of the loop).

Aurelien

--
Aurelien Jarno GPG: 4096R/1DDD8C9B
[email protected] http://www.aurel32.net

2019-11-12 06:45:21

by Stotland, Inga

Subject: Re: BlueZ/mesh: RX not working after daemon restart (with workaround)

Hi Aurelien,

On Sun, 2019-11-10 at 22:39 +0100, Aurelien Jarno wrote:
> Hi,
>
> On 2019-11-10 13:59, Steve Brown wrote:
> > On Sun, 2019-11-10 at 21:08 +0100, Aurelien Jarno wrote:
> > > Hi all,
> > >
> > > On my system (Raspberry PI 3), the RX path doesn't work anymore
> > > following a restart of the bluetooth-meshd daemon. I have tracked
> > > down
> > > that to the fact that the receive callbacks are setup before the HCI
> > > is
> > > fully initialized. Said otherwise, BT_HCI_CMD_LE_SET_SCAN_PARAMETERS
> > > is
> > > called before BT_HCI_CMD_RESET and the callback calling
> > > BT_HCI_CMD_LE_SET_SCAN_ENABLE is not called. This timing dependent
> > > and
> > > probably not reproducible on all hardware.
> > >
> > > I have workarounded the issue by adding a small delay between the HCI
> > > initialization and the call to node_attach_io_all():
> > >
> > > diff --git a/mesh/mesh.c b/mesh/mesh.c
> > > index 9b2b2073b..1c06060f9 100644
> > > --- a/mesh/mesh.c
> > > +++ b/mesh/mesh.c
> > > @@ -167,6 +167,10 @@ bool mesh_init(const char *config_dir, enum
> > > mesh_io_type type, void *opts)
> > > mesh_io_get_caps(mesh.io, &caps);
> > > mesh.max_filters = caps.max_num_filters;
> > >
> > > + for (int i = 0 ; i < 100 ; i++) {
> > > + l_main_iterate(10);
> > > + }
> > > +
> > > node_attach_io_all(mesh.io);
> > >
> > > return true;
> > >
> > > I guess there is a better way to do that by waiting for the HCI to be
> > > fully initialized before calling node_attach_io_all() or by using a
> > > callback instead. However I do not know the codebase good enough to
> > > fix
> > > that properly.
> > >
> > > Aurelien
> > >
> > I've experienced something similar on my rpi3. I found that on restart,
> > discover-unprovisioned stopped working.
>
> In my case I also observe the same.
>
> > In my case, it appears that meshd assumes that if there are existing
> > nodes, scanning has been enabled. Thus, calls from mesh-cfgclient to
> > discover additional unprovisioned nodes do not need another hci scan
> > enable at mesh/mesh-io-generic.c:736.
> >
> > If meshd is restarted with preexisting nodes, scanning is still assumed
> > to already be enabled, but it's not. This breaks discover-unprovisioned
> > for me.
>
> Yes, I think this is exactly my problem. If there are existing nodes,
> recv_register is called before the HCI is configured and pvt->rx_regs is
> filled at mesh/mesh-io-generic.c:738. This means that later scanning is
> assumed to be enabled. However the call to bt_hci_send with
> BT_HCI_CMD_LE_SET_SCAN_PARAMETERS fails as the HCI is not yet
> initialized and the callback set_recv_scan_enable() supposed to enable
> scanning is not called.
>
> So when loading a node, scanning is assumed to be enabled, but it is
> not practice.
>
> I believe my workaround should work on your system (maybe after
> adjusting the number of iterations of the loop).
>
> Aurelien
>

Thanks for the analysis. I think we should switch to a callback
approach, i.e. initialize the io first and register the RX from the
callback that reports a successful init.
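
A minimal sketch of what such a callback approach could look like,
assuming the io layer can report when initialization has finished. All
names here (struct sketch_io, sketch_io_init(), sketch_io_init_done(),
sketch_attach_rx(), sketch_on_io_ready()) are hypothetical and chosen
for illustration; this is a self-contained model, not the mesh_io API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: reports whether io init succeeded. */
typedef void (*sketch_ready_func_t)(void *user_data, bool success);

struct sketch_io {
	bool hci_ready;			/* set when the init sequence completes */
	bool rx_registered;		/* set once RX has been registered */
	sketch_ready_func_t ready_cb;	/* called when init completes */
	void *ready_data;
};

/* Start initialization and remember whom to notify on completion. */
static void sketch_io_init(struct sketch_io *io, sketch_ready_func_t cb,
							void *user_data)
{
	assert(io != NULL);

	io->hci_ready = false;
	io->rx_registered = false;
	io->ready_cb = cb;
	io->ready_data = user_data;
}

/* Would be driven by the event loop once the controller has answered. */
static void sketch_io_init_done(struct sketch_io *io, bool success)
{
	io->hci_ready = success;

	if (io->ready_cb)
		io->ready_cb(io->ready_data, success);
}

/* RX registration refuses to run before the HCI is ready. */
static bool sketch_attach_rx(struct sketch_io *io)
{
	if (!io->hci_ready)
		return false;

	io->rx_registered = true;
	return true;
}

/* Ready callback: where a node_attach_io_all() style call would move. */
static void sketch_on_io_ready(void *user_data, bool success)
{
	struct sketch_io *io = user_data;

	if (success)
		(void) sketch_attach_rx(io);
}
```

The point of the design is that the ordering no longer depends on
timing: calling sketch_attach_rx() before sketch_io_init_done() simply
fails, and the registration happens exactly once the init has been
reported as successful.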

Best regards,
Inga