This set of patches addresses the situation when an application
calls Leave() or Attach() methods on a node that has another incomplete
method call on Network interface associated with it.
The simple solution is to return error in response to Leave() method
if an application still owes the daemon either a reply to GetManagedObjects
request in case of Attach() or a reply to JoinComplete() in case of
Join(), Create() or Import()
Similarly, a "Busy" error is returned in response to Attach() method
if an application still owes the daemon a reply to JoinComplete()
in case of pending Join(), Create() or Import()
Inga Stotland (5):
mesh: Delete unused function
mesh: Make "Busy" and "InProgress" to be distinct errors
mesh: Add destroy callback to dbus_send_with_timeout()
mesh: Add timeout to GetManagedObjects call
mesh: Add "node is busy" check for Leave() & Attach()
doc/mesh-api.txt | 5 +++-
mesh/dbus.c | 10 +++++++-
mesh/dbus.h | 1 +
mesh/error.h | 1 +
mesh/manager.c | 11 ++++-----
mesh/mesh.c | 22 +++++++++++------
mesh/node.c | 63 +++++++++++++++++++++++++++++-------------------
mesh/node.h | 2 +-
test/test-mesh | 2 --
9 files changed, 74 insertions(+), 43 deletions(-)
--
2.26.2
This separates "Busy" and "InProgress" error codes:
MESH_ERROR_IN_PROGRESS maps to org.bluez.mesh.Error.InProgress
MESH_ERROR_BUSY maps to org.bluez.mesh.Error.Busy
Minor API change:
UpdateAppKey() returns "InProgress" error instead of "Busy"
---
doc/mesh-api.txt | 2 +-
mesh/dbus.c | 3 ++-
mesh/error.h | 1 +
mesh/manager.c | 11 +++++------
4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/doc/mesh-api.txt b/doc/mesh-api.txt
index 24cface22..e85f0bf52 100644
--- a/doc/mesh-api.txt
+++ b/doc/mesh-api.txt
@@ -654,7 +654,7 @@ Methods:
org.bluez.mesh.Error.Failed
org.bluez.mesh.Error.InvalidArguments
org.bluez.mesh.Error.DoesNotExist
- org.bluez.mesh.Error.Busy
+ org.bluez.mesh.Error.InProgress
void DeleteAppKey(uint16 app_index)
diff --git a/mesh/dbus.c b/mesh/dbus.c
index bf0f73bd9..83ae22c9f 100644
--- a/mesh/dbus.c
+++ b/mesh/dbus.c
@@ -56,7 +56,8 @@ static struct error_entry const error_table[] =
{ ERROR_INTERFACE ".NotAuthorized", "Permission denied"},
{ ERROR_INTERFACE ".NotFound", "Object not found"},
{ ERROR_INTERFACE ".InvalidArgs", "Invalid arguments"},
- { ERROR_INTERFACE ".InProgress", "Already in progress"},
+ { ERROR_INTERFACE ".InProgress", "Operation already in progress"},
+ { ERROR_INTERFACE ".Busy", "Busy"},
{ ERROR_INTERFACE ".AlreadyExists", "Already exists"},
{ ERROR_INTERFACE ".DoesNotExist", "Does not exist"},
{ ERROR_INTERFACE ".Canceled", "Operation canceled"},
diff --git a/mesh/error.h b/mesh/error.h
index f3e0f5476..2809915b0 100644
--- a/mesh/error.h
+++ b/mesh/error.h
@@ -27,6 +27,7 @@ enum mesh_error {
MESH_ERROR_NOT_AUTHORIZED,
MESH_ERROR_NOT_FOUND,
MESH_ERROR_INVALID_ARGS,
+ MESH_ERROR_IN_PROGRESS,
MESH_ERROR_BUSY,
MESH_ERROR_ALREADY_EXISTS,
MESH_ERROR_DOES_NOT_EXIST,
diff --git a/mesh/manager.c b/mesh/manager.c
index 2be471088..8ef681366 100644
--- a/mesh/manager.c
+++ b/mesh/manager.c
@@ -60,7 +60,7 @@ static void scan_cancel(struct l_timeout *timeout, void *user_data)
struct mesh_io *io;
struct mesh_net *net;
- l_debug("scan_cancel");
+ l_debug("");
if (scan_timeout)
l_timeout_remove(scan_timeout);
@@ -249,11 +249,10 @@ static struct l_dbus_message *add_node_call(struct l_dbus *dbus,
return dbus_error(msg, MESH_ERROR_INVALID_ARGS, NULL);
if (!l_dbus_message_iter_get_fixed_array(&iter_uuid, &uuid, &n)
- || n != 16) {
- l_debug("n = %u", n);
+ || n != 16)
return dbus_error(msg, MESH_ERROR_INVALID_ARGS,
"Bad device UUID");
- }
+
/* Allow AddNode to cancel Scanning if from the same node */
if (scan_node) {
if (scan_node != node)
@@ -263,7 +262,6 @@ static struct l_dbus_message *add_node_call(struct l_dbus *dbus,
}
/* Invoke Prov Initiator */
-
add_pending = l_new(struct add_data, 1);
memcpy(add_pending->uuid, uuid, 16);
add_pending->node = node;
@@ -554,7 +552,8 @@ static struct l_dbus_message *update_subnet_call(struct l_dbus *dbus,
}
/* All other phases mean KR already in progress over-the-air */
- return dbus_error(msg, MESH_ERROR_BUSY, "Key Refresh in progress");
+ return dbus_error(msg, MESH_ERROR_IN_PROGRESS,
+ "Key Refresh in progress");
}
static struct l_dbus_message *delete_subnet_call(struct l_dbus *dbus,
--
2.26.2
Switch to using dbus_send_with_timeout when making a request
to get managed objects from an application.
---
mesh/node.c | 43 ++++++++++++++++++++-----------------------
1 file changed, 20 insertions(+), 23 deletions(-)
diff --git a/mesh/node.c b/mesh/node.c
index d1d4da23d..7ec06437b 100644
--- a/mesh/node.c
+++ b/mesh/node.c
@@ -1459,7 +1459,7 @@ static void get_managed_objects_cb(struct l_dbus_message *msg, void *user_data)
struct keyring_net_key net_key;
uint8_t dev_key[16];
- if (l_dbus_message_is_error(msg)) {
+ if (!msg || l_dbus_message_is_error(msg)) {
l_error("Failed to get app's dbus objects");
goto fail;
}
@@ -1616,8 +1616,7 @@ static void get_managed_objects_cb(struct l_dbus_message *msg, void *user_data)
fail:
/* Handle failed requests */
- if (node)
- node_remove(node);
+ node_remove(node);
if (req->type == REQUEST_TYPE_JOIN)
req->join_ready_cb(NULL, NULL);
@@ -1628,6 +1627,20 @@ fail:
l_free(req->import);
}
+static void send_managed_objects_request(const char *destination,
+ const char *path,
+ struct managed_obj_request *req)
+{
+ struct l_dbus_message *msg;
+
+ msg = l_dbus_message_new_method_call(dbus_get_bus(), destination, path,
+ L_DBUS_INTERFACE_OBJECT_MANAGER,
+ "GetManagedObjects");
+ l_dbus_message_set_arguments(msg, "");
+ dbus_send_with_timeout(dbus_get_bus(), msg, get_managed_objects_cb,
+ req, l_free, DEFAULT_DBUS_TIMEOUT);
+}
+
/* Establish relationship between application and mesh node */
void node_attach(const char *app_root, const char *sender, uint64_t token,
node_ready_func_t cb, void *user_data)
@@ -1661,11 +1674,7 @@ void node_attach(const char *app_root, const char *sender, uint64_t token,
req->attach = node;
req->type = REQUEST_TYPE_ATTACH;
- l_dbus_method_call(dbus_get_bus(), sender, app_root,
- L_DBUS_INTERFACE_OBJECT_MANAGER,
- "GetManagedObjects", NULL,
- get_managed_objects_cb,
- req, l_free);
+ send_managed_objects_request(sender, app_root, req);
}
/* Create a temporary pre-provisioned node */
@@ -1681,11 +1690,7 @@ void node_join(const char *app_root, const char *sender, const uint8_t *uuid,
req->join_ready_cb = cb;
req->type = REQUEST_TYPE_JOIN;
- l_dbus_method_call(dbus_get_bus(), sender, app_root,
- L_DBUS_INTERFACE_OBJECT_MANAGER,
- "GetManagedObjects", NULL,
- get_managed_objects_cb,
- req, l_free);
+ send_managed_objects_request(sender, app_root, req);
}
void node_import(const char *app_root, const char *sender, const uint8_t *uuid,
@@ -1715,11 +1720,7 @@ void node_import(const char *app_root, const char *sender, const uint8_t *uuid,
req->type = REQUEST_TYPE_IMPORT;
- l_dbus_method_call(dbus_get_bus(), sender, app_root,
- L_DBUS_INTERFACE_OBJECT_MANAGER,
- "GetManagedObjects", NULL,
- get_managed_objects_cb,
- req, l_free);
+ send_managed_objects_request(sender, app_root, req);
}
void node_create(const char *app_root, const char *sender, const uint8_t *uuid,
@@ -1735,11 +1736,7 @@ void node_create(const char *app_root, const char *sender, const uint8_t *uuid,
req->pending_msg = user_data;
req->type = REQUEST_TYPE_CREATE;
- l_dbus_method_call(dbus_get_bus(), sender, app_root,
- L_DBUS_INTERFACE_OBJECT_MANAGER,
- "GetManagedObjects", NULL,
- get_managed_objects_cb,
- req, l_free);
+ send_managed_objects_request(sender, app_root, req);
}
static void build_element_config(void *a, void *b)
--
2.26.2
This deletes unused function node_is_provisioned()
---
mesh/node.c | 5 -----
mesh/node.h | 1 -
2 files changed, 6 deletions(-)
diff --git a/mesh/node.c b/mesh/node.c
index db888d27c..d1d4da23d 100644
--- a/mesh/node.c
+++ b/mesh/node.c
@@ -598,11 +598,6 @@ bool node_is_provisioner(struct mesh_node *node)
return node->provisioner;
}
-bool node_is_provisioned(struct mesh_node *node)
-{
- return (!IS_UNASSIGNED(node->primary));
-}
-
void node_app_key_delete(struct mesh_node *node, uint16_t net_idx,
uint16_t app_idx)
{
diff --git a/mesh/node.h b/mesh/node.h
index 290681e28..e26d410c8 100644
--- a/mesh/node.h
+++ b/mesh/node.h
@@ -39,7 +39,6 @@ struct mesh_node *node_find_by_addr(uint16_t addr);
struct mesh_node *node_find_by_uuid(uint8_t uuid[16]);
struct mesh_node *node_find_by_token(uint64_t token);
bool node_is_provisioner(struct mesh_node *node);
-bool node_is_provisioned(struct mesh_node *node);
void node_app_key_delete(struct mesh_node *node, uint16_t net_idx,
uint16_t app_idx);
uint16_t node_get_primary(struct mesh_node *node);
--
2.26.2
This adds a destroy callback as a function parameter to
dbus_send_with_timeout() to allow automatic release of user data
on either reply or timeout.
---
mesh/dbus.c | 7 +++++++
mesh/dbus.h | 1 +
mesh/mesh.c | 12 ++++++------
3 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/mesh/dbus.c b/mesh/dbus.c
index 83ae22c9f..63ea420ed 100644
--- a/mesh/dbus.c
+++ b/mesh/dbus.c
@@ -41,6 +41,7 @@ struct send_info {
struct l_dbus *dbus;
struct l_timeout *timeout;
l_dbus_message_func_t cb;
+ l_dbus_destroy_func_t destroy;
void *user_data;
uint32_t serial;
};
@@ -159,6 +160,10 @@ static void send_reply(struct l_dbus_message *message, void *user_data)
l_timeout_remove(info->timeout);
info->cb(message, info->user_data);
+
+ if (info->destroy)
+ info->destroy(info->user_data);
+
l_free(info);
}
@@ -173,6 +178,7 @@ static void send_timeout(struct l_timeout *timeout, void *user_data)
void dbus_send_with_timeout(struct l_dbus *dbus, struct l_dbus_message *msg,
l_dbus_message_func_t cb,
void *user_data,
+ l_dbus_destroy_func_t destroy,
unsigned int seconds)
{
struct send_info *info = l_new(struct send_info, 1);
@@ -180,6 +186,7 @@ void dbus_send_with_timeout(struct l_dbus *dbus, struct l_dbus_message *msg,
info->dbus = dbus;
info->cb = cb;
info->user_data = user_data;
+ info->destroy = destroy;
info->serial = l_dbus_send_with_reply(dbus, msg, send_reply,
info, NULL);
info->timeout = l_timeout_create(seconds, send_timeout, info, NULL);
diff --git a/mesh/dbus.h b/mesh/dbus.h
index aafb85f6b..89d6b1d31 100644
--- a/mesh/dbus.h
+++ b/mesh/dbus.h
@@ -36,4 +36,5 @@ struct l_dbus_message *dbus_error(struct l_dbus_message *msg, int err,
void dbus_send_with_timeout(struct l_dbus *dbus, struct l_dbus_message *msg,
l_dbus_message_func_t cb,
void *user_data,
+ l_dbus_destroy_func_t destroy,
unsigned int seconds);
diff --git a/mesh/mesh.c b/mesh/mesh.c
index 24ea3afd6..a5935c216 100644
--- a/mesh/mesh.c
+++ b/mesh/mesh.c
@@ -439,12 +439,12 @@ static void send_join_failed(const char *owner, const char *path,
free_pending_join_call(true);
}
-static void prov_join_complete_reply_cb(struct l_dbus_message *message,
+static void prov_join_complete_reply_cb(struct l_dbus_message *msg,
void *user_data)
{
bool failed = false;
- if (!message || l_dbus_message_is_error(message))
+ if (!msg || l_dbus_message_is_error(msg))
failed = true;
if (!failed)
@@ -488,7 +488,7 @@ static bool prov_complete_cb(void *user_data, uint8_t status,
l_dbus_message_set_arguments(msg, "t", l_get_be64(token));
dbus_send_with_timeout(dbus, msg, prov_join_complete_reply_cb,
- NULL, DEFAULT_DBUS_TIMEOUT);
+ NULL, NULL, DEFAULT_DBUS_TIMEOUT);
return true;
}
@@ -666,12 +666,12 @@ static struct l_dbus_message *leave_call(struct l_dbus *dbus,
return l_dbus_message_new_method_return(msg);
}
-static void create_join_complete_reply_cb(struct l_dbus_message *message,
+static void create_join_complete_reply_cb(struct l_dbus_message *msg,
void *user_data)
{
struct mesh_node *node = user_data;
- if (!message || l_dbus_message_is_error(message)) {
+ if (!msg || l_dbus_message_is_error(msg)) {
node_remove(node);
return;
}
@@ -716,7 +716,7 @@ static void create_node_ready_cb(void *user_data, int status,
l_dbus_message_set_arguments(msg, "t", l_get_be64(token));
dbus_send_with_timeout(dbus, msg, create_join_complete_reply_cb,
- node, DEFAULT_DBUS_TIMEOUT);
+ node, NULL, DEFAULT_DBUS_TIMEOUT);
l_dbus_message_unref(pending_msg);
}
--
2.26.2
This introduces the following behavior change for those methods
on Network interface that specify node token as an input parameter
Leave() method:
If Leave method is called for a node that is being processed as a result
of a Create, Import, Join or Attach method calls in progress, node removal
is not allowed and org.bluez.mesh.Error.Busy error is returned.
Attach() method:
If Attach method is called for a node that is being processed as a result
of a Create, Import or Join method calls in progress, node attachment
is not allowed and org.bluez.mesh.Error.Busy error is returned.
---
doc/mesh-api.txt | 3 +++
mesh/mesh.c | 10 +++++++++-
mesh/node.c | 21 +++++++++++++++++++++
mesh/node.h | 1 +
test/test-mesh | 2 --
5 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/doc/mesh-api.txt b/doc/mesh-api.txt
index e85f0bf52..4bef6174c 100644
--- a/doc/mesh-api.txt
+++ b/doc/mesh-api.txt
@@ -116,6 +116,7 @@ Methods:
org.bluez.mesh.Error.InvalidArguments
org.bluez.mesh.Error.NotFound,
org.bluez.mesh.Error.AlreadyExists,
+ org.bluez.mesh.Error.Busy,
org.bluez.mesh.Error.Failed
void Leave(uint64 token)
@@ -126,6 +127,8 @@ Methods:
PossibleErrors:
org.bluez.mesh.Error.InvalidArguments
+ org.bluez.mesh.Error.NotFound
+ org.bluez.mesh.Error.Busy
void CreateNetwork(object app_root, array{byte}[16] uuid)
diff --git a/mesh/mesh.c b/mesh/mesh.c
index a5935c216..c8767ee7a 100644
--- a/mesh/mesh.c
+++ b/mesh/mesh.c
@@ -655,13 +655,21 @@ static struct l_dbus_message *leave_call(struct l_dbus *dbus,
void *user_data)
{
uint64_t token;
+ struct mesh_node *node;
l_debug("Leave");
if (!l_dbus_message_get_arguments(msg, "t", &token))
return dbus_error(msg, MESH_ERROR_INVALID_ARGS, NULL);
- node_remove(node_find_by_token(token));
+ node = node_find_by_token(token);
+ if (!node)
+ return dbus_error(msg, MESH_ERROR_NOT_FOUND, NULL);
+
+ if (node_is_busy(node))
+ return dbus_error(msg, MESH_ERROR_BUSY, NULL);
+
+ node_remove(node);
return l_dbus_message_new_method_return(msg);
}
diff --git a/mesh/node.c b/mesh/node.c
index 7ec06437b..567f2e6db 100644
--- a/mesh/node.c
+++ b/mesh/node.c
@@ -88,6 +88,7 @@ struct mesh_node {
char *storage_dir;
uint32_t disc_watch;
uint32_t seq_number;
+ bool busy;
bool provisioner;
uint16_t primary;
struct node_composition comp;
@@ -598,6 +599,11 @@ bool node_is_provisioner(struct mesh_node *node)
return node->provisioner;
}
+bool node_is_busy(struct mesh_node *node)
+{
+ return node->busy;
+}
+
void node_app_key_delete(struct mesh_node *node, uint16_t net_idx,
uint16_t app_idx)
{
@@ -1352,6 +1358,8 @@ static bool add_local_node(struct mesh_node *node, uint16_t unicast, bool kr,
/* Initialize configuration server model */
cfgmod_server_init(node, PRIMARY_ELE_IDX);
+ node->busy = true;
+
return true;
}
@@ -1459,6 +1467,9 @@ static void get_managed_objects_cb(struct l_dbus_message *msg, void *user_data)
struct keyring_net_key net_key;
uint8_t dev_key[16];
+ if (req->type == REQUEST_TYPE_ATTACH)
+ req->attach->busy = false;
+
if (!msg || l_dbus_message_is_error(msg)) {
l_error("Failed to get app's dbus objects");
goto fail;
@@ -1654,6 +1665,12 @@ void node_attach(const char *app_root, const char *sender, uint64_t token,
return;
}
+ /* Check if there is a pending request associated with this node */
+ if (node->busy) {
+ cb(user_data, MESH_ERROR_BUSY, NULL);
+ return;
+ }
+
/* Check if the node is already in use */
if (node->owner) {
l_warn("The node is already in use");
@@ -1674,6 +1691,8 @@ void node_attach(const char *app_root, const char *sender, uint64_t token,
req->attach = node;
req->type = REQUEST_TYPE_ATTACH;
+ node->busy = true;
+
send_managed_objects_request(sender, app_root, req);
}
@@ -2347,6 +2366,8 @@ void node_finalize_new_node(struct mesh_node *node, struct mesh_io *io)
free_node_dbus_resources(node);
mesh_agent_remove(node->agent);
+ node->busy = false;
+
/* Register callback for the node's io */
attach_io(node, io);
}
diff --git a/mesh/node.h b/mesh/node.h
index e26d410c8..b8b2b1b49 100644
--- a/mesh/node.h
+++ b/mesh/node.h
@@ -39,6 +39,7 @@ struct mesh_node *node_find_by_addr(uint16_t addr);
struct mesh_node *node_find_by_uuid(uint8_t uuid[16]);
struct mesh_node *node_find_by_token(uint64_t token);
bool node_is_provisioner(struct mesh_node *node);
+bool node_is_busy(struct mesh_node *node);
void node_app_key_delete(struct mesh_node *node, uint16_t net_idx,
uint16_t app_idx);
uint16_t node_get_primary(struct mesh_node *node);
diff --git a/test/test-mesh b/test/test-mesh
index 38f0c0a74..7c8a25482 100755
--- a/test/test-mesh
+++ b/test/test-mesh
@@ -412,8 +412,6 @@ class Application(dbus.service.Object):
token = value
have_token = True
- if attached == False:
- attach(token)
@dbus.service.method(MESH_APPLICATION_IFACE,
in_signature="s", out_signature="")
--
2.26.2
Patchset Applied
On Wed, 2020-06-10 at 10:11 -0700, Inga Stotland wrote:
> This set of patches addresses the situation when an application
> calls Leave() or Attach() methods on a node that has another incomplete
> method call on Network interface associated with it.
>
> The simple solution is to return error in response to Leave() method
> if an application still owes the daemon either a reply to GetManagedObjects
> request in case of Attach() or a reply to JoinComplete() in case of
> Join(), Create() or Import()
>
> Similarly, a "Busy" error is returned in response to Attach() method
> if an application still owes the daemon a reply to JoinComplete()
> in case of pending Join(), Create() or Import()
>
> Inga Stotland (5):
> mesh: Delete unused function
> mesh: Make "Busy" and "InProgress" to be distinct errors
> mesh: Add destroy callback to dbus_send_with_timeout()
> mesh: Add timeout to GetManagedObjects call
> mesh: Add "node is busy" check for Leave() & Attach()
>
> doc/mesh-api.txt | 5 +++-
> mesh/dbus.c | 10 +++++++-
> mesh/dbus.h | 1 +
> mesh/error.h | 1 +
> mesh/manager.c | 11 ++++-----
> mesh/mesh.c | 22 +++++++++++------
> mesh/node.c | 63 +++++++++++++++++++++++++++++-------------------
> mesh/node.h | 2 +-
> test/test-mesh | 2 --
> 9 files changed, 74 insertions(+), 43 deletions(-)
>
Hi Inga,
On 06/10, Inga Stotland wrote:
> Attach() method:
> If Attach method is called for a node that is being processed as a result
> of a Create, Import or Join method calls in progress, node attachment
> is not allowed and org.bluez.mesh.Error.Busy error is returned.
I don't think I understand how this is supposed to be used by the
application.
So far, we've implemented the API by calling Import() and then, as soon as
JoinComplete() call arrives, calling Attach().
> @@ -1654,6 +1665,12 @@ void node_attach(const char *app_root, const char *sender, uint64_t token,
> return;
> }
>
> + /* Check if there is a pending request associated with this node */
> + if (node->busy) {
> + cb(user_data, MESH_ERROR_BUSY, NULL);
> + return;
> + }
> +
> /* Check if the node is already in use */
> if (node->owner) {
> l_warn("The node is already in use");
With this patch, this call sequence fails, because now we're supposed to
send a *reply* to JoinComplete first, and only then call Attach()?
I'm using a high-level API for D-Bus, so I don't really control when the
reply is sent, so at the moment the only way to implement this would be
by delaying Attach() by a fixed timeout - and I'm not comfortable with
that.
regards
--
Michał Lowas-Rzechonek <[email protected]>
Silvair http://silvair.com
Jasnogórska 44, 31-358 Krakow, POLAND
Hi Michał,
On Mon, 2020-06-15 at 10:22 +0200, Michał Lowas-Rzechonek wrote:
> Hi Inga,
>
> On 06/10, Inga Stotland wrote:
> > Attach() method:
> > If Attach method is called for a node that is being processed as a result
> > of a Create, Import or Join method calls in progress, node attachment
> > is not allowed and org.bluez.mesh.Error.Busy error is returned.
>
> I don't think I understand how this is supposed to be used by the
> application.
>
> So far, we've implemented the API by calling Import() and then, as soon as
> JoinComplete() call arrives, calling Attach().
>
> > @@ -1654,6 +1665,12 @@ void node_attach(const char *app_root, const char *sender, uint64_t token,
> > return;
> > }
> >
> > + /* Check if there is a pending request associated with this node */
> > + if (node->busy) {
> > + cb(user_data, MESH_ERROR_BUSY, NULL);
> > + return;
> > + }
> > +
> > /* Check if the node is already in use */
> > if (node->owner) {
> > l_warn("The node is already in use");
>
> With this patch, this call sequence fails, because now we're supposed to
> send a *reply* to JoinComplete first, and only then call Attach()?
A couple months ago, we made the decision (with your input, I believe) that we needed to use JoinComplete on
every node creation (Join, Import, Create), to ensure that the App has the token, and all required
informationnthe daemon requires the App to have. If the daemon does *not* get the successful return from
JoinComplete, it destroys the entire node... So therefore the node isn't entirely complete until the daemon
receives the return.
This creates an awkward state for the node... Part of Attach() requires the daemon to make an ObjectManager
request of the app to get the latest composition settings, and validate the App which is attaching with that
token. It also creates the unfortunate situation revealed in one of your test-suite cases where if Leave() was
called before returning the JoinComplete() call, we Seg-Faulted.
Both Leave(), and the situation of the *first* Attach() following JoinComplete are "Once in a Node's Lifetime"
situations. We decided that the safest course of action here was to require the JoinComplete return prior
either Attach or Leave... Because the node needs finalizing.
>
> I'm using a high-level API for D-Bus, so I don't really control when the
> reply is sent, so at the moment the only way to implement this would be
> by delaying Attach() by a fixed timeout - and I'm not comfortable with
> that.
Yeah, I can see how this is now required...
In the mesh-cfgclient tool (which is also built on ELL) we accomplish this by scheduling an idle_oneshot for
the Attach.
It could also be done by issuing the l_dbus_send of the JoinComplete response within join_complete, before
calling the Attach send... Then returning NULL as the return result of join_complete (where the response would
normally be sent).
>
> regards
Inga, Brian,
On 06/15, Gix, Brian wrote:
> > With this patch, this call sequence fails, because now we're supposed to
> > send a *reply* to JoinComplete first, and only then call Attach()?
>
> A couple months ago, we made the decision (with your input, I believe)
> that we needed to use JoinComplete on every node creation (Join,
> Import, Create), to ensure that the App has the token (...)
Yes, I remember that discussion. The rationale was ability to catch
bugs in the application, and get rid of created, but effectively
unusable nodes.
> This creates (...) the unfortunate situation revealed in one of your
> test-suite cases where if Leave() was called before returning the
> JoinComplete() call, we Seg-Faulted.
Indeed, although I don't think it's necessary to introduce a "busy"
state...
In case of Leave(), this call removes the node anyway, so what I think
would suffice is to check whether the node still exists when
JoinComplete reply arrives, to avoid freeing the node twice (causing
SEGV).
void node_finalize_new_node(struct mesh_node *node, struct mesh_io *io)
{
if (!node)
return;
if (!l_queue_find(nodes, match_simple, L_UINT_TO_PTR(node)))
return;
// ...
}
This would allow the application to call Leave *before* sending a reply
to JoinComplete.
As for Attach(), I also think it should be legal to call it before
replying to JoinComplete. The worst thing that can happen is that
application successfully attaches, then replies to JoinComplete with an
error. This would simply remove the node, and the application would be
promptly detached.
> > I'm using a high-level API for D-Bus, so I don't really control when the
> > reply is sent, so at the moment the only way to implement this would be
> > by delaying Attach() by a fixed timeout - and I'm not comfortable with
> > that.
>
>
> Yeah, I can see how this is now required...
>
> In the mesh-cfgclient tool (which is also built on ELL) we accomplish
> this by scheduling an idle_oneshot for the Attach.
Unfortunately, not all main loops have API to schedule "idle" calls,
i.e. calls executed when the loop doesn't have anything better to do.
I know that both ELL and Glib do, but AFAIR Qt does not (it uses timers
with timeout set to 0, if I'm not mistaken), and Python's asyncio
doesn't either.
I don't think requiring a specific sequence of dbus messages is a good
idea :(
regards
--
Michał Lowas-Rzechonek <[email protected]>
Silvair http://silvair.com
Jasnogórska 44, 31-358 Krakow, POLAND
Hi Michal,
On Mon, 2020-06-15 at 22:32 +0200, [email protected]
wrote:
> Inga, Brian,
>
> On 06/15, Gix, Brian wrote:
> > > With this patch, this call sequence fails, because now we're supposed to
> > > send a *reply* to JoinComplete first, and only then call Attach()?
> >
> > A couple months ago, we made the decision (with your input, I believe)
> > that we needed to use JoinComplete on every node creation (Join,
> > Import, Create), to ensure that the App has the token (...)
>
> Yes, I remember that discussion. The rationale was ability to catch
> bugs in the application, and get rid of created, but effectively
> unusable nodes.
>
> > This creates (...) the unfortunate situation revealed in one of your
> > test-suite cases where if Leave() was called before returning the
> > JoinComplete() call, we Seg-Faulted.
>
> Indeed, although I don't think it's necessary to introduce a "busy"
> state...
>
> In case of Leave(), this call removes the node anyway, so what I think
> would suffice is to check whether the node still exists when
> JoinComplete reply arrives, to avoid freeing the node twice (causing
> SEGV).
>
> void node_finalize_new_node(struct mesh_node *node, struct mesh_io *io)
> {
> if (!node)
> return;
>
> if (!l_queue_find(nodes, match_simple, L_UINT_TO_PTR(node)))
> return;
>
I am afraid that this kind of check may lead to a race condition (rare, but possible) when:
- a un-finalized node has been removed via Leave and
- the daemon still waiting for JoinCOmplete() reply and
- meanwhile another one has either been created or imported reusing the memory allocation (entirely possible) and has been attached
So when the original JoinComplete returns either via timeout/error/ok,we may unintentionally remove dbus resources of a new node that has been validated and attached.
> // ...
> }
>
> This would allow the application to call Leave *before* sending a reply
> to JoinComplete.
>
> As for Attach(), I also think it should be legal to call it before
> replying to JoinComplete. The worst thing that can happen is that
> application successfully attaches, then replies to JoinComplete with an
> error. This would simply remove the node, and the application would be
> promptly detached.
>
>
I guess we could introduce an internal timer inside the daemon to put
Attach on hold until JoinComplete is done. If JoinComplete returns an
error, then Attach won'r go through and would return error as well
> > > I'm using a high-level API for D-Bus, so I don't really control when the
> > > reply is sent, so at the moment the only way to implement this would be
> > > by delaying Attach() by a fixed timeout - and I'm not comfortable with
> > > that.
> >
> >
> > Yeah, I can see how this is now required...
> >
> > In the mesh-cfgclient tool (which is also built on ELL) we accomplish
> > this by scheduling an idle_oneshot for the Attach.
>
> Unfortunately, not all main loops have API to schedule "idle" calls,
> i.e. calls executed when the loop doesn't have anything better to do.
>
> I know that both ELL and Glib do, but AFAIR Qt does not (it uses timers
> with timeout set to 0, if I'm not mistaken), and Python's asyncio
> doesn't either.
>
> I don't think requiring a specific sequence of dbus messages is a good
> idea :(
>
> regards
Hi Michał, Inga,
On Mon, 2020-06-15 at 22:25 +0000, Stotland, Inga wrote:
> Hi Michal,
>
> On Mon, 2020-06-15 at 22:32 +0200, [email protected]
> wrote:
> > Inga, Brian,
> >
> > On 06/15, Gix, Brian wrote:
> > > > With this patch, this call sequence fails, because now we're supposed to
> > > > send a *reply* to JoinComplete first, and only then call Attach()?
> > >
> > > A couple months ago, we made the decision (with your input, I believe)
> > > that we needed to use JoinComplete on every node creation (Join,
> > > Import, Create), to ensure that the App has the token (...)
> >
> > Yes, I remember that discussion. The rationale was ability to catch
> > bugs in the application, and get rid of created, but effectively
> > unusable nodes.
> >
> > > This creates (...) the unfortunate situation revealed in one of your
> > > test-suite cases where if Leave() was called before returning the
> > > JoinComplete() call, we Seg-Faulted.
> >
> > Indeed, although I don't think it's necessary to introduce a "busy"
> > state...
> >
> > In case of Leave(), this call removes the node anyway, so what I think
> > would suffice is to check whether the node still exists when
> > JoinComplete reply arrives, to avoid freeing the node twice (causing
> > SEGV).
> >
> > void node_finalize_new_node(struct mesh_node *node, struct mesh_io *io)
> > {
> > if (!node)
> > return;
> >
> > if (!l_queue_find(nodes, match_simple, L_UINT_TO_PTR(node)))
> > return;
> >
>
> I am afraid that this kind of check may lead to a race condition (rare, but possible) when:
> - a un-finalized node has been removed via Leave and
> - the daemon still waiting for JoinCOmplete() reply and
> - meanwhile another one has either been created or imported reusing the memory allocation (entirely
> possible) and has been attached
>
> So when the original JoinComplete returns either via timeout/error/ok,we may unintentionally remove dbus
> resources of a new node that has been validated and attached.
>
>
> > // ...
> > }
> >
> > This would allow the application to call Leave *before* sending a reply
> > to JoinComplete.
> >
> > As for Attach(), I also think it should be legal to call it before
> > replying to JoinComplete. The worst thing that can happen is that
> > application successfully attaches, then replies to JoinComplete with an
> > error. This would simply remove the node, and the application would be
> > promptly detached.
> >
> >
>
> I guess we could introduce an internal timer inside the daemon to put
> Attach on hold until JoinComplete is done. If JoinComplete returns an
> error, then Attach won'r go through and would return error as well
So I have created a patch with this option, that I have sent to the reflector.
(See: [PATCH BlueZ] mesh: Add deferal of Attach() and Leave() if busy )
It maintains the *busy* check to make sure we are in a safe state to perform the Attach or Leave, and if not we
defer the call for 1 second... With the understanding that *nobody* should be delaying the response to
JoinComplete. This handles just the simple "Out Of Expected Order" issue, but will still respond to Attach and
Leave with the Busy error if when we reconsider the call, the node has still not been completed.
I have tested this with no memory leaks, and verified that either Attach or Leave can be called within the
JoinComplete application handler.
>
> > > > I'm using a high-level API for D-Bus, so I don't really control when the
> > > > reply is sent, so at the moment the only way to implement this would be
> > > > by delaying Attach() by a fixed timeout - and I'm not comfortable with
> > > > that.
> > >
> > > Yeah, I can see how this is now required...
> > >
> > > In the mesh-cfgclient tool (which is also built on ELL) we accomplish
> > > this by scheduling an idle_oneshot for the Attach.
> >
> > Unfortunately, not all main loops have API to schedule "idle" calls,
> > i.e. calls executed when the loop doesn't have anything better to do.
> >
> > I know that both ELL and Glib do, but AFAIR Qt does not (it uses timers
> > with timeout set to 0, if I'm not mistaken), and Python's asyncio
> > doesn't either.
> >
> > I don't think requiring a specific sequence of dbus messages is a good
> > idea :(
> >
> > regards