2011-03-18 17:46:42

by Tony Ibbs

Subject: [PATCH 00/11] RFC: KBUS messaging subsystem

KBUS is a lightweight, Linux kernel mediated messaging system,
particularly intended for use in embedded environments.

It is meant to be simple to use and understand. It is designed to
provide predictable message delivery, deterministic message ordering,
and a guaranteed reply for each request. It is especially aimed at
situations where existing solutions, such as DBUS, cannot be used,
typically because of system constraints.

We have various customers using KBUS in real life, and believe it to
be useful. I had a showcase table for KBUS at the ELCE in Cambridge,
October last year, and there seemed to be interest.

The KBUS project home page is at http://kbus-messaging.org/, from
which there are links to the original Google code repository, more
documentation, and various userspace libraries.

There is a working repository with these patches applied to
Linux 2.6.37, available via:

git pull git://github.com/crazyscot/linux-2.6-kbus.git kbus-2.6.37

These patches have been applied in branch apply-patchset-20110318

In order to keep the size of individual patches down, the main code
has been split over several patches (0004..0009). With luck this
should also make it easier to understand what KBUS is trying to do.

Tony Ibbs (11):
Documentation for KBUS
KBUS external header file.
KBUS internal header file
KBUS main source file, basic device support only
KBUS add support for messages
KBUS add ability to receive messages only once
KBUS add ability to add devices at runtime
KBUS add Replier Bind Events
KBUS Replier Bind Event set-aside lists
KBUS report state to userspace
KBUS configuration and Makefile

Documentation/Kbus.txt | 1222 ++++++++++++
include/linux/kbus_defns.h | 666 +++++++
init/Kconfig | 2 +
ipc/Kconfig | 117 ++
ipc/Makefile | 9 +
ipc/kbus_internal.h | 723 +++++++
ipc/kbus_main.c | 4690 ++++++++++++++++++++++++++++++++++++++++++++
ipc/kbus_report.c | 256 +++
8 files changed, 7685 insertions(+), 0 deletions(-)
create mode 100644 Documentation/Kbus.txt
create mode 100644 include/linux/kbus_defns.h
create mode 100644 ipc/Kconfig
create mode 100644 ipc/kbus_internal.h
create mode 100644 ipc/kbus_main.c
create mode 100644 ipc/kbus_report.c

--
1.7.4.1


2011-03-18 17:45:31

by Tony Ibbs

Subject: [PATCH 07/11] KBUS add ability to add devices at runtime

Users do not always know how many KBUS devices will be needed when
the system starts. This allows a normal user to request an extra
device.

Signed-off-by: Tony Ibbs <[email protected]>
---
include/linux/kbus_defns.h | 12 +++++++++++-
ipc/kbus_main.c | 17 +++++++++++++++++
2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index 9da72e4..4d3fcd5 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -611,8 +611,18 @@ struct kbus_replier_bind_event_data {
*/
#define KBUS_IOC_VERBOSE _IOWR(KBUS_IOC_MAGIC, 15, char *)

+/*
+ * NEWDEVICE - request another KBUS device (/dev/kbus<n>).
+ *
+ * The next device number (up to a maximum of 255) will be allocated.
+ *
+ * arg(out): __u32, the new device number (<n>)
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_NEWDEVICE _IOR(KBUS_IOC_MAGIC, 16, char *)
+
/* If adding another IOCTL, remember to increment the next number! */
-#define KBUS_IOC_MAXNR 15
+#define KBUS_IOC_MAXNR 16

#if !__KERNEL__ && defined(__cplusplus)
}
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index a75d2e1..48ba4b2 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -3532,6 +3532,23 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
retval = kbus_set_verbosity(priv, arg);
break;

+ case KBUS_IOC_NEWDEVICE:
+ /*
+ * Request a new device
+ *
+ * arg out: the new device number
+ * return: 0 means OK, otherwise not OK.
+ */
+ kbus_maybe_dbg(priv->dev, "%u NEWDEVICE %d\n",
+ id, kbus_num_devices);
+ retval = kbus_setup_new_device(kbus_num_devices);
+ if (retval > 0) {
+ kbus_num_devices++;
+ retval = __put_user(kbus_num_devices - 1,
+ (u32 __user *) arg);
+ }
+ break;
+
default:
/* *Should* be redundant, if we got our range checks right */
retval = -ENOTTY;
--
1.7.4.1

2011-03-18 17:45:40

by Tony Ibbs

Subject: [PATCH 08/11] KBUS add Replier Bind Events

It is useful to be able to write userspace proxies which allow
messages sent on one KBUS device to be received on another. In
order to do this, the userspace program needs to be reliably
informed of Replier bind and unbind events.

These proxies are referred to as Limpets in the KBUS documentation.
There is a brief introduction to the concept here:
http://presentations.kbus.googlecode.com/hg/talks/europython2010.html#limpets

Signed-off-by: Tony Ibbs <[email protected]>
---
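Not part of the patch: a sketch of how a Limpet-like userspace program
might decode the data of a "$.KBUS.ReplierBindEvent" message. It assumes
the data is three 32-bit words (is_bind, binder, name_len) followed by the
name, padded with zero bytes, which is what kbus_add_bind_message_data()
below appears to build. The field order, the struct name
bind_event_header and the function decode_bind_event() are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed part of struct kbus_replier_bind_event_data */
struct bind_event_header {
	uint32_t is_bind;	/* 1 for a bind, 0 for an unbind */
	uint32_t binder;	/* Ksock id of the Replier */
	uint32_t name_len;	/* length of the name, without padding */
};

/*
 * Split a Replier Bind Event payload into its header and name.
 *
 * Returns 0 on success, or -1 if the payload is too short for what it
 * claims to contain, or if the name does not fit in 'name'.
 */
static int decode_bind_event(const unsigned char *payload, size_t payload_len,
			     struct bind_event_header *hdr,
			     char *name, size_t name_size)
{
	assert(payload != NULL && hdr != NULL && name != NULL);

	if (payload_len < sizeof(*hdr))
		return -1;
	memcpy(hdr, payload, sizeof(*hdr));

	if (hdr->name_len > payload_len - sizeof(*hdr))
		return -1;
	if (hdr->name_len >= name_size)
		return -1;

	memcpy(name, payload + sizeof(*hdr), hdr->name_len);
	name[hdr->name_len] = '\0';
	return 0;
}
```

Copying the header with memcpy() avoids relying on the alignment of the
buffer the message data was read into.
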
include/linux/kbus_defns.h | 30 ++++-
ipc/kbus_internal.h | 5 +
ipc/kbus_main.c | 376 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 410 insertions(+), 1 deletions(-)

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index 4d3fcd5..c646a1d 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -489,6 +489,19 @@ struct kbus_replier_bind_event_data {
#define KBUS_MSG_NAME_REPLIER_DISAPPEARED "$.KBUS.Replier.Disappeared"
#define KBUS_MSG_NAME_ERROR_SENDING "$.KBUS.ErrorSending"

+/*
+ * Replier Bind Event
+ * ------------------
+ * This is the only message name for which KBUS generates data -- see
+ * kbus_replier_bind_event_data. It is also the only message name which KBUS
+ * does not allow binding to as a Replier.
+ *
+ * This is the message that is sent when a Replier binds or unbinds to another
+ * message name, if the KBUS_IOC_REPORTREPLIERBINDS ioctl has been used to
+ * request such notification.
+ */
+#define KBUS_MSG_NAME_REPLIER_BIND_EVENT "$.KBUS.ReplierBindEvent"
+
#define KBUS_IOC_MAGIC 'k' /* 0x6b - which seems fair enough for now */
/*
* RESET: reserved for future use
@@ -621,8 +634,23 @@ struct kbus_replier_bind_event_data {
*/
#define KBUS_IOC_NEWDEVICE _IOR(KBUS_IOC_MAGIC, 16, char *)

+/*
+ * REPORTREPLIERBINDS - request synthetic messages announcing Replier
+ * bind/unbind events.
+ *
+ * If this flag is set, then when someone binds or unbinds to a message name as
+ * a Replier, KBUS will send out a synthetic Announcement of this fact.
+ *
+ * arg(in): __u32, 1 to change to "report", 0 to change to "do not report",
+ * 0xFFFFFFFF to just return the current/previous state.
+ * arg(out): __u32, the previous state.
+ * retval: 0 for success, negative for failure (-EINVAL if arg in was not one
+ * of the specified values)
+ */
+#define KBUS_IOC_REPORTREPLIERBINDS _IOWR(KBUS_IOC_MAGIC, 17, char *)
+
/* If adding another IOCTL, remember to increment the next number! */
-#define KBUS_IOC_MAXNR 16
+#define KBUS_IOC_MAXNR 17

#if !__KERNEL__ && defined(__cplusplus)
}
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index 28c153c..13dc896 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -618,6 +618,11 @@ struct kbus_dev {

/* Are we wanting debugging messages? */
u32 verbose;
+
+ /*
+ * Are we wanting to send a synthetic message for each Replier
+ * bind/unbind? */
+ u32 report_replier_binds;
};

/*
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 48ba4b2..6372b40 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -1031,6 +1031,203 @@ static void kbus_push_synthetic_message(struct kbus_dev *dev,
}

/*
+ * Add the data part to a bind/unbind synthetic message.
+ *
+ * 'is_bind' is true if this was a "bind" event, false if it was an "unbind".
+ *
+ * 'name' is the message name (or wildcard) that was bound (or unbound) to.
+ *
+ * Returns 0 if all goes well, or a negative value if something goes wrong.
+ */
+static int kbus_add_bind_message_data(struct kbus_private_data *priv,
+ struct kbus_msg *new_msg,
+ u32 is_bind,
+ u32 name_len, char *name)
+{
+ struct kbus_replier_bind_event_data *data;
+ struct kbus_data_ptr *wrapped_data;
+ u32 padded_name_len = KBUS_PADDED_NAME_LEN(name_len);
+ u32 data_len = sizeof(*data) + padded_name_len;
+ u32 rest_len = padded_name_len / 4;
+ char *name_p;
+
+ unsigned long *parts;
+ unsigned *lengths;
+
+ data = kmalloc(data_len, GFP_KERNEL);
+ if (!data) {
+ /* Do not free new_msg here: our caller frees it on failure */
+ return -ENOMEM;
+ }
+
+ name_p = (char *)&data->rest[0];
+
+ data->is_bind = is_bind;
+ data->binder = priv->id;
+ data->name_len = name_len;
+
+ data->rest[rest_len - 1] = 0; /* terminating with enough '\0' */
+ strncpy(name_p, name, name_len);
+
+ /*
+ * And that data, unfortunately for our simple mindedness,
+ * needs wrapping up in a reference count...
+ *
+ * Note/remember that we are happy for the reference counting
+ * wrapper to "own" our data, and free it when it is done
+ * with it.
+ *
+ * I'm going to assert that we have less than a PAGE length
+ * of data, so we can simply do:
+ */
+ parts = kmalloc(sizeof(*parts), GFP_KERNEL);
+ if (!parts) {
+ kfree(data);
+ return -ENOMEM;
+ }
+ lengths = kmalloc(sizeof(*lengths), GFP_KERNEL);
+ if (!lengths) {
+ kfree(parts);
+ kfree(data);
+ return -ENOMEM;
+ }
+ lengths[0] = data_len;
+ parts[0] = (unsigned long)data;
+ wrapped_data =
+ kbus_wrap_data_in_ref(false, 1, parts, lengths, data_len);
+ if (!wrapped_data) {
+ kfree(lengths);
+ kfree(parts);
+ kfree(data);
+ return -ENOMEM;
+ }
+
+ new_msg->data_len = data_len;
+ new_msg->data_ref = wrapped_data;
+ return 0;
+}
+
+/*
+ * Create a new Replier Bind Event synthetic message.
+ *
+ * The initial design of things didn't really expect us to be
+ * generating messages with actual data inside the kernel module,
+ * so it's all a little bit more complicated than it might otherwise
+ * be, and there's some playing with things directly that might best
+ * be done otherwise (notably, sorting out the wrapping up of the
+ * data in a reference count). I think that's excusable given this
+ * should be the only sort of message we ever generate with actual
+ * data (or so I believe).
+ *
+ * Returns the new message, or NULL.
+ */
+static struct kbus_msg
+*kbus_new_synthetic_bind_message(struct kbus_private_data *priv,
+ u32 is_bind,
+ u32 name_len, char *name)
+{
+ ssize_t retval = 0;
+ struct kbus_msg *new_msg;
+ struct kbus_msg_id in_reply_to = { 0, 0 }; /* no-one */
+
+ kbus_maybe_dbg(priv->dev,
+ " Creating synthetic bind message for '%s'"
+ " (%s)\n", name, is_bind ? "bind" : "unbind");
+
+ new_msg = kbus_build_kbus_message(priv->dev,
+ KBUS_MSG_NAME_REPLIER_BIND_EVENT,
+ 0, 0, in_reply_to);
+ if (!new_msg)
+ return NULL;
+
+ /*
+ * What happens if any one of the listeners can't receive the message
+ * because they don't have room in their queues?
+ *
+ * If we flag the message as ALL_OR_FAIL, then if we can't deliver
+ * to all of the listeners who care, we will get -EBUSY returned to
+ * us, which we shall then return as -EAGAIN (people expect to check
+ * for -EAGAIN to find out if they should, well, try again).
+ *
+ * In this scenario, the user needs to catch a "bind"/"unbind" return
+ * of -EAGAIN and realise that it needs to try again.
+ */
+ new_msg->flags |= KBUS_BIT_ALL_OR_FAIL;
+
+ /*
+ * That gave us the basis of the message, but now we need to add in
+ * its meaning.
+ */
+ retval = kbus_add_bind_message_data(priv, new_msg, is_bind,
+ name_len, name);
+ if (retval < 0) {
+ kbus_free_message(new_msg);
+ return NULL;
+ }
+
+ return new_msg;
+}
+
+/*
+ * Generate a bind/unbind synthetic message, and broadcast it.
+ *
+ * This is for use when we have been asked to announce when a Replier binds or
+ * unbinds.
+ *
+ * 'priv' is the sender - the entity that is doing the actual bind/unbind.
+ *
+ * 'is_bind' is true if this was a "bind" event, false if it was an "unbind".
+ *
+ * 'name' is the message name (or wildcard) that was bound (or unbound) to.
+ *
+ * Returns 0 if all goes well, or a negative value if something goes wrong,
+ * notably -EAGAIN if we couldn't send the message to ALL the Listeners who
+ * have bound to receive it.
+ */
+static int kbus_push_synthetic_bind_message(struct kbus_private_data *priv,
+ u32 is_bind,
+ u32 name_len, char *name)
+{
+
+ ssize_t retval = 0;
+ struct kbus_msg *new_msg;
+
+ kbus_maybe_dbg(priv->dev,
+ " Pushing synthetic bind message for '%s'"
+ " (%s) onto queue\n", name,
+ is_bind ? "bind" : "unbind");
+
+ new_msg =
+ kbus_new_synthetic_bind_message(priv, is_bind, name_len, name);
+ if (new_msg == NULL)
+ return -ENOMEM;
+
+ kbus_maybe_report_message(priv->dev, new_msg);
+ kbus_maybe_dbg(priv->dev,
+ "Writing synthetic message to "
+ "recipients\n");
+
+ retval = kbus_write_to_recipients(priv, priv->dev, new_msg);
+ /*
+ * kbus_push_message takes a copy of our message data, so we
+ * must remember to free ours. Since we've made sure that it
+ * looks just like a user-generated message (i.e., the name
+ * can be freed as well as the data), this is fairly simple.
+ */
+ kbus_free_message(new_msg);
+
+ /*
+ * Because we used ALL_OR_FAIL, we will get -EBUSY back if we FAIL,
+ * but we want to tell the user -EAGAIN, since they *should* try
+ * again later.
+ */
+ if (retval == -EBUSY)
+ retval = -EAGAIN;
+
+ return retval;
+}
+
+/*
* Pop the next message off our queue.
*
* Returns a pointer to the message, or NULL if there is no next message.
@@ -1442,6 +1639,21 @@ static int kbus_remember_binding(struct kbus_dev *dev,
new->name_len = name_len;
new->name = name;

+ if (replier && dev->report_replier_binds) {
+ /*
+ * We've been asked to announce when a Replier binds.
+ * If we can't tell all the Listeners who care, we want
+ * to give up, rather than tell some of them, and then
+ * bind anyway.
+ */
+ retval = kbus_push_synthetic_bind_message(priv, true,
+ name_len, name);
+ if (retval != 0) { /* Hopefully, just -EAGAIN */
+ kfree(new);
+ return retval;
+ }
+ }
+
list_add(&new->list, &dev->bound_message_list);
return 0;
}
@@ -1561,6 +1773,27 @@ static int kbus_forget_binding(struct kbus_dev *dev,
return -EINVAL;
}

+ if (replier && dev->report_replier_binds) {
+
+ /*
+ * We want to send a message indicating that we've unbound
+ * the Replier for this message.
+ *
+ * If we can't tell all the Listeners who're listening for this
+ * message, we want to give up, rather than tell some of them,
+ * and then unbind anyway.
+ */
+ int retval = kbus_push_synthetic_bind_message(priv, false,
+ name_len, name);
+ if (retval != 0) /* Hopefully, just -EAGAIN */
+ return retval;
+ /*
+ * Note that if we were ourselves listening for replier bind
+ * events, then we will ourselves get the message announcing
+ * we're about to unbind.
+ */
+ }
+
kbus_maybe_dbg(priv->dev, " %u Unbound %u %c '%.*s'\n",
priv->id, binding->bound_to_id,
(binding->is_replier ? 'R' : 'L'),
@@ -1584,6 +1817,49 @@ static int kbus_forget_binding(struct kbus_dev *dev,
return 0;
}

+
+/*
+ * Report a Replier Bind Event for unbinding from the given message name
+ */
+static void kbus_report_unbinding(struct kbus_private_data *priv,
+ u32 name_len, char *name)
+{
+ struct kbus_msg *msg;
+ int retval;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Safe report unbinding of '%.*s'\n",
+ priv->id, name_len, name);
+
+ /* Generate the "X has unbound from Y" message */
+ msg = kbus_new_synthetic_bind_message(priv, false, name_len, name);
+ if (msg == NULL)
+ return; /* There is nothing sensible to do here */
+
+ /* ...and send it */
+ retval = kbus_write_to_recipients(priv, priv->dev, msg);
+ if (retval != -EBUSY)
+ goto done_sending;
+
+ /*
+ * If someone who had bound to it wasn't able to take the message,
+ * then there's not a lot we can do at this stage.
+ *
+ * XXX This is, of course, unacceptable. Sorting it out will
+ * XXX be done in the next tranche of code, though, since it
+ * XXX is not terribly simple.
+ */
+ kbus_maybe_dbg(priv->dev,
+ " %u Someone was unable to receive message '%.*s'\n",
+ priv->id, name_len, name);
+
+done_sending:
+ /* Don't forget to free our copy of the message */
+ if (msg)
+ kbus_free_message(msg);
+ /* We aren't returning any status code. Oh well. */
+}
+
/*
* Remove all bindings for a particular listener.
*
@@ -1608,6 +1884,10 @@ static void kbus_forget_my_bindings(struct kbus_private_data *priv)
ptr->bound_to_id, (ptr->is_replier ? 'R' : 'L'),
ptr->name_len, ptr->name);

+ if (ptr->is_replier && dev->report_replier_binds)
+ kbus_report_unbinding(priv, ptr->name_len,
+ ptr->name);
+
list_del(&ptr->list);
kfree(ptr->name);
kfree(ptr);
@@ -2624,6 +2904,14 @@ static int kbus_bind(struct kbus_private_data *priv,
goto done;
}

+ if (bind->is_replier && !strcmp(name,
+ KBUS_MSG_NAME_REPLIER_BIND_EVENT)) {
+ kbus_maybe_dbg(priv->dev, "cannot bind %s as a Replier\n",
+ KBUS_MSG_NAME_REPLIER_BIND_EVENT);
+ retval = -EBADMSG;
+ goto done;
+ }
+
kbus_maybe_dbg(priv->dev, "%u BIND %c '%.*s'\n", priv->id,
(bind->is_replier ? 'R' : 'L'), bind->name_len, name);

@@ -3353,6 +3641,83 @@ static int kbus_set_verbosity(struct kbus_private_data *priv,
return __put_user(old_value, (u32 __user *) arg);
}

+/* Report all existing replier bindings to the requester */
+static int kbus_report_existing_binds(struct kbus_private_data *priv,
+ struct kbus_dev *dev)
+{
+ struct kbus_message_binding *ptr;
+ struct kbus_message_binding *next;
+
+ list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+ struct kbus_msg *new_msg;
+ int retval;
+
+ kbus_maybe_dbg(priv->dev, " %u Report %c '%.*s'\n",
+ priv->id, (ptr->is_replier ? 'R' : 'L'),
+ ptr->name_len, ptr->name);
+
+ if (!ptr->is_replier)
+ continue;
+
+ new_msg = kbus_new_synthetic_bind_message(ptr->bound_to, true,
+ ptr->name_len, ptr->name);
+ if (new_msg == NULL)
+ return -ENOMEM;
+
+ /*
+ * It is perhaps a bit inefficient to check this per binding,
+ * but it saves us doing two passes through the list.
+ */
+ if (kbus_queue_is_full(priv, "limpet", false)) {
+ /* Giving up is probably the best we can do */
+ kbus_free_message(new_msg);
+ return -EBUSY;
+ }
+
+ retval = kbus_push_message(priv, new_msg, NULL, false);
+
+ kbus_free_message(new_msg);
+ if (retval)
+ return retval;
+ }
+ return 0;
+}
+
+static int kbus_set_report_binds(struct kbus_private_data *priv,
+ struct kbus_dev *dev, unsigned long arg)
+{
+ int retval = 0;
+ u32 report_replier_binds;
+ int old_value = priv->dev->report_replier_binds;
+
+ retval = __get_user(report_replier_binds, (u32 __user *) arg);
+ if (retval)
+ return retval;
+
+ kbus_maybe_dbg(priv->dev,
+ "%u REPORTREPLIERBINDS requests %u (was %d)\n",
+ priv->id, report_replier_binds, old_value);
+
+ switch (report_replier_binds) {
+ case 0:
+ priv->dev->report_replier_binds = false;
+ break;
+ case 1:
+ priv->dev->report_replier_binds = true;
+ /* And report the current state of bindings... */
+ retval = kbus_report_existing_binds(priv, dev);
+ if (retval)
+ return retval;
+ break;
+ case 0xFFFFFFFF:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return __put_user(old_value, (u32 __user *) arg);
+}
+
static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
int err = 0;
@@ -3549,6 +3914,17 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
break;

+ case KBUS_IOC_REPORTREPLIERBINDS:
+ /*
+ * Should we report Replier bind/unbind events?
+ *
+ * arg in: 0 (for no), 1 (for yes), 0xFFFFFFFF (for query)
+ * arg out: the previous value, before we were called
+ * return: 0 means OK, otherwise not OK
+ */
+ retval = kbus_set_report_binds(priv, dev, arg);
+ break;
+
default:
/* *Should* be redundant, if we got our range checks right */
retval = -ENOTTY;
--
1.7.4.1

2011-03-18 17:45:46

by Tony Ibbs

Subject: [PATCH 06/11] KBUS add ability to receive messages only once

Sometimes a recipient has bound to a message name more than once,
but only wants to receive one copy of each message matching those
bindings.

Signed-off-by: Tony Ibbs <[email protected]>
---
include/linux/kbus_defns.h | 18 +++++++---
ipc/kbus_internal.h | 30 ++++++++++++++++
ipc/kbus_main.c | 82 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index d43c498..9da72e4 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -581,13 +581,21 @@ struct kbus_replier_bind_event_data {
* retval: 0 for success, negative for failure
*/
#define KBUS_IOC_UNREPLIEDTO _IOR(KBUS_IOC_MAGIC, 13, char *)
-
/*
- * IOCTL 14 is not used, because it is introduced in the next revision,
- * (obviously, in real history this was done in a different order) and
- * I don't want to alter the number for VERBOSE.
+ * MSGONLYONCE - should we receive a message only once?
+ *
+ * This IOCTL tells a Ksock whether it should only receive a particular message
+ * once, even if it is both a Replier and Listener for the message (in which
+ * case it will always get the message as Replier, if appropriate), or if it is
+ * registered as multiple Listeners for the message.
+ *
+ * arg(in): __u32, 1 to change to "only once", 0 to change to the default,
+ * 0xFFFFFFFF to just return the current/previous state.
+ * arg(out): __u32, the previous state.
+ * retval: 0 for success, negative for failure (-EINVAL if arg in was not one
+ * of the specified values)
*/
-
+#define KBUS_IOC_MSGONLYONCE _IOWR(KBUS_IOC_MAGIC, 14, char *)
/*
* VERBOSE - should KBUS output verbose "printk" messages (for this device)?
*
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index 2d9e737..28c153c 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -534,6 +534,36 @@ struct kbus_private_data {
* In fact, the 'outstanding_requests' list is used, simply because
* it was implemented first.
*/
+
+ /*
+ * By default, if a Ksock binds to a message name as both Replier and
+ * Listener (typically by binding to a specific message name as Replier
+ * and to a wildcard including it as Listener), and a Request of that
+ * name is sent to that Ksock, it will get the message once as Replier
+ * (marked "WANT_YOU_TO_REPLY"), and once as listener.
+ *
+ * This is technically sensible, but can be irritating to the end user
+ * who really often only wants to receive the message once.
+ *
+ * If "messages_only_once" is set, then when a message is about to be
+ * put onto a Ksock's message queue, it will only be added if it (i.e.,
+ * a message with the same id) has not already just been added. This
+ * is safe because Requests to the specific Replier are always dealt
+ * with first.
+ *
+ * As a side-effect, which I think also makes sense, this will also
+ * mean that if a Listener has bound to the same message name multiple
+ * times (as a Listener), then they will only get the message once.
+ */
+ int messages_only_once;
+ /*
+ * Messages can be added to either end of our message queue (i.e.,
+ * depending on whether they're urgent or not). This means that the
+ * "only once" mechanism needs to check both ends of the queue (which
+ * is a pain). Or we can just remember the message id of the last
+ * message pushed onto the queue. Which is much simpler.
+ */
+ struct kbus_msg_id msg_id_just_pushed;
};

/* What is a sensible number for the default maximum number of messages? */
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 944b60c..a75d2e1 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -851,6 +851,46 @@ static int kbus_push_message(struct kbus_private_data *priv,
" %u Pushing message onto queue (%s)\n",
priv->id, for_replier ? "replier" : "listener");

+ /*
+ * 1. Check to see if this Ksock has the "only one copy
+ * of a message" flag set.
+ * 2. If it does, check if our message (id) is already on
+ * the queue, and if it is, just skip adding it.
+ *
+ * (this means if the Ksock was destined to get the message
+ * several times, either as Replier and Listener, or as
+ * multiple Listeners to the same message name, it will only
+ * get it once, for this "push")
+ *
+ * If "for_replier" is set we necessarily push the message - see below.
+ */
+ if (priv->messages_only_once && !for_replier) {
+ /*
+ * 1. We've been asked to only send one copy of a message
+ * to each Ksock that should receive it.
+ * 2. This is not a Reply (to our Ksock) or a Request (to
+ * our Ksock as Replier)
+ *
+ * So, given that, has a message with that id already been
+ * added to the message queue?
+ *
+ * (Note that if a message would be included because of
+ * multiple message name bindings, we do not say anything
+ * about which binding we will actually add the message
+ * for - so unbinding later on may or may not cause a
+ * message to go away, in this case.)
+ */
+ if (kbus_same_message_id(&priv->msg_id_just_pushed,
+ msg->id.network_id,
+ msg->id.serial_num)) {
+ kbus_maybe_dbg(priv->dev,
+ " %u Ignoring message "
+ "under 'once only' rule\n",
+ priv->id);
+ return 0;
+ }
+ }
+
new_msg = kbus_copy_message(priv->dev, msg);
if (!new_msg)
return -EFAULT;
@@ -898,6 +938,7 @@ static int kbus_push_message(struct kbus_private_data *priv,
}

priv->message_count++;
+ priv->msg_id_just_pushed = msg->id;

if (!kbus_same_message_id(&msg->in_reply_to, 0, 0)) {
/*
@@ -3243,6 +3284,36 @@ static int kbus_nummsgs(struct kbus_private_data *priv,
return __put_user(count, (u32 __user *) arg);
}

+static int kbus_onlyonce(struct kbus_private_data *priv,
+ unsigned long arg)
+{
+ int retval = 0;
+ u32 only_once;
+ int old_value = priv->messages_only_once;
+
+ retval = __get_user(only_once, (u32 __user *) arg);
+ if (retval)
+ return retval;
+
+ kbus_maybe_dbg(priv->dev, "%u ONLYONCE requests %u (was %d)\n",
+ priv->id, only_once, old_value);
+
+ switch (only_once) {
+ case 0:
+ priv->messages_only_once = false;
+ break;
+ case 1:
+ priv->messages_only_once = true;
+ break;
+ case 0xFFFFFFFF:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return __put_user(old_value, (u32 __user *) arg);
+}
+
static int kbus_set_verbosity(struct kbus_private_data *priv,
unsigned long arg)
{
@@ -3439,6 +3510,17 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
(u32 __user *) arg);
break;

+ case KBUS_IOC_MSGONLYONCE:
+ /*
+ * Should we receive a given message only once?
+ *
+ * arg in: 0 (for no), 1 (for yes), 0xFFFFFFFF (for query)
+ * arg out: the previous value, before we were called
+ * return: 0 means OK, otherwise not OK
+ */
+ retval = kbus_onlyonce(priv, arg);
+ break;
+
case KBUS_IOC_VERBOSE:
/*
* Should we output verbose/debug messages?
--
1.7.4.1

2011-03-18 17:46:10

by Tony Ibbs

Subject: [PATCH 04/11] KBUS main source file, basic device support only

This first version of the main source file just provides device
open/close and various basic IOCTLs, including bind/unbind.

Signed-off-by: Tony Ibbs <[email protected]>
---
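Not part of the patch: bind and unbind take a message name, and the rules
for a valid name are easiest to see by example. The sketch below is a
userspace rendering of the checks in kbus_bad_message_name() in this
patch. bad_message_name() and check() are names chosen for illustration
only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 if the name is acceptable, 1 if it is not */
static int bad_message_name(const char *name, size_t name_len)
{
	size_t ii;
	size_t dot_at = 1;	/* index of the most recent '.' */

	if (name == NULL || name_len < 3)
		return 1;
	if (name[0] != '$' || name[1] != '.')
		return 1;

	/* A trailing ".*" or ".%" wildcard is allowed, and then ignored */
	if (name[name_len - 2] == '.' &&
	    (name[name_len - 1] == '*' || name[name_len - 1] == '%'))
		name_len -= 2;

	if (name[name_len - 1] == '.')
		return 1;

	for (ii = 2; ii < name_len; ii++) {
		if (name[ii] == '.') {
			if (dot_at == ii - 1)
				return 1;	/* empty component */
			dot_at = ii;
		} else if (!isalnum((unsigned char)name[ii])) {
			return 1;
		}
	}
	return 0;
}

/* Convenience wrapper for NUL-terminated names */
static int check(const char *name)
{
	assert(name != NULL);
	return bad_message_name(name, strlen(name));
}
```

By these rules "$.Fred", "$.Fred.Jim", "$.Fred.*" and "$.Fred.%" are
accepted, while "Fred", "$.", "$..Fred", "$.Fred." and "$.Fred-Jim" are
rejected.
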
ipc/kbus_main.c | 1061 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1061 insertions(+), 0 deletions(-)
create mode 100644 ipc/kbus_main.c

diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
new file mode 100644
index 0000000..87c7506
--- /dev/null
+++ b/ipc/kbus_main.c
@@ -0,0 +1,1061 @@
+/* Kbus kernel module
+ *
+ * This is a character device driver, providing the messaging support
+ * for kbus.
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ * Kynesim, Cambridge UK
+ * Tony Ibbs <[email protected]>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above. If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL. If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include <linux/fs.h>
+#include <linux/device.h> /* device classes (for hotplugging), &c */
+#include <linux/cdev.h> /* registering character devices */
+#include <linux/list.h>
+#include <linux/ctype.h> /* for isalnum */
+#include <linux/poll.h>
+#include <linux/slab.h> /* for kmalloc, etc. */
+#include <linux/sched.h> /* for current->pid */
+#include <linux/uaccess.h> /* copy_*_user() functions */
+#include <asm/page.h> /* PAGE_SIZE */
+
+#include <linux/kbus_defns.h>
+#include "kbus_internal.h"
+
+static int kbus_num_devices = CONFIG_KBUS_DEF_NUM_DEVICES;
+
+/* Who we are -- devices */
+static int kbus_major; /* 0 => We'll go for dynamic allocation */
+static int kbus_minor; /* 0 => We're happy to start with device 0 */
+
+/* Our actual devices, 0 through kbus_num_devices-1 */
+static struct kbus_dev **kbus_devices;
+
+static struct class *kbus_class_p;
+
+/* ========================================================================= */
+
+/* I really want this function where it is in the code, so need to foreshadow */
+static int kbus_setup_new_device(int which);
+
+/* ========================================================================= */
+
+/* What's the symbolic name of a replier type? */
+__maybe_unused
+static const char *kbus_replier_type_name(enum kbus_replier_type t)
+{
+ switch (t) {
+ case UNSET: return "UNSET";
+ case WILD_STAR: return "WILD_STAR";
+ case WILD_PERCENT: return "WILD_PERCENT";
+ case SPECIFIC: return "SPECIFIC";
+ }
+ pr_err("kbus: unhandled enum lookup %d in "
+ "kbus_replier_type_name - memory corruption?", t);
+ return "???";
+}
+
+/*
+ * Given a message name, is it valid?
+ *
+ * We have nothing to say on maximum length.
+ *
+ * Returns 0 if it's OK, 1 if it's naughty
+ */
+static int kbus_bad_message_name(char *name, size_t name_len)
+{
+ size_t ii;
+ int dot_at = 1;
+
+ if (name_len < 3)
+ return 1;
+
+ if (name == NULL || name[0] != '$' || name[1] != '.')
+ return 1;
+
+ if (name[name_len - 2] == '.' && name[name_len - 1] == '*')
+ name_len -= 2;
+ else if (name[name_len - 2] == '.' && name[name_len - 1] == '%')
+ name_len -= 2;
+
+ if (name[name_len - 1] == '.')
+ return 1;
+
+ for (ii = 2; ii < name_len; ii++) {
+ if (name[ii] == '.') {
+ if (dot_at == ii - 1)
+ return 1;
+ dot_at = ii;
+ } else if (!isalnum(name[ii]))
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Find out who, if anyone, is bound as a replier to the given message name.
+ *
+ * Returns 1 if we found a replier, 0 if we did not (but all went well), and
+ * a negative value if something went wrong.
+ */
+static int kbus_find_replier(struct kbus_dev *dev,
+ struct kbus_private_data **bound_to,
+ u32 name_len, char *name)
+{
+ struct kbus_message_binding *ptr;
+ struct kbus_message_binding *next;
+
+ list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+ /*
+ * We are only interested in a replier binding to the name.
+ * We *could* check for the name and then check for
+ * reply-ness - if we found a name match that was *not* a
+ * replier, then we'd have finished. However, checking the
+ * name is expensive, and I rather assume that a caller is
+ * only checking if they expect a positive result, so it's
+ * simpler to do a lazier check.
+ */
+ if (!ptr->is_replier || ptr->name_len != name_len ||
+ strncmp(name, ptr->name, name_len))
+ continue;
+
+ kbus_maybe_dbg(dev, " '%.*s' has replier %u\n",
+ ptr->name_len, ptr->name, ptr->bound_to_id);
+ *bound_to = ptr->bound_to;
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Add a new binding.
+ *
+ * Doesn't allow more than one replier to be bound for a message name.
+ *
+ * NB: If it succeeds, then it wants to keep hold of 'name', so don't
+ * free it...
+ *
+ * Returns 0 if all went well, a negative value if it did not. Specifically,
+ * -EADDRINUSE if an attempt was made to bind as a replier to a message name
+ * that already has a replier bound.
+ */
+static int kbus_remember_binding(struct kbus_dev *dev,
+ struct kbus_private_data *priv,
+ u32 replier,
+ u32 name_len, char *name)
+{
+ int retval = 0;
+ struct kbus_message_binding *new;
+
+ /* If we want a replier, and there already is one, we lose */
+ if (replier) {
+ struct kbus_private_data *reply_to;
+ retval = kbus_find_replier(dev, &reply_to, name_len, name);
+ /*
+ * "Address in use" isn't quite right, but lets the caller
+ * have some hope of telling what went wrong, and this is a
+ * useful case to distinguish.
+ */
+ if (retval == 1) {
+ kbus_maybe_dbg(dev,
+ "%u CANNOT BIND '%.*s' as "
+ "replier, already bound\n",
+ priv->id, name_len, name);
+ return -EADDRINUSE;
+ }
+ }
+
+ new = kmalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ new->bound_to = priv;
+ new->bound_to_id = priv->id; /* Useful shorthand? */
+ new->is_replier = replier;
+ new->name_len = name_len;
+ new->name = name;
+
+ list_add(&new->list, &dev->bound_message_list);
+ return 0;
+}
+
+/*
+ * Find a particular binding.
+ *
+ * Return a pointer to the binding, or NULL if it was not found.
+ */
+static struct kbus_message_binding
+*kbus_find_binding(struct kbus_dev *dev,
+ struct kbus_private_data *priv,
+ u32 replier, u32 name_len, char *name)
+{
+ struct kbus_message_binding *ptr;
+ struct kbus_message_binding *next;
+
+ list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+ if (priv != ptr->bound_to)
+ continue;
+ if (replier != ptr->is_replier)
+ continue;
+ if (name_len != ptr->name_len)
+ continue;
+ if (strncmp(name, ptr->name, name_len))
+ continue;
+
+ kbus_maybe_dbg(priv->dev, " %u Found %c '%.*s'\n",
+ priv->id, (ptr->is_replier ? 'R' : 'L'),
+ ptr->name_len, ptr->name);
+ return ptr;
+ }
+ return NULL;
+}
+
+/*
+ * Remove an existing binding.
+ *
+ * Returns 0 if all went well, a negative value if it did not.
+ */
+static int kbus_forget_binding(struct kbus_dev *dev,
+ struct kbus_private_data *priv,
+ u32 replier, u32 name_len, char *name)
+{
+ struct kbus_message_binding *binding;
+
+ binding = kbus_find_binding(dev, priv, replier, name_len, name);
+ if (binding == NULL) {
+ kbus_maybe_dbg(priv->dev,
+ " %u Could not find/unbind "
+ "%u %c '%.*s'\n",
+ priv->id, priv->id,
+ (replier ? 'R' : 'L'), name_len, name);
+ return -EINVAL;
+ }
+
+ kbus_maybe_dbg(priv->dev, " %u Unbound %u %c '%.*s'\n",
+ priv->id, binding->bound_to_id,
+ (binding->is_replier ? 'R' : 'L'),
+ binding->name_len, binding->name);
+
+ /*
+ * If we supported sending messages (yet), we'd need to forget
+ * any messages in our queue that match this binding.
+ */
+
+ /* And remove the binding once that has been done. */
+ list_del(&binding->list);
+ kfree(binding->name);
+ kfree(binding);
+ return 0;
+}
+
+/*
+ * Remove all bindings for a particular listener.
+ *
+ * Called from kbus_release, which will itself handle removing messages
+ * (that *were* bound) from the message queue.
+ */
+static void kbus_forget_my_bindings(struct kbus_private_data *priv)
+{
+ struct kbus_dev *dev = priv->dev;
+ u32 bound_to_id = priv->id;
+
+ struct kbus_message_binding *ptr;
+ struct kbus_message_binding *next;
+
+ kbus_maybe_dbg(dev, "%u Forgetting my bindings\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+ if (bound_to_id != ptr->bound_to_id)
+ continue;
+
+ kbus_maybe_dbg(dev, " Unbound %u %c '%.*s'\n",
+ ptr->bound_to_id, (ptr->is_replier ? 'R' : 'L'),
+ ptr->name_len, ptr->name);
+
+ list_del(&ptr->list);
+ kfree(ptr->name);
+ kfree(ptr);
+ }
+}
+
+/*
+ * Remove all bindings.
+ *
+ * Assumed to be called because the device is closing, and thus doesn't lock,
+ * nor does it worry about generating synthetic messages as requests are doomed
+ * not to get replies.
+ */
+static void kbus_forget_all_bindings(struct kbus_dev *dev)
+{
+ struct kbus_message_binding *ptr;
+ struct kbus_message_binding *next;
+
+ kbus_maybe_dbg(dev, "Forgetting bindings\n");
+
+ list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+
+ kbus_maybe_dbg(dev, " Unbinding %u %c '%.*s'\n",
+ ptr->bound_to_id,
+ (ptr->is_replier ? 'R' : 'L'),
+ ptr->name_len, ptr->name);
+
+ list_del(&ptr->list);
+ kfree(ptr->name);
+ kfree(ptr);
+ }
+}
+
+/*
+ * Add a new open file to our remembrances.
+ *
+ * Returns 0 if all went well, a negative value if it did not.
+ */
+static int kbus_remember_open_ksock(struct kbus_dev *dev,
+ struct kbus_private_data *priv)
+{
+ list_add(&priv->list, &dev->open_ksock_list);
+
+ kbus_maybe_dbg(priv->dev, "Remembered 'open file' id %u\n",
+ priv->id);
+ return 0;
+}
+
+/*
+ * Remove an open file remembrance.
+ *
+ * Returns 0 if all went well, -EINVAL if we couldn't find the open Ksock
+ */
+static int kbus_forget_open_ksock(struct kbus_dev *dev, u32 id)
+{
+ struct kbus_private_data *ptr;
+ struct kbus_private_data *next;
+
+ list_for_each_entry_safe(ptr, next, &dev->open_ksock_list, list) {
+ if (id != ptr->id)
+ continue;
+
+ kbus_maybe_dbg(dev, " Forgetting open Ksock %u\n", id);
+
+ /* So remove it from our list */
+ list_del(&ptr->list);
+ /* But *we* mustn't free the actual data structure! */
+ return 0;
+ }
+ kbus_maybe_dbg(dev, " Could not forget open Ksock %u\n", id);
+
+ return -EINVAL;
+}
+
+/*
+ * Forget all our "open file" remembrances.
+ *
+ * Assumed to be called because the device is closing, and thus doesn't lock.
+ */
+static void kbus_forget_all_open_ksocks(struct kbus_dev *dev)
+{
+ struct kbus_private_data *ptr;
+ struct kbus_private_data *next;
+
+ list_for_each_entry_safe(ptr, next, &dev->open_ksock_list, list) {
+
+ kbus_maybe_dbg(dev, " Forgetting open Ksock %u\n", ptr->id);
+
+ /* So remove it from our list */
+ list_del(&ptr->list);
+ /* But *we* mustn't free the actual data structure! */
+ }
+}
+
+static int kbus_open(struct inode *inode, struct file *filp)
+{
+ struct kbus_private_data *priv;
+ struct kbus_dev *dev;
+
+ priv = kmalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ /*
+ * Use the official magic to retrieve our actual device data
+ * so we can remember it for other file operations.
+ */
+ dev = container_of(inode->i_cdev, struct kbus_dev, cdev);
+
+ if (mutex_lock_interruptible(&dev->mux)) {
+ kfree(priv);
+ return -ERESTARTSYS;
+ }
+
+ /*
+ * Our file descriptor id ("listener id") needs to be unique for this
+ * device, and thus we want to be carefully inside our lock.
+ *
+ * We shall (for now at least) ignore wrap-around - 32 bits is big
+ * enough that it shouldn't cause non-unique ids in our target
+ * applications.
+ *
+ * Listener id 0 is reserved, and we'll use that (on occasion) to mean
+ * kbus itself.
+ */
+ if (dev->next_ksock_id == 0)
+ dev->next_ksock_id++;
+
+ memset(priv, 0, sizeof(*priv));
+ priv->dev = dev;
+ priv->id = dev->next_ksock_id++;
+ priv->pid = current->pid;
+ priv->max_messages = CONFIG_KBUS_DEF_MAX_MESSAGES;
+ priv->sending = false;
+ priv->num_replies_unsent = 0;
+ priv->max_replies_unsent = 0;
+
+ INIT_LIST_HEAD(&priv->message_queue);
+ INIT_LIST_HEAD(&priv->replies_unsent);
+
+ (void)kbus_remember_open_ksock(dev, priv);
+
+ filp->private_data = priv;
+
+ mutex_unlock(&dev->mux);
+
+ kbus_maybe_dbg(dev, "%u OPEN\n", priv->id);
+
+ return 0;
+}
+
+static int kbus_release(struct inode *inode __always_unused, struct file *filp)
+{
+ int retval2 = 0;
+ struct kbus_private_data *priv = filp->private_data;
+ struct kbus_dev *dev = priv->dev;
+
+ if (mutex_lock_interruptible(&dev->mux))
+ return -ERESTARTSYS;
+
+ kbus_maybe_dbg(dev, "%u RELEASE\n", priv->id);
+
+ kbus_forget_my_bindings(priv);
+ retval2 = kbus_forget_open_ksock(dev, priv->id);
+ kfree(priv);
+
+ mutex_unlock(&dev->mux);
+
+ return retval2;
+}
+
+static int kbus_bind(struct kbus_private_data *priv,
+ struct kbus_dev *dev, unsigned long arg)
+{
+ int retval = 0;
+ struct kbus_bind_request *bind;
+ char *name = NULL;
+
+ bind = kmalloc(sizeof(*bind), GFP_KERNEL);
+ if (!bind)
+ return -ENOMEM;
+ if (copy_from_user(bind, (void __user *)arg, sizeof(*bind))) {
+ retval = -EFAULT;
+ goto done;
+ }
+
+ if (bind->name_len == 0) {
+ kbus_maybe_dbg(dev, "bind name is length 0\n");
+ retval = -EBADMSG;
+ goto done;
+ } else if (bind->name_len > KBUS_MAX_NAME_LEN) {
+ kbus_maybe_dbg(dev, "bind name is length %u\n",
+ bind->name_len);
+ retval = -ENAMETOOLONG;
+ goto done;
+ }
+
+ name = kmalloc(bind->name_len + 1, GFP_KERNEL);
+ if (!name) {
+ retval = -ENOMEM;
+ goto done;
+ }
+ if (copy_from_user(name, (char __user *) bind->name, bind->name_len)) {
+ retval = -EFAULT;
+ goto done;
+ }
+ name[bind->name_len] = 0;
+
+ if (kbus_bad_message_name(name, bind->name_len)) {
+ retval = -EBADMSG;
+ goto done;
+ }
+
+ kbus_maybe_dbg(priv->dev, "%u BIND %c '%.*s'\n", priv->id,
+ (bind->is_replier ? 'R' : 'L'), bind->name_len, name);
+
+ retval = kbus_remember_binding(dev, priv,
+ bind->is_replier, bind->name_len, name);
+ if (retval == 0)
+ /* The binding will use our copy of the message name */
+ name = NULL;
+
+done:
+ kfree(name);
+ kfree(bind);
+ return retval;
+}
+
+static int kbus_unbind(struct kbus_private_data *priv,
+ struct kbus_dev *dev, unsigned long arg)
+{
+ int retval = 0;
+ struct kbus_bind_request *bind;
+ char *name = NULL;
+
+ bind = kmalloc(sizeof(*bind), GFP_KERNEL);
+ if (!bind)
+ return -ENOMEM;
+ if (copy_from_user(bind, (void __user *)arg, sizeof(*bind))) {
+ retval = -EFAULT;
+ goto done;
+ }
+
+ if (bind->name_len == 0) {
+ kbus_maybe_dbg(priv->dev, "unbind name is length 0\n");
+ retval = -EBADMSG;
+ goto done;
+ } else if (bind->name_len > KBUS_MAX_NAME_LEN) {
+ kbus_maybe_dbg(priv->dev, "unbind name is length %u\n",
+ bind->name_len);
+ retval = -ENAMETOOLONG;
+ goto done;
+ }
+
+ name = kmalloc(bind->name_len + 1, GFP_KERNEL);
+ if (!name) {
+ retval = -ENOMEM;
+ goto done;
+ }
+ if (copy_from_user(name, (char __user *) bind->name, bind->name_len)) {
+ retval = -EFAULT;
+ goto done;
+ }
+ name[bind->name_len] = 0;
+
+ if (kbus_bad_message_name(name, bind->name_len)) {
+ retval = -EBADMSG;
+ goto done;
+ }
+
+ kbus_maybe_dbg(priv->dev, "%u UNBIND %c '%.*s'\n", priv->id,
+ (bind->is_replier ? 'R' : 'L'), bind->name_len, name);
+
+ retval = kbus_forget_binding(dev, priv,
+ bind->is_replier, bind->name_len, name);
+
+done:
+ kfree(name);
+ kfree(bind);
+ return retval;
+}
+
+static int kbus_replier(struct kbus_private_data *priv __maybe_unused,
+ struct kbus_dev *dev, unsigned long arg)
+{
+ struct kbus_private_data *replier;
+ struct kbus_bind_query *query;
+ char *name = NULL;
+ int retval = 0;
+
+ query = kmalloc(sizeof(*query), GFP_KERNEL);
+ if (!query)
+ return -ENOMEM;
+ if (copy_from_user(query, (void __user *)arg, sizeof(*query))) {
+ retval = -EFAULT;
+ goto done;
+ }
+
+ if (query->name_len == 0 || query->name_len > KBUS_MAX_NAME_LEN) {
+ kbus_maybe_dbg(priv->dev, "Replier name is length %u\n",
+ query->name_len);
+ retval = -ENAMETOOLONG;
+ goto done;
+ }
+
+ name = kmalloc(query->name_len + 1, GFP_KERNEL);
+ if (!name) {
+ retval = -ENOMEM;
+ goto done;
+ }
+ if (copy_from_user(name, (char __user *) query->name,
+ query->name_len)) {
+ retval = -EFAULT;
+ goto done;
+ }
+ name[query->name_len] = 0;
+
+ kbus_maybe_dbg(priv->dev, "%u REPLIER for '%.*s'\n",
+ priv->id, query->name_len, name);
+
+ retval = kbus_find_replier(dev, &replier, query->name_len, name);
+ if (retval < 0)
+ goto done;
+
+ if (retval)
+ query->return_id = replier->id;
+ else
+ query->return_id = 0;
+ /*
+ * Copy the whole structure back, rather than try to work out (in a
+ * guaranteed-safe manner) where the 'id' actually lives
+ */
+ if (copy_to_user((void __user *)arg, query, sizeof(*query))) {
+ retval = -EFAULT;
+ goto done;
+ }
+done:
+ kfree(name);
+ kfree(query);
+ return retval;
+}
+
+/* How much of the current message is left to read? */
+extern u32 kbus_lenleft(struct kbus_private_data *priv)
+{
+ return 0; /* no message => nothing to read */
+}
+
+static int kbus_maxmsgs(struct kbus_private_data *priv,
+ unsigned long arg)
+{
+ int retval = 0;
+ u32 requested_max;
+
+ retval = __get_user(requested_max, (u32 __user *) arg);
+ if (retval)
+ return retval;
+
+ kbus_maybe_dbg(priv->dev, "%u MAXMSGS requests %u (was %u)\n",
+ priv->id, requested_max, priv->max_messages);
+
+ /* A value of 0 is just a query for what the current length is */
+ if (requested_max > 0)
+ priv->max_messages = requested_max;
+
+ return __put_user(priv->max_messages, (u32 __user *) arg);
+}
+
+static int kbus_nummsgs(struct kbus_private_data *priv,
+ struct kbus_dev *dev __maybe_unused, unsigned long arg)
+{
+ u32 count = priv->message_count;
+
+ kbus_maybe_dbg(dev, "%u NUMMSGS %u\n", priv->id, count);
+
+ return __put_user(count, (u32 __user *) arg);
+}
+
+static int kbus_set_verbosity(struct kbus_private_data *priv,
+ unsigned long arg)
+{
+ int retval = 0;
+ u32 verbose;
+ int old_value = priv->dev->verbose;
+
+ retval = __get_user(verbose, (u32 __user *) arg);
+ if (retval)
+ return retval;
+
+ /*
+ * If we're *leaving* verbose mode, we should say so.
+ * However, we also want to say if we're *entering* verbose
+ * mode, and that means we can't use kbus_maybe_dbg (since
+ * we're not yet in verbose mode)
+ */
+#ifdef DEBUG
+ dev_dbg(priv->dev->dev,
+ "%u VERBOSE requests %u (was %d)\n",
+ priv->id, verbose, old_value);
+#endif
+
+ switch (verbose) {
+ case 0:
+ priv->dev->verbose = false;
+ break;
+ case 1:
+ priv->dev->verbose = true;
+ break;
+ case 0xFFFFFFFF:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return __put_user(old_value, (u32 __user *) arg);
+}
+
+static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ int err = 0;
+ int retval = 0;
+ struct kbus_private_data *priv = filp->private_data;
+ struct kbus_dev *dev = priv->dev;
+ u32 id = priv->id;
+
+ if (_IOC_TYPE(cmd) != KBUS_IOC_MAGIC)
+ return -ENOTTY;
+ if (_IOC_NR(cmd) > KBUS_IOC_MAXNR)
+ return -ENOTTY;
+ /*
+ * Check our arguments at least vaguely match. Note that VERIFY_WRITE
+ * allows R/W transfers. Remember that 'type' is user-oriented, while
+ * access_ok is kernel-oriented, so the concept of "read" and "write"
+ * is reversed
+ */
+ if (_IOC_DIR(cmd) & _IOC_READ)
+ err = !access_ok(VERIFY_WRITE, (void __user *)arg,
+ _IOC_SIZE(cmd));
+ else if (_IOC_DIR(cmd) & _IOC_WRITE)
+ err = !access_ok(VERIFY_READ, (void __user *)arg,
+ _IOC_SIZE(cmd));
+ if (err)
+ return -EFAULT;
+
+ if (mutex_lock_interruptible(&dev->mux))
+ return -ERESTARTSYS;
+
+ switch (cmd) {
+
+ case KBUS_IOC_RESET:
+ /* This is currently a no-op, but may be useful later */
+ kbus_maybe_dbg(priv->dev, "%u RESET\n", id);
+ break;
+
+ case KBUS_IOC_BIND:
+ /*
+ * BIND: indicate that a file wants to receive messages of a
+ * given name
+ */
+ retval = kbus_bind(priv, dev, arg);
+ break;
+
+ case KBUS_IOC_UNBIND:
+ /*
+ * UNBIND: indicate that a file no longer wants to receive
+ * messages of a given name
+ */
+ retval = kbus_unbind(priv, dev, arg);
+ break;
+
+ case KBUS_IOC_KSOCKID:
+ /*
+ * What is the "Ksock id" for this file descriptor
+ */
+ kbus_maybe_dbg(priv->dev, "%u KSOCKID %u\n", id, id);
+ retval = __put_user(id, (u32 __user *) arg);
+ break;
+
+ case KBUS_IOC_REPLIER:
+ /*
+ * Who (if anyone) is bound to reply to this message?
+ * arg in: message name
+ * arg out: listener id
+ * return: 0 means no-one, 1 means someone
+ * We can't just return the id as the return value of the ioctl,
+ * because it's an unsigned int, and the ioctl return must be
+ * signed...
+ */
+ retval = kbus_replier(priv, dev, arg);
+ break;
+
+ case KBUS_IOC_MAXMSGS:
+ /*
+ * Set (and/or query) maximum number of messages in this
+ * interface's queue.
+ *
+ * arg in: 0 (for query) or maximum number wanted
+ * arg out: maximum number allowed
+ * return: 0 means OK, otherwise not OK
+ */
+ retval = kbus_maxmsgs(priv, arg);
+ break;
+
+ case KBUS_IOC_NUMMSGS:
+ /* How many messages are in our queue?
+ *
+ * arg out: number of messages currently in the queue
+ * return: 0 means OK, otherwise not OK
+ */
+ retval = kbus_nummsgs(priv, dev, arg);
+ break;
+
+ case KBUS_IOC_VERBOSE:
+ /*
+ * Should we output verbose/debug messages?
+ *
+ * arg in: 0 (for no), 1 (for yes), 0xFFFFFFFF (for query)
+ * arg out: the previous value, before we were called
+ * return: 0 means OK, otherwise not OK
+ */
+ retval = kbus_set_verbosity(priv, arg);
+ break;
+
+ default:
+ /* *Should* be redundant, if we got our range checks right */
+ retval = -ENOTTY;
+ break;
+ }
+
+ mutex_unlock(&dev->mux);
+ return retval;
+}
+
+/* File operations for /dev/kbus<n> */
+static const struct file_operations kbus_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = kbus_ioctl,
+ .open = kbus_open,
+ .release = kbus_release,
+};
+
+static void kbus_setup_cdev(struct kbus_dev *dev, int devno)
+{
+ int err;
+
+ /*
+ * Remember to initialise the mutex *before* making the device
+ * available!
+ */
+ mutex_init(&dev->mux);
+
+ /*
+ * This seems like a sensible place to setup other device specific
+ * stuff, too.
+ */
+ INIT_LIST_HEAD(&dev->bound_message_list);
+ INIT_LIST_HEAD(&dev->open_ksock_list);
+
+ dev->next_ksock_id = 0;
+ dev->next_msg_serial_num = 0;
+
+ cdev_init(&dev->cdev, &kbus_fops);
+ dev->cdev.owner = THIS_MODULE;
+
+ err = cdev_add(&dev->cdev, devno, 1);
+ if (err)
+ pr_err("Error %d adding kbus0 as a character device\n",
+ err);
+}
+
+static void kbus_teardown_cdev(struct kbus_dev *dev)
+{
+ kbus_forget_all_bindings(dev);
+ kbus_forget_all_open_ksocks(dev);
+
+ cdev_del(&dev->cdev);
+}
+
+
+/*
+ * Actually setup /dev/kbus<which>.
+ *
+ * Returns <which> or a negative error code.
+ */
+static int kbus_setup_new_device(int which)
+{
+ struct kbus_dev *new = NULL;
+ dev_t this_devno;
+
+ if (which < 0 || which > (KBUS_MAX_NUM_DEVICES - 1)) {
+ pr_err("kbus: next device index %d not %d..%d\n",
+ which, 0, KBUS_MAX_NUM_DEVICES - 1);
+ return -EINVAL;
+ }
+
+ new = kmalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ memset(new, 0, sizeof(*new));
+
+ /* Connect the device up with its operations */
+ this_devno = MKDEV(kbus_major, kbus_minor + which);
+ kbus_setup_cdev(new, this_devno);
+ new->index = which;
+
+ new->verbose = KBUS_DEFAULT_VERBOSE_SETTING;
+
+ new->dev = device_create(kbus_class_p, NULL,
+ this_devno, NULL, "kbus%d", which);
+
+ kbus_devices[which] = new;
+ return which;
+}
+
+/* Allow the reporting infrastructure to "see" our internals */
+extern void kbus_get_device_data(int *num_devices,
+ struct kbus_dev ***devices)
+{
+ *num_devices = kbus_num_devices;
+ *devices = kbus_devices;
+}
+
+static int __init kbus_init(void)
+{
+ int result;
+ int ii;
+ dev_t devno = 0;
+
+#ifdef DEBUG
+ pr_notice("Initialising KBUS module (%d device%s)\n",
+ kbus_num_devices, kbus_num_devices == 1 ? "" : "s");
+#endif
+
+ if (kbus_num_devices < KBUS_MIN_NUM_DEVICES ||
+ kbus_num_devices > KBUS_MAX_NUM_DEVICES) {
+ pr_err("kbus: requested number of devices %d not %d..%d\n",
+ kbus_num_devices,
+ KBUS_MIN_NUM_DEVICES, KBUS_MAX_NUM_DEVICES);
+ return -EINVAL;
+ }
+
+ /* ================================================================= */
+ /*
+ * Our main purpose is to provide /dev/kbus
+ * We wish to start our device numbering with device 0, and device 0
+ * should always be present.
+ */
+ result = alloc_chrdev_region(&devno, kbus_minor, KBUS_MAX_NUM_DEVICES,
+ "kbus");
+ /* We're quite happy with dynamic allocation of our major number */
+ kbus_major = MAJOR(devno);
+ if (result < 0) {
+ pr_warn("kbus: Cannot allocate character device region "
+ "(error %d)\n", -result);
+ return result;
+ }
+
+ kbus_devices = kmalloc(KBUS_MAX_NUM_DEVICES * sizeof(struct kbus_dev *),
+ GFP_KERNEL);
+ if (!kbus_devices) {
+ pr_warn("kbus: Cannot allocate devices\n");
+ unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+ return -ENOMEM;
+ }
+ memset(kbus_devices, 0, KBUS_MAX_NUM_DEVICES * sizeof(*kbus_devices));
+
+ /*
+ * To make the user's life as simple as possible, let's make our device
+ * hot pluggable -- this means that on a modern system it *should* just
+ * appear, as if by magic (and go away again when the module is
+ * removed).
+ */
+ kbus_class_p = class_create(THIS_MODULE, "kbus");
+ if (IS_ERR(kbus_class_p)) {
+ long err = PTR_ERR(kbus_class_p);
+ if (err == -EEXIST) {
+ pr_warn("kbus: Cannot create kbus class, "
+ "it already exists\n");
+ } else {
+ pr_err("kbus: Error creating kbus class\n");
+ unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+ return err;
+ }
+ }
+
+ /* And connect up the number of devices we've been asked for */
+ for (ii = 0; ii < kbus_num_devices; ii++) {
+ int res = kbus_setup_new_device(ii);
+ if (res < 0) {
+ unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+ class_destroy(kbus_class_p);
+ return res;
+ }
+ }
+
+ /* Set up the files that allow users to see something of our state */
+ kbus_setup_reporting();
+
+ return 0;
+}
+
+static void __exit kbus_exit(void)
+{
+ /* No locking done, as we're standing down */
+
+ int ii;
+ dev_t devno = MKDEV(kbus_major, kbus_minor);
+
+#ifdef DEBUG
+ pr_notice("Standing down kbus module\n");
+#endif
+
+ for (ii = 0; ii < kbus_num_devices; ii++) {
+ kbus_teardown_cdev(kbus_devices[ii]);
+ kfree(kbus_devices[ii]);
+ }
+ unregister_chrdev_region(devno, KBUS_MAX_NUM_DEVICES);
+
+ /*
+ * If I'm destroying the class, do I actually need to destroy the
+ * individual device therein? Best safe...
+ */
+ for (ii = 0; ii < kbus_num_devices; ii++) {
+ dev_t this_devno = MKDEV(kbus_major, kbus_minor + ii);
+ device_destroy(kbus_class_p, this_devno);
+ }
+ class_destroy(kbus_class_p);
+
+ kbus_remove_reporting();
+}
+
+module_param(kbus_num_devices, int, S_IRUGO);
+MODULE_PARM_DESC(kbus_num_devices,
+ "Number of KBUS device nodes to create initially");
+module_init(kbus_init);
+module_exit(kbus_exit);
+
+MODULE_DESCRIPTION("KBUS lightweight messaging system");
+MODULE_AUTHOR("[email protected], [email protected]");
+/*
+ * All well-behaved Linux kernel modules should be licensed under GPL v2.
+ * So shall it be.
+ *
+ * (According to the comments in <linux/module.h>, the "v2" is implicit here)
+ *
+ * We also license under the MPL, to allow free use outwith Linux if anyone
+ * wishes.
+ */
+MODULE_LICENSE("Dual MPL/GPL");
--
1.7.4.1
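
A note for anyone wanting to pre-check message names in userspace
before calling KBUS_IOC_BIND: the rules enforced by
kbus_bad_message_name() in the patch above can be restated roughly as
the sketch below. It is an illustration only and not part of the
patch. The name kbus_name_ok() is hypothetical, and it returns true
for a good name (the opposite sense to the kernel function).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace restatement of the checks in kbus_bad_message_name().
 * Hypothetical helper: returns true if the name is acceptable.
 */
static bool kbus_name_ok(const char *name, size_t len)
{
	size_t i;
	size_t last_dot = 1;	/* index of the '.' in the "$." prefix */

	if (name == NULL || len < 3)
		return false;
	if (name[0] != '$' || name[1] != '.')
		return false;

	/* A trailing ".*" or ".%" wildcard is allowed; ignore it below. */
	if (name[len - 2] == '.' &&
	    (name[len - 1] == '*' || name[len - 1] == '%'))
		len -= 2;
	assert(len >= 1);	/* len was at least 3 before stripping */

	/* The name proper must not end with a dot. */
	if (name[len - 1] == '.')
		return false;

	for (i = 2; i < len; i++) {
		if (name[i] == '.') {
			/* Two adjacent dots mean an empty component. */
			if (last_dot == i - 1)
				return false;
			last_dot = i;
		} else if (!isalnum((unsigned char)name[i])) {
			return false;
		}
	}
	return true;
}
```

With this, "$.Fred", "$.Fred.*" and "$.Fred.%" are accepted, while
"$.Fred.", "$.Fred..Jim" and "Fred" are rejected. A bare "$.*" also
passes, as it does in the kernel code.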

2011-03-18 17:46:26

by Tony Ibbs

Subject: [PATCH 09/11] KBUS Replier Bind Event set-aside lists

In order to make sending Replier Bind Events reliable, we need
to introduce set-aside message lists, in case a client that
wants to receive such events has a full list when the event
occurs (making bind/unbind depend on the recipient of the
event is not acceptable).

Signed-off-by: Tony Ibbs <[email protected]>
---
include/linux/kbus_defns.h | 7 +
ipc/kbus_internal.h | 62 ++++++
ipc/kbus_main.c | 464 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 518 insertions(+), 15 deletions(-)
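
Not part of the commit: a small userspace model of the set-aside
policy described in the struct kbus_dev comment in this patch, in
case it helps when reviewing. It is a sketch under assumptions, not
the kernel code. MAX_SET_ASIDE stands in for
CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES, MAX_LISTENERS and all the
type and function names are hypothetical, a fixed array replaces the
kernel's linked list, and the exact "is there room?" test is my
guess at the intent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SET_ASIDE 4	/* hypothetical stand-in for the config limit */
#define MAX_LISTENERS 8	/* hypothetical bound, so an array will do */

enum item_kind { UNBIND_EVENT, EVENTS_LOST };

struct item {
	unsigned int send_to_id;
	enum item_kind kind;
};

struct set_aside {
	struct item items[MAX_SET_ASIDE + MAX_LISTENERS];
	size_t count;
	bool tragic;
};

/* Scan back from the tail; stop at the first ordinary unbind event. */
static bool already_told_tragic(const struct set_aside *sa, unsigned int id)
{
	size_t i;

	for (i = sa->count; i > 0; i--) {
		if (sa->items[i - 1].kind == UNBIND_EVENT)
			break;
		if (sa->items[i - 1].send_to_id == id)
			return true;
	}
	return false;
}

static void append(struct set_aside *sa, unsigned int id, enum item_kind kind)
{
	assert(sa->count < sizeof(sa->items) / sizeof(sa->items[0]));
	sa->items[sa->count].send_to_id = id;
	sa->items[sa->count].kind = kind;
	sa->count++;
}

/* Set aside one event per listener, or go "tragic" if there is no room. */
static void set_aside_unbind_event(struct set_aside *sa,
				   const unsigned int *ids, size_t n)
{
	size_t i;

	if (!sa->tragic && sa->count + n > MAX_SET_ASIDE)
		sa->tragic = true;

	for (i = 0; i < n; i++) {
		if (!sa->tragic)
			append(sa, ids[i], UNBIND_EVENT);
		else if (!already_told_tragic(sa, ids[i]))
			append(sa, ids[i], EVENTS_LOST);
	}
}

/* Remove and return the oldest item for 'id'; false if there is none. */
static bool take_next(struct set_aside *sa, unsigned int id, struct item *out)
{
	size_t i;

	for (i = 0; i < sa->count; i++) {
		if (sa->items[i].send_to_id != id)
			continue;
		*out = sa->items[i];
		memmove(&sa->items[i], &sa->items[i + 1],
			(sa->count - i - 1) * sizeof(sa->items[0]));
		sa->count--;
		if (sa->count == 0)
			sa->tragic = false;	/* emptied: back to normal */
		return true;
	}
	return false;
}
```

In this model, once the list would overflow, each listener gets at
most one EVENTS_LOST entry at the tail, and ordinary unbind events
are only remembered again after every set-aside entry has been read.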

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index c646a1d..82779a6 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -482,12 +482,19 @@ struct kbus_replier_bind_event_data {
* bound as Replier closed.
* * ErrorSending - an unexpected error occurred when trying to send a Request
* to its Replier whilst polling.
+ *
+ * Synthetic Announcements with no data
+ * ------------------------------------
+ * * UnbindEventsLost - sent (instead of a Replier Bind Event) when the unbind
+ * events "set aside" list has filled up, and thus unbind events have been
+ * lost.
*/
#define KBUS_MSG_NAME_REPLIER_GONEAWAY "$.KBUS.Replier.GoneAway"
#define KBUS_MSG_NAME_REPLIER_IGNORED "$.KBUS.Replier.Ignored"
#define KBUS_MSG_NAME_REPLIER_UNBOUND "$.KBUS.Replier.Unbound"
#define KBUS_MSG_NAME_REPLIER_DISAPPEARED "$.KBUS.Replier.Disappeared"
#define KBUS_MSG_NAME_ERROR_SENDING "$.KBUS.ErrorSending"
+#define KBUS_MSG_NAME_UNBIND_EVENTS_LOST "$.KBUS.UnbindEventsLost"

/*
* Replier Bind Event
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index 13dc896..a24fcaf 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -414,6 +414,20 @@ struct kbus_write_msg {
};

/*
+ * The data for an unsent Replier Bind Event (in the unsent_unbind_msg_list)
+ *
+ * Note that 'binding' may theoretically be NULL (although I don't think this
+ * should ever actually happen).
+ */
+struct kbus_unsent_message_item {
+ struct list_head list;
+ struct kbus_private_data *send_to; /* who we want to send it to */
+ u32 send_to_id; /* but the id is often useful */
+ struct kbus_msg *msg; /* the message itself */
+ struct kbus_message_binding *binding; /* and why we remembered it */
+};
+
+/*
* This is the data for an individual Ksock
*
* Each time we open /dev/kbus<n>, we need to remember a unique id for
@@ -564,6 +578,13 @@ struct kbus_private_data {
* message pushed onto the queue. Which is much simpler.
*/
struct kbus_msg_id msg_id_just_pushed;
+
+ /*
+ * If this flag is set, then we may have outstanding Replier Unbound
+ * Event messages (kept on a list on our device). These must be read
+ * before any "normal" messages (on our message_queue) get read.
+ */
+ int maybe_got_unsent_unbind_msgs;
};

/* What is a sensible number for the default maximum number of messages? */
@@ -571,6 +592,18 @@ struct kbus_private_data {
#define CONFIG_KBUS_DEF_MAX_MESSAGES 100
#endif

+/*
+ * What about the maximum number of unsent unbind event messages?
+ * This may want to be quite large, to allow for Limpets with momentary
+ * network outages.
+ *
+ * The default value is probably too small, but experimentation is
+ * needed to determine a more sensible value.
+ */
+#ifndef CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES
+#define CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES 1000
+#endif
+
/* Information belonging to each /dev/kbus<N> device */
struct kbus_dev {
struct cdev cdev; /* Character device data */
@@ -623,6 +656,35 @@ struct kbus_dev {
* Are we wanting to send a synthetic message for each Replier
* bind/unbind? */
u32 report_replier_binds;
+
+ /*
+ * If Replier (un)bind events have been requested, then when
+ * kbus_release is called, a message must be sent for each Replier that
+ * is (of necessity) unbound from the Ksock being released. For a
+ * normal unbind, if any of the Repliers doesn't have room in its
+ * message queue for such an event, then the unbind fails with -EAGAIN.
+ * This isn't acceptable for kbus_release (apart from anything else,
+ * the release might be due to the original program falling over).
+ * It's not acceptable to fail to send the messages (that's a general
+ * KBUS principle).
+ *
+ * The only sensible solution seems to be to put the messages we'd
+ * like to have sent onto a set-aside list, and mark each recipient
+ * as having messages thereon. Then, each time a Ksock looks for a
+ * new message, it should first check to see if it might have one
+ * on the set-aside list, and if it does, read that instead.
+ *
+ * Once we're doing this, though, we need some limit on how big that
+ * set-aside list may grow (to allow for user processes that keep
+ * binding and falling over!). When the list gets "too long", we set a
+ * "gone tragically wrong" flag, and instead of adding more unbind
+ * events, we instead add a single "gone tragically wrong" message for
+ * each Ksock. We don't revert to remembering unbind events again until
+ * the list has been emptied.
+ */
+ struct list_head unsent_unbind_msg_list;
+ u32 unsent_unbind_msg_count;
+ int unsent_unbind_is_tragic;
};

/*
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 6372b40..296a1a1 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -88,6 +88,11 @@ static int kbus_write_to_recipients(struct kbus_private_data *priv,
struct kbus_dev *dev,
struct kbus_msg *msg);

+static void kbus_forget_unbound_unsent_unbind_msgs(struct kbus_private_data
+ *priv,
+ struct kbus_message_binding
+ *binding);
+
static int kbus_alloc_ref_data(struct kbus_private_data *priv,
u32 data_len,
struct kbus_data_ptr **ret_ref_data);
@@ -1803,6 +1808,13 @@ static int kbus_forget_binding(struct kbus_dev *dev,
kbus_forget_matching_messages(priv, binding);

/*
+ * Maybe including any set-aside Replier Unbind Events...
+ */
+ if (!strncmp(KBUS_MSG_NAME_REPLIER_BIND_EVENT, binding->name,
+ binding->name_len))
+ kbus_forget_unbound_unsent_unbind_msgs(priv, binding);
+
+ /*
* We carefully don't try to do anything about requests that
* have already been read - the fact that the user has unbound
* from receiving new messages with this name doesn't imply
@@ -1817,43 +1829,219 @@ static int kbus_forget_binding(struct kbus_dev *dev,
return 0;
}

+/*
+ * Add a (copy of a) message to the "unsent Replier Unbind Event" list
+ *
+ * 'priv' is who we are trying to send to, 'msg' is the message we were
+ * trying to send.
+ *
+ * Returns 0 if all went well, a negative value if it did not.
+ */
+static int kbus_remember_unsent_unbind_event(struct kbus_dev *dev,
+ struct kbus_private_data *priv,
+ struct kbus_msg *msg,
+ struct kbus_message_binding
+ *binding)
+{
+ struct kbus_unsent_message_item *new;
+ struct kbus_msg *new_msg = NULL;
+
+ kbus_maybe_dbg(priv->dev,
+ " Remembering unsent unbind event "
+ "%u '%.*s' to %u\n",
+ dev->unsent_unbind_msg_count, msg->name_len,
+ msg->name_ref->name, priv->id);
+
+ new = kmalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ new_msg = kbus_copy_message(dev, msg);
+ if (!new_msg) {
+ kfree(new);
+ return -EFAULT;
+ }
+
+ new->send_to = priv;
+ new->send_to_id = priv->id; /* Useful shorthand? */
+ new->msg = new_msg;
+ new->binding = binding;
+
+ /*
+ * The order should be the same as a normal message queue,
+ * so add to the end...
+ */
+ list_add_tail(&new->list, &dev->unsent_unbind_msg_list);
+ dev->unsent_unbind_msg_count++;
+ return 0;
+}
+
+/*
+ * Return true if this listener already has a "gone tragic" message.
+ *
+ * Look at the end of the unsent Replier Unbind Event message list, to see
+ * if the given listener already has a "gone tragic" message (since if it
+ * does, we will not want to add another).
+ */
+static int kbus_listener_already_got_tragic_msg(struct kbus_dev *dev,
+ struct kbus_private_data
+ *listener)
+{
+ struct kbus_unsent_message_item *ptr;
+ struct kbus_unsent_message_item *next;
+
+ kbus_maybe_dbg(dev,
+ " Checking for 'gone tragic' event for %u\n",
+ listener->id);
+
+ list_for_each_entry_safe_reverse(ptr, next,
+ &dev->unsent_unbind_msg_list, list) {
+
+ if (kbus_message_name_matches(
+ ptr->msg->name_ref->name,
+ ptr->msg->name_len,
+ KBUS_MSG_NAME_REPLIER_BIND_EVENT))
+ /*
+ * If we get a Replier Bind Event, then we're past all
+ * the "tragic world" messages
+ */
+ break;
+ if (ptr->send_to_id == listener->id) {
+ kbus_maybe_dbg(dev, " Found\n");
+ return true;
+ }
+ }
+
+ kbus_maybe_dbg(dev, " Not found\n");
+ return false;
+}

/*
- * Report a Replier Bind Event for unbinding from the given message name
+ * Report a Replier Bind Event for unbinding from the given message name,
+ * in such a way that we do not lose the message even if we can't send it
+ * right away.
*/
-static void kbus_report_unbinding(struct kbus_private_data *priv,
- u32 name_len, char *name)
+static void kbus_safe_report_unbinding(struct kbus_private_data *priv,
+ u32 name_len, char *name)
{
+ /* 1. Generate a new unbinding event message
+ * 2. Try sending it to everyone who cares
+ * 3. If that failed, then find out who *does* care
+ * 4. Is there room for that many messages on the set-aside list?
+ * 5. If there is, add (a copy of) the message for each
+ * 6. If there is not, set the "tragic" flag, and add (a copy of)
+ * the "world gone tragic" message for each
+ * 7. If we've added something to the set-aside list, then set
+ * the "maybe got something on the set-aside list" flag for
+ * each recipient. */
+
struct kbus_msg *msg;
- int retval;
+ struct kbus_message_binding **listeners = NULL;
+ struct kbus_message_binding *replier = NULL;
+ int retval = 0;
+ int num_listeners;
+ int ii;

kbus_maybe_dbg(priv->dev,
" %u Safe report unbinding of '%.*s'\n",
priv->id, name_len, name);

- /* Generate the "X has unbound from Y" message */
+ /* Generate the message we'd *like* to send */
msg = kbus_new_synthetic_bind_message(priv, false, name_len, name);
if (msg == NULL)
return; /* There is nothing sensible to do here */

- /* ...and send it */
+ /* If we're lucky, we can just send it */
retval = kbus_write_to_recipients(priv, priv->dev, msg);
if (retval != -EBUSY)
goto done_sending;

/*
- * If someone who had bound to it wasn't able to take the message,
- * then there's not a lot we can do at this stage.
+ * So at least one of the people we were trying to send to was not able
+ * to take the message, presumably because their message queue is full.
+ * Thus we need to put aside one copy of the message for each
+ * recipient, to be delivered when it *can* be received.
*
- * XXX This is, of course, unacceptable. Sorting it out will
- * XXX be done in the next tranch of code, though, since it
- * XXX is not terribly simple.
+ * So before we do anything else, we need to know who those recipients
+ * are.
*/
+
kbus_maybe_dbg(priv->dev,
- " %u Someone was unable to receive message '%*s'\n",
- priv->id, name_len, name);
+ " %u Need to add messages to set-aside list\n",
+ priv->id);
+
+ /*
+ * We're expecting some listeners, but no replier.
+ * Since this is a duplicate of what we did in kbus_write_to_recipients,
+ * and since our ksock is locked whilst we're working, we can assume
+ * that we should get the same result. For the sake of completeness,
+ * check the error return anyway, but I'm not going to worry about
+ * whether we suddenly have a replier popping up unexpectedly...
+ */
+ num_listeners = kbus_find_listeners(priv->dev, &listeners, &replier,
+ msg->name_len, msg->name_ref->name);
+ if (num_listeners < 0) {
+ kbus_maybe_dbg(priv->dev,
+ " Error %d finding listeners\n",
+ num_listeners);
+ retval = num_listeners;
+ goto done_sending;
+ }
+
+ if (priv->dev->unsent_unbind_is_tragic ||
+ (num_listeners + priv->dev->unsent_unbind_msg_count >
+ CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES)) {
+ struct kbus_msg_id in_reply_to = { 0, 0 }; /* no-one */
+ /*
+ * Either the list had already gone tragic, or we've
+ * filled it up with "normal" unbind event messages
+ */
+ priv->dev->unsent_unbind_is_tragic = true;
+
+ /* In which case we need a different message */
+ kbus_free_message(msg);
+ msg = kbus_build_kbus_message(priv->dev,
+ KBUS_MSG_NAME_UNBIND_EVENTS_LOST,
+ 0, 0, in_reply_to);
+ if (msg == NULL)
+ goto done_sending;
+
+ for (ii = 0; ii < num_listeners; ii++) {
+ /*
+ * We only want to add a "gone tragic" message if the
+ * recipient does not already have such a message
+ * stacked...
+ */
+ if (kbus_listener_already_got_tragic_msg(priv->dev,
+ listeners[ii]->bound_to))
+ continue;
+ retval = kbus_remember_unsent_unbind_event(priv->dev,
+ listeners[ii]->bound_to,
+ msg, listeners[ii]);
+ /* And remember that we've got something on the
+ * set-aside list */
+ listeners[ii]->bound_to->maybe_got_unsent_unbind_msgs =
+ true;
+ if (retval)
+ break; /* No good choice here */
+ }
+ } else {
+ /* There's room to add these messages as-is */
+ for (ii = 0; ii < num_listeners; ii++) {
+ retval = kbus_remember_unsent_unbind_event(priv->dev,
+ listeners[ii]->bound_to,
+ msg, listeners[ii]);
+ /* And remember that we've got something on the
+ * set-aside list */
+ listeners[ii]->bound_to->maybe_got_unsent_unbind_msgs =
+ true;
+ if (retval)
+ break; /* No good choice here */
+ }
+ }

done_sending:
+ kfree(listeners);
/* Don't forget to free our copy of the message */
if (msg)
kbus_free_message(msg);
@@ -1861,6 +2049,196 @@ done_sending:
}

/*
+ * Return how many messages we have in the unsent Replier Unbind Event list.
+ */
+static u32 kbus_count_unsent_unbind_msgs(struct kbus_private_data *priv)
+{
+ struct kbus_dev *dev = priv->dev;
+
+ struct kbus_unsent_message_item *ptr;
+ struct kbus_unsent_message_item *next;
+
+ u32 count = 0;
+
+ kbus_maybe_dbg(dev, "%u Counting unsent unbind messages\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+ list) {
+ if (ptr->send_to_id == priv->id)
+ count++;
+ }
+ return count;
+}
+
+/*
+ * Maybe move an unsent Replier Unbind Event message to the main message list.
+ *
+ * Check if we have an unsent event on the set-aside list. If we do, move the
+ * first one across to our normal message queue.
+ *
+ * Returns 0 if all goes well, or a negative value if something went wrong.
+ */
+static int kbus_maybe_move_unsent_unbind_msg(struct kbus_private_data *priv)
+{
+ struct kbus_dev *dev = priv->dev;
+
+ struct kbus_unsent_message_item *ptr;
+ struct kbus_unsent_message_item *next;
+
+ kbus_maybe_dbg(dev,
+ "%u Looking for an unsent unbind message\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+ list) {
+ int retval;
+
+ if (ptr->send_to_id != priv->id)
+ continue;
+
+ kbus_maybe_report_message(priv->dev, ptr->msg);
+ /*
+ * Move the message into our normal message queue.
+ *
+ * We *must* use kbus_push_message() to do this, as
+ * we wish to keep our promise that this shall be the
+ * only way of adding a message to the queue.
+ */
+ retval = kbus_push_message(priv, ptr->msg, ptr->binding, false);
+ if (retval)
+ return retval; /* What else can we do? */
+
+ list_del(&ptr->list);
+ /* Mustn't forget to free *our* copy of the message */
+ kbus_free_message(ptr->msg);
+ kfree(ptr);
+ dev->unsent_unbind_msg_count--;
+ goto check_tragic;
+ }
+
+ /*
+ * Since we didn't find anything, we can safely unset the flag that
+ * says there might be something to find...
+ */
+ priv->maybe_got_unsent_unbind_msgs = false;
+
+check_tragic:
+ /*
+ * And if we've succeeded in emptying the list, we can unset the
+ * "gone tragic" flag for it, too, if it was set.
+ */
+ if (list_empty(&dev->unsent_unbind_msg_list))
+ dev->unsent_unbind_is_tragic = false;
+ return 0;
+}
+
+/*
+ * Forget any outstanding unsent Replier Unbind Event messages for this binding.
+ *
+ * Called from kbus_release.
+ */
+static void kbus_forget_unbound_unsent_unbind_msgs(
+ struct kbus_private_data *priv,
+ struct kbus_message_binding *binding)
+{
+ struct kbus_dev *dev = priv->dev;
+
+ struct kbus_unsent_message_item *ptr;
+ struct kbus_unsent_message_item *next;
+
+ u32 count = 0;
+
+ kbus_maybe_dbg(dev,
+ " %u Forgetting unsent unbind messages for "
+ "this binding\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+ list) {
+ if (ptr->binding == binding) {
+ list_del(&ptr->list);
+ kbus_free_message(ptr->msg);
+ kfree(ptr);
+ dev->unsent_unbind_msg_count--;
+ count++;
+ }
+ }
+ kbus_maybe_dbg(dev, "%u Forgot %u unsent unbind messages\n",
+ priv->id, count);
+ /*
+ * And if we've succeeded in emptying the list, we can unset the
+ * "gone tragic" flag for it, too, if it was set.
+ */
+ if (list_empty(&dev->unsent_unbind_msg_list))
+ dev->unsent_unbind_is_tragic = false;
+}
+
+/*
+ * Forget any outstanding unsent Replier Unbind Event messages for this Replier.
+ *
+ * Called from kbus_release.
+ */
+static void kbus_forget_my_unsent_unbind_msgs(struct kbus_private_data *priv)
+{
+ struct kbus_dev *dev = priv->dev;
+
+ struct kbus_unsent_message_item *ptr;
+ struct kbus_unsent_message_item *next;
+
+ u32 count = 0;
+
+ kbus_maybe_dbg(dev,
+ "%u Forgetting my unsent unbind messages\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+ list) {
+ if (ptr->send_to_id == priv->id) {
+ list_del(&ptr->list);
+ kbus_free_message(ptr->msg);
+ kfree(ptr);
+ dev->unsent_unbind_msg_count--;
+ count++;
+ }
+ }
+ kbus_maybe_dbg(dev, "%u Forgot %u unsent unbind messages\n",
+ priv->id, count);
+ /*
+ * And if we've succeeded in emptying the list, we can unset the
+ * "gone tragic" flag for it, too, if it was set.
+ */
+ if (list_empty(&dev->unsent_unbind_msg_list))
+ dev->unsent_unbind_is_tragic = false;
+}
+
+/*
+ * Forget any outstanding unsent Replier Unbind Event messages.
+ *
+ * Assumed to be called because the device is closing, and thus doesn't lock,
+ * or worry about lost messages.
+ */
+static void kbus_forget_unsent_unbind_msgs(struct kbus_dev *dev)
+{
+ struct kbus_unsent_message_item *ptr;
+ struct kbus_unsent_message_item *next;
+
+ kbus_maybe_dbg(dev,
+ " Forgetting unsent unbind event messages\n");
+
+ list_for_each_entry_safe(ptr, next, &dev->unsent_unbind_msg_list,
+ list) {
+
+ if (!kbus_message_name_matches(
+ ptr->msg->name_ref->name,
+ ptr->msg->name_len,
+ KBUS_MSG_NAME_REPLIER_BIND_EVENT))
+ kbus_maybe_report_message(dev, ptr->msg);
+
+ list_del(&ptr->list);
+ kbus_free_message(ptr->msg);
+ kfree(ptr);
+ dev->unsent_unbind_msg_count--;
+ }
+}
+
+/*
* Remove all bindings for a particular listener.
*
* Called from kbus_release, which will itself handle removing messages
@@ -1885,8 +2263,8 @@ static void kbus_forget_my_bindings(struct kbus_private_data *priv)
ptr->name_len, ptr->name);

if (ptr->is_replier && dev->report_replier_binds)
- kbus_report_unbinding(priv, ptr->name_len,
- ptr->name);
+ kbus_safe_report_unbinding(priv, ptr->name_len,
+ ptr->name);

list_del(&ptr->list);
kfree(ptr->name);
@@ -2088,6 +2466,8 @@ static int kbus_release(struct inode *inode __always_unused, struct file *filp)

kbus_empty_message_queue(priv);
kbus_forget_my_bindings(priv);
+ if (priv->maybe_got_unsent_unbind_msgs)
+ kbus_forget_my_unsent_unbind_msgs(priv);
kbus_empty_replies_unsent(priv);
retval2 = kbus_forget_open_ksock(dev, priv->id);
kfree(priv);
@@ -2933,6 +3313,7 @@ static int kbus_unbind(struct kbus_private_data *priv,
int retval = 0;
struct kbus_bind_request *bind;
char *name = NULL;
+ u32 old_message_count = priv->message_count;

bind = kmalloc(sizeof(*bind), GFP_KERNEL);
if (!bind)
@@ -2975,6 +3356,36 @@ static int kbus_unbind(struct kbus_private_data *priv,
retval = kbus_forget_binding(dev, priv,
bind->is_replier, bind->name_len, name);

+ /*
+ * If we're unbinding from $.KBUS.ReplierBindEvent, and there
+ * are (or may be) any such kept for us on the unread Replier
+ * Unbind Event list, then we need to remove them as well...
+ *
+ * NOTE that the following only checks for exact matches to
+ * $.KBUS.ReplierBindEvent, which should be sufficient...
+ */
+ if (priv->maybe_got_unsent_unbind_msgs &&
+ !strcmp(name, KBUS_MSG_NAME_REPLIER_BIND_EVENT))
+ kbus_forget_my_unsent_unbind_msgs(priv);
+
+ /*
+ * If that removed any messages from the message queue, then we have
+ * room to consider moving a message across from the unread Replier
+ * Unbind Event list
+ */
+ if (priv->message_count < old_message_count &&
+ priv->maybe_got_unsent_unbind_msgs) {
+ int rv = kbus_maybe_move_unsent_unbind_msg(priv);
+ /* If this fails, we're probably stumped */
+ if (rv)
+ /* The best we can do is grumble gently. We still
+ * want to return retval, not rv.
+ */
+ dev_err(priv->dev->dev,
+ "Failed to move unsent messages on "
+ "unbind (error %d)\n", -rv);
+ }
+
done:
kfree(name);
kfree(bind);
@@ -3140,6 +3551,21 @@ static int kbus_nextmsg(struct kbus_private_data *priv,
return retval;
}

+ /*
+ * If we (maybe) have any unread Replier Unbind Event messages,
+ * we now have room to copy one across to the message list
+ */
+ kbus_maybe_dbg(priv->dev,
+ " ++ maybe_got_unsent_unbind_msgs %d\n",
+ priv->maybe_got_unsent_unbind_msgs);
+
+ if (priv->maybe_got_unsent_unbind_msgs) {
+ retval = kbus_maybe_move_unsent_unbind_msg(priv);
+ /* If this fails, we're probably stumped */
+ if (retval)
+ return retval;
+ }
+
retval = __put_user(KBUS_ENTIRE_MSG_LEN(msg->name_len, msg->data_len),
(u32 __user *) arg);
if (retval)
@@ -3567,6 +3993,12 @@ static int kbus_nummsgs(struct kbus_private_data *priv,
{
u32 count = priv->message_count;

+ if (priv->maybe_got_unsent_unbind_msgs) {
+ kbus_maybe_dbg(dev, "%u NUMMSGS 'main' count %u\n",
+ priv->id, count);
+ count += kbus_count_unsent_unbind_msgs(priv);
+ }
+
kbus_maybe_dbg(dev, "%u NUMMSGS %u\n", priv->id, count);

return __put_user(count, (u32 __user *) arg);
@@ -4059,6 +4491,7 @@ static void kbus_setup_cdev(struct kbus_dev *dev, int devno)
*/
INIT_LIST_HEAD(&dev->bound_message_list);
INIT_LIST_HEAD(&dev->open_ksock_list);
+ INIT_LIST_HEAD(&dev->unsent_unbind_msg_list);

init_waitqueue_head(&dev->write_wait);

@@ -4078,6 +4511,7 @@ static void kbus_teardown_cdev(struct kbus_dev *dev)
{
kbus_forget_all_bindings(dev);
kbus_forget_all_open_ksocks(dev);
+ kbus_forget_unsent_unbind_msgs(dev);

cdev_del(&dev->cdev);
}
--
1.7.4.1
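
Note for readers of the set-aside logic in kbus_safe_report_unbinding()
above: the policy can be modelled in a few lines of ordinary userspace C.
This is only a sketch under stated assumptions, not the kernel code.
MAX_UNSENT stands in for CONFIG_KBUS_MAX_UNSENT_UNBIND_MESSAGES, and
struct setaside, setaside_add() and setaside_take() are hypothetical names
that use a fixed array where the patch uses a list_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNSENT 4			/* stand-in for the Kconfig limit */
#define MAX_SLOTS  (MAX_UNSENT + 8)	/* spare room for "lost" markers */

enum kind { EV_UNBIND, EV_LOST };	/* normal event, or "events lost" */

struct item {
	unsigned send_to_id;
	enum kind kind;
};

struct setaside {
	struct item items[MAX_SLOTS];
	size_t count;
	bool tragic;
};

/* Scan back from the tail: does this listener already have a marker? */
static bool has_lost_marker(const struct setaside *s, unsigned id)
{
	for (size_t i = s->count; i-- > 0; ) {
		if (s->items[i].kind != EV_LOST)
			break;	/* we are past the run of markers */
		if (s->items[i].send_to_id == id)
			return true;
	}
	return false;
}

static int append(struct setaside *s, unsigned id, enum kind kind)
{
	if (s->count == MAX_SLOTS)
		return -1;
	s->items[s->count].send_to_id = id;
	s->items[s->count].kind = kind;
	s->count++;
	return 0;
}

/* Set aside one event per listener, or one marker each if out of room. */
static int setaside_add(struct setaside *s, const unsigned *ids, size_t n)
{
	bool lost = s->tragic || n + s->count > MAX_UNSENT;

	if (lost)
		s->tragic = true;
	for (size_t i = 0; i < n; i++) {
		if (lost && has_lost_marker(s, ids[i]))
			continue;
		if (append(s, ids[i], lost ? EV_LOST : EV_UNBIND))
			return -1;
	}
	return 0;
}

/* Hand the oldest item for this listener back; false if there is none. */
static bool setaside_take(struct setaside *s, unsigned id, enum kind *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < s->count; i++) {
		if (s->items[i].send_to_id != id)
			continue;
		*out = s->items[i].kind;
		memmove(&s->items[i], &s->items[i + 1],
			(s->count - i - 1) * sizeof(s->items[0]));
		s->count--;
		if (s->count == 0)
			s->tragic = false;	/* list drained */
		return true;
	}
	return false;
}
```

With MAX_UNSENT at 4, calling setaside_add() twice for the same three
listeners leaves three ordinary events followed by three markers, and a
third call adds nothing. The "tragic" flag clears only once every item
has been taken, as in kbus_maybe_move_unsent_unbind_msg().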

2011-03-18 17:46:18

by Tony Ibbs

Subject: [PATCH 11/11] KBUS configuration and Makefile


Signed-off-by: Tony Ibbs <[email protected]>
---
init/Kconfig | 2 +
ipc/Kconfig | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ipc/Makefile | 9 ++++
3 files changed, 128 insertions(+), 0 deletions(-)
create mode 100644 ipc/Kconfig

diff --git a/init/Kconfig b/init/Kconfig
index c972899..945c380 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -339,6 +339,8 @@ config AUDIT_TREE
depends on AUDITSYSCALL
select FSNOTIFY

+source "ipc/Kconfig"
+
source "kernel/irq/Kconfig"

menu "RCU Subsystem"
diff --git a/ipc/Kconfig b/ipc/Kconfig
new file mode 100644
index 0000000..808d742
--- /dev/null
+++ b/ipc/Kconfig
@@ -0,0 +1,117 @@
+config KBUS
+ tristate "KBUS messaging system"
+ default n
+ ---help---
+ KBUS is a lightweight messaging system, particularly aimed
+ at embedded platforms. This option provides the kernel support
+ for mediating messages between client processes.
+
+ KBUS documentation may be found in Documentation/Kbus.txt
+
+ If you want KBUS support, you should say Y or M here. If you
+ choose M, the module will be called kbus.
+
+ If unsure, say N.
+
+if KBUS
+
+config KBUS_DEBUG
+ bool "Make KBUS debugging messages available"
+ default y
+ ---help---
+ This is the master switch for KBUS debug kernel messages.
+
+ If N is selected, all debug messages will be compiled out,
+ and the KBUS_IOC_VERBOSE ioctl will have no effect.
+
+ If Y is selected, then KBUS_DEBUG_DEFAULT_VERBOSE sets
+ the default for whether debug messages are emitted or not,
+ and the KBUS_IOC_VERBOSE ioctl can be used at runtime to
+ set/unset the output of debugging messages per KBUS device.
+
+ If unsure, say Y.
+
+config KBUS_DEBUG_DEFAULT_VERBOSE
+ bool "Output KBUS debug messages by default"
+ depends on KBUS_DEBUG
+ default n
+ ---help---
+ This sets the default state for the output of debugging messages.
+ It only has an effect if KBUS_DEBUG is already set.
+
+ If Y is selected, then KBUS devices default to outputting debug
+ messages. If N is selected, they do not.
+
+ In either case, debug messages for a particular KBUS device can
+ be turned on or off at runtime with the KBUS_IOC_VERBOSE ioctl.
+
+ If unsure, say N.
+
+config KBUS_DEF_NUM_DEVICES
+ int "Number of KBUS devices to auto-create"
+ default 1
+ range 1 KBUS_MAX_NUM_DEVICES
+ ---help---
+ This specifies the number of KBUS devices automatically created
+ when the KBUS subsystem initialises (when the module is loaded
+ or the kernel booted, as appropriate).
+
+ If KBUS is built as a module, this number may also be given as a
+ parameter; for example: kbus_num_devices=5.
+
+ Additional devices can be added at runtime using the
+ KBUS_IOC_NEWDEVICE ioctl.
+
+config KBUS_MAX_NUM_DEVICES
+ int "Maximum number of KBUS devices"
+ default 256
+ range 1 2147483647
+ # We don't impose a limit on the max, so if you've got enough
+ # RAM the only practical limit will be the (int) minor count
+ # passed to __register_chrdev_region.
+ ---help---
+ The maximum number of KBUS devices to support. If unsure, use
+ the default of 256.
+
+ Note that this setting controls the size of an array of pointers
+ to in-kernel KBUS structures; reducing it only saves a tiny amount
+ of RAM. On the other hand, once a KBUS device is used, the various
+ message lists and so on do take space.
+
+config KBUS_DEF_MAX_MESSAGES
+ int "Default KBUS message queue size limit"
+ default 100
+ range 1 2147483647
+ ---help---
+ Specify the default incoming message queue size limit. This default
+ is applied to all clients whenever they open or reopen a KBUS device
+ node.
+
+ Clients sending messages may specify the desired behaviour if any
+ of the recipients' message queues are full. If a sender's own queue
+ is full, it may not send a message flagged as a Request. Refer to
+ the KBUS documentation ("Queues filling up") for details.
+
+ Clients may change their own queue size limits at any time with the
+ KBUS_IOC_MAXMSGS ioctl.
+
+ The default is believed to be a reasonably conservative choice.
+
+config KBUS_MAX_UNSENT_UNBIND_MESSAGES
+ int "Maximum number of unsent KBUS unbind event messages"
+ default 1000
+ range 1 2147483647
+ ---help---
+ KBUS devices may request (by ioctl) that they want to receive
+ messages when clients bind or unbind as repliers. If such a
+ message is sent when their incoming queue is full, it is instead
+ saved onto a set-aside queue, and delivered later. This setting
+ determines the size of the set-aside queue; if the limit is reached,
+ a special "bind/unbind event messages were lost" message is queued
+ instead, and any further bind/unbind messages will be lost, until
+ such time as the special message can be delivered.
+
+ If unsure, choose the default.
+
+endif # KBUS
+
diff --git a/ipc/Makefile b/ipc/Makefile
index 9075e17..db692ad 100644
--- a/ipc/Makefile
+++ b/ipc/Makefile
@@ -2,6 +2,11 @@
# Makefile for the linux ipc.
#

+ifeq ($(CONFIG_KBUS_DEBUG),y)
+ CFLAGS_kbus_main.o = -DDEBUG
+ CFLAGS_kbus_report.o = -DDEBUG
+endif
+
obj-$(CONFIG_SYSVIPC_COMPAT) += compat.o
obj-$(CONFIG_SYSVIPC) += util.o msgutil.o msg.o sem.o shm.o ipcns_notifier.o syscall.o
obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o
@@ -9,4 +14,8 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
obj-$(CONFIG_IPC_NS) += namespace.o
obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
+obj-$(CONFIG_KBUS) += kbus.o
+
+kbus-y := kbus_main.o
+kbus-$(CONFIG_PROC_FS) += kbus_report.o

--
1.7.4.1
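
Note on the two debug options in the Kconfig above: KBUS_DEBUG is a
compile-time switch, while KBUS_DEBUG_DEFAULT_VERBOSE only picks the
initial per-device setting that the KBUS_IOC_VERBOSE ioctl can later
change. A minimal userspace sketch of that layering follows. DEMO_DEBUG,
DEMO_DEFAULT_VERBOSE, struct demo_dev and the demo_* functions are
hypothetical stand-ins, and demo_maybe_dbg() only imitates the role of
kbus_maybe_dbg(); the real macro may well differ.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

#ifndef DEMO_DEBUG
#define DEMO_DEBUG 1		/* stand-in for CONFIG_KBUS_DEBUG=y */
#endif
#ifndef DEMO_DEFAULT_VERBOSE
#define DEMO_DEFAULT_VERBOSE 0	/* stand-in for ..._DEFAULT_VERBOSE=n */
#endif

struct demo_dev {
	bool verbose;		/* per-device runtime setting */
};

static void demo_dev_init(struct demo_dev *dev)
{
	dev->verbose = DEMO_DEBUG && DEMO_DEFAULT_VERBOSE;
}

/* Runtime toggle; does nothing when debugging is compiled out. */
static void demo_set_verbose(struct demo_dev *dev, bool on)
{
#if DEMO_DEBUG
	dev->verbose = on;
#else
	(void)dev;
	(void)on;
#endif
}

/* Print only if compiled in *and* enabled for this device. */
static int demo_maybe_dbg(const struct demo_dev *dev, const char *fmt, ...)
{
#if DEMO_DEBUG
	va_list ap;
	int n;

	assert(dev && fmt);
	if (!dev->verbose)
		return 0;
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
#else
	(void)dev;
	(void)fmt;
	return 0;
#endif
}
```

Building the sketch with -DDEMO_DEBUG=0 makes both the toggle and the
print function no-ops, which is the behaviour the help text describes
for KBUS_DEBUG=n.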

2011-03-18 17:46:34

by Tony Ibbs

Subject: [PATCH 10/11] KBUS report state to userspace

This introduces two pseudo-files which expose some of the current
KBUS internal state. The bindings file, in particular, is used
in the KBUS test scripts (which are part of the userspace Python
binding).

Signed-off-by: Tony Ibbs <[email protected]>
---
ipc/kbus_report.c | 256 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 256 insertions(+), 0 deletions(-)
create mode 100644 ipc/kbus_report.c

diff --git a/ipc/kbus_report.c b/ipc/kbus_report.c
new file mode 100644
index 0000000..ba07254
--- /dev/null
+++ b/ipc/kbus_report.c
@@ -0,0 +1,256 @@
+/* KBUS kernel module - general reporting to userspace
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ * Kynesim, Cambridge UK
+ * Tony Ibbs <[email protected]>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above. If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL. If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#include <linux/module.h>
+#include <linux/cdev.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+
+#include <linux/kbus_defns.h>
+#include "kbus_internal.h"
+
+/* /proc */
+static struct proc_dir_entry *kbus_proc_dir;
+static struct proc_dir_entry *kbus_proc_file_bindings;
+static struct proc_dir_entry *kbus_proc_file_stats;
+
+/* What's the symbolic name of a message part? */
+static const char *kbus_msg_part_name(enum kbus_msg_parts p)
+{
+ switch (p) {
+ case KBUS_PART_HDR: return "HDR";
+ case KBUS_PART_NAME: return "NAME";
+ case KBUS_PART_NPAD: return "NPAD";
+ case KBUS_PART_DATA: return "DATA";
+ case KBUS_PART_DPAD: return "DPAD";
+ case KBUS_PART_FINAL_GUARD: return "FINAL";
+ }
+
+ pr_err("kbus: unhandled enum lookup %d in "
+ "kbus_msg_part_name - memory corruption?\n", p);
+ return "???";
+}
+
+/*
+ * Report on the current bindings, via /proc/kbus/bindings
+ */
+
+static int kbus_binding_seq_show(struct seq_file *s, void *v __always_unused)
+{
+ int ii;
+ int kbus_num_devices;
+ struct kbus_dev **kbus_devices;
+
+ kbus_get_device_data(&kbus_num_devices, &kbus_devices);
+
+ /* We report on all of the KBUS devices */
+ for (ii = 0; ii < kbus_num_devices; ii++) {
+ struct kbus_dev *dev = kbus_devices[ii];
+
+ struct kbus_message_binding *ptr;
+ struct kbus_message_binding *next;
+
+ if (mutex_lock_interruptible(&dev->mux))
+ return -ERESTARTSYS;
+
+ seq_printf(s,
+ "# <device> is bound to <Ksock-ID> in <process-PID>"
+ " as <Replier|Listener> for <message-name>\n");
+
+ list_for_each_entry_safe(ptr, next, &dev->bound_message_list,
+ list) {
+ seq_printf(s, "%3u: %8u %8lu %c %.*s\n", dev->index,
+ ptr->bound_to_id,
+ (long unsigned)ptr->bound_to->pid,
+ (ptr->is_replier ? 'R' : 'L'),
+ ptr->name_len, ptr->name);
+ }
+
+ mutex_unlock(&dev->mux);
+ }
+ return 0;
+}
+
+static int kbus_proc_bindings_open(struct inode *inode __always_unused,
+ struct file *file)
+{
+ return single_open(file, kbus_binding_seq_show, NULL);
+}
+
+static const struct file_operations kbus_proc_binding_file_ops = {
+ .owner = THIS_MODULE,
+ .open = kbus_proc_bindings_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release
+};
+
+static struct proc_dir_entry
+*kbus_create_proc_binding_file(struct proc_dir_entry *directory)
+{
+ struct proc_dir_entry *entry =
+ create_proc_entry("bindings", 0, directory);
+ if (entry)
+ entry->proc_fops = &kbus_proc_binding_file_ops;
+ return entry;
+}
+
+/*
+ * Report on whatever statistics seem like they might be useful,
+ * via /proc/kbus/stats
+ */
+
+static int kbus_stats_seq_show(struct seq_file *s, void *v __always_unused)
+{
+ int ii;
+ int kbus_num_devices;
+ struct kbus_dev **kbus_devices;
+
+ kbus_get_device_data(&kbus_num_devices, &kbus_devices);
+
+ /* We report on all of the KBUS devices */
+ for (ii = 0; ii < kbus_num_devices; ii++) {
+ struct kbus_dev *dev = kbus_devices[ii];
+
+ struct kbus_private_data *ptr;
+ struct kbus_private_data *next;
+
+ if (mutex_lock_interruptible(&dev->mux))
+ return -ERESTARTSYS;
+
+ seq_printf(s,
+ "dev %2u: next ksock %u next msg %u "
+ "unsent unbindings %u%s\n",
+ dev->index, dev->next_ksock_id,
+ dev->next_msg_serial_num,
+ dev->unsent_unbind_msg_count,
+ (dev->unsent_unbind_is_tragic ? "(gone tragic)" : ""));
+
+ list_for_each_entry_safe(ptr, next,
+ &dev->open_ksock_list, list) {
+
+ u32 left = kbus_lenleft(ptr);
+ u32 total;
+ if (ptr->read.msg)
+ total =
+ KBUS_ENTIRE_MSG_LEN(ptr->read.msg->name_len,
+ ptr->read.msg->data_len);
+ else
+ total = 0;
+
+ seq_printf(s, " ksock %u last msg %u:%u "
+ "queue %u of %u\n",
+ ptr->id, ptr->last_msg_id_sent.network_id,
+ ptr->last_msg_id_sent.serial_num,
+ ptr->message_count, ptr->max_messages);
+
+ seq_printf(s, " read byte %u of %u, "
+ "wrote byte %u of %s (%sfinished), "
+ "%ssending\n",
+ (total - left), total, ptr->write.pos,
+ kbus_msg_part_name(ptr->write.which),
+ ptr->write.is_finished ? "" : "not ",
+ ptr->sending ? "" : "not ");
+
+ seq_printf(s, " outstanding requests %u "
+ "(size %u, max %u), "
+ "unsent replies %u (max %u)\n",
+ ptr->outstanding_requests.count,
+ ptr->outstanding_requests.size,
+ ptr->outstanding_requests.max_count,
+ ptr->num_replies_unsent,
+ ptr->max_replies_unsent);
+ }
+ mutex_unlock(&dev->mux);
+ }
+
+ return 0;
+}
+
+static int kbus_proc_stats_open(struct inode *inode __always_unused,
+ struct file *file)
+{
+ return single_open(file, kbus_stats_seq_show, NULL);
+}
+
+static const struct file_operations kbus_proc_stats_file_ops = {
+ .owner = THIS_MODULE,
+ .open = kbus_proc_stats_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release
+};
+
+static struct proc_dir_entry
+*kbus_create_proc_stats_file(struct proc_dir_entry *directory)
+{
+ struct proc_dir_entry *entry = create_proc_entry("stats", 0, directory);
+ if (entry)
+ entry->proc_fops = &kbus_proc_stats_file_ops;
+ return entry;
+}
+
+/* ========================================================================= */
+
+extern void kbus_setup_reporting(void)
+{
+ /* Within the /proc/kbus directory, we have: */
+ kbus_proc_dir = proc_mkdir("kbus", NULL);
+ if (kbus_proc_dir) {
+ /* /proc/kbus/bindings -- message name bindings */
+ kbus_proc_file_bindings =
+ kbus_create_proc_binding_file(kbus_proc_dir);
+ /* /proc/kbus/stats -- miscellaneous statistics */
+ kbus_proc_file_stats =
+ kbus_create_proc_stats_file(kbus_proc_dir);
+ }
+}
+
+extern void kbus_remove_reporting(void)
+{
+ if (kbus_proc_dir) {
+ if (kbus_proc_file_bindings)
+ remove_proc_entry("bindings", kbus_proc_dir);
+ if (kbus_proc_file_stats)
+ remove_proc_entry("stats", kbus_proc_dir);
+ remove_proc_entry("kbus", NULL);
+ }
+}
--
1.7.4.1
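
Note for anyone wanting to consume /proc/kbus/bindings from userspace:
kbus_binding_seq_show() above writes one '#' header line per device,
then one line per binding in the form "%3u: %8u %8lu %c %.*s" (device
index, Ksock id, PID, 'R' or 'L', message name). The following is a
minimal parsing sketch, not part of the patch. struct binding_line and
parse_binding_line() are hypothetical names, and the 255 byte name limit
is an assumption made for the example, as is the absence of whitespace
in message names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct binding_line {
	unsigned dev;		/* KBUS device index */
	unsigned ksock_id;	/* Ksock the name is bound to */
	unsigned long pid;	/* process that owns the Ksock */
	bool is_replier;	/* 'R' rather than 'L' */
	char name[256];		/* message name (size is an assumption) */
};

/* Parse one line of the bindings file; false for headers and junk. */
static bool parse_binding_line(const char *line, struct binding_line *out)
{
	char role;

	assert(line && out);
	if (line[0] == '#')
		return false;	/* per-device header comment */
	if (sscanf(line, "%u: %u %lu %c %255s",
		   &out->dev, &out->ksock_id, &out->pid,
		   &role, out->name) != 5)
		return false;
	if (role != 'R' && role != 'L')
		return false;
	out->is_replier = (role == 'R');
	return true;
}
```

A caller would typically read the file with fgets() and pass each line
to parse_binding_line(), skipping those for which it returns false.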

2011-03-18 17:46:51

by Tony Ibbs

Subject: [PATCH 02/11] KBUS external header file.

This defines the message datastructures, the IOCTLs, etc.

Signed-off-by: Tony Ibbs <[email protected]>
---
include/linux/kbus_defns.h | 613 ++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 613 insertions(+), 0 deletions(-)
create mode 100644 include/linux/kbus_defns.h

diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
new file mode 100644
index 0000000..d43c498
--- /dev/null
+++ b/include/linux/kbus_defns.h
@@ -0,0 +1,613 @@
+/* Kbus kernel module external headers
+ *
+ * This file provides the definitions (datastructures and ioctls) needed to
+ * communicate with the KBUS character device driver.
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ * Kynesim, Cambridge UK
+ * Tony Ibbs <[email protected]>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above. If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL. If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#ifndef _kbus_defns
+#define _kbus_defns
+
+#if !__KERNEL__ && defined(__cplusplus)
+extern "C" {
+#endif
+
+#if __KERNEL__
+#include <linux/kernel.h>
+#include <linux/ioctl.h>
+#else
+#include <stdint.h>
+#include <sys/ioctl.h>
+#endif
+
+/*
+ * A message id is made up of two fields.
+ *
+ * If the network id is 0, then it is up to us (KBUS) to assign the
+ * serial number. In other words, this is a local message.
+ *
+ * If the network id is non-zero, then this message is presumed to
+ * have originated from another "network", and we preserve both the
+ * network id and the serial number.
+ *
+ * The message id {0,0} is special and reserved (for use by KBUS).
+ */
+struct kbus_msg_id {
+ __u32 network_id;
+ __u32 serial_num;
+};
+
+/*
+ * kbus_orig_from is used for the "originally from" and "finally to" ids
+ * in the message header. These in turn are used when messages are
+ * being sent between KBUS systems (via KBUS "Limpets"). KBUS the kernel
+ * module transmits them, unaltered, but does not use them (although
+ * debug messages may report them).
+ *
+ * An "originally from" or "finally to" id is made up of two fields, the
+ * network id (which indicates the Limpet, if any, that originally gated the
+ * message), and a local id, which is the Ksock id of the original sender
+ * of the message, on its local KBUS.
+ *
+ * If the network id is 0, then the "originally from" id is not being used.
+ *
+ * Limpets and these fields are discussed in more detail in the userspace
+ * KBUS documentation - see http://kbus-messaging.org/ for pointers to
+ * more information.
+ */
+struct kbus_orig_from {
+ __u32 network_id;
+ __u32 local_id;
+};
+
+/* When the user asks to bind a message name to an interface, they use: */
+struct kbus_bind_request {
+ __u32 is_replier; /* are we a replier? */
+ __u32 name_len;
+ char *name;
+};
+
+/* When the user requests the id of the replier to a message, they use: */
+struct kbus_bind_query {
+ __u32 return_id;
+ __u32 name_len;
+ char *name;
+};
+
+/* When the user writes/reads a message, they use: */
+struct kbus_message_header {
+ /*
+ * The guards
+ * ----------
+ *
+ * - 'start_guard' is notionally "Kbus", and 'end_guard' (the 32-bit
+ * word after the rest of the message datastructure) is notionally
+ * "subK". Obviously that depends on how one looks at the 32-bit
+ * word. Every message datastructure shall start with a start guard
+ * and end with an end guard.
+ *
+ * These provide some help in checking that a message is well formed,
+ * and in particular the end guard helps to check for broken length
+ * fields.
+ *
+ * - 'id' identifies this particular message.
+ *
+ * When a user writes a new message, they should set this to {0,0}.
+ * KBUS will then set a new message id for the message.
+ *
+ * When a user reads a message, this will have been set by KBUS.
+ *
+ * When a user replies to a message, they should copy this value
+ * into the 'in_reply_to' field, so that the recipient will know
+ * what message this was a reply to.
+ *
+ * - 'in_reply_to' identifies the message this is a reply to.
+ *
+ * This shall be set to {0,0} unless this message *is* a reply to a
+ * previous message. In other words, if this value is non-0, then
+ * the message *is* a reply.
+ *
+ * - 'to' is who the message is to be sent to.
+ *
+ * When a user writes a new message, this should normally be set
+ * to {0,0}, meaning "anyone listening" (but see below if "state"
+ * is being maintained).
+ *
+ * When replying to a message, it shall be set to the 'from' value
+ * of the original message.
+ *
+ * When constructing a request message (a message wanting a reply),
+ * the user can set it to a specific replier id, to produce a stateful
+ * request. This is normally done by copying the 'from' of a previous
+ * Reply from the appropriate replier. When such a message is sent,
+ * if the replier bound (at that time) does not have that specific
+ * id, then the send will fail.
+ *
+ * Note that if 'to' is set, then 'orig_from' should also be set.
+ *
+ * - 'from' indicates who sent the message.
+ *
+ * When a user is writing a new message, they should set this
+ * to {0,0}.
+ *
+ * When a user is reading a message, this will have been set
+ * by KBUS.
+ *
+ * When a user replies to a message, the reply should have its
+ * 'to' set to the original message's 'from', and its 'from' set
+ * to {0,0} (see the "hmm" caveat under 'to' above, though).
+ *
+ * - 'orig_from' and 'final_to' are used when Limpets are mediating
+ * KBUS messages between KBUS devices (possibly on different
+ * machines). See the description by the datastructure definition
+ * above. The KBUS kernel preserves and propagates their values,
+ * but does not alter or use them.
+ *
+ * - 'extra' is currently unused, and KBUS will set it to zero.
+ * Future versions of KBUS may treat it differently.
+ *
+ * - 'flags' indicates the type of message.
+ *
+ * When a user writes a message, this can be used to indicate
+ * that:
+ *
+ * * the message is URGENT
+ * * a reply is wanted
+ *
+ * When a user reads a message, this indicates if:
+ *
+ * * the message is URGENT
+ * * a reply is wanted
+ *
+ * When a user writes a reply, this field should be set to 0.
+ *
+ * The top half of the 'flags' is not touched by KBUS, and may
+ * be used for any purpose the user wishes.
+ *
+ * - 'name_len' is the length of the message name in bytes.
+ *
+ * This must be non-zero.
+ *
+ * - 'data_len' is the length of the message data in bytes. It may be
+ * zero if there is no data.
+ *
+ * - 'name' is a pointer to the message name. This should be null
+ * terminated, as is the normal case for C strings.
+ *
+ * NB: If this is zero, then the name will be present, but after
+ * the end of this datastructure, and padded out to a multiple of
+ * four bytes (see kbus_entire_message). When doing this padding,
+ * remember to allow for the terminating null byte. If this field is
+ * zero, then 'data' shall also be zero.
+ *
+ * - 'data' is a pointer to the data. If there is no data (if
+ * 'data_len' is zero), then this shall also be zero. The data is
+ * not touched by KBUS, and may include any values.
+ *
+ * NB: If this is zero, then the data will occur immediately
+ * after the message name, padded out to a multiple of four bytes.
+ * See the note for 'name' above.
+ *
+ */
+ __u32 start_guard;
+ struct kbus_msg_id id; /* Unique to this message */
+ struct kbus_msg_id in_reply_to; /* Which message this is a reply to */
+ __u32 to; /* 0 (empty) or a replier id */
+ __u32 from; /* 0 (KBUS) or the sender's id */
+ struct kbus_orig_from orig_from;/* Cross-network linkage */
+ struct kbus_orig_from final_to; /* Cross-network linkage */
+ __u32 extra; /* ignored field - future proofing */
+ __u32 flags; /* Message type/flags */
+ __u32 name_len; /* Message name's length, in bytes */
+ __u32 data_len; /* Message length, also in bytes */
+ char *name;
+ void *data;
+ __u32 end_guard;
+};
+
+#define KBUS_MSG_START_GUARD 0x7375624B
+#define KBUS_MSG_END_GUARD 0x4B627573
+
+/*
+ * When a message is returned by 'read', it is actually returned using the
+ * following datastructure, in which:
+ *
+ * - 'header.name' will point to 'rest[0]'
+ * - 'header.data' will point to 'rest[(header.name_len+1+3)/4]'
+ *
+ * followed by the name (padded to 4 bytes, remembering to allow for the
+ * terminating null byte), followed by the data (padded to 4 bytes) followed by
+ * (another) end_guard.
+ */
+struct kbus_entire_message {
+ struct kbus_message_header header;
+ __u32 rest[];
+};
+
+/*
+ * We limit a message name to at most 1000 characters (some limit seems
+ * sensible, after all)
+ */
+#define KBUS_MAX_NAME_LEN 1000
+
+/*
+ * The length (in bytes) of the name after padding, allowing for a terminating
+ * null byte.
+ */
+#define KBUS_PADDED_NAME_LEN(name_len) (4 * (((name_len) + 1 + 3) / 4))
+
+/*
+ * The length (in bytes) of the data after padding
+ */
+#define KBUS_PADDED_DATA_LEN(data_len) (4 * (((data_len) + 3) / 4))
+
+/*
+ * Given name_len (in bytes) and data_len (in bytes), return the
+ * length of the appropriate struct kbus_entire_message, in bytes
+ *
+ * Note that we're allowing for a zero byte after the end of the message name.
+ *
+ * Remember that "sizeof" doesn't count the 'rest' field in our message
+ * structure.
+ */
+#define KBUS_ENTIRE_MSG_LEN(name_len, data_len) \
+ (sizeof(struct kbus_entire_message) + \
+ KBUS_PADDED_NAME_LEN(name_len) + \
+ KBUS_PADDED_DATA_LEN(data_len) + 4)
+
+/*
+ * The message name starts at entire->rest[0].
+ * The message data starts after the message name - given the message
+ * name's length (in bytes), that is at index:
+ */
+#define KBUS_ENTIRE_MSG_DATA_INDEX(name_len) (((name_len)+1+3)/4)
+/*
+ * Given the message name length (in bytes) and the message data length (also
+ * in bytes), the index of the entire message end guard is thus:
+ */
+#define KBUS_ENTIRE_MSG_END_GUARD_INDEX(name_len, data_len) \
+ (((name_len)+1+3)/4 + ((data_len)+3)/4)
+
+/*
+ * Find a pointer to the message's name.
+ *
+ * It's either the given name pointer, or just after the header (if the pointer
+ * is NULL)
+ */
+static inline char *kbus_msg_name_ptr(const struct kbus_message_header
+ *hdr)
+{
+ if (hdr->name) {
+ return hdr->name;
+ } else {
+ struct kbus_entire_message *entire;
+ entire = (struct kbus_entire_message *)hdr;
+ return (char *)&entire->rest[0];
+ }
+}
+
+/*
+ * Find a pointer to the message's data.
+ *
+ * It's either the given data pointer, or just after the name (if the pointer
+ * is NULL)
+ */
+static inline void *kbus_msg_data_ptr(const struct kbus_message_header
+ *hdr)
+{
+ if (hdr->data) {
+ return hdr->data;
+ } else {
+ struct kbus_entire_message *entire;
+ __u32 data_idx;
+
+ entire = (struct kbus_entire_message *)hdr;
+ data_idx = KBUS_ENTIRE_MSG_DATA_INDEX(hdr->name_len);
+ return (void *)&entire->rest[data_idx];
+ }
+}
+
+/*
+ * Find a pointer to the message's (second/final) end guard.
+ */
+static inline __u32 *kbus_msg_end_ptr(struct kbus_entire_message
+ *entire)
+{
+ __u32 end_guard_idx =
+ KBUS_ENTIRE_MSG_END_GUARD_INDEX(entire->header.name_len,
+ entire->header.data_len);
+ return (__u32 *) &entire->rest[end_guard_idx];
+}
+
+/*
+ * Things KBUS changes in a message
+ * --------------------------------
+ * In general, KBUS leaves the content of a message alone. However, it does
+ * change:
+ *
+ * - the message id (if id.network_id is unset - it assigns a new serial
+ * number unique to this message)
+ * - the from id (if from.network_id is unset - it sets the local_id to
+ * indicate the Ksock this message was sent from)
+ * - the KBUS_BIT_WANT_YOU_TO_REPLY bit in the flags (set or cleared
+ * as appropriate)
+ * - the SYNTHETIC bit, which KBUS will always unset in a user message
+ */
+
+/*
+ * Flags for the message 'flags' word
+ * ----------------------------------
+ * The KBUS_BIT_WANT_A_REPLY bit is set by the sender to indicate that a
+ * reply is wanted. This makes the message into a request.
+ *
+ * Note that setting the WANT_A_REPLY bit (i.e., a request) and
+ * setting 'in_reply_to' (i.e., a reply) is bound to lead to
+ * confusion, and the results are undefined (i.e., don't do it).
+ *
+ * The KBUS_BIT_WANT_YOU_TO_REPLY bit is set by KBUS on a particular message
+ * to indicate that the particular recipient is responsible for replying
+ * to (this instance of the) message. Otherwise, KBUS clears it.
+ *
+ * The KBUS_BIT_SYNTHETIC bit is set by KBUS when it generates a synthetic
+ * message (an exception, if you will), for instance when a replier has
+ * gone away and therefore a reply will never be generated for a request
+ * that has already been queued.
+ *
+ * Note that KBUS does not check that a sender has not set this
+ * on a message, but doing so may lead to confusion.
+ *
+ * The KBUS_BIT_URGENT bit is set by the sender if this message is to be
+ * treated as urgent - i.e., it should be added to the *front* of the
+ * recipient's message queue, not the back.
+ *
+ * Send flags
+ * ==========
+ * There are two "send" flags, KBUS_BIT_ALL_OR_WAIT and KBUS_BIT_ALL_OR_FAIL.
+ * Either one may be set, or both may be unset.
+ *
+ * If both bits are set, the message will be rejected as invalid.
+ *
+ * Both flags are ignored in reply messages (i.e., messages with the
+ * 'in_reply_to' field set).
+ *
+ * If both are unset, then a send will behave in the default manner. That is,
+ * the message will be added to a listener's queue if there is room but
+ * otherwise the listener will (silently) not receive the message.
+ *
+ * (Obviously, if the listener is a replier, and the message is a request,
+ * then a KBUS message will be synthesised in the normal manner when a
+ * request is lost.)
+ *
+ * If the KBUS_BIT_ALL_OR_WAIT bit is set, then a send should block until
+ * all recipients can be sent the message. Specifically, before the message is
+ * sent, all recipients must have room on their message queues for this
+ * message, and if they do not, the send will block until there is room for the
+ * message on all the queues.
+ *
+ * If the KBUS_BIT_ALL_OR_FAIL bit is set, then a send should fail if all
+ * recipients cannot be sent the message. Specifically, before the message is
+ * sent, all recipients must have room on their message queues for this
+ * message, and if they do not, the send will fail.
+ */
+
+/*
+ * When a $.KBUS.ReplierBindEvent message is constructed, we use the
+ * following to encapsulate its data.
+ *
+ * This indicates whether it is a bind or unbind event, who is doing the
+ * bind or unbind, and for what message name. The message name is padded
+ * out to a multiple of four bytes, allowing for a terminating null byte,
+ * but the name length is the length without said padding (so, in C terms,
+ * strlen(name)).
+ *
+ * As for the message header data structure, the actual data "goes off the end"
+ * of the datastructure.
+ */
+struct kbus_replier_bind_event_data {
+ __u32 is_bind; /* 1=bind, 0=unbind */
+ __u32 binder; /* Ksock id of binder */
+ __u32 name_len; /* Length of name */
+ __u32 rest[]; /* Message name */
+};
+
+#if !__KERNEL__
+#define BIT(num) (((unsigned)1) << (num))
+#endif
+
+#define KBUS_BIT_WANT_A_REPLY BIT(0)
+#define KBUS_BIT_WANT_YOU_TO_REPLY BIT(1)
+#define KBUS_BIT_SYNTHETIC BIT(2)
+#define KBUS_BIT_URGENT BIT(3)
+
+#define KBUS_BIT_ALL_OR_WAIT BIT(8)
+#define KBUS_BIT_ALL_OR_FAIL BIT(9)
+
+/*
+ * Standard message names
+ * ======================
+ * KBUS itself has some predefined message names.
+ *
+ * Synthetic Replies with no data
+ * ------------------------------
+ * These are sent to the original Sender of a Request when KBUS knows that the
+ * Replier is not going to Reply. In all cases, you can identify which message
+ * they concern by looking at the "in_reply_to" field:
+ *
+ * * Replier.GoneAway - the Replier has gone away before reading the Request.
+ * * Replier.Ignored - the Replier has gone away after reading a Request, but
+ * before replying to it.
+ * * Replier.Unbound - the Replier has unbound (as Replier) from the message
+ * name, and is thus not going to reply to this Request in its unread message
+ * queue.
+ * * Replier.Disappeared - the Replier has disappeared when an attempt is made
+ * to send a Request whilst polling (i.e., after EAGAIN was returned from an
+ * earlier attempt to send a message). This typically means that the Ksock
+ * bound as Replier closed.
+ * * ErrorSending - an unexpected error occurred when trying to send a Request
+ * to its Replier whilst polling.
+ */
+#define KBUS_MSG_NAME_REPLIER_GONEAWAY "$.KBUS.Replier.GoneAway"
+#define KBUS_MSG_NAME_REPLIER_IGNORED "$.KBUS.Replier.Ignored"
+#define KBUS_MSG_NAME_REPLIER_UNBOUND "$.KBUS.Replier.Unbound"
+#define KBUS_MSG_NAME_REPLIER_DISAPPEARED "$.KBUS.Replier.Disappeared"
+#define KBUS_MSG_NAME_ERROR_SENDING "$.KBUS.ErrorSending"
+
+#define KBUS_IOC_MAGIC 'k' /* 0x6b - which seems fair enough for now */
+/*
+ * RESET: reserved for future use
+ */
+#define KBUS_IOC_RESET _IO(KBUS_IOC_MAGIC, 1)
+/*
+ * BIND - bind a Ksock to a message name
+ * arg: struct kbus_bind_request, indicating what to bind to
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_BIND _IOW(KBUS_IOC_MAGIC, 2, char *)
+/*
+ * UNBIND - unbind a Ksock from a message name
+ * arg: struct kbus_bind_request, indicating what to unbind from
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_UNBIND _IOW(KBUS_IOC_MAGIC, 3, char *)
+/*
+ * KSOCKID - determine a Ksock's Ksock id
+ *
+ * The network_id for the current Ksock is, by definition, 0, so we don't need
+ * to return it.
+ *
+ * arg (out): __u32, indicating this Ksock's local_id
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_KSOCKID _IOR(KBUS_IOC_MAGIC, 4, char *)
+/*
+ * REPLIER - determine the Ksock id of the replier for a message name
+ * arg: struct kbus_bind_query
+ *
+ * - on input, specify the message name to ask about.
+ * - on output, KBUS fills in the relevant Ksock id in 'return_id',
+ * or 0 if there is no bound replier
+ *
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_REPLIER _IOWR(KBUS_IOC_MAGIC, 5, char *)
+/*
+ * NEXTMSG - pop the next message from the read queue
+ * arg (out): __u32, number of bytes in the next message, 0 if there is no
+ * next message
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_NEXTMSG _IOR(KBUS_IOC_MAGIC, 6, char *)
+/*
+ * LENLEFT - determine how many bytes are left to read of the current message
+ * arg (out): __u32, number of bytes left, 0 if there is no current read
+ * message
+ * retval: 1 if there was a message, 0 if there wasn't, negative for failure
+ */
+#define KBUS_IOC_LENLEFT _IOR(KBUS_IOC_MAGIC, 7, char *)
+/*
+ * SEND - send the current message
+ * arg (out): struct kbus_msg_id, the message id of the sent message
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_SEND _IOR(KBUS_IOC_MAGIC, 8, char *)
+/*
+ * DISCARD - discard the message currently being written (if any)
+ * arg: none
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_DISCARD _IO(KBUS_IOC_MAGIC, 9)
+/*
+ * LASTSENT - determine the message id of the last message SENT
+ * arg (out): struct kbus_msg_id, {0,0} if there was no last message
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_LASTSENT _IOR(KBUS_IOC_MAGIC, 10, char *)
+/*
+ * MAXMSGS - set the maximum number of messages on a Ksock read queue
+ * arg (in): __u32, the requested length of the read queue, or 0 to just
+ * request how many there are
+ * arg (out): __u32, the length of the read queue after this call has
+ * succeeded
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_MAXMSGS _IOWR(KBUS_IOC_MAGIC, 11, char *)
+/*
+ * NUMMSGS - determine how many messages are in the read queue for this Ksock
+ * arg (out): __u32, the number of messages in the read queue.
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_NUMMSGS _IOR(KBUS_IOC_MAGIC, 12, char *)
+/*
+ * UNREPLIEDTO - determine the number of requests (marked "WANT_YOU_TO_REPLY")
+ * which we still need to reply to.
+ * arg(out): __u32, said number
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_UNREPLIEDTO _IOR(KBUS_IOC_MAGIC, 13, char *)
+
+/*
+ * IOCTL 14 is not used, because it is introduced in the next revision,
+ * (obviously, in real history this was done in a different order) and
+ * I don't want to alter the number for VERBOSE.
+ */
+
+/*
+ * VERBOSE - should KBUS output verbose "printk" messages (for this device)?
+ *
+ * This IOCTL tells a Ksock whether it should output debugging messages. It is
+ * only effective if the kernel module has been built with the VERBOSE_DEBUGGING
+ * flag set.
+ *
+ * arg(in): __u32, 1 to change to "verbose", 0 to change to "quiet",
+ * 0xFFFFFFFF to just return the current/previous state.
+ * arg(out): __u32, the previous state.
+ * retval: 0 for success, negative for failure (-EINVAL if arg in was not one
+ * of the specified values)
+ */
+#define KBUS_IOC_VERBOSE _IOWR(KBUS_IOC_MAGIC, 15, char *)
+
+/* If adding another IOCTL, remember to increment the next number! */
+#define KBUS_IOC_MAXNR 15
+
+#if !__KERNEL__ && defined(__cplusplus)
+}
+#endif
+
+#endif /* _kbus_defns */
--
1.7.4.1
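
Editorial note on the padding arithmetic in kbus_defns.h above: the
following is a minimal userspace sketch, not part of the patch. It repeats
the same arithmetic as the KBUS_PADDED_* and KBUS_ENTIRE_MSG_* macros
(with the arguments parenthesised) and adds one hypothetical helper,
example_rest_bytes(), which is an assumption made for illustration and
does not exist in KBUS.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the macros in kbus_defns.h. */
#define KBUS_PADDED_NAME_LEN(name_len)	(4 * (((name_len) + 1 + 3) / 4))
#define KBUS_PADDED_DATA_LEN(data_len)	(4 * (((data_len) + 3) / 4))
#define KBUS_ENTIRE_MSG_DATA_INDEX(name_len)	(((name_len) + 1 + 3) / 4)
#define KBUS_ENTIRE_MSG_END_GUARD_INDEX(name_len, data_len) \
	(((name_len) + 1 + 3) / 4 + ((data_len) + 3) / 4)

/*
 * Hypothetical helper (not a KBUS function): the number of bytes that
 * follow the header in an "entire" message, that is the padded name,
 * the padded data and the final 32-bit end guard.
 */
static inline uint32_t example_rest_bytes(uint32_t name_len, uint32_t data_len)
{
	uint32_t bytes = KBUS_PADDED_NAME_LEN(name_len) +
			 KBUS_PADDED_DATA_LEN(data_len) + 4;

	/* The final end guard is the last 32-bit word of 'rest'. */
	assert(bytes ==
	       4 * (KBUS_ENTIRE_MSG_END_GUARD_INDEX(name_len, data_len) + 1));
	return bytes;
}
```

For example, for the 6-byte name "$.Fred" with 5 bytes of data, the name
pads to 8 bytes (allowing for the terminating null byte), the data pads to
8 bytes, the data starts at rest[2], and the final end guard is rest[4].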

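Editorial note on the "Send flags" comment in kbus_defns.h: the rule that
KBUS_BIT_ALL_OR_WAIT and KBUS_BIT_ALL_OR_FAIL must not both be set can be
sketched as below. This is an illustration only. The helper name
example_send_flags_ok() is hypothetical, and treating the two bits as
never invalid in a reply is an assumption drawn from the statement that
both flags are ignored in reply messages; the kernel code may check
differently.

```c
#include <assert.h>	/* for callers that want to assert on the result */
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as given in kbus_defns.h. */
#define EX_BIT(num)		(((uint32_t)1) << (num))
#define KBUS_BIT_ALL_OR_WAIT	EX_BIT(8)
#define KBUS_BIT_ALL_OR_FAIL	EX_BIT(9)

/*
 * Hypothetical check (not a KBUS function): returns false only when a
 * non-reply message has both send flags set.
 */
static inline bool example_send_flags_ok(uint32_t flags, bool is_reply)
{
	const uint32_t both = KBUS_BIT_ALL_OR_WAIT | KBUS_BIT_ALL_OR_FAIL;

	if (is_reply)
		return true;	/* assumption: send flags are ignored in replies */
	return (flags & both) != both;
}
```

A sender could run such a check before writing a message, so that an
invalid flag combination is caught in userspace rather than by a failed
send.
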
2011-03-18 17:47:14

by Tony Ibbs

Subject: [PATCH 03/11] KBUS internal header file

Various internal datastructures, and communication between source
files.

Signed-off-by: Tony Ibbs <[email protected]>
---
ipc/kbus_internal.h | 626 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 626 insertions(+), 0 deletions(-)
create mode 100644 ipc/kbus_internal.h

diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
new file mode 100644
index 0000000..2d9e737
--- /dev/null
+++ b/ipc/kbus_internal.h
@@ -0,0 +1,626 @@
+/* KBUS kernel module - internal definitions
+ *
+ * This is a character device driver, providing the messaging support
+ * for KBUS.
+ *
+ * This header contains the definitions used internally by kbus_main.c.
+ * At the moment nothing else is expected to include this file.
+ *
+ * KBUS clients should include (at least) kbus_defns.h.
+ */
+
+/*
+ * ***** BEGIN LICENSE BLOCK *****
+ * Version: MPL 1.1
+ *
+ * The contents of this file are subject to the Mozilla Public License Version
+ * 1.1 (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ * http://www.mozilla.org/MPL/
+ *
+ * Software distributed under the License is distributed on an "AS IS" basis,
+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
+ * for the specific language governing rights and limitations under the
+ * License.
+ *
+ * The Original Code is the KBUS Lightweight Linux-kernel mediated
+ * message system
+ *
+ * The Initial Developer of the Original Code is Kynesim, Cambridge UK.
+ * Portions created by the Initial Developer are Copyright (C) 2009
+ * the Initial Developer. All Rights Reserved.
+ *
+ * Contributor(s):
+ * Kynesim, Cambridge UK
+ * Tony Ibbs <[email protected]>
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * GNU Public License version 2 (the "GPL"), in which case the provisions of
+ * the GPL are applicable instead of the above. If you wish to allow the use
+ * of your version of this file only under the terms of the GPL and not to
+ * allow others to use your version of this file under the MPL, indicate your
+ * decision by deleting the provisions above and replace them with the notice
+ * and other provisions required by the GPL. If you do not delete the
+ * provisions above, a recipient may use your version of this file under either
+ * the MPL or the GPL.
+ *
+ * ***** END LICENSE BLOCK *****
+ */
+
+#ifndef _kbus_internal
+#define _kbus_internal
+
+/*
+ * KBUS can support multiple devices, as /dev/kbus<N>. These all have
+ * the same major device number, and map to differing minor device
+ * numbers. <N> will also be the minor device number, but don't rely
+ * on that for anything.
+ *
+ * When KBUS starts up, it will always setup a single device (/dev/kbus0),
+ * but it can be asked to setup more - for instance:
+ *
+ * # insmod kbus.ko kbus_num_devices=5
+ *
+ * There is also an IOCTL to allow user-space to request a new device as
+ * necessary. The hot plugging mechanisms should cause the device to appear "as
+ * if by magic".
+ *
+ * (This last means that we *could* default to setting up zero devices
+ * at module startup, and leave the user to ask for the first one, but
+ * that seems rather cruel.)
+ *
+ * We need to set a maximum number of KBUS devices (corresponding to a limit on
+ * minor device numbers). The obvious limit (corresponding to what we'd have
+ * got if we used the deprecated "register_chrdev" to setup our device) is 256,
+ * so we'll go with that.
+ */
+#define KBUS_MIN_NUM_DEVICES 1
+
+#ifdef CONFIG_KBUS_MAX_NUM_DEVICES
+#define KBUS_MAX_NUM_DEVICES CONFIG_KBUS_MAX_NUM_DEVICES
+#else
+#define KBUS_MAX_NUM_DEVICES 256
+#endif
+
+#ifndef CONFIG_KBUS_DEF_NUM_DEVICES
+#define CONFIG_KBUS_DEF_NUM_DEVICES 1
+#endif
+
+/*
+ * Our initial array sizes could arguably be made configurable
+ * for tuning, if we discover this is useful
+ */
+#define KBUS_INIT_MSG_ID_MEMSIZE 16
+#define KBUS_INIT_LISTENER_ARRAY_SIZE 8
+
+/*
+ * Setting CONFIG_KBUS_DEBUG will cause the Makefile
+ * to define DEBUG for us
+ */
+#ifdef DEBUG
+#define kbus_maybe_dbg(kbus_dev, format, args...) do { \
+ if ((kbus_dev)->verbose) \
+ (void) dev_dbg((kbus_dev)->dev, format, ## args); \
+} while (0)
+#else
+#define kbus_maybe_dbg(kbus_dev, format, args...) ((void)0)
+#endif
+
+/*
+ * This is really only directly useful if CONFIG_KBUS_DEBUG is on
+ */
+#ifdef CONFIG_KBUS_DEBUG_DEFAULT_VERBOSE
+#define KBUS_DEFAULT_VERBOSE_SETTING true
+#else
+#define KBUS_DEFAULT_VERBOSE_SETTING false
+#endif
+
+/* ========================================================================= */
+
+/* We need a way of remembering message bindings */
+struct kbus_message_binding {
+ struct list_head list;
+ struct kbus_private_data *bound_to; /* who we're bound to */
+ u32 bound_to_id; /* but the id is often useful */
+ u32 is_replier; /* bound as a replier */
+ u32 name_len;
+ char *name; /* the message name */
+};
+
+/*
+ * For both keeping track of requests sent (to which we still want replies)
+ * and requests read (to which we haven't yet sent a reply), we need some
+ * means of remembering message ids. Since I'd rather not worry the rest of
+ * the code with how this is implemented (which is code for "I'll implement
+ * it very simply and worry about making it efficient/scalable later"), and
+ * since we always want to remember both the message ids and also how many
+ * there are, it seems sensible to bundle this up in its own datastructure.
+ */
+struct kbus_msg_id_mem {
+ u32 count; /* Number of entries in use */
+ u32 size; /* Actual size of the array */
+ u32 max_count; /* Max 'count' we've had */
+ /*
+ * An array is probably the worst way to store a list of message ids,
+ * but it's *very simple*, and should work OK for a smallish number of
+ * message ids. So it's a place to start...
+ *
+ * Note the array may have "unused" slots, signified by message id {0:0}
+ */
+ struct kbus_msg_id *ids;
+};
+
+/* An item in the list of requests that a Ksock has not yet replied to */
+struct kbus_unreplied_item {
+ struct list_head list;
+ struct kbus_msg_id id; /* the request's id */
+ u32 from; /* the sender's id */
+ struct kbus_name_ptr *name_ref; /* and its name... */
+ u32 name_len;
+};
+
+/*
+ * The parts of a message being written to KBUS (via kbus_write[_parts]),
+ * or read by the user (via kbus_read) are:
+ *
+ * * the user-space message header - as in 'struct kbus_message_header'
+ *
+ * from which we copy various items into our own internal message header.
+ *
+ * For a "pointy" message, that is all there is.
+ *
+ * For an "entire" message, this is then followed by:
+ *
+ * * the message name
+ * * padding to bring that up to a NULL terminator and then a 4-byte boundary.
+ *
+ * If the "entire" message has data, then this is followed by:
+ *
+ * * N data parts (all but the last of size PART_LEN)
+ * * padding to bring that up to a 4-byte boundary.
+ *
+ * and finally, whether there was data or not:
+ *
+ * * the final end guard.
+ *
+ * Remember that kbus_read always delivers an "entire" message.
+ */
+enum kbus_msg_parts {
+ KBUS_PART_HDR = 0,
+ KBUS_PART_NAME,
+ KBUS_PART_NPAD,
+ KBUS_PART_DATA,
+ KBUS_PART_DPAD,
+ KBUS_PART_FINAL_GUARD
+};
+/* N.B. New message parts require switch cases in kbus_msg_part_name()
+ * and kbus_write_parts().
+ */
+
+#define KBUS_NUM_PARTS (KBUS_PART_FINAL_GUARD+1)
+
+/*
+ * Replier typing.
+ * The higher the replier type, the more specific it is.
+ * We trust the binding mechanisms not to have created two replier
+ * bindings of the same type for the same name (so we shan't, for
+ * example, get '$.Fred.*' bound as replier twice).
+ */
+
+enum kbus_replier_type {
+ UNSET = 0,
+ WILD_STAR,
+ WILD_PERCENT,
+ SPECIFIC
+};
+
+/*
+ * A reference counting wrapper for message data
+ *
+ * If 'as_pages' is false, then the data is stored as a single kmalloc'd
+ * entity, pointed to by 'parts[0]'. In this case, 'num_parts' will be 1,
+ * and 'last_page_len' will be the size of the allocated data.
+ *
+ * If 'as_pages' is true, then the data is stored as 'num_parts' pages, each
+ * pointed to by 'parts[n]'. The last page should be treated as being size
+ * 'last_page_len' (even if the implementation is not enforcing this). All
+ * other pages are of size PART_LEN.
+ *
+ * In either case, 'lengths[n]' is a "fill counter" for how many bytes of data
+ * are actually being stored in page 'n'. Once the data is all in place, this
+ * should be equal to PART_LEN or 'last_page_len' as appropriate.
+ *
+ * 'refcount' is a standard kernel reference count for the data - when it reaches
+ * 0, everything (the parts, the arrays and the datastructure) gets freed.
+ */
+struct kbus_data_ptr {
+ int as_pages;
+ unsigned num_parts;
+ unsigned long *parts;
+ unsigned *lengths;
+ unsigned last_page_len;
+ struct kref refcount;
+};
+
+/*
+ * A reference counting wrapper for message names
+ *
+ * RATIONALE:
+ *
+ * When a message name is copied from user space, we take our first copy.
+ *
+ * In order to send a message to a Ksock, we use kbus_push_message(), which
+ * takes a copy of the message for each recipient.
+ *
+ * We also take a copy of the message name for our list of messages that have
+ * been read but not replied to.
+ *
+ * It makes sense to copy the message header, because the contents thereof are
+ * changed according to the particular recipient.
+ *
+ * Copying the message data (if any) is handled by the kbus_data_ptr, above,
+ * which provides reference counting. This makes sense because message data may
+ * be large.
+ *
+ * If we only have one recipient, copying the message name is not a big issue,
+ * but if there are many, we would prefer not to make many copies of the
+ * string. It is, perhaps, worth keeping a dictionary of message names, and
+ * referring to the name in that - but that's not an incremental change from
+ * the "simple copying" state we start from.
+ *
+ * The simplest change to make, which may have some benefit, is to reference
+ * count the names for an individual message, as is done for the message data.
+ *
+ * If we have a single recipient, we will have copied the string from user
+ * space, and also created the kbus_name_ptr datastructure - an overhead of 8
+ * bytes. However, when we copy the message for the recipient, we do not need
+ * to copy the message name, so if the message name is more than 8 bytes, we
+ * have immediately made a gain (and experience shows that message names tend
+ * to be at least that long).
+ *
+ * As soon as we have more than one recipient, it becomes extremely likely that
+ * we have saved space, and we will definitely have saved allocations which
+ * could be fragmenting memory. So it sounds like a good thing to try.
+ *
+ * Also, if I later on want to store a hash code for the string (hoping to
+ * speed up comparisons), the new datastructure gives me somewhere to put it...
+ */
+struct kbus_name_ptr {
+ char *name;
+ struct kref refcount;
+};
+
+/*
+ * When the user reads a message from us, they receive a kbus_entire_message
+ * structure.
+ *
+ * When the user writes a message to us, they write a "pointy" message, using
+ * the kbus_message_header structure, or an "entire" message, using the
+ * kbus_entire_message structure.
+ *
+ * Within the kernel, all messages are held as "pointy" messages, but instead
+ * of direct pointers to the message name and data, we use reference counted
+ * pointers.
+ *
+ * Rather than overload the 'name' and 'data' pointer fields, with all the
+ * danger of getting it wrong that that implies, it seems simpler to have our
+ * own, internal to the kernel, clone of the datastructure, but with these
+ * fields defined correctly...
+ *
+ * Hmm. If we have the name and data references, perhaps we should move the
+ * name and data *lengths* into those same.
+ */
+
+struct kbus_msg {
+ struct kbus_msg_id id; /* Unique to this message */
+ struct kbus_msg_id in_reply_to; /* Which message this is a reply to */
+ u32 to; /* 0 (empty) or a replier id */
+ u32 from; /* 0 (KBUS) or the sender's id */
+ struct kbus_orig_from orig_from; /* Cross-network linkage */
+ struct kbus_orig_from final_to; /* Cross-network linkage */
+ u32 extra; /* ignored field - future proofing */
+ u32 flags; /* Message type/flags */
+ u32 name_len; /* Message name's length, in bytes */
+ u32 data_len; /* Message length, also in bytes */
+ struct kbus_name_ptr *name_ref;
+ struct kbus_data_ptr *data_ref;
+};
+
+/*
+ * The current message that the user is reading (with kbus_read())
+ *
+ * If 'msg' is NULL, then the data structure is "empty" (i.e., there is no
+ * message being read).
+ */
+struct kbus_read_msg {
+ struct kbus_entire_message user_hdr; /* the header for user space */
+
+ struct kbus_msg *msg; /* the internal message */
+ char *parts[KBUS_NUM_PARTS];
+ unsigned lengths[KBUS_NUM_PARTS];
+ int which; /* The current item */
+ u32 pos; /* How far they've read in it */
+ /*
+ * If the current item is KBUS_PART_DATA then 'ref_data_index' is
+ * which part of the data we're in, and 'pos' is how far we are through
+ * that particular item.
+ */
+ u32 ref_data_index;
+};
+
+/*
+ * See kbus_write_parts() for how this data structure is actually used.
+ *
+ * If 'msg' is NULL, then the data structure is "empty" (i.e., there is no
+ * message being written).
+ *
+ * * 'is_finished' is true when we've got all the bytes for our message,
+ * and thus don't want any more. It's an error for the user to try to
+ * write more message after it is finished.
+ *
+ * For a "pointy" message, this is set immediately after the message header
+ * end guard is finished (the message name and any data aren't "pulled in"
+ * until the user does SEND). For an "entire" message, this is set after the
+ * final end guard is finished (so we will have the message name and any data
+ * in memory).
+ *
+ * * 'pointers_are_local' is true if the message's name and data have been
+ * transferred to kernel space (as reference counted entities), and false
+ * if they are (still) in user space.
+ *
+ * * 'hdr' is the message header, the shorter in-kernel version.
+ *
+ * * 'which' indicates which part of the message we think we're being given
+ * bytes for, from KBUS_PART_HDR through to (for an "entire" message)
+ * KBUS_PART_FINAL_GUARD.
+ * * 'pos' is the index of the next byte within the current part of whatever
+ * we're working on, as indicated by 'which'. Note that for message data,
+ * this is the index within the whole of the data (not the index within a
+ * data part).
+ *
+ * * If we're reading an "entire" message, then the message name gets written
+ * to 'ref_name', which is a reference-counted string. This is allocated to
+ * the correct size/shape for the entire message name, after the header has
+ * been read.
+ *
+ * The intention is that, if 'ref_name' is non-NULL, it should be legal
+ * to call 'kbus_lower_name_ref()' on it, to free its contents.
+ *
+ * * Similarly, 'ref_data' is reference-counted data, again allocated to the
+ * correct size/shape for the entire message data length, after the header
+ * has been read. The 'length' for each part is used to indicate how far
+ * through that part we have populated with bytes.
+ *
+ * The intention is that, if 'ref_data' is non-NULL, it should be legal
+ * to call 'kbus_lower_data_ref()' on it, to free its contents.
+ *
+ * 'ref_data_index' is then the index (starting at 0) of the referenced
+ * data part that we are populating.
+ */
+struct kbus_write_msg {
+ struct kbus_entire_message user_msg; /* from user space */
+ struct kbus_msg *msg; /* our version of it */
+
+ u32 is_finished;
+ u32 pointers_are_local;
+ u32 guard; /* Whichever guard we're reading */
+ char *user_name_ptr; /* User space name */
+ void *user_data_ptr; /* User space data */
+ enum kbus_msg_parts which;
+ u32 pos;
+ struct kbus_name_ptr *ref_name;
+ struct kbus_data_ptr *ref_data;
+ u32 ref_data_index;
+};
+
+/*
+ * This is the data for an individual Ksock
+ *
+ * Each time we open /dev/kbus<n>, we need to remember a unique id for
+ * our file-instance. Using 'filp' might work, but it's not something
+ * we have control over, and in particular, if the file is closed and
+ * then reopened, there's no guarantee that a particular value of 'filp'
+ * won't be used again. A simple serial number is safer.
+ *
+ * Each such "opening" also has a message queue associated with it. Any
+ * messages this "opening" has declared itself a listener (or replier)
+ * for will be added to that queue.
+ *
+ * 'id' is the unique id for this file descriptor - it enables stateful message
+ * transactions, etc. It is local to the particular KBUS device.
+ *
+ * 'last_msg_id_sent' is the message id of the last message that was
+ * (successfully) written to this file descriptor. It is needed when
+ * constructing a reply.
+ *
+ * We have a queue of messages waiting for us to read them, in 'message_queue'.
+ * 'message_count' is how many messages are in the queue, and 'max_messages'
+ * is an indication of how many messages we shall allow in the queue.
+ *
+ * Note that, however a message was originally sent to us, messages held
+ * internally are always a message header plus pointers to a message name and
+ * (optionally) message data. See kbus_send() for details.
+ */
+struct kbus_private_data {
+ struct list_head list;
+ struct kbus_dev *dev; /* Which device we are on */
+ u32 id; /* Our own id */
+ struct kbus_msg_id last_msg_id_sent; /* As it says - see above */
+ u32 message_count; /* How many messages for us */
+ u32 max_messages; /* How many messages allowed */
+ struct list_head message_queue; /* Messages for us */
+
+ /*
+ * It's useful (for /proc/kbus/bindings) to remember the PID of the
+ * current process
+ */
+ pid_t pid;
+
+ /* Wait for something to appear in the message_queue */
+ wait_queue_head_t read_wait;
+
+ /* The message currently being read by the user */
+ struct kbus_read_msg read;
+
+ /* The message currently being written by the user */
+ struct kbus_write_msg write;
+
+ /* Are we currently sending that message? */
+ int sending;
+
+ /*
+ * Each request we send should (eventually) generate us a reply, or
+ * at worst a status message from KBUS itself telling us there isn't
+ * going to be one. So we need to ensure that there is room in our
+ * (as the sender) message queue to receive all/any such.
+ *
+ * Note that this *also* allows SEND to forbid sending a Reply to a
+ * Request that we did not receive (or to which we have already
+ * replied)
+ */
+ struct kbus_msg_id_mem outstanding_requests;
+
+ /*
+ * If we are a replier for a message, then KBUS wants to ensure
+ * that a reply is *definitely* made. If we release ourselves, then
+ * we're clearly not going to reply to any requests that we have
+ * read but not replied to, and KBUS would like to generate a status
+ * message for each such. So we need a list of the information needed
+ * to form such Status/Reply messages.
+ *
+ * (Thus we don't need the whole of the original message, since
+ * we're only *really* needing its name, its id and who it's
+ * from -- given which it's easiest just to keep the parts we
+ * *do* need, and ignore the data.)
+ *
+ * It was decided not to place a limit on the size of this list.
+ * Its size is limited by the ability of sender(s) to send
+ * requests, which in turn is limited by the number of slots
+ * they can reserve for the replies to those requests in their
+ * own message queues.
+ *
+ * If a limit was imposed, then we would also need to stop a sender
+ * sending a request because the replier has too many replies
+ * outstanding (for instance, because it has gone to sleep). But
+ * then we'd assume that it is not responding to messages in
+ * general, and so its message queue would fill up, and that
+ * should be sufficient protection.
+ */
+ struct list_head replies_unsent;
+ u32 num_replies_unsent;
+ u32 max_replies_unsent;
+
+ /*
+ * Managing which messages a replier may reply to
+ * ----------------------------------------------
+ * We need to police replying, such that a replier may only reply
+ * to requests that it has received (where "received" means "had
+ * placed into its message queue", because KBUS must reply for us
+ * if the particular Ksock is not going to).
+ *
+ * It is possible to do this using either the 'outstanding_requests'
+ * or the 'replies_unsent' list.
+ *
+ * Using the 'outstanding_requests' list means that when a replier
+ * wants to send a reply, it needs to look up who the original-sender
+ * is (from its Ksock id, in the "from" field of the message), and
+ * check against that. This is a bit inefficient.
+ *
+ * Using the 'replies_unsent' list means that when a replier wants
+ * to send a reply, it just needs to find the right message stub
+ * in said 'replies_unsent' list, and check that the reply *does*
+ * match the original request. This may be more efficient, depending.
+ *
+ * In fact, the 'outstanding_requests' list is used, simply because
+ * it was implemented first.
+ */
+};
+
+/* What is a sensible number for the default maximum number of messages? */
+#ifndef CONFIG_KBUS_DEF_MAX_MESSAGES
+#define CONFIG_KBUS_DEF_MAX_MESSAGES 100
+#endif
+
+/* Information belonging to each /dev/kbus<N> device */
+struct kbus_dev {
+ struct cdev cdev; /* Character device data */
+ struct device *dev; /* Our very selves */
+
+ u32 index; /* Which /dev/kbus<n> device we are */
+
+ /*
+ * The Big Lock
+ * We use a single mutex for all purposes, and all locking is done
+ * at the "top level", i.e., in the externally called functions.
+ * This simplifies the design of the internal (list processing,
+ * etc.) functions, at the possible cost of making interaction
+ * with KBUS, in general, slower.
+ *
+ * On the other hand, we favour reliable over fast.
+ */
+ struct mutex mux;
+
+ /* Who has bound to receive which messages in what manner */
+ struct list_head bound_message_list;
+
+ /*
+ * The actual Ksock entries (one per 'open("/dev/kbus<n>")')
+ * This is to allow us to find the 'kbus_private_data' instances,
+ * so that we can get at all the message queues. The details of
+ * how we do this are *definitely* going to change...
+ */
+ struct list_head open_ksock_list;
+
+ /* Has one of our Ksocks made space available in its message queue? */
+ wait_queue_head_t write_wait;
+
+ /*
+ * Each open file descriptor needs an internal id - this is used
+ * when binding messages to listeners, but is also needed when we
+ * want to reply. We reserve the id 0 as a special value ("none").
+ */
+ u32 next_ksock_id;
+
+ /*
+ * Every message sent has a unique id (again, unique per device).
+ */
+ u32 next_msg_serial_num;
+
+ /* Are we wanting debugging messages? */
+ u32 verbose;
+};
+
+/*
+ * Each entry in a message queue holds a single message, and a pointer to
+ * the message name binding that caused it to be added to the list. This
+ * makes it simple to remove messages from the queue if the message name
+ * binding is unbound. The binding shall be NULL for:
+ *
+ * * Replies
+ * * KBUS "synthetic" messages, which are also (essentially) Replies
+ */
+struct kbus_message_queue_item {
+ struct list_head list;
+ struct kbus_msg *msg;
+ struct kbus_message_binding *binding;
+};
+
+/* The sizes of the parts in our reference counted data */
+#define KBUS_PART_LEN PAGE_SIZE
+#define KBUS_PAGE_THRESHOLD (PAGE_SIZE >> 1)
+
+/* Manage the files used to report KBUS internal state */
+/* From kbus_report.c */
+#ifndef CONFIG_PROC_FS
+/* Empty stubs: static inline, so the header may be included more than once */
+static inline void kbus_setup_reporting(void) {}
+static inline void kbus_remove_reporting(void) {}
+#else
+extern void kbus_setup_reporting(void);
+extern void kbus_remove_reporting(void);
+#endif
+/* From kbus_main.c itself */
+extern void kbus_get_device_data(int *num_devices,
+ struct kbus_dev ***devices);
+extern u32 kbus_lenleft(struct kbus_private_data *priv);
+
+#endif /* _kbus_internal */
--
1.7.4.1

2011-03-18 17:47:29

by Tony Ibbs

Subject: [PATCH 01/11] Documentation for KBUS


Signed-off-by: Tony Ibbs <[email protected]>
---
Documentation/Kbus.txt | 1222 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1222 insertions(+), 0 deletions(-)
create mode 100644 Documentation/Kbus.txt

diff --git a/Documentation/Kbus.txt b/Documentation/Kbus.txt
new file mode 100644
index 0000000..7cf723fd6
--- /dev/null
+++ b/Documentation/Kbus.txt
@@ -0,0 +1,1222 @@
+=============================================
+KBUS -- Lightweight kernel-mediated messaging
+=============================================
+
+Summary
+=======
+KBUS provides lightweight kernel-mediated messaging for Linux.
+
+* "lightweight" means that there is no intent to provide complex or
+ sophisticated mechanisms - if you need something more, consider DBUS or
+ other alternatives.
+
+* "kernel-mediated" means that the actual business of message passing and
+ message synchronisation is handled by a kernel module.
+
+* "for Linux" means what it says, since the Linux kernel is required.
+
+Initial use is expected to be in embedded systems.
+
+There is (at least initially) no intent to aim for a "fast" system - this is
+not aimed at real-time systems.
+
+Although the implementation is kernel-mediated, there is a mechanism
+("Limpets") for communicating KBUS messages between buses and/or systems.
+
+Intentions
+==========
+KBUS is intended:
+
+* To be simple to use and simple to understand.
+* To have a small codebase, written in C.
+* To provide predictable message delivery.
+* To give deterministic message ordering.
+* To guarantee a reply to every request.
+
+It needs to be simple to use and understand because the expected users are
+typically busy with other matters, and do not have time to spend learning
+a complex messaging system.
+
+It needs to have a small codebase, written in C, because embedded systems
+often lack resources, and may not have enough space for C++ libraries, or
+messaging systems supporting more complex protocol stacks.
+
+Our own experience on embedded systems of various sizes indicates that
+the last three points are especially important.
+
+Predictable message delivery means the user can tell in what circumstances
+messages will or will not be received.
+
+Deterministic message ordering means that all recipients of a given set of
+messages will receive them in the same order as all other recipients (and this
+will be the order in which the messages were sent). This is important when
+several parts of (for instance) an audio/video stack are interoperating.
+
+Guaranteeing that a request will always result in a reply means that the user
+will be told if the intended replier has (for instance) crashed. This again
+allows for simpler use of the system.
+
+The basics
+==========
+Python and C
+------------
+Although the KBUS kernel module is written in C, the module tests are written
+in Python, and there is a Python module providing useful interfaces, which is
+expected to be the normal way of using KBUS from Python.
+
+There is also a C library (libkbus) which provides a similar level of
+abstraction, so that C programmers can use KBUS without having to handle the
+low level details of sockets and message datastructures. Note that the C
+programmer using KBUS does need to have some awareness of how KBUS messages
+work in order to get memory management right.
+
+Messages
+========
+Message names
+-------------
+All messages have names - for instance "$.Sensors.Kitchen".
+
+All message names start with "$.", followed by one or more alphanumeric words
+separated by dots. There are two wildcard characters, "*" and "%", which can
+be the last word of a name.
+
+Thus (in some notation or other)::
+
+ name := '$.' [ word '.' ]* ( word | '*' | '%' )
+ word := alphanumerics
+
+Case is significant. There is probably a limit on the maximum size of a
+subname, and also on the maximum length of a message name.
+
+Names form a name hierarchy or tree - so "$.Sensors" might have children
+"$.Sensors.Kitchen" and "$.Sensors.Bedroom".
+
+If the last word of a name is "*", then this is a wildcard name that also
+includes all the child names at that level and below -- i.e., all the names
+that start with the name up to the "*". So "$.Sensors.*" includes
+"$.Sensors.Kitchen", "$.Sensors.Bedroom", "$.Sensors.Kitchen.FireAlarm",
+"$.Sensors.Kitchen.Toaster", "$.Sensors.Bedroom.FireAlarm", and so on.
+
+If the last word of a name is "%", then this is a wildcard name that also
+includes all the child names at that level -- i.e., all the names obtained by
+replacing the "%" by another word. So "$.Sensors.%" includes
+"$.Sensors.Kitchen" and "$.Sensors.Bedroom", but not
+"$.Sensors.Kitchen.Toaster".
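
The naming and wildcard rules above can be sketched in a few lines of Python. This is only an illustrative model, not the kernel module's matching code: the helper names ``is_valid_name`` and ``matches`` are hypothetical, and the sketch assumes that single-word names such as "$.Fred" (used elsewhere in this document) are valid and that no length limits apply.

```python
import re

# "$." followed by dot-separated alphanumeric words; a final "*" or "%"
# may take the place of the last word.
_NAME_RE = re.compile(r"\$\.(?:[A-Za-z0-9]+\.)*(?:[A-Za-z0-9]+|\*|%)")


def is_valid_name(name: str) -> bool:
    """Return True if 'name' fits the naming rules described above."""
    return _NAME_RE.fullmatch(name) is not None


def matches(binding: str, name: str) -> bool:
    """Return True if the concrete message 'name' is covered by 'binding'."""
    if binding.endswith(".*"):
        # "*" covers every name below the prefix, at any depth.
        prefix = binding[:-1]  # keep the trailing dot
        return name.startswith(prefix) and len(name) > len(prefix)
    if binding.endswith(".%"):
        # "%" covers exactly one more word below the prefix.
        prefix = binding[:-1]
        rest = name[len(prefix):]
        return name.startswith(prefix) and rest != "" and "." not in rest
    return binding == name  # no wildcard: case-sensitive equality
```

With these helpers, ``matches("$.Sensors.%", "$.Sensors.Kitchen.Toaster")`` is false, while the same name is covered by "$.Sensors.*".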
+
+Message ids
+-----------
+Every message is expected to have a unique id.
+
+A message id is made up of two parts, a network id and a serial number.
+
+The network id is used to carry useful information when a message is
+transferred from one KBUS system to another (for instance, over a bridge). By
+default (for local messages) it is 0.
+
+A serial number is used to identify the particular message within a network.
+
+If a message is sent via KBUS with a network id of 0, then KBUS itself will
+assign a new message id to the message, with the network id (still) 0, and
+with the serial number one more than the last serial number assigned. Thus for
+local messages, message ids ascend, and their order is deterministic.
+
+If a message is sent via KBUS with a non-zero network id, then KBUS does not
+touch its message id.
+
+Message ids are represented textually as ``{n,s}``, where ``n`` is the
+network id and ``s`` is the serial number.
+
+ Message id {0,0} is reserved for use as an invalid message id. Both
+ network id and serial number are unsigned 32-bit integers. Note that this
+ means that local serial numbers will eventually wrap.
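
As a rough Python model of the id assignment rule described above (not the kernel code): ``MsgId`` and ``SerialAllocator`` are hypothetical names, and skipping serial number 0 when the counter wraps is an assumption made here so that {0,0} stays invalid.

```python
from dataclasses import dataclass

U32_MASK = 0xFFFFFFFF  # both parts of a message id are unsigned 32-bit


@dataclass(frozen=True)
class MsgId:
    network_id: int = 0
    serial_num: int = 0

    def __str__(self) -> str:
        return "{%d,%d}" % (self.network_id, self.serial_num)


class SerialAllocator:
    """Hypothetical per-device allocator of local message ids."""

    def __init__(self) -> None:
        self.last_serial = 0

    def assign(self, msg_id: MsgId) -> MsgId:
        if msg_id.network_id != 0:
            return msg_id  # non-local ids pass through untouched
        self.last_serial = (self.last_serial + 1) & U32_MASK
        if self.last_serial == 0:
            # Assumption: skip 0 on wrap so that {0,0} stays invalid.
            self.last_serial = 1
        return MsgId(0, self.last_serial)
```

Whatever serial number the sender supplied with a network id of 0 is replaced, so local ids ascend in the order of sending.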
+
+Message content
+---------------
+Messages are made of the following parts:
+
+:start and end guards:
+
+ These are unsigned 32-bit words. 'start_guard' is notionally "Kbus",
+ and 'end_guard' (the 32 bit word after the rest of the message) is
+ notionally "subK". Obviously that depends on how one looks at the 32-bit
+ word. Every message shall start with a start guard and end with an end
+ guard (but see `Message implementation`_ for details).
+
+ These provide some help in checking that a message is well formed, and in
+ particular the end guard helps to check for broken length fields.
+
+ If the message layout changes in an incompatible manner (this has happened
+ once, and is strongly discouraged), then the start and end guards change.
+
+Unset
+~~~~~
+Unset values are 0, or have zero length (as appropriate).
+
+It is not possible for a message name to be unset.
+
+The message header
+~~~~~~~~~~~~~~~~~~
+:message id: identifies this particular message. This is made up of a network
+ id and a serial number, and is discussed in `Message ids`_.
+
+ When replying to a message, copy this value into the 'In reply to' field.
+
+:in_reply_to: is the message id of the message that this is a reply to.
+
+ This shall be set to 0 unless this message *is* a reply to a previous
+ message. In other words, if this value is non-0, then the message *is* a
+ reply.
+
+:to: is the Ksock id identifying who the message is to be sent to.
+
+ When writing a new message, this should normally be set to 0, meaning
+ "anyone listening" (but see below if "state" is being maintained).
+
+ When replying to a message, it shall be set to the 'from' value of the
+ original message.
+
+ When constructing a request message (a message wanting a reply), then it can
+ be set to a specific replier's Ksock id. When such a message is sent, if the
+ replier bound (at that time) does not have that specific Ksock id, then the
+ send will fail.
+
+:from: indicates the Ksock id of the message's sender.
+
+ When writing a new message, set this to 0, since KBUS will set it.
+
+ When reading a message, this will have been set by KBUS.
+
+:orig_from: this indicates the original sender of a message, when being
+ transported via Limpet. This will be documented in more detail in the future.
+
+:final_to: this indicates the final target of a message, when being
+ transported via Limpet. This will be documented in more detail in the future.
+
+:extra: this is a zero field, for future expansion. KBUS will always set this
+ field to zero.
+
+:flags: indicates extra information about the message. See `Message Flags`_
+ for detailed information.
+
+ When writing a message, typical uses include:
+
+ * the message is URGENT
+ * a reply is wanted
+
+ When reading a message, typical uses include:
+
+ * the message is URGENT
+ * a reply is wanted
+ * a reply is wanted from the specific reader
+
+ The top 16 bits of the flags field are reserved for use by the user - KBUS
+ will not touch it.
+
+:name_length: is the length of the message name in bytes. This will always be
+ non-zero, as a message name must always be given.
+
+:data_length: is the length of the message data in bytes. It may be zero
+ if there is no data associated with this message.
+
+:name: identifies the message. It must be terminated with a
+ zero byte (as is normal for C - in the Python binding a normal Python string
+ can be used, and this will be done for you). Byte ordering is according
+ to that of the platform.
+
+ In an "entire" message (see `Message implementation`_ below) the name shall
+ be padded out to a multiple of 4 bytes. Neither the terminating zero byte
+ nor the padding are included in the name length. Padding should be with
+ zero bytes.
+
+:data: is optional. KBUS does not touch the content of the
+ data, but just copies it. Byte ordering is according to that of the
+ platform.
+
+ In an "entire" message (see `Message implementation`_ below) the data shall,
+ if present, be padded out to a multiple of 4 bytes. This padding is not
+ included in the data length, and the padding bytes may be whatever byte
+ values are convenient to the user. KBUS does not guarantee to copy the exact
+ given padding bytes (in fact, current implementations just ignore them).
+
+Message implementation
+~~~~~~~~~~~~~~~~~~~~~~
+There are two ways in which a message may be constructed, "pointy" and
+"entire".
+See the ``kbus_defns.h`` header file for details.
+
+.. note:: The Python binding hides most of the detail of the message
+ implementation from the user, so if you are using Python you may be able to
+ skip this section.
+
+In a "pointy" message, the ``name`` and ``data`` fields in the message header
+are C pointers to the actual name and data. If there is no data, then the
+``data`` field is NULL. This is probably the simplest form of message for a C
+programmer to create. This might be represented as::
+
+ start_guard: 'Kbus'
+ id: (0,0)
+ in_reply_to: (0,0)
+ to: 0
+ from: 0
+ name_len: 6
+ data_len: 0
+ name: ---------------------------> "$.Fred"
+ data: NULL
+ end_guard: 'subK'
+
+or (with data)::
+
+ start_guard: 'Kbus'
+ id: (0,0)
+ in_reply_to: (0,0)
+ to: 0
+ from: 0
+ name_len: 6
+ data_len: 7
+ name: ---------------------------> "$.Fred"
+ data: ---------------------------> "abc1234"
+ end_guard: 'subK'
+
+.. warning:: When writing a "pointy" message in C, be very careful not to
+ free the name and data between the ``write`` and the SEND, as it is
+ only when the message is sent that KBUS actually follows the ``name`` and
+ ``data`` pointers.
+
+ *After* the SEND, KBUS will have taken its own copies of the name and
+ (any) data.
+
+In an "entire" message, both ``name`` and ``data`` fields are required to be
+NULL. The message header is followed by the message name (padded as described
+above), any message data (also padded), and another end guard. This might be
+represented as::
+
+ start_guard: 'Kbus'
+ id: (0,0)
+ in_reply_to: (0,0)
+ to: 0
+ from: 0
+ name_len: 6
+ data_len: 0
+ name: NULL
+ data: NULL
+ end_guard: 'subK'
+ name_data: '$.Fred\x0\x0'
+ end_guard: 'subK'
+
+or (again with data)::
+
+ start_guard: 'Kbus'
+ id: (0,0)
+ in_reply_to: (0,0)
+ to: 0
+ from: 0
+ name_len: 6
+ data_len: 7
+ name: NULL
+ data: NULL
+ end_guard: 'subK'
+ name_data: '$.Fred\x0\x0'
+ data_data: 'abc1234\x0'
+ end_guard: 'subK'
+
+Note that in these examples:
+
+1. The message name occupies 8 bytes: 6 bytes of name, one terminating zero
+ byte, and one more zero byte of padding. The message's ``name_len`` is
+ still 6.
+2. When there is no data, there is no "data data" after the name data.
+3. When there is data, the data is presented after the name, and is padded out
+ to a multiple of 4 bytes (but without the necessity for a terminating zero
+ byte, so it is possible to have no pad bytes if the data length is already
+ a multiple of 4). Again, the ``data_len`` always reflects the "real" data
+ length.
+4. Although the data shown is presented as ASCII strings for these examples,
+ it really is just bytes, with no assumption of its content/meaning.
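
The padding arithmetic in the notes above can be checked with a small Python sketch. The function names are hypothetical, the message header and both end guards are left out, and zero bytes are used for data padding only because they are convenient (the text allows any byte values there).

```python
def padded_name_len(name_len: int) -> int:
    """Bytes occupied by the name in an "entire" message.

    One terminating zero byte is added, then the total is rounded up to a
    multiple of 4.
    """
    return 4 * ((name_len + 1 + 3) // 4)


def padded_data_len(data_len: int) -> int:
    """Bytes occupied by the data: rounded up to a multiple of 4, no terminator."""
    return 4 * ((data_len + 3) // 4)


def pack_name_and_data(name: bytes, data: bytes = b"") -> bytes:
    """Lay out the name and data parts that follow the header's end guard."""
    out = name.ljust(padded_name_len(len(name)), b"\x00")
    if data:
        out += data.ljust(padded_data_len(len(data)), b"\x00")
    return out
```

For the examples above, "$.Fred" occupies 8 bytes and "abc1234" occupies 8 bytes, while ``name_len`` and ``data_len`` remain 6 and 7.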
+
+When writing/sending messages, either form may be used (again, the "pointy"
+form may be simpler for C programmers).
+
+When reading messages, however, the "entire" form is always returned - this
+removes questions about needing to free multiple returned datastructures (for
+instance, what to do if the user were to ask for the NEXTMSG, read a few
+bytes, and then DISCARD the rest).
+
+Limits
+~~~~~~
+Message names may not be shorter than 3 characters (since they must be at
+least "$." plus another character). An arbitrary limit is also placed on the
+maximum message name length - this is currently 1000 characters, but may be
+reviewed in the future.
+
+Message data may, of course, be of zero length.
+
+When reading a message, an "entire" message is always returned.
+
+ .. note:: When using C to work with KBUS messages, it is generally
+ ill-advised to reference the message name and data "directly"::
+
+ char *name = msg->name;
+ uint8_t *data = msg->data;
+
+ since this will work for "pointy" messages, but not for "entire"
+ messages (where the ``name`` field will be NULL). Instead, it
+ is always better to do::
+
+ char *name = kbus_msg_name_ptr(msg);
+ uint8_t *data = kbus_msg_data_ptr(msg);
+
+ regardless of the message type.
+
+Message flags
+-------------
+KBUS reserves the bottom 16 bits of the flags word for predefined purposes
+(although not all of those bits are yet used), and guarantees not to touch the
+top 16 bits, which are available for use by the programmer as a particular
+application may wish.
+
+The WANT_A_REPLY bit is set by the sender to indicate that a
+reply is wanted. This makes the message into a request.
+
+ Note that setting the WANT_A_REPLY bit (i.e., a request) and
+ setting 'in_reply_to' (i.e., a reply) is bound to lead to
+ confusion, and the results are undefined (i.e., don't do it).
+
+The WANT_YOU_TO_REPLY bit is set by KBUS on a particular message
+to indicate that the particular recipient is responsible for replying
+to (this instance of the) message. Otherwise, KBUS clears it.
+
+The SYNTHETIC bit is set by KBUS when it generates a Status message, for
+instance when a replier has gone away and will therefore not be sending a
+reply to a request that has already been queued.
+
+ Note that KBUS does not check that a sender has not set this
+ flag on a message, but doing so may lead to confusion.
+
+The URGENT bit is set by the sender if this message is to be
+treated as urgent - i.e., it should be added to the *front* of the
+recipient's message queue, not the back.
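
A minimal Python sketch of how the flags word divides between KBUS and the application, and of what URGENT means for queueing. The bit positions below are assumptions chosen for illustration (the authoritative values are in ``kbus_defns.h``), and the helper names are hypothetical.

```python
KBUS_RESERVED_MASK = 0x0000FFFF  # bottom 16 bits: reserved for KBUS
USER_MASK = 0xFFFF0000           # top 16 bits: free for the application

# Assumed bit positions, for illustration only.
WANT_A_REPLY = 1 << 0
WANT_YOU_TO_REPLY = 1 << 1
SYNTHETIC = 1 << 2
URGENT = 1 << 3


def set_user_bits(flags: int, user_value: int) -> int:
    """Store a 16-bit application value in the top half of the flags word."""
    if not 0 <= user_value <= 0xFFFF:
        raise ValueError("user value must fit in 16 bits")
    return (flags & KBUS_RESERVED_MASK) | (user_value << 16)


def get_user_bits(flags: int) -> int:
    """Read back the application's 16 bits."""
    return (flags & USER_MASK) >> 16


def enqueue(queue: list, msg: dict) -> None:
    """Urgent messages go to the front of the queue, others to the back."""
    if msg["flags"] & URGENT:
        queue.insert(0, msg)
    else:
        queue.append(msg)
```

Keeping application state in the top half means it survives a trip through KBUS unchanged, whatever KBUS does to the bits it owns.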
+
+Send flags
+~~~~~~~~~~
+There are two "send" flags, ALL_OR_WAIT and ALL_OR_FAIL.
+Either one may be set, or both may be unset.
+
+ If both are set, the message will be rejected as invalid.
+
+ Both flags are ignored in reply messages (i.e., messages with the
+ 'in_reply_to' field set).
+
+If a message has ALL_OR_FAIL set, then a SEND will only succeed if the message
+could be added to all the (intended) recipients' message queues. Otherwise,
+SEND returns -EBUSY.
+
+If a message has ALL_OR_WAIT set, then a SEND will only succeed if the message
+could be added to all the (intended) recipients' message queues. Otherwise
+SEND returns -EAGAIN. In this case, the message is still being sent, and the
+caller should either call DISCARD (to drop it), or else use poll/select to
+wait for the send to finish. It will not be possible to call "write" until the
+send has completed or been discarded.
+
+These are primarily intended for use in debugging systems. In particular, note
+that the mechanisms dealing with ALL_OR_WAIT internally are unlikely to be
+very efficient.
+
+.. note:: The send flags will be less effective when messages are being
+ mediated via Limpets, as remote systems are involved.
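
The send-flag rules in this section can be summarised as a small decision function. This Python sketch covers only the cases stated above: the bit positions are assumed, ``check_send_flags`` is a hypothetical name, and mapping "rejected as invalid" to ``EINVAL`` is a guess. It returns ``None`` where the send flags do not decide the outcome.

```python
import errno

ALL_OR_WAIT = 1 << 8  # assumed bit positions, for illustration only
ALL_OR_FAIL = 1 << 9


def check_send_flags(flags: int, is_reply: bool, all_queues_have_room: bool):
    """Return 0, a negative errno, or None if the send flags do not decide."""
    if is_reply:
        return None  # both flags are ignored in replies
    wait = bool(flags & ALL_OR_WAIT)
    fail = bool(flags & ALL_OR_FAIL)
    if wait and fail:
        return -errno.EINVAL  # assumption: "invalid" maps to EINVAL
    if not (wait or fail):
        return None
    if all_queues_have_room:
        return 0
    # ALL_OR_FAIL gives up at once; ALL_OR_WAIT leaves the send pending.
    return -errno.EBUSY if fail else -errno.EAGAIN
```

After an ``-EAGAIN`` result the caller would either DISCARD the message or wait with poll/select, as described above.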
+
+Things KBUS changes in a message
+--------------------------------
+In general, KBUS leaves the content of a message alone - mostly so that an
+individual KBUS module can "pass through" messages from another domain.
+However, it does change:
+
+- the message id's serial number (but only if its network id is unset)
+- the 'from' id (to indicate the Ksock this message was sent from)
+- the WANT_YOU_TO_REPLY bit in the flags (set or cleared as appropriate)
+- the SYNTHETIC bit, which will always be unset in a message sent by a
+ Sender
+
+KBUS will always set the 'extra' field to zero.
+
+Limpets will change:
+
+- the network id in any field that has one.
+- the 'orig_from' and 'final_to' fields (which in general should only be
+ manipulated by Limpets).
+
+Types of message
+================
+There are four basic message types:
+
+* Announcement -- a message aimed at any listeners, expecting no reply
+* Request -- a message aimed at a replier, who is expected to reply
+* Reply -- a reply to a request
+* Status -- a message generated by KBUS
+
+The Python interface provides a Message base class, and subclasses thereof for
+each of the "user" message types (but not currently for Status).
+
+Announcements
+-------------
+An announcement is the "plain" message type. It is a message that is being
+sent for all bound listeners to "hear".
+
+When creating a new announcement message, it has:
+
+ :message id: see `Message ids`_
+ :in reply to: unset (it's not a reply)
+ :to: unset (all announcements are broadcast to any listeners)
+ :from: unset (KBUS will set it)
+ :flags: typically unset, see `Message flags`_
+ :message name: as appropriate
+ :message data: as appropriate
+
+The Python interface provides an ``Announcement`` class to help in creating an
+announcement message.
+
+Request message
+---------------
+A request message is a message that wants a reply.
+
+Since only one Ksock may bind as a replier for a given message name, a
+request message wants a reply from a single Ksock. By default, this is
+whichever Ksock has bound to the message name at the moment of sending, but
+see `Stateful transactions`_.
+
+When creating a new request message, it has:
+
+ :message id: see `Message ids`_
+ :in reply to: unset (it's not a reply)
+ :to: either unset, or a specific Ksock id if the request
+ should fail if that Ksock is (no longer) the replier
+ for this message name
+ :from: unset (KBUS will set it)
+ :flags: the "needs a reply" flag should be set.
+ KBUS will set the "you need to reply" flag in the
+ copy of the message delivered to its replier.
+ :message name: as appropriate
+ :message data: as appropriate
+
+When receiving a request message, the WANT_YOU_TO_REPLY flag will be set if it
+is this recipient's responsibility to reply.
+
+The Python interface provides a ``Request`` class to help in creating a
+request message.
+
+When a request message is sent, it is an error if there is no replier bound to
+that message name.
+
+The message will, as normal, be delivered to all listeners, and will have the
+"needs a reply" flag set wherever it is received. However, only the copy of
+the message received by the replier will be marked with the WANT_YOU_TO_REPLY
+flag.
+
+ So, if a particular file descriptor is bound as listener and replier
+ for '$.Fred', it will receive two copies of the original message (one
+ marked as needing reply from that file descriptor). However, when the
+ reply is sent, only the "plain" listener will receive a copy of the reply
+ message.
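
The delivery rule for requests can be modelled as follows. This is a hedged Python sketch rather than the kernel's algorithm: the bit positions are assumed, ``deliver_request`` is a hypothetical name, and the order of the returned copies carries no meaning.

```python
WANT_A_REPLY = 1 << 0        # assumed bit positions, for illustration only
WANT_YOU_TO_REPLY = 1 << 1


def deliver_request(flags: int, replier, listeners: list) -> list:
    """Return (ksock_id, flags) pairs, one per queued copy of a request.

    'replier' is the Ksock id bound as replier for the message name (or
    None), and 'listeners' the Ksock ids bound as listeners. A Ksock may
    appear in both roles.
    """
    if not (flags & WANT_A_REPLY):
        raise ValueError("not a request")
    if replier is None:
        raise LookupError("no replier bound to this message name")
    plain = flags & ~WANT_YOU_TO_REPLY
    copies = [(replier, plain | WANT_YOU_TO_REPLY)]    # the one copy to answer
    copies += [(ksock, plain) for ksock in listeners]  # the rest just listen
    return copies
```

A Ksock bound as both listener and replier therefore sees the request twice, and only one of the two copies asks it to reply.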
+
+Reply message
+-------------
+A reply message is the expected response after reading a request message.
+
+A reply message is distinguished by having a non-zero 'in reply to' value.
+
+Each reply message is in response to a specific request, as indicated by the
+'in reply to' field in the message.
+
+The replier is helped to remember that it needs to reply to a request, because
+the request has the WANT_YOU_TO_REPLY flag set.
+
+When a reply is sent, all listeners for that message name will receive it.
+However, the original replier will not.
+
+When creating a new reply message, it has:
+
+ :message id: see `Message ids`_
+ :in reply to: the request message's 'message id'
+ :to: the request message's 'from' id
+ :from: unset (KBUS will set it)
+ :flags: typically unset, see `Message flags`_
+ :message name: the request message's 'message name'
+ :message data: as appropriate
+
+The Python interface provides a ``Reply`` class to help in creating a reply
+message, but more usefully there is also a ``reply_to`` function that creates
+a Reply Message from the original Request.
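
The field table above translates directly into code. The Python sketch below uses a simplified stand-in ``Msg`` class and a hypothetical ``make_reply`` helper; it is not the ``Reply`` class or ``reply_to`` function of the real Python binding, whose signatures are not reproduced here.

```python
from dataclasses import dataclass

NO_ID = (0, 0)  # unset message id: (network id, serial number)


@dataclass
class Msg:
    """Simplified stand-in for a KBUS message header plus name and data."""
    name: str
    data: bytes = b""
    id: tuple = NO_ID
    in_reply_to: tuple = NO_ID
    to: int = 0
    from_: int = 0
    flags: int = 0


def make_reply(request: Msg, data: bytes = b"") -> Msg:
    """Build the reply to 'request', following the field table above."""
    return Msg(
        name=request.name,       # same message name as the request
        data=data,
        in_reply_to=request.id,  # marks this message as a reply
        to=request.from_,        # back to the original sender
        # id, from_ and flags stay unset; KBUS fills in id and from_ on SEND
    )
```

Note that the request's flags are not copied: a reply with WANT_A_REPLY set would be both a request and a reply, which the text on message flags warns against.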
+
+Status message
+--------------
+KBUS generates Status messages (also sometimes referred to as "synthetic"
+messages) when a request message has been successfully sent, but the replier
+is unable to reply (for instance, because it has closed its Ksock). KBUS thus
+uses a Status message to provide the "reply" that it guarantees the sender
+will get.
+
+As you might expect, a KBUS status message is thus (technically) a reply
+message.
+
+A status message looks like:
+
+ :message id: as normal
+ :in reply to: the 'message id' of the message whose sending or
+ processing caused this message.
+ :to: the Ksock id of the recipient of the message
+ :from: the Ksock id of the sender of the message - this will
+ be 0 if the sender is KBUS itself (which is assumed for
+ most exceptions)
+ :flags: typically unset, see `Message flags`_
+ :message name: for KBUS exceptions, a message name in '$.KBUS.*'
+ :message data: for KBUS exceptions, normally absent
+
+KBUS status messages always have '$.KBUS.<something>' names (this may be a
+multi-level <something>), and are always in response to a previous message, so
+always have an 'in reply to'.
+
+Requests and Replies
+--------------------
+KBUS guarantees that each Request will (eventually) be matched by a consequent
+Reply (or Status [1]_) message, and only one such.
+
+The "normal" case is when the replier reads the request, and sends its own
+reply back.
+
+If a Request message has been successfully SENT, there are the following other
+cases to consider:
+
+1. The replier unbinds from that message name before reading the request
+ message from its queue. In this case, KBUS removes the message from the
+ replier's queue, and issues a "$.KBUS.Replier.Unbound" message.
+
+2. The replier closes itself (closes the Ksock), but has not yet read the
+ message. In this case, KBUS issues a "$.KBUS.Replier.GoneAway" message.
+
+3. The replier closes itself (closes the Ksock) after reading the message,
+ but before replying to it (which it now cannot do). In this case, KBUS
+ issues a "$.KBUS.Replier.Ignored" message.
+
+4. SEND did not complete, and the replier closes itself before the message can
+ be added to its message queue (by the POLL mechanism). In this case, KBUS
+ issues a "$.KBUS.Replier.Disappeared" message.
+
+5. SEND did not complete, and an error occurs when the POLL mechanism tries to
+ send the message. In this case, KBUS issues a "$.KBUS.ErrorSending"
+ message.
+
+In all these cases, the 'in_reply_to' field is set to the original request's
+message id. In the first three cases, the 'from' field will be set to the
+Ksock id of the (originally intended) replier. In the last two cases, that
+information is not available, and a 'from' of 0 (indicating KBUS itself) is
+used.
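
The five cases above, and the 'from' rule that goes with them, can be sketched as a small lookup. This is only an illustrative model in Python, not KBUS code: the case labels, ``StatusSketch`` and ``make_status`` are hypothetical names invented for the sketch, and the message id is assumed to be a (network id, serial number) pair. The status message names are the ones quoted in the list.

```python
from dataclasses import dataclass

# Status message names quoted from the list above, keyed by a hypothetical
# label for each failure case (the labels are this sketch's own invention).
STATUS_NAMES = {
    "unbound_before_read": "$.KBUS.Replier.Unbound",
    "closed_before_read": "$.KBUS.Replier.GoneAway",
    "closed_after_read": "$.KBUS.Replier.Ignored",
    "vanished_before_queueing": "$.KBUS.Replier.Disappeared",
    "error_while_queueing": "$.KBUS.ErrorSending",
}

# In the first three cases the intended replier is still known.
REPLIER_KNOWN = {"unbound_before_read", "closed_before_read", "closed_after_read"}


@dataclass
class StatusSketch:
    name: str
    in_reply_to: tuple  # assumed (network id, serial number) of the request
    to: int             # Ksock id of the original sender
    from_: int          # Ksock id of the replier, or 0 for KBUS itself


def make_status(case, request_id, sender_id, replier_id):
    """Build the Status 'reply' for a Request that will never be answered."""
    from_id = replier_id if case in REPLIER_KNOWN else 0
    return StatusSketch(STATUS_NAMES[case], request_id, sender_id, from_id)
```

For instance, ``make_status("closed_after_read", (0, 7), 4, 3)`` models an "Ignored" status sent to Ksock 4 on behalf of replier 3, whereas either of the last two cases yields a 'from' of 0.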
+
+.. [1] Remember that a Status message is essentially a specialisation of a
+ Reply message.
+
+.. note:: Limpets introduce some extra messages, which will be documented when
+ the proper Limpet documentation is written.
+
+KBUS end points - Ksocks
+========================
+The KBUS devices
+----------------
+Message interactions happen via the KBUS devices. Installing the KBUS kernel
+module always creates ``/dev/kbus0``; it may also create ``/dev/kbus1``, and
+so on.
+
+ The number of devices to create is indicated by an argument at module
+ installation, for instance::
+
+ # insmod kbus.ko num_kbus_devices=10
+
+Messages are sent by writing to a KBUS device, and received by reading from
+the same device. A variety of useful ioctls are also provided. Each KBUS
+device is independent - messages cannot be sent from ``/dev/kbus0`` to
+``/dev/kbus1``, since there is no shared information.
+
+Ksocks
+------
+Specifically, messages are written to and read from KBUS device file
+descriptors. Each such is termed a *Ksock* - this is a simpler term than "file
+descriptor", and has some resonance with "socket".
+
+Each Ksock may be any (one or more) of:
+
+* a Sender (opening the device for read/write)
+* a Listener (only needing to open the device for read)
+* a Replier (opening the device for read/write)
+
+Every Ksock has an id. This is a 32-bit unsigned number assigned by KBUS when
+the device is opened. The value 0 is reserved for KBUS itself.
+
+ The terms "listener id", "sender id", "replier id", etc., thus all refer
+ to a Ksock id, depending on what it is being used for.
+
+Senders
+-------
+Message senders are called "senders". A sender should open a Ksock for read
+and write, as it may need to read replies and error/status messages.
+
+A message is sent by:
+
+1. Writing the message to the Ksock (using the standard ``write`` function)
+2. Calling the SEND ioctl on the Ksock, to actually send the message. This
+ returns (via its arguments) the message id of the message sent. It also
+ returns status information about the send
+
+ The status information is to be documented.
+
+The DISCARD ioctl can be used to "throw away" a partially written message,
+before SEND has been called on it.
+
+If there are no listeners (of any type) bound to that message name, then the
+message will be ignored.
+
+If the message is flagged as needing a reply, and there are no repliers bound
+to that message name, then an error message will be sent to the sender, by
+KBUS.
+
+It is not possible to send a message with a wildcard message name.
+
+ As a restriction this makes the life of the implementor and documentor
+ easier. I believe it would also be confusing if provided.
+
+The sender does not need to bind to any message names in order to receive
+error and status messages from KBUS.
+
+When a sender sends a Request, an internal note is made that it expects a
+corresponding Reply (or possibly a Status message from KBUS if the Replier
+goes away or unbinds from that message name, before replying). A place for
+that Reply is reserved in the sender's message queue. If the message queue
+fills up (either with messages waiting to be read, or with reserved slots for
+Replies), then the sender will not be able to send another Request until there
+is room on the message queue again.
+
+ Hopefully, this can be resolved by the sender reading a message off its
+ queue. However, if there are no messages to be read, and the queue is all
+ reserved for replies, the only solution is for the sender to wait for a
+ replier to send it something that it can then read.
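
The reservation rule described above can be modelled in a few lines. This is a rough Python sketch under stated assumptions, not the kernel implementation: ``SenderQueueSketch`` and its methods are hypothetical names, and the queue is reduced to two counters (messages waiting to be read, and slots reserved for replies).

```python
class SenderQueueSketch:
    """Hypothetical model of a sender's message queue; not KBUS code."""

    def __init__(self, max_messages):
        self.max_messages = max_messages
        self.unread = 0     # messages waiting to be read
        self.reserved = 0   # slots held for replies that have not arrived yet

    def send_request(self):
        """Reserve a slot for the eventual Reply, or refuse if there is no room."""
        if self.unread + self.reserved >= self.max_messages:
            return False
        self.reserved += 1
        return True

    def reply_arrives(self):
        """A Reply (or Status) turns a reserved slot into an unread message."""
        if self.reserved == 0:
            raise ValueError("no reply was expected")
        self.reserved -= 1
        self.unread += 1

    def read_message(self):
        """Reading a message is the only thing that frees space."""
        if self.unread == 0:
            raise ValueError("nothing to read")
        self.unread -= 1
```

Note that an arriving reply does not free any room by itself in this model: the slot simply changes from "reserved" to "unread", and only a read makes space for another Request.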
+
+.. note:: What order do we describe things in? Don't forget:
+
+ If the message being sent is a request, then the replier bound to that
+ message name will (presumably) write a reply to the request. Thus the normal
+ sequence for a request is likely to be:
+
+ 1. write the request message
+ 2. read the reply
+
+ The sender does *not* need to bind to anything in order to receive a reply to
+ a request it has sent.
+
+ Of course, if a sender binds to listen to the name it uses for its
+ request, then it will get a copy of the request as sent, and it will
+ also get (an extra) copy of the reply. But see `Receiving messages once
+ only`_.
+
+Listeners
+---------
+Message recipients are called "listeners".
+
+Listeners indicate that they want to receive particular messages, by using the
+BIND ioctl on a Ksock to specify the name of the message that is to be
+listened for. If the binding is to a wildcarded message name, then the
+listener will receive all messages with names that match the wildcard.
+
+An ordinary listener will receive all messages with that name (sent to the
+relevant Ksock). A listener may make more than one binding on the same Ksock
+(indeed, it is allowed to bind to the same name more than once).
+
+Messages are received by:
+
+1. Using the NEXTMSG ioctl to request the next message (this also returns the
+ message's length in bytes)
+2. Calling the standard ``read`` function to read the message data.
+
+If NEXTMSG is called again, the next message will be readied for reading,
+whether the previous message has been read (or partially read) or not.
+
+If a listener no longer wants to receive a particular message name, then they
+can unbind from it, using the UNBIND ioctl. The message name and flags used in
+an UNBIND must match those in the corresponding BIND. Any messages in the
+listener's message queue which match that unbinding will be removed from the
+queue (i.e., the listener will not actually receive them). This does *not*
+affect the message currently being read.
+
+ Note that this has implications for binding and unbinding wildcards,
+ which must also match.
+
+Closing the Ksock also unbinds all the message bindings made on it.
+It does not affect message bindings made on other Ksocks.
+
+Repliers
+--------
+Repliers are a special sort of listener.
+
+For each message name, there may be a single "replier". A replier binds to a
+message name in the same way as any other listener, but sets the "replier"
+flag. If a replier is already bound to that message name (on the same KBUS
+device), the bind will fail.
+
+Repliers only receive Requests (messages that are marked as wanting a reply).
+
+A replier may (should? must?) reply to the request - this is done by sending
+a Reply message through the Ksock from which the Request was read.
+
+It is perfectly legitimate to bind to a message as both replier and listener,
+in which case two copies of the message will be read, once as replier, and
+once as (just) listener (but see `Receiving messages once only`_).
+
+
+When a request message is read by the appropriate replier, KBUS will mark
+*that particular message* with the "you must reply" flag. This will not be set
+on copies of that message read by any (non-replier) listeners.
+
+ So, in the case where a Ksock is bound as replier and listener for the
+ same message name, only one of the two copies of the message received will
+ be marked as "you must reply".
+
+If a replier binds to a wildcarded message name, then they are the *default*
+replier for any message names satisfying that wildcard. If another replier
+binds to a more specific message name (matching that wildcard),
+then the specific message name binding "wins" - the wildcard replier will no
+longer receive that message name.
+
+ In particular '$.Fred.Jim' is more specific than '$.Fred.%' which in turn
+ is more specific than '$.Fred.*'
+
+This means that if a wildcard replier wants to guarantee to see all the
+messages matching their wildcard, they also need to bind as a listener for the
+same wildcarded name.
+
+For example:
+
+ Assume message names are of the form '$.Sensors.<Room>' or
+ '$.Sensors.<Room>.<Measurement>'.
+
+ Replier 1 binds to '$.Sensors.*'. They will be the default replier for
+ all sensor requests.
+
+ Replier 2 binds to '$.Sensors.%'. They will take over as the default
+ replier for any room specific requests.
+
+ Replier 3 binds to '$.Sensors.Kitchen.Temperature'. They will take over as
+ the replier for the kitchen temperature.
+
+ So:
+
+ - A message named '$.Sensors.Kitchen.Temperature' will go to replier 3.
+ - A message named '$.Sensors.Kitchen' or '$.Sensors.LivingRoom' will go to
+ replier 2.
+ - A message named '$.Sensors.LivingRoom.Temperature' will go to replier 1.
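
The "most specific binding wins" rule in this example can be sketched as follows. This is an illustrative Python model rather than the KBUS matching code, and it rests on assumptions: a trailing ``.*`` is taken to match any non-empty remainder, a trailing ``.%`` to match exactly one further name component, and ties between two bindings of the same kind (which the text does not address) are broken arbitrarily in favour of the longer binding. The function names are hypothetical.

```python
def matches(binding, name):
    """Does a (possibly wildcarded) binding match a concrete message name?"""
    if binding.endswith(".*"):
        prefix = binding[:-1]            # keep the trailing dot
        return name.startswith(prefix) and len(name) > len(prefix)
    if binding.endswith(".%"):
        prefix = binding[:-1]
        rest = name[len(prefix):]
        return name.startswith(prefix) and rest != "" and "." not in rest
    return binding == name


def specificity(binding):
    """Exact names beat '%' wildcards, which beat '*' wildcards."""
    if binding.endswith(".*"):
        return 0
    if binding.endswith(".%"):
        return 1
    return 2


def choose_replier(bindings, name):
    """bindings maps binding name -> replier id; returns the winner or None."""
    candidates = [
        (specificity(binding), len(binding), replier)
        for binding, replier in bindings.items()
        if matches(binding, name)
    ]
    return max(candidates)[2] if candidates else None


bindings = {
    "$.Sensors.*": 1,
    "$.Sensors.%": 2,
    "$.Sensors.Kitchen.Temperature": 3,
}
```

With these bindings, ``choose_replier(bindings, '$.Sensors.Kitchen')`` picks replier 2 and ``choose_replier(bindings, '$.Sensors.LivingRoom.Temperature')`` falls back to replier 1, as in the list above.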
+
+When a Replier is closed (technically, when its ``release`` function is
+called by the kernel) KBUS traverses its outstanding message queue, and for
+each Request that has not been answered, generates a Status message saying
+that the Replier has "GoneAway".
+
+Similarly, if a Replier unbinds from replying to a message, KBUS traverses its
+outstanding message queue, and for each Request that has not been answered, it
+generates a Status message saying that it has "Unbound" from being a replier
+for that message name. It also forgets the message, which it is now not going
+to reply to.
+
+Lastly, when a Replier is closed, if it has read any Requests (technically,
+called NEXTMSG to pop them from the message queue), but not actually replied
+to them, then KBUS will send an "Ignored" Status message for each such
+Request.
+
+More information
+================
+Stateful transactions
+---------------------
+It is possible to make stateful message transactions, by:
+
+1. sending a Request
+2. receiving the Reply, and noting the Ksock id of the replier
+3. sending another Request to that specific replier
+4. and so on
+
+Sending a request to a particular Ksock will fail if that Ksock is no longer
+bound as replier to the relevant message name. This allows a sender to
+guarantee that it is communicating with a particular instance of the replier
+for a message name.
+
+Queues filling up
+-----------------
+Messages are sent by a mechanism which:
+
+1. Checks the message is plausible (it has a plausible message name,
+ and the right sort of "shape")
+2. If the message is a Request, checks that the sender has room on its message
+ queue for the (eventual) Reply.
+3. Finds the Ksock ids of all the listeners and repliers bound to that
+ message's name
+4. Adds the message to the queue for each such listener/replier
+
+This can cause problems if one of the queues is already full (allowing
+infinite expansion of queues would also cause problems, of course).
+
+If a *sender* attempts to send a Request, but does not have room on its
+message queue for the (corresponding) Reply, then the message will not be
+sent, and the send will fail. Note that the message id will not be set, and
+the blocking behaviours defined below do not occur.
+
+If a *replier* cannot receive a particular message, because its queue is full,
+then the message will not be sent, and the send will fail with an error. This
+does, however, set the message id (and thus the "last message id" on the
+sender).
+
+Moreover, a sender can indicate if it wants a message to be:
+
+1. Added to all the listener queues, regardless, in which case it will block
+ until that can be done (ALL_OR_WAIT, sender blocks)
+2. Added to all the listener queues, and fail if that can't be done
+ (ALL_OR_FAIL)
+3. Added to all the listener queues that have room (the default)
+
+See `Message flags`_ for more details.
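
The three delivery choices for listener queues can be compared with a toy model. This Python sketch is not how the kernel module is written: ``deliver`` is a hypothetical function, queues are plain lists with a single shared capacity, and it assumes that with ALL_OR_WAIT or ALL_OR_FAIL nothing is queued at all unless every queue has room (the text does not spell this out). The "wait" outcome stands in for the sender being left in sending state.

```python
def deliver(message, queues, capacity, policy="default"):
    """Try to add message to every listener queue.

    queues maps listener id -> list of queued messages.
    Returns (outcome, ids_delivered_to).
    """
    any_full = any(len(queue) >= capacity for queue in queues.values())

    if any_full and policy == "ALL_OR_FAIL":
        return "fail", []   # the send fails; nothing is queued
    if any_full and policy == "ALL_OR_WAIT":
        return "wait", []   # the sender must retry once there is room

    delivered = []
    for listener_id, queue in queues.items():
        if len(queue) < capacity:
            queue.append(message)
            delivered.append(listener_id)
    return "sent", delivered
```

In the default case a listener with a full queue simply misses the message, which is why a sender that cannot tolerate that has to choose one of the other two policies.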
+
+Urgent messages
+---------------
+Messages may be flagged urgent. In this case they will be added to the front
+of the destination message queue, rather than the end - in other words, they
+will be the next message to be "popped" by NEXTMSG.
+
+Note that this means that if two urgent messages are sent to the same target,
+and *then* a NEXTMSG/read occurs, the second urgent message will be popped and
+read first.
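
The ordering effect is easy to see with a double-ended queue. This is only a Python model of the behaviour described above (``enqueue`` and ``next_message`` are hypothetical helpers, not KBUS calls):

```python
from collections import deque


def enqueue(queue, message, urgent=False):
    """Urgent messages go to the front of the queue, others to the end."""
    if urgent:
        queue.appendleft(message)
    else:
        queue.append(message)


def next_message(queue):
    """Model of NEXTMSG: pop the message at the front, or None if empty."""
    return queue.popleft() if queue else None


queue = deque()
enqueue(queue, "normal")
enqueue(queue, "urgent-1", urgent=True)
enqueue(queue, "urgent-2", urgent=True)
order = [next_message(queue) for _ in range(3)]
```

Here ``order`` ends up with the second urgent message first, then the first urgent message, then the ordinary one.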
+
+Select, write/send and "next message", blocking
+-----------------------------------------------
+.. warning:: At the moment, ``read`` and ``write`` are always non-blocking.
+
+``read`` returns more of the currently selected message, or EOF if there is no
+more of that message to read (and thus also if there is no currently selected
+message). The NEXTMSG ioctl is used to select ("pop") the next message.
+
+``write`` writes to the end of the currently-being-written message. The
+DISCARD ioctl can be used to discard the data written so far, and the SEND
+ioctl to send the (presumably completed) message. Whilst the message is being
+sent, it is not possible to use ``write``.
+
+Note that if SEND is used to send a Request, then KBUS ensures that there will
+always be either a Reply or a Status message in response to that request.
+
+Specifically, if:
+
+1. The Replier "goes away" (and its "release" function is called) before
+ reading the Request (specifically, before calling NEXTMSG to pop it from
+ the message queue)
+2. The Replier "goes away" (and its "release" function is called) before
+ replying to a Request that it has already read (i.e., used NEXTMSG to pop
+ from the message queue)
+3. The Replier unbinds from that Request message name before reading the
+ Request (with the same caveat on what that means)
+4. Select/poll attempts to send the Request, and discovers that the
+ Replier has disappeared since the initial SEND
+5. Select/poll attempts to send the Request, and some other error occurs
+
+then KBUS will "reply" with an appropriate Status message.
+
+--------------------------------------------------
+
+KBUS supports its own particular variation on blocking of message sending.
+
+First of all, it supports use of "select" to determine if there are any
+messages waiting to be read. So, for instance (in Python)::
+
+ with Ksock(0,'rw') as sender:
+     with Ksock(0,'r') as listener:
+         (r,w,x) = select.select([listener],[],[],0)
+         assert r == []
+
+         listener.bind('$.Fred')
+         msg = Announcement('$.Fred','data')
+         sender.send_msg(msg)
+
+         (r,w,x) = select.select([listener],[],[],0)
+         assert r == [listener]
+
+This simply checks if there is a message in the Ksock's message list, waiting
+to be "popped" with NEXTMSG.
+
+Secondly, ``write``, SEND and DISCARD interact in what is hoped to be a
+sensible manner. Specifically:
+
+* When SEND (i.e., the SEND ioctl) is called, KBUS can either:
+
+ 1. Succeed in sending the message. The Ksock is now ready for ``write`` to
+ be called on it again.
+ 2. Fail to send the message (possibly, if the message was a Request,
+ with EADDRNOTAVAIL, indicating that there is no Replier for that
+ Request). The Ksock is now ready for ``write`` to be called on it again.
+ 3. If the message was marked ALL_OR_WAIT, then it may fail with EAGAIN.
+ In this case, the Ksock is still in sending state, and an attempt to
+ call ``write`` will fail (with EALREADY). The caller can either use
+ DISCARD to discard the message, or use select/poll to wait for the
+ message to finish sending.
+
+Thus "select" for the write case checks whether it is allowed to call
+"write" - for instance::
+
+ with Ksock(0,'rw') as sender:
+     write_list = [sender]
+     with Ksock(0,'r') as listener1:
+         write_list = [sender,listener1]
+         read_list = [listener1]
+
+         (r,w,x) = select.select(read_list,write_list,[],0)
+         assert r == []
+         assert w == [sender]
+         assert x == []
+
+         with Ksock(0,'rw') as listener2:
+             write_list.append(listener2)
+             read_list.append(listener2)
+
+             (r,w,x) = select.select(read_list,write_list,[],0)
+             assert r == []
+             assert len(w) == 2
+             assert sender in w
+             assert listener2 in w
+             assert x == []
+
+Receiving messages once only
+----------------------------
+In normal usage (and by default), if a Ksock binds to a message name multiple
+times, it will receive multiple copies of a message. This can happen:
+
+* explicitly (the Ksock deliberately and explicitly binds to the same name
+ more than once, seeking this effect).
+* as a result of binding to a message name and a wildcard that includes the
+ same name, or two overlapping wildcards.
+* as a result of binding as Replier to a name, and also as Listener to the
+ same name (possibly via a wildcard). In this case, multiple copies will
+ only be received when a Request with that name is made.
+
+Several programmers have complained that the last case, in particular, is very
+inconvenient, and thus the "receive a message once only" facility has been
+added.
+
+Using the MSGONLYONCE ioctl, it is possible to tell a Ksock that only one copy
+of a particular message should be received, even if multiple are "due". In the
+case of the Replier/Listener copies, it will always be the message to which
+the Replier should reply (the one with WANT_YOU_TO_REPLY set) that will be
+received.
+
+Please use this facility with care, and only if you really need it.
+
+IOCTLS
+------
+The KBUS ioctls are defined (with explanatory comments) in the kernel module
+header file (``kbus_defns.h``). They are:
+
+:RESET: Currently has no effect
+:BIND: Bind to a particular message name (possibly as replier).
+:UNBIND: Unbind from a binding - must match exactly.
+:KSOCKID: Determine the Ksock id of the Ksock used
+:REPLIER: Determine who is bound as replier to a particular message
+ name. This returns 0 or the Ksock id of the replier.
+:NEXTMSG: Pop the next message from the Ksock's message queue, ready
+ for reading (with ``read``), and return its length (in bytes).
+ If there is no next message, return a length of 0.
+ The length is always the length of an "entire" message (see
+ `Message implementation`_).
+:LENLEFT: Determine how many bytes of the message currently being read
+ are still to read.
+:SEND: Send the current outstanding message for this Ksock (i.e., the
+ bytes written to the Ksock since the last SEND or DISCARD).
+ Return the message id of the message, and maybe other status
+ information.
+:DISCARD: Discard (throw away) the current outstanding message for this
+ Ksock (i.e., any bytes written to the Ksock since the last
+ SEND or DISCARD).
+:LASTSENT: Determine the message id of the last message SENT on this
+ Ksock.
+:MAXMSGS: Set the maximum length of the (read) message queue for this
+ Ksock, and return the actual length that is set. An attempt
+ to set the queue length to 0 will just return the current
+ queue length.
+:NUMMSGS: Determine how many messages are outstanding in this Ksock's
+ read queue.
+:UNREPLIEDTO: Determines how many Requests (marked "WANT_YOU_TO_REPLY")
+ this Ksock still needs to reply to. This is primarily a
+ development tool.
+:MSGONLYONCE: Determines whether only one copy of a message will be
+ received, even if the message name is bound to multiple times.
+ May also be used to query the current state.
+:VERBOSE: Determines whether verbose kernel messages should be output or
+ not. Affects the *device* (the entire Ksock).
+ May also be used to query the current state.
+:NEWDEVICE: Requests another KBUS device (``/dev/kbus<n>``). The next
+ KBUS device number (up to a maximum of 255) will be allocated.
+ Returns the new device number.
+:REPORTREPLIERBINDS: Request synthetic messages announcing Replier BIND/UNBIND
+ events. These are messages named "$.KBUS.ReplierBindEvent",
+ and are the only predefined messages with data.
+ Both Python and C bindings provide a useful function to
+ extract the ``is_bind``, ``binder`` and ``name`` values from
+ the data.
+
+/proc/kbus/bindings
+-------------------
+``/proc/kbus/bindings`` is a debugging aid for reporting the listener id,
+exclusive flag and message name for each binding, for each kbus device.
+
+An example might be::
+
+ $ cat /proc/kbus/bindings
+ # <device> is bound to <Ksock-ID> in <process-PID> as <Replier|Listener> for <message-name>
+ 1: 1 22158 R $.Sensors.*
+ 1: 2 22158 R $.Sensors.Kitchen.Temperature
+ 1: 3 22158 L $.Sensors.*
+ 13: 4 22159 L $.Jim.*
+ 13: 1 22159 R $.Fred
+ 13: 1 22159 L $.Jim
+ 13: 14 23021 L $.Jim.*
+
+This describes two KBUS devices (``/dev/kbus1`` and ``/dev/kbus13``).
+
+The first has bindings on Ksock ids 1, 2 and 3, for the given message names. The
+"R" indicates a replier binding, the "L" indicates a listener (non-replier)
+binding.
+
+The second has bindings on Ksock ids 4, 1 and 14. The order of the bindings
+reported is *not* particularly significant.
+
+Note that there is no communication between the two devices, so Ksock id 1 on
+device 1 is not related to (and has no commonality with) Ksock id 1 on device
+13.
+
+/proc/kbus/stats
+----------------
+``/proc/kbus/stats`` is a debugging aid for reporting various statistics about
+the KBUS devices and the Ksocks open on them.
+
+An example might be::
+
+ $ cat /proc/kbus/stats
+ dev 0: next file 5 next msg 8 unsent unbindings 0
+ ksock 4 last msg 0:7 queue 1 of 100
+ read byte 0 of 0, wrote byte 52 (max 60), sending
+ outstanding requests 0 (size 16, max 0), unsent replies 0 (max 0)
+ ksock 3 last msg 0:5 queue 0 of 1
+ read byte 0 of 0, wrote byte 0 (max 0), not sending
+ outstanding requests 1 (size 16, max 0), unsent replies 0 (max 0)
+
+or::
+
+ $ cat /proc/kbus/stats
+ dev 0: next file 4 next msg 101 unsent unbindings 0
+ ksock 3 last msg 0:0 queue 100 of 100
+ read byte 0 of 0, wrote byte 0 (max 0), not sending
+ outstanding requests 0 (size 16, max 0), unsent replies 0 (max 0)
+ ksock 2 last msg 0:100 queue 0 of 100
+ read byte 0 of 0, wrote byte 0 (max 0), not sending
+ outstanding requests 100 (size 102, max 92), unsent replies 0 (max 0)
+
+
+Error numbers
+-------------
+The following error numbers get special use. In Python, they are all returned
+as values inside the IOError exception.
+
+ Since we're trying to fit into the normal Un*x convention that negative
+ values are error numbers, and since Un*x defines many of these for us,
+ it is natural to make use of the relevant definitions. However, this also
+ means that we are often using them in an unnatural sense. I've tried to
+ make the error numbers used bear at least a vague relationship to their
+ (mis)use in KBUS.
+
+:EADDRINUSE: On attempting to bind a message name as replier: There is
+ already a replier bound for this message
+:EADDRNOTAVAIL: On attempting to send a Request message: There is no replier
+ bound for this message's name.
+
+ On attempting to send a Reply message: The sender of the
+ original request (i.e., the Ksock mentioned as the ``to``
+ in the Reply) is no longer connected.
+:EALREADY: On attempting to write to a Ksock, when a previous send has
+ returned EAGAIN. Either DISCARD the message, or use
+ select/poll to wait for the send to complete, and write to be
+ allowed.
+:EBADMSG: On attempting to bind, unbind or send a message: The message
+ name is not valid. On sending, this can also be because the
+ message name is a wildcard.
+:EBUSY: On attempting to send:
+
+ 1. For a request, the replier's message queue is full.
+ 2. For any message, with ALL_OR_FAIL set, one of the
+ targeted listener/replier queues was full.
+
+:ECONNREFUSED: On attempting to send a Reply, the intended recipient (the
+ notional original sender of the Request) is not expecting
+ a Reply with that message id in its 'in_reply_to'. Or, in
+ other words, this appears to be an attempt to reply to the
+ wrong message id or the wrong Ksock.
+:EINVAL: Something went wrong (generic error).
+:EMSGSIZE: On attempting to write a message: Data was written after
+ the end of the message (i.e., after the final end guard
+ of the message).
+:ENAMETOOLONG: On attempting to bind, unbind or send a message: The message
+ name is too long.
+:ENOENT: On attempting to open a Ksock: There is no such device
+ (normally because one has tried to open, for instance,
+ '/dev/kbus9' when there are only 3 KBUS devices).
+:ENOLCK: On attempting to send a Request, when there is not enough room
+ in the sender's message queue to guarantee that it can
+ receive a reply for every Request already sent, *plus* this
+ one. If there are outstanding messages in the sender's message
+ queue, then the solution is to read some of them. Otherwise,
+ the sender will have to wait until one of the Repliers
+ replies to a previous Request (or goes away and KBUS replies
+ for it).
+
+ When this error is received, the send has failed (just as if
+ the message was invalid). The sender is not left in "sending"
+ state, nor has the message been assigned a message id.
+
+ Note that this is *not* EAGAIN, since we do not want to block
+ the sender (in the SEND) if it is up to the sender to perform
+ a read to sort things out.
+
+:ENOMSG: On attempting to send, when there is no message waiting to be
+ sent (either because there has been no write since the last
+ send, or because the message being written has been
+ discarded).
+:EPIPE: On attempting to send 'to' a specific replier, the replier
+ with that id is no longer bound to the given message's name.
+
+:EFAULT: Memory allocation, copy from user space, or other such failed. This
+ is normally very bad and should not happen, unless it is the result
+ of calling an ioctl, in which case it indicates that the ioctl
+ argument cannot be accessed.
+
+:ENOMEM: Memory allocation failed (returned NULL). This is normally very bad,
+ and should not happen.
+
+:EAGAIN: On attempting to send, the message being sent had ALL_OR_WAIT set,
+ and one of the targeted listener/replier queues was full.
+
+ On attempting to unbind when Replier Bind Events have been
+ requested, one or more of the Ksocks bound to receive
+ "$.KBUS.ReplierBindEvent" messages has a full message queue,
+ and thus cannot receive the unbind event. The unbind has not been
+ done.
+
+In the ``utils`` directory of the KBUS sources, there is a script called
+``errno.py`` which takes an ``errno`` integer or name and prints out both the
+"normal" meaning of that error number, and also (if there is one) the KBUS use
+of it. For instance::
+
+ $ errno.py 1
+ Error 1 (0x1) is EPERM: Operation not permitted
+ $
+ $ errno.py EPIPE
+ EPIPE is error 32 (0x20): Broken pipe
+
+ KBUS:
+ On attempting to send 'to' a specific replier, the replier with that id
+ is no longer bound to the given message's name.
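
The "normal meaning" half of such a lookup needs nothing beyond the Python standard library. The sketch below is not the ``errno.py`` script itself (its source is not shown here) and does not reproduce the KBUS-specific text; ``describe`` is a hypothetical helper built on the standard ``errno`` module and ``os.strerror``.

```python
import errno
import os


def describe(err):
    """Accept an errno name (e.g. 'EPIPE') or number; return a one-line summary."""
    number = getattr(errno, err) if isinstance(err, str) else err
    name = errno.errorcode.get(number, "unknown")
    return f"{name} is error {number} (0x{number:x}): {os.strerror(number)}"


summary = describe("EPIPE")
```

The numeric values and message strings come from the platform's C library, so they may differ between systems; the KBUS meanings listed above would have to be kept in a separate table keyed by the errno name.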
+
--
1.7.4.1

2011-03-18 17:45:35

by Tony Ibbs

Subject: [PATCH 05/11] KBUS add support for messages

This patch adds the code that actually allows KBUS to send
and receive messages.

Signed-off-by: Tony Ibbs <[email protected]>
---
ipc/kbus_main.c | 2766 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 2743 insertions(+), 23 deletions(-)

diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index 87c7506..944b60c 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -65,6 +65,10 @@ static int kbus_num_devices = CONFIG_KBUS_DEF_NUM_DEVICES;
static int kbus_major; /* 0 => We'll go for dynamic allocation */
static int kbus_minor; /* 0 => We're happy to start with device 0 */

+/* We can't need more than 8 characters of padding, by definition! */
+static char *static_zero_padding = "\0\0\0\0\0\0\0\0";
+static u32 static_end_guard = KBUS_MSG_END_GUARD;
+
/* Our actual devices, 0 through kbus_num_devices-1 */
static struct kbus_dev **kbus_devices;

@@ -72,9 +76,21 @@ static struct class *kbus_class_p;

/* ========================================================================= */

+/* As few foreshadowings as I can get away with */
+static struct kbus_private_data *kbus_find_open_ksock(struct kbus_dev *dev,
+ u32 id);
+
/* I really want this function where it is in the code, so need to foreshadow */
static int kbus_setup_new_device(int which);

+/* More or less ditto */
+static int kbus_write_to_recipients(struct kbus_private_data *priv,
+ struct kbus_dev *dev,
+ struct kbus_msg *msg);
+
+static int kbus_alloc_ref_data(struct kbus_private_data *priv,
+ u32 data_len,
+ struct kbus_data_ptr **ret_ref_data);
/* ========================================================================= */

/* What's the symbolic name of a replier type? */
@@ -93,6 +109,376 @@ static const char *kbus_replier_type_name(enum kbus_replier_type t)
}

/*
+ * Wrap a set of data pointers and lengths in a reference
+ */
+static struct kbus_data_ptr *kbus_wrap_data_in_ref(int as_pages,
+ unsigned num_parts,
+ unsigned long *parts,
+ unsigned *lengths,
+ unsigned last_page_len)
+{
+ struct kbus_data_ptr *new = NULL;
+
+ new = kmalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ new->as_pages = as_pages;
+ new->parts = parts;
+ new->lengths = lengths;
+ new->num_parts = num_parts;
+ new->last_page_len = last_page_len;
+
+ kref_init(&new->refcount);
+ return new;
+}
+
+/*
+ * Increment the reference count for our pointer.
+ *
+ * Returns the (same) reference, for convenience.
+ */
+static struct kbus_data_ptr *kbus_raise_data_ref(struct kbus_data_ptr *refdata)
+{
+ if (refdata != NULL)
+ kref_get(&refdata->refcount);
+ return refdata;
+}
+
+/*
+ * Data release callback for data reference pointers. Called when the reference
+ * count says to...
+ */
+static void kbus_release_data_ref(struct kref *ref)
+{
+ struct kbus_data_ptr *refdata = container_of(ref,
+ struct kbus_data_ptr,
+ refcount);
+ if (refdata->parts == NULL) {
+ /* Not that I think this can happen */
+ pr_err("kbus: Removing data reference,"
+ " but data ptr already freed\n");
+ } else {
+ int jj;
+ if (refdata->as_pages)
+ for (jj = 0; jj < refdata->num_parts; jj++)
+ free_page((unsigned long)refdata->parts[jj]);
+ else
+ for (jj = 0; jj < refdata->num_parts; jj++)
+ kfree((void *)refdata->parts[jj]);
+ kfree(refdata->parts);
+ kfree(refdata->lengths);
+ refdata->parts = NULL;
+ refdata->lengths = NULL;
+ }
+ kfree(refdata);
+}
+
+/*
+ * Forget a reference to our pointer, and if no-one cares anymore, free it and
+ * its contents.
+ */
+static void kbus_lower_data_ref(struct kbus_data_ptr *refdata)
+{
+ if (refdata == NULL)
+ return;
+ kref_put(&refdata->refcount, kbus_release_data_ref);
+}
+
+/*
+ * Wrap a string in a reference. Does not take a copy of the string,
+ * but note that the release mechanism (triggered when there are no more
+ * references to the string) *will* free it.
+ */
+static struct kbus_name_ptr *kbus_wrap_name_in_ref(char *str)
+{
+ struct kbus_name_ptr *new = NULL;
+
+ new = kmalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ new->name = str;
+ kref_init(&new->refcount);
+ return new;
+}
+
+/*
+ * Increment the reference count for a string reference
+ *
+ * Returns the (same) reference, for convenience.
+ */
+static struct kbus_name_ptr *kbus_raise_name_ref(struct kbus_name_ptr *refname)
+{
+ if (refname != NULL)
+ kref_get(&refname->refcount);
+ return refname;
+}
+
+/*
+ * Data release callback for string reference pointers.
+ * Called when the reference count says to...
+ */
+static void kbus_release_name_ref(struct kref *ref)
+{
+ struct kbus_name_ptr *refname = container_of(ref,
+ struct kbus_name_ptr,
+ refcount);
+ if (refname->name == NULL) {
+ /* Not that I think this can happen */
+ pr_err("kbus: Removing name reference,"
+ " but name ptr already freed\n");
+ } else {
+ kfree(refname->name);
+ refname->name = NULL;
+ }
+ kfree(refname);
+}
+
+/*
+ * Forget a reference to our string, and if no-one cares anymore, free it and
+ * its contents.
+ */
+static void kbus_lower_name_ref(struct kbus_name_ptr *refname)
+{
+ if (refname == NULL)
+ return;
+
+ kref_put(&refname->refcount, kbus_release_name_ref);
+}
+
+/*
+ * Return a stab at the next size for an array
+ */
+static u32 kbus_next_size(u32 old_size)
+{
+ if (old_size < 16)
+ /* For very small numbers, just double */
+ return old_size << 1;
+ /* Otherwise, try something like the mechanism used for Python
+ * lists - doubling feels a bit over the top */
+ return old_size + (old_size >> 3);
+}
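
To make the growth policy of kbus_next_size() concrete, here is a minimal userspace sketch (not part of the patch). The names demo_next_size() and demo_next_size_selfcheck() are hypothetical; only the rule itself is copied from the function above. Note that a size of 0 stays 0 under this rule, so callers presumably need to start from a non-zero initial size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the growth rule, for illustration only (hypothetical name) */
static uint32_t demo_next_size(uint32_t old_size)
{
	if (old_size < 16)
		return old_size << 1;		/* small arrays: double */
	return old_size + (old_size >> 3);	/* larger arrays: grow by 1/8 */
}

/* Walks the first few growth steps, starting from a size of 1 */
static void demo_next_size_selfcheck(void)
{
	static const uint32_t expected[] = { 1, 2, 4, 8, 16, 18, 20, 22 };
	uint32_t size = 1;
	size_t i;

	for (i = 0; i < sizeof(expected) / sizeof(expected[0]); i++) {
		assert(size == expected[i]);
		size = demo_next_size(size);
	}
	assert(demo_next_size(0) == 0);	/* zero never grows */
}
```

Growing by one eighth keeps reallocation overhead modest for long-lived arrays, at the cost of more frequent krealloc() calls than plain doubling would need.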
+
+/* Determine (and return) the next message serial number */
+static u32 kbus_next_serial_num(struct kbus_dev *dev)
+{
+ if (dev->next_msg_serial_num == 0)
+ dev->next_msg_serial_num++;
+ return dev->next_msg_serial_num++;
+}
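
The zero check in kbus_next_serial_num() matters because the id 0:0 is used elsewhere in this file to mean "no message" (an empty slot in the outstanding-requests array, or an unset in_reply_to). A small userspace sketch of the wrap-around behaviour follows; struct demo_dev and the demo_ functions are hypothetical stand-ins for the kernel structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of struct kbus_dev used here */
struct demo_dev {
	uint32_t next_msg_serial_num;
};

/* Same logic as kbus_next_serial_num(): never hand out serial number 0 */
static uint32_t demo_next_serial_num(struct demo_dev *dev)
{
	if (dev->next_msg_serial_num == 0)
		dev->next_msg_serial_num++;
	return dev->next_msg_serial_num++;
}

/* Shows the counter skipping 0 when it wraps */
static void demo_serial_selfcheck(void)
{
	struct demo_dev dev = { .next_msg_serial_num = UINT32_MAX };

	assert(demo_next_serial_num(&dev) == UINT32_MAX);
	assert(demo_next_serial_num(&dev) == 1);	/* 0 is skipped */
	assert(demo_next_serial_num(&dev) == 2);
}
```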
+
+static int kbus_same_message_id(struct kbus_msg_id *msg_id,
+ u32 network_id, u32 serial_num)
+{
+ return msg_id->network_id == network_id &&
+ msg_id->serial_num == serial_num;
+}
+
+static int kbus_init_msg_id_memory(struct kbus_private_data *priv)
+{
+ struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+ struct kbus_msg_id *ids;
+
+ ids = kcalloc(KBUS_INIT_MSG_ID_MEMSIZE, sizeof(*ids), GFP_KERNEL);
+ if (!ids)
+ return -ENOMEM;
+
+ mem->count = 0;
+ mem->max_count = 0;
+ mem->ids = ids;
+ mem->size = KBUS_INIT_MSG_ID_MEMSIZE;
+ return 0;
+}
+
+static void kbus_empty_msg_id_memory(struct kbus_private_data *priv)
+{
+ struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+
+ if (mem->ids == NULL)
+ return;
+
+ kfree(mem->ids);
+ mem->ids = NULL;
+ mem->size = 0;
+ mem->max_count = 0;
+ mem->count = 0;
+}
+
+/*
+ * Note we don't worry about whether the id is already in there - if
+ * the user cares, that's up to them (I don't think I do)
+ */
+static int kbus_remember_msg_id(struct kbus_private_data *priv,
+ struct kbus_msg_id *id)
+{
+ struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+ int ii, which;
+
+ kbus_maybe_dbg(priv->dev, " %u Remembering outstanding"
+ " request %u:%u (count->%d)\n",
+ priv->id, id->network_id, id->serial_num, mem->count+1);
+
+ /* First, try for an empty slot we can re-use */
+ for (ii = 0; ii < mem->size; ii++) {
+ if (kbus_same_message_id(&mem->ids[ii], 0, 0)) {
+ which = ii;
+ goto done;
+ }
+
+ }
+ /* Otherwise, give in and use a new one */
+ if (mem->count == mem->size) {
+ u32 old_size = mem->size;
+ u32 new_size = kbus_next_size(old_size);
+ struct kbus_msg_id *new_ids;
+
+ kbus_maybe_dbg(priv->dev, " %u XXX outstanding"
+ " request array size %u -> %u\n",
+ priv->id, old_size, new_size);
+ /* On failure krealloc leaves the old array intact; don't lose it */
+ new_ids = krealloc(mem->ids, new_size * sizeof(*new_ids),
+ GFP_KERNEL);
+ if (!new_ids)
+ return -ENOMEM;
+ mem->ids = new_ids;
+ for (ii = old_size; ii < new_size; ii++) {
+ mem->ids[ii].network_id = 0;
+ mem->ids[ii].serial_num = 0;
+ }
+ mem->size = new_size;
+ }
+ which = mem->count;
+done:
+ mem->ids[which] = *id;
+ mem->count++;
+ if (mem->count > mem->max_count)
+ mem->max_count = mem->count;
+ return 0;
+}
+
+/* Returns 0 if we found it, -1 if we couldn't find it */
+static int kbus_find_msg_id(struct kbus_private_data *priv,
+ struct kbus_msg_id *id)
+{
+ struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+ int ii;
+ for (ii = 0; ii < mem->size; ii++) {
+ if (!kbus_same_message_id(&mem->ids[ii],
+ id->network_id, id->serial_num))
+ continue;
+ kbus_maybe_dbg(priv->dev, " %u Found outstanding "
+ "request %u:%u (count=%d)\n",
+ priv->id, id->network_id,
+ id->serial_num, mem->count);
+ return 0;
+ }
+ kbus_maybe_dbg(priv->dev,
+ " %u Could not find outstanding "
+ "request %u:%u (count=%d)\n",
+ priv->id, id->network_id,
+ id->serial_num, mem->count);
+ return -1;
+}
+
+/* Returns 0 if we found and forgot it, -1 if we couldn't find it */
+static int kbus_forget_msg_id(struct kbus_private_data *priv,
+ struct kbus_msg_id *id)
+{
+ struct kbus_msg_id_mem *mem = &priv->outstanding_requests;
+ int ii;
+ for (ii = 0; ii < mem->size; ii++) {
+ if (!kbus_same_message_id(&mem->ids[ii],
+ id->network_id, id->serial_num))
+ continue;
+
+ mem->ids[ii].network_id = 0;
+ mem->ids[ii].serial_num = 0;
+ mem->count--;
+ kbus_maybe_dbg(priv->dev,
+ " %u Forgot outstanding "
+ "request %u:%u (count<-%d)\n",
+ priv->id, id->network_id,
+ id->serial_num, mem->count);
+
+ return 0;
+ }
+ kbus_maybe_dbg(priv->dev,
+ " %u Could not forget outstanding "
+ "request %u:%u (count<-%d)\n",
+ priv->id, id->network_id,
+ id->serial_num, mem->count);
+ return -1;
+}
+
+/* A message is a reply iff 'in_reply_to' is non-zero */
+static int kbus_message_is_reply(struct kbus_msg *msg)
+{
+ return !kbus_same_message_id(&msg->in_reply_to, 0, 0);
+}
+
+/*
+ * Build a KBUS synthetic message/exception. We assume no data.
+ *
+ * The message built is a 'pointy' message.
+ *
+ * 'msg_name' is copied.
+ *
+ * Use kbus_free_message() to free this message when it is finished with.
+ */
+static struct kbus_msg
+*kbus_build_kbus_message(struct kbus_dev *dev,
+ char *msg_name,
+ u32 from,
+ u32 to, struct kbus_msg_id in_reply_to)
+{
+ struct kbus_msg *new_msg;
+ struct kbus_name_ptr *name_ref;
+
+ size_t msg_name_len = strlen(msg_name);
+ char *msg_name_copy;
+
+ new_msg = kmalloc(sizeof(*new_msg), GFP_KERNEL);
+ if (!new_msg) {
+ dev_err(dev->dev, "Cannot kmalloc synthetic message\n");
+ return NULL;
+ }
+
+ msg_name_copy = kmalloc(msg_name_len + 1, GFP_KERNEL);
+ if (!msg_name_copy) {
+ dev_err(dev->dev, "Cannot kmalloc synthetic message's name\n");
+ kfree(new_msg);
+ return NULL;
+ }
+
+ strncpy(msg_name_copy, msg_name, msg_name_len);
+ msg_name_copy[msg_name_len] = '\0';
+
+ name_ref = kbus_wrap_name_in_ref(msg_name_copy);
+ if (!name_ref) {
+ dev_err(dev->dev, "Cannot kmalloc synthetic message's string ref\n");
+ kfree(new_msg);
+ kfree(msg_name_copy);
+ return NULL;
+ }
+
+ memset(new_msg, 0, sizeof(*new_msg));
+
+ new_msg->from = from;
+ new_msg->to = to;
+ new_msg->in_reply_to = in_reply_to;
+ new_msg->flags = KBUS_BIT_SYNTHETIC;
+ new_msg->name_ref = name_ref;
+ new_msg->name_len = msg_name_len;
+
+ new_msg->id.serial_num = kbus_next_serial_num(dev);
+
+ return new_msg;
+}
+
+/*
* Given a message name, is it valid?
*
* We have nothing to say on maximum length.
@@ -130,6 +516,682 @@ static int kbus_bad_message_name(char *name, size_t name_len)
}

/*
+ * Is a message name wildcarded?
+ *
+ * We assume it is already checked to be a valid name
+ *
+ * Returns 1 if it is, 0 if not. In other words, returns 1
+ * if the name is not a valid destination.
+ */
+static int kbus_wildcarded_message_name(char *name, size_t name_len)
+{
+ return name[name_len - 1] == '*' || name[name_len - 1] == '%';
+}
+
+/*
+ * Is a message name legitimate for writing/sending?
+ *
+ * This is an omnibus call of the last two checks, with error output.
+ *
+ * Returns 0 if it's OK, 1 if it's naughty
+ */
+static int kbus_invalid_message_name(struct kbus_dev *dev,
+ char *name, size_t name_len)
+{
+ if (kbus_bad_message_name(name, name_len)) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " (send) message name '%.*s' is not allowed\n",
+ current->pid, current->comm, (int)name_len, name);
+ return 1;
+ }
+ if (kbus_wildcarded_message_name(name, name_len)) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " (send) sending to wildcards not allowed, "
+ "message name '%.*s'\n",
+ current->pid, current->comm, (int)name_len, name);
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Does this message name match the given binding?
+ *
+ * The binding may be a normal message name, or a wildcard.
+ *
+ * We assume that both names are legitimate.
+ */
+static int kbus_message_name_matches(char *name, size_t name_len, char *other)
+{
+ size_t other_len = strlen(other);
+
+ if (other[other_len - 1] == '*' || other[other_len - 1] == '%') {
+ char *rest = name + other_len - 1;
+ size_t rest_len = name_len - other_len + 1;
+
+ /*
+ * If we have '$.Fred.*', then we need at least '$.Fred.X'
+ * to match
+ */
+ if (name_len < other_len)
+ return false;
+ /*
+ * Does the name match all of the wildcard except the
+ * last character?
+ */
+ if (strncmp(other, name, other_len - 1))
+ return false;
+
+ /* '*' matches anything at all, so we're done */
+ if (other[other_len - 1] == '*')
+ return true;
+
+ /* '%' only matches if we don't have another dot */
+ if (strnchr(rest, rest_len, '.'))
+ return false;
+ else
+ return true;
+ } else {
+ if (name_len != other_len)
+ return false;
+ else
+ return !strncmp(name, other, name_len);
+ }
+}
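
The matching rules above ('*' matches any remainder, '%' matches a remainder containing no further dot, anything else must match exactly) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch's code: demo_name_matches() is a hypothetical name, memchr() stands in for the kernel's strnchr(), and the binding is assumed to be a non-empty, already validated name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Does 'name' (of length 'name_len') match 'binding'?
 * 'binding' is NUL-terminated and may end in '*' or '%'.
 */
static bool demo_name_matches(const char *name, size_t name_len,
			      const char *binding)
{
	size_t b_len = strlen(binding);
	char last = binding[b_len - 1];

	/* Not a wildcard: lengths and contents must be identical */
	if (last != '*' && last != '%')
		return name_len == b_len && !strncmp(name, binding, name_len);

	/* '$.Fred.*' needs at least '$.Fred.X' to match */
	if (name_len < b_len)
		return false;

	/* Everything before the wildcard character must match */
	if (strncmp(binding, name, b_len - 1))
		return false;

	/* '*' accepts any remainder */
	if (last == '*')
		return true;

	/* '%' accepts the remainder only if it has no further dot */
	return memchr(name + b_len - 1, '.', name_len - b_len + 1) == NULL;
}

static void demo_name_matches_selfcheck(void)
{
	assert(demo_name_matches("$.Fred.Jim", 10, "$.Fred.*"));
	assert(demo_name_matches("$.Fred.Jim.Bob", 14, "$.Fred.*"));
	assert(demo_name_matches("$.Fred.Jim", 10, "$.Fred.%"));
	assert(!demo_name_matches("$.Fred.Jim.Bob", 14, "$.Fred.%"));
	assert(!demo_name_matches("$.Fred.", 7, "$.Fred.*"));
	assert(demo_name_matches("$.Fred", 6, "$.Fred"));
}
```

Checking the length before computing the remainder avoids forming a pointer or length from a name that is shorter than the binding.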
+
+/*
+ * Check if a message read by kbus_write() is well formed
+ *
+ * Return 0 if a message is well-formed, negative otherwise.
+ */
+static int kbus_check_message_written(struct kbus_dev *dev,
+ struct kbus_write_msg *this)
+{
+ struct kbus_message_header *user_msg;
+
+ if (this == NULL) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Tried to check NULL message\n",
+ current->pid, current->comm);
+ return -EINVAL;
+ }
+
+ user_msg = (struct kbus_message_header *)&this->user_msg;
+
+ if (user_msg->start_guard != KBUS_MSG_START_GUARD) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " message start guard is %08x, not %08x\n",
+ current->pid, current->comm,
+ user_msg->start_guard, KBUS_MSG_START_GUARD);
+ return -EINVAL;
+ }
+ if (user_msg->end_guard != KBUS_MSG_END_GUARD) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " message end guard is %08x, not %08x\n",
+ current->pid, current->comm,
+ user_msg->end_guard, KBUS_MSG_END_GUARD);
+ return -EINVAL;
+ }
+
+ if (user_msg->name_len == 0) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Message name length is 0\n",
+ current->pid, current->comm);
+ return -EINVAL;
+ }
+ if (user_msg->name_len > KBUS_MAX_NAME_LEN) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Message name length is %u, more than %u\n",
+ current->pid, current->comm,
+ user_msg->name_len, KBUS_MAX_NAME_LEN);
+ return -ENAMETOOLONG;
+ }
+
+ if (user_msg->name == NULL) {
+ if (user_msg->data != NULL) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Message name is inline, data is not\n",
+ current->pid, current->comm);
+ return -EINVAL;
+ }
+ } else {
+ if (user_msg->data == NULL && user_msg->data_len != 0) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Message data is inline, name is not\n",
+ current->pid, current->comm);
+ return -EINVAL;
+ }
+ }
+
+ if (user_msg->data_len == 0 && user_msg->data != NULL) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Message data length is 0, but data pointer is set\n",
+ current->pid, current->comm);
+ return -EINVAL;
+ }
+
+ /* It's not legal to set both ALL_OR_WAIT and ALL_OR_FAIL */
+ if ((user_msg->flags & KBUS_BIT_ALL_OR_WAIT) &&
+ (user_msg->flags & KBUS_BIT_ALL_OR_FAIL)) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Message cannot have both ALL_OR_WAIT and "
+ "ALL_OR_FAIL set\n",
+ current->pid, current->comm);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/*
+ * Output a description of an in-kernel message
+ */
+static void kbus_maybe_report_message(struct kbus_dev *dev __maybe_unused,
+ struct kbus_msg *msg __maybe_unused)
+{
+ if (msg->data_len) {
+ struct kbus_data_ptr *data_p = msg->data_ref;
+ uint8_t *part0 __maybe_unused = (uint8_t *) data_p->parts[0];
+ kbus_maybe_dbg(dev, "=== %u:%u '%.*s'"
+ " to %u from %u in-reply-to %u:%u orig %u,%u "
+ "final %u,%u flags %04x:%04x"
+ " data/%u<in%u> %02x.%02x.%02x.%02x\n",
+ msg->id.network_id, msg->id.serial_num,
+ msg->name_len, msg->name_ref->name,
+ msg->to, msg->from,
+ msg->in_reply_to.network_id, msg->in_reply_to.serial_num,
+ msg->orig_from.network_id, msg->orig_from.local_id,
+ msg->final_to.network_id, msg->final_to.local_id,
+ (msg->flags & 0xFFFF0000) >> 16,
+ (msg->flags & 0x0000FFFF), msg->data_len,
+ data_p->num_parts, part0[0], part0[1], part0[2],
+ part0[3]);
+ } else {
+ kbus_maybe_dbg(dev, "=== %u:%u '%.*s'"
+ " to %u from %u in-reply-to %u:%u orig %u,%u "
+ "final %u,%u flags %04x:%04x\n",
+ msg->id.network_id, msg->id.serial_num,
+ msg->name_len, msg->name_ref->name,
+ msg->to, msg->from,
+ msg->in_reply_to.network_id, msg->in_reply_to.serial_num,
+ msg->orig_from.network_id, msg->orig_from.local_id,
+ msg->final_to.network_id, msg->final_to.local_id,
+ (msg->flags & 0xFFFF0000) >> 16,
+ (msg->flags & 0x0000FFFF));
+ }
+}
+
+/*
+ * Copy a message, doing whatever is deemed necessary.
+ *
+ * Copies the message header, and also copies the message name and any
+ * data. The message must be a 'pointy' message with reference counted
+ * name and data.
+ */
+static struct kbus_msg *kbus_copy_message(struct kbus_dev *dev,
+ struct kbus_msg *old_msg)
+{
+ struct kbus_msg *new_msg;
+
+ new_msg = kmalloc(sizeof(*new_msg), GFP_KERNEL);
+ if (!new_msg) {
+ dev_err(dev->dev, "Cannot kmalloc copy of message header\n");
+ return NULL;
+ }
+ /* memcpy() cannot fail, so there is nothing to check here */
+ memcpy(new_msg, old_msg, sizeof(*new_msg));
+
+ /* In case of error before we're finished... */
+ new_msg->name_ref = NULL;
+ new_msg->data_ref = NULL;
+
+ new_msg->name_ref = kbus_raise_name_ref(old_msg->name_ref);
+
+ if (new_msg->data_len)
+ /* Take a new reference to the data */
+ new_msg->data_ref = kbus_raise_data_ref(old_msg->data_ref);
+ return new_msg;
+}
+
+/*
+ * Free a message.
+ *
+ * Also dereferences the message name and any message data.
+ */
+static void kbus_free_message(struct kbus_msg *msg)
+{
+ if (msg->name_ref)
+ kbus_lower_name_ref(msg->name_ref);
+ msg->name_len = 0;
+ msg->name_ref = NULL;
+
+ if (msg->data_len && msg->data_ref)
+ kbus_lower_data_ref(msg->data_ref);
+
+ msg->data_len = 0;
+ msg->data_ref = NULL;
+ kfree(msg);
+}
+
+static void kbus_empty_read_msg(struct kbus_private_data *priv)
+{
+ struct kbus_read_msg *this = &(priv->read);
+ int ii;
+
+ if (this->msg) {
+ kbus_free_message(this->msg);
+ this->msg = NULL;
+ }
+
+ for (ii = 0; ii < KBUS_NUM_PARTS; ii++) {
+ this->parts[ii] = NULL;
+ this->lengths[ii] = 0;
+ }
+ this->which = 0;
+ this->pos = 0;
+ this->ref_data_index = 0;
+}
+
+static void kbus_empty_write_msg(struct kbus_private_data *priv)
+{
+ struct kbus_write_msg *this = &priv->write;
+ if (this->msg) {
+ kbus_free_message(this->msg);
+ this->msg = NULL;
+ }
+
+ if (this->ref_name) {
+ kbus_lower_name_ref(this->ref_name);
+ this->ref_name = NULL;
+ }
+
+ if (this->ref_data) {
+ kbus_lower_data_ref(this->ref_data);
+ this->ref_data = NULL;
+ }
+
+ this->is_finished = false;
+ this->pos = 0;
+ this->which = 0;
+}
+
+/*
+ * Copy the given message, and add it to the end of the queue.
+ *
+ * This is the *only* way of adding a message to a queue. It shall remain so.
+ *
+ * We assume the message has been checked for sanity.
+ *
+ * 'msg' is the message to add to the queue.
+ *
+ * 'binding' is a pointer to the KBUS message name binding that caused the
+ * message to be added.
+ *
+ * 'for_replier' is true if this particular message is being pushed to the
+ * message's replier's queue. Specifically, it's true if this is a Reply
+ * to this Ksock, or a Request aimed at this Ksock (as Replier).
+ *
+ * Returns 0 if all goes well, or -EFAULT/-ENOMEM if we can't allocate
+ * data structures.
+ *
+ * May also return negative values if the message is mis-named or malformed,
+ * at least at the moment.
+ */
+static int kbus_push_message(struct kbus_private_data *priv,
+ struct kbus_msg *msg,
+ struct kbus_message_binding *binding,
+ int for_replier)
+{
+ struct list_head *queue = &priv->message_queue;
+ struct kbus_msg *new_msg = NULL;
+ struct kbus_message_queue_item *item;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Pushing message onto queue (%s)\n",
+ priv->id, for_replier ? "replier" : "listener");
+
+ new_msg = kbus_copy_message(priv->dev, msg);
+ if (!new_msg)
+ return -EFAULT;
+
+ item = kmalloc(sizeof(*item), GFP_KERNEL);
+ if (!item) {
+ dev_err(priv->dev->dev, "Cannot kmalloc new message item\n");
+ kbus_free_message(new_msg);
+ return -ENOMEM;
+ }
+ kbus_maybe_report_message(priv->dev, new_msg);
+
+ if (for_replier && (KBUS_BIT_WANT_A_REPLY & msg->flags)) {
+ /*
+ * This message wants a reply, and is for the message's
+ * replier, so they need to be told that they are to reply to
+ * this message
+ */
+ new_msg->flags |= KBUS_BIT_WANT_YOU_TO_REPLY;
+ kbus_maybe_dbg(priv->dev,
+ " Setting WANT_YOU_TO_REPLY, "
+ "flags %08x\n",
+ new_msg->flags);
+ } else {
+ /*
+ * The recipient is *not* the replier for this message,
+ * so it is not responsible for replying.
+ */
+ new_msg->flags &= ~KBUS_BIT_WANT_YOU_TO_REPLY;
+ }
+
+ /* And join it up... */
+ item->msg = new_msg;
+ item->binding = binding;
+
+ /* By default, we're using the list as a FIFO, so we want to add our
+ * new message to the end (just before the first item). However, if the
+ * URGENT flag is set, then we instead want to add it to the start.
+ */
+ if (msg->flags & KBUS_BIT_URGENT) {
+ kbus_maybe_dbg(priv->dev, " Message is URGENT\n");
+ list_add(&item->list, queue);
+ } else {
+ list_add_tail(&item->list, queue);
+ }
+
+ priv->message_count++;
+
+ if (!kbus_same_message_id(&msg->in_reply_to, 0, 0)) {
+ /*
+ * If it's a reply (and this will include a synthetic reply,
+ * since we're checking the "in_reply_to" field) then the
+ * original sender has now had its request satisfied.
+ */
+ int retval = kbus_forget_msg_id(priv, &msg->in_reply_to);
+
+ if (retval)
+ /* But there's not much we can do about it */
+ dev_err(priv->dev->dev,
+ "%u Error forgetting "
+ "outstanding request %u:%u\n",
+ priv->id, msg->in_reply_to.network_id,
+ msg->in_reply_to.serial_num);
+ }
+
+ /* And indicate that there is something available to read */
+ wake_up_interruptible(&priv->read_wait);
+
+ kbus_maybe_dbg(priv->dev,
+ "%u Leaving %d message%s in queue\n",
+ priv->id, priv->message_count,
+ priv->message_count == 1 ? "" : "s");
+
+ return 0;
+}
+
+/*
+ * Generate a synthetic message, and add it to the recipient's message queue.
+ *
+ * This is to be used when a Reply is not going to be generated
+ * by the intended Replier. Since we don't want KBUS itself to block on
+ * (trying to) SEND a message to someone not expecting it, I don't think
+ * there are any other occasions when it is useful.
+ *
+ * 'from' is the id of the recipient who has gone away, not received the
+ * message, or whatever.
+ *
+ * 'to' is the 'from' for the message we're bouncing (or whatever). This
+ * needs to be local (it cannot be on another network), so we don't specify
+ * the network id.
+ *
+ * 'in_reply_to' should be the message id of that same message.
+ *
+ * Note that the message is essentially a Reply, so it only goes to the
+ * original Sender.
+ *
+ * Doesn't return anything since I can't think of anything useful to do if it
+ * goes wrong.
+ */
+static void kbus_push_synthetic_message(struct kbus_dev *dev,
+ u32 from,
+ u32 to,
+ struct kbus_msg_id in_reply_to,
+ char *name)
+{
+ struct kbus_private_data *priv = NULL;
+ struct kbus_msg *new_msg;
+
+ /* Who sent the original message? They are the recipient of our reply */
+ priv = kbus_find_open_ksock(dev, to);
+ if (!priv) {
+ dev_err(dev->dev,
+ "pid %u [%s] Cannot send synthetic reply to %u,"
+ " as they are gone\n", current->pid, current->comm, to);
+ return;
+ }
+
+ kbus_maybe_dbg(priv->dev, " Pushing synthetic message '%s'"
+ " onto queue for %u\n", name, to);
+
+ /*
+ * Note that we do not check if the destination queue is full
+ * - we're going to trust that the "keep enough room in the
+ * message queue for a reply to each request" mechanism does
+ * its job properly.
+ */
+
+ new_msg = kbus_build_kbus_message(dev, name, from, to, in_reply_to);
+ if (!new_msg)
+ return;
+
+ (void)kbus_push_message(priv, new_msg, NULL, false);
+ /* ignore retval; we can't do anything useful if this goes wrong */
+
+ /* kbus_push_message takes a copy of our message */
+ kbus_free_message(new_msg);
+}
+
+/*
+ * Pop the next message off our queue.
+ *
+ * Returns a pointer to the message, or NULL if there is no next message.
+ */
+static struct kbus_msg *kbus_pop_message(struct kbus_private_data *priv)
+{
+ struct list_head *queue = &priv->message_queue;
+ struct kbus_message_queue_item *item;
+ struct kbus_msg *msg = NULL;
+
+ kbus_maybe_dbg(priv->dev, " %u Popping message from queue\n",
+ priv->id);
+
+ if (list_empty(queue))
+ return NULL;
+
+ /* Retrieve the next message */
+ item = list_first_entry(queue, struct kbus_message_queue_item, list);
+
+ /* And straightway remove it from the list */
+ list_del(&item->list);
+
+ priv->message_count--;
+
+ msg = item->msg;
+ kfree(item);
+
+ /* If doing that made us go from no-room to some-room, wake up */
+ if (priv->message_count == (priv->max_messages - 1))
+ wake_up_interruptible(&priv->dev->write_wait);
+
+ kbus_maybe_report_message(priv->dev, msg);
+ kbus_maybe_dbg(priv->dev,
+ " %u Leaving %d message%s in queue\n",
+ priv->id, priv->message_count,
+ priv->message_count == 1 ? "" : "s");
+
+ return msg;
+}
+
+/*
+ * Empty a message queue. Send synthetic messages for any outstanding
+ * request messages that are now not going to be delivered/replied to.
+ */
+static void kbus_empty_message_queue(struct kbus_private_data *priv)
+{
+ struct list_head *queue = &priv->message_queue;
+ struct kbus_message_queue_item *ptr;
+ struct kbus_message_queue_item *next;
+
+ kbus_maybe_dbg(priv->dev, " %u Emptying message queue\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, queue, list) {
+ struct kbus_msg *msg = ptr->msg;
+ int is_OUR_request = (KBUS_BIT_WANT_YOU_TO_REPLY & msg->flags);
+
+ kbus_maybe_report_message(priv->dev, msg);
+
+ /*
+ * If it wanted a reply (from us), let the sender know it's
+ * going away (but take care not to send a message to
+ * ourselves, by accident!)
+ */
+ if (is_OUR_request && msg->to != priv->id)
+ kbus_push_synthetic_message(priv->dev, priv->id,
+ msg->from, msg->id,
+ KBUS_MSG_NAME_REPLIER_GONEAWAY);
+
+ list_del(&ptr->list);
+ kbus_free_message(ptr->msg);
+ /* The queue item itself must be freed too, as in kbus_pop_message() */
+ kfree(ptr);
+
+ priv->message_count--;
+ }
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Leaving %d message%s in queue\n",
+ priv->id, priv->message_count,
+ priv->message_count == 1 ? "" : "s");
+}
+
+/*
+ * Add a message to the list of messages read by the replier, but still needing
+ * a reply.
+ */
+static int kbus_reply_needed(struct kbus_private_data *priv,
+ struct kbus_msg *msg)
+{
+ struct list_head *queue = &priv->replies_unsent;
+ struct kbus_unreplied_item *item;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Adding message %u:%u to unsent "
+ "replies list\n",
+ priv->id, msg->id.network_id,
+ msg->id.serial_num);
+
+ item = kmalloc(sizeof(*item), GFP_KERNEL);
+ if (!item) {
+ dev_err(priv->dev->dev, "Cannot kmalloc reply-needed item\n");
+ return -ENOMEM;
+ }
+
+ item->id = msg->id;
+ item->from = msg->from;
+ item->name_len = msg->name_len;
+ /*
+ * It seems sensible to use a reference to the name. I believe
+ * we are safe to do this because we have the message "in hand".
+ */
+ item->name_ref = kbus_raise_name_ref(msg->name_ref);
+
+ list_add(&item->list, queue);
+
+ priv->num_replies_unsent++;
+
+ if (priv->num_replies_unsent > priv->max_replies_unsent)
+ priv->max_replies_unsent = priv->num_replies_unsent;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Leaving %d message%s unreplied-to\n",
+ priv->id, priv->num_replies_unsent,
+ priv->num_replies_unsent == 1 ? "" : "s");
+
+ return 0;
+}
+
+/*
+ * Remove a message from the list of (read) messages needing a reply
+ *
+ * Returns 0 on success, -1 if it could not find the message
+ */
+static int kbus_reply_now_sent(struct kbus_private_data *priv,
+ struct kbus_msg_id *msg_id)
+{
+ struct list_head *queue = &priv->replies_unsent;
+ struct kbus_unreplied_item *ptr;
+ struct kbus_unreplied_item *next;
+
+ list_for_each_entry_safe(ptr, next, queue, list) {
+ if (!kbus_same_message_id(&ptr->id,
+ msg_id->network_id,
+ msg_id->serial_num))
+ continue;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Reply to %u:%u %.*s now sent\n",
+ priv->id, msg_id->network_id,
+ msg_id->serial_num, ptr->name_len, ptr->name_ref->name);
+
+ list_del(&ptr->list);
+ kbus_lower_name_ref(ptr->name_ref);
+ kfree(ptr);
+
+ priv->num_replies_unsent--;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Leaving %d message%s unreplied-to\n",
+ priv->id, priv->num_replies_unsent,
+ priv->num_replies_unsent == 1 ? "" : "s");
+
+ return 0;
+ }
+
+ dev_err(priv->dev->dev, "%u Could not find message %u:%u in unsent "
+ "replies list\n",
+ priv->id, msg_id->network_id, msg_id->serial_num);
+ return -1;
+}
+
+/*
+ * Empty our "replies unsent" queue. Send synthetic messages for any
+ * request messages that are now not going to be replied to.
+ */
+static void kbus_empty_replies_unsent(struct kbus_private_data *priv)
+{
+ struct list_head *queue = &priv->replies_unsent;
+ struct kbus_unreplied_item *ptr;
+ struct kbus_unreplied_item *next;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Emptying unreplied messages list\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, queue, list) {
+
+ kbus_push_synthetic_message(priv->dev, priv->id,
+ ptr->from, ptr->id,
+ KBUS_MSG_NAME_REPLIER_IGNORED);
+
+ list_del(&ptr->list);
+ kbus_lower_name_ref(ptr->name_ref);
+ kfree(ptr);
+
+ priv->num_replies_unsent--;
+ }
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Leaving %d message%s unreplied-to\n",
+ priv->id, priv->num_replies_unsent,
+ priv->num_replies_unsent == 1 ? "" : "s");
+}
+
+/*
* Find out who, if anyone, is bound as a replier to the given message name.
*
* Returns 1 if we found a replier, 0 if we did not (but all went well), and
@@ -156,12 +1218,139 @@ static int kbus_find_replier(struct kbus_dev *dev,
strncmp(name, ptr->name, name_len))
continue;

- kbus_maybe_dbg(dev, " '%.*s' has replier %u\n",
- ptr->name_len, ptr->name, ptr->bound_to_id);
- *bound_to = ptr->bound_to;
- return 1;
+ kbus_maybe_dbg(dev, " '%.*s' has replier %u\n",
+ ptr->name_len, ptr->name, ptr->bound_to_id);
+ *bound_to = ptr->bound_to;
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Find out who, if anyone, is bound as listener/replier to this message name.
+ *
+ * 'listeners' is an array of (pointers to) listener bindings. It may be NULL
+ * (if there are no listeners or if there was an error). It is up to the caller
+ * to free it. It does not include (pointers to) any replier binding.
+ *
+ * If there is also a replier for this message, then 'replier' will be (a
+ * pointer to) its binding, otherwise it will be NULL. The replier will not be
+ * in the 'listeners' array, so the caller must check both.
+ *
+ * Note that a particular listener may be present more than once, if that
+ * particular listener has bound to the message more than once (but no
+ * *binding* will be represented more than once).
+ *
+ * Returns the number of listeners found (i.e., the length of the array), or a
+ * negative value if something went wrong. This is a bit clumsy, because the
+ * caller needs to check the return value *and* the 'replier' value, but there
+ * is only one caller, so...
+ */
+static int kbus_find_listeners(struct kbus_dev *dev,
+ struct kbus_message_binding **listeners[],
+ struct kbus_message_binding **replier,
+ u32 name_len, char *name)
+{
+ int count = 0;
+ int array_size = KBUS_INIT_LISTENER_ARRAY_SIZE;
+ struct kbus_message_binding *ptr;
+ struct kbus_message_binding *next;
+
+ enum kbus_replier_type replier_type = UNSET;
+ enum kbus_replier_type new_replier_type = UNSET;
+
+ kbus_maybe_dbg(dev,
+ " Looking for listeners/repliers for '%.*s'\n",
+ name_len, name);
+
+ *listeners =
+ kmalloc(array_size * sizeof(struct kbus_message_binding *),
+ GFP_KERNEL);
+ if (!(*listeners))
+ return -ENOMEM;
+
+ *replier = NULL;
+
+ list_for_each_entry_safe(ptr, next, &dev->bound_message_list, list) {
+
+ if (!kbus_message_name_matches(name, name_len, ptr->name))
+ continue;
+
+ kbus_maybe_dbg(dev, " Name '%.*s' matches "
+ "'%s' for %s %u\n",
+ name_len, name, ptr->name,
+ ptr->is_replier ? "replier" : "listener",
+ ptr->bound_to_id);
+
+ if (ptr->is_replier) {
+ /* It *may* be the replier for this message */
+ size_t last_char = strlen(ptr->name) - 1;
+ if (ptr->name[last_char] == '*')
+ new_replier_type = WILD_STAR;
+ else if (ptr->name[last_char] == '%')
+ new_replier_type = WILD_PERCENT;
+ else
+ new_replier_type = SPECIFIC;
+
+ kbus_maybe_dbg(dev,
+ " ..previous replier was %u "
+ "(%s), looking at %u (%s)\n",
+ ((*replier) == NULL ? 0 :
+ (*replier)->bound_to_id),
+ kbus_replier_type_name(replier_type),
+ ptr->bound_to_id,
+ kbus_replier_type_name(new_replier_type));
+
+ /*
+ * If this is the first replier, just remember
+ * it. Otherwise, if it's more specific than
+ * our previous replier, remember it instead.
+ */
+ if (*replier == NULL ||
+ new_replier_type > replier_type) {
+
+ if (*replier)
+ kbus_maybe_dbg(dev,
+ " ..new replier %u (%s)\n",
+ ptr->bound_to_id,
+ kbus_replier_type_name(
+ new_replier_type));
+
+ *replier = ptr;
+ replier_type = new_replier_type;
+ } else {
+ if (*replier)
+ kbus_maybe_dbg(dev,
+ " ..keeping replier %u (%s)\n",
+ (*replier)->bound_to_id,
+ kbus_replier_type_name(replier_type));
+ }
+ } else {
+ /* It is a listener */
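+ if (count == array_size) {
+ u32 new_size = kbus_next_size(array_size);
+ struct kbus_message_binding **new_array;
+
+ kbus_maybe_dbg(dev, " XXX listener "
+ "array size %d -> %u\n",
+ array_size, new_size);
+
+ /* Don't lose the old array if krealloc fails */
+ new_array = krealloc(*listeners,
+ sizeof(**listeners) * new_size,
+ GFP_KERNEL);
+ if (!new_array) {
+ kfree(*listeners);
+ *listeners = NULL;
+ return -ENOMEM;
+ }
+ *listeners = new_array;
+ array_size = new_size;
+ }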
+ (*listeners)[count++] = ptr;
+ }
}
- return 0;
+
+ kbus_maybe_dbg(dev, " Found %d listener%s%s for '%.*s'\n",
+ count, (count == 1 ? "" : "s"),
+ (*replier == NULL ? "" : " and a replier"),
+ name_len, name);
+
+ return count;
}
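
The replier selection in kbus_find_listeners() keeps the most specific matching binding: an exact name beats a '%' wildcard, which beats a '*' wildcard, and among equally specific bindings the first one found is kept. A userspace sketch of just that rule follows. The enum ordering here is an assumption (the real enum kbus_replier_type is defined elsewhere in the patch series and is only known to compare "more specific" as greater), and all demo_ names are hypothetical. The bindings passed in are assumed to have already matched the message name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed ordering: a larger value means a more specific binding */
enum demo_replier_type {
	DEMO_UNSET,
	DEMO_WILD_STAR,
	DEMO_WILD_PERCENT,
	DEMO_SPECIFIC
};

/* Classify a non-empty binding name by its last character */
static enum demo_replier_type demo_classify(const char *binding)
{
	char last = binding[strlen(binding) - 1];

	if (last == '*')
		return DEMO_WILD_STAR;
	if (last == '%')
		return DEMO_WILD_PERCENT;
	return DEMO_SPECIFIC;
}

/* Return the most specific binding, or NULL if there are none */
static const char *demo_pick_replier(const char *const *bindings, size_t n)
{
	const char *best = NULL;
	enum demo_replier_type best_type = DEMO_UNSET;
	size_t i;

	for (i = 0; i < n; i++) {
		enum demo_replier_type type = demo_classify(bindings[i]);

		/* First candidate, or strictly more specific than the last */
		if (best == NULL || type > best_type) {
			best = bindings[i];
			best_type = type;
		}
	}
	return best;
}

static void demo_pick_replier_selfcheck(void)
{
	const char *const all[] = { "$.Fred.*", "$.Fred.Jim", "$.Fred.%" };
	const char *const wild[] = { "$.*", "$.Fred.%" };

	assert(strcmp(demo_pick_replier(all, 3), "$.Fred.Jim") == 0);
	assert(strcmp(demo_pick_replier(wild, 2), "$.Fred.%") == 0);
	assert(demo_pick_replier(NULL, 0) == NULL);
}
```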

/*
@@ -248,6 +1437,69 @@ static struct kbus_message_binding
}

/*
+ * Forget any messages (in our queue) that were only in the queue because of
+ * the binding we're removing.
+ *
+ * If the message was a request (needing a reply) generate an appropriate
+ * synthetic message.
+ */
+static void kbus_forget_matching_messages(struct kbus_private_data *priv,
+ struct kbus_message_binding *binding)
+{
+ struct list_head *queue = &priv->message_queue;
+ struct kbus_message_queue_item *ptr;
+ struct kbus_message_queue_item *next;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Forgetting matching messages\n", priv->id);
+
+ list_for_each_entry_safe(ptr, next, queue, list) {
+ struct kbus_msg *msg = ptr->msg;
+ int is_OUR_request = (KBUS_BIT_WANT_YOU_TO_REPLY & msg->flags);
+
+ /*
+ * If this message was not added to the queue because of this
+ * binding, then we are not interested in it...
+ */
+ if (ptr->binding != binding)
+ continue;
+
+ kbus_maybe_dbg(priv->dev,
+ " Deleting message from queue\n");
+ kbus_maybe_report_message(priv->dev, msg);
+
+ /*
+ * If it wanted a reply (from us), let the sender know it's
+ * going away (but take care not to send a message to
+ * ourselves, by accident!)
+ */
+ if (is_OUR_request && msg->to != priv->id) {
+
+ kbus_maybe_dbg(priv->dev, " >>> is_OUR_request,"
+ " sending fake reply\n");
+ kbus_maybe_report_message(priv->dev, msg);
+ kbus_push_synthetic_message(priv->dev, priv->id,
+ msg->from, msg->id,
+ KBUS_MSG_NAME_REPLIER_UNBOUND);
+ }
+
+ list_del(&ptr->list);
+ kbus_free_message(ptr->msg);
+ /* The queue item itself must be freed too, as in kbus_pop_message() */
+ kfree(ptr);
+
+ priv->message_count--;
+
+ /* If that made us go from no-room to some-room, wake up */
+ if (priv->message_count == (priv->max_messages - 1))
+ wake_up_interruptible(&priv->dev->write_wait);
+ }
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Leaving %d message%s in queue\n",
+ priv->id, priv->message_count,
+ priv->message_count == 1 ? "" : "s");
+}
+
+/*
* Remove an existing binding.
*
* Returns 0 if all went well, a negative value if it did not.
@@ -273,9 +1525,15 @@ static int kbus_forget_binding(struct kbus_dev *dev,
(binding->is_replier ? 'R' : 'L'),
binding->name_len, binding->name);

+ /* And forget any messages we now shouldn't receive */
+ kbus_forget_matching_messages(priv, binding);
+
/*
- * If we supported sending messages (yet), we'd need to forget
- * any messages in our queue that match this binding.
+ * We carefully don't try to do anything about requests that
+ * have already been read - the fact that the user has unbound
+ * from receiving new messages with this name doesn't imply
+ * anything about whether they're going to reply to requests
+ * (with that name) which they've already read.
*/

/* And remove the binding once that has been done. */
@@ -358,6 +1616,27 @@ static int kbus_remember_open_ksock(struct kbus_dev *dev,
}

/*
+ * Retrieve the pointer to an open file's data
+ *
+ * Return NULL if we can't find it.
+ */
+static struct kbus_private_data *kbus_find_open_ksock(struct kbus_dev *dev,
+ u32 id)
+{
+ struct kbus_private_data *ptr;
+ struct kbus_private_data *next;
+
+ list_for_each_entry_safe(ptr, next, &dev->open_ksock_list, list) {
+ if (id == ptr->id) {
+ kbus_maybe_dbg(dev, " Found open Ksock %u\n", id);
+ return ptr;
+ }
+ }
+ kbus_maybe_dbg(dev, " Could not find open Ksock %u\n", id);
+ return NULL;
+}
+
+/*
* Remove an open file remembrance.
*
* Returns 0 if all went well, -EINVAL if we couldn't find the open Ksock
@@ -446,38 +1725,820 @@ static int kbus_open(struct inode *inode, struct file *filp)
priv->num_replies_unsent = 0;
priv->max_replies_unsent = 0;

+ if (kbus_init_msg_id_memory(priv)) {
+ kbus_empty_read_msg(priv);
+ kfree(priv);
+ return -EFAULT;
+ }
INIT_LIST_HEAD(&priv->message_queue);
INIT_LIST_HEAD(&priv->replies_unsent);

- (void)kbus_remember_open_ksock(dev, priv);
+ init_waitqueue_head(&priv->read_wait);
+
+ /* Note that we immediately have a space available for a message */
+ wake_up_interruptible(&dev->write_wait);
+
+ (void)kbus_remember_open_ksock(dev, priv);
+
+ filp->private_data = priv;
+
+ mutex_unlock(&dev->mux);
+
+ kbus_maybe_dbg(dev, "%u OPEN\n", priv->id);
+
+ return 0;
+}
+
+static int kbus_release(struct inode *inode __always_unused, struct file *filp)
+{
+ int retval2 = 0;
+ struct kbus_private_data *priv = filp->private_data;
+ struct kbus_dev *dev = priv->dev;
+
+ if (mutex_lock_interruptible(&dev->mux))
+ return -ERESTARTSYS;
+
+ kbus_maybe_dbg(dev, "%u RELEASE\n", priv->id);
+
+ kbus_empty_read_msg(priv);
+ kbus_empty_write_msg(priv);
+
+ kbus_empty_msg_id_memory(priv);
+
+ kbus_empty_message_queue(priv);
+ kbus_forget_my_bindings(priv);
+ kbus_empty_replies_unsent(priv);
+ retval2 = kbus_forget_open_ksock(dev, priv->id);
+ kfree(priv);
+
+ mutex_unlock(&dev->mux);
+
+ return retval2;
+}
+
+/*
+ * Determine the private data for the given listener/replier id.
+ *
+ * Return NULL if we can't find it.
+ */
+static struct kbus_private_data
+*kbus_find_private_data(struct kbus_private_data *our_priv,
+ struct kbus_dev *dev, u32 id)
+{
+ struct kbus_private_data *l_priv;
+ if (id == our_priv->id) {
+ /* Heh, it's us, we know who we are! */
+ kbus_maybe_dbg(dev, " -- Id %u is us\n", id);
+
+ l_priv = our_priv;
+ } else {
+ /* OK, look it up */
+ kbus_maybe_dbg(dev, " -- Looking up id %u\n", id);
+
+ l_priv = kbus_find_open_ksock(dev, id);
+ }
+ return l_priv;
+}
+
+/*
+ * Determine if the specified recipient has room for a message in their queue
+ *
+ * - 'priv' is the recipient
+ * - 'what' is a string describing them (e.g., "sender", "replier"), just
+ * for use in debugging/grumbling
+ * - if 'is_reply' is true, then we're checking for a Reply message,
+ * which we already know is expected by the specified recipient.
+ */
+static int kbus_queue_is_full(struct kbus_private_data *priv,
+ char *what __maybe_unused, int is_reply)
+{
+ /*
+ * When figuring out how "full" the message queue is, we need
+ * to take account of the messages already in the queue (!),
+ * and also the replies that still need to be written to the
+ * queue.
+ *
+ * Of course, if we're checking because we want to send one
+ * of the Replies that we are keeping room for, we need to
+ * remember to account for that!
+ */
+ int already_accounted_for = priv->message_count +
+ priv->outstanding_requests.count;
+
+ if (is_reply)
+ already_accounted_for--;
+
+ kbus_maybe_dbg(priv->dev,
+ " %u Message queue: count %d + "
+ "outstanding %d %s= %d, max %d\n",
+ priv->id, priv->message_count,
+ priv->outstanding_requests.count,
+ (is_reply ? "-1 " : ""), already_accounted_for,
+ priv->max_messages);
+
+ if (already_accounted_for < priv->max_messages) {
+ return false;
+ } else {
+ kbus_maybe_dbg(priv->dev,
+ " Message queue for %s %u is full"
+ " (%u+%u%s >= %u messages)\n", what, priv->id,
+ priv->message_count,
+ priv->outstanding_requests.count,
+ (is_reply ? "-1" : ""), priv->max_messages);
+ return true;
+ }
+}
+
+/*
+ * Actually write to anyone interested in this message.
+ *
+ * Remember that the caller is going to free the message data after
+ * calling us, on the assumption that we're taking a copy...
+ *
+ * Returns 0 on success.
+ *
+ * If the message is a Request, and there is no replier for it, then we return
+ * -EADDRNOTAVAIL.
+ *
+ * If the message is a Reply, and the sender is no longer connected (it has
+ * released its Ksock), then we return -EADDRNOTAVAIL.
+ *
+ * If the message couldn't be sent because some of the targets (those that we
+ * *have* to deliver to) had full queues, then it will return -EAGAIN or
+ * -EBUSY. If -EAGAIN is returned, then the caller should try again later, if
+ * -EBUSY then it should not.
+ *
+ * Otherwise, it returns a negative value for error.
+ */
+static int kbus_write_to_recipients(struct kbus_private_data *priv,
+ struct kbus_dev *dev,
+ struct kbus_msg *msg)
+{
+ struct kbus_message_binding **listeners = NULL;
+ struct kbus_message_binding *replier = NULL;
+ struct kbus_private_data *reply_to = NULL;
+ ssize_t retval = 0;
+ int num_listeners;
+ int ii;
+ int num_sent = 0; /* # successfully "sent" */
+
+ int all_or_fail = msg->flags & KBUS_BIT_ALL_OR_FAIL;
+ int all_or_wait = msg->flags & KBUS_BIT_ALL_OR_WAIT;
+
+ kbus_maybe_dbg(priv->dev, " all_or_fail %d, all_or_wait %d\n",
+ all_or_fail, all_or_wait);
+
+ /*
+ * Remember that
+ * (a) a listener may occur more than once in our array, and
+ * (b) we have 0 or 1 repliers, but
+ * (c) the replier is *not* one of the listeners.
+ */
+ num_listeners = kbus_find_listeners(dev, &listeners, &replier,
+ msg->name_len, msg->name_ref->name);
+ if (num_listeners < 0) {
+ kbus_maybe_dbg(priv->dev,
+ " Error %d finding listeners\n",
+ num_listeners);
+
+ retval = num_listeners;
+ goto done_sending;
+ }
+
+ /*
+ * In general, we don't mind if no-one is listening, but
+ *
+ * a. If we want a reply, we want there to be a replier
+ * b. If we *are* a reply, we want there to be an original sender
+ * c. If we have the "to" field set, and we want a reply, then we
+ * want that specific replier to exist
+ *
+ * We can check the first of those immediately.
+ */
+
+ if ((msg->flags & KBUS_BIT_WANT_A_REPLY) && replier == NULL) {
+ kbus_maybe_dbg(priv->dev,
+ " Message wants a reply, "
+ "but no replier\n");
+ retval = -EADDRNOTAVAIL;
+ goto done_sending;
+ }
+
+ /* And we need to add it to the queue for each interested party */
+
+ /*
+ * ===================================================================
+ * Check if the proposed recipients *can* receive
+ * ===================================================================
+ */
+
+ /*
+ * Are we replying to a sender's request?
+ * Replies are unusual in that the recipient will not normally have
+ * bound to the appropriate message name.
+ */
+ if (kbus_message_is_reply(msg)) {
+ kbus_maybe_dbg(priv->dev,
+ " Considering sender-of-request %u\n",
+ msg->to);
+
+ reply_to = kbus_find_private_data(priv, dev, msg->to);
+ if (reply_to == NULL) {
+ kbus_maybe_dbg(priv->dev,
+ " Can't find sender-of-request"
+ " %u\n", msg->to);
+
+ /* We can't find the original Sender */
+ retval = -EADDRNOTAVAIL;
+ goto done_sending;
+ }
+
+ /* Are they expecting this reply? */
+ if (kbus_find_msg_id(reply_to, &msg->in_reply_to)) {
+ /* No, so we aren't allowed to send it */
+ retval = -ECONNREFUSED;
+ goto done_sending;
+ }
+
+ if (kbus_queue_is_full(reply_to, "sender-of-request", true)) {
+ if (all_or_wait)
+ retval = -EAGAIN; /* try again later */
+ else
+ retval = -EBUSY;
+ goto done_sending;
+ }
+ }
+
+ /* Repliers only get request messages */
+ if (replier && !(msg->flags & KBUS_BIT_WANT_A_REPLY))
+ replier = NULL;
+
+ /*
+ * And even then, only if they have room in their queue
+ * Note that it is *always* fatal (to this send) if we can't
+ * add a Request to a Replier's queue -- we just need to figure
+ * out what sort of error to return
+ */
+ if (replier) {
+ kbus_maybe_dbg(priv->dev, " Considering replier %u\n",
+ replier->bound_to_id);
+ /*
+ * If the 'to' field was set, then we only want to send it if
+ * it is *that* specific replier (and otherwise we want to fail
+ * with "that's the wrong person for this (stateful) request").
+ */
+ if (msg->to && (replier->bound_to_id != msg->to)) {
+
+ kbus_maybe_dbg(priv->dev, " ..Request to %u,"
+ " but replier is %u\n", msg->to,
+ replier->bound_to_id);
+
+ retval = -EPIPE; /* Well, sort of */
+ goto done_sending;
+ }
+
+ if (kbus_queue_is_full(replier->bound_to, "replier", false)) {
+ if (all_or_wait)
+ retval = -EAGAIN; /* try again later */
+ else
+ retval = -EBUSY;
+ goto done_sending;
+ }
+ }
+
+ for (ii = 0; ii < num_listeners; ii++) {
+
+ kbus_maybe_dbg(priv->dev, " Considering listener %u\n",
+ listeners[ii]->bound_to_id);
+
+ if (kbus_queue_is_full
+ (listeners[ii]->bound_to, "listener", false)) {
+ if (all_or_wait) {
+ retval = -EAGAIN; /* try again later */
+ goto done_sending;
+ } else if (all_or_fail) {
+ retval = -EBUSY;
+ goto done_sending;
+ } else {
+ /* For now, just ignore *this* listener */
+ listeners[ii] = NULL;
+ continue;
+ }
+ }
+ }
+
+ /*
+ * ===================================================================
+ * Actually send the messages
+ * ===================================================================
+ */
+
+ /*
+ * Remember that kbus_push_message takes a copy of the message for us.
+ *
+ * This is inefficient, since otherwise we could keep a single copy of
+ * the message (or at least the message header) and just bump a
+ * reference count for each "use" of the message name/data.
+ *
+ * However, it also allows us to easily set the "needs a reply" flag
+ * (and associated data) when sending a "needs a reply" message to a
+ * replier, and *unset* the same when sending said message to "just"
+ * listeners...
+ *
+ * Be careful if altering this...
+ */
+
+ /*
+ * We know that kbus_push_message() can return 0 or -EFAULT.
+ * It seems sensible to treat the latter as a "local" error, as it
+ * means that our internals have gone wrong. Thus we don't need to
+ * generate a message for it.
+ */
+
+ /* If it's a reply message and we've got someone to reply to, send it */
+ if (reply_to) {
+ retval = kbus_push_message(reply_to, msg, NULL, true);
+ if (retval == 0) {
+ num_sent++;
+ /*
+ * In which case, we *have* sent this reply,
+ * and can forget about needing to do so
+ * (there's not much we can do with an error
+ * in this, so just ignore it)
+ */
+ (void)kbus_reply_now_sent(priv, &msg->in_reply_to);
+ } else {
+ goto done_sending;
+ }
+ }
+
+ /* If it's a request, and we've got a replier for it, send it */
+ if (replier) {
+ retval =
+ kbus_push_message(replier->bound_to, msg, replier, true);
+ if (retval)
+ goto done_sending;
+
+ num_sent++;
+ /* And we'll need a reply for that, thank you */
+ retval = kbus_remember_msg_id(priv, &msg->id);
+ if (retval)
+ /*
+ * Out of memory - what *can* we do?
+ * (basically, nothing, it's all gone horribly
+ * wrong)
+ */
+ goto done_sending;
+ }
+
+ /* For each listener, if they're still interested, send it */
+ for (ii = 0; ii < num_listeners; ii++) {
+ struct kbus_message_binding *listener = listeners[ii];
+ if (listener) {
+ retval = kbus_push_message(listener->bound_to, msg,
+ listener, false);
+ if (retval == 0)
+ num_sent++;
+ else
+ goto done_sending;
+ }
+ }
+
+ retval = 0;
+
+done_sending:
+ kfree(listeners);
+ return retval;
+}
+
+/*
+ * Handle moving over the next chunk of data bytes from the user.
+ */
+static int kbus_write_data_parts(struct kbus_private_data *priv,
+ const char __user *buf,
+ size_t buf_pos, size_t bytes_to_use)
+{
+ struct kbus_write_msg *this = &(priv->write);
+
+ u32 num_parts = this->ref_data->num_parts;
+ size_t local_count = bytes_to_use;
+ size_t local_buf_pos = 0;
+
+ while (local_count) {
+ unsigned ii = this->ref_data_index;
+ unsigned this_part_len;
+ size_t sofar, needed, to_use;
+
+ unsigned *lengths = this->ref_data->lengths;
+ unsigned long *parts = this->ref_data->parts;
+
+ if (ii == num_parts - 1)
+ this_part_len = this->ref_data->last_page_len;
+ else
+ this_part_len = KBUS_PART_LEN;
+
+ sofar = lengths[ii];
+
+ needed = this_part_len - sofar;
+ to_use = min(needed, local_count);
+
+ if (copy_from_user((char *)parts[ii] + sofar,
+ buf + buf_pos + local_buf_pos, to_use)) {
+ dev_err(priv->dev->dev, "copy from data failed"
+ " (part %d: %u of %u to %p + %u)\n",
+ this->ref_data_index,
+ (unsigned)to_use, (unsigned)local_count,
+ (void *)parts[ii], (unsigned)sofar);
+ return -EFAULT;
+ }
+
+ lengths[ii] += to_use;
+ local_count -= to_use;
+ local_buf_pos += to_use;
+
+ if (lengths[ii] == this_part_len) {
+ /* This part is full */
+ this->ref_data_index++;
+ }
+ }
+ return 0;
+}
+
+/*
+ * Handle moving over the next chunk of bytes from the user to our message.
+ *
+ * 'buf' is the buffer of data the user gave us.
+ *
+ * 'buf_pos' is the offset in that buffer from which we are to take bytes.
+ * We alter that by how many bytes we do take.
+ *
+ * 'count' is the number of bytes we're still to take from 'buf'. We also
+ * alter 'count' by how many bytes we do take (downwards).
+ */
+static int kbus_write_parts(struct kbus_private_data *priv,
+ const char __user *buf,
+ size_t *buf_pos, size_t *count)
+{
+ struct kbus_write_msg *this = &(priv->write);
+ ssize_t retval = 0;
+
+ size_t bytes_needed; /* ...to fill the current part */
+ size_t bytes_to_use; /* ...from the user's data */
+
+ struct kbus_msg *msg = this->msg;
+ struct kbus_message_header *user_msg =
+ (struct kbus_message_header *)&this->user_msg;
+
+ if (this->is_finished) {
+ dev_err(priv->dev->dev, "pid %u [%s]"
+ " Attempt to write data after the end guard in a"
+ " message (%u extra byte%s) - did you forget to"
+ " 'send'?\n",
+ current->pid, current->comm,
+ (unsigned)*count, *count == 1 ? "" : "s");
+ return -EMSGSIZE;
+ }
+
+ switch (this->which) {
+
+ case KBUS_PART_HDR:
+ bytes_needed = sizeof(*user_msg) - this->pos;
+ bytes_to_use = min(bytes_needed, *count);
+
+ if (copy_from_user((char *)user_msg + this->pos,
+ buf + *buf_pos, bytes_to_use)) {
+ dev_err(priv->dev->dev,
+ "copy from user failed (msg hdr: "
+ "%u of %u to %p + %u)\n",
+ (unsigned)bytes_to_use, (unsigned)*count, msg,
+ this->pos);
+ return -EFAULT;
+ }
+ if (bytes_needed == bytes_to_use) {
+ /*
+ * At this point, we can check the message header makes
+ * sense
+ */
+ retval = kbus_check_message_written(priv->dev, this);
+ if (retval)
+ return retval;
+
+ msg->id = user_msg->id;
+ msg->in_reply_to = user_msg->in_reply_to;
+ msg->to = user_msg->to;
+ msg->from = user_msg->from;
+ msg->orig_from = user_msg->orig_from;
+ msg->final_to = user_msg->final_to;
+ msg->extra = user_msg->extra;
+ msg->flags = user_msg->flags;
+ msg->name_len = user_msg->name_len;
+ msg->data_len = user_msg->data_len;
+ /* Leaving msg->name|data_ref still unset */
+
+ this->user_name_ptr = user_msg->name;
+ this->user_data_ptr = user_msg->data;
+
+ if (user_msg->name)
+ /*
+ * If we're reading a "pointy" message header,
+ * then that's all we need - we shan't try to
+ * copy the message name and any data until the
+ * user says to SEND.
+ */
+ this->is_finished = true;
+ else
+ this->pointers_are_local = true;
+ }
+ break;
+
+ case KBUS_PART_NAME:
+ if (this->ref_name == NULL) {
+ char *name = kmalloc(msg->name_len + 1, GFP_KERNEL);
+ if (!name) {
+ dev_err(priv->dev->dev,
+ "Cannot kmalloc message name\n");
+ return -ENOMEM;
+ }
+ name[msg->name_len] = 0; /* always */
+ name[0] = 0; /* we don't know the name yet */
+ this->ref_name = kbus_wrap_name_in_ref(name);
+ if (!this->ref_name) {
+ kfree(name);
+ dev_err(priv->dev->dev,
+ "Cannot kmalloc ref to message name\n");
+ return -ENOMEM;
+ }
+ }
+ bytes_needed = msg->name_len - this->pos;
+ bytes_to_use = min(bytes_needed, *count);
+
+ if (copy_from_user(this->ref_name->name + this->pos,
+ buf + *buf_pos, bytes_to_use)) {
+ dev_err(priv->dev->dev, "copy from user failed"
+ " (name: %u of %u to %p + %u)\n",
+ (unsigned)bytes_to_use, (unsigned)*count,
+ this->ref_name->name, this->pos);
+ return -EFAULT;
+ }
+ if (bytes_needed == bytes_to_use) {
+ /*
+ * We can check the name now it is in kernel space - we
+ * want to do this before we sort out the data, since
+ * that can involve a *lot* of copying...
+ */
+ if (kbus_invalid_message_name(priv->dev,
+ this->ref_name->name,
+ msg->name_len))
+ return -EBADMSG;
+
+ this->msg->name_ref = this->ref_name;
+ this->ref_name = NULL;
+ }
+ break;
+
+ case KBUS_PART_NPAD:
+ bytes_needed = KBUS_PADDED_NAME_LEN(msg->name_len) -
+ msg->name_len - this->pos;
+ bytes_to_use = min(bytes_needed, *count);
+ break;
+
+ case KBUS_PART_DATA:
+ if (msg->data_len == 0) {
+ bytes_needed = 0;
+ bytes_to_use = 0;
+ break;
+ }
+ if (this->ref_data == NULL) {
+ if (kbus_alloc_ref_data(priv, msg->data_len,
+ &this->ref_data))
+ return -ENOMEM;
+ this->ref_data_index = 0; /* current part index */
+ }
+ /* Overall, how far are we through the message's data? */
+ bytes_needed = msg->data_len - this->pos;
+ bytes_to_use = min(bytes_needed, *count);
+ /* So let's add 'bytes_to_use' bytes to our message data */
+ retval = kbus_write_data_parts(priv, buf, *buf_pos,
+ bytes_to_use);
+ if (retval) {
+ kbus_lower_data_ref(this->ref_data);
+ this->ref_data = NULL;
+ return retval;
+ }
+ if (bytes_needed == bytes_to_use) {
+ /* Hooray - we've finished our data */
+ this->msg->data_ref = this->ref_data;
+ this->ref_data = NULL;
+ }
+ break;
+
+ case KBUS_PART_DPAD:
+ bytes_needed = KBUS_PADDED_DATA_LEN(msg->data_len) -
+ msg->data_len - this->pos;
+ bytes_to_use = min(bytes_needed, *count);
+ break;
+
+ case KBUS_PART_FINAL_GUARD:
+ bytes_needed = sizeof(this->guard) - this->pos;
+ bytes_to_use = min(bytes_needed, *count);
+ if (copy_from_user((char *)(&this->guard) + this->pos,
+ buf + *buf_pos, bytes_to_use)) {
+ dev_err(priv->dev->dev, "copy from user failed"
+ " (final guard: %u of %u to %p + %u)\n",
+ (unsigned)bytes_to_use, (unsigned)*count,
+ &this->guard, this->pos);
+ return -EFAULT;
+ }
+ if (bytes_needed == bytes_to_use) {
+ if (this->guard != KBUS_MSG_END_GUARD) {
+ dev_err(priv->dev->dev, "pid %u [%s]"
+ " (entire) message end guard is "
+ "%08x, not %08x\n",
+ current->pid, current->comm,
+ this->guard, KBUS_MSG_END_GUARD);
+ return -EINVAL;
+ }
+ this->is_finished = true;
+ }
+ break;
+
+ default:
+ dev_err(priv->dev->dev, "Internal error in write: unexpected"
+ " message part %d\n", this->which);
+ return -EFAULT; /* what *should* it be? */
+ }
+
+ *count -= bytes_to_use;
+ *buf_pos += bytes_to_use;
+
+ if (bytes_needed == bytes_to_use) {
+ this->which++;
+ this->pos = 0;
+ } else {
+ this->pos += bytes_to_use;
+ }
+ return 0;
+}
+
+static ssize_t kbus_write(struct file *filp, const char __user *buf,
+ size_t count, loff_t *f_pos __maybe_unused)
+{
+ struct kbus_private_data *priv = filp->private_data;
+ struct kbus_dev *dev = priv->dev;
+ ssize_t retval = 0;
+ size_t bytes_left = count;
+ size_t buf_pos = 0;
+
+ struct kbus_write_msg *this = &priv->write;
+
+ if (mutex_lock_interruptible(&dev->mux))
+ return -EAGAIN;
+
+ kbus_maybe_dbg(priv->dev, "%u WRITE count %u, pos %d\n",
+ priv->id, (unsigned)count, (int)*f_pos);
+
+ /*
+ * If we've already started to try sending a message, we don't
+ * want to continue appending to it
+ */
+ if (priv->sending) {
+ retval = -EALREADY;
+ goto done;
+ }
+
+ if (this->msg == NULL) {
+ /* Clearly, the start of a new message */
+ memset(this, 0, sizeof(*this));

- filp->private_data = priv;
+ /* This is the new (internal) message we're preparing */
+ this->msg = kmalloc(sizeof(*(this->msg)), GFP_KERNEL);
+ if (!this->msg) {
+ retval = -ENOMEM;
+ goto done;
+ }
+ memset(this->msg, 0, sizeof(*(this->msg)));
+ }

- mutex_unlock(&dev->mux);
+ while (bytes_left) {
+ retval = kbus_write_parts(priv, buf, &buf_pos, &bytes_left);
+ if (retval)
+ goto done;
+ }

- kbus_maybe_dbg(dev, "%u OPEN\n", priv->id);
+done:
+ kbus_maybe_dbg(priv->dev, "%u WRITE ends with retval %d\n",
+ priv->id, (int)retval);

- return 0;
+ if (retval)
+ kbus_empty_write_msg(priv);
+ mutex_unlock(&dev->mux);
+ if (retval)
+ return retval;
+ else
+ return count;
}

-static int kbus_release(struct inode *inode __always_unused, struct file *filp)
+static ssize_t kbus_read(struct file *filp, char __user *buf, size_t count,
+ loff_t *f_pos __maybe_unused)
{
- int retval2 = 0;
struct kbus_private_data *priv = filp->private_data;
struct kbus_dev *dev = priv->dev;
+ struct kbus_read_msg *this = &(priv->read);
+ ssize_t retval = 0;
+ u32 len, left;
+ u32 which = this->which;

if (mutex_lock_interruptible(&dev->mux))
- return -ERESTARTSYS;
+ return -EAGAIN; /* Just try again later */

- kbus_maybe_dbg(dev, "%u RELEASE\n", priv->id);
+ kbus_maybe_dbg(priv->dev, "%u READ count %u, pos %d\n",
+ priv->id, (unsigned)count, (int)*f_pos);

- kbus_forget_my_bindings(priv);
- retval2 = kbus_forget_open_ksock(dev, priv->id);
- kfree(priv);
+ if (this->msg == NULL) {
+ /* No message to read at the moment */
+ kbus_maybe_dbg(priv->dev, " Nothing to read\n");
+ retval = 0;
+ goto done;
+ }

- mutex_unlock(&dev->mux);
+ /*
+ * Read each of the parts of a message until we've read 'count'
+ * characters, or run off the end of the message.
+ */
+ while (which < KBUS_NUM_PARTS && count > 0) {
+ if (this->lengths[which] == 0) {
+ kbus_maybe_dbg(priv->dev,
+ " xx which %d, read_len[%d] %u\n",
+ which, which, this->lengths[which]);
+ this->pos = 0;
+ which++;
+ continue;
+ }

- return retval2;
+ if (which == KBUS_PART_DATA) {
+ struct kbus_data_ptr *dp = this->msg->data_ref;
+
+ left = dp->lengths[this->ref_data_index] - this->pos;
+ len = min(left, (u32) count);
+ if (len) {
+ if (copy_to_user(buf,
+ (void *)
+ dp->parts[this->ref_data_index]
+ + this->pos, len)) {
+ dev_err(priv->dev->dev,
+ "error reading from %u\n",
+ priv->id);
+ retval = -EFAULT;
+ goto done;
+ }
+ buf += len;
+ retval += len;
+ count -= len;
+ this->pos += len;
+ }
+
+ if (this->pos == dp->lengths[this->ref_data_index]) {
+ this->pos = 0;
+ this->ref_data_index++;
+ }
+ if (this->ref_data_index == dp->num_parts) {
+ this->pos = 0;
+ which++;
+ }
+ } else {
+ left = this->lengths[which] - this->pos;
+ len = min(left, (u32) count);
+ if (len) {
+ if (copy_to_user(buf,
+ this->parts[which] + this->pos,
+ len)) {
+ dev_err(priv->dev->dev,
+ "error reading from %u\n",
+ priv->id);
+ retval = -EFAULT;
+ goto done;
+ }
+ buf += len;
+ retval += len;
+ count -= len;
+ this->pos += len;
+ }
+
+ if (this->pos == this->lengths[which]) {
+ this->pos = 0;
+ which++;
+ }
+ }
+ }
+
+ if (which < KBUS_NUM_PARTS)
+ this->which = which;
+ else
+ kbus_empty_read_msg(priv);
+
+done:
+ mutex_unlock(&dev->mux);
+ return retval;
}

static int kbus_bind(struct kbus_private_data *priv,
@@ -651,12 +2712,507 @@ done:
return retval;
}

+/*
+ * Make the next message ready for reading by the user.
+ *
+ * Returns 0 if there is no next message, 1 if there is, and a negative value
+ * if there's an error.
+ */
+static int kbus_nextmsg(struct kbus_private_data *priv,
+ unsigned long arg)
+{
+ int retval = 0;
+ struct kbus_msg *msg;
+ struct kbus_read_msg *this = &(priv->read);
+ struct kbus_message_header *user_msg;
+
+ kbus_maybe_dbg(priv->dev, "%u NEXTMSG\n", priv->id);
+
+ /* If we were partway through a message, lose it */
+ if (this->msg) {
+ kbus_maybe_dbg(priv->dev, " Dropping partial message\n");
+ kbus_empty_read_msg(priv);
+ }
+
+ /* Have we got a next message? */
+ msg = kbus_pop_message(priv);
+ if (msg == NULL) {
+ kbus_maybe_dbg(priv->dev, " No next message\n");
+ /*
+ * A return value of 0 means no message, and that's
+ * what __put_user returns for success.
+ */
+ return __put_user(0, (u32 __user *) arg);
+ }
+
+ user_msg = (struct kbus_message_header *)&this->user_hdr;
+ user_msg->start_guard = KBUS_MSG_START_GUARD;
+ user_msg->id = msg->id;
+ user_msg->in_reply_to = msg->in_reply_to;
+ user_msg->to = msg->to;
+ user_msg->from = msg->from;
+ user_msg->orig_from = msg->orig_from;
+ user_msg->final_to = msg->final_to;
+ user_msg->extra = msg->extra;
+ user_msg->flags = msg->flags;
+ user_msg->name_len = msg->name_len;
+ user_msg->data_len = msg->data_len;
+ user_msg->name = NULL;
+ user_msg->data = NULL;
+ user_msg->end_guard = KBUS_MSG_END_GUARD;
+
+ this->msg = msg; /* Remember it so we can free it later */
+
+ this->parts[KBUS_PART_HDR] = (char *)user_msg;
+ this->parts[KBUS_PART_NAME] = msg->name_ref->name;
+ /* direct to the string */
+
+ this->parts[KBUS_PART_NPAD] = static_zero_padding;
+
+ /* The data is treated specially - see kbus_read() */
+ this->parts[KBUS_PART_DATA] = (char *)msg->data_ref;
+
+ this->parts[KBUS_PART_DPAD] = static_zero_padding;
+ this->parts[KBUS_PART_FINAL_GUARD] = (char *)&static_end_guard;
+
+ this->lengths[KBUS_PART_HDR] = sizeof(*user_msg);
+ this->lengths[KBUS_PART_NAME] = msg->name_len;
+ this->lengths[KBUS_PART_NPAD] =
+ KBUS_PADDED_NAME_LEN(msg->name_len) - msg->name_len;
+
+ /* The data is treated specially - see kbus_read() */
+ this->lengths[KBUS_PART_DATA] = msg->data_len;
+ this->lengths[KBUS_PART_DPAD] =
+ KBUS_PADDED_DATA_LEN(msg->data_len) - msg->data_len;
+
+ this->lengths[KBUS_PART_FINAL_GUARD] = 4;
+
+ /* And we'll be starting by writing out the first thing first */
+ this->which = 0;
+ this->pos = 0;
+ this->ref_data_index = 0;
+
+ /*
+ * If the message is a request (to us), then this is the appropriate
+ * point to add it to our list of "requests we've read but not yet
+ * replied to" -- although that *sounds* as if we should be doing it in
+ * kbus_read, we might never get round to reading the content of the
+ * message (we might call NEXTMSG again, or DISCARD), and also
+ * kbus_read can get called multiple times for a single message body.
+ * If we do our remembering here, then we guarantee to get one memory
+ * for each request, as it leaves the message queue and is (in whatever
+ * way) dealt with.
+ */
+ if (msg->flags & KBUS_BIT_WANT_YOU_TO_REPLY) {
+ retval = kbus_reply_needed(priv, msg);
+ /* If it couldn't malloc, there's not much we can do,
+ * it's fairly fatal */
+ if (retval)
+ return retval;
+ }
+
+ retval = __put_user(KBUS_ENTIRE_MSG_LEN(msg->name_len, msg->data_len),
+ (u32 __user *) arg);
+ if (retval)
+ return retval;
+ return 1; /* We had a message */
+}
+
/* How much of the current message is left to read? */
extern u32 kbus_lenleft(struct kbus_private_data *priv)
{
+ struct kbus_read_msg *this = &(priv->read);
+ if (this->msg) {
+ int ii, jj;
+ u32 sofar = 0;
+ u32 total = KBUS_ENTIRE_MSG_LEN(this->msg->name_len,
+ this->msg->data_len);
+ /* Add up the items we've read all of, so far */
+ for (ii = 0; ii < this->which; ii++) {
+ if (ii == KBUS_PART_DATA &&
+ this->msg->data_len > 0) {
+ struct kbus_data_ptr *dp = this->msg->data_ref;
+ for (jj = 0; jj < this->ref_data_index; jj++)
+ sofar += dp->lengths[jj];
+ if (this->ref_data_index < dp->num_parts)
+ sofar += this->pos;
+ } else {
+ sofar += this->lengths[ii];
+ }
+ }
+ /* Plus what we've read of the last one */
+ if (this->which < KBUS_NUM_PARTS) {
+ if (this->which == KBUS_PART_DATA &&
+ this->msg->data_len > 0) {
+ struct kbus_data_ptr *dp = this->msg->data_ref;
+ for (jj = 0; jj < this->ref_data_index; jj++)
+ sofar += dp->lengths[jj];
+ if (this->ref_data_index < dp->num_parts)
+ sofar += this->pos;
+ } else {
+ sofar += this->pos;
+ }
+ }
+ return total - sofar;
+ }
return 0; /* no message => nothing to read */
}

+/*
+ * Allocate the data arrays we need to hold reference-counted data, possibly
+ * spread over multiple pages. 'data_len' is from the message header.
+ *
+ * Note that the 'lengths[n]' field for each page 'n' will be set to zero.
+ */
+static int kbus_alloc_ref_data(struct kbus_private_data *priv __maybe_unused,
+ u32 data_len,
+ struct kbus_data_ptr **ret_ref_data)
+{
+ int num_parts = 0;
+ unsigned long *parts = NULL;
+ unsigned *lengths = NULL;
+ unsigned last_page_len = 0;
+ struct kbus_data_ptr *ref_data = NULL;
+ int as_pages;
+ int ii;
+
+ *ret_ref_data = NULL;
+
+ num_parts = (data_len + KBUS_PART_LEN - 1) / KBUS_PART_LEN;
+
+ /*
+ * To save recalculating the length of the last page every time
+ * we're interested, get it right once and for all.
+ */
+ last_page_len = data_len - (num_parts - 1) * KBUS_PART_LEN;
+
+ kbus_maybe_dbg(priv->dev,
+ "%u Allocate ref data: part=%lu, "
+ "threshold=%lu, data_len %u -> num_parts %d\n",
+ priv->id, KBUS_PART_LEN,
+ KBUS_PAGE_THRESHOLD, data_len, num_parts);
+
+ parts = kmalloc(sizeof(*parts) * num_parts, GFP_KERNEL);
+ if (!parts)
+ return -ENOMEM;
+ lengths = kmalloc(sizeof(*lengths) * num_parts, GFP_KERNEL);
+ if (!lengths) {
+ kfree(parts);
+ return -ENOMEM;
+ }
+
+ if (num_parts == 1 && data_len < KBUS_PAGE_THRESHOLD) {
+ /* A single part in "simple" memory */
+ as_pages = false;
+ parts[0] = (unsigned long)kmalloc(data_len, GFP_KERNEL);
+ if (!parts[0]) {
+ kfree(lengths);
+ kfree(parts);
+ return -ENOMEM;
+ }
+ lengths[0] = 0;
+ } else {
+ /*
+ * One or more pages
+ *
+ * For simplicity, we make all of our pages be full pages.
+ * In theory, we could use the same rules for the last page
+ * as we do if we only have a single page - but for the
+ * moment, we're not bothering.
+ *
+ * This means that the 'last_page_len' is strictly theoretical
+ * for the moment...
+ */
+ as_pages = true;
+ for (ii = 0; ii < num_parts; ii++) {
+ parts[ii] = __get_free_page(GFP_KERNEL);
+ if (!parts[ii]) {
+ int jj;
+ for (jj = 0; jj < ii; jj++)
+ free_page(parts[jj]);
+ kfree(lengths);
+ kfree(parts);
+ return -ENOMEM;
+ }
+ lengths[ii] = 0;
+ }
+ }
+ ref_data = kbus_wrap_data_in_ref(as_pages, num_parts, parts, lengths,
+ last_page_len);
+ if (!ref_data) {
+ int jj;
+ if (as_pages)
+ for (jj = 0; jj < num_parts; jj++)
+ free_page(parts[jj]);
+ else
+ kfree((void *)parts[0]);
+ kfree(lengths);
+ kfree(parts);
+ return -ENOMEM;
+ }
+ *ret_ref_data = ref_data;
+ return 0;
+}
+
+/*
+ * Does what it says on the box - take the user data and promote it to kernel
+ * space, as a reference counted quantity, possibly spread over multiple pages.
+ */
+static int kbus_wrap_user_data(struct kbus_private_data *priv,
+ u32 data_len,
+ void *user_data_ptr,
+ struct kbus_data_ptr **new_data)
+{
+ struct kbus_data_ptr *ref_data = NULL;
+ int num_parts;
+ unsigned long *parts;
+ unsigned *lengths;
+ int ii;
+ uint8_t __user *data_ptr;
+
+ int retval = kbus_alloc_ref_data(priv, data_len, &ref_data);
+ if (retval)
+ return retval;
+
+ num_parts = ref_data->num_parts;
+ lengths = ref_data->lengths;
+ parts = ref_data->parts;
+
+ kbus_maybe_dbg(priv->dev, " @@ copying %s\n",
+ ref_data->as_pages ? "as pages" : "as kmalloc'ed data");
+
+ /* Given all of the *space* for our data, populate it */
+ data_ptr = (void __user *) user_data_ptr;
+ for (ii = 0; ii < num_parts; ii++) {
+ unsigned len;
+ if (ii == num_parts - 1)
+ len = ref_data->last_page_len;
+ else
+ len = KBUS_PART_LEN;
+
+ kbus_maybe_dbg(priv->dev,
+ " @@ %d: copy %u bytes "
+ "to kernel address %lu\n",
+ ii, len, parts[ii]);
+
+ if (copy_from_user((void *)parts[ii], data_ptr, len)) {
+ kbus_lower_data_ref(ref_data);
+ return -EFAULT;
+ }
+ lengths[ii] = len;
+ data_ptr += len;
+ }
+ *new_data = ref_data;
+ return 0;
+}
+
+/*
+ * Given a "pointy" message header, copy the message name and data from
+ * user space into kernel space.
+ *
+ * The message name is copied as a reference-counted string.
+ *
+ * The message data (if any) is copied as reference-counted data.
+ *
+ * Also checks the legality of the message name, since we need the name in
+ * kernel space to do that, but prefer to do the check before copying any
+ * data (which can be expensive).
+ */
+static int kbus_copy_pointy_parts(struct kbus_private_data *priv,
+ struct kbus_write_msg *this)
+{
+ struct kbus_msg *msg = this->msg;
+ char *new_name = NULL;
+ struct kbus_name_ptr *name_ref;
+ struct kbus_data_ptr *new_data = NULL;
+
+ /* First, let's deal with the name */
+ new_name = kmalloc(msg->name_len + 1, GFP_KERNEL);
+ if (!new_name)
+ return -ENOMEM;
+ if (copy_from_user
+ (new_name, (void __user *)this->user_name_ptr, msg->name_len + 1)) {
+ kfree(new_name);
+ return -EFAULT;
+ }
+
+ /*
+ * We can check the name now it is in kernel space - we want
+ * to do this before we sort out the data, since that can involve
+ * a *lot* of copying...
+ */
+ if (kbus_invalid_message_name(priv->dev, new_name, msg->name_len)) {
+ kfree(new_name);
+ return -EBADMSG;
+ }
+ name_ref = kbus_wrap_name_in_ref(new_name);
+ if (!name_ref) {
+ kfree(new_name);
+ return -ENOMEM;
+ }
+
+ /* Now for the data. */
+ if (msg->data_len) {
+ int retval = kbus_wrap_user_data(priv, msg->data_len,
+ this->user_data_ptr,
+ &new_data);
+ if (retval) {
+ kbus_lower_name_ref(name_ref);
+ return retval;
+ }
+ }
+
+ kbus_maybe_dbg(priv->dev, " 'pointy' message normalised\n");
+
+ msg->name_ref = name_ref;
+ msg->data_ref = new_data;
+
+ this->user_name_ptr = NULL;
+ this->user_data_ptr = NULL;
+ this->pointers_are_local = true;
+
+ return 0;
+}
+
+static void kbus_discard(struct kbus_private_data *priv)
+{
+ kbus_empty_write_msg(priv);
+ priv->sending = false;
+}
+
+/*
+ * Returns 0 for success, and a negative value if there's an error.
+ */
+static int kbus_send(struct kbus_private_data *priv,
+ struct kbus_dev *dev, unsigned long arg)
+{
+ ssize_t retval = 0;
+ struct kbus_msg *msg = priv->write.msg;
+
+ kbus_maybe_dbg(priv->dev, "%u SEND\n", priv->id);
+
+ if (priv->write.msg == NULL)
+ return -ENOMSG;
+
+ if (!priv->write.is_finished) {
+ dev_err(priv->dev->dev, "pid %u [%s]"
+ " message not finished (in part %d of message)\n",
+ current->pid, current->comm, priv->write.which);
+ retval = -EINVAL;
+ goto done;
+ }
+
+ /*
+ * Users are not allowed to send messages marked as "synthetic"
+ * (since, after all, if the user sends it, it is not). However,
+ * it's possible that, in good faith, they re-sent a synthetic
+ * message that they received earlier, so we shall take care to
+ * unset the bit, if necessary.
+ */
+ if (KBUS_BIT_SYNTHETIC & msg->flags)
+ msg->flags &= ~KBUS_BIT_SYNTHETIC;
+
+ /*
+ * The "extra" field is reserved for future expansion, so for the
+ * moment we always zero it (this stops anyone from trying to take
+ * advantage of it, and getting caught out when we decide WE want it)
+ */
+ msg->extra = 0;
+
+ /*
+ * The message header is already in kernel space (thanks to kbus_write),
+ * but if it's a "pointy" message, the name and data are not. So let's
+ * fix that.
+ *
+ * Note that we *always* end up with a message header containing
+ * pointers to (copies of) the name and (if given) data, and the
+ * data reference counted, and maybe split over multiple pages.
+ *
+ * Note that if this is a message we already tried to send
+ * earlier, any "pointy" parts would have been copied earlier,
+ * hence the check we actually make.
+ */
+ if (!priv->write.pointers_are_local) {
+ retval = kbus_copy_pointy_parts(priv, &priv->write);
+ if (retval)
+ goto done;
+ }
+
+ /* ================================================================= */
+ /*
+ * If this message is a Request, then we can't send it until/unless
+ * we've got room in our message queue to receive the Reply.
+ *
+ * We do this check here, rather than in kbus_write_to_recipients,
+ * because:
+ *
+ * a) kbus_write_to_recipients gets (re)called by the POLL interface,
+ * and at that stage KBUS *knows* that there is room for the
+ * message concerned (so the checking code would need to know not
+ * to check)
+ *
+ * b) If the check fails, we do not want to consider ourselves in
+ * "sending" state, since we can't afford to block, because it's
+ * *this Ksock* that needs to do some reading to clear the relevant
+ * queue, and it can't do that if it's blocking. So we'd either
+ * need to handle that (somehow), or just do the check here.
+ *
+ * Similarly, we don't finalise the message (put in its "from" and "id"
+ * fields) until we pass this test.
+ */
+ if ((msg->flags & KBUS_BIT_WANT_A_REPLY) &&
+ kbus_queue_is_full(priv, "sender", false)) {
+ dev_err(priv->dev->dev, "%u Unable to send Request because"
+ " no room for a Reply in sender's message queue\n",
+ priv->id);
+ retval = -ENOLCK;
+ goto done;
+ }
+ /* ================================================================= */
+
+ /* So, we're actually ready to SEND! */
+
+ /* The message needs to say it is from us */
+ msg->from = priv->id;
+
+ /*
+ * If we've already tried to send this message earlier (and
+ * presumably failed with -EAGAIN), then we don't need to give
+ * it a message id, because it already has one...
+ */
+ if (!priv->sending) {
+ /* The message seems well formed, give it an id if necessary */
+ if (msg->id.network_id == 0)
+ msg->id.serial_num = kbus_next_serial_num(dev);
+ }
+
+ /* Also, remember this as the "message we last (tried to) send" */
+ priv->last_msg_id_sent = msg->id;
+
+ /*
+ * Figure out who should receive this message, and write it to them
+ */
+ retval = kbus_write_to_recipients(priv, dev, msg);
+
+done:
+ /*
+ * -EAGAIN means we were blocked from sending, and the caller
+ * should try again (as one might expect).
+ */
+ if (retval == -EAGAIN)
+ /* Remember we're still trying to send this message */
+ priv->sending = true;
+ else
+ /* We've now finished with our copy of the message header */
+ kbus_discard(priv);
+
+ if (retval == 0 || retval == -EAGAIN)
+ if (copy_to_user((void __user *)arg, &priv->last_msg_id_sent,
+ sizeof(priv->last_msg_id_sent)))
+ retval = -EFAULT;
+ return retval;
+}
+
static int kbus_maxmsgs(struct kbus_private_data *priv,
unsigned long arg)
{
@@ -800,6 +3356,60 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
retval = kbus_replier(priv, dev, arg);
break;

+ case KBUS_IOC_NEXTMSG:
+ /*
+ * Get the next message ready to be read, and return its
+ * length.
+ *
+ * arg in: none
+ * arg out: number of bytes in next message
+ * retval: 0 if no next message, 1 if there is a next message,
+ * negative value if there's an error.
+ */
+ retval = kbus_nextmsg(priv, arg);
+ break;
+
+ case KBUS_IOC_LENLEFT:
+ /* How many bytes are left to read in the current message? */
+ {
+ u32 left = kbus_lenleft(priv);
+ kbus_maybe_dbg(priv->dev, "%u LENLEFT %u\n",
+ id, left);
+ retval = __put_user(left, (u32 __user *) arg);
+ }
+ break;
+
+ case KBUS_IOC_SEND:
+ /*
+ * Send the current message, we've finished writing it.
+ *
+ * arg in: <ignored>
+ * arg out: the message id of said message
+ * retval: negative for bad message, etc., 0 otherwise
+ */
+ retval = kbus_send(priv, dev, arg);
+ break;
+
+ case KBUS_IOC_DISCARD:
+ /* Throw away the message we're currently writing. */
+ kbus_maybe_dbg(priv->dev, "%u DISCARD\n", id);
+ kbus_discard(priv);
+ break;
+
+ case KBUS_IOC_LASTSENT:
+ /*
+ * What was the message id of the last message written to this
+ * file descriptor? Before any messages have been written to
+ * this file descriptor, this ioctl will return {0,0}.
+ */
+ kbus_maybe_dbg(priv->dev, "%u LASTSENT %u:%u\n", id,
+ priv->last_msg_id_sent.network_id,
+ priv->last_msg_id_sent.serial_num);
+ if (copy_to_user((void __user *)arg, &priv->last_msg_id_sent,
+ sizeof(priv->last_msg_id_sent)))
+ retval = -EFAULT;
+ break;
+
case KBUS_IOC_MAXMSGS:
/*
* Set (and/or query) maximum number of messages in this
@@ -821,6 +3431,14 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
retval = kbus_nummsgs(priv, dev, arg);
break;

+ case KBUS_IOC_UNREPLIEDTO:
+ /* How many Requests (to us) do we still owe Replies to? */
+ kbus_maybe_dbg(priv->dev, "%u UNREPLIEDTO %d\n",
+ id, priv->num_replies_unsent);
+ retval = __put_user(priv->num_replies_unsent,
+ (u32 __user *) arg);
+ break;
+
case KBUS_IOC_VERBOSE:
/*
* Should we output verbose/debug messages?
@@ -842,10 +3460,110 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return retval;
}

+/*
+ * Try sending the (currently waiting to be sent) message
+ *
+ * Returns true if the message has either been successfully sent, or an error
+ * occurred (which has been dealt with) and there is no longer a current
+ * message.
+ *
+ * Returns false if we hit EAGAIN (again) and we're still trying to send the
+ * current message.
+ */
+static int kbus_poll_try_send_again(struct kbus_private_data *priv,
+ struct kbus_dev *dev)
+{
+ int retval;
+ struct kbus_msg *msg = priv->write.msg;
+
+ retval = kbus_write_to_recipients(priv, dev, msg);
+
+ switch (-retval) {
+ case 0: /* All is well, nothing to do */
+ break;
+ case EAGAIN: /* Still blocked by *someone* - nowt to do */
+ break;
+ case EADDRNOTAVAIL:
+ /*
+ * It's a Request and there's no Replier (presumably there was
+ * when the initial SEND was done, but now they've gone away).
+ * A Request *needs* a Reply...
+ */
+ kbus_push_synthetic_message(dev, 0, msg->from, msg->id,
+ KBUS_MSG_NAME_REPLIER_DISAPPEARED);
+ retval = 0;
+ break;
+ default:
+ /*
+ * Send *failed* - what can we do?
+ * Not much, perhaps, but we must ensure that a Request gets
+ * (some sort of) reply
+ */
+ if (msg->flags & KBUS_BIT_WANT_A_REPLY)
+ kbus_push_synthetic_message(dev, 0, msg->from, msg->id,
+ KBUS_MSG_NAME_ERROR_SENDING);
+ retval = 0;
+ break;
+ }
+
+ if (retval == 0) {
+ kbus_discard(priv);
+ return true;
+ }
+ return false;
+}
+
+static unsigned int kbus_poll(struct file *filp, poll_table * wait)
+{
+ struct kbus_private_data *priv = filp->private_data;
+ struct kbus_dev *dev = priv->dev;
+ unsigned mask = 0;
+
+ mutex_lock(&dev->mux);
+
+ kbus_maybe_dbg(priv->dev, "%u POLL\n", priv->id);
+
+ /*
+ * Did I wake up because there's a message available to be read?
+ */
+ if (priv->message_count != 0)
+ mask |= POLLIN | POLLRDNORM; /* readable */
+
+ /*
+ * Did I wake up because someone said they had space for a message on
+ * their message queue (where there wasn't space before)?
+ *
+ * And if that is the case, if we're opened for write and have a
+ * message waiting to be sent, can we now send it?
+ *
+ * The simplest way to find out is just to try again.
+ */
+ if (filp->f_mode & FMODE_WRITE) {
+ int writable = true;
+ if (priv->sending)
+ writable = kbus_poll_try_send_again(priv, dev);
+ if (writable)
+ mask |= POLLOUT | POLLWRNORM;
+ }
+
+ /* Wait until someone has a message waiting to be read */
+ poll_wait(filp, &priv->read_wait, wait);
+
+ /* Wait until someone has a space into which a message can be pushed */
+ if (priv->sending)
+ poll_wait(filp, &dev->write_wait, wait);
+
+ mutex_unlock(&dev->mux);
+ return mask;
+}
+
/* File operations for /dev/kbus<n> */
static const struct file_operations kbus_fops = {
.owner = THIS_MODULE,
+ .read = kbus_read,
+ .write = kbus_write,
.unlocked_ioctl = kbus_ioctl,
+ .poll = kbus_poll,
.open = kbus_open,
.release = kbus_release,
};
@@ -867,6 +3585,8 @@ static void kbus_setup_cdev(struct kbus_dev *dev, int devno)
INIT_LIST_HEAD(&dev->bound_message_list);
INIT_LIST_HEAD(&dev->open_ksock_list);

+ init_waitqueue_head(&dev->write_wait);
+
dev->next_ksock_id = 0;
dev->next_msg_serial_num = 0;

--
1.7.4.1

2011-03-22 19:36:44

by Jonathan Corbet

Subject: Re: [PATCH 00/11] RFC: KBUS messaging subsystem

On Fri, 18 Mar 2011 17:21:09 +0000
Tony Ibbs <[email protected]> wrote:

> KBUS is a lightweight, Linux kernel mediated messaging system,
> particularly intended for use in embedded environments.

I've spent a bit of time looking at this code...this isn't a detailed
review by any stretch, more like a few impressions.

- Why kbus over, say, a user-space daemon and unix-domain sockets? I'm
not sure I see the advantage that comes with putting this into kernel
space.

- The interface is ... creative. If you have to do this in kernel space,
it would be nice to do away with the split write()/ioctl() API for
reading or writing messages. It seems like either a write(), OR an
ioctl() with a message data pointer would suffice; that would cut the
number of syscalls the applications need to make too.

Even better might be to just use the socket API.

- Does anything bound the size of a message fed into the kernel with
write()? I couldn't find it. It seems like an application could
consume arbitrary amounts of kernel memory.

- It would be good to use the kernel's dynamic debugging and tracing
facilities rather than rolling your own.

- There's lots of kmalloc()/memset() pairs that could be kzalloc().

That's as far as I could get for now.

Thanks,

jon

2011-03-23 23:13:52

by Tony Ibbs

Subject: Re: [PATCH 00/11] RFC: KBUS messaging subsystem


On 22 Mar 2011, at 19:36, Jonathan Corbet wrote:

> On Fri, 18 Mar 2011 17:21:09 +0000
> Tony Ibbs <[email protected]> wrote:
>
> > KBUS is a lightweight, Linux kernel mediated messaging system,
> > particularly intended for use in embedded environments.

> - Why kbus over, say, a user-space daemon and unix-domain sockets? I'm
> not sure I see the advantage that comes with putting this into kernel
> space.

Mostly, a kernel module gives us reliability.

In particular, a kernel module allows us to guarantee that a replier
that "goes away" (including crashing) will be detected by KBUS, and
cause a synthetic reply to be sent, so that the sender can know that it
will not get a real reply.

This same guarantee means that the sender end of a stateful dialogue can
be reliably told if the replier end disconnects and (some new version of
it) reconnects - in which case state presumably needs to be
reestablished.

Doing this in userspace would be difficult and unreliable.

There are other problems with userspace daemons, including setting up
many-to-many messaging, message atomicity, and so on. Our past
experience of other people's solutions (previous customers in
particular) is that it is perilously easy to get it wrong in userspace,
and especially to end up with race conditions.

> - The interface is ... creative.

That's very tactfully put.

> If you have to do this in kernel space,
> it would be nice to do away with the split write()/ioctl() API for
> reading or writing messages. It seems like either a write(), OR an
> ioctl() with a message data pointer would suffice; that would cut the
> number of syscalls the applications need to make too.

When the reader is reading a message, using 'read' seems very natural,
and is simple to explain. Because we always return an "entire" message
(i.e., one in which all the message data is in one chunk, rather than
a header pointing to message name and/or data), it also means that
memory handling on return to user space is much simplified. Doing an
ioctl first to find out the length of the message to come is also
simple to explain.

Also, in the case of reading a message, I can see clear advantage
in being able to "stream" the reading of the message data (for a
long and appropriately structured message).

Writing a message *could* be done with 'write' alone. I must admit that
having 'write' detect the end of the message by looking at it feels
wrong, somehow, but that's not a very compelling answer. It is,
however, definitely easier for the user to understand the error if
they try to <send> and get told they haven't written enough data
yet, rather than just waiting for the 'write' to magically complete.

There is also a certain symmetry to using <nextmsg>/'read' and
'write'/<send>, but as you said at the start, it's a bit unusual.

Using an ioctl instead of 'write' would involve a more complex ioctl
than we're otherwise commonly using, would lose the symmetry, and just
didn't feel right. It also means pointer handling for even the simplest
message.

> Even better might be to just use the socket API.

Whilst the current API is a bit odd, trying to use the socket API looked
to us as if it would be a worse fit.

The socket API doesn't seem to match what we wanted KBUS to do
particularly well. It's not, for instance, obvious how to do a 'recv' of
a variable length message that might be quite short or several hundred
KB long - does one 'recv' the header first, and then the body (which
isn't very nice)? Doing a 'next message' ioctl as current KBUS does
would feel really alien in a socket environment.

Of course, we'd still have to invent our own addressing scheme, and our
own ``struct *addr``, and appropriate socket options, and also decide
how the common options should apply or not (for instance, SO_ACCEPTCONN,
SO_BROADCAST). And how to work with accept/listen/bind and all the other
common calls.

Also, lazily on my part, it's fairly obvious how to write a file
interface for the kernel, but the socket API (from the inside) appears
to be more complex, and to have fewer examples with training wheels.

We *could* reimplement in terms of sockets, but I think the code would
get a lot bigger, and I think using the system would be a lot harder to
explain (I don't think the current message name binding mechanisms would
get any clearer, for instance).

And some of the semantics of KBUS (the sending of a message to say that
the expected replier has been replaced by a new one, for instance) seem
to fit oddly with how people expect sockets to work. Or being told that
the far end has gone away, or is not who one expected it to be.

Also, I'm afraid my experience is that people find sockets hard to
understand (not necessarily justifiably), whereas explaining KBUS to its
intended users is fairly simple - one can assume they know about file
interfaces, and people fairly easily accept a few "odd" extra calls. But
that may not be a very compelling reason from the inside of the
kernel...

> - Does anything bound the size of a message fed into the kernel with
> write()? I couldn't find it. It seems like an application could
> consume arbitrary amounts of kernel memory.

That is indeed a misfeature. There should be a default limit, and some
way of changing it.
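
A minimal sketch of what such a bound might look like, written as plain
userspace C rather than kernel code. Everything named toy_* here is
hypothetical, as is the default of 1MB and the choice of -EMSGSIZE; none
of it is taken from the KBUS patches. The check is done per write(), so
a message built up over several writes is refused as soon as it would
exceed the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Arbitrary default for this sketch; not a value from KBUS. */
#define TOY_DEFAULT_MAX_MSG_SIZE (1024u * 1024u)

/* Hypothetical per-device limit, changeable at runtime. */
struct toy_limits {
	size_t max_msg_size;
};

/*
 * so_far is the number of bytes already written for the current message,
 * count is the size of the write being attempted.
 * Returns 0 if the write may proceed, -EMSGSIZE if it would take the
 * message over the limit. The comparison is arranged so that
 * so_far + count is never computed and so cannot overflow.
 */
static int toy_check_write(const struct toy_limits *lim,
			   size_t so_far, size_t count)
{
	assert(lim != NULL);
	if (count > lim->max_msg_size ||
	    so_far > lim->max_msg_size - count)
		return -EMSGSIZE;
	return 0;
}
```

Changing the limit would then just be a matter of updating max_msg_size,
for instance from an ioctl in the same style as the existing MAXMSGS one.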

> - It would be good to use the kernel's dynamic debugging and tracing
> facilities rather than rolling your own.

Mea culpa. KBUS's debug support grew rather erratically, and only
recently got converted to at least using dev_dbg and friends.
Also, I'm not at all sure what the current kernel mechanisms are
(pointers are welcomed, since this is a clear case where normal
kernel conventions should be followed, and I don't know what they are).

> - There's lots of kmalloc()/memset() pairs that could be kzalloc().

And I just missed that.

> That's as far as I could get for now.

Thanks, it's all appreciated, and all makes sense.

(and I should say thank you since I started out writing KBUS with a copy
of Linux Device Drivers beside me, and bookmarks for various LWN
articles. It would all be a lot worse without those).

Hope this all makes sense - it's late here but I shan't have a chance to
reply tomorrow.

Tibs

2011-03-24 18:12:39

by James Chapman

Subject: Re: [PATCH 00/11] RFC: KBUS messaging subsystem

On 23/03/2011 23:13, Tony Ibbs wrote:
>
> On 22 Mar 2011, at 19:36, Jonathan Corbet wrote:
>
>> On Fri, 18 Mar 2011 17:21:09 +0000
>> Tony Ibbs <[email protected]> wrote:
>>
>>> KBUS is a lightweight, Linux kernel mediated messaging system,
>>> particularly intended for use in embedded environments.
>
>> - Why kbus over, say, a user-space daemon and unix-domain sockets? I'm
>> not sure I see the advantage that comes with putting this into kernel
>> space.
>
> Mostly, a kernel module gives us reliability.
>
> In particular, a kernel module allows us to guarantee that a replier
> that "goes away" (including crashing) will be detected by KBUS, and
> cause a synthetic reply to be sent, so that the sender can know that it
> will not get a real reply.
>
> This same guarantee means that the sender end of a stateful dialogue can
> be reliably told if the replier end disconnects and (some new version of
> it) reconnects - in which case state presumably needs to be
> reestablished.
>
> Doing this in userspace would be difficult and unreliable.
>
> There are other problems with userspace daemons, including setting up
> many-to-many messaging, message atomicity, and so on. Our past
> experience of other people's solutions (previous customers in
> particular) is that it is perilously easy to get it wrong in userspace,
> and especially to end up with race conditions.

I don't understand what Kbus really brings either.

With good sockets programming, it is possible to avoid most of the
issues mentioned above. Frameworks like Glib and DBus can also help.

Have you considered other kernel messaging subsystems such as netlink
sockets, connectors, POSIX message queues etc etc if you don't want DBus?

>
>> - The interface is ... creative.
>
> That's very tactfully put.
>
>> If you have to do this in kernel space,
>> it would be nice to do away with the split write()/ioctl() API for
>> reading or writing messages. It seems like either a write(), OR an
>> ioctl() with a message data pointer would suffice; that would cut the
>> number of syscalls the applications need to make too.
>
> When the reader is reading a message, using 'read' seems very natural,
> and is simple to explain. Because we always return an "entire" message
> (i.e., one in which all the message data is in one chunk, rather than
> a header pointing to message name and/or data), it also means that
> memory handling on return to user space is much simplified. Doing an
> ioctl first to find out the length of the message to come is also
> simple to explain.

Eh? Network protocols routinely do this sort of thing with regular
sockets. Read the message header then read the rest when you know how
big the rest is. Or where you know the max size of all possible
messages, do one read into a fixed size buffer.
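
As a rough illustration of the header-then-body pattern described above
(a sketch only: the 4-byte host-order length header and the names
read_full and read_framed are assumptions made for this example, not
part of KBUS or of any particular protocol):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or if the stream ends early.
 */
static int read_full(int fd, void *buf, size_t len)
{
	uint8_t *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return -1;	/* peer closed mid-message */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical framing: a 4-byte length, then that many bytes of body.
 * On success returns a malloc'ed body (caller frees) and sets *len_out.
 * A body longer than max_len is refused before anything is allocated.
 */
static uint8_t *read_framed(int fd, uint32_t max_len, uint32_t *len_out)
{
	uint32_t len;
	uint8_t *body;

	assert(len_out != NULL);
	if (read_full(fd, &len, sizeof(len)) < 0)
		return NULL;
	if (len > max_len)
		return NULL;
	body = malloc(len ? len : 1);	/* avoid a zero-sized malloc */
	if (!body)
		return NULL;
	if (read_full(fd, body, len) < 0) {
		free(body);
		return NULL;
	}
	*len_out = len;
	return body;
}
```

The same two functions work on a stream socket, a pipe or any other
file descriptor that supports read().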

> Also, in the case of reading a message, I can see clear advantage
> in being able to "stream" the reading of the message data (for a
> long and appropriately structured message).
>
> Writing a message *could* be done with 'write' alone. I must admit that
> having 'write' detect the end of the message by looking at it feels
> wrong, somehow, but that's not a very compelling answer. It is,
> however, definitely easier for the user to understand the error if
> they try to <send> and get told they haven't written enough data
> yet, rather than just waiting for the 'write' to magically complete.

A write to a socket would do the same. I don't get the bit about
detecting the end of the message. I think this complexity is coming from
using char devices for message passing.

> There is also a certain symmetry to using <nextmsg>/'read' and
> 'write'/<send>, but as you said at the start, it's a bit unusual.
>
> Using an ioctl instead of 'write' would involve a more complex ioctl
> than we're otherwise commonly using, would lose the symmetry, and just
> didn't feel right. It also means pointer handling for even the simplest
> message.
>
>> Even better might be to just use the socket API.

Agreed.

> Whilst the current API is a bit odd, trying to use the socket API looked
> to us as if it would be a worse fit.
>
> The socket API doesn't seem to match what we wanted KBUS to do
> particularly well. It's not, for instance, obvious how to do a 'recv' of
> a variable length message that might be quite short or several hundred
> KB long - does one 'recv' the header first, and then the body (which
> isn't very nice)? Doing a 'next message' ioctl as current KBUS does
> would feel really alien in a socket environment.
>
> Of course, we'd still have to invent our own addressing scheme, and our
> own ``struct *addr``, and appropriate socket options, and also decide
> how the common options should apply or not (for instance, SO_ACCEPTCONN,
> SO_BROADCAST). And how to work with accept/listen/bind and all the other
> common calls.
>
> Also, lazily on my part, it's fairly obvious how to write a file
> interface for the kernel, but the socket API (from the inside) appears
> to be more complex, and to have fewer examples with training wheels.
>
> We *could* reimplement in terms of sockets, but I think the code would
> get a lot bigger, and I think using the system would be a lot harder to
> explain (I don't think the current message name binding mechanisms would
> get any clearer, for instance).

Why would you need a new socket family?

> And some of the semantics of KBUS (the sending of a message to say that
> the expected replier has been replaced by a new one, for instance) seem
> to fit oddly with how people expect sockets to work. Or being told that
> the far end has gone away, or is not who one expected it to be.

Not really. It is what DBus is all about.

Perhaps KBUS is intended for uses where DBus is too big? Or is it to
help port legacy RTOS apps to Linux? Shudder. :-)

Perhaps I misunderstand what KBUS might do for me. It might be useful to
present two simple apps implementing the same thing with, say, a unix
socket and KBUS, e.g. sending a message reliably to another process and
handling possible errors.

> Also, I'm afraid my experience is that people find sockets hard to
> understand (not necessarily justifiably), whereas explaining KBUS to its
> intended users is fairly simple - one can assume they know about file
> interfaces, and people fairly easily accept a few "odd" extra calls. But
> that may not be a very compelling reason from the inside of the
> kernel...
>
>> - Does anything bound the size of a message fed into the kernel with
>> write()? I couldn't find it. It seems like an application could
>> consume arbitrary amounts of kernel memory.
>
> That is indeed a misfeature. There should be a default limit, and some
> way of changing it.
>
>> - It would be good to use the kernel's dynamic debugging and tracing
>> facilities rather than rolling your own.
>
> Mea culpa. KBUS's debug support grew rather erratically, and only
> recently got converted to at least using dev_dbg and friends.
> Also, I'm not at all sure what the current kernel mechanisms are
> (pointers are welcomed, since this is a clear case where normal
> kernel conventions should be followed, and I don't know what they are).
>
>> - There's lots of kmalloc()/memset() pairs that could be kzalloc().
>
> And I just missed that.
>
>> That's as far as I could get for now.
>
> Thanks, it's all appreciated, and all makes sense.
>
> (and I should say thank you since I started out writing KBUS with a copy
> of Linux Device Drivers beside me, and bookmarks for various LWN
> articles. It would all be a lot worse without those).
>
> Hope this all makes sense - it's late here but I shan't have a chance to
> reply tomorrow.
>
> Tibs


--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

2011-03-27 19:07:22

by Tony Ibbs

Subject: Re: [PATCH 00/11] RFC: KBUS messaging subsystem

On 24 Mar 2011, at 18:03, James Chapman wrote:

> On 23/03/2011 23:13, Tony Ibbs wrote:
> ...stuff I've left out, because it's expanded on below...
>
> I don't understand what Kbus really brings either.

From one angle, KBUS aims to fit the particular needs of the embedded
user:

1. It's relatively small (so it will fit in small environments)
2. It's written in C (so no need for C++ libraries, etc., which are
relatively large)
3. It's meant to be simple to learn (so it should be useful if you don't
have time or resource to learn something arguably more
complex/featureful)
4. It's meant to be simple to use without anything but the kernel
module (i.e., you don't have to use the userspace libraries that
the overall project also provides).

These are limitations of possible use spaces.

We also know, from practical experience, that a hard-pressed engineer
working on a project wants to be able to reach for a messaging solution
that doesn't take more than (say) an afternoon to get to grips with,
because in general it is not anywhere near the main thrust of what
they're doing, and they don't have the time or inclination to become
expert in yet another domain. So it needs to be simple to explain and
use.

(Personally, I see this as the main failing of Dbus - even the tutorial
states up front that it is an incomplete and unfinished document. It
would be good if someone produced documentation that gives the same
enthusiasm for Dbus as the 0mq people have done with their stuff.)

From another angle:

a. KBUS provides deterministic message ordering, so message order is
the same for everyone receiving messages (if A sends a message,
and B sends a message, then any of A, B, C, etc., who are listening
to messages of that name will receive them in the same order.
Consider messages for 'play', 'rewind' and 'pause', for instance.
Or 'left by X', 'forward by Y' and 'up by Z' in a non-open space).
b. It guarantees that you'll get a Reply for a Request, even if that
reply is "you aren't going to get one because the replier has gone
away". This is done via message rather than error code, since it may
be determined some time after the sender has sent the original
message. Callback mechanisms can be used to similar effect, but are a
pain if you don't want them.
c. It is not possible to send a request if the replier for that request
does not have room in its incoming message queue. Nor is it possible
to send a request if the sender does not have room for the (eventual)
reply (ok, that last is perhaps a bit obvious).
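
The queue rules in (c) can be sketched as a toy model in plain C. This
is not the KBUS implementation: the toy_* names and the result codes
are invented for the example, and the model ignores details such as
how room for outstanding Replies is accounted for. The comments note
which errno the posted patch uses in the roughly corresponding case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a Ksock's incoming message queue. */
struct toy_queue {
	size_t count;	/* messages currently queued */
	size_t max;	/* capacity of the queue */
};

static bool toy_queue_is_full(const struct toy_queue *q)
{
	return q->count >= q->max;
}

enum toy_send_result {
	TOY_SEND_OK,
	TOY_SENDER_FULL,	/* no room for the Reply (patch: -ENOLCK) */
	TOY_NO_REPLIER,		/* nobody bound (patch: -EADDRNOTAVAIL) */
	TOY_REPLIER_FULL,	/* try again later (patch: -EAGAIN) */
};

/*
 * Decide whether a Request may be sent. replier is NULL if nobody is
 * bound as Replier for the message name. On success the Request is
 * counted against the Replier's queue.
 */
static enum toy_send_result toy_send_request(const struct toy_queue *sender,
					     struct toy_queue *replier)
{
	assert(sender != NULL);
	if (toy_queue_is_full(sender))
		return TOY_SENDER_FULL;
	if (replier == NULL)
		return TOY_NO_REPLIER;
	if (toy_queue_is_full(replier))
		return TOY_REPLIER_FULL;
	replier->count++;
	return TOY_SEND_OK;
}
```

The sender's own queue is checked first, since only the sender can fix
that by reading; a full Replier queue is the case where retrying later
makes sense.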

Clearly, if the incoming and outgoing message queues are going to be
managed in this way (i.e., according to (c)), there needs to be a
central daemon of some sort. Making sure that a daemon in userspace is
reliable is hard, and making sure that it knows when one of its clients
dies is also difficult (or perhaps "messy" would be a better word).
This last is particularly easy in the kernel, of course. Kernel modules
also come with a host of requirements and facilities that make them
easier to make reliable. Being in the kernel also brings well-debugged
handling for scalability issues, threading and multi-processing, all of
which are difficult to get absolutely right in userspace.

Some things which perhaps are not made as clear in the documentation as
they might be:

i. KBUS is not client/server based. All senders/listeners (KBUS
clients) are peers in this respect. Any Ksock can send messages
(well, if it's opened for write), any Ksock can receive messages.
ii. Anyone can choose to listen to (receive) a message of a particular
name. This is not dependent on who sends such a message.
iii. A single Ksock can choose to be a replier for a message of a
particular name. This does not limit who may send request messages
of that name, just who can reply.
iv. From the above, it should be obvious that any request can be seen
(received) by any listeners to that same message name, even though
they can't reply to it. This makes debugging/logging the message
traffic on a system particularly simple.
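
Points (ii) to (iv) can be sketched as a small binding table. This is a
toy model only: the toy_* names, the fixed-size table, the message names
used with it and the use of -EADDRINUSE for a second Replier are all
assumptions made for the example, and name matching is exact (no
wildcards).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_BINDINGS 16

struct toy_binding {
	unsigned ksock_id;	/* who made the binding */
	bool is_replier;	/* Replier, or just a Listener? */
	const char *name;	/* message name bound to */
};

struct toy_bus {
	struct toy_binding b[TOY_MAX_BINDINGS];
	size_t n;
};

/*
 * Bind ksock_id to name. Any number of Listeners may bind to a name,
 * but at most one Replier.
 */
static int toy_bind(struct toy_bus *bus, unsigned ksock_id,
		    bool is_replier, const char *name)
{
	size_t i;

	if (bus->n == TOY_MAX_BINDINGS)
		return -ENOMEM;
	if (is_replier)
		for (i = 0; i < bus->n; i++)
			if (bus->b[i].is_replier &&
			    strcmp(bus->b[i].name, name) == 0)
				return -EADDRINUSE;
	bus->b[bus->n++] = (struct toy_binding){ ksock_id, is_replier, name };
	return 0;
}

/* How many bindings would see a message of this name? */
static size_t toy_count_recipients(const struct toy_bus *bus,
				   const char *name)
{
	size_t i, count = 0;

	for (i = 0; i < bus->n; i++)
		if (strcmp(bus->b[i].name, name) == 0)
			count++;
	return count;
}
```

A Request is delivered to every binding counted here, which is what
makes a passive logger possible: it binds as a Listener and sees the
Request without being able to answer it.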

Also (this is definitely addressed in the documentation), there is
limited wildcarding on message names - basically on the end of the
message name. This can be used when binding to a message name for either
listening or replying.
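
To give a feel for wildcarding on the end of a name, here is a toy
matcher. It is a sketch under assumptions: the trailing '*' spelling,
the function name and the example message names are chosen for
illustration, and the real rules are those in the KBUS documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * A binding that ends in '*' matches any name that starts with the
 * text before the '*'. Any other binding must match the name exactly.
 */
static bool toy_name_matches(const char *binding, const char *name)
{
	size_t blen;

	assert(binding != NULL && name != NULL);
	blen = strlen(binding);
	if (blen > 0 && binding[blen - 1] == '*')
		return strncmp(binding, name, blen - 1) == 0;
	return strcmp(binding, name) == 0;
}
```

With this, a binding of "$.Player.*" would match "$.Player.Pause" but
not "$.Recorder.Pause", and a binding without a wildcard matches only
its own name.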

Finally, KBUS does not address message content. It is not intended to be
an RPC mechanism (which I think is what Dbus, for instance, is mostly
after?). It's up to the user to choose an appropriate way of describing
data (of which there are many good examples). I don't think that's a
particularly contentious point, though, but perhaps worth mentioning.

> With good sockets programming, it is possible to avoid most of the
> issues mentioned

If someone who is a very good socket programmer wrote a user-space
library to reproduce all of the things that KBUS does, then that would
be very good (I'm unhappy with the concept of only solving "most" of the
problems, though). I don't think I could do it (hiding client/server,
handling many-to-many transfers over AF_UNIX, etc., etc.), but then I'd
not claim to be a very good socket programmer. And I'd assume it would
take rather more code (since one is, to an extent, fighting against the
underlying paradigms).

I also don't see how you'd get away without some sort of identity
broker, to indicate which socket(s) belong to whom (since I assume each
Ksock-equivalent would need multiple sockets to handle the different
things that need doing). And that's another moderately fiddly bunch of
code to be tested.

> Frameworks like Glib and DBus can also help.

Glib is, surely, addressing different issues.

Dbus is simply too large for many of the places we want to do messaging.
It is also, as I suggested above, not terribly simple to learn, so it is
not going to be picked up by people as a potential solution when they
"just" want to send messages. (Clearly, if one is wanting to communicate
with systems that already use Dbus, this is a different matter, and I
assume one can use existing examples to leverage oneself in such cases,
and anyway one doesn't have a choice.) It's also not clear to me that
Dbus allows one to do all the things we want to do with KBUS - I stand
willing to be told that I just haven't found the right documentation for
that, though.

> Have you considered other kernel messaging subsystems such as netlink
> sockets, connectors, POSIX message queues etc etc if you don't want DBus?

As I understand it, netlink is explicitly lossy (from the connector
documentation: "messages can be lost due to memory pressure or process'
receiving queue overflowed"). So one would have the issue of handling
acknowledgements and re-requests, and suddenly life is more complicated.

If one were using netlink, I think connector looks like an excellent way
to do it (it seems to have decent documentation, for a start), but it
doesn't alleviate the lossy nature of netlink itself.

In both cases, it's a lot simpler just not to do that - use a non-lossy
approach in the first place.

By 'sockets' I assume you mean over AF_UNIX, and to some extent I hope
I've addressed that above. It didn't seem a very natural fit to what
we're trying to achieve, so we didn't start from there.

Posix message queues are interesting, but I don't see how it would be
possible to do N-to-many messaging using them, without having to write
almost all of KBUS-as-infrastructure anyway. Also, as I understand it,
managing the existence of the exposed filesystem paths corresponding to
message queues in a reliable manner is difficult.

I'm not sure what other technologies are meant by "etc., etc.".

> Why would you need a new socket family?

I was assuming the alternative of writing a kernel module that "spoke"
the sockets API, instead of the file API, but still had the KBUS code
inside it (i.e., the internal management of queues, etc.), just as we've
done with the normal file interface API in current KBUS. As I said
before, we'd then need a new socket family - perhaps AF_KBUS. And I'd
contend that if this were to preserve the intended uses of KBUS, it
would be as "creative" a use of the socket interfaces as our current
code is of the file API, and I think in context even more confusing.

Or we could have taken a Posix message queue style API, which would map
directly to KBUS usage. That, however, would involve new system calls
(i.e., replacing 'mq_' with 'kbus_') and I assume that's absolutely
forbidden.

> Perhaps KBUS is intended for uses where DBus is too big?

Addressed above. It's one of the intentions.

> Or is it to help port legacy RTOS apps to Linux?

I don't understand this point - but assume it is not relevant.

> Perhaps I misunderstand what KBUS might do for me.

I hope the above helps a bit.

> It might be useful to present two simple apps implementing the same
> thing with, say, a unix socket and KBUS, e.g. sending a message
> reliably to another process and handling possible errors.

Indeed, this would be an interesting and valuable exercise. I'm afraid I
don't see that I'm going to have time to produce such a thing in the
near future, though.

The EuroPython 2010 talk on the KBUS website does present some (very)
simple examples of use, albeit in Python, and the kmsg application
presents some very primitive usage (but hardly representative). I'm
afraid the project is sparse on examples at the moment, as most of its
use has been for customers who do not wish their code made public.

All the best,
Tibs

2011-04-15 21:34:59

by Tony Ibbs

Subject: [PATCH] extra/1 Allow setting the maximum KBUS message size


On 22 Mar 2011, at 19:36, Jonathan Corbet wrote:

> - Does anything bound the size of a message fed into the kernel with
> write()? I couldn't find it. It seems like an application could
> consume arbitrary amounts of kernel memory.

This patch provides mechanisms for setting an absolute maximum message
size at compile time, and a per-device maximum at runtime.

The patch is relative to the results of the previous set of patches - I
assume this is better than resubmitting all of them for what is a
relatively small change.

> - It would be good to use the kernel's dynamic debugging and tracing
> facilities rather than rolling your own.
>
> - There's lots of kmalloc()/memset() pairs that could be kzalloc().

I shall address these next, although I'm afraid it may be a few days.
Thanks, by the way, for the timely LWN article on the dynamic debugging
interface.

Signed-off-by: Tony Ibbs <[email protected]>
---
Documentation/Kbus.txt | 15 ++++++++++-
include/linux/kbus_defns.h | 14 ++++++++++-
ipc/Kconfig | 32 ++++++++++++++++++++++++-
ipc/kbus_internal.h | 11 ++++++++
ipc/kbus_main.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/Documentation/Kbus.txt b/Documentation/Kbus.txt
index 7cf723fd6..16828b9 100644
--- a/Documentation/Kbus.txt
+++ b/Documentation/Kbus.txt
@@ -1058,6 +1058,18 @@ header file (``kbus_defns.h``). They are:
Both Python and C bindings provide a useful function to
extract the ``is_bind``, ``binder`` and ``name`` values from
the data.
+:MAXMSGSIZE: Set the maximum size of a KBUS message for this KBUS device,
+ and return the value that is set. This is the size of the
+ largest message that may be written to a KBUS Ksock. Trying
+ to write a longer message will result in an -EMSGSIZE error.
+ An attempt to set this to a value of 0 will just return the current
+ maximum size. Otherwise, the size requested may not be less
+ than 100, or more than the kernel configuration value
+ KBUS_ABS_MAX_MESSAGE_SIZE. The default maximum size is set by
+ the kernel configuration value KBUS_DEF_MAX_MESSAGE_SIZE, and
+ is typically 1024. The size being tested is that returned by
+ the KBUS_ENTIRE_MSG_LEN macro - i.e., the size of an
+ equivalent "entire" message.

/proc/kbus/bindings
-------------------
@@ -1158,7 +1170,8 @@ as values inside the IOError exception.
:EINVAL: Something went wrong (generic error).
:EMSGSIZE: On attempting to write a message: Data was written after
the end of the message (i.e., after the final end guard
- of the message).
+ of the message), or an attempt was made to write a message
+ that is too long (see the MAXMSGSIZE ioctl).
:ENAMETOOLONG: On attempting to bind, unbind or send a message: The message
name is too long.
:ENOENT: On attempting to open a Ksock: There is no such device
diff --git a/include/linux/kbus_defns.h b/include/linux/kbus_defns.h
index 82779a6..29f6f99 100644
--- a/include/linux/kbus_defns.h
+++ b/include/linux/kbus_defns.h
@@ -655,9 +655,21 @@ struct kbus_replier_bind_event_data {
* of the specified values)
*/
#define KBUS_IOC_REPORTREPLIERBINDS _IOWR(KBUS_IOC_MAGIC, 17, char *)
+/*
+ * MAXMSGSIZE - set the maximum size of a KBUS message for this KBUS device.
+ * This may not be set to less than 100, or more than
+ * CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE.
+ * arg (in): __u32, the requested maximum message size, or 0 to just
+ * request what the current limit is, 1 to request the absolute
+ * maximum size.
+ * arg (out): __u32, the maximum message size after this call has
+ * succeeded
+ * retval: 0 for success, negative for failure
+ */
+#define KBUS_IOC_MAXMSGSIZE _IOWR(KBUS_IOC_MAGIC, 18, char *)

/* If adding another IOCTL, remember to increment the next number! */
-#define KBUS_IOC_MAXNR 17
+#define KBUS_IOC_MAXNR 18

#if !__KERNEL__ && defined(__cplusplus)
}
diff --git a/ipc/Kconfig b/ipc/Kconfig
index 808d742..603b2f6 100644
--- a/ipc/Kconfig
+++ b/ipc/Kconfig
@@ -113,5 +113,35 @@ config KBUS_MAX_UNSENT_UNBIND_MESSAGES

If unsure, choose the default.

-endif # KBUS
+config KBUS_ABS_MAX_MESSAGE_SIZE
+ int "Absolute maximum KBUS message size"
+ default 1024
+ range 100 2147483647
+ ---help---
+ This sets the absolute maximum size of an individual KBUS message,
+ that is, the size of the largest KBUS message that may be written
+ to a KBUS device node.
+
+ It is not possible to set the maximum message size greater than
+ this value using the KBUS_IOC_MAXMSGSIZE ioctl.

+ The size is measured as by the KBUS_ENTIRE_MSG_LEN macro, and
+ includes the message header (80 bytes on a 32-bit system).
+
+config KBUS_DEF_MAX_MESSAGE_SIZE
+ int "Default maximum KBUS message size"
+ default 1024
+ range 100 KBUS_ABS_MAX_MESSAGE_SIZE
+ ---help---
+ This sets the default maximum size of an individual KBUS message,
+ that is, the size of the largest KBUS message that may be written
+ to a KBUS device node.
+
+ It may be altered at runtime, for a particular KBUS device, with
+ the KBUS_IOC_MAXMSGSIZE ioctl, up to a limit of
+ KBUS_ABS_MAX_MESSAGE_SIZE.
+
+ The size is measured as by the KBUS_ENTIRE_MSG_LEN macro, and
+ includes the message header (80 bytes on a 32-bit system).
+
+endif # KBUS
diff --git a/ipc/kbus_internal.h b/ipc/kbus_internal.h
index a24fcaf..51d512c 100644
--- a/ipc/kbus_internal.h
+++ b/ipc/kbus_internal.h
@@ -86,6 +86,14 @@
#define CONFIG_KBUS_DEF_NUM_DEVICES 1
#endif

+#ifndef CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE
+#define CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE 1024
+#endif
+
+#ifndef CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE
+#define CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE 1024
+#endif
+
/*
* Our initial array sizes could arguably be made configurable
* for tuning, if we discover this is useful
@@ -685,6 +693,9 @@ struct kbus_dev {
struct list_head unsent_unbind_msg_list;
u32 unsent_unbind_msg_count;
int unsent_unbind_is_tragic;
+
+ /* The maximum message size that may be written to this device */
+ u32 max_message_size;
};

/*
diff --git a/ipc/kbus_main.c b/ipc/kbus_main.c
index e99bfca..64f863a 100644
--- a/ipc/kbus_main.c
+++ b/ipc/kbus_main.c
@@ -615,6 +615,8 @@ static int kbus_check_message_written(struct kbus_dev *dev,
struct kbus_message_header *user_msg =
(struct kbus_message_header *)&this->user_msg;

+ int msg_size;
+
if (this == NULL) {
dev_err(dev->dev, "pid %u [%s]"
" Tried to check NULL message\n",
@@ -683,6 +685,15 @@ static int kbus_check_message_written(struct kbus_dev *dev,
current->pid, current->comm);
return -EINVAL;
}
+
+ msg_size = KBUS_ENTIRE_MSG_LEN(user_msg->name_len, user_msg->data_len);
+ if (msg_size > dev->max_message_size) {
+ dev_err(dev->dev, "pid %u [%s]"
+ " Message size is %d, more than the maximum %u\n",
+ current->pid, current->comm,
+ msg_size, dev->max_message_size);
+ return -EMSGSIZE;
+ }
return 0;
}

@@ -4150,6 +4161,39 @@ static int kbus_set_report_binds(struct kbus_private_data *priv,
return __put_user(old_value, (u32 __user *) arg);
}

+static int kbus_maxmsgsize(struct kbus_private_data *priv,
+ unsigned long arg)
+{
+ int retval = 0;
+ u32 requested_max;
+
+ retval = __get_user(requested_max, (u32 __user *) arg);
+ if (retval)
+ return retval;
+
+ kbus_maybe_dbg(priv->dev, "%u MAXMSGSIZE requests %u (was %u)\n",
+ priv->id, requested_max, priv->dev->max_message_size);
+
+ dev_dbg(priv->dev->dev, " abs max %d, def max %d\n",
+ CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE,
+ CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE);
+
+ /* A value of 0 is a query for the current length */
+ /* A value of 1 is a query for the absolute maximum */
+ if (requested_max == 0)
+ return __put_user(priv->dev->max_message_size,
+ (u32 __user *) arg);
+ else if (requested_max == 1)
+ return __put_user(CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE,
+ (u32 __user *) arg);
+ else if (requested_max < 100 ||
+ requested_max > CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE)
+ return -EINVAL;
+
+ priv->dev->max_message_size = requested_max;
+ return __put_user(priv->dev->max_message_size, (u32 __user *) arg);
+}
+
static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
int err = 0;
@@ -4357,6 +4401,18 @@ static long kbus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
retval = kbus_set_report_binds(priv, dev, arg);
break;

+ case KBUS_IOC_MAXMSGSIZE:
+ /*
+ * Set (and/or query) maximum message size
+ *
+ * arg in: 0 or 1 (for query of current maximum or absolute
+ * maximum) or maximum size wanted
+ * arg out: maximum size allowed
+ * return: 0 means OK, otherwise not OK
+ */
+ retval = kbus_maxmsgsize(priv, arg);
+ break;
+
default:
/* *Should* be redundant, if we got our range checks right */
retval = -ENOTTY;
@@ -4545,6 +4601,7 @@ static int kbus_setup_new_device(int which)
new->index = which;

new->verbose = KBUS_DEFAULT_VERBOSE_SETTING;
+ new->max_message_size = CONFIG_KBUS_DEF_MAX_MESSAGE_SIZE;

new->dev = device_create(kbus_class_p, NULL,
this_devno, NULL, "kbus%d", which);
--
1.7.4.1
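
For anyone who wants to check the argument handling of the MAXMSGSIZE ioctl without a kernel build, the logic of kbus_maxmsgsize() in the patch above can be modelled in userspace roughly as follows. This is only a sketch: model_maxmsgsize() and the two macros are stand-ins invented for the example, with the absolute maximum fixed at the Kconfig default of 1024, and the `dev_max` and `arg` pointers take the place of the per-device limit and the ioctl's in/out argument.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for CONFIG_KBUS_ABS_MAX_MESSAGE_SIZE (Kconfig default) */
#define ABS_MAX_MESSAGE_SIZE 1024u
/* Smallest limit the ioctl will accept */
#define MIN_MAX_MESSAGE_SIZE 100u

/*
 * Userspace model of the MAXMSGSIZE ioctl.
 *   *dev_max - the device's current maximum message size
 *   *arg     - in: requested size (0 and 1 are queries); out: resulting size
 * Returns 0 on success, -EINVAL if the request is out of range.
 */
int model_maxmsgsize(uint32_t *dev_max, uint32_t *arg)
{
	uint32_t requested = *arg;

	if (requested == 0) {		/* query the current limit */
		*arg = *dev_max;
		return 0;
	}
	if (requested == 1) {		/* query the absolute maximum */
		*arg = ABS_MAX_MESSAGE_SIZE;
		return 0;
	}
	if (requested < MIN_MAX_MESSAGE_SIZE ||
	    requested > ABS_MAX_MESSAGE_SIZE)
		return -EINVAL;

	*dev_max = requested;		/* set, then report what was set */
	*arg = *dev_max;
	return 0;
}
```

Note that the two query values leave the device limit untouched, and that an out-of-range request fails without modifying either the limit or the argument.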

2011-04-15 22:46:12

by Jonathan Corbet

Subject: Re: [PATCH] extra/1 Allow setting the maximum KBUS message size

On Fri, 15 Apr 2011 22:34:53 +0100
Tony Ibbs <[email protected]> wrote:

> This patch provides mechanisms for setting an absolute maximum message
> size at compile time, and a per-device maximum at runtime.

It seems like a good step in the right direction. Do you really need to
add a bunch more configuration options, though? It seems like a
reasonable default and a way to change it (sysfs file, maybe) might be
better. Is there a way to cap the total memory used by the kbus subsystem?

The kzalloc() fixes seem like a good idea too, BUT: I honestly think that
the item at the top of your list, if you want to merge this code, must be
to get the user-space API more widely reviewed and accepted. It could, I
think, be a big sticking point, and it's something you want to try to
address sooner rather than later.

That means getting more people to look at the patch, which could be hard.
The problem is that, if you wait, they'll only squeal when the code is
close to going in, and you could find yourself set back a long way. A
good first step might be to CC Andrew Morton on your next posting.

Thanks,

jon

2011-04-18 14:01:20

by Mark Brown

Subject: Re: [PATCH] extra/1 Allow setting the maximum KBUS message size

On Fri, Apr 15, 2011 at 04:46:08PM -0600, Jonathan Corbet wrote:

> That means getting more people to look at the patch, which could be hard.
> The problem is that, if you wait, they'll only squeal when the code is
> close to going in, and you could find yourself set back a long way. A
> good first step might be to CC Andrew Morton on your next posting.

One other thing that it'd be good to see is a contrast and compare with
other similar things like the Android binder that are floating around.

2011-04-19 19:33:15

by Tony Ibbs

Subject: Re: [PATCH] extra/1 Allow setting the maximum KBUS message size


On 18 Apr 2011, at 15:01, Mark Brown wrote:

> On Fri, Apr 15, 2011 at 04:46:08PM -0600, Jonathan Corbet wrote:
>
>> That means getting more people to look at the patch, which could be hard.
>> The problem is that, if you wait, they'll only squeal when the code is
>> close to going in, and you could find yourself set back a long way. A
>> good first step might be to CC Andrew Morton on your next posting.
>
> One other thing that it'd be good to see is a contrast and compare with
> other similar things like the Android binder that are floating around.

I'm aiming to write a smallish example of using KBUS for something not
entirely boring, and that will hopefully work as a starting point for
comparisons. Unfortunately (for this purpose, anyway), the next couple
of weekends (including four days of public holiday!) are taken up with
other things, so it will be a little while.

That will probably be an appropriate thing to CC Andrew Morton.

Comparative examples of how to do something similar with other means may
take longer, of course.

Thanks,
Tibs

2011-05-17 08:46:16

by Florian Fainelli

Subject: Re: [PATCH 00/11] RFC: KBUS messaging subsystem

Hello,

Sorry for this late answer.

On Tuesday 22 March 2011 20:36:40 Jonathan Corbet wrote:
> On Fri, 18 Mar 2011 17:21:09 +0000
>
> Tony Ibbs <[email protected]> wrote:
> > KBUS is a lightweight, Linux kernel mediated messaging system,
> > particularly intended for use in embedded environments.
>
> I've spent a bit of time looking at this code...this isn't a detailed
> review by any stretch, more like a few impressions.
>
> - Why kbus over, say, a user-space daemon and unix-domain sockets? I'm
> not sure I see the advantage that comes with putting this into kernel
> space.

I also fail to see why this would be required. In my opinion you are trading
reliability for complexity by putting this in the kernel.

Most implementations (if not all) involving system-wide message delivery for
other daemons are running in user-space. Having a daemon for message delivery
to other kbus clients/servers does not seem less reliable to me. If you
had in mind that this daemon might be killed under OOM conditions, then maybe
your whole system has an issue, which could be circumvented by making
sure the messaging process gets respawned when possible (upstart like
mechanism or such).

>
> - The interface is ... creative. If you have to do this in kernel space,
> it would be nice to do away with the split write()/ioctl() API for
> reading or writing messages. It seems like either a write(), OR an
> ioctl() with a message data pointer would suffice; that would cut the
> number of syscalls the applications need to make too.
>
> Even better might be to just use the socket API.

Indeed, I would also suggest having a look at what generic netlink already
provides like messages per application PID, multicasting and marshaling. If
you intend to keep a part of it in the kernel, you should have a look at this,
because from my experience with generic netlink, most of the hard job you are
re-doing here, has already been done in a generic manner.

>
> - Does anything bound the size of a message fed into the kernel with
> write()? I couldn't find it. It seems like an application could
> consume arbitrary amounts of kernel memory.
>
> - It would be good to use the kernel's dynamic debugging and tracing
> facilities rather than rolling your own.
>
> - There's lots of kmalloc()/memset() pairs that could be kzalloc().
>
> That's as far as I could get for now.
>
> Thanks,
>
> jon
> --
> To unsubscribe from this list: send the line "unsubscribe linux-embedded"
> in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2011-05-22 19:58:25

by Tony Ibbs

Subject: Re: [PATCH 00/11] RFC: KBUS messaging subsystem

On 17 May 2011, at 09:50, Florian Fainelli wrote:

> Hello,
>
> Sorry for this late answer.

Not a problem from here, all responses are helpful. In turn, apologies
for taking so long to reply.

> Most implementations (if not all) involving system-wide message
> delivery for other daemons are running in user-space.

OK. Although I certainly wouldn't claim to have anywhere near a complete
list of such (an annotated list of all the messaging systems on Linux
would be rather interesting, though!).

> If you had in mind that this daemon might be killed under OOM
> conditions, then maybe your whole system has an issue, which
> could be circumvented by making sure the messaging process gets
> respawned when possible (upstart like mechanism or such).

OOM isn't particularly an issue I'd worried about for any part of the
system. Other things tend to cause user processes to crash - using
ffmpeg on random video data, for instance. Of course, that is clearly
not a problem for KBUS itself.

Respawning itself isn't directly a problem, but getting everyone talking
to everyone else again is typically a nasty pain (and one users don't
want to think about), so one tends to want one's messaging handler to be
*very* robust. I think the discipline of working in-kernel helps with
that, although I'd be surprised if that were considered enough reason to
add a new kernel module!

> From: Jonathan Corbet <[email protected]>
> Date: 22 March 2011 19:36:40 GMT
>
> > Even better might be to just use the socket API.
>
> Indeed, I would also suggest having a look at what generic netlink already
> provides like messages per application PID, multicasting and marshaling.

As I said in an earlier message, I'd ignored netlink because it sounded
as if it were intrinsically lossy (no way of not losing messages if a
queue got full) which is a problem for KBUS requests/replies.

On the other hand, understanding netlink from scratch is somewhat
difficult (I've just spent some hours doing more research, and don't
feel like I've begun to get a good idea of its boundaries yet).

I have also been reading the libnl documentation, which seems to make
the userspace end somewhat less complex, and looks like a good thing.

> If you intend to keep a part of it in the kernel, you should have a
> look at this, because from my experience with generic netlink, most of
> the hard job you are re-doing here, has already been done in a generic
> manner.

It looks interesting, but the worrying part of statements like this is
always the "most of".

Is your suggestion that netlink would be a better API than the current
"creative" use of a file API for communicating from user space to the
KBUS kernel module, and then back?

The LWN article http://lwn.net/Articles/131802/ makes that sound
plausible (assuming one can still detect "release" events for netlink
sockets - I assume one can). At first glance I'm not sure how much
harder it is to program such a netlink interface "bare" (without a
userspace library such as libnl) than it is to use the current KBUS
interface in such a manner.

(Aside: a quick look at my current KBUS build shows kbus.ko as 60KB,
libkbus.so (the C userspace library on top of the "raw" usage) as 54KB,
and libnl.so as 277KB - although I don't know how Ubuntu build the
latter, and it obviously also includes all sorts of data description
handling which KBUS deliberately does not. So netlink is smaller if "bare",
and bigger, but not a huge amount, if used with its library.)

I'm not entirely sure what happens if either end of the netlink API
doesn't respond in a timely manner - is netlink allowed to throw things
away?

Or did you mean that netlink is appropriate to replace some/much of the
KBUS kernel module as well? In that case I'd have to think about it a
lot more to have an informed opinion.

Anyway.

What I'm working on at the moment is an email in which I try to restate
what we are/were trying to do with KBUS, with simple examples of the
sorts of call we're talking about, and ask if that is a sensible thing
to have in the kernel, emphasising that we are more worried about the
functionality than the API.

If the concept is a good thing but our implementation of it is
objectionable (e.g., we need to rewrite to a less "creative" interface,
be more sockety, or whatever), then so be it, we'll need to rewrite.

If you'd be willing to look at that email when it is posted, I hope it
will be easier to point at specific things and say "yes, that would be
better done with netlink" or, perhaps, "netlink would not address this,
but one might attack it in this way".

Thanks,
Tibs