/*****************************************/
Kernel Connector.
/*****************************************/
Kernel connector - a new, easy-to-use, netlink-based userspace <-> kernelspace communication module.
The connector driver makes it possible to connect various agents over a
netlink-based network.
One must register a callback and an identifier. When the driver receives
a special netlink message with the matching identifier, the registered
callback is called.
From the userspace point of view it's quite straightforward (a minimal sketch follows the calls below):
socket();
bind();
send();
recv();
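
For illustration, here is a minimal userspace sketch; it is not part of this
patch, the netlink protocol number follows the module's default "unit"
parameter (NETLINK_NFLOG), and the 0x123 group value is an assumption
borrowed from the cn_test module:

/*
 * Minimal userspace sketch.  NETLINK_NFLOG matches the module's
 * default "unit" parameter; the group value 0x123 is illustrative
 * (the connector assigns a group equal to the callback's id.idx).
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
	struct sockaddr_nl addr;
	char buf[1024];
	int s, len;

	s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_NFLOG);
	if (s < 0)
		return 1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 0x123;

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	/* Each message arrives as struct nlmsghdr + struct cn_msg + data. */
	while ((len = recv(s, buf, sizeof(buf), 0)) > 0)
		printf("received %d bytes\n", len);

	close(s);
	return 0;
}

Sending works the same way: build an nlmsghdr followed by a cn_msg and
send() it through the same socket (the cn_test module below shows that path).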
But if kernelspace wants to use the full power of such connections, the
driver writer must create special sockets and know about struct sk_buff
handling...
Connector allows any kernelspace agents to use netlink based
networking for inter-process communication in a significantly easier
way:
int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
struct cb_id
{
__u32 idx;
__u32 val;
};
idx and val are unique identifiers which must be registered in connector.h
for in-kernel usage.
void (*callback) (void *) - a callback function which will be called when
a message with the above idx.val is received by the connector core.
Its argument must be cast to struct cn_msg *.
struct cn_msg
{
struct cb_id id;
__u32 seq;
__u32 ack;
__u32 len; /* Length of the following data */
__u8 data[0];
};
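
A minimal in-kernel registration sketch follows (illustrative only, not part
of this patch); the 0xabcd.0x1 id and the my_* names are made up, and a real
user must reserve its idx/val in connector.h as noted above:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/connector.h>

static struct cb_id my_id = { 0xabcd, 0x1 };
static char my_name[] = "my_cn";

/* Called by the connector core for every message with id 0xabcd.0x1. */
static void my_callback(void *data)
{
	struct cn_msg *msg = (struct cn_msg *)data;

	printk(KERN_INFO "%s: seq=%u, ack=%u, len=%u.\n",
	       my_name, msg->seq, msg->ack, msg->len);
}

static int __init my_cn_init(void)
{
	return cn_add_callback(&my_id, my_name, my_callback);
}

static void __exit my_cn_exit(void)
{
	cn_del_callback(&my_id);
}

module_init(my_cn_init);
module_exit(my_cn_exit);
MODULE_LICENSE("GPL");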
/*****************************************/
Connector interfaces.
/*****************************************/
int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
Registers a new callback with the connector core.
struct cb_id *id - unique connector's user identifier.
It must be registered in connector.h for legal in-kernel users.
char *name - connector's callback symbolic name.
void (*callback) (void *) - connector's callback.
Argument must be cast to struct cn_msg *.
void cn_del_callback(struct cb_id *id);
Unregisters a previously registered callback from the connector core.
struct cb_id *id - unique connector's user identifier.
int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
Sends a message to the specified groups.
It can be safely called from any context, but may fail and drop the
message under strong memory pressure.
struct cn_msg * - message header (with attached data).
u32 __groups - destination groups.
If __groups is zero, then the appropriate group will
be searched for among all registered connector users,
and the message will be delivered to the group which was
created for the user with the same ID as in msg.
If __groups is not zero, then the message will be delivered
to the specified group.
int gfp_mask - GFP mask.
Note: when registering a new callback user, the connector core assigns it
a netlink group equal to its id.idx.
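
A minimal sending sketch (illustrative, not part of this patch), reusing the
hypothetical my_id and my_name from the registration sketch above; passing 0
as __groups lets the connector route the message to the group of the callback
registered with the same id:

#include <linux/slab.h>
#include <linux/string.h>

static void my_cn_send(const char *text)
{
	struct cn_msg *msg;
	__u32 len = strlen(text) + 1;

	msg = kmalloc(sizeof(*msg) + len, GFP_ATOMIC);
	if (!msg)
		return;

	memset(msg, 0, sizeof(*msg));
	memcpy(&msg->id, &my_id, sizeof(msg->id));
	msg->len = len;
	memcpy(msg->data, text, len);

	/* Zero on success, negative error code otherwise. */
	if (cn_netlink_send(msg, 0, GFP_ATOMIC) < 0)
		printk(KERN_WARNING "%s: message was not delivered.\n", my_name);

	kfree(msg);
}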
/*****************************************/
Protocol description.
/*****************************************/
The connector currently offers a transport layer with a fixed header.
The recommended protocol using this header is the following:
msg->seq and msg->ack are used to determine message genealogy.
When someone sends a message, it fills in a locally unique sequence
number and a random acknowledge number.
The sequence number may be copied into nlmsghdr->nlmsg_seq too.
The sequence number is incremented with each message to be sent.
If we expect a reply to our message, then the sequence number in the
received message MUST be the same as in the original message, and the
acknowledge number MUST be the acknowledge number of the original
message + 1.
If we receive a message and its sequence number is not equal to the one
we are expecting, then it is a new message.
If we receive a message and its sequence number is the same as the one we
are expecting, but its acknowledge number is not equal to the acknowledge
number of the original message + 1, then it is a new message.
Obviously, the protocol header contains the above id.
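
The reply-matching rule above can be written as a small helper (a sketch,
not part of this patch), where req is the request sent earlier and msg is a
message just received for the same id:

/* Returns 1 if msg is the reply to req, 0 if it is a new message. */
static int my_cn_is_reply(struct cn_msg *req, struct cn_msg *msg)
{
	return msg->seq == req->seq && msg->ack == req->ack + 1;
}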
The connector allows event notification in the following form:
a kernel driver or userspace process can ask the connector to notify it
when selected ids are turned on or off (i.e. when their callbacks are
registered or unregistered). This is done by sending a special command
to the connector driver (which itself registers with id={-1, -1}).
As a usage example, Documentation/connector now contains cn_test.c -
a test module which uses the connector to request notifications
and to send messages.
/*****************************************/
CBUS.
/*****************************************/
This message bus allows message passing between different agents
using the connector's infrastructure.
It is extremely fast for insert operations, so it can be used in
performance-critical paths in any context instead of direct calls to the
connector's methods.
CBUS uses per-CPU variables and thus allows message reordering; the
caller must be prepared for this (and include the CPU id in its messages).
Usage is very simple - just call cbus_insert(struct cn_msg *msg, int gfp_mask),
as shown in the sketch below.
It can fail, so the caller must check the return value - zero on success
and a negative error code otherwise.
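
A minimal CBUS insert sketch (illustrative, not part of this patch), again
using the hypothetical my_id from the sketches above; because CBUS may
reorder messages across CPUs, the sender's CPU id is placed into the payload:

#include <linux/smp.h>

/* Exported by cbus.c; this patch does not add a header declaring it. */
extern int cbus_insert(struct cn_msg *msg, int gfp_mask);

static int my_cbus_send(u32 event)
{
	struct {
		u32 cpu;
		u32 event;
	} payload;
	struct cn_msg *msg;
	int err;

	msg = kmalloc(sizeof(*msg) + sizeof(payload), GFP_ATOMIC);
	if (!msg)
		return -ENOMEM;

	memset(msg, 0, sizeof(*msg));
	memcpy(&msg->id, &my_id, sizeof(msg->id));
	msg->len = sizeof(payload);

	payload.cpu = get_cpu();
	payload.event = event;
	put_cpu();
	memcpy(msg->data, &payload, sizeof(payload));

	/* CBUS copies the message, so it can be freed right away. */
	err = cbus_insert(msg, GFP_ATOMIC);
	kfree(msg);

	return err;
}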
A benchmark with a modified fork connector and a fork bomb on a 2-way
system did not show any difference between vanilla 2.6.11 and CBUS.
/*****************************************/
Reliability.
/*****************************************/
Netlink itself is not a reliable protocol, which means that messages can
be lost due to memory pressure or because a process' receive queue
overflows, so the caller must be prepared for this. That is why
struct cn_msg [the main connector message header] contains the
u32 seq and u32 ack fields (see the loss-detection sketch below).
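
As a hedged illustration (not part of this patch), a receiver that knows the
sender increments seq by one per message can detect drops like this:

static __u32 my_expected_seq;

static void my_check_loss(struct cn_msg *msg)
{
	if (msg->seq != my_expected_seq)
		printk(KERN_WARNING "%s: %u message(s) lost.\n",
		       my_name, msg->seq - my_expected_seq);

	my_expected_seq = msg->seq + 1;
}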
P.S. As far as I can see, all previous comments are successfully resolved.
Thank you.
Signed-off-by: Evgeniy Polyakov <[email protected]>
diff -Nru /tmp/empty/cbus.c ./drivers/connector/cbus.c
--- /tmp/empty/cbus.c 1970-01-01 03:00:00.000000000 +0300
+++ ./drivers/connector/cbus.c 2005-04-11 16:15:29.535285712 +0400
@@ -0,0 +1,272 @@
+/*
+ * cbus.c
+ *
+ * 2005 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/connector.h>
+#include <linux/list.h>
+#include <linux/moduleparam.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Evgeniy Polyakov <[email protected]>");
+MODULE_DESCRIPTION("Ultrafast message bus based on kernel connector.");
+
+static DEFINE_PER_CPU(struct cbus_event_container, cbus_event_list);
+static int cbus_pid, cbus_need_exit;
+static struct completion cbus_thread_exited;
+static DECLARE_WAIT_QUEUE_HEAD(cbus_wait_queue);
+static int cbus_max_queue_len = 1024;
+module_param(cbus_max_queue_len, int, 0);
+MODULE_PARM_DESC(cbus_max_queue_len, "Maximum CBUS queue length, "
+ "if it is overflowed events will be delivered using direct connector's methods.");
+
+static char cbus_name[] = "cbus";
+
+struct cbus_event_container
+{
+ struct list_head event_list;
+ spinlock_t event_lock;
+ int qlen;
+};
+
+struct cbus_event
+{
+ struct list_head event_entry;
+ u32 cpu;
+ struct cn_msg msg;
+};
+
+static inline struct cbus_event *__cbus_dequeue(struct cbus_event_container *c)
+{
+ struct list_head *next = c->event_list.next;
+
+ list_del(next);
+ c->qlen--;
+
+ if (c->qlen < 0) {
+ printk(KERN_ERR "%s: qlen=%d after dequeue on CPU%u.\n",
+ cbus_name, c->qlen, smp_processor_id());
+ c->qlen = 0;
+ }
+
+ return list_entry(next, struct cbus_event, event_entry);
+}
+
+static inline struct cbus_event *cbus_dequeue(struct cbus_event_container *c)
+{
+ struct cbus_event *event;
+ unsigned long flags;
+
+ if (list_empty(&c->event_list))
+ return NULL;
+
+ spin_lock_irqsave(&c->event_lock, flags);
+ event = __cbus_dequeue(c);
+ spin_unlock_irqrestore(&c->event_lock, flags);
+
+ return event;
+}
+
+static inline void __cbus_enqueue(struct cbus_event_container *c, struct cbus_event *event)
+{
+ list_add_tail(&event->event_entry, &c->event_list);
+ c->qlen++;
+}
+
+static int cbus_enqueue(struct cbus_event_container *c, struct cn_msg *msg, int gfp_mask)
+{
+ int err, enq = 0;
+ struct cbus_event *event;
+ unsigned long flags;
+
+ event = kmalloc(sizeof(*event) + msg->len, gfp_mask);
+ if (!event) {
+ err = -ENOMEM;
+ goto err_out_exit;
+ }
+
+ memcpy(&event->msg, msg, sizeof(event->msg));
+
+ if (msg->len)
+ memcpy(event+1, msg->data, msg->len);
+
+ spin_lock_irqsave(&c->event_lock, flags);
+ if (c->qlen <= cbus_max_queue_len) {
+ __cbus_enqueue(c, event);
+ enq = 1;
+ }
+ spin_unlock_irqrestore(&c->event_lock, flags);
+
+ if (!enq) {
+ kfree(event);
+ cn_netlink_send(msg, 0, gfp_mask);
+ }
+
+ //wake_up_interruptible(&cbus_wait_queue);
+
+ return 0;
+
+err_out_exit:
+ return err;
+}
+
+int cbus_insert(struct cn_msg *msg, int gfp_flags)
+{
+ struct cbus_event_container *c;
+ int err;
+
+ /*
+ * If CBUS is being removed,
+ * do not allow to add new events,
+ * it still has a race, when
+ * event may be added after
+ * all queues are processed,
+ * but we do not care if one
+ * message in each queue will not
+ * be delivered and CBUS is removed.
+ */
+ if (cbus_need_exit)
+ return -ENODEV;
+
+ preempt_disable();
+ c = &__get_cpu_var(cbus_event_list);
+
+ err = cbus_enqueue(c, msg, gfp_flags);
+
+ preempt_enable();
+
+ return err;
+}
+
+static int cbus_process(struct cbus_event_container *c, int all)
+{
+ struct cbus_event *event;
+ int len, i, num;
+
+ if (list_empty(&c->event_list))
+ return 0;
+
+ if (all)
+ len = c->qlen;
+ else
+ len = 1;
+
+ num = 0;
+ for (i=0; i<len; ++i) {
+ event = cbus_dequeue(c);
+ if (!event)
+ continue;
+
+ cn_netlink_send(&event->msg, 0, GFP_KERNEL);
+ num++;
+
+ kfree(event);
+ }
+
+ return num;
+}
+
+static int cbus_event_thread(void *data)
+{
+ int i, non_empty = 0, empty = 0;
+ struct cbus_event_container *c;
+
+ daemonize(cbus_name);
+ allow_signal(SIGTERM);
+ set_user_nice(current, 19);
+
+ while (!cbus_need_exit) {
+ if (empty || non_empty == 0 || non_empty > 10) {
+ wait_event_interruptible_timeout(cbus_wait_queue, 0, HZ/100);
+ non_empty = 0;
+ empty = 0;
+ }
+
+ for_each_cpu(i) {
+ c = &per_cpu(cbus_event_list, i);
+
+ if (cbus_process(c, 0))
+ non_empty++;
+ else
+ empty = 1;
+ }
+ }
+
+ complete_and_exit(&cbus_thread_exited, 0);
+}
+
+static int cbus_init_event_container(struct cbus_event_container *c)
+{
+ INIT_LIST_HEAD(&c->event_list);
+ spin_lock_init(&c->event_lock);
+ c->qlen = 0;
+
+ return 0;
+}
+
+static void cbus_fini_event_container(struct cbus_event_container *c)
+{
+ cbus_process(c, 1);
+}
+
+int __devinit cbus_init(void)
+{
+ int i, err = 0;
+ struct cbus_event_container *c;
+
+ for_each_cpu(i) {
+ c = &per_cpu(cbus_event_list, i);
+ cbus_init_event_container(c);
+ }
+
+ init_completion(&cbus_thread_exited);
+
+ cbus_pid = kernel_thread(cbus_event_thread, NULL, CLONE_FS | CLONE_FILES);
+ if (cbus_pid < 0) {
+ printk(KERN_ERR "%s: Failed to create cbus event thread: err=%d.\n",
+ cbus_name, cbus_pid);
+ err = cbus_pid;
+ goto err_out_exit;
+ }
+
+err_out_exit:
+ return err;
+}
+
+void __devexit cbus_fini(void)
+{
+ int i;
+ struct cbus_event_container *c;
+
+ cbus_need_exit = 1;
+ kill_proc(cbus_pid, SIGTERM, 0);
+ wait_for_completion(&cbus_thread_exited);
+
+ for_each_cpu(i) {
+ c = &per_cpu(cbus_event_list, i);
+ cbus_fini_event_container(c);
+ }
+}
+
+module_init(cbus_init);
+module_exit(cbus_fini);
+
+EXPORT_SYMBOL_GPL(cbus_insert);
diff -Nru /tmp/empty/cn_queue.c ./drivers/connector/cn_queue.c
--- /tmp/empty/cn_queue.c 1970-01-01 03:00:00.000000000 +0300
+++ ./drivers/connector/cn_queue.c 2005-04-11 16:12:22.511717616 +0400
@@ -0,0 +1,207 @@
+/*
+ * cn_queue.c
+ *
+ * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/skbuff.h>
+#include <linux/suspend.h>
+#include <linux/connector.h>
+#include <linux/delay.h>
+
+static void cn_queue_wrapper(void *data)
+{
+ struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
+
+ atomic_inc_and_test(&cbq->cb->refcnt);
+ cbq->cb->callback(cbq->cb->priv);
+ atomic_dec_and_test(&cbq->cb->refcnt);
+
+ cbq->destruct_data(cbq->ddata);
+}
+
+static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)
+{
+ struct cn_callback_entry *cbq;
+
+ cbq = kmalloc(sizeof(*cbq), GFP_KERNEL);
+ if (!cbq) {
+ printk(KERN_ERR "Failed to create new callback queue.\n");
+ return NULL;
+ }
+
+ memset(cbq, 0, sizeof(*cbq));
+
+ cbq->cb = cb;
+
+ INIT_WORK(&cbq->work, &cn_queue_wrapper, cbq);
+
+ return cbq;
+}
+
+static void cn_queue_free_callback(struct cn_callback_entry *cbq)
+{
+ cancel_delayed_work(&cbq->work);
+ flush_workqueue(cbq->pdev->cn_queue);
+
+ while (atomic_read(&cbq->cb->refcnt)) {
+ printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
+ cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
+
+ msleep(1000);
+ }
+
+ kfree(cbq);
+}
+
+int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
+{
+#if 0
+ printk(KERN_INFO "%s: comparing %04x.%04x and %04x.%04x\n",
+ __func__,
+ i1->idx, i1->val,
+ i2->idx, i2->val);
+#endif
+ return ((i1->idx == i2->idx) && (i1->val == i2->val));
+}
+
+int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb)
+{
+ struct cn_callback_entry *cbq, *__cbq;
+ int found = 0;
+
+ cbq = cn_queue_alloc_callback_entry(cb);
+ if (!cbq)
+ return -ENOMEM;
+
+ atomic_inc(&dev->refcnt);
+ cbq->pdev = dev;
+
+ spin_lock_bh(&dev->queue_lock);
+ list_for_each_entry(__cbq, &dev->queue_list, callback_entry) {
+ if (cn_cb_equal(&__cbq->cb->id, &cb->id)) {
+ found = 1;
+ break;
+ }
+ }
+ if (!found) {
+ atomic_set(&cbq->cb->refcnt, 1);
+ list_add_tail(&cbq->callback_entry, &dev->queue_list);
+ }
+ spin_unlock_bh(&dev->queue_lock);
+
+ if (found) {
+ atomic_dec(&dev->refcnt);
+ atomic_set(&cbq->cb->refcnt, 0);
+ cn_queue_free_callback(cbq);
+ return -EINVAL;
+ }
+
+ cbq->nls = dev->nls;
+ cbq->seq = 0;
+ cbq->group = cbq->cb->id.idx;
+
+ return 0;
+}
+
+void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id)
+{
+ struct cn_callback_entry *cbq = NULL, *n;
+ int found = 0;
+
+ spin_lock_bh(&dev->queue_lock);
+ list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry) {
+ if (cn_cb_equal(&cbq->cb->id, id)) {
+ list_del(&cbq->callback_entry);
+ found = 1;
+ break;
+ }
+ }
+ spin_unlock_bh(&dev->queue_lock);
+
+ if (found) {
+ atomic_dec(&cbq->cb->refcnt);
+ cn_queue_free_callback(cbq);
+ atomic_dec_and_test(&dev->refcnt);
+ }
+}
+
+struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *nls)
+{
+ struct cn_queue_dev *dev;
+
+ dev = kmalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev) {
+ printk(KERN_ERR "%s: Failed to allocate new struct cn_queue_dev.\n",
+ name);
+ return NULL;
+ }
+
+ memset(dev, 0, sizeof(*dev));
+
+ snprintf(dev->name, sizeof(dev->name), "%s", name);
+
+ atomic_set(&dev->refcnt, 0);
+ INIT_LIST_HEAD(&dev->queue_list);
+ spin_lock_init(&dev->queue_lock);
+
+ dev->nls = nls;
+ dev->netlink_groups = 0;
+
+ dev->cn_queue = create_workqueue(dev->name);
+ if (!dev->cn_queue) {
+ printk(KERN_ERR "Failed to create %s queue.\n", dev->name);
+ kfree(dev);
+ return NULL;
+ }
+
+ return dev;
+}
+
+void cn_queue_free_dev(struct cn_queue_dev *dev)
+{
+ struct cn_callback_entry *cbq, *n;
+
+ flush_workqueue(dev->cn_queue);
+ destroy_workqueue(dev->cn_queue);
+
+ spin_lock_bh(&dev->queue_lock);
+ list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry) {
+ list_del(&cbq->callback_entry);
+ atomic_dec_and_test(&cbq->cb->refcnt);
+ }
+ spin_unlock_bh(&dev->queue_lock);
+
+ while (atomic_read(&dev->refcnt)) {
+ printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
+ dev->name, atomic_read(&dev->refcnt));
+
+ msleep(1000);
+ }
+
+ memset(dev, 0, sizeof(*dev));
+ kfree(dev);
+ dev = NULL;
+}
diff -Nru /tmp/empty/connector.c ./drivers/connector/connector.c
--- /tmp/empty/connector.c 1970-01-01 03:00:00.000000000 +0300
+++ ./drivers/connector/connector.c 2005-04-11 16:10:03.327876776 +0400
@@ -0,0 +1,529 @@
+/*
+ * connector.c
+ *
+ * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/skbuff.h>
+#include <linux/netlink.h>
+#include <linux/moduleparam.h>
+#include <linux/connector.h>
+
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Evgeniy Polyakov <[email protected]>");
+MODULE_DESCRIPTION("Generic userspace <-> kernelspace connector.");
+
+static int unit = NETLINK_NFLOG;
+static u32 cn_idx = CN_IDX_CONNECTOR;
+static u32 cn_val = CN_VAL_CONNECTOR;
+
+module_param(unit, int, 0);
+MODULE_PARM_DESC(unit, "Netlink socket to use.");
+module_param(cn_idx, uint, 0);
+module_param(cn_val, uint, 0);
+MODULE_PARM_DESC(cn_idx, "Connector's main device idx.");
+MODULE_PARM_DESC(cn_val, "Connector's main device val.");
+
+static DEFINE_SPINLOCK(notify_lock);
+static LIST_HEAD(notify_list);
+
+static struct cn_dev cdev;
+
+int cn_already_initialized = 0;
+
+/*
+ * msg->seq and msg->ack are used to determine message genealogy.
+ * When someone sends message it puts there locally unique sequence
+ * and random acknowledge numbers.
+ * Sequence number may be copied into nlmsghdr->nlmsg_seq too.
+ *
+ * Sequence number is incremented with each message to be sent.
+ *
+ * If we expect reply to our message,
+ * then sequence number in received message MUST be the same as in original message,
+ * and acknowledge number MUST be the same + 1.
+ *
+ * If we receive message and it's sequence number is not equal to one we are expecting,
+ * then it is new message.
+ * If we receive message and it's sequence number is the same as one we are expecting,
+ * but it's acknowledge is not equal acknowledge number in original message + 1,
+ * then it is new message.
+ *
+ */
+int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask)
+{
+ struct cn_callback_entry *__cbq;
+ unsigned int size;
+ struct sk_buff *skb;
+ struct nlmsghdr *nlh;
+ struct cn_msg *data;
+ struct cn_dev *dev = &cdev;
+ u32 groups = 0;
+ int found = 0;
+
+ if (!__groups) {
+ spin_lock_bh(&dev->cbdev->queue_lock);
+ list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
+ if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
+ found = 1;
+ groups = __cbq->group;
+ }
+ }
+ spin_unlock_bh(&dev->cbdev->queue_lock);
+
+ if (!found) {
+ printk(KERN_ERR "Failed to find multicast netlink group for callback[0x%x.0x%x]. seq=%u\n",
+ msg->id.idx, msg->id.val, msg->seq);
+ return -ENODEV;
+ }
+ }
+ else
+ groups = __groups;
+
+ size = NLMSG_SPACE(sizeof(*msg) + msg->len);
+
+ skb = alloc_skb(size, gfp_mask);
+ if (!skb) {
+ printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
+ return -ENOMEM;
+ }
+
+ nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh));
+
+ data = (struct cn_msg *)NLMSG_DATA(nlh);
+
+ memcpy(data, msg, sizeof(*data) + msg->len);
+#if 0
+ printk("%s: len=%u, seq=%u, ack=%u, group=%u.\n",
+ __func__, msg->len, msg->seq, msg->ack, groups);
+#endif
+
+ NETLINK_CB(skb).dst_groups = groups;
+
+ netlink_broadcast(dev->nls, skb, 0, groups, gfp_mask);
+
+ return 0;
+
+nlmsg_failure:
+ kfree_skb(skb);
+ return -EINVAL;
+}
+
+/*
+ * Callback helper - queues work and setup destructor for given data.
+ */
+static int cn_call_callback(struct cn_msg *msg, void (*destruct_data) (void *), void *data)
+{
+ struct cn_callback_entry *__cbq;
+ struct cn_dev *dev = &cdev;
+ int found = 0;
+
+ spin_lock_bh(&dev->cbdev->queue_lock);
+ list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
+ if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
+ __cbq->cb->priv = msg;
+
+ __cbq->ddata = data;
+ __cbq->destruct_data = destruct_data;
+
+ queue_work(dev->cbdev->cn_queue, &__cbq->work);
+ found = 1;
+ break;
+ }
+ }
+ spin_unlock_bh(&dev->cbdev->queue_lock);
+
+ return (found)?0:-ENODEV;
+}
+
+/*
+ * Skb receive helper - checks skb and msg size
+ * and calls callback helper.
+ */
+static int __cn_rx_skb(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+ u32 pid, uid, seq, group;
+ struct cn_msg *msg;
+
+ pid = NETLINK_CREDS(skb)->pid;
+ uid = NETLINK_CREDS(skb)->uid;
+ seq = nlh->nlmsg_seq;
+ group = NETLINK_CB((skb)).groups;
+ msg = (struct cn_msg *)NLMSG_DATA(nlh);
+
+ if (NLMSG_SPACE(msg->len + sizeof(*msg)) != nlh->nlmsg_len) {
+ printk(KERN_ERR "skb does not have enough length: "
+ "requested msg->len=%u[%u], nlh->nlmsg_len=%u, skb->len=%u.\n",
+ msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)),
+ nlh->nlmsg_len, skb->len);
+ return -EINVAL;
+ }
+#if 0
+ printk(KERN_INFO "pid=%u, uid=%u, seq=%u, group=%u.\n",
+ pid, uid, seq, group);
+#endif
+ return cn_call_callback(msg, (void (*)(void *))kfree_skb, skb);
+}
+
+/*
+ * Main netlink receiving function -
+ * it checks skb and netlink header sizes
+ * and calls skb receive helper with shared skb.
+ */
+static void cn_rx_skb(struct sk_buff *__skb)
+{
+ struct nlmsghdr *nlh;
+ u32 len;
+ int err;
+ struct sk_buff *skb;
+
+ skb = skb_get(__skb);
+ if (!skb) {
+ printk(KERN_ERR "Failed to reference an skb.\n");
+ kfree_skb(__skb);
+ return;
+ }
+#if 0
+ printk(KERN_INFO
+ "skb: len=%u, data_len=%u, truesize=%u, proto=%u, cloned=%d, shared=%d.\n",
+ skb->len, skb->data_len, skb->truesize, skb->protocol,
+ skb_cloned(skb), skb_shared(skb));
+#endif
+ if (skb->len >= NLMSG_SPACE(0)) {
+ nlh = (struct nlmsghdr *)skb->data;
+
+ if (nlh->nlmsg_len < sizeof(struct cn_msg) ||
+ skb->len < nlh->nlmsg_len ||
+ nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) {
+#if 1
+ printk(KERN_INFO "nlmsg_len=%u, sizeof(*nlh)=%u\n",
+ nlh->nlmsg_len, sizeof(*nlh));
+#endif
+ kfree_skb(skb);
+ goto out;
+ }
+
+ len = NLMSG_ALIGN(nlh->nlmsg_len);
+ if (len > skb->len)
+ len = skb->len;
+
+ err = __cn_rx_skb(skb, nlh);
+ if (err < 0)
+ kfree_skb(skb);
+ }
+
+out:
+ kfree_skb(__skb);
+}
+
+/*
+ * Netlink socket input callback - dequeues skb and
+ * calls main netlink receiving function.
+ */
+static void cn_input(struct sock *sk, int len)
+{
+ struct sk_buff *skb;
+
+ while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL)
+ cn_rx_skb(skb);
+}
+
+/*
+ * Notification routing.
+ * Gets id and checks if
+ * there are notification request for it's idx and val.
+ * If there are such requests notify it's listeners
+ * with given notify event.
+ */
+static void cn_notify(struct cb_id *id, u32 notify_event)
+{
+ struct cn_ctl_entry *ent;
+
+ spin_lock_bh(¬ify_lock);
+ list_for_each_entry(ent, ¬ify_list, notify_entry) {
+ int i;
+ struct cn_notify_req *req;
+ struct cn_ctl_msg *ctl = ent->msg;
+ int idx_found, val_found;
+
+ idx_found = val_found = 0;
+
+ req = (struct cn_notify_req *)ctl->data;
+ for (i=0; i<ctl->idx_notify_num; ++i, ++req) {
+ if (id->idx >= req->first && id->idx < req->first + req->range) {
+ idx_found = 1;
+ break;
+ }
+ }
+
+ for (i=0; i<ctl->val_notify_num; ++i, ++req) {
+ if (id->val >= req->first && id->val < req->first + req->range) {
+ val_found = 1;
+ break;
+ }
+ }
+
+ if (idx_found && val_found) {
+ struct cn_msg m;
+
+ printk(KERN_INFO "Notifying group %x with event %u about %x.%x.\n",
+ ctl->group, notify_event,
+ id->idx, id->val);
+
+ memset(&m, 0, sizeof(m));
+ m.ack = notify_event;
+
+ memcpy(&m.id, id, sizeof(m.id));
+ cn_netlink_send(&m, ctl->group, GFP_ATOMIC);
+ }
+ }
+ spin_unlock_bh(¬ify_lock);
+}
+
+/*
+ * Callback add routing - adds callback
+ * with given ID and name.
+ * If there is registered callback with the same
+ * ID it will not be added.
+ *
+ * May sleep.
+ */
+int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *))
+{
+ int err;
+ struct cn_dev *dev = &cdev;
+ struct cn_callback *cb;
+
+ cb = kmalloc(sizeof(*cb), GFP_KERNEL);
+ if (!cb) {
+ printk(KERN_INFO "%s: Failed to allocate new struct cn_callback.\n",
+ dev->cbdev->name);
+ return -ENOMEM;
+ }
+
+ memset(cb, 0, sizeof(*cb));
+
+ scnprintf(cb->name, sizeof(cb->name), "%s", name);
+
+ memcpy(&cb->id, id, sizeof(cb->id));
+ cb->callback = callback;
+
+ atomic_set(&cb->refcnt, 0);
+
+ err = cn_queue_add_callback(dev->cbdev, cb);
+ if (err) {
+ kfree(cb);
+ return err;
+ }
+
+ cn_notify(id, 0);
+
+ return 0;
+}
+
+/*
+ * Callback remove routing - removes callback
+ * with given ID.
+ * If there is no registered callback with given
+ * ID nothing happens.
+ *
+ * May sleep while waiting for reference counter to become zero.
+ */
+void cn_del_callback(struct cb_id *id)
+{
+ struct cn_dev *dev = &cdev;
+
+ cn_queue_del_callback(dev->cbdev, id);
+ cn_notify(id, 1);
+}
+
+/*
+ * Checks two connector's control messages to be the same.
+ * Returns 1 if they are the same or the first one is broken.
+ */
+static int cn_ctl_msg_equals(struct cn_ctl_msg *m1, struct cn_ctl_msg *m2)
+{
+ int i;
+ struct cn_notify_req *req1, *req2;
+
+ if (m1->idx_notify_num != m2->idx_notify_num)
+ return 0;
+
+ if (m1->val_notify_num != m2->val_notify_num)
+ return 0;
+
+ if (m1->len != m2->len)
+ return 0;
+
+ if ((m1->idx_notify_num + m1->val_notify_num)*sizeof(*req1) != m1->len) {
+ printk(KERN_ERR "Notify entry[idx_num=%x, val_num=%x, len=%u] contains garbage. Removing.\n",
+ m1->idx_notify_num, m1->val_notify_num, m1->len);
+ return 1;
+ }
+
+ req1 = (struct cn_notify_req *)m1->data;
+ req2 = (struct cn_notify_req *)m2->data;
+
+ for (i=0; i<m1->idx_notify_num; ++i) {
+ if (memcmp(req1, req2, sizeof(*req1)))
+ return 0;
+
+ req1++;
+ req2++;
+ }
+
+ for (i=0; i<m1->val_notify_num; ++i) {
+ if (memcmp(req1, req2, sizeof(*req1)))
+ return 0;
+
+ req1++;
+ req2++;
+ }
+
+ return 1;
+}
+
+/*
+ * Main connector device's callback.
+ * Is used for notification requests processing.
+ */
+static void cn_callback(void * data)
+{
+ struct cn_msg *msg = (struct cn_msg *)data;
+ struct cn_ctl_msg *ctl;
+ struct cn_ctl_entry *ent;
+ u32 size;
+
+ if (msg->len < sizeof(*ctl)) {
+ printk(KERN_ERR "Wrong connector request size %u, must be >= %u.\n",
+ msg->len, sizeof(*ctl));
+ return;
+ }
+
+ ctl = (struct cn_ctl_msg *)msg->data;
+
+ size = sizeof(*ctl) + (ctl->idx_notify_num + ctl->val_notify_num)*sizeof(struct cn_notify_req);
+
+ if (msg->len != size) {
+ printk(KERN_ERR "Wrong connector request size %u, must be == %u.\n",
+ msg->len, size);
+ return;
+ }
+
+ if (ctl->len + sizeof(*ctl) != msg->len) {
+ printk(KERN_ERR "Wrong message: msg->len=%u must be equal to inner_len=%u [+%u].\n",
+ msg->len, ctl->len, sizeof(*ctl));
+ return;
+ }
+
+ /*
+ * Remove notification.
+ */
+ if (ctl->group == 0) {
+ struct cn_ctl_entry *n;
+
+ spin_lock_bh(¬ify_lock);
+ list_for_each_entry_safe(ent, n, ¬ify_list, notify_entry) {
+ if (cn_ctl_msg_equals(ent->msg, ctl)) {
+ list_del(&ent->notify_entry);
+ kfree(ent);
+ }
+ }
+ spin_unlock_bh(¬ify_lock);
+
+ return;
+ }
+
+ size += sizeof(*ent);
+
+ ent = kmalloc(size, GFP_ATOMIC);
+ if (!ent) {
+ printk(KERN_ERR "Failed to allocate %d bytes for new notify entry.\n", size);
+ return;
+ }
+
+ memset(ent, 0, size);
+
+ ent->msg = (struct cn_ctl_msg *)(ent + 1);
+
+ memcpy(ent->msg, ctl, size - sizeof(*ent));
+
+ spin_lock_bh(¬ify_lock);
+ list_add(&ent->notify_entry, ¬ify_list);
+ spin_unlock_bh(¬ify_lock);
+}
+
+static int cn_init(void)
+{
+ struct cn_dev *dev = &cdev;
+ int err;
+
+ dev->input = cn_input;
+ dev->id.idx = cn_idx;
+ dev->id.val = cn_val;
+
+ dev->nls = netlink_kernel_create(unit, dev->input);
+ if (!dev->nls) {
+ printk(KERN_ERR "Failed to create new netlink socket(%u).\n",
+ unit);
+ return -EIO;
+ }
+
+ dev->cbdev = cn_queue_alloc_dev("cqueue", dev->nls);
+ if (!dev->cbdev) {
+ if (dev->nls->sk_socket)
+ sock_release(dev->nls->sk_socket);
+ return -EINVAL;
+ }
+
+ err = cn_add_callback(&dev->id, "connector", &cn_callback);
+ if (err) {
+ cn_queue_free_dev(dev->cbdev);
+ if (dev->nls->sk_socket)
+ sock_release(dev->nls->sk_socket);
+ return -EINVAL;
+ }
+
+ cn_already_initialized = 1;
+
+ return 0;
+}
+
+static void cn_fini(void)
+{
+ struct cn_dev *dev = &cdev;
+
+ cn_already_initialized = 0;
+
+ cn_del_callback(&dev->id);
+ cn_queue_free_dev(dev->cbdev);
+ if (dev->nls->sk_socket)
+ sock_release(dev->nls->sk_socket);
+}
+
+module_init(cn_init);
+module_exit(cn_fini);
+
+EXPORT_SYMBOL_GPL(cn_add_callback);
+EXPORT_SYMBOL_GPL(cn_del_callback);
+EXPORT_SYMBOL_GPL(cn_netlink_send);
diff -Nru /tmp/empty/Kconfig ./drivers/connector/Kconfig
--- /tmp/empty/Kconfig 1970-01-01 03:00:00.000000000 +0300
+++ ./drivers/connector/Kconfig 2005-04-11 16:16:27.176522912 +0400
@@ -0,0 +1,27 @@
+menu "Connector - unified userspace <-> kernelspace linker"
+
+config CONNECTOR
+ tristate "Connector - unified userspace <-> kernelspace linker"
+ depends on NET
+ ---help---
+ This is unified userspace <-> kernelspace connector working on top
+ of the netlink socket protocol.
+
+ Connector support can also be built as a module. If so, the module
+ will be called cn.ko.
+config CBUS
+ tristate "CBUS - ultra fast (for insert operations) message bus based on connector"
+ depends on CONNECTOR
+ ---help---
+ This message bus allows message passing between different agents
+ using connector's infrastructure.
+ It is extremely fast for insert operations so it can be used in performance
+ critical paths instead of direct calls to the connector's methods.
+
+ CBUS uses per CPU variables and thus allows message reordering,
+ caller must be prepared (and use CPU id in it's messages).
+
+ CBUS support can also be built as a module. If so, the module
+ will be called cbus.
+
+endmenu
diff -Nru /tmp/empty/Makefile ./drivers/connector/Makefile
--- /tmp/empty/Makefile 1970-01-01 03:00:00.000000000 +0300
+++ ./drivers/connector/Makefile 2005-04-11 16:17:04.440857872 +0400
@@ -0,0 +1,3 @@
+obj-$(CONFIG_CONNECTOR) += cn.o
+obj-$(CONFIG_CBUS) += cbus.o
+cn-objs := cn_queue.o connector.o
Binary files /tmp/empty/.tmp_cbus.o and ./drivers/connector/.tmp_cbus.o differ
diff -Nru /tmp/empty/.tmp_cbus.ver ./drivers/connector/.tmp_cbus.ver
--- /tmp/empty/.tmp_cbus.ver 1970-01-01 03:00:00.000000000 +0300
+++ ./drivers/connector/.tmp_cbus.ver 2005-04-11 16:17:06.579532744 +0400
@@ -0,0 +1 @@
+__crc_cbus_insert = 0x2fcab901 ;
Binary files /tmp/empty/.tmp_connector.o and ./drivers/connector/.tmp_connector.o differ
diff -Nru /tmp/empty/.tmp_connector.ver ./drivers/connector/.tmp_connector.ver
--- /tmp/empty/.tmp_connector.ver 1970-01-01 03:00:00.000000000 +0300
+++ ./drivers/connector/.tmp_connector.ver 2005-04-11 16:16:39.753610904 +0400
@@ -0,0 +1,3 @@
+__crc_cn_add_callback = 0xdf3c19ec ;
+__crc_cn_del_callback = 0xff5a8cfe ;
+__crc_cn_netlink_send = 0xa53caf8b ;
--- /dev/null 2005-03-30 21:16:41.358774272 +0400
+++ ./include/linux/connector.h 2005-04-11 16:10:39.698347624 +0400
@@ -0,0 +1,166 @@
+/*
+ * connector.h
+ *
+ * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __CONNECTOR_H
+#define __CONNECTOR_H
+
+#include <asm/types.h>
+
+#define CN_IDX_CONNECTOR 0xffffffff
+#define CN_VAL_CONNECTOR 0xffffffff
+
+
+/*
+ * Maximum connector's message size.
+ */
+#define CONNECTOR_MAX_MSG_SIZE 1024
+
+/*
+ * idx and val are unique identifiers which
+ * are used for message routing and
+ * must be registered in connector.h for in-kernel usage.
+ */
+
+struct cb_id
+{
+ __u32 idx;
+ __u32 val;
+};
+
+struct cn_msg
+{
+ struct cb_id id;
+
+ __u32 seq;
+ __u32 ack;
+
+ __u32 len; /* Length of the following data */
+ __u8 data[0];
+};
+
+/*
+ * Notify structure - requests notification about
+ * registering/unregistering idx/val in range [first, first+range].
+ */
+struct cn_notify_req
+{
+ __u32 first;
+ __u32 range;
+};
+
+/*
+ * Main notification control message
+ * *_notify_num - number of appropriate cn_notify_req structures after
+ * this struct.
+ * group - notification receiver's idx.
+ * len - total length of the attached data.
+ */
+struct cn_ctl_msg
+{
+ __u32 idx_notify_num;
+ __u32 val_notify_num;
+ __u32 group;
+ __u32 len;
+ __u8 data[0];
+};
+
+
+#ifdef __KERNEL__
+
+#include <asm/atomic.h>
+
+#include <linux/list.h>
+#include <linux/workqueue.h>
+
+#include <net/sock.h>
+
+#define CN_CBQ_NAMELEN 32
+
+struct cn_queue_dev
+{
+ atomic_t refcnt;
+ unsigned char name[CN_CBQ_NAMELEN];
+
+ struct workqueue_struct *cn_queue;
+
+ struct list_head queue_list;
+ spinlock_t queue_lock;
+
+ int netlink_groups;
+ struct sock *nls;
+};
+
+struct cn_callback
+{
+ unsigned char name[CN_CBQ_NAMELEN];
+
+ struct cb_id id;
+ void (* callback)(void *);
+ void *priv;
+
+ atomic_t refcnt;
+};
+
+struct cn_callback_entry
+{
+ struct list_head callback_entry;
+ struct cn_callback *cb;
+ struct work_struct work;
+ struct cn_queue_dev *pdev;
+
+ void (* destruct_data)(void *);
+ void *ddata;
+
+ int seq, group;
+ struct sock *nls;
+};
+
+struct cn_ctl_entry
+{
+ struct list_head notify_entry;
+ struct cn_ctl_msg *msg;
+};
+
+struct cn_dev
+{
+ struct cb_id id;
+
+ u32 seq, groups;
+ struct sock *nls;
+ void (*input)(struct sock *sk, int len);
+
+ struct cn_queue_dev *cbdev;
+};
+
+int cn_add_callback(struct cb_id *, char *, void (* callback)(void *));
+void cn_del_callback(struct cb_id *);
+int cn_netlink_send(struct cn_msg *, u32, int);
+
+int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb);
+void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id);
+
+struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *);
+void cn_queue_free_dev(struct cn_queue_dev *dev);
+
+int cn_cb_equal(struct cb_id *, struct cb_id *);
+
+#endif /* __KERNEL__ */
+#endif /* __CONNECTOR_H */
--- ./drivers/Kconfig.orig 2005-04-11 16:31:04.202194744 +0400
+++ ./drivers/Kconfig 2005-02-28 12:58:35.000000000 +0300
@@ -4,6 +4,8 @@
source "drivers/base/Kconfig"
+source "drivers/connector/Kconfig"
+
source "drivers/mtd/Kconfig"
source "drivers/parport/Kconfig"
--- ./drivers/Makefile.orig 2005-04-11 16:30:48.521578560 +0400
+++ ./drivers/Makefile 2005-02-28 12:58:35.000000000 +0300
@@ -17,6 +17,8 @@
# default.
obj-y += char/
+obj-$(CONFIG_CONNECTOR) += connector/
+
# i810fb and intelfb depend on char/agp/
obj-$(CONFIG_FB_I810) += video/i810/
obj-$(CONFIG_FB_INTEL) += video/intelfb/
diff -Nru /tmp/empty/cn_test.c ./Documentation/connector/cn_test.c
--- /tmp/empty/cn_test.c 1970-01-01 03:00:00.000000000 +0300
+++ ./Documentation/connector/cn_test.c 2005-04-11 16:11:11.967441976 +0400
@@ -0,0 +1,203 @@
+/*
+ * cn_test.c
+ *
+ * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/skbuff.h>
+#include <linux/timer.h>
+
+#include "connector.h"
+
+static struct cb_id cn_test_id = { 0x123, 0x456 };
+static char cn_test_name[] = "cn_test";
+static struct sock *nls;
+static struct timer_list cn_test_timer;
+
+void cn_test_callback(void *data)
+{
+ struct cn_msg *msg = (struct cn_msg *)data;
+
+ printk("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n",
+ __func__, jiffies, msg->id.idx, msg->id.val,
+ msg->seq, msg->ack, msg->len, (char *)msg->data);
+}
+
+static int cn_test_want_notify(void)
+{
+ struct cn_ctl_msg *ctl;
+ struct cn_notify_req *req;
+ struct cn_msg *msg = NULL;
+ int size, size0;
+ struct sk_buff *skb;
+ struct nlmsghdr *nlh;
+ u32 group = 1;
+
+ size0 = sizeof(*msg) + sizeof(*ctl) + 3*sizeof(*req);
+
+ size = NLMSG_SPACE(size0);
+
+ skb = alloc_skb(size, GFP_ATOMIC);
+ if (!skb) {
+ printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
+
+ return -ENOMEM;
+ }
+
+ nlh = NLMSG_PUT(skb, 0, 0x123, NLMSG_DONE, size - NLMSG_ALIGN(sizeof(*nlh)));
+
+ msg = (struct cn_msg *)NLMSG_DATA(nlh);
+
+ memset(msg, 0, size0);
+
+ msg->id.idx = -1;
+ msg->id.val = -1;
+ msg->seq = 0x123;
+ msg->ack = 0x345;
+ msg->len = size0 - sizeof(*msg);
+
+ ctl = (struct cn_ctl_msg *)(msg + 1);
+
+ ctl->idx_notify_num = 1;
+ ctl->val_notify_num = 2;
+ ctl->group = group;
+ ctl->len = msg->len - sizeof(*ctl);
+
+ req = (struct cn_notify_req *)(ctl + 1);
+
+ /*
+ * Idx.
+ */
+ req->first = cn_test_id.idx;
+ req->range = 10;
+
+ /*
+ * Val 0.
+ */
+ req++;
+ req->first = cn_test_id.val;
+ req->range = 10;
+
+ /*
+ * Val 1.
+ */
+ req++;
+ req->first = cn_test_id.val + 20;
+ req->range = 10;
+
+ NETLINK_CB(skb).dst_groups = ctl->group;
+ //netlink_broadcast(nls, skb, 0, ctl->group, GFP_ATOMIC);
+ netlink_unicast(nls, skb, 0, 0);
+
+ printk(KERN_INFO "Request was sent. Group=0x%x.\n", ctl->group);
+
+ return 0;
+
+nlmsg_failure:
+ printk(KERN_ERR "Failed to send %u.%u\n", msg->seq, msg->ack);
+ kfree_skb(skb);
+ return -EINVAL;
+}
+
+static u32 cn_test_timer_counter;
+static void cn_test_timer_func(unsigned long __data)
+{
+ struct cn_msg *m;
+ char data[32];
+
+ m = kmalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC);
+ if (m)
+ {
+ memset(m, 0, sizeof(*m) + sizeof(data));
+
+ memcpy(&m->id, &cn_test_id, sizeof(m->id));
+ m->seq = cn_test_timer_counter;
+ m->len = sizeof(data);
+
+ m->len = scnprintf(data, sizeof(data), "counter = %u", cn_test_timer_counter) + 1;
+
+ memcpy(m+1, data, m->len);
+
+ cbus_insert(m, gfp_any());
+ //cn_netlink_send(m, gfp_any());
+ kfree(m);
+ }
+
+ cn_test_timer_counter++;
+
+ mod_timer(&cn_test_timer, jiffies + HZ);
+}
+
+static int cn_test_init(void)
+{
+ int err;
+#if 0
+ nls = netlink_kernel_create(NETLINK_TAPBASE + 1, NULL);
+ if (!nls) {
+ printk(KERN_ERR "Failed to create new netlink socket(%u).\n", NETLINK_NFLOG);
+ return -EIO;
+ }
+
+ err = cn_test_want_notify();
+ if (err)
+ goto err_out;
+#endif
+ err = cn_add_callback(&cn_test_id, cn_test_name, cn_test_callback);
+ if (err)
+ goto err_out;
+ cn_test_id.val++;
+ err = cn_add_callback(&cn_test_id, cn_test_name, cn_test_callback);
+ if (err) {
+ cn_del_callback(&cn_test_id);
+ goto err_out;
+ }
+
+ init_timer(&cn_test_timer);
+ cn_test_timer.function = cn_test_timer_func;
+ cn_test_timer.expires = jiffies + HZ;
+ cn_test_timer.data = 0;
+ add_timer(&cn_test_timer);
+
+ return 0;
+
+err_out:
+ if (nls && nls->sk_socket)
+ sock_release(nls->sk_socket);
+
+ return err;
+}
+
+static void cn_test_fini(void)
+{
+ del_timer_sync(&cn_test_timer);
+ cn_del_callback(&cn_test_id);
+ cn_test_id.val--;
+ cn_del_callback(&cn_test_id);
+ if (nls && nls->sk_socket)
+ sock_release(nls->sk_socket);
+}
+
+module_init(cn_test_init);
+module_exit(cn_test_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Evgeniy Polyakov <[email protected]>");
+MODULE_DESCRIPTION("Connector's test module");
diff -Nru /tmp/empty/connector.txt ./Documentation/connector/connector.txt
--- /tmp/empty/connector.txt 1970-01-01 03:00:00.000000000 +0300
+++ ./Documentation/connector/connector.txt 2005-04-11 16:12:01.804865536 +0400
@@ -0,0 +1,155 @@
+/*****************************************/
+Kernel Connector.
+/*****************************************/
+
+Kernel connector - new netlink based userspace <-> kernel space easy to use communication module.
+
+Connector driver adds possibility to connect various agents using
+netlink based network.
+One must register callback and identifier. When driver receives
+special netlink message with appropriate identifier, appropriate
+callback will be called.
+
+From the userspace point of view it's quite straightforward:
+socket();
+bind();
+send();
+recv();
+
+But if kernelspace wants to use the full power of such connections, the driver
+writer must create special sockets, must know about struct sk_buff
+handling...
+Connector allows any kernelspace agents to use netlink based
+networking for inter-process communication in a significantly easier
+way:
+
+int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
+void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
+
+struct cb_id
+{
+ __u32 idx;
+ __u32 val;
+};
+
+idx and val are unique identifiers which must be registered in connector.h
+for in-kernel usage.
+void (*callback) (void *) - is a callback function which will be called
+when message with above idx.val will be received by connector core.
+Argument for that function must be dereferenced to struct cn_msg *.
+
+struct cn_msg
+{
+ struct cb_id id;
+
+ __u32 seq;
+ __u32 ack;
+
+ __u32 len; /* Length of the following data */
+ __u8 data[0];
+};
+
+/*****************************************/
+Connector interfaces.
+/*****************************************/
+
+int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
+Registers new callback with connector core.
+
+struct cb_id *id - unique connector's user identifier.
+ It must be registered in connector.h for legal in-kernel users.
+char *name - connector's callback symbolic name.
+void (*callback) (void *) - connector's callback.
+ Argument must be dereferenced to struct cn_msg *.
+
+void cn_del_callback(struct cb_id *id);
+Unregisters new callback with connector core.
+
+struct cb_id *id - unique connector's user identifier.
+
+void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
+Sends message to the specified groups.
+It can be safely called from any context, but may silently
+fail under strong memory pressure.
+
+struct cn_msg * - message header(with attached data).
+u32 __groups - destination groups.
+ If __groups is zero, then appropriate group will
+ be searched through all registered connector users,
+ and message will be delivered to the group which was
+ created for user with the same ID as in msg.
+ If __groups is not zero, then message will be delivered
+ to the specified group.
+int gfp_mask - GFP mask.
+
+Note: when registering a new callback user, the connector core assigns the
+user a netlink group equal to its id.idx.
+
+/*****************************************/
+Protocol description.
+/*****************************************/
+
+The connector currently offers a transport layer with a fixed header.
+The recommended protocol using this header is the following:
+
+msg->seq and msg->ack are used to determine message genealogy.
+When someone sends message it puts there locally unique sequence
+and random acknowledge numbers.
+Sequence number may be copied into nlmsghdr->nlmsg_seq too.
+
+Sequence number is incremented with each message to be sent.
+
+If we expect reply to our message, then sequence number in received
+message MUST be the same as in original message, and acknowledge
+number MUST be the same + 1.
+
+If we receive message and it's sequence number is not equal to one
+we are expecting, then it is new message.
+If we receive message and it's sequence number is the same as one we
+are expecting, but it's acknowledge is not equal acknowledge number
+in original message + 1, then it is new message.
+
+Obviously, protocol header contains above id.
+
+connector allows event notification in the following form:
+kernel driver or userspace process can ask connector to notify it
+when selected id's will be turned on or off(registered or unregistered
+it's callback). It is done by sending special command to connector
+driver(it also registers itself with id={-1, -1}).
+
+As example of usage Documentation/connector now contains cn_test.c -
+testing module which uses connector to request notification
+and to send messages.
+
+
+/*****************************************/
+CBUS.
+/*****************************************/
+
+This message bus allows message passing between different agents
+using connector's infrastructure.
+It is extremely fast for insert operations so it can be used in performance
+critical paths in any context instead of direct connector's methods calls.
+
+CBUS uses per CPU variables and thus allows message reordering,
+caller must be prepared (and use CPU id in it's messages).
+
+Usage is very simple - just call cbus_insert(struct cn_msg *msg, int gfp_mask);
+It can fail, so caller must check return value - zero on success
+and negative error code otherwise.
+
+Benchmark with modified fork connector and fork bomb on 2-way system
+did not show any differences between vanilla 2.6.11 and CBUS.
+
+
+
+/*****************************************/
+Reliability.
+/*****************************************/
+Netlink itself is not a reliable protocol,
+which means that messages can be lost due
+to memory pressure or because a process'
+receive queue overflows, so the caller
+must be prepared for this. That is why struct cn_msg
+[the main connector message header] contains
+the u32 seq and u32 ack fields.
--
Evgeniy Polyakov
Crash is better than data corruption. -- Artur Grabowski
* Evgeniy Polyakov <[email protected]> 2005-04-11 16:59
> + size = NLMSG_SPACE(sizeof(*msg) + msg->len);
> +
> + skb = alloc_skb(size, gfp_mask);
> + if (!skb) {
> + printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
> + return -ENOMEM;
> + }
> +
> + nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - sizeof(*nlh));
Needs same fix.
> + size0 = sizeof(*msg) + sizeof(*ctl) + 3*sizeof(*req);
> +
> + size = NLMSG_SPACE(size0);
> +
> + skb = alloc_skb(size, GFP_ATOMIC);
> + if (!skb) {
> + printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
> +
> + return -ENOMEM;
> + }
> +
> + nlh = NLMSG_PUT(skb, 0, 0x123, NLMSG_DONE, size - NLMSG_ALIGN(sizeof(*nlh)));
Just pass size0 instead of reverting the calculation.
Some nlmsg alignment cleanups. Documentation module cleanups.
Thanks, Thomas.
Signed-off-by: Evgeniy Polyakov <[email protected]>
* looking for [email protected]/connector--main--0--patch-47 to compare with
* comparing to [email protected]/connector--main--0--patch-47
M cn_test.c
M connector.c
* modified files
--- orig/Documentation/connector/cn_test.c
+++ mod/Documentation/connector/cn_test.c
@@ -55,14 +55,14 @@
size = NLMSG_SPACE(size0);
- skb = alloc_skb(size, GFP_ATOMIC);
+ skb = alloc_skb(size, GFP_KERNEL);
if (!skb) {
printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
return -ENOMEM;
}
- nlh = NLMSG_PUT(skb, 0, 0x123, NLMSG_DONE, size - sizeof(*nlh));
+ nlh = NLMSG_PUT(skb, 0, 0x123, NLMSG_DONE, size0);
msg = (struct cn_msg *)NLMSG_DATA(nlh);
@@ -104,7 +104,6 @@
req->range = 10;
NETLINK_CB(skb).dst_groups = ctl->group;
- //netlink_broadcast(nls, skb, 0, ctl->group, GFP_ATOMIC);
netlink_unicast(nls, skb, 0, 0);
printk(KERN_INFO "Request was sent. Group=0x%x.\n", ctl->group);
@@ -124,8 +123,7 @@
char data[32];
m = kmalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC);
- if (m)
- {
+ if (m) {
memset(m, 0, sizeof(*m) + sizeof(data));
memcpy(&m->id, &cn_test_id, sizeof(m->id));
@@ -136,8 +134,8 @@
memcpy(m+1, data, m->len);
- cbus_insert(m);
- //cn_netlink_send(m, 0);
+ cbus_insert(m, GFP_ATOMIC);
+ //cn_netlink_send(m, 0, GFP_ATOMIC);
kfree(m);
}
--- orig/drivers/connector/connector.c
+++ mod/drivers/connector/connector.c
@@ -100,15 +100,15 @@
else
groups = __groups;
- size = NLMSG_SPACE(sizeof(*msg) + msg->len);
+ size = sizeof(*msg) + msg->len;
- skb = alloc_skb(size, gfp_mask);
+ skb = alloc_skb(NLMSG_SPACE(size), gfp_mask);
if (!skb) {
- printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
+ printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", NLMSG_SPACE(size));
return -ENOMEM;
}
- nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size - NLMSG_ALIGN(sizeof(*nlh)));
+ nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size);
data = (struct cn_msg *)NLMSG_DATA(nlh);
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
Hi Evgeniy,
On 4/11/05, Evgeniy Polyakov <[email protected]> wrote:
> /*****************************************/
> Kernel Connector.
> /*****************************************/
...
> +static int cn_call_callback(struct cn_msg *msg, void (*destruct_data) (void *), void *data)
> +{
> + struct cn_callback_entry *__cbq;
> + struct cn_dev *dev = &cdev;
> + int found = 0;
> +
> + spin_lock_bh(&dev->cbdev->queue_lock);
> + list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> + if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> + __cbq->cb->priv = msg;
> +
> + __cbq->ddata = data;
> + __cbq->destruct_data = destruct_data;
> +
> + queue_work(dev->cbdev->cn_queue, &__cbq->work);
It looks like there is a problem with the code. As far as I can see
there is only one cn_callback_entry associated with each callback. So,
if someone sends netlink messages with the same id at a high enough
rate (so cbdev's work queue does not get a chance to get scheduled and
process pending requests) ddata and the destructor will be overwritten
which can lead to memory leaks and non-delivery of some messages.
Am I missing something?
--
Dmitry
On Tue, 26 Apr 2005 10:57:55 -0500
Dmitry Torokhov <[email protected]> wrote:
> Hi Evgeniy,
>
> On 4/11/05, Evgeniy Polyakov <[email protected]> wrote:
> > /*****************************************/
> > Kernel Connector.
> > /*****************************************/
> ...
> > +static int cn_call_callback(struct cn_msg *msg, void (*destruct_data) (void *), void *data)
> > +{
> > + struct cn_callback_entry *__cbq;
> > + struct cn_dev *dev = &cdev;
> > + int found = 0;
> > +
> > + spin_lock_bh(&dev->cbdev->queue_lock);
> > + list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> > + if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> > + __cbq->cb->priv = msg;
> > +
> > + __cbq->ddata = data;
> > + __cbq->destruct_data = destruct_data;
> > +
> > + queue_work(dev->cbdev->cn_queue, &__cbq->work);
>
> It looks like there is a problem with the code. As far as I can see
> there is only one cn_callback_entry associated with each callback. So,
> if someone sends netlink messages with the same id at a high enough
> rate (so cbdev's work queue does not get a chance to get scheduled and
> process pending requests) ddata and the destructor will be overwritten
> which can lead to memory leaks and non-delivery of some messages.
>
> Am I missing something?
Connector needs to check return value here - zero means
that work was already queued and we must free shared skb.
There may not be the same work with different data.
> --
> Dmitry
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On Tue, 26 Apr 2005 20:24:37 +0400
Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 26 Apr 2005 10:57:55 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > Hi Evgeniy,
> >
> > On 4/11/05, Evgeniy Polyakov <[email protected]> wrote:
> > > /*****************************************/
> > > Kernel Connector.
> > > /*****************************************/
> > ...
> > > +static int cn_call_callback(struct cn_msg *msg, void (*destruct_data) (void *), void *data)
> > > +{
> > > + struct cn_callback_entry *__cbq;
> > > + struct cn_dev *dev = &cdev;
> > > + int found = 0;
> > > +
> > > + spin_lock_bh(&dev->cbdev->queue_lock);
> > > + list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> > > + if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> > > + __cbq->cb->priv = msg;
> > > +
> > > + __cbq->ddata = data;
> > > + __cbq->destruct_data = destruct_data;
> > > +
> > > + queue_work(dev->cbdev->cn_queue, &__cbq->work);
> >
> > It looks like there is a problem with the code. As far as I can see
> > there is only one cn_callback_entry associated with each callback. So,
> > if someone sends netlink messages with the same id at a high enough
> > rate (so cbdev's work queue does not get a chance to get scheduled and
> > process pending requests) ddata and the destructor will be overwritten
> > which can lead to memory leaks and non-delivery of some messages.
> >
> > Am I missing something?
>
> Connector needs to check return value here - zero means
> that work was already queued and we must free shared skb.
>
> There may not be the same work with different data.
Here is the patch:
* looking for [email protected]/connector--main--0--patch-52 to compare with
* comparing to [email protected]/connector--main--0--patch-52
M connector.c
* modified files
--- orig/drivers/connector/connector.c
+++ mod/drivers/connector/connector.c
@@ -151,8 +151,8 @@
__cbq->ddata = data;
__cbq->destruct_data = destruct_data;
- queue_work(dev->cbdev->cn_queue, &__cbq->work);
- found = 1;
+ if (queue_work(dev->cbdev->cn_queue, &__cbq->work))
+ found = 1;
break;
}
}
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
>
> --- orig/drivers/connector/connector.c
> +++ mod/drivers/connector/connector.c
> @@ -151,8 +151,8 @@
> __cbq->ddata = data;
> __cbq->destruct_data = destruct_data;
>
> - queue_work(dev->cbdev->cn_queue, &__cbq->work);
> - found = 1;
> + if (queue_work(dev->cbdev->cn_queue, &__cbq->work))
> + found = 1;
> break;
What does it help exactly? By the time you checked result of
queue_work you have already corrupted the work structure with the new data
(and probably destructor).
Also, where is the rest of the code? Should we notify caller that
cn_netlink_send has dropped the message? And how do we do that?
--
Dmitry
On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 26 Apr 2005 10:57:55 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > Hi Evgeniy,
> >
> > On 4/11/05, Evgeniy Polyakov <[email protected]> wrote:
> > > /*****************************************/
> > > Kernel Connector.
> > > /*****************************************/
> > ...
> > > +static int cn_call_callback(struct cn_msg *msg, void (*destruct_data) (void *), void *data)
> > > +{
> > > + struct cn_callback_entry *__cbq;
> > > + struct cn_dev *dev = &cdev;
> > > + int found = 0;
> > > +
> > > + spin_lock_bh(&dev->cbdev->queue_lock);
> > > + list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> > > + if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> > > + __cbq->cb->priv = msg;
> > > +
> > > + __cbq->ddata = data;
> > > + __cbq->destruct_data = destruct_data;
> > > +
> > > + queue_work(dev->cbdev->cn_queue, &__cbq->work);
> >
> > It looks like there is a problem with the code. As far as I can see
> > there is only one cn_callback_entry associated with each callback. So,
> > if someone sends netlink messages with the same id at a high enough
> > rate (so cbdev's work queue does not get a chance to get scheduled and
> > process pending requests) ddata and the destructor will be overwritten
> > which can lead to memory leaks and non-delivery of some messages.
> >
> > Am I missing something?
>
> Connector needs to check return value here - zero means
> that work was already queued and we must free shared skb.
>
> There may not be the same work with different data.
>
Ugh, that really blows. Now every user of a particular message type
has to coordinate efforts with other users of the same message type...
Inability to "fire and forget" undermines the usefulness of the whole
connector. How will you, for example, implement hotplug notification
over connector? Have kobject_hotplug wait and block other instances?
But wait on what?
--
Dmitry
On Tue, 26 Apr 2005 12:31:58 -0500
Dmitry Torokhov <[email protected]> wrote:
> > There may not be the same work with different data.
> >
>
> Ugh, that really blows. Now every user of a particular message type
> has to coordinate efforts with other users of the same message type...
>
> Imability to "fire and forget" undermines usefulness of whole
> connector. How will you for example implement hotplug notification
> over connector? Have kobject_hotplug wait and block other instances?
> But wait on what?
This is a simple load balancing schema.
Netlink messages may be dropped in the socket queue when
they are being delivered to userspace - this is the same:
if the work queue can not be scheduled, the message will be dropped,
but in that case userspace also can not be scheduled
and the message will be dropped anyway.
> --
> Dmitry
>
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On Tue, 26 Apr 2005 12:34:13 -0500
Dmitry Torokhov <[email protected]> wrote:
> On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> >
> > --- orig/drivers/connector/connector.c
> > +++ mod/drivers/connector/connector.c
> > @@ -151,8 +151,8 @@
> > __cbq->ddata = data;
> > __cbq->destruct_data = destruct_data;
> >
> > - queue_work(dev->cbdev->cn_queue, &__cbq->work);
> > - found = 1;
> > + if (queue_work(dev->cbdev->cn_queue, &__cbq->work))
> > + found = 1;
> > break;
>
> What does it help exactly? By the time you checked result of
> queue_work you have already corrupted work structure wuth the new data
> (and probably destructor).
>
> Also, where is the rest of the code? Should we notify caller that
> cn_netlink_send has dropped the message? And how do we do that?
Yes, I found it too.
Following patch should be the solution:
--- orig/drivers/connector/connector.c
+++ mod/drivers/connector/connector.c
@@ -146,13 +146,16 @@
spin_lock_bh(&dev->cbdev->queue_lock);
list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
- __cbq->cb->priv = msg;
+
+ if (!test_bit(0, &work->pending)) {
+ __cbq->cb->priv = msg;
- __cbq->ddata = data;
- __cbq->destruct_data = destruct_data;
+ __cbq->ddata = data;
+ __cbq->destruct_data = destruct_data;
- if (queue_work(dev->cbdev->cn_queue, &__cbq->work))
- found = 1;
+ if (queue_work(dev->cbdev->cn_queue, &__cbq->work))
+ found = 1;
+ }
break;
}
}
> --
> Dmitry
>
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On Tue, 26 Apr 2005 22:03:54 +0400
Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 26 Apr 2005 12:31:58 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > > There may not be the same work with different data.
> > >
> >
> > Ugh, that really blows. Now every user of a particular message type
> > has to coordinate efforts with other users of the same message type...
> >
> > Imability to "fire and forget" undermines usefulness of whole
> > connector. How will you for example implement hotplug notification
> > over connector? Have kobject_hotplug wait and block other instances?
> > But wait on what?
>
> This is a simple load balancing schema.
> Netlink messages may be dropped in socket queue when
> they are bing delivered to userspace - this is the same -
> if work queue can not be scheduled, message will be dropped,
> but in this case userspace also can not be scheduled
> and message will be dropped.
Btw, I believe we see that it is the reverse direction...
So we have a reverse load balancing schema here -
exactly like userspace socket queueing.
We basically can not sleep here - it would be a DoS.
> > --
> > Dmitry
> >
>
>
> Evgeniy Polyakov
>
> Only failure makes us experts. -- Theo de Raadt
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On Tue, 26 Apr 2005 22:10:26 +0400
Evgeniy Polyakov <[email protected]> wrote:
> > > > There may not be the same work with different data.
> > > >
> > >
> > > Ugh, that really blows. Now every user of a particular message type
> > > has to coordinate efforts with other users of the same message type...
> > >
> > > Imability to "fire and forget" undermines usefulness of whole
> > > connector. How will you for example implement hotplug notification
> > > over connector? Have kobject_hotplug wait and block other instances?
> > > But wait on what?
> >
> > This is a simple load balancing schema.
> > Netlink messages may be dropped in socket queue when
> > they are bing delivered to userspace - this is the same -
> > if work queue can not be scheduled, message will be dropped,
> > but in this case userspace also can not be scheduled
> > and message will be dropped.
>
> Btw, I belive we see that it is reverse direction...
> So we have reverse load balancing schema here -
> exactly like userspace socket queueing.
> We basically can not sleep here - it will be DOS.
And yet another btw - netlink is an unreliable protocol,
that is why there are seq and ack fields in the connector's header -
connector's users must implement some check on top of
raw connector messages - it could be a returned message with
timeout-based resending and so on.
I wrote it several times and it is in connector's documentation.
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
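To make the seq/ack check described above concrete, a minimal userspace sketch of such a timeout-and-resend loop could look like the code below. The cn_msg layout mirrors connector.h; the helper name, buffer sizes and retry policy are invented for the example, and the socket is assumed to have already been created and bound to the listener's netlink group.

#include <string.h>
#include <stdint.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

struct cb_id  { uint32_t idx, val; };
struct cn_msg { struct cb_id id; uint32_t seq, ack, len; uint8_t data[0]; };

/*
 * Send "data" with the given id and wait for a reply carrying the same
 * sequence number and ack == original ack + 1; resend up to "retries"
 * times, waiting "timeout_ms" between attempts.  Returns 0 on
 * acknowledged delivery, -1 otherwise.
 */
static int cn_send_reliable(int sock, const struct cb_id *id, uint32_t seq,
                            const void *data, uint32_t len,
                            int retries, int timeout_ms)
{
    char out[256], in[256];
    struct nlmsghdr *nlh = (struct nlmsghdr *)out;
    struct cn_msg *msg = (struct cn_msg *)NLMSG_DATA(nlh);

    if (NLMSG_SPACE(sizeof(*msg) + len) > sizeof(out))
        return -1;

    memset(out, 0, sizeof(out));
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*msg) + len);
    nlh->nlmsg_type = NLMSG_DONE;
    nlh->nlmsg_seq = seq;
    msg->id = *id;
    msg->seq = seq;
    msg->ack = 0;            /* normally a random value */
    msg->len = len;
    memcpy(msg->data, data, len);

    while (retries-- > 0) {
        struct pollfd pfd = { .fd = sock, .events = POLLIN };

        if (send(sock, out, nlh->nlmsg_len, 0) < 0)
            return -1;

        if (poll(&pfd, 1, timeout_ms) > 0) {
            ssize_t n = recv(sock, in, sizeof(in), 0);

            if (n >= (ssize_t)NLMSG_LENGTH(sizeof(*msg))) {
                struct cn_msg *r = NLMSG_DATA((struct nlmsghdr *)in);

                /* reply convention: same seq, ack == original ack + 1 */
                if (r->seq == msg->seq && r->ack == msg->ack + 1)
                    return 0;
            }
        }
        /* timed out or saw an unrelated message - resend */
    }
    return -1;
}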
On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> Yes, I found it too.
> Following patch should be the solution:
>
> --- orig/drivers/connector/connector.c
> +++ mod/drivers/connector/connector.c
> @@ -146,13 +146,16 @@
> spin_lock_bh(&dev->cbdev->queue_lock);
> list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> - __cbq->cb->priv = msg;
> +
> + if (!test_bit(0, &work->pending)) {
> + __cbq->cb->priv = msg;
>
> - __cbq->ddata = data;
> - __cbq->destruct_data = destruct_data;
> + __cbq->ddata = data;
> + __cbq->destruct_data = destruct_data;
>
Still not good enough - the work->pending bit gets cleared when the work has
been scheduled, but before the payload executes. You still have the race.
--
Dmitry
On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 26 Apr 2005 22:10:26 +0400
> Evgeniy Polyakov <[email protected]> wrote:
>
> > > > > There may not be the same work with different data.
> > > > >
> > > >
> > > > Ugh, that really blows. Now every user of a particular message type
> > > > has to coordinate efforts with other users of the same message type...
> > > >
> > > > Imability to "fire and forget" undermines usefulness of whole
> > > > connector. How will you for example implement hotplug notification
> > > > over connector? Have kobject_hotplug wait and block other instances?
> > > > But wait on what?
> > >
> > > This is a simple load balancing schema.
> > > Netlink messages may be dropped in socket queue when
> > > they are bing delivered to userspace - this is the same -
> > > if work queue can not be scheduled, message will be dropped,
> > > but in this case userspace also can not be scheduled
> > > and message will be dropped.
> >
> > Btw, I belive we see that it is reverse direction...
> > So we have reverse load balancing schema here -
> > exactly like userspace socket queueing.
> > We basically can not sleep here - it will be DOS.
>
> And yet another btw - netlink is unreliable protocol,
> that is why there are seq and ack fields in connector's header -
> connector's users must implement some check on top of
> raw connector messages - it could be returned message with
> timeout resending and so on.
> I wrote it several times and it is in connector's documentation.
>
I can accept that netlink is an unreliable protocol and can drop messages
- but that should only happen when there is memory pressure. In your
case you simply can not send messages more often than once per
timeslice unless you implement a synchronous protocol.
Load balancing allows bursts as long as the average rate is under control
- connector does not.
--
Dmitry
On Tue, 26 Apr 2005 13:20:08 -0500
Dmitry Torokhov <[email protected]> wrote:
> On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > Yes, I found it too.
> > Following patch should be the solution:
> >
> > --- orig/drivers/connector/connector.c
> > +++ mod/drivers/connector/connector.c
> > @@ -146,13 +146,16 @@
> > spin_lock_bh(&dev->cbdev->queue_lock);
> > list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> > if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> > - __cbq->cb->priv = msg;
> > +
> > + if (!test_bit(0, &work->pending)) {
> > + __cbq->cb->priv = msg;
> >
> > - __cbq->ddata = data;
> > - __cbq->destruct_data = destruct_data;
> > + __cbq->ddata = data;
> > + __cbq->destruct_data = destruct_data;
> >
>
> Still not good enough - work->pending bit gets cleared when work has
> been scheduled, but before executing payload. You still have the race.
The data pointer is copied before the bit is set,
but I forgot that it is not the data itself, but another pointer
which may be overwritten.
I think we may finish it by setting the skb as the data
and calling kfree_skb() as the destructor.
Thank you for your analysis.
> --
> Dmitry
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
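A rough sketch of that fix, using the structure and field names from the code quoted earlier in the thread: the caller passes the shared skb as the data and kfree_skb as the destructor, and the callback entry is reused only when the previous message has been consumed. This assumes the work handler clears ddata under queue_lock once it has run the destructor, so a non-NULL ddata means "previous message still in flight".

static int cn_call_callback(struct cn_msg *msg,
                            void (*destruct_data)(void *), void *data)
{
    struct cn_callback_entry *__cbq;
    struct cn_dev *dev = &cdev;
    int found = 0;

    spin_lock_bh(&dev->cbdev->queue_lock);
    list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
        if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
            /*
             * Reuse the entry only when the previous message has
             * been consumed by the work handler, so its data and
             * destructor are never overwritten.
             */
            if (__cbq->ddata == NULL) {
                __cbq->cb->priv = msg;
                __cbq->ddata = data;                  /* the skb itself */
                __cbq->destruct_data = destruct_data; /* kfree_skb      */
                if (queue_work(dev->cbdev->cn_queue, &__cbq->work))
                    found = 1;
            }
            break;
        }
    }
    spin_unlock_bh(&dev->cbdev->queue_lock);

    return found;    /* 0 means dropped - caller frees the shared skb */
}

The caller would then invoke it as cn_call_callback(msg, (void (*)(void *))kfree_skb, skb) and free the skb itself whenever 0 is returned.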
On Tue, 26 Apr 2005 13:25:50 -0500
Dmitry Torokhov <[email protected]> wrote:
> On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > On Tue, 26 Apr 2005 22:10:26 +0400
> > Evgeniy Polyakov <[email protected]> wrote:
> >
> > > > > > There may not be the same work with different data.
> > > > > >
> > > > >
> > > > > Ugh, that really blows. Now every user of a particular message type
> > > > > has to coordinate efforts with other users of the same message type...
> > > > >
> > > > > Imability to "fire and forget" undermines usefulness of whole
> > > > > connector. How will you for example implement hotplug notification
> > > > > over connector? Have kobject_hotplug wait and block other instances?
> > > > > But wait on what?
> > > >
> > > > This is a simple load balancing schema.
> > > > Netlink messages may be dropped in socket queue when
> > > > they are bing delivered to userspace - this is the same -
> > > > if work queue can not be scheduled, message will be dropped,
> > > > but in this case userspace also can not be scheduled
> > > > and message will be dropped.
> > >
> > > Btw, I belive we see that it is reverse direction...
> > > So we have reverse load balancing schema here -
> > > exactly like userspace socket queueing.
> > > We basically can not sleep here - it will be DOS.
> >
> > And yet another btw - netlink is unreliable protocol,
> > that is why there are seq and ack fields in connector's header -
> > connector's users must implement some check on top of
> > raw connector messages - it could be returned message with
> > timeout resending and so on.
> > I wrote it several times and it is in connector's documentation.
> >
>
> I can accept that netlink is unreliable protocol and can drop messages
> - but that should only happen when there is memory pressure. In your
> case you simply can not send messages with frequency higher than your
> timeslice unless you implement synchronous protocol.
>
> Load balancing allows bursts as long as average rate is under control
> - connector does not.
It happens not only when there is memory pressure,
but also when the remote side does not read its messages.
If the system can not schedule the work queue, then
it is the same situation.
And, btw, if the system is under such strong pressure that
the work queue can not be scheduled, then no skb will be delivered
to/from userspace either.
> --
> Dmitry
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 26 Apr 2005 13:20:08 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > > Yes, I found it too.
> > > Following patch should be the solution:
> > >
> > > --- orig/drivers/connector/connector.c
> > > +++ mod/drivers/connector/connector.c
> > > @@ -146,13 +146,16 @@
> > > spin_lock_bh(&dev->cbdev->queue_lock);
> > > list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> > > if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> > > - __cbq->cb->priv = msg;
> > > +
> > > + if (!test_bit(0, &work->pending)) {
> > > + __cbq->cb->priv = msg;
> > >
> > > - __cbq->ddata = data;
> > > - __cbq->destruct_data = destruct_data;
> > > + __cbq->ddata = data;
> > > + __cbq->destruct_data = destruct_data;
> > >
> >
> > Still not good enough - work->pending bit gets cleared when work has
> > been scheduled, but before executing payload. You still have the race.
>
> Data pointer is copied before bit is set,
> but I forget that it is not data, but another pointer
> which may be overwritten.
>
> I think we may finish it by setting skb as data,
> and call kfree_skb() as destructor.
>
Yes, that would work, although I would urge you to implement a message
queue for callbacks (probably limit it to 1000 messages or so) to
allow bursting.
> Thank you for your analysis.
You are welcome.
--
Dmitry
On Tue, 26 Apr 2005 13:42:10 -0500
Dmitry Torokhov <[email protected]> wrote:
> On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > On Tue, 26 Apr 2005 13:20:08 -0500
> > Dmitry Torokhov <[email protected]> wrote:
> >
> > > On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > > > Yes, I found it too.
> > > > Following patch should be the solution:
> > > >
> > > > --- orig/drivers/connector/connector.c
> > > > +++ mod/drivers/connector/connector.c
> > > > @@ -146,13 +146,16 @@
> > > > spin_lock_bh(&dev->cbdev->queue_lock);
> > > > list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
> > > > if (cn_cb_equal(&__cbq->cb->id, &msg->id)) {
> > > > - __cbq->cb->priv = msg;
> > > > +
> > > > + if (!test_bit(0, &work->pending)) {
> > > > + __cbq->cb->priv = msg;
> > > >
> > > > - __cbq->ddata = data;
> > > > - __cbq->destruct_data = destruct_data;
> > > > + __cbq->ddata = data;
> > > > + __cbq->destruct_data = destruct_data;
> > > >
> > >
> > > Still not good enough - work->pending bit gets cleared when work has
> > > been scheduled, but before executing payload. You still have the race.
> >
> > Data pointer is copied before bit is set,
> > but I forget that it is not data, but another pointer
> > which may be overwritten.
> >
> > I think we may finish it by setting skb as data,
> > and call kfree_skb() as destructor.
> >
>
> Yes, that woudl work, although I would urge you to implement a message
> queue for callbacks (probably limit it to 1000 messages or so) to
> allow bursting.
It already exists, btw, but not exactly in that way -
we have the skb queue, which can not be filled from userspace
if the pressure is so strong that the work queue can not be scheduled.
It is of course different and is influenced by other things,
but it handles bursts quite well - it was tested on both
SMP and UP machines with continuous flows of forks with
a sharp addition of new running tasks [both with a fork bomb and not],
so I think it can be called a real bursty test.
> --
> Dmitry
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 26 Apr 2005 13:42:10 -0500
> Dmitry Torokhov <[email protected]> wrote:
> > Yes, that woudl work, although I would urge you to implement a message
> > queue for callbacks (probably limit it to 1000 messages or so) to
> > allow bursting.
>
> It already exist, btw, but not exactly in that way -
> we have skb queue, which can not be filled from userspace
> if pressure is so strong so work queue can not be scheduled.
> It is of course different and is influenced by other things
> but it handles bursts quite well - it was tested on both
> SMP and UP machines with continuous flows of forks with
> shape addon of new running tasks [both fith fork bomb and not],
> so I think it can be called real bursty test.
>
Ok, hear me out and tell me where I am wrong:
By default a socket can receive at least 128 skbs with a 258-byte
payload, correct? That means that a user of cn_netlink_send, if started
"fresh", can queue about 128 average-size connector messages. If the sender
does not want to wait for anything (unlike your fork tests, which do schedule),
that means that 127 of those 128 messages will be dropped, although
netlink would deliver them just fine.
What am I missing?
--
Dmitry
On Tue, 26 Apr 2005 14:06:36 -0500
Dmitry Torokhov <[email protected]> wrote:
> On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > On Tue, 26 Apr 2005 13:42:10 -0500
> > Dmitry Torokhov <[email protected]> wrote:
> > > Yes, that woudl work, although I would urge you to implement a message
> > > queue for callbacks (probably limit it to 1000 messages or so) to
> > > allow bursting.
> >
> > It already exist, btw, but not exactly in that way -
> > we have skb queue, which can not be filled from userspace
> > if pressure is so strong so work queue can not be scheduled.
> > It is of course different and is influenced by other things
> > but it handles bursts quite well - it was tested on both
> > SMP and UP machines with continuous flows of forks with
> > shape addon of new running tasks [both fith fork bomb and not],
> > so I think it can be called real bursty test.
> >
>
> Ok, hear me out and tell me where I am wrong:
>
> By default a socket can receive at least 128 skbs with 258-byte
> payload, correct? That means that user of cn_netlink_send, if started
> "fresh", 128 average - size connector messages. If sender does not
> want to wait for anything (unlike your fork tests that do schedule)
> that means that 127 of those 128 messages will be dropped, although
> netlink would deliver them in time just fine.
>
> What am I missing?
Consider netlink_broadcast - it delivers the skb to the kernel
listener directly via the input callback - no queueing actually;
the same goes for netlink_unicast - so in-kernel users are always
called synchronously.
The send() [sys_send()] system call (which, btw, as a syscall does schedule too)
ends up in netlink_sendmsg and thus in either netlink_broadcast or
netlink_unicast, which is called directly, as we have seen.
> --
> Dmitry
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
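In code, the point is that a kernel-side netlink socket registers an input callback which is invoked on the sender's send() path, with no intermediate kernel thread. A hypothetical minimal module (using the two-argument netlink_kernel_create() of this kernel era, as in the patches in this thread) would look roughly like this; the names are placeholders.

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
#include <net/sock.h>

static struct sock *demo_sock;

/*
 * Runs when a message arrives for this kernel socket, i.e. from the
 * sender's send() syscall path - there is no queueing to a separate
 * kernel thread before this point.
 */
static void demo_input(struct sock *sk, int len)
{
    struct sk_buff *skb;

    while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
        printk(KERN_DEBUG "demo: %u bytes received synchronously\n",
               skb->len);
        kfree_skb(skb);
    }
}

static int __init demo_init(void)
{
    /* NETLINK_NFLOG is what the connector patches in this thread use. */
    demo_sock = netlink_kernel_create(NETLINK_NFLOG, demo_input);
    return demo_sock ? 0 : -ENOMEM;
}

static void __exit demo_exit(void)
{
    sock_release(demo_sock->sk_socket);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");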
On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 26 Apr 2005 14:06:36 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > > On Tue, 26 Apr 2005 13:42:10 -0500
> > > Dmitry Torokhov <[email protected]> wrote:
> > > > Yes, that woudl work, although I would urge you to implement a message
> > > > queue for callbacks (probably limit it to 1000 messages or so) to
> > > > allow bursting.
> > >
> > > It already exist, btw, but not exactly in that way -
> > > we have skb queue, which can not be filled from userspace
> > > if pressure is so strong so work queue can not be scheduled.
> > > It is of course different and is influenced by other things
> > > but it handles bursts quite well - it was tested on both
> > > SMP and UP machines with continuous flows of forks with
> > > shape addon of new running tasks [both fith fork bomb and not],
> > > so I think it can be called real bursty test.
> > >
> >
> > Ok, hear me out and tell me where I am wrong:
> >
> > By default a socket can receive at least 128 skbs with 258-byte
> > payload, correct? That means that user of cn_netlink_send, if started
> > "fresh", 128 average - size connector messages. If sender does not
> > want to wait for anything (unlike your fork tests that do schedule)
> > that means that 127 of those 128 messages will be dropped, although
> > netlink would deliver them in time just fine.
> >
> > What am I missing?
>
> Concider netlink_broadcast - it delivers skb to the kernel
> listener directly to the input callback - no queueing actually,
Right, no queueing for in-kernel...
But then we have the following: netlink will drop messages sent to
an in-kernel socket only if it can not copy the skb - which is, I'd say, a very
rare scenario. Connector, on the other hand, is guaranteed to drop all
but the very first message sent between 2 schedules. That makes
connector several orders of magnitude less reliable than the bare netlink
protocol. And you don't see it with your fork tests because you do
schedule when you fork.
--
Dmitry
On Tue, 2005-04-26 at 15:02 -0500, Dmitry Torokhov wrote:
> On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > On Tue, 26 Apr 2005 14:06:36 -0500
> > Dmitry Torokhov <[email protected]> wrote:
> >
> > > On 4/26/05, Evgeniy Polyakov <[email protected]> wrote:
> > > > On Tue, 26 Apr 2005 13:42:10 -0500
> > > > Dmitry Torokhov <[email protected]> wrote:
> > > > > Yes, that woudl work, although I would urge you to implement a message
> > > > > queue for callbacks (probably limit it to 1000 messages or so) to
> > > > > allow bursting.
> > > >
> > > > It already exist, btw, but not exactly in that way -
> > > > we have skb queue, which can not be filled from userspace
> > > > if pressure is so strong so work queue can not be scheduled.
> > > > It is of course different and is influenced by other things
> > > > but it handles bursts quite well - it was tested on both
> > > > SMP and UP machines with continuous flows of forks with
> > > > shape addon of new running tasks [both fith fork bomb and not],
> > > > so I think it can be called real bursty test.
> > > >
> > >
> > > Ok, hear me out and tell me where I am wrong:
> > >
> > > By default a socket can receive at least 128 skbs with 258-byte
> > > payload, correct? That means that user of cn_netlink_send, if started
> > > "fresh", 128 average - size connector messages. If sender does not
> > > want to wait for anything (unlike your fork tests that do schedule)
> > > that means that 127 of those 128 messages will be dropped, although
> > > netlink would deliver them in time just fine.
> > >
> > > What am I missing?
> >
> > Concider netlink_broadcast - it delivers skb to the kernel
> > listener directly to the input callback - no queueing actually,
>
> Right, no queueing for in-kernel...
>
> But then we have the following: netlink will drop messages sent to
> in-kernel socket ony if it can not copy skb - which is i'd say a very
> rare scenario. Connector, on the other hand, is guaranteed to drop all
> but the very first message sent between 2 schedules. That makes
> connector several orders of magnitude less reliable than bare netlink
> protocol. And you don't see it with your fork tests because you do
> schedule when you fork.
Let's clarify that we are talking about the userspace->kernelspace
direction.
Only for those messages is the callback path invoked.
For such messages there is always scheduling after the system call.
So if we have a situation where, after a schedule() call, the work queue can not
be scheduled, then the userspace process in that situation can not be scheduled
either, so no new message will be delivered and nothing can be dropped.
Actually you pointed to an interesting piece of code.
I've dug out an old pre-connector draft, where there is a note
that it could be fine to
1. send messages to self, but the first implementation was dropped;
2. call several callbacks in parallel, although it can not happen now.
So I will think about changing that code, although it works without unneeded
drops now.
--
Evgeniy Polyakov
Crash is better than data corruption -- Arthur Grabowski
On Tuesday 26 April 2005 23:06, Evgeniy Polyakov wrote:
> Let's clarify that we are talking about userspace->kernelspace
> direction.
> Only for that messages callback path is invoked.
What about kernelspace->userspace or kernelspace->kernelspace?
From what I see nothing stops kernel code from calling cn_netlink_send;
in fact your cbus does exactly that. So I am confused why you singled
out the userspace->kernelspace direction.
--
Dmitry
On Wed, 2005-04-27 at 00:16 -0500, Dmitry Torokhov wrote:
> On Tuesday 26 April 2005 23:06, Evgeniy Polyakov wrote:
> > Let's clarify that we are talking about userspace->kernelspace
> > direction.
> > Only for that messages callback path is invoked.
>
> What about kernelspace->userspace or kernelspace->kernelspace?
> From what I see nothing stops kernel code from calling cn_netlink_send,
> in fact your cbus does exactly that. So I am confused why you singled
> out userspace->kernelspace direction.
You misunderstand the code -
cn_netlink_send() never ends up in a callback invocation;
it can only deliver messages in the kernelspace->userspace direction.
The kernelspace->userspace direction ends up adding a buffer into the
socket queue, from which userspace may read data using the recv() system
call.
There is no kernelspace->kernelspace sending possibility
except by creating a new socket in userspace and intercepting/using
sendmsg/recvmsg, but that is the same as reading from userspace.
--
Evgeniy Polyakov
Crash is better than data corruption -- Arthur Grabowski
On Wednesday 27 April 2005 00:32, Evgeniy Polyakov wrote:
> On Wed, 2005-04-27 at 00:16 -0500, Dmitry Torokhov wrote:
> > On Tuesday 26 April 2005 23:06, Evgeniy Polyakov wrote:
> > > Let's clarify that we are talking about userspace->kernelspace
> > > direction.
> > > Only for that messages callback path is invoked.
> >
> > What about kernelspace->userspace or kernelspace->kernelspace?
> > From what I see nothing stops kernel code from calling cn_netlink_send,
> > in fact your cbus does exactly that. So I am confused why you singled
> > out userspace->kernelspace direction.
>
> You miunderstand the code -
> cn_netlink_send() never ends up in callback invocation,
> it can only deliver messages in kernelspace->userspace direction.
> kernelspace->userspace direction ends up adding buffer into
> socket queue, from which userspace may read data using recv() system
> call.
Yes, you are right, sorry. I missed the fact that the message is not injected
into the callback queue. Hmm, it might be useful if it were, for implementing
various kinds of in-kernel notifications.
Another thing to consider - even if there is an explicit schedule on the
userspace->kernelspace path, there is no guarantee that the connector thread
will be scheduled before userspace - userspace could be a higher-priority
bursty task. Although this scenario is not likely, I guess.
--
Dmitry
On Wed, 2005-04-27 at 00:46 -0500, Dmitry Torokhov wrote:
> into callback queue. Hmm, might be useful if it was, for implementing
> various kinds of in-kernel notifications.
There was an implementation for that, but I dropped it due to broken code
[the fix was simple, but there is issue number 2],
and I think it may end up in various microkernel-like message queues.
Since I have worked too much with projects that are based on such a design,
I am afraid it would be very bad to allow such usage in the Linux kernel.
Although it could be compile-time or something...
--
Evgeniy Polyakov
Crash is better than data corruption -- Arthur Grabowski
On Monday 11 April 2005 07:59, Evgeniy Polyakov wrote:
> /*****************************************/
> Kernel Connector.
> /*****************************************/
>
> Kernel connector - new netlink based userspace <-> kernel space easy to use communication module.
>
Hi Evgeniy,
I have looked at the connector implementation one more time and I think
it can be improved in the following ways:
- drop unneeded refcounting;
- start only one workqueue entry instead of one for every callback type;
- drop cn_dev - there is only one connector;
- simplify cn_notify_request to carry only one range - it simplifies the code
quite a bit - nothing stops clients from sending several notification
requests;
- implement an internal queue for callbacks so we are not at the mercy of the scheduler;
- admit that SKBs are the message medium and do not mess things up by passing
destructor functions around;
- in the callback notifier switch to using GFP_KERNEL since it can sleep just
fine;
- more...
Because there were a lot of changes I decided against doing a relative patch
and instead made a replacement patch for the affected files. I have cut out cbus and
documentation to keep it smaller so it will be easier to review. The patch
is compile-tested only.
BTW, I do not think that "All rights reserved" statement is applicable/
compatible with GPL this software is supposedly released under, I have
taken liberty of removing this statement.
> +/*
> + * connector.c
> + *
> + * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
Please look it over.
--
Dmitry
drivers/Kconfig | 2
drivers/Makefile | 2
drivers/connector/Kconfig | 12 +
drivers/connector/Makefile | 1
drivers/connector/connector.c | 481 ++++++++++++++++++++++++++++++++++++++++++
include/linux/connector.h | 103 ++++++++
6 files changed, 601 insertions(+)
Index: dtor/drivers/connector/connector.c
===================================================================
--- /dev/null
+++ dtor/drivers/connector/connector.c
@@ -0,0 +1,481 @@
+/*
+ * connector.c
+ *
+ * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * 2005 Copyright (c) Dmitry Torokhov <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/skbuff.h>
+#include <linux/netlink.h>
+#include <linux/moduleparam.h>
+#include <linux/connector.h>
+
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Evgeniy Polyakov <[email protected]>");
+MODULE_DESCRIPTION("Generic userspace <-> kernelspace connector.");
+
+static int unit = NETLINK_NFLOG;
+module_param(unit, int, 0);
+MODULE_PARM_DESC(unit, "Netlink socket to use.");
+
+static u32 cn_idx = CN_IDX_CONNECTOR;
+module_param_named(idx, cn_idx, uint, 0);
+MODULE_PARM_DESC(idx, "Connector's main device idx.");
+
+static u32 cn_val = CN_VAL_CONNECTOR;
+module_param_named(val, cn_val, uint, 0);
+MODULE_PARM_DESC(val, "Connector's main device val.");
+
+static u32 cn_max_backlog = 100;
+module_param_named(max_backlog, cn_max_backlog, uint, 0);
+MODULE_PARM_DESC(max_backlog, "Connector's maximum request backlog.");
+
+static LIST_HEAD(notify_list);
+static DECLARE_MUTEX(notify_list_lock);
+
+static LIST_HEAD(callback_list);
+static DEFINE_SPINLOCK(callback_list_lock);
+
+static struct sock *cn_netlink_socket;
+static struct workqueue_struct *cn_workqueue;
+static struct cb_id cn_id;
+static int cn_initialized;
+
+#define DEBUG
+#define PFX "connector: "
+
+#ifdef DEBUG
+#define dprintk(fmt, args...) printk(KERN_DEBUG PFX "%s: " fmt, \
+ __FUNCTION__, ## args)
+#else
+#define dprintk(fmt, args...)
+#endif
+
+/*
+ * cn_cb_equal() - checks whether 2 callback IDs are equal.
+ */
+static inline int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
+{
+ dprintk("comparing %04x.%04x and %04x.%04x\n",
+ i1->idx, i1->val,
+ i2->idx, i2->val);
+
+ return i1->idx == i2->idx && i1->val == i2->val;
+}
+
+/*
+ * cn_find_callback() - locates registered callback structure
+ * by its id. Caller must hold callback_list_lock.
+ */
+static struct cn_callback *cn_find_callback(struct cb_id *id)
+{
+ struct cn_callback *cb;
+
+ list_for_each_entry(cb, &callback_list, node)
+ if (cn_cb_equal(&cb->id, id))
+ return cb;
+
+ return NULL;
+}
+
+/*
+ * msg->seq and msg->ack are used to determine message genealogy.
+ * When someone sends message it puts there locally unique sequence
+ * and random acknowledge numbers.
+ * Sequence number may also be copied into nlmsghdr->nlmsg_seq.
+ *
+ * Sequence number is incremented with each message to be sent.
+ *
+ * If we expect reply to our message,
+ * then sequence number in received message MUST be the same as in
+ * original message, and acknowledge number MUST be the same + 1.
+ *
+ * If we receive a message and its sequence number is not equal to
+ * the one we are expecting, then it is a new message.
+ * If we receive a message and its sequence number is the same as
+ * the one we are expecting, but its acknowledge number is not equal to the
+ * acknowledge number in the original message + 1, then it is also a new
+ * message.
+ */
+int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask)
+{
+ struct cn_callback *cb;
+ struct sk_buff *skb;
+ struct nlmsghdr *nlh;
+ unsigned int size;
+ u32 groups = 0;
+
+ if (!cn_initialized)
+ return -EAGAIN;
+
+ if (!__groups) {
+ spin_lock_bh(&callback_list_lock);
+ cb = cn_find_callback(&msg->id);
+ if (cb)
+ groups = cb->group;
+ spin_unlock_bh(&callback_list_lock);
+
+ if (!cb) {
+ dprintk("Failed to find callback %#x.%#x, seq=%u\n",
+ msg->id.idx, msg->id.val, msg->seq);
+ return -ENODEV;
+ }
+ }
+ else
+ groups = __groups;
+
+ size = sizeof(struct cn_msg) + msg->len;
+
+ skb = alloc_skb(NLMSG_SPACE(size), gfp_mask);
+ if (!skb) {
+ printk(KERN_ERR PFX "Failed to allocate skb with size=%u\n",
+ NLMSG_SPACE(size));
+ return -ENOMEM;
+ }
+
+ dprintk("len=%u, seq=%u, ack=%u, group=%u.\n",
+ msg->len, msg->seq, msg->ack, groups);
+
+ nlh = NLMSG_PUT(skb, 0, msg->seq, NLMSG_DONE, size);
+ memcpy(NLMSG_DATA(nlh), msg, sizeof(struct cn_msg) + msg->len);
+
+ NETLINK_CB(skb).dst_groups = groups;
+
+ netlink_broadcast(cn_netlink_socket, skb, 0, groups, gfp_mask);
+ return 0;
+
+ nlmsg_failure:
+ kfree_skb(skb);
+ return -ENOMEM;
+}
+
+/*
+ * cn_execute_callback() - executes callback and frees associated
+ * skb and work structures.
+ */
+static void cn_execute_callback(void *data)
+{
+ struct cn_callback_work *cb_work = data;
+
+ cb_work->cb->callback(cb_work->msg->data);
+
+ spin_lock_bh(&callback_list_lock);
+ cb_work->cb->requests--;
+ spin_unlock_bh(&callback_list_lock);
+
+ kfree_skb(cb_work->skb);
+ kfree(cb_work);
+}
+
+/*
+ * cn_schedule_callback() - allocates work structure and
+ * schedules callback for execution on cn_workqueue
+ */
+static int cn_schedule_callback(struct sk_buff *skb)
+{
+ struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
+ struct cn_msg *msg = NLMSG_DATA(nlh);
+ struct cn_callback_work *cb_work;
+ struct cn_callback *cb;
+ int error;
+
+ cb_work = kcalloc(1, sizeof(struct cn_callback_work), GFP_KERNEL);
+ if (!cb_work)
+ return -ENOMEM;
+
+ spin_lock_bh(&callback_list_lock);
+ cb = cn_find_callback(&msg->id);
+ if (!cb)
+ error = -ENODEV;
+ else if (cb->requests >= cn_max_backlog)
+ error = -ENOSPC;
+ else {
+ cb_work->cb = cb;
+ cb_work->msg = msg;
+ cb_work->skb = skb_get(skb);
+ INIT_WORK(&cb_work->work, cn_execute_callback, cb_work);
+ queue_work(cn_workqueue, &cb_work->work);
+ cb->requests++;
+ error = 0;
+ }
+ spin_unlock_bh(&callback_list_lock);
+
+ if (error)
+ kfree(cb_work);
+
+ return error;
+}
+
+/*
+ * cn_validate_skb() - performs basic checks on skb to ensure
+ * that received message has correct structure.
+ */
+static int cn_validate_skb(struct sk_buff *skb)
+{
+ struct nlmsghdr *nlh;
+ struct cn_msg *msg;
+
+ dprintk("skb: len=%u, data_len=%u, truesize=%u, proto=%u, "
+ "cloned=%d, shared=%d\n",
+ skb->len, skb->data_len, skb->truesize, skb->protocol,
+ skb_cloned(skb), skb_shared(skb));
+
+ if (skb->len < NLMSG_SPACE(0)) {
+ printk(KERN_ERR PFX "empty skb received\n");
+ return -EINVAL;
+ }
+
+ nlh = (struct nlmsghdr *)skb->data;
+ if (skb->len < nlh->nlmsg_len) {
+ printk(KERN_ERR PFX
+ "skb is too short, skb_len=%u, nlmsg_len=%u\n",
+ skb->len, nlh->nlmsg_len);
+ return -EINVAL;
+ }
+
+ msg = (struct cn_msg *)NLMSG_DATA(nlh);
+ if (NLMSG_SPACE(msg->len + sizeof(*msg)) != nlh->nlmsg_len) {
+ printk(KERN_ERR PFX "invalid netlink message length, "
+ "msg->len=%u[%u], nlh->nlmsg_len=%u\n",
+ msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)),
+ nlh->nlmsg_len);
+ return -EINVAL;
+ }
+
+ dprintk("pid=%u, uid=%u, seq=%u, group=%u\n",
+ NETLINK_CREDS(skb)->pid, NETLINK_CREDS(skb)->uid,
+ nlh->nlmsg_seq, NETLINK_CB((skb)).groups);
+
+ return 0;
+}
+
+/*
+ * cn_input() - netlink socket input handler. Dequeues skb and,
+ * after validating received data, schedules the requested callback
+ * for subsequent execution.
+ */
+static void cn_input(struct sock *sk, int len)
+{
+ struct sk_buff *skb;
+
+ while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+ if (cn_validate_skb(skb) == 0)
+ cn_schedule_callback(skb);
+
+ kfree_skb(skb);
+ }
+}
+
+/*
+ * cn_notify() - notifies registered listeners about an event.
+ * Currently is used to notify userspace about add/remove callback
+ * requests being processed by connector.
+ */
+static void cn_notify(struct cb_id *id, u32 notify_event)
+{
+ struct cn_notify_entry *entry;
+ struct cn_notify_req *req;
+ struct cn_msg notify_msg;
+
+ down(&notify_list_lock);
+ list_for_each_entry(entry, &notify_list, node) {
+
+ req = &entry->req;
+
+ if (id->idx >= req->idx_first &&
+ id->idx < req->idx_first + req->idx_range &&
+ id->val >= req->val_first &&
+ id->val < req->val_first + req->val_range) {
+
+ dprintk("Notifying group %x - %#x.%#x: event %u\n",
+ req->group, id->idx, id->val, notify_event);
+
+ memset(&notify_msg, 0, sizeof(notify_msg));
+ notify_msg.id = *id;
+ notify_msg.ack = notify_event;
+
+ cn_netlink_send(&notify_msg, req->group, GFP_KERNEL);
+ }
+ }
+ up(&notify_list_lock);
+}
+
+/*
+ * cn_add_callback() - installs callback with specified callback ID
+ */
+int cn_add_callback(struct cb_id *id, char *name, void (*callback)(void *))
+{
+ struct cn_callback *cb;
+ int retval = 0;
+
+ cb = kcalloc(1, sizeof(struct cn_callback), GFP_KERNEL);
+ if (!cb) {
+ printk(KERN_ERR PFX
+ "Failed to allocate new struct cn_callback\n");
+ return -ENOMEM;
+ }
+
+ cb->id = *id;
+ snprintf(cb->name, sizeof(cb->name), "%s", name);
+ cb->callback = callback;
+
+ spin_lock_bh(&callback_list_lock);
+ if (cn_find_callback(id)) {
+ kfree(cb);
+ retval = -EEXIST;
+ } else
+ list_add_tail(&cb->node, &callback_list);
+ spin_unlock_bh(&callback_list_lock);
+
+ if (retval == 0)
+ cn_notify(id, 0);
+
+ return retval;
+}
+
+/*
+ * cn_del_callback() - deletes callback with specified callback ID
+ * Will wait until all scheduled instances of the callback complete.
+ */
+void cn_del_callback(struct cb_id *id)
+{
+ struct cn_callback *cb;
+
+ spin_lock_bh(&callback_list_lock);
+ cb = cn_find_callback(id);
+ if (cb)
+ list_del(&cb->node);
+ spin_unlock_bh(&callback_list_lock);
+
+ if (cb) {
+ flush_workqueue(cn_workqueue);
+ kfree(cb);
+ cn_notify(id, 1);
+ } else
+ printk(KERN_WARNING PFX
+ "Attempting do delete non-existing callback %x/%x\n",
+ id->idx, id->val);
+}
+
+/*
+ * cn_notify_req_same() - compares two connector notify requests
+ */
+static inline int cn_notify_req_same(struct cn_notify_req *r1,
+ struct cn_notify_req *r2)
+{
+ return r1->group == r2->group &&
+ r1->idx_first == r2->idx_first &&
+ r1->idx_range == r2->idx_range &&
+ r1->val_first == r2->val_first &&
+ r1->val_range == r2->val_range;
+}
+
+/*
+ * cn_connector_callback() - callback installed and used by connector
+ * itself to manage connector notification requests.
+ */
+static void cn_connector_callback(void *data)
+{
+ struct cn_msg *msg = data;
+ struct cn_notify_req *req;
+ struct cn_notify_entry *entry;
+ int found = 0;
+
+ if (msg->len != sizeof(struct cn_notify_req)) {
+ printk(KERN_ERR PFX
+ "Wrong notify request size (%u, must be %u)\n",
+ msg->len, sizeof(struct cn_notify_req));
+ return;
+ }
+
+ req = (struct cn_notify_req *)msg->data;
+
+ down(&notify_list_lock);
+
+ list_for_each_entry(entry, &notify_list, node) {
+ if (cn_notify_req_same(&entry->req, req)) {
+ list_del(&entry->node);
+ kfree(entry);
+ found = 1;
+ break;
+ }
+ }
+
+ if (!found) {
+ entry = kcalloc(1, sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ printk(KERN_ERR PFX
+ "Failed to allocate new notify entry\n");
+ else {
+ entry->req = *req;
+ list_add_tail(&entry->node, &notify_list);
+ }
+ }
+
+ up(&notify_list_lock);
+}
+
+static int cn_init(void)
+{
+ int error;
+
+ cn_workqueue = create_workqueue("cqueue");
+ if (!cn_workqueue) {
+ printk(KERN_ERR PFX "Failed to create workqueue\n");
+ return -ENOMEM;
+ }
+
+ cn_netlink_socket = netlink_kernel_create(unit, cn_input);
+ if (!cn_netlink_socket) {
+ printk(KERN_ERR PFX
+ "Failed to create new netlink socket(%u)\n", unit);
+ destroy_workqueue(cn_workqueue);
+ return -ENOMEM;
+ }
+
+ cn_id.idx = cn_idx;
+ cn_id.val = cn_val;
+ error = cn_add_callback(&cn_id, "connector", cn_connector_callback);
+ if (error) {
+ printk(KERN_ERR PFX "Failed to install 'connector' callback\n");
+ sock_release(cn_netlink_socket->sk_socket);
+ destroy_workqueue(cn_workqueue);
+ return error;
+ }
+
+ cn_initialized = 1;
+ return 0;
+}
+
+static void cn_exit(void)
+{
+ cn_del_callback(&cn_id);
+ sock_release(cn_netlink_socket->sk_socket);
+ destroy_workqueue(cn_workqueue);
+}
+
+module_init(cn_init);
+module_exit(cn_exit);
+
+EXPORT_SYMBOL_GPL(cn_add_callback);
+EXPORT_SYMBOL_GPL(cn_del_callback);
+EXPORT_SYMBOL_GPL(cn_netlink_send);
Index: dtor/drivers/connector/Makefile
===================================================================
--- /dev/null
+++ dtor/drivers/connector/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CONNECTOR) += connector.o
Index: dtor/drivers/connector/Kconfig
===================================================================
--- /dev/null
+++ dtor/drivers/connector/Kconfig
@@ -0,0 +1,12 @@
+menu "Connector - unified userspace <-> kernelspace linker"
+
+config CONNECTOR
+ tristate "Connector - unified userspace <-> kernelspace linker"
+ depends on NET
+ ---help---
+ This is unified userspace <-> kernelspace connector working on top
+ of the netlink socket protocol.
+
+ Connector support can also be built as a module. If so, the module
+ will be called connector.ko.
+endmenu
Index: dtor/include/linux/connector.h
===================================================================
--- /dev/null
+++ dtor/include/linux/connector.h
@@ -0,0 +1,103 @@
+/*
+ * connector.h
+ *
+ * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __CONNECTOR_H
+#define __CONNECTOR_H
+
+#include <asm/types.h>
+
+#define CN_IDX_CONNECTOR 0xffffffff
+#define CN_VAL_CONNECTOR 0xffffffff
+
+/*
+ * idx and val are unique identifiers which
+ * are used for message routing and
+ * must be registered in connector.h for in-kernel usage.
+ */
+
+struct cb_id
+{
+ __u32 idx;
+ __u32 val;
+};
+
+struct cn_msg
+{
+ struct cb_id id;
+
+ __u32 seq;
+ __u32 ack;
+
+ __u32 len; /* Length of the following data */
+ __u8 data[0];
+};
+
+/*
+ * Notify structure - requests notification about
+ * registering/unregistering idx/val in range [first, first+range).
+ */
+struct cn_notify_req
+{
+ __u32 group;
+ __u32 idx_first, idx_range;
+ __u32 val_first, val_range;
+};
+
+#ifdef __KERNEL__
+
+#include <asm/atomic.h>
+
+#include <linux/list.h>
+#include <linux/workqueue.h>
+
+#include <net/sock.h>
+
+#define CN_CBQ_NAMELEN 32
+
+struct cn_callback
+{
+ struct cb_id id;
+ unsigned char name[CN_CBQ_NAMELEN];
+ int group;
+ void (*callback)(void *);
+ int requests;
+ struct list_head node;
+};
+
+struct cn_callback_work {
+ struct cn_callback *cb;
+ struct cn_msg *msg;
+ struct sk_buff *skb;
+ struct work_struct work;
+};
+
+struct cn_notify_entry
+{
+ struct list_head node;
+ struct cn_notify_req req;
+};
+
+int cn_add_callback(struct cb_id *, char *, void (* callback)(void *));
+void cn_del_callback(struct cb_id *);
+int cn_netlink_send(struct cn_msg *, u32, int);
+
+#endif /* __KERNEL__ */
+#endif /* __CONNECTOR_H */
Index: dtor/drivers/Makefile
===================================================================
--- dtor.orig/drivers/Makefile
+++ dtor/drivers/Makefile
@@ -17,6 +17,8 @@ obj-$(CONFIG_PNP) += pnp/
# default.
obj-y += char/
+obj-$(CONFIG_CONNECTOR) += connector/
+
# i810fb and intelfb depend on char/agp/
obj-$(CONFIG_FB_I810) += video/i810/
obj-$(CONFIG_FB_INTEL) += video/intelfb/
Index: dtor/drivers/Kconfig
===================================================================
--- dtor.orig/drivers/Kconfig
+++ dtor/drivers/Kconfig
@@ -4,6 +4,8 @@ menu "Device Drivers"
source "drivers/base/Kconfig"
+source "drivers/connector/Kconfig"
+
source "drivers/mtd/Kconfig"
source "drivers/parport/Kconfig"
On Tue, 10 May 2005 01:18:46 -0500
Dmitry Torokhov <[email protected]> wrote:
> On Monday 11 April 2005 07:59, Evgeniy Polyakov wrote:
> > /*****************************************/
> > Kernel Connector.
> > /*****************************************/
> >
> > Kernel connector - new netlink based userspace <-> kernel space easy to use communication module.
> >
>
> Hi Evgeniy,
Hello, Dmitriy.
> I have looked at the connector implementation one more time and I think
> it can be improved in the following ways:
>
> - drop unneeded refcounting;
It is already.
> - start only one workqueue entry instead of one for every callback type;
There is only one queue per callback device.
> - drop cn_dev - there is only one connector;
There may be several connectors with different socket types.
I use it in my environment for tests.
> - simplify cn_notify_request to carry only one range - it simplifies code
> quite a bit - nothing stops clientd from sending several notification
> requests;
? Nothing stops clients from sending several notification requests now.
I do not see how it simplifies things.
Feel free to set idx_num and val_num to 1 and you will have the same.
> - implement internal queuefor callbacks so we are not at mercy of scheduler;
The current implementation has a big warning about such theoretical behaviour;
if it is ever triggered, I will fix that behaviour,
most likely not by naive work allocation - there might be
at least a cache/memory pool or something...
> - admit that SKBs are message medium and do not mess up with passing around
> destructor functions.
You mess up layers.
I do not see why you want it.
> - In callback notifier switch to using GFP_KERNEL since it can sleep just
> fine;
Yes, it is correct
> - more...
>
> Because there were a lot of changes I decided against doing relative patch
> but instead a replacement patch for affected files. I have cut out cbus and
> documentation to keep it smaller so it will be easier to review. The patch
> is compile-tested only.
Connector with your changes just does not work.
Dmitry, I do not understand why you change things in so radical a manner,
especially when there are 1. no new interesting features, 2. no performance
improvements, 3. no right design changes.
> BTW, I do not think that "All rights reserved" statement is applicable/
> compatible with GPL this software is supposedly released under, I have
> taken liberty of removing this statement.
You are substituting the license agreement with copyright.
"All rights reserved" refers to the copyright
(avtorskoe pravo, zakon N 5351/1-1) of the code
and has nothing to do with the license.
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 10 May 2005 01:18:46 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > - drop cn_dev - there is only one connector;
>
> There may be several connectors with different socket types.
> I use it in my environment for tests.
>
Why do you need that? u32 ID space is not enough?
> > - simplify cn_notify_request to carry only one range - it simplifies code
> > quite a bit - nothing stops clientd from sending several notification
> > requests;
>
> ? Nothing stops clients from sending several notification requests now.
> I do not see how it simplifies the things?
> Feel free to set idx_num and val_num to 1 and you will have the same.
Compare the implementations - without variable-length notification
requests the code is much simpler and more flexible.
> > - implement internal queuefor callbacks so we are not at mercy of scheduler;
>
> Current implementation has big warning about such theoretical behaviour,
> if it will be triggered, I will fix that behaviour,
> more likely not by naive work allocation - there might be
> at least cache/memory pool or something...
What is the average expected rate for such messages? Not particularly
high, I think. We just need to sustain sudden bursts; naive work
allocation will work just fine here.
> > - admit that SKBs are message medium and do not mess up with passing around
> > destructor functions.
>
> You mess up layers.
> I do not see why you want it.
What layers? The connector core is not an onion; with my patch it is a 481-line
.c file (down from 900+), and comments probably take up 1/3 of it. We
just need to keep it simple, that's all.
> Connector with your changes just does not work.
As in "I tried to run it and adjusted the application for the new
cn_notify_req size and it does not work" or "I am saying it does not
work"?
> Dmitry, I do not understand why do you change things in so radical manner,
> especially when there are no 1. new interesting features, 2. no performance
> improvements, 3. no right design changes.
There is a right design change - it keeps the code simple. It peels
away all the layers you invented for something that does not really need
layers - connector is just a simple wrapper over raw netlink.
When I look at the code I see a mess of unneeded refcounting, wrong
kinds of locking, and over-engineering. Of course I want to change it.
Keep it simple. Except for your testing environment, why do you need to
have separate netlink sockets? You have an ample ID space. Why do you
need separate cn_dev and cn_queue_dev (and in the end have them at
all)? Why do you need an "input" handler - do you expect to use different
handlers? If so, which? And why? Do you expect to use other transports?
If so, which? And why? You need to have justification for every
additional layer you are putting on.
>
> > BTW, I do not think that "All rights reserved" statement is applicable/
> > compatible with GPL this software is supposedly released under, I have
> > taken liberty of removing this statement.
>
> You substitute license agreements with copyright.
> "All rights reserved" is targeted to copyright
> (avtorskoe pravo, zakon N 5351/1-1) of the code
> and has nothing with license.
Yes, you are probably right, sorry for the noise...
--
Dmitry
On Tue, 10 May 2005 09:56:45 -0500
Dmitry Torokhov <[email protected]> wrote:
> On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> > On Tue, 10 May 2005 01:18:46 -0500
> > Dmitry Torokhov <[email protected]> wrote:
> >
> > > - drop cn_dev - there is only one connector;
> >
> > There may be several connectors with different socket types.
> > I use it in my environment for tests.
> >
>
> Why do you need that? u32 ID space is not enough?
They use different netlink sockets.
One of them can fail during initialisation, for example...
> > > - simplify cn_notify_request to carry only one range - it simplifies code
> > > quite a bit - nothing stops clientd from sending several notification
> > > requests;
> >
> > ? Nothing stops clients from sending several notification requests now.
> > I do not see how it simplifies the things?
> > Feel free to set idx_num and val_num to 1 and you will have the same.
>
> Compare the implementaion - without variable-length notification
> requests the code is much simplier and flexible.
Hmm, the old request had a length, the new one does not.
But instead of atomically adding the whole set, one needs to send
a couple of events.
I do not see any benefit here.
> > > - implement internal queuefor callbacks so we are not at mercy of scheduler;
> >
> > Current implementation has big warning about such theoretical behaviour,
> > if it will be triggered, I will fix that behaviour,
> > more likely not by naive work allocation - there might be
> > at least cache/memory pool or something...
>
> What is the average expected rate for such messages? Not particularly
> high I think. We just need to sustain sudden bursts, naive work
> allocation will work just fine here.
Probably - I still think it is a theoretical "misbehaviour" [dropping
input messages under high load] that can not be
triggered; if it ever is, I will fix it.
> > > - admit that SKBs are message medium and do not mess up with passing around
> > > destructor functions.
> >
> > You mess up layers.
> > I do not see why you want it.
>
> What layers? Connector core is not an onion, with my patch it is 481
> line .c file(down from 900+), comments probably take 1/3 of it. We
> just need to keep it simple, that's all.
There is a low-level medium layer - skbs.
There is a middle-level layer - logical handling of attached data and appropriate callback invocation.
There is a high-level layer - various protocols over connector.
Why should any of them know that the others exist?
Your destructor-removal changes only saved < 10 lines of code, btw.
> > Connector with your changes just does not work.
>
> As in "I tried to run it and adjusted the application for the new
> cn_notify_req size and it does not work" or "I am saying it does not
> work"?
The second, i.e. "I am saying it does not work".
With your changes userspace just does not see any message being sent by kernelspace.
> > Dmitry, I do not understand why do you change things in so radical manner,
> > especially when there are no 1. new interesting features, 2. no performance
> > improvements, 3. no right design changes.
>
> There is a right design change - it keeps the code simple. It peels
> all the layers you invented for something that does not really need
> layers - connector is just a simple wrapper over raw netlink.
That is your opinion.
The code is already as simple as possible, since it is layered and each
piece of code is separated from the other logical entities.
> When I look at the code I see the mess of unneeded refcounting, wrong
> kinds of locking, over-engineering. Of course I want to change it.
There is no unneeded reference counting now.
The locking was not wrong - you only put a semaphore in place of the notify BH lock
just to have the ability to sleep under the lock and send callback addition/removal
notifications with the GFP_KERNEL mask,
and it is probably the right change at first glance, as I said,
but it can be implemented in a different way.
> Keep it simple. Except for your testing environment why do you need to
> have separate netlink sockets? You have an ample ID space. Why do you
> need separate cn_dev and cn_queue_dev (and in the end have them at
> all)? Why do you need "input" handler, do you expect to use different
> handlers? If so which? And why? Do you expect to use other transports?
> If so which? And why? You need to have justifiction for every
> additional layer you putting on.
Sigh.
It was already answered so many times, as far as I remember...
Ok, let's do it again.
Several netlink sockets - as you can see, connector uses NETLINK_ULOG
for its notification; you can not use the same socket twice, so you
may have a new device with a new socket, or several of them, and the system
may keep working if one initialisation failed.
cn_dev is the connector device structure, cn_queue_dev - its queueing part.
The input handler is needed to be able to receive messages.
A couple of the next questions can be answered in one reply:
the original connector idea was to implement notification
not on top of the netlink protocol, but using ioctls on a char device,
so there could be a different connector device, input method,
transport and so on. Probably I will not implement it, though.
Over-engineering? I can not call it so... :)
> > > BTW, I do not think that "All rights reserved" statement is applicable/
> > > compatible with GPL this software is supposedly released under, I have
> > > taken liberty of removing this statement.
> >
> > You substitute license agreements with copyright.
> > "All rights reserved" is targeted to copyright
> > (avtorskoe pravo, zakon N 5351/1-1) of the code
> > and has nothing with license.
>
> Yes, you are probably right, sorry for the noise...
>
> --
> Dmitry
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> On Tue, 10 May 2005 09:56:45 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> > > On Tue, 10 May 2005 01:18:46 -0500
> > > Dmitry Torokhov <[email protected]> wrote:
> > >
> > > > - drop cn_dev - there is only one connector;
> > >
> > > There may be several connectors with different socket types.
> > > I use it in my environment for tests.
> > >
> >
> > Why do you need that? u32 ID space is not enough?
>
> They use different netlink sockets.
> One of them can fail during initialisation, for example...
Are you saying that there are better chances of successfully allocating 2
netlink sockets than one?
>
> > > > - simplify cn_notify_request to carry only one range - it simplifies code
> > > > quite a bit - nothing stops clientd from sending several notification
> > > > requests;
> > >
> > > ? Nothing stops clients from sending several notification requests now.
> > > I do not see how it simplifies the things?
> > > Feel free to set idx_num and val_num to 1 and you will have the same.
> >
> > Compare the implementaion - without variable-length notification
> > requests the code is much simplier and flexible.
>
> Hmm, old request had a length, new one - does not.
> But instead of atomically addition of the whole set, one need to send
> couple of events.
> I do not see any profit here.
Smaller kernel code, a simpler notification structure, and the ability to stop
listening to certain events without re-registering the rest of the
block.
> > > > - admit that SKBs are message medium and do not mess up with passing around
> > > > destructor functions.
> > >
> > > You mess up layers.
> > > I do not see why you want it.
> >
> > What layers? Connector core is not an onion, with my patch it is 481
> > line .c file(down from 900+), comments probably take 1/3 of it. We
> > just need to keep it simple, that's all.
>
> There is low-level medium layer - skbs.
> There is middle-level layer - logical handling attached data and appropriate callback invocation.
> There is high-level layer - various protocols over connector.
>
> Why any of them should know that others exist?
> Your changes with destructor removing only saved < 10 lines of code, btw.
Yes, a line here and a line there - all in all a couple hundred lines.
> > > Connector with your changes just does not work.
> >
> > As in "I tried to run it and adjusted the application for the new
> > cn_notify_req size and it does not work" or "I am saying it does not
> > work"?
>
> Second, i.e. "I am saing it does not work".
> With your changes userspace just does not see any message being sent by kernelspace.
Would you care to elaborate on this, please? As far as I know the
kernel->userspace path was not changed at all.
>
> > > Dmitry, I do not understand why do you change things in so radical manner,
> > > especially when there are no 1. new interesting features, 2. no performance
> > > improvements, 3. no right design changes.
> >
> > There is a right design change - it keeps the code simple. It peels
> > all the layers you invented for something that does not really need
> > layers - connector is just a simple wrapper over raw netlink.
>
> That is your opinion.
>
> Code is already as simple as possible, since it is layered and each
> piece of code is separated from different logical entities.
>
> > When I look at the code I see the mess of unneeded refcounting, wrong
> > kinds of locking, over-engineering. Of course I want to change it.
>
> There are no unneded reference counting now.
> Locking was not wrong - you only placed semaphore instead of notify BH lock
> just to have ability to sleep under the lock and send callback addition/removing
> notification with GFP_KERNEL mask,
> and it is probably right change from the first view, as I said,
> but it can be implemented in a different way.
That's what I was referring to as "wrong locking". Notifications do
not need to stop bottom-half handlers; a spinlock or semaphore will work
just fine. Since I wanted to allocate memory only after I determined
that it is a new notify request, a semaphore was a better choice.
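A minimal sketch of that pattern, assuming 2.6-era semaphore and list APIs;
notify_lock, notify_list, cn_notify_entry and cn_find_notify are hypothetical
names, not taken from either patch:

#include <linux/list.h>
#include <linux/slab.h>
#include <asm/semaphore.h>

struct cn_notify_entry {
	struct list_head list;
	struct cn_notify_req req;
};

static DECLARE_MUTEX(notify_lock);	/* semaphore used as a mutex */
static LIST_HEAD(notify_list);

static int cn_add_notify(struct cn_notify_req *req)
{
	struct cn_notify_entry *ent;
	int err = 0;

	down(&notify_lock);
	if (!cn_find_notify(&notify_list, req)) {
		/* Sleeping is allowed under a semaphore, so GFP_KERNEL is fine. */
		ent = kmalloc(sizeof(*ent), GFP_KERNEL);
		if (!ent) {
			err = -ENOMEM;
		} else {
			ent->req = *req;
			list_add_tail(&ent->list, &notify_list);
		}
	}
	up(&notify_lock);

	return err;
}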
> > Keep it simple. Except for your testing environment why do you need to
> > have separate netlink sockets? You have an ample ID space. Why do you
> > need separate cn_dev and cn_queue_dev (and in the end have them at
> > all)? Why do you need "input" handler, do you expect to use different
> > handlers? If so which? And why? Do you expect to use other transports?
> > If so which? And why? You need to have justifiction for every
> > additional layer you putting on.
>
> Sigh.
> It was already answered so many times, as far as I remember...
> Ok, let's do it again.
>
> Several netlink sockets - if you can see, connector uses NETLINK_ULOG
> for it's notification, you can not use the same socket twice, so you
> may have new device with new socket, or several of them, and system
> may work if one initialisation failed.
Or it may not. Why do you think that having several sockets is safer
than using one? After all, it is just a matter of setting it up and
finally allocating a separate socket number for connector.
> cn_dev is connector device structure, cn_queue_dev - it's queueing part.
> input handler is needed for ability to receive messages.
You chose not to understand my question. Why do you need to have it as
a pointer? Do you expect to have a different input handler?
> Couple of next questions can be answered in one reply:
> original connector's idea was to implement notification based
> not on top of netlink protocol, but using ioctls of char device,
> so there could be different connector device, input method,
> transport and so on. Probably I will not implement it thought.
IOW this is all remnants of an older design. You have no plans to
utilize the framework, and therefore it is not needed anymore and can
be dropped.
--
Dmitry
On Tue, 10 May 2005 12:50:55 -0500
Dmitry Torokhov <[email protected]> wrote:
> On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> > On Tue, 10 May 2005 09:56:45 -0500
> > Dmitry Torokhov <[email protected]> wrote:
> >
> > > On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> > > > On Tue, 10 May 2005 01:18:46 -0500
> > > > Dmitry Torokhov <[email protected]> wrote:
> > > >
> > > > > - drop cn_dev - there is only one connector;
> > > >
> > > > There may be several connectors with different socket types.
> > > > I use it in my environment for tests.
> > > >
> > >
> > > Why do you need that? u32 ID space is not enough?
> >
> > They use different netlink sockets.
> > One of them can fail during initialisation, for example...
>
> Are you saying that there more chances successfully allocating 2
> netlink sockets than one?
Of course - one may be in use by ipt_ULOG, for example.
Although it would probably be better to get a new unit from David Miller.
I want to create some kind of unused-in-kernel socket number -
each netlink user outside the kernel could use it without bothering
kernel users.
> >
> > > > > - simplify cn_notify_request to carry only one range - it simplifies code
> > > > > quite a bit - nothing stops clientd from sending several notification
> > > > > requests;
> > > >
> > > > ? Nothing stops clients from sending several notification requests now.
> > > > I do not see how it simplifies the things?
> > > > Feel free to set idx_num and val_num to 1 and you will have the same.
> > >
> > > Compare the implementaion - without variable-length notification
> > > requests the code is much simplier and flexible.
> >
> > Hmm, old request had a length, new one - does not.
> > But instead of atomically addition of the whole set, one need to send
> > couple of events.
> > I do not see any profit here.
>
> Smaller kernel code, simpler notification structure, ability to stop
> listenig to a certain events without re-registering the rest of the
> block.
As I said - set the idx and val numbers to one and you will have the same
behaviour.
The code is neither smaller [ok, you removed a loop] nor simpler
- it is a decrease in functionality.
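For readers following the thread, the two shapes being argued about look
roughly like this; the field names are illustrative approximations, not
copied from either patch:

/* Variable-length form: one request carries a whole set of idx/val
 * ranges (idx_num/val_num of them), so the set is registered atomically. */
struct cn_notify_req {
	__u32 first;
	__u32 range;
};

struct cn_notify_request {
	__u32 group;
	__u32 idx_num;			/* number of idx ranges that follow */
	__u32 val_num;			/* number of val ranges that follow */
	struct cn_notify_req data[0];
};

/* Single-range form from the patch under review: one idx range and one
 * val range per request, so a set is built up with several requests. */
struct cn_notify_req_single {
	__u32 group;
	__u32 idx_first, idx_range;
	__u32 val_first, val_range;
};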
> > > > > - admit that SKBs are message medium and do not mess up with passing around
> > > > > destructor functions.
> > > >
> > > > You mess up layers.
> > > > I do not see why you want it.
> > >
> > > What layers? Connector core is not an onion, with my patch it is 481
> > > line .c file(down from 900+), comments probably take 1/3 of it. We
> > > just need to keep it simple, that's all.
> >
> > There is low-level medium layer - skbs.
> > There is middle-level layer - logical handling attached data and appropriate callback invocation.
> > There is high-level layer - various protocols over connector.
> >
> > Why any of them should know that others exist?
> > Your changes with destructor removing only saved < 10 lines of code, btw.
>
> Yes, line here and line there - all in all couple hundred lines.
It is not unused bits - it is a framework which allows the functionality to be extended easily.
> > > > Connector with your changes just does not work.
> > >
> > > As in "I tried to run it and adjusted the application for the new
> > > cn_notify_req size and it does not work" or "I am saying it does not
> > > work"?
> >
> > Second, i.e. "I am saing it does not work".
> > With your changes userspace just does not see any message being sent by kernelspace.
>
> Would you care to elaborate on this please. As far as I know
> kernel->userspace path was not changed at all.
I inserted your module and a test module which sends a message using cn_netlink_send()
once per second.
A userspace application bound to the -1 group does not receive those messages.
If I remove the test module and your module, then insert the vanilla
connector module and the test module, the userspace application
begins to receive messages.
Here is part of dmesg:
cn_test_callback: 403117737: idx=123, val=457, seq=9, ack=0, len=12: counter = 9.
cn_test_callback: 403118737: idx=123, val=457, seq=10, ack=0, len=13: counter = 10.
cn_test_callback: 403119737: idx=123, val=457, seq=11, ack=0, len=13: counter = 11.
cn_test_callback: 403120737: idx=123, val=457, seq=12, ack=0, len=13: counter = 12.
connector: cn_cb_equal: comparing ffffffff.ffffffff and 0123.0456
connector: cn_cb_equal: comparing ffffffff.ffffffff and 0123.0457
connector: cn_cb_equal: comparing 0123.0456 and 0123.0457
connector: cn_cb_equal: comparing ffffffff.ffffffff and 0123.0457
connector: cn_cb_equal: comparing 0123.0456 and 0123.0457
connector: cn_cb_equal: comparing 0123.0457 and 0123.0457
connector: cn_netlink_send: len=12, seq=0, ack=0, group=0.
I can not say where the problem is, and I do not understand at all how the
test callback can be called several times...
Something is broken.
> >
> > > > Dmitry, I do not understand why do you change things in so radical manner,
> > > > especially when there are no 1. new interesting features, 2. no performance
> > > > improvements, 3. no right design changes.
> > >
> > > There is a right design change - it keeps the code simple. It peels
> > > all the layers you invented for something that does not really need
> > > layers - connector is just a simple wrapper over raw netlink.
> >
> > That is your opinion.
> >
> > Code is already as simple as possible, since it is layered and each
> > piece of code is separated from different logical entities.
> >
> > > When I look at the code I see the mess of unneeded refcounting, wrong
> > > kinds of locking, over-engineering. Of course I want to change it.
> >
> > There are no unneded reference counting now.
> > Locking was not wrong - you only placed semaphore instead of notify BH lock
> > just to have ability to sleep under the lock and send callback addition/removing
> > notification with GFP_KERNEL mask,
> > and it is probably right change from the first view, as I said,
> > but it can be implemented in a different way.
>
> That's what I was referring to as "wring locking". Notifications do
> not need to stop bottom handlers, a spinlock or semaphore willl work
> just fine. SInce I wanted to allocate memory only after I determined
> that it is a new notify request a semaphore was a better choise.
I will think about it, but in the long term I do not like such an approach -
it is some kind of deadlock - we have no memory, so we sleep in allocation,
so we can not remove the callback since it will sleep in allocation...
> > > Keep it simple. Except for your testing environment why do you need to
> > > have separate netlink sockets? You have an ample ID space. Why do you
> > > need separate cn_dev and cn_queue_dev (and in the end have them at
> > > all)? Why do you need "input" handler, do you expect to use different
> > > handlers? If so which? And why? Do you expect to use other transports?
> > > If so which? And why? You need to have justifiction for every
> > > additional layer you putting on.
> >
> > Sigh.
> > It was already answered so many times, as far as I remember...
> > Ok, let's do it again.
> >
> > Several netlink sockets - if you can see, connector uses NETLINK_ULOG
> > for it's notification, you can not use the same socket twice, so you
> > may have new device with new socket, or several of them, and system
> > may work if one initialisation failed.
>
> Or it may not. Why do you think that having several sockets is safer
> than using one? After all, it is just a matter of setting it up and
> finally allocating a separate socket number for connector.
Yes, there can be such a way.
But when connector was not in mainline it used TUN/TAP netlink sockets
and tried them one by one until one was successfully allocated.
Using dmesg, userspace could find the needed socket number for itself.
> > cn_dev is connector device structure, cn_queue_dev - it's queueing part.
> > input handler is needed for ability to receive messages.
>
> You chose not to understand my question. Why do you need to have is as
> a pointer? Do you expect to have different input handler?
At runtime - no.
cn_dev is the main transport structure, while cn_queue_dev is for
callback queue handling.
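As a rough illustration of that split (the field details are illustrative
and may differ from the code under discussion):

/* Queueing part: callback registry and delivery bookkeeping. */
struct cn_queue_dev {
	struct list_head queue_list;	/* registered callback entries */
	spinlock_t queue_lock;
	struct sock *nls;		/* socket used when replying */
};

/* Transport part: one per netlink socket. */
struct cn_dev {
	struct cb_id id;
	u32 groups;
	struct sock *nls;				/* this transport's netlink socket */
	void (*input)(struct sock *sk, int len);	/* receive handler */
	struct cn_queue_dev *cbdev;			/* its queueing part */
};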
> > Couple of next questions can be answered in one reply:
> > original connector's idea was to implement notification based
> > not on top of netlink protocol, but using ioctls of char device,
> > so there could be different connector device, input method,
> > transport and so on. Probably I will not implement it thought.
>
> IOW this all is remnants of an older design. You have no plans to
> utilize the framework and therefore it is not needed anymore and can
> be dropped.
Different connector devices are still there.
They use different sockets.
Consider netlink2 [with guaranteed delivery - I have some ideas about it] -
one just needs to add a new connector device, with no new braindumping and
recalling how cool it was before.
It will require some changes to the connector structure, but it will not
be a complete rewrite because some time ago someone decided we do not
need it and completely removed anything that lives outside the simple
netlink wrapper.
> --
> Dmitry
Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
On Tuesday 10 May 2005 13:24, Evgeniy Polyakov wrote:
> On Tue, 10 May 2005 12:50:55 -0500
> Dmitry Torokhov <[email protected]> wrote:
>
> > On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> > > On Tue, 10 May 2005 09:56:45 -0500
> > > Dmitry Torokhov <[email protected]> wrote:
> > >
> > > > On 5/10/05, Evgeniy Polyakov <[email protected]> wrote:
> > > > > On Tue, 10 May 2005 01:18:46 -0500
> > > > > Dmitry Torokhov <[email protected]> wrote:
> > > > >
> > > > > > - drop cn_dev - there is only one connector;
> > > > >
> > > > > There may be several connectors with different socket types.
> > > > > I use it in my environment for tests.
> > > > >
> > > >
> > > > Why do you need that? u32 ID space is not enough?
> > >
> > > They use different netlink sockets.
> > > One of them can fail during initialisation, for example...
> >
> > Are you saying that there more chances successfully allocating 2
> > netlink sockets than one?
>
> Of course - one may be in use by ipt_ULOG for example.
And what does it have to do with multiple connectors and different sockets?
You just need to find one free socket; once you have found it you have only one
instance of connector running.
> Although probably it could be better to get new unit from David Miller.
> I want to create some kind of unused-in-kernel socket number -
> each netlink user outside kernel might use it without kernel users
> bothering.
Right.
> > >
> > > > > > - simplify cn_notify_request to carry only one range - it simplifies code
> > > > > > quite a bit - nothing stops clientd from sending several notification
> > > > > > requests;
> > > > >
> > > > > ? Nothing stops clients from sending several notification requests now.
> > > > > I do not see how it simplifies the things?
> > > > > Feel free to set idx_num and val_num to 1 and you will have the same.
> > > >
> > > > Compare the implementaion - without variable-length notification
> > > > requests the code is much simplier and flexible.
> > >
> > > Hmm, old request had a length, new one - does not.
> > > But instead of atomically addition of the whole set, one need to send
> > > couple of events.
> > > I do not see any profit here.
> >
> > Smaller kernel code, simpler notification structure, ability to stop
> > listenig to a certain events without re-registering the rest of the
> > block.
>
> As I said - set idx and val numbers to one and you will have the same
> behaviour.
> Code is neither smaller [ok, you removed a loop] nor simplier
> - it is functionality decreasing.
Actually, I would take it yet another step and just put something like
int cn_request_notify(u32 group, u32 idx, u32 idx_range, u32 val, u32 val_range)
int cn_cancel_notify(u32 group, u32 idx, u32 idx_range, u32 val, u32 val_range)
into connector core. Then drivers could do something like this:
static int cn_test_want_notify(void)
{
	int error;

	error = cn_request_notify(1, cn_test_id.idx, 10, cn_test_id.val, 10);
	if (error)
		return error;

	error = cn_request_notify(1, cn_test_id.idx, 10, cn_test_id.val + 20, 10);
	if (error) {
		/* unregister */
		cn_cancel_notify(1, cn_test_id.idx, 10, cn_test_id.val, 10);
		return error;
	}

	return 0;
}
No messing up with memory allocations and complex message structures.
> > > > > Connector with your changes just does not work.
> > > >
> > > > As in "I tried to run it and adjusted the application for the new
> > > > cn_notify_req size and it does not work" or "I am saying it does not
> > > > work"?
> > >
> > > Second, i.e. "I am saing it does not work".
> > > With your changes userspace just does not see any message being sent by kernelspace.
> >
> > Would you care to elaborate on this please. As far as I know
> > kernel->userspace path was not changed at all.
>
> I inserted your module and test module, which sends a message using cn_netlink_send()
> one time per second.
> Usersace application bound to -1 group does not receive that messages.
Works for me:
root@core cntest]# ./a.out
recvfrom successful
counter = 0
recvfrom successful
counter = 1
recvfrom successful
counter = 2
recvfrom successful
counter = 3
recvfrom successful
counter = 4
recvfrom successful
counter = 5
recvfrom successful
counter = 6
recvfrom successful
counter = 7
recvfrom successful
counter = 8
(I won't ever show anyone that gutted piece of ulogd that produced these
messages :) )
Try recompiling your userspace application; the messages were changed a bit.
> > > > > Dmitry, I do not understand why do you change things in so radical manner,
> > > > > especially when there are no 1. new interesting features, 2. no performance
> > > > > improvements, 3. no right design changes.
> > > >
> > > > There is a right design change - it keeps the code simple. It peels
> > > > all the layers you invented for something that does not really need
> > > > layers - connector is just a simple wrapper over raw netlink.
> > >
> > > That is your opinion.
> > >
> > > Code is already as simple as possible, since it is layered and each
> > > piece of code is separated from different logical entities.
> > >
> > > > When I look at the code I see the mess of unneeded refcounting, wrong
> > > > kinds of locking, over-engineering. Of course I want to change it.
> > >
> > > There are no unneded reference counting now.
> > > Locking was not wrong - you only placed semaphore instead of notify BH lock
> > > just to have ability to sleep under the lock and send callback addition/removing
> > > notification with GFP_KERNEL mask,
> > > and it is probably right change from the first view, as I said,
> > > but it can be implemented in a different way.
> >
> > That's what I was referring to as "wring locking". Notifications do
> > not need to stop bottom handlers, a spinlock or semaphore willl work
> > just fine. SInce I wanted to allocate memory only after I determined
> > that it is a new notify request a semaphore was a better choise.
>
> I will think of it, but in long-time I do not like such approach -
> it is some kind of deadlock - we have no memory, so we sleep in allocation,
> so we can not remove callback since it will sleep in allocation...
There is no deadlock. We are not waiting for memory to be freed, we are
waiting for paging.
> > > > Keep it simple. Except for your testing environment why do you need to
> > > > have separate netlink sockets? You have an ample ID space. Why do you
> > > > need separate cn_dev and cn_queue_dev (and in the end have them at
> > > > all)? Why do you need "input" handler, do you expect to use different
> > > > handlers? If so which? And why? Do you expect to use other transports?
> > > > If so which? And why? You need to have justifiction for every
> > > > additional layer you putting on.
> > >
> > > Sigh.
> > > It was already answered so many times, as far as I remember...
> > > Ok, let's do it again.
> > >
> > > Several netlink sockets - if you can see, connector uses NETLINK_ULOG
> > > for it's notification, you can not use the same socket twice, so you
> > > may have new device with new socket, or several of them, and system
> > > may work if one initialisation failed.
> >
> > Or it may not. Why do you think that having several sockets is safer
> > than using one? After all, it is just a matter of setting it up and
> > finally allocating a separate socket number for connector.
>
> Yes, there can be such a way.
> But when connector was not in mainline it used TUN/TAP netlink sockets
> and tried to get one by one, until successfully allocated.
> Using dmesg userspace could find needed socket number for itself.
>
And in the end you have one connector bound to some socket. There are no
multiple connectors - the goal (as far as I see it) was to have a single
transport web of messages and callbacks for the entire kernel. It does not make
sense to divide it.
> > > cn_dev is connector device structure, cn_queue_dev - it's queueing part.
> > > input handler is needed for ability to receive messages.
> >
> > You chose not to understand my question. Why do you need to have is as
> > a pointer? Do you expect to have different input handler?
>
> In runtime - no.
> cn_dev is a main transport structure, while cn_queue_dev is for
> callback queue handling.
>
> > > Couple of next questions can be answered in one reply:
> > > original connector's idea was to implement notification based
> > > not on top of netlink protocol, but using ioctls of char device,
> > > so there could be different connector device, input method,
> > > transport and so on. Probably I will not implement it thought.
> >
> > IOW this all is remnants of an older design. You have no plans to
> > utilize the framework and therefore it is not needed anymore and can
> > be dropped.
>
> Different connector devices still there.
> They use different sockets.
> Concider netlink2 [with guaranteed delivering - I have some ideas about it] -
> one just needs to add a new connector device, no new braindumping and
> recalling how it was cool before.
> It will require some changes to the connector structure, but it will not
> be complete rewrite because some time ago someone decided we do not
> need it and completely removed anything that lives outside simple
> netlink wrapper.
Think about the implications some more - userspace needs to be aware and choose
what socket to bind to, applications will change... Messages will change.
In-kernel users will also have to select the kind of transport they want to use
(guaranteed delivery or not) - it looks like you'll be better off having
these connectors completely separated.
--
Dmitry
On Wed, 2005-05-11 at 00:46 -0500, Dmitry Torokhov wrote:
> > > > > >
> > > > > > > - drop cn_dev - there is only one connector;
> > > > > >
> > > > > > There may be several connectors with different socket types.
> > > > > > I use it in my environment for tests.
> > > > > >
> > > > >
> > > > > Why do you need that? u32 ID space is not enough?
> > > >
> > > > They use different netlink sockets.
> > > > One of them can fail during initialisation, for example...
> > >
> > > Are you saying that there more chances successfully allocating 2
> > > netlink sockets than one?
> >
> > Of course - one may be in use by ipt_ULOG for example.
>
> And what does it have to do with multiple connectors and different sockets?
> You just need to find one free socket, once you found it you have only one
> instance of connector running.
Because one connector may be added for a general-purpose socket like the
existing one, and other connectors can be added for a different transport
layer, for example.
> > Although probably it could be better to get new unit from David Miller.
> > I want to create some kind of unused-in-kernel socket number -
> > each netlink user outside kernel might use it without kernel users
> > bothering.
>
> Right.
>
> > > >
> > > > > > > - simplify cn_notify_request to carry only one range - it simplifies code
> > > > > > > quite a bit - nothing stops clientd from sending several notification
> > > > > > > requests;
> > > > > >
> > > > > > ? Nothing stops clients from sending several notification requests now.
> > > > > > I do not see how it simplifies the things?
> > > > > > Feel free to set idx_num and val_num to 1 and you will have the same.
> > > > >
> > > > > Compare the implementaion - without variable-length notification
> > > > > requests the code is much simplier and flexible.
> > > >
> > > > Hmm, old request had a length, new one - does not.
> > > > But instead of atomically addition of the whole set, one need to send
> > > > couple of events.
> > > > I do not see any profit here.
> > >
> > > Smaller kernel code, simpler notification structure, ability to stop
> > > listenig to a certain events without re-registering the rest of the
> > > block.
> >
> > As I said - set idx and val numbers to one and you will have the same
> > behaviour.
> > Code is neither smaller [ok, you removed a loop] nor simplier
> > - it is functionality decreasing.
>
> Actually, I would do yet another step and just put something like
>
> int cn_request_notify(u32 group, u32 idx, u32 idx_range, u32 val, u32 val_range)
> int cn_cancel_notify(u32 group, u32 idx, u32 idx_range, u32 val, u32 val_range)
>
> into connector core. Then drivers could do something like this:
>
> static int cn_test_want_notify(void)
> {
> int error;
>
> error = cn_request_notify(1, cn_test_id.idx, 10, cn_test_id.val, 10);
> if (error)
> return error;
>
> error = cn_request_notify(1, cn_test_id.idx, 10, cn_test_id.val + 20, 10);
> if (error) {
> /* unregister */
> cn_cancel_notify(1, cn_test_id.idx, 10, cn_test_id.val, 10);
> return error;
> }
>
> return 0;
> }
>
> No messing up with memory allocations and complex message structures.
Yep, it could be done, but it has nothing in common
with the request structure changes.
I will place it in the TODO, thanks.
> > > > > > Connector with your changes just does not work.
> > > > >
> > > > > As in "I tried to run it and adjusted the application for the new
> > > > > cn_notify_req size and it does not work" or "I am saying it does not
> > > > > work"?
> > > >
> > > > Second, i.e. "I am saing it does not work".
> > > > With your changes userspace just does not see any message being sent by kernelspace.
> > >
> > > Would you care to elaborate on this please. As far as I know
> > > kernel->userspace path was not changed at all.
> >
> > I inserted your module and test module, which sends a message using cn_netlink_send()
> > one time per second.
> > Usersace application bound to -1 group does not receive that messages.
>
> Works for me:
>
> root@core cntest]# ./a.out
> recvfrom successful
> counter = 0
> recvfrom successful
> counter = 1
> recvfrom successful
> counter = 2
> recvfrom successful
> counter = 3
> recvfrom successful
> counter = 4
> recvfrom successful
> counter = 5
> recvfrom successful
> counter = 6
> recvfrom successful
> counter = 7
> recvfrom successful
> counter = 8
>
> (I won't ever show anyone that gutted piece of ulogd that produced these
> messages :) )
>
> Try recompiling your userspace application, messages were changed a bit.
I recompiled everything - it still does not work.
The application just reads from the socket after polling; it does not even
know about the messages themselves.
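For reference, a minimal sketch of such a listener. The netlink unit number
here is a placeholder and must match whatever unit the connector module was
built with, and linux/connector.h is assumed to provide struct cn_msg:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/connector.h>	/* assumed to provide struct cn_msg */

#define CN_NETLINK_UNIT 5	/* placeholder - match the kernel side */

int main(void)
{
	char buf[4096];
	struct sockaddr_nl local = {
		.nl_family = AF_NETLINK,
		.nl_groups = 0xffffffff,	/* "-1": listen on all groups */
		.nl_pid    = getpid(),
	};
	int s = socket(PF_NETLINK, SOCK_DGRAM, CN_NETLINK_UNIT);

	if (s < 0 || bind(s, (struct sockaddr *)&local, sizeof(local)) < 0)
		return 1;

	for (;;) {
		int len = recv(s, buf, sizeof(buf), 0);
		if (len <= 0)
			break;
		struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
		struct cn_msg *msg = NLMSG_DATA(nlh);
		printf("idx=%x val=%x seq=%u ack=%u len=%u\n",
		       msg->id.idx, msg->id.val, msg->seq, msg->ack, msg->len);
	}
	return 0;
}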
> > > > > > Dmitry, I do not understand why do you change things in so radical manner,
> > > > > > especially when there are no 1. new interesting features, 2. no performance
> > > > > > improvements, 3. no right design changes.
> > > > >
> > > > > There is a right design change - it keeps the code simple. It peels
> > > > > all the layers you invented for something that does not really need
> > > > > layers - connector is just a simple wrapper over raw netlink.
> > > >
> > > > That is your opinion.
> > > >
> > > > Code is already as simple as possible, since it is layered and each
> > > > piece of code is separated from different logical entities.
> > > >
> > > > > When I look at the code I see the mess of unneeded refcounting, wrong
> > > > > kinds of locking, over-engineering. Of course I want to change it.
> > > >
> > > > There are no unneded reference counting now.
> > > > Locking was not wrong - you only placed semaphore instead of notify BH lock
> > > > just to have ability to sleep under the lock and send callback addition/removing
> > > > notification with GFP_KERNEL mask,
> > > > and it is probably right change from the first view, as I said,
> > > > but it can be implemented in a different way.
> > >
> > > That's what I was referring to as "wring locking". Notifications do
> > > not need to stop bottom handlers, a spinlock or semaphore willl work
> > > just fine. SInce I wanted to allocate memory only after I determined
> > > that it is a new notify request a semaphore was a better choise.
> >
> > I will think of it, but in long-time I do not like such approach -
> > it is some kind of deadlock - we have no memory, so we sleep in allocation,
> > so we can not remove callback since it will sleep in allocation...
>
> There is no deadlock. We are not waiting for memory to be freed, we are
> waiting for paging.
Yep. That is why a pool is better.
Actually I even want some kind of pool for skbs, but I do not think it
would be a good idea.
That would require several of them for different sizes and so on...
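For the fixed-size work items, though, the idea is straightforward; a sketch
assuming the 2.6-era slab/mempool API, with "cn_work" and the entry type as
hypothetical names:

#include <linux/slab.h>
#include <linux/mempool.h>

static kmem_cache_t *cn_work_cache;
static mempool_t *cn_work_pool;

static int cn_pool_init(void)
{
	cn_work_cache = kmem_cache_create("cn_work",
					  sizeof(struct cn_callback_entry),
					  0, 0, NULL, NULL);
	if (!cn_work_cache)
		return -ENOMEM;

	/* Keep at least 16 entries in reserve for atomic-context bursts. */
	cn_work_pool = mempool_create(16, mempool_alloc_slab,
				      mempool_free_slab, cn_work_cache);
	if (!cn_work_pool) {
		kmem_cache_destroy(cn_work_cache);
		return -ENOMEM;
	}
	return 0;
}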
> > > > > Keep it simple. Except for your testing environment why do you need to
> > > > > have separate netlink sockets? You have an ample ID space. Why do you
> > > > > need separate cn_dev and cn_queue_dev (and in the end have them at
> > > > > all)? Why do you need "input" handler, do you expect to use different
> > > > > handlers? If so which? And why? Do you expect to use other transports?
> > > > > If so which? And why? You need to have justifiction for every
> > > > > additional layer you putting on.
> > > >
> > > > Sigh.
> > > > It was already answered so many times, as far as I remember...
> > > > Ok, let's do it again.
> > > >
> > > > Several netlink sockets - if you can see, connector uses NETLINK_ULOG
> > > > for it's notification, you can not use the same socket twice, so you
> > > > may have new device with new socket, or several of them, and system
> > > > may work if one initialisation failed.
> > >
> > > Or it may not. Why do you think that having several sockets is safer
> > > than using one? After all, it is just a matter of setting it up and
> > > finally allocating a separate socket number for connector.
> >
> > Yes, there can be such a way.
> > But when connector was not in mainline it used TUN/TAP netlink sockets
> > and tried to get one by one, until successfully allocated.
> > Using dmesg userspace could find needed socket number for itself.
> >
>
> And in the end you have one connector bound to some socket. There is no
> multiple connectors - the goal (as far as I see it) was to have single
> transport web of messages and callbacks for entire kernel. It does not make
> sense to divide it.
Callbacks and messages live in cn_queue_dev - it could be the same for
different transport layers; the transport layer is cn_dev, and those might
be different.
You want to remove even the ability to extend the system, since you do not
see and do not want to understand that with the existing design it can be
extended; with your patch it will be a completely closed system.
> > > > cn_dev is connector device structure, cn_queue_dev - it's queueing part.
> > > > input handler is needed for ability to receive messages.
> > >
> > > You chose not to understand my question. Why do you need to have is as
> > > a pointer? Do you expect to have different input handler?
> >
> > In runtime - no.
> > cn_dev is a main transport structure, while cn_queue_dev is for
> > callback queue handling.
> >
> > > > Couple of next questions can be answered in one reply:
> > > > original connector's idea was to implement notification based
> > > > not on top of netlink protocol, but using ioctls of char device,
> > > > so there could be different connector device, input method,
> > > > transport and so on. Probably I will not implement it thought.
> > >
> > > IOW this all is remnants of an older design. You have no plans to
> > > utilize the framework and therefore it is not needed anymore and can
> > > be dropped.
> >
> > Different connector devices still there.
> > They use different sockets.
> > Concider netlink2 [with guaranteed delivering - I have some ideas about it] -
> > one just needs to add a new connector device, no new braindumping and
> > recalling how it was cool before.
> > It will require some changes to the connector structure, but it will not
> > be complete rewrite because some time ago someone decided we do not
> > need it and completely removed anything that lives outside simple
> > netlink wrapper.
>
> Think about implications some more - userspace needs to be aware and chose
> what socket to bind to, applications will change... Messages will change.
> In-kernel users will also have to select kind of transport they want to use
> (guaranteed delivery or not) - it looks like you'll be better off having
> these connectors completely separated.
No, I do not think the messages will be changed, they are quite
self-contained; maybe some kind of flags will be added, but the existing
32-bit length could be shrunk to 16 bits -
16kb is quite enough, especially with the 1024-byte checks in connector
itself.
It is only the transport layer that will be changed.
A naive approach is just attaching a timer to the selected skb and re-arming
it each time the socket has an overrun, until some timeout, of course with
some simple congestion mechanism (roughly as sketched below).
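One possible shape for that idea, assuming the 2.6-era timer API; cn_retry,
cn_retry_fn and cn_nls are hypothetical names, and the retry/backoff policy
is only illustrative:

#include <linux/timer.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>

static struct sock *cn_nls;	/* connector's netlink socket */

struct cn_retry {
	struct sk_buff *skb;
	struct timer_list timer;	/* armed with init_timer()/mod_timer() elsewhere */
	unsigned long deadline;
	u32 group;
};

static void cn_retry_fn(unsigned long data)
{
	struct cn_retry *r = (struct cn_retry *)data;

	/* netlink_broadcast() consumes the reference passed to it,
	 * so keep our own with skb_get() in case we must retry. */
	if (netlink_broadcast(cn_nls, skb_get(r->skb), 0,
			      r->group, GFP_ATOMIC) < 0 &&
	    time_before(jiffies, r->deadline)) {
		mod_timer(&r->timer, jiffies + HZ / 10);	/* retry in 100 ms */
	} else {
		kfree_skb(r->skb);	/* delivered, or deadline reached */
		kfree(r);
	}
}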
Logical high-level messages were separated specifically to hide the transport
layer.
We do not change the routing table when switching from UDP to TCP routes,
and the connector message header is exactly a "route" for the included message.
Using flags in the connector message header one may select the desired
connector from kernelspace; binding to a different socket or using an ioctl
selects the appropriate connector from userspace.
And even better - kernelspace might not use flags at all, but add/remove a
callback on the appropriate transport device, so that cn_netlink_send selects
the required device using the message idx.val.
Your changes break this idea completely - it would require new queues,
new request lists,
a new set of messages, new ids - almost everything would be duplicated instead
of a new device structure which is 28 bytes in size and has a pointer to the
common part, which has a reference counter for exactly this kind of use.
--
Evgeniy Polyakov
Crash is better than data corruption -- Arthur Grabowski
> BTW, I do not think that "All rights reserved" statement is applicable/
> compatible with GPL this software is supposedly released under, I have
> taken liberty of removing this statement.
It's perfectly compatible with the GPL and the BSD license. "All rights
reserved" is simply asserting all your rights (moral and other). It then
goes on to say you can use it under the GPL.
Alan