2006-08-14 05:58:43

by Evgeniy Polyakov

Subject: [take9 0/2] kevent: Generic event handling mechanism.


Generic event handling mechanism.

Changes from 'take8' patchset:
* fixed mmap release bug
* use module_init() instead of late_initcall()
* use better structures for timer notifications

Changes from 'take7' patchset:
* new mmap interface (not tested, waiting for other changes to be acked)
- use nopage() method to dynamically substitute pages
- allocate a new page for events only when a newly added kevent requires it
- do not use ugly index dereferencing, use a structure instead
- reduced amount of data in the ring (id and flags),
maximum 12 pages on x86 per kevent fd (see the userspace ring-reading sketch below)
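
As a rough illustration (not part of the patch), here is how userspace might walk
the mapped ring described above. It assumes the struct kevent_mring/struct mukevent
layout from patch 1/2, a 4096-byte page (x86) and a hypothetical kev_fd obtained
through kevent_ctl(..., KEVENT_CTL_INIT, ...); only the first ring page is shown.

/* Sketch only: mirrors the kevent_mring/mukevent layout from patch 1/2. */
#include <sys/mman.h>
#include <stdio.h>

struct kevent_id { unsigned int raw[2]; };
struct mukevent { struct kevent_id id; unsigned int ret_flags; };

#define PAGE_SIZE 4096
#define KEVENTS_ON_PAGE ((PAGE_SIZE - sizeof(unsigned int)) / sizeof(struct mukevent))

struct kevent_mring {
	unsigned int index;		/* incremented by the kernel for each added event */
	struct mukevent event[KEVENTS_ON_PAGE];
};

static void dump_first_ring_page(int kev_fd)
{
	struct kevent_mring *ring;
	unsigned int i;

	ring = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, kev_fd, 0);
	if (ring == MAP_FAILED)
		return;

	for (i = 0; i < ring->index && i < KEVENTS_ON_PAGE; ++i)
		printf("id=%u ret_flags=0x%x\n",
		       ring->event[i].id.raw[0], ring->event[i].ret_flags);

	munmap(ring, PAGE_SIZE);
}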

Changes from 'take6' patchset:
* a lot of comments!
* do not use list poisoning to detect whether an entry is in the list
* return number of ready kevents even if copy*user() fails
* strict check for number of kevents in syscall
* use ARRAY_SIZE for array size calculation
* changed superblock magic number
* use SLAB_PANIC instead of direct panic() call
* changed -E* return values
* a lot of small cleanups and indent fixes

Changes from 'take5' patchset:
* removed compilation warnings about unused variables when lockdep is not turned on
* do not use internal socket structures, use appropriate (exported) wrappers instead
* removed default 1 second timeout
* removed AIO stuff from patchset

Changes from 'take4' patchset:
* use miscdevice instead of chardevice
* comments fixes

Changes from 'take3' patchset:
* removed serializing mutex from kevent_user_wait()
* moved storage list processing to RCU
* silenced lockdep warnings - all storage locks are initialized in the same function, so lockdep was taught
to differentiate between the various cases
* remove kevent from storage if it is marked as broken after callback
* fixed a typo in the mmapped buffer implementation which resulted in a wrong index calculation

Changes from 'take2' patchset:
* split kevent_finish_user() to locked and unlocked variants
* do not use KEVENT_STAT ifdefs, use inline functions instead
* use array of callbacks of each type instead of each kevent callback initialization
* changed name of ukevent guarding lock
* use only one kevent lock in kevent_user for all hash buckets instead of per-bucket locks
* do not use kevent_user_ctl structure; instead provide the needed arguments as syscall parameters
* various indent cleanups
* added an optimisation aimed at the case when a lot of kevents are being copied from userspace
* mapped buffer (initial) implementation (no userspace yet)

Changes from 'take1' patchset:
- rebased against 2.6.18-git tree
- removed ioctl controlling
- added new syscall kevent_get_events(int fd, unsigned int min_nr, unsigned int max_nr,
unsigned int timeout, void __user *buf, unsigned flags)
- use old syscall kevent_ctl for creation/removing, modification and initial kevent
initialization
- use mutexes instead of semaphores
- added a file descriptor check and return an error if the provided descriptor does not match
kevent file operations
- various indent fixes
- removed aio_sendfile() declarations.

Thank you.

Signed-off-by: Evgeniy Polyakov <[email protected]>



2006-08-14 05:59:05

by Evgeniy Polyakov

Subject: [take9 2/2] kevent: poll/select() notifications. Timer notifications.


poll/select() notifications. Timer notifications.

This patch includes generic poll/select and timer notifications.

kevent_poll works similarly to epoll and has the same issues (the callback
is invoked not from the caller's internal state machine, but through
process wakeup).

Timer notifications can be used for fine-grained per-process time
management, since interval timers are very inconvenient to use
and are limited.
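
For illustration (not part of the patch), here is a minimal userspace sketch of a
periodic timer through the proposed syscalls; it uses the i386 syscall numbers and
the constants/structure layout from this patchset, and assumes they stay as posted.

/* Sketch only: one-second periodic timer via the proposed kevent syscalls. */
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

#define __NR_kevent_get_events	318	/* i386 numbers from this patchset */
#define __NR_kevent_ctl		319

#define KEVENT_TIMER		2
#define KEVENT_TIMER_FIRED	0x1
#define KEVENT_CTL_ADD		0
#define KEVENT_CTL_INIT		3

struct ukevent {
	unsigned int id[2];		/* for timers id[0] is the period in msec */
	unsigned int type;		/* KEVENT_TIMER */
	unsigned int event;		/* KEVENT_TIMER_FIRED */
	unsigned int req_flags, ret_flags;
	unsigned int ret_data[2];
	unsigned int user[2];		/* opaque cookie, copied back unchanged */
};

int main(void)
{
	struct ukevent uk;
	long fd, n;

	/* KEVENT_CTL_INIT returns a new kevent control descriptor. */
	fd = syscall(__NR_kevent_ctl, 0, KEVENT_CTL_INIT, 0, NULL);
	if (fd < 0)
		return 1;

	memset(&uk, 0, sizeof(uk));
	uk.type = KEVENT_TIMER;
	uk.event = KEVENT_TIMER_FIRED;
	uk.id[0] = 1000;		/* fire every 1000 msec */

	if (syscall(__NR_kevent_ctl, fd, KEVENT_CTL_ADD, 1, &uk) < 0)
		return 1;

	/* Wait up to 2000 msec for at least one ready event. */
	n = syscall(__NR_kevent_get_events, fd, 1, 1, 2000, &uk, 0);
	return n > 0 ? 0 : 1;
}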

Signed-off-by: Evgeniy Polyakov <[email protected]>

diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c
new file mode 100644
index 0000000..8a4f863
--- /dev/null
+++ b/kernel/kevent/kevent_poll.c
@@ -0,0 +1,220 @@
+/*
+ * kevent_poll.c
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/timer.h>
+#include <linux/file.h>
+#include <linux/kevent.h>
+#include <linux/poll.h>
+#include <linux/fs.h>
+
+static kmem_cache_t *kevent_poll_container_cache;
+static kmem_cache_t *kevent_poll_priv_cache;
+
+struct kevent_poll_ctl
+{
+ struct poll_table_struct pt;
+ struct kevent *k;
+};
+
+struct kevent_poll_wait_container
+{
+ struct list_head container_entry;
+ wait_queue_head_t *whead;
+ wait_queue_t wait;
+ struct kevent *k;
+};
+
+struct kevent_poll_private
+{
+ struct list_head container_list;
+ spinlock_t container_lock;
+};
+
+static int kevent_poll_enqueue(struct kevent *k);
+static int kevent_poll_dequeue(struct kevent *k);
+static int kevent_poll_callback(struct kevent *k);
+
+static int kevent_poll_wait_callback(wait_queue_t *wait,
+ unsigned mode, int sync, void *key)
+{
+ struct kevent_poll_wait_container *cont =
+ container_of(wait, struct kevent_poll_wait_container, wait);
+ struct kevent *k = cont->k;
+ struct file *file = k->st->origin;
+ u32 revents;
+
+ revents = file->f_op->poll(file, NULL);
+
+ kevent_storage_ready(k->st, NULL, revents);
+
+ return 0;
+}
+
+static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead,
+ struct poll_table_struct *poll_table)
+{
+ struct kevent *k =
+ container_of(poll_table, struct kevent_poll_ctl, pt)->k;
+ struct kevent_poll_private *priv = k->priv;
+ struct kevent_poll_wait_container *cont;
+ unsigned long flags;
+
+ cont = kmem_cache_alloc(kevent_poll_container_cache, SLAB_KERNEL);
+ if (!cont) {
+ kevent_break(k);
+ return;
+ }
+
+ cont->k = k;
+ init_waitqueue_func_entry(&cont->wait, kevent_poll_wait_callback);
+ cont->whead = whead;
+
+ spin_lock_irqsave(&priv->container_lock, flags);
+ list_add_tail(&cont->container_entry, &priv->container_list);
+ spin_unlock_irqrestore(&priv->container_lock, flags);
+
+ add_wait_queue(whead, &cont->wait);
+}
+
+static int kevent_poll_enqueue(struct kevent *k)
+{
+ struct file *file;
+ int err, ready = 0;
+ unsigned int revents;
+ struct kevent_poll_ctl ctl;
+ struct kevent_poll_private *priv;
+
+ file = fget(k->event.id.raw[0]);
+ if (!file)
+ return -ENODEV;
+
+ err = -EINVAL;
+ if (!file->f_op || !file->f_op->poll)
+ goto err_out_fput;
+
+ err = -ENOMEM;
+ priv = kmem_cache_alloc(kevent_poll_priv_cache, SLAB_KERNEL);
+ if (!priv)
+ goto err_out_fput;
+
+ spin_lock_init(&priv->container_lock);
+ INIT_LIST_HEAD(&priv->container_list);
+
+ k->priv = priv;
+
+ ctl.k = k;
+ init_poll_funcptr(&ctl.pt, &kevent_poll_qproc);
+
+ err = kevent_storage_enqueue(&file->st, k);
+ if (err)
+ goto err_out_free;
+
+ revents = file->f_op->poll(file, &ctl.pt);
+ if (revents & k->event.event) {
+ ready = 1;
+ kevent_poll_dequeue(k);
+ }
+
+ return ready;
+
+err_out_free:
+ kmem_cache_free(kevent_poll_priv_cache, priv);
+err_out_fput:
+ fput(file);
+ return err;
+}
+
+static int kevent_poll_dequeue(struct kevent *k)
+{
+ struct file *file = k->st->origin;
+ struct kevent_poll_private *priv = k->priv;
+ struct kevent_poll_wait_container *w, *n;
+ unsigned long flags;
+
+ kevent_storage_dequeue(k->st, k);
+
+ spin_lock_irqsave(&priv->container_lock, flags);
+ list_for_each_entry_safe(w, n, &priv->container_list, container_entry) {
+ list_del(&w->container_entry);
+ remove_wait_queue(w->whead, &w->wait);
+ kmem_cache_free(kevent_poll_container_cache, w);
+ }
+ spin_unlock_irqrestore(&priv->container_lock, flags);
+
+ kmem_cache_free(kevent_poll_priv_cache, priv);
+ k->priv = NULL;
+
+ fput(file);
+
+ return 0;
+}
+
+static int kevent_poll_callback(struct kevent *k)
+{
+ struct file *file = k->st->origin;
+ unsigned int revents = file->f_op->poll(file, NULL);
+ return (revents & k->event.event);
+}
+
+static int __init kevent_poll_sys_init(void)
+{
+ struct kevent_callbacks *pc = &kevent_registered_callbacks[KEVENT_POLL];
+
+ kevent_poll_container_cache = kmem_cache_create("kevent_poll_container_cache",
+ sizeof(struct kevent_poll_wait_container), 0, 0, NULL, NULL);
+ if (!kevent_poll_container_cache) {
+ printk(KERN_ERR "Failed to create kevent poll container cache.\n");
+ return -ENOMEM;
+ }
+
+ kevent_poll_priv_cache = kmem_cache_create("kevent_poll_priv_cache",
+ sizeof(struct kevent_poll_private), 0, 0, NULL, NULL);
+ if (!kevent_poll_priv_cache) {
+ printk(KERN_ERR "Failed to create kevent poll private data cache.\n");
+ kmem_cache_destroy(kevent_poll_container_cache);
+ kevent_poll_container_cache = NULL;
+ return -ENOMEM;
+ }
+
+ pc->enqueue = &kevent_poll_enqueue;
+ pc->dequeue = &kevent_poll_dequeue;
+ pc->callback = &kevent_poll_callback;
+
+ printk(KERN_INFO "Kevent poll()/select() subsystem has been initialized.\n");
+ return 0;
+}
+
+static struct lock_class_key kevent_poll_key;
+
+void kevent_poll_reinit(struct file *file)
+{
+ lockdep_set_class(&file->st.lock, &kevent_poll_key);
+}
+
+static void __exit kevent_poll_sys_fini(void)
+{
+ kmem_cache_destroy(kevent_poll_priv_cache);
+ kmem_cache_destroy(kevent_poll_container_cache);
+}
+
+module_init(kevent_poll_sys_init);
+module_exit(kevent_poll_sys_fini);
diff --git a/kernel/kevent/kevent_timer.c b/kernel/kevent/kevent_timer.c
new file mode 100644
index 0000000..fe39b4e
--- /dev/null
+++ b/kernel/kevent/kevent_timer.c
@@ -0,0 +1,108 @@
+/*
+ * kevent_timer.c
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/timer.h>
+#include <linux/jiffies.h>
+#include <linux/kevent.h>
+
+struct kevent_timer
+{
+ struct timer_list ktimer;
+ struct kevent_storage ktimer_storage;
+};
+
+static void kevent_timer_func(unsigned long data)
+{
+ struct kevent *k = (struct kevent *)data;
+ struct timer_list *t = k->st->origin;
+
+ kevent_storage_ready(k->st, NULL, KEVENT_MASK_ALL);
+ mod_timer(t, jiffies + msecs_to_jiffies(k->event.id.raw[0]));
+}
+
+static struct lock_class_key kevent_timer_key;
+
+static int kevent_timer_enqueue(struct kevent *k)
+{
+ int err;
+ struct kevent_timer *t;
+
+ t = kmalloc(sizeof(struct kevent_timer), GFP_KERNEL);
+ if (!t)
+ return -ENOMEM;
+
+ setup_timer(&t->ktimer, &kevent_timer_func, (unsigned long)k);
+
+ err = kevent_storage_init(&t->ktimer, &t->ktimer_storage);
+ if (err)
+ goto err_out_free;
+ lockdep_set_class(&t->ktimer_storage.lock, &kevent_timer_key);
+
+ err = kevent_storage_enqueue(&t->ktimer_storage, k);
+ if (err)
+ goto err_out_st_fini;
+
+ mod_timer(&t->ktimer, jiffies + msecs_to_jiffies(k->event.id.raw[0]));
+
+ return 0;
+
+err_out_st_fini:
+ kevent_storage_fini(&t->ktimer_storage);
+err_out_free:
+ kfree(t);
+
+ return err;
+}
+
+static int kevent_timer_dequeue(struct kevent *k)
+{
+ struct kevent_storage *st = k->st;
+ struct kevent_timer *t = container_of(st, struct kevent_timer, ktimer_storage);
+
+ del_timer_sync(&t->ktimer);
+ kevent_storage_dequeue(st, k);
+ kfree(t);
+
+ return 0;
+}
+
+static int kevent_timer_callback(struct kevent *k)
+{
+ k->event.ret_data[0] = (__u32)jiffies;
+ return 1;
+}
+
+static int __init kevent_init_timer(void)
+{
+ struct kevent_callbacks *tc = &kevent_registered_callbacks[KEVENT_TIMER];
+
+ tc->enqueue = &kevent_timer_enqueue;
+ tc->dequeue = &kevent_timer_dequeue;
+ tc->callback = &kevent_timer_callback;
+
+ return 0;
+}
+module_init(kevent_init_timer);

2006-08-14 05:59:17

by Evgeniy Polyakov

Subject: [take9 1/2] kevent: Core files.


Core files.

This patch includes the core kevent files:
- userspace controlling
- kernelspace interfaces (see the storage API sketch below)
- initialization
- notification state machines
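
As a quick orientation (not part of the patch), this is roughly how an in-kernel
origin would drive the storage interface added below; my_origin and the three
helpers are hypothetical, only the kevent_storage_* calls come from this patch.

/* Sketch only: a hypothetical origin signalling readiness through kevent. */
#include <linux/kevent.h>

struct my_origin {
	struct kevent_storage st;
	/* ... origin-specific state ... */
};

static int my_origin_setup(struct my_origin *o)
{
	/* Tie the storage to its originator; kevents will queue on o->st. */
	return kevent_storage_init(o, &o->st);
}

static void my_origin_event_happened(struct my_origin *o)
{
	/* Requeue every queued kevent whose requested event mask matches. */
	kevent_storage_ready(&o->st, NULL, KEVENT_MASK_ALL);
}

static void my_origin_teardown(struct my_origin *o)
{
	/* Mark remaining kevents broken so the origin can be removed safely. */
	kevent_storage_fini(&o->st);
}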

Signed-off-by: Evgeniy Polyakov <[email protected]>

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index dd63d47..091ff42 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -317,3 +317,5 @@ ENTRY(sys_call_table)
.long sys_tee /* 315 */
.long sys_vmsplice
.long sys_move_pages
+ .long sys_kevent_get_events
+ .long sys_kevent_ctl
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 5d4a7d1..b2af4a8 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -713,4 +713,6 @@ #endif
.quad sys_tee
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
+ .quad sys_kevent_get_events
+ .quad sys_kevent_ctl
ia32_syscall_end:
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index fc1c8dd..c9dde13 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -323,10 +323,12 @@ #define __NR_sync_file_range 314
#define __NR_tee 315
#define __NR_vmsplice 316
#define __NR_move_pages 317
+#define __NR_kevent_get_events 318
+#define __NR_kevent_ctl 319

#ifdef __KERNEL__

-#define NR_syscalls 318
+#define NR_syscalls 320

/*
* user-visible error numbers are in the range -1 - -128: see
diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 94387c9..61363e0 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -619,10 +619,14 @@ #define __NR_vmsplice 278
__SYSCALL(__NR_vmsplice, sys_vmsplice)
#define __NR_move_pages 279
__SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_kevent_get_events 280
+__SYSCALL(__NR_kevent_get_events, sys_kevent_get_events)
+#define __NR_kevent_ctl 281
+__SYSCALL(__NR_kevent_ctl, sys_kevent_ctl)

#ifdef __KERNEL__

-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_kevent_ctl

#ifndef __NO_STUBS

diff --git a/include/linux/kevent.h b/include/linux/kevent.h
new file mode 100644
index 0000000..03eeeea
--- /dev/null
+++ b/include/linux/kevent.h
@@ -0,0 +1,310 @@
+/*
+ * kevent.h
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __KEVENT_H
+#define __KEVENT_H
+
+/*
+ * Kevent request flags.
+ */
+
+#define KEVENT_REQ_ONESHOT 0x1 /* Process this event only once and then dequeue. */
+
+/*
+ * Kevent return flags.
+ */
+#define KEVENT_RET_BROKEN 0x1 /* Kevent is broken. */
+#define KEVENT_RET_DONE 0x2 /* Kevent processing was finished successfully. */
+
+/*
+ * Kevent type set.
+ */
+#define KEVENT_SOCKET 0
+#define KEVENT_INODE 1
+#define KEVENT_TIMER 2
+#define KEVENT_POLL 3
+#define KEVENT_NAIO 4
+#define KEVENT_AIO 5
+#define KEVENT_MAX 6
+
+/*
+ * Per-type event sets.
+ * The number of per-event sets should be exactly the same as the number of kevent types.
+ */
+
+/*
+ * Timer events.
+ */
+#define KEVENT_TIMER_FIRED 0x1
+
+/*
+ * Socket/network asynchronous IO events.
+ */
+#define KEVENT_SOCKET_RECV 0x1
+#define KEVENT_SOCKET_ACCEPT 0x2
+#define KEVENT_SOCKET_SEND 0x4
+
+/*
+ * Inode events.
+ */
+#define KEVENT_INODE_CREATE 0x1
+#define KEVENT_INODE_REMOVE 0x2
+
+/*
+ * Poll events.
+ */
+#define KEVENT_POLL_POLLIN 0x0001
+#define KEVENT_POLL_POLLPRI 0x0002
+#define KEVENT_POLL_POLLOUT 0x0004
+#define KEVENT_POLL_POLLERR 0x0008
+#define KEVENT_POLL_POLLHUP 0x0010
+#define KEVENT_POLL_POLLNVAL 0x0020
+
+#define KEVENT_POLL_POLLRDNORM 0x0040
+#define KEVENT_POLL_POLLRDBAND 0x0080
+#define KEVENT_POLL_POLLWRNORM 0x0100
+#define KEVENT_POLL_POLLWRBAND 0x0200
+#define KEVENT_POLL_POLLMSG 0x0400
+#define KEVENT_POLL_POLLREMOVE 0x1000
+
+/*
+ * Asynchronous IO events.
+ */
+#define KEVENT_AIO_BIO 0x1
+
+#define KEVENT_MASK_ALL 0xffffffff /* Mask of all possible event values. */
+#define KEVENT_MASK_EMPTY 0x0 /* Empty mask of ready events. */
+
+struct kevent_id
+{
+ __u32 raw[2];
+};
+
+struct ukevent
+{
+ struct kevent_id id; /* Id of this request, e.g. socket number, file descriptor and so on... */
+ __u32 type; /* Event type, e.g. KEVENT_SOCK, KEVENT_INODE, KEVENT_TIMER and so on... */
+ __u32 event; /* Event itself, e.g. SOCK_ACCEPT, INODE_CREATED, TIMER_FIRED... */
+ __u32 req_flags; /* Per-event request flags */
+ __u32 ret_flags; /* Per-event return flags */
+ __u32 ret_data[2]; /* Event return data. Event originator fills it with anything it likes. */
+ union {
+ __u32 user[2]; /* User's data. It is not used, just copied to/from user. */
+ void *ptr;
+ };
+};
+
+struct mukevent
+{
+ struct kevent_id id;
+ __u32 ret_flags;
+};
+
+#define KEVENT_CTL_ADD 0
+#define KEVENT_CTL_REMOVE 1
+#define KEVENT_CTL_MODIFY 2
+#define KEVENT_CTL_INIT 3
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/mutex.h>
+#include <linux/wait.h>
+#include <linux/net.h>
+#include <linux/rcupdate.h>
+#include <linux/kevent_storage.h>
+
+#define KEVENT_MAX_EVENTS 4096
+#define KEVENT_MIN_BUFFS_ALLOC 3
+
+struct inode;
+struct dentry;
+struct sock;
+
+struct kevent;
+struct kevent_storage;
+typedef int (* kevent_callback_t)(struct kevent *);
+
+/* @callback is called each time new event has been caught. */
+/* @enqueue is called each time new event is queued. */
+/* @dequeue is called each time event is dequeued. */
+
+struct kevent_callbacks {
+ kevent_callback_t callback, enqueue, dequeue;
+};
+
+#define KEVENT_READY 0x1
+#define KEVENT_STORAGE 0x2
+#define KEVENT_USER 0x4
+
+struct kevent
+{
+ struct rcu_head rcu_head; /* Used for kevent freeing.*/
+ struct ukevent event;
+ spinlock_t ulock; /* This lock protects ukevent manipulations, e.g. ret_flags changes. */
+
+ struct list_head kevent_entry; /* Entry of user's queue. */
+ struct list_head storage_entry; /* Entry of origin's queue. */
+ struct list_head ready_entry; /* Entry of user's ready. */
+
+ u32 flags;
+
+ struct kevent_user *user; /* User who requested this kevent. */
+ struct kevent_storage *st; /* Kevent container. */
+
+ struct kevent_callbacks callbacks;
+
+ void *priv; /* Private data for different storages.
+ * poll()/select storage has a list of wait_queue_t containers
+ * for each ->poll() { poll_wait()' } here.
+ */
+};
+
+extern struct kevent_callbacks kevent_registered_callbacks[];
+
+#define KEVENT_HASH_MASK 0xff
+
+struct kevent_user
+{
+ struct list_head kevent_list[KEVENT_HASH_MASK+1];
+ spinlock_t kevent_lock;
+ unsigned int kevent_num; /* Number of queued kevents. */
+
+ struct list_head ready_list; /* List of ready kevents. */
+ unsigned int ready_num; /* Number of ready kevents. */
+ spinlock_t ready_lock; /* Protects all manipulations with ready queue. */
+
+ unsigned int max_ready_num; /* Requested number of kevents. */
+
+ struct mutex ctl_mutex; /* Protects against simultaneous kevent_user control manipulations. */
+ wait_queue_head_t wait; /* Wait until some events are ready. */
+
+ atomic_t refcnt; /* Reference counter, increased for each new kevent. */
+
+ unsigned int pages_in_use;
+ unsigned long *pring; /* Array of pages forming mapped ring buffer */
+
+#ifdef CONFIG_KEVENT_USER_STAT
+ unsigned long im_num;
+ unsigned long wait_num;
+ unsigned long total;
+#endif
+};
+
+extern kmem_cache_t *kevent_cache;
+int kevent_sys_init(void);
+int kevent_enqueue(struct kevent *k);
+int kevent_dequeue(struct kevent *k);
+int kevent_init(struct kevent *k);
+void kevent_requeue(struct kevent *k);
+int kevent_break(struct kevent *k);
+
+void kevent_user_ring_add_event(struct kevent *k);
+
+void kevent_storage_ready(struct kevent_storage *st,
+ kevent_callback_t ready_callback, u32 event);
+int kevent_storage_init(void *origin, struct kevent_storage *st);
+void kevent_storage_fini(struct kevent_storage *st);
+int kevent_storage_enqueue(struct kevent_storage *st, struct kevent *k);
+void kevent_storage_dequeue(struct kevent_storage *st, struct kevent *k);
+
+int kevent_user_add_ukevent(struct ukevent *uk, struct kevent_user *u);
+
+#ifdef CONFIG_KEVENT_POLL
+void kevent_poll_reinit(struct file *file);
+#else
+static inline void kevent_poll_reinit(struct file *file)
+{
+}
+#endif
+
+#ifdef CONFIG_KEVENT_INODE
+void kevent_inode_notify(struct inode *inode, u32 event);
+void kevent_inode_notify_parent(struct dentry *dentry, u32 event);
+void kevent_inode_remove(struct inode *inode);
+#else
+static inline void kevent_inode_notify(struct inode *inode, u32 event)
+{
+}
+static inline void kevent_inode_notify_parent(struct dentry *dentry, u32 event)
+{
+}
+static inline void kevent_inode_remove(struct inode *inode)
+{
+}
+#endif /* CONFIG_KEVENT_INODE */
+#ifdef CONFIG_KEVENT_SOCKET
+#ifdef CONFIG_LOCKDEP
+void kevent_socket_reinit(struct socket *sock);
+void kevent_sk_reinit(struct sock *sk);
+#else
+static inline void kevent_socket_reinit(struct socket *sock)
+{
+}
+static inline void kevent_sk_reinit(struct sock *sk)
+{
+}
+#endif
+void kevent_socket_notify(struct sock *sock, u32 event);
+int kevent_socket_dequeue(struct kevent *k);
+int kevent_socket_enqueue(struct kevent *k);
+#define sock_async(__sk) sock_flag(__sk, SOCK_ASYNC)
+#else
+static inline void kevent_socket_notify(struct sock *sock, u32 event)
+{
+}
+#define sock_async(__sk) ({ (void)__sk; 0; })
+#endif
+
+#ifdef CONFIG_KEVENT_USER_STAT
+static inline void kevent_stat_init(struct kevent_user *u)
+{
+ u->wait_num = u->im_num = u->total = 0;
+}
+static inline void kevent_stat_print(struct kevent_user *u)
+{
+ pr_debug("%s: u=%p, wait=%lu, immediately=%lu, total=%lu.\n",
+ __func__, u, u->wait_num, u->im_num, u->total);
+}
+static inline void kevent_stat_im(struct kevent_user *u)
+{
+ u->im_num++;
+}
+static inline void kevent_stat_wait(struct kevent_user *u)
+{
+ u->wait_num++;
+}
+static inline void kevent_stat_total(struct kevent_user *u)
+{
+ u->total++;
+}
+#else
+#define kevent_stat_print(u) ({ (void) u;})
+#define kevent_stat_init(u) ({ (void) u;})
+#define kevent_stat_im(u) ({ (void) u;})
+#define kevent_stat_wait(u) ({ (void) u;})
+#define kevent_stat_total(u) ({ (void) u;})
+#endif
+
+#endif /* __KERNEL__ */
+#endif /* __KEVENT_H */
diff --git a/include/linux/kevent_storage.h b/include/linux/kevent_storage.h
new file mode 100644
index 0000000..a38575d
--- /dev/null
+++ b/include/linux/kevent_storage.h
@@ -0,0 +1,11 @@
+#ifndef __KEVENT_STORAGE_H
+#define __KEVENT_STORAGE_H
+
+struct kevent_storage
+{
+ void *origin; /* Originator's pointer, e.g. struct sock or struct file. Can be NULL. */
+ struct list_head list; /* List of queued kevents. */
+ spinlock_t lock; /* Protects users queue. */
+};
+
+#endif /* __KEVENT_STORAGE_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 008f04c..8609910 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -597,4 +597,7 @@ asmlinkage long sys_get_robust_list(int
asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
size_t len);

+asmlinkage long sys_kevent_get_events(int ctl_fd, unsigned int min, unsigned int max,
+ unsigned int timeout, void __user *buf, unsigned flags);
+asmlinkage long sys_kevent_ctl(int ctl_fd, unsigned int cmd, unsigned int num, void __user *buf);
#endif
diff --git a/init/Kconfig b/init/Kconfig
index a099fc6..c550fcc 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -218,6 +218,8 @@ config AUDITSYSCALL
such as SELinux. To use audit's filesystem watch feature, please
ensure that INOTIFY is configured.

+source "kernel/kevent/Kconfig"
+
config IKCONFIG
bool "Kernel .config support"
---help---
diff --git a/kernel/Makefile b/kernel/Makefile
index d62ec66..2d7a6dd 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_DETECT_SOFTLOCKUP) += softl
obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
obj-$(CONFIG_SECCOMP) += seccomp.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-$(CONFIG_KEVENT) += kevent/
obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o
diff --git a/kernel/kevent/Kconfig b/kernel/kevent/Kconfig
new file mode 100644
index 0000000..31ea7b2
--- /dev/null
+++ b/kernel/kevent/Kconfig
@@ -0,0 +1,59 @@
+config KEVENT
+ bool "Kernel event notification mechanism"
+ help
+ This option enables the event queue mechanism.
+ It can be used as a replacement for poll()/select(), AIO callback
+ invocations, advanced timer notifications and other kernel
+ object status changes.
+
+config KEVENT_USER_STAT
+ bool "Kevent user statistic"
+ depends on KEVENT
+ default n
+ help
+ This option will turn kevent_user statistics collection on.
+ Statistics include the total number of kevents, the number of kevents
+ which are ready immediately at insertion time and the number of kevents
+ which were removed through readiness completion.
+ They will be printed each time a control kevent descriptor is closed.
+
+config KEVENT_SOCKET
+ bool "Kernel event notifications for sockets"
+ depends on NET && KEVENT
+ help
+ This option enables notifications through the KEVENT subsystem for
+ socket operations, like new packets arriving, readiness to accept
+ and so on.
+
+config KEVENT_INODE
+ bool "Kernel event notifications for inodes"
+ depends on KEVENT
+ help
+ This option enables notifications through the KEVENT subsystem for
+ inode operations, like file creation, removal and so on.
+
+config KEVENT_TIMER
+ bool "Kernel event notifications for timers"
+ depends on KEVENT
+ help
+ This option allows the use of timers through the KEVENT subsystem.
+
+config KEVENT_POLL
+ bool "Kernel event notifications for poll()/select()"
+ depends on KEVENT
+ help
+ This option allows the kevent subsystem to be used for poll()/select()
+ notifications.
+
+config KEVENT_NAIO
+ bool "Network asynchronous IO"
+ depends on KEVENT && KEVENT_SOCKET
+ help
+ This option enables kevent based network asynchronous IO subsystem.
+
+config KEVENT_AIO
+ bool "Asynchronous IO"
+ depends on KEVENT
+ help
+ This option allows the kevent subsystem to be used for AIO operations.
+ AIO read is currently supported.
diff --git a/kernel/kevent/Makefile b/kernel/kevent/Makefile
new file mode 100644
index 0000000..d1ef9ba
--- /dev/null
+++ b/kernel/kevent/Makefile
@@ -0,0 +1,7 @@
+obj-y := kevent.o kevent_user.o
+obj-$(CONFIG_KEVENT_SOCKET) += kevent_socket.o
+obj-$(CONFIG_KEVENT_INODE) += kevent_inode.o
+obj-$(CONFIG_KEVENT_TIMER) += kevent_timer.o
+obj-$(CONFIG_KEVENT_POLL) += kevent_poll.o
+obj-$(CONFIG_KEVENT_NAIO) += kevent_naio.o
+obj-$(CONFIG_KEVENT_AIO) += kevent_aio.o
diff --git a/kernel/kevent/kevent.c b/kernel/kevent/kevent.c
new file mode 100644
index 0000000..3814464
--- /dev/null
+++ b/kernel/kevent/kevent.c
@@ -0,0 +1,249 @@
+/*
+ * kevent.c
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/mempool.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/kevent.h>
+
+kmem_cache_t *kevent_cache;
+
+/*
+ * Attempts to add an event into appropriate origin's queue.
+ * Returns positive value if this event is ready immediately,
+ * negative value in case of error and zero if event has been queued.
+ * ->enqueue() callback must increase origin's reference counter.
+ */
+int kevent_enqueue(struct kevent *k)
+{
+ if (k->event.type >= KEVENT_MAX)
+ return -EINVAL;
+
+ if (!k->callbacks.enqueue) {
+ kevent_break(k);
+ return -EINVAL;
+ }
+
+ return k->callbacks.enqueue(k);
+}
+
+/*
+ * Remove event from the appropriate queue.
+ * ->dequeue() callback must decrease origin's reference counter.
+ */
+int kevent_dequeue(struct kevent *k)
+{
+ if (k->event.type >= KEVENT_MAX)
+ return -EINVAL;
+
+ if (!k->callbacks.dequeue) {
+ kevent_break(k);
+ return -EINVAL;
+ }
+
+ return k->callbacks.dequeue(k);
+}
+
+/*
+ * Mark kevent as broken.
+ */
+int kevent_break(struct kevent *k)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&k->ulock, flags);
+ k->event.ret_flags |= KEVENT_RET_BROKEN;
+ spin_unlock_irqrestore(&k->ulock, flags);
+ return 0;
+}
+
+struct kevent_callbacks kevent_registered_callbacks[KEVENT_MAX];
+
+/*
+ * Must be called before event is going to be added into some origin's queue.
+ * Initializes ->enqueue(), ->dequeue() and ->callback() callbacks.
+ * If this fails, the kevent should not be used, or kevent_enqueue() will fail to add
+ * this kevent into the origin's queue, setting the
+ * KEVENT_RET_BROKEN flag in kevent->event.ret_flags.
+ */
+int kevent_init(struct kevent *k)
+{
+ spin_lock_init(&k->ulock);
+ k->flags = 0;
+
+ if (k->event.type >= KEVENT_MAX)
+ return -EINVAL;
+
+ k->callbacks = kevent_registered_callbacks[k->event.type];
+ if (!k->callbacks.callback) {
+ kevent_break(k);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * Called from ->enqueue() callback when reference counter for given
+ * origin (socket, inode...) has been increased.
+ */
+int kevent_storage_enqueue(struct kevent_storage *st, struct kevent *k)
+{
+ unsigned long flags;
+
+ k->st = st;
+ spin_lock_irqsave(&st->lock, flags);
+ list_add_tail_rcu(&k->storage_entry, &st->list);
+ k->flags |= KEVENT_STORAGE;
+ spin_unlock_irqrestore(&st->lock, flags);
+ return 0;
+}
+
+/*
+ * Dequeue kevent from origin's queue.
+ * It does not decrease origin's reference counter in any way
+ * and must be called before it, so storage itself must be valid.
+ * It is called from ->dequeue() callback.
+ */
+void kevent_storage_dequeue(struct kevent_storage *st, struct kevent *k)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&st->lock, flags);
+ if (k->flags & KEVENT_STORAGE) {
+ list_del_rcu(&k->storage_entry);
+ k->flags &= ~KEVENT_STORAGE;
+ }
+ spin_unlock_irqrestore(&st->lock, flags);
+}
+
+/*
+ * Call kevent ready callback and queue it into ready queue if needed.
+ * If kevent is marked as one-shot, then remove it from storage queue.
+ */
+static void __kevent_requeue(struct kevent *k, u32 event)
+{
+ int ret, rem = 0;
+ unsigned long flags;
+
+ ret = k->callbacks.callback(k);
+
+ spin_lock_irqsave(&k->ulock, flags);
+ if (ret > 0) {
+ k->event.ret_flags |= KEVENT_RET_DONE;
+ } else if (ret < 0) {
+ k->event.ret_flags |= KEVENT_RET_BROKEN;
+ k->event.ret_flags |= KEVENT_RET_DONE;
+ }
+ rem = (k->event.req_flags & KEVENT_REQ_ONESHOT);
+ if (!ret)
+ ret = (k->event.ret_flags & (KEVENT_RET_BROKEN|KEVENT_RET_DONE));
+ spin_unlock_irqrestore(&k->ulock, flags);
+
+ if (ret) {
+ if ((rem || ret < 0) && (k->flags & KEVENT_STORAGE)) {
+ list_del_rcu(&k->storage_entry);
+ k->flags &= ~KEVENT_STORAGE;
+ }
+
+ spin_lock_irqsave(&k->user->ready_lock, flags);
+ if (!(k->flags & KEVENT_READY)) {
+ kevent_user_ring_add_event(k);
+ list_add_tail(&k->ready_entry, &k->user->ready_list);
+ k->flags |= KEVENT_READY;
+ k->user->ready_num++;
+ }
+ spin_unlock_irqrestore(&k->user->ready_lock, flags);
+ wake_up(&k->user->wait);
+ }
+}
+
+/*
+ * Check if kevent is ready (by invoking it's callback) and requeue/remove
+ * if needed.
+ */
+void kevent_requeue(struct kevent *k)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&k->st->lock, flags);
+ __kevent_requeue(k, 0);
+ spin_unlock_irqrestore(&k->st->lock, flags);
+}
+
+/*
+ * Called each time some activity in origin (socket, inode...) is noticed.
+ */
+void kevent_storage_ready(struct kevent_storage *st,
+ kevent_callback_t ready_callback, u32 event)
+{
+ struct kevent *k;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(k, &st->list, storage_entry) {
+ if (ready_callback)
+ (*ready_callback)(k);
+
+ if (event & k->event.event)
+ __kevent_requeue(k, event);
+ }
+ rcu_read_unlock();
+}
+
+int kevent_storage_init(void *origin, struct kevent_storage *st)
+{
+ spin_lock_init(&st->lock);
+ st->origin = origin;
+ INIT_LIST_HEAD(&st->list);
+ return 0;
+}
+
+/*
+ * Mark all events as broken, which will remove them from the storage,
+ * so the storage origin (inode, socket and so on) can be safely removed.
+ * No new entries are allowed to be added into the storage at this point.
+ * (Socket is removed from file table at this point for example).
+ */
+void kevent_storage_fini(struct kevent_storage *st)
+{
+ kevent_storage_ready(st, kevent_break, KEVENT_MASK_ALL);
+}
+
+int kevent_sys_init(void)
+{
+ int i;
+
+ kevent_cache = kmem_cache_create("kevent_cache",
+ sizeof(struct kevent), 0, SLAB_PANIC, NULL, NULL);
+
+ for (i=0; i<ARRAY_SIZE(kevent_registered_callbacks); ++i) {
+ struct kevent_callbacks *c = &kevent_registered_callbacks[i];
+
+ c->callback = c->enqueue = c->dequeue = NULL;
+ }
+
+ return 0;
+}
diff --git a/kernel/kevent/kevent_user.c b/kernel/kevent/kevent_user.c
new file mode 100644
index 0000000..f6d1ff6
--- /dev/null
+++ b/kernel/kevent/kevent_user.c
@@ -0,0 +1,1007 @@
+/*
+ * kevent_user.c
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[email protected]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/device.h>
+#include <linux/poll.h>
+#include <linux/kevent.h>
+#include <linux/jhash.h>
+#include <linux/miscdevice.h>
+#include <asm/io.h>
+
+static char kevent_name[] = "kevent";
+
+static int kevent_user_open(struct inode *, struct file *);
+static int kevent_user_release(struct inode *, struct file *);
+static unsigned int kevent_user_poll(struct file *, struct poll_table_struct *);
+static int kevent_user_mmap(struct file *, struct vm_area_struct *);
+
+static struct file_operations kevent_user_fops = {
+ .mmap = kevent_user_mmap,
+ .open = kevent_user_open,
+ .release = kevent_user_release,
+ .poll = kevent_user_poll,
+ .owner = THIS_MODULE,
+};
+
+static struct miscdevice kevent_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = kevent_name,
+ .fops = &kevent_user_fops,
+};
+
+static int kevent_get_sb(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data, struct vfsmount *mnt)
+{
+ /* So original magic... */
+ return get_sb_pseudo(fs_type, kevent_name, NULL, 0xbcdbcdul, mnt);
+}
+
+static struct file_system_type kevent_fs_type = {
+ .name = kevent_name,
+ .get_sb = kevent_get_sb,
+ .kill_sb = kill_anon_super,
+};
+
+static struct vfsmount *kevent_mnt;
+
+/*
+ * kevents are pollable, return POLLIN and POLLRDNORM
+ * when there is at least one ready kevent.
+ */
+static unsigned int kevent_user_poll(struct file *file, struct poll_table_struct *wait)
+{
+ struct kevent_user *u = file->private_data;
+ unsigned int mask;
+
+ poll_wait(file, &u->wait, wait);
+ mask = 0;
+
+ if (u->ready_num)
+ mask |= POLLIN | POLLRDNORM;
+
+ return mask;
+}
+
+/*
+ * Note that kevents do not exactly fill the page (each mukevent is 12 bytes),
+ * so we reuse 4 bytes at the beginning of the first page to store the index.
+ * Take that into account if you want to change the size of struct mukevent.
+ */
+#define KEVENTS_ON_PAGE ((PAGE_SIZE-sizeof(unsigned int))/sizeof(struct mukevent))
+struct kevent_mring
+{
+ unsigned int index;
+ struct mukevent event[KEVENTS_ON_PAGE];
+};
+
+static inline void kevent_user_ring_set(struct kevent_user *u, unsigned int num)
+{
+ struct kevent_mring *ring;
+
+ ring = (struct kevent_mring *)u->pring[0];
+ ring->index = num;
+}
+
+static inline void kevent_user_ring_inc(struct kevent_user *u)
+{
+ struct kevent_mring *ring;
+
+ ring = (struct kevent_mring *)u->pring[0];
+ ring->index++;
+}
+
+static int kevent_user_ring_grow(struct kevent_user *u)
+{
+ struct kevent_mring *ring;
+ unsigned int idx;
+
+ ring = (struct kevent_mring *)u->pring[0];
+
+ idx = (ring->index + 1) / KEVENTS_ON_PAGE;
+ if (idx >= u->pages_in_use) {
+ u->pring[idx] = __get_free_page(GFP_KERNEL);
+ if (!u->pring[idx])
+ return -ENOMEM;
+ u->pages_in_use++;
+ }
+ return 0;
+}
+
+/*
+ * Called under kevent_user->ready_lock, so updates are always protected.
+ */
+void kevent_user_ring_add_event(struct kevent *k)
+{
+ unsigned int pidx, off;
+ struct kevent_mring *ring, *copy_ring;
+
+ ring = (struct kevent_mring *)k->user->pring[0];
+
+ pidx = ring->index/KEVENTS_ON_PAGE;
+ off = ring->index%KEVENTS_ON_PAGE;
+
+ copy_ring = (struct kevent_mring *)k->user->pring[pidx];
+
+ copy_ring->event[off].id.raw[0] = k->event.id.raw[0];
+ copy_ring->event[off].id.raw[1] = k->event.id.raw[1];
+ copy_ring->event[off].ret_flags = k->event.ret_flags;
+
+ if (++ring->index >= KEVENT_MAX_EVENTS)
+ ring->index = 0;
+}
+
+/*
+ * Initialize mmap ring buffer.
+ * It will store ready kevents, so userspace can get them directly instead
+ * of using a syscall. Essentially the syscall becomes just a waiting point.
+ */
+static int kevent_user_ring_init(struct kevent_user *u)
+{
+ int pnum;
+
+ pnum = ALIGN(KEVENT_MAX_EVENTS*sizeof(struct mukevent) + sizeof(unsigned int), PAGE_SIZE)/PAGE_SIZE;
+
+ u->pring = kmalloc(pnum * sizeof(unsigned long), GFP_KERNEL);
+ if (!u->pring)
+ return -ENOMEM;
+
+ u->pring[0] = __get_free_page(GFP_KERNEL);
+ if (!u->pring[0])
+ goto err_out_free;
+
+ u->pages_in_use = 1;
+ kevent_user_ring_set(u, 0);
+
+ return 0;
+
+err_out_free:
+ kfree(u->pring);
+
+ return -ENOMEM;
+}
+
+static void kevent_user_ring_fini(struct kevent_user *u)
+{
+ int i;
+
+ for (i=0; i<u->pages_in_use; ++i)
+ free_page(u->pring[i]);
+
+ kfree(u->pring);
+}
+
+
+/*
+ * Allocate new kevent userspace control entry.
+ */
+static struct kevent_user *kevent_user_alloc(void)
+{
+ struct kevent_user *u;
+ int i;
+
+ u = kzalloc(sizeof(struct kevent_user), GFP_KERNEL);
+ if (!u)
+ return NULL;
+
+ INIT_LIST_HEAD(&u->ready_list);
+ spin_lock_init(&u->ready_lock);
+ kevent_stat_init(u);
+ spin_lock_init(&u->kevent_lock);
+ for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i)
+ INIT_LIST_HEAD(&u->kevent_list[i]);
+
+ mutex_init(&u->ctl_mutex);
+ init_waitqueue_head(&u->wait);
+
+ atomic_set(&u->refcnt, 1);
+
+ if (kevent_user_ring_init(u)) {
+ kfree(u);
+ u = NULL;
+ }
+
+ return u;
+}
+
+static int kevent_user_open(struct inode *inode, struct file *file)
+{
+ struct kevent_user *u = kevent_user_alloc();
+
+ if (!u)
+ return -ENOMEM;
+
+ file->private_data = u;
+
+ return 0;
+}
+
+
+/*
+ * Kevent userspace control block reference counting.
+ * Set to 1 at creation time; when the corresponding kevent file descriptor
+ * is closed, the reference counter is decreased.
+ * When the counter hits zero the block is freed.
+ */
+static inline void kevent_user_get(struct kevent_user *u)
+{
+ atomic_inc(&u->refcnt);
+}
+
+static inline void kevent_user_put(struct kevent_user *u)
+{
+ if (atomic_dec_and_test(&u->refcnt)) {
+ kevent_stat_print(u);
+ kevent_user_ring_fini(u);
+ kfree(u);
+ }
+}
+
+static struct page *kevent_user_nopage(struct vm_area_struct *vma, unsigned long addr, int *type)
+{
+ struct kevent_user *u = vma->vm_file->private_data;
+ unsigned long off = (addr - vma->vm_start)/PAGE_SIZE;
+ unsigned int pnum = ALIGN(KEVENT_MAX_EVENTS*sizeof(struct mukevent) + sizeof(unsigned int), PAGE_SIZE)/PAGE_SIZE;
+
+ if (type)
+ *type = VM_FAULT_MINOR;
+
+ if (off >= pnum)
+ goto err_out_sigbus;
+
+ u->pring[off] = __get_free_page(GFP_KERNEL);
+ if (!u->pring[off])
+ goto err_out_sigbus;
+
+ return virt_to_page(u->pring[off]);
+
+err_out_sigbus:
+ return NOPAGE_SIGBUS;
+}
+
+static struct vm_operations_struct kevent_user_vm_ops = {
+ .nopage = &kevent_user_nopage,
+};
+
+/*
+ * Mmap implementation for ring buffer, which is created as array
+ * of pages, so vm_pgoff is an offset (in pages, not in bytes) of
+ * the first page to be mapped.
+ */
+static int kevent_user_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ unsigned long start = vma->vm_start;
+ struct kevent_user *u = file->private_data;
+
+ if (vma->vm_flags & VM_WRITE)
+ return -EPERM;
+
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ vma->vm_ops = &kevent_user_vm_ops;
+ vma->vm_flags |= VM_RESERVED;
+ vma->vm_file = file;
+
+ if (remap_pfn_range(vma, start, virt_to_phys((void *)u->pring[0]), PAGE_SIZE,
+ vma->vm_page_prot))
+ return -EFAULT;
+
+ return 0;
+}
+
+#if 0
+static inline unsigned int kevent_user_hash(struct ukevent *uk)
+{
+ unsigned int h = (uk->user[0] ^ uk->user[1]) ^ (uk->id.raw[0] ^ uk->id.raw[1]);
+
+ h = (((h >> 16) & 0xffff) ^ (h & 0xffff)) & 0xffff;
+ h = (((h >> 8) & 0xff) ^ (h & 0xff)) & KEVENT_HASH_MASK;
+
+ return h;
+}
+#else
+static inline unsigned int kevent_user_hash(struct ukevent *uk)
+{
+ return jhash_1word(uk->id.raw[0], 0) & KEVENT_HASH_MASK;
+}
+#endif
+
+/*
+ * RCU protects storage list (kevent->storage_entry).
+ * Free entry in RCU callback, it is dequeued from all lists at
+ * this point.
+ */
+
+static void kevent_free_rcu(struct rcu_head *rcu)
+{
+ struct kevent *kevent = container_of(rcu, struct kevent, rcu_head);
+ kmem_cache_free(kevent_cache, kevent);
+}
+
+/*
+ * Complete kevent removing - it dequeues kevent from storage list
+ * if it is requested, removes kevent from ready list, drops userspace
+ * control block reference counter and schedules kevent freeing through RCU.
+ */
+static void kevent_finish_user_complete(struct kevent *k, int deq)
+{
+ struct kevent_user *u = k->user;
+ unsigned long flags;
+
+ if (deq)
+ kevent_dequeue(k);
+
+ spin_lock_irqsave(&u->ready_lock, flags);
+ if (k->flags & KEVENT_READY) {
+ list_del(&k->ready_entry);
+ k->flags &= ~KEVENT_READY;
+ u->ready_num--;
+ }
+ spin_unlock_irqrestore(&u->ready_lock, flags);
+
+ kevent_user_put(u);
+ call_rcu(&k->rcu_head, kevent_free_rcu);
+}
+
+/*
+ * Remove from all lists and free kevent.
+ * Must be called under kevent_user->kevent_lock to protect
+ * kevent->kevent_entry removing.
+ */
+static void __kevent_finish_user(struct kevent *k, int deq)
+{
+ struct kevent_user *u = k->user;
+
+ list_del(&k->kevent_entry);
+ k->flags &= ~KEVENT_USER;
+ u->kevent_num--;
+ kevent_finish_user_complete(k, deq);
+}
+
+/*
+ * Remove kevent from user's list of all events,
+ * dequeue it from storage and decrease user's reference counter,
+ * since this kevent does not exist anymore. That is why it is freed here.
+ */
+static void kevent_finish_user(struct kevent *k, int deq)
+{
+ struct kevent_user *u = k->user;
+ unsigned long flags;
+
+ spin_lock_irqsave(&u->kevent_lock, flags);
+ list_del(&k->kevent_entry);
+ k->flags &= ~KEVENT_USER;
+ u->kevent_num--;
+ spin_unlock_irqrestore(&u->kevent_lock, flags);
+ kevent_finish_user_complete(k, deq);
+}
+
+/*
+ * Dequeue one entry from user's ready queue.
+ */
+static struct kevent *kqueue_dequeue_ready(struct kevent_user *u)
+{
+ unsigned long flags;
+ struct kevent *k = NULL;
+
+ spin_lock_irqsave(&u->ready_lock, flags);
+ if (u->ready_num && !list_empty(&u->ready_list)) {
+ k = list_entry(u->ready_list.next, struct kevent, ready_entry);
+ list_del(&k->ready_entry);
+ k->flags &= ~KEVENT_READY;
+ u->ready_num--;
+ }
+ spin_unlock_irqrestore(&u->ready_lock, flags);
+
+ return k;
+}
+
+/*
+ * Search a kevent inside hash bucket for given ukevent.
+ */
+static struct kevent *__kevent_search(struct list_head *head, struct ukevent *uk,
+ struct kevent_user *u)
+{
+ struct kevent *k, *ret = NULL;
+
+ list_for_each_entry(k, head, kevent_entry) {
+ spin_lock(&k->ulock);
+ if (k->event.user[0] == uk->user[0] && k->event.user[1] == uk->user[1] &&
+ k->event.id.raw[0] == uk->id.raw[0] &&
+ k->event.id.raw[1] == uk->id.raw[1]) {
+ ret = k;
+ spin_unlock(&k->ulock);
+ break;
+ }
+ spin_unlock(&k->ulock);
+ }
+
+ return ret;
+}
+
+/*
+ * Search and modify kevent according to provided ukevent.
+ */
+static int kevent_modify(struct ukevent *uk, struct kevent_user *u)
+{
+ struct kevent *k;
+ unsigned int hash = kevent_user_hash(uk);
+ int err = -ENODEV;
+ unsigned long flags;
+
+ spin_lock_irqsave(&u->kevent_lock, flags);
+ k = __kevent_search(&u->kevent_list[hash], uk, u);
+ if (k) {
+ spin_lock(&k->ulock);
+ k->event.event = uk->event;
+ k->event.req_flags = uk->req_flags;
+ k->event.ret_flags = 0;
+ spin_unlock(&k->ulock);
+ kevent_requeue(k);
+ err = 0;
+ }
+ spin_unlock_irqrestore(&u->kevent_lock, flags);
+
+ return err;
+}
+
+/*
+ * Remove kevent which matches provided ukevent.
+ */
+static int kevent_remove(struct ukevent *uk, struct kevent_user *u)
+{
+ int err = -ENODEV;
+ struct kevent *k;
+ unsigned int hash = kevent_user_hash(uk);
+ unsigned long flags;
+
+ spin_lock_irqsave(&u->kevent_lock, flags);
+ k = __kevent_search(&u->kevent_list[hash], uk, u);
+ if (k) {
+ __kevent_finish_user(k, 1);
+ err = 0;
+ }
+ spin_unlock_irqrestore(&u->kevent_lock, flags);
+
+ return err;
+}
+
+/*
+ * Detaches the userspace control block from the file descriptor
+ * and decreases its reference counter.
+ * No new kevents can be added or removed from any list at this point.
+ */
+static int kevent_user_release(struct inode *inode, struct file *file)
+{
+ struct kevent_user *u = file->private_data;
+ struct kevent *k, *n;
+ int i;
+
+ for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i) {
+ list_for_each_entry_safe(k, n, &u->kevent_list[i], kevent_entry)
+ kevent_finish_user(k, 1);
+ }
+
+ kevent_user_put(u);
+ file->private_data = NULL;
+
+ return 0;
+}
+
+/*
+ * Read requested number of ukevents in one shot.
+ */
+static struct ukevent *kevent_get_user(unsigned int num, void __user *arg)
+{
+ struct ukevent *ukev;
+
+ ukev = kmalloc(sizeof(struct ukevent) * num, GFP_KERNEL);
+ if (!ukev)
+ return NULL;
+
+ if (copy_from_user(ukev, arg, sizeof(struct ukevent) * num)) {
+ kfree(ukev);
+ return NULL;
+ }
+
+ return ukev;
+}
+
+/*
+ * Read from userspace all ukevents and modify appropriate kevents.
+ * If the provided number of ukevents is more than the threshold, it is faster
+ * to allocate room for them and copy them in one shot instead of copying
+ * them one-by-one and then processing them.
+ */
+static int kevent_user_ctl_modify(struct kevent_user *u, unsigned int num, void __user *arg)
+{
+ int err = 0, i;
+ struct ukevent uk;
+
+ mutex_lock(&u->ctl_mutex);
+
+ if (num > u->kevent_num) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (num > KEVENT_MIN_BUFFS_ALLOC) {
+ struct ukevent *ukev;
+
+ ukev = kevent_get_user(num, arg);
+ if (ukev) {
+ for (i=0; i<num; ++i) {
+ if (kevent_modify(&ukev[i], u))
+ ukev[i].ret_flags |= KEVENT_RET_BROKEN;
+ ukev[i].ret_flags |= KEVENT_RET_DONE;
+ }
+ if (copy_to_user(arg, ukev, num*sizeof(struct ukevent)))
+ err = -EFAULT;
+ kfree(ukev);
+ goto out;
+ }
+ }
+
+ for (i=0; i<num; ++i) {
+ if (copy_from_user(&uk, arg, sizeof(struct ukevent))) {
+ err = -EFAULT;
+ break;
+ }
+
+ if (kevent_modify(&uk, u))
+ uk.ret_flags |= KEVENT_RET_BROKEN;
+ uk.ret_flags |= KEVENT_RET_DONE;
+
+ if (copy_to_user(arg, &uk, sizeof(struct ukevent))) {
+ err = -EFAULT;
+ break;
+ }
+
+ arg += sizeof(struct ukevent);
+ }
+out:
+ mutex_unlock(&u->ctl_mutex);
+
+ return err;
+}
+
+/*
+ * Read from userspace all ukevents and remove appropriate kevents.
+ * If the provided number of ukevents is more than the threshold, it is faster
+ * to allocate room for them and copy them in one shot instead of copying
+ * them one-by-one and then processing them.
+ */
+static int kevent_user_ctl_remove(struct kevent_user *u, unsigned int num, void __user *arg)
+{
+ int err = 0, i;
+ struct ukevent uk;
+
+ mutex_lock(&u->ctl_mutex);
+
+ if (num > u->kevent_num) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (num > KEVENT_MIN_BUFFS_ALLOC) {
+ struct ukevent *ukev;
+
+ ukev = kevent_get_user(num, arg);
+ if (ukev) {
+ for (i=0; i<num; ++i) {
+ if (kevent_remove(&ukev[i], u))
+ ukev[i].ret_flags |= KEVENT_RET_BROKEN;
+ ukev[i].ret_flags |= KEVENT_RET_DONE;
+ }
+ if (copy_to_user(arg, ukev, num*sizeof(struct ukevent)))
+ err = -EFAULT;
+ kfree(ukev);
+ goto out;
+ }
+ }
+
+ for (i=0; i<num; ++i) {
+ if (copy_from_user(&uk, arg, sizeof(struct ukevent))) {
+ err = -EFAULT;
+ break;
+ }
+
+ if (kevent_remove(&uk, u))
+ uk.ret_flags |= KEVENT_RET_BROKEN;
+
+ uk.ret_flags |= KEVENT_RET_DONE;
+
+ if (copy_to_user(arg, &uk, sizeof(struct ukevent))) {
+ err = -EFAULT;
+ break;
+ }
+
+ arg += sizeof(struct ukevent);
+ }
+out:
+ mutex_unlock(&u->ctl_mutex);
+
+ return err;
+}
+
+/*
+ * Queue kevent into the userspace control block and increase
+ * its reference counter.
+ */
+static void kevent_user_enqueue(struct kevent_user *u, struct kevent *k)
+{
+ unsigned long flags;
+ unsigned int hash = kevent_user_hash(&k->event);
+
+ spin_lock_irqsave(&u->kevent_lock, flags);
+ list_add_tail(&k->kevent_entry, &u->kevent_list[hash]);
+ k->flags |= KEVENT_USER;
+ u->kevent_num++;
+ kevent_user_get(u);
+ spin_unlock_irqrestore(&u->kevent_lock, flags);
+}
+
+/*
+ * Add kevent from both kernel and userspace users.
+ * This function allocates and queues kevent, returns negative value
+ * on error, positive if kevent is ready immediately and zero
+ * if kevent has been queued.
+ */
+int kevent_user_add_ukevent(struct ukevent *uk, struct kevent_user *u)
+{
+ struct kevent *k;
+ int err;
+
+ if (kevent_user_ring_grow(u)) {
+ err = -ENOMEM;
+ goto err_out_exit;
+ }
+
+ k = kmem_cache_alloc(kevent_cache, GFP_KERNEL);
+ if (!k) {
+ err = -ENOMEM;
+ goto err_out_exit;
+ }
+
+ memcpy(&k->event, uk, sizeof(struct ukevent));
+ INIT_RCU_HEAD(&k->rcu_head);
+
+ k->event.ret_flags = 0;
+
+ err = kevent_init(k);
+ if (err) {
+ kmem_cache_free(kevent_cache, k);
+ goto err_out_exit;
+ }
+ k->user = u;
+ kevent_stat_total(u);
+ kevent_user_enqueue(u, k);
+
+ err = kevent_enqueue(k);
+ if (err) {
+ memcpy(uk, &k->event, sizeof(struct ukevent));
+ kevent_finish_user(k, 0);
+ } else {
+ kevent_user_ring_inc(u);
+ }
+
+err_out_exit:
+ if (err < 0) {
+ uk->ret_flags |= KEVENT_RET_BROKEN | KEVENT_RET_DONE;
+ uk->ret_data[1] = err;
+ }
+ return err;
+}
+
+/*
+ * Copy all ukevents from userspace, allocate kevent for each one
+ * and add them into appropriate kevent_storages,
+ * e.g. sockets, inodes and so on...
+ * Ready events will replace the ones provided by the user and the number
+ * of ready events is returned.
+ * User must check ret_flags field of each ukevent structure
+ * to determine if it is fired or failed event.
+ */
+static int kevent_user_ctl_add(struct kevent_user *u, unsigned int num, void __user *arg)
+{
+ int err, cerr = 0, knum = 0, rnum = 0, i;
+ void __user *orig = arg;
+ struct ukevent uk;
+
+ mutex_lock(&u->ctl_mutex);
+
+ err = -EINVAL;
+ if (u->kevent_num + num >= KEVENT_MAX_EVENTS)
+ goto out_remove;
+
+ if (num > KEVENT_MIN_BUFFS_ALLOC) {
+ struct ukevent *ukev;
+
+ ukev = kevent_get_user(num, arg);
+ if (ukev) {
+ for (i=0; i<num; ++i) {
+ err = kevent_user_add_ukevent(&ukev[i], u);
+ if (err) {
+ kevent_stat_im(u);
+ if (i != rnum)
+ memcpy(&ukev[rnum], &ukev[i], sizeof(struct ukevent));
+ rnum++;
+ } else
+ knum++;
+ }
+ if (copy_to_user(orig, ukev, rnum*sizeof(struct ukevent)))
+ cerr = -EFAULT;
+ kfree(ukev);
+ goto out_setup;
+ }
+ }
+
+ for (i=0; i<num; ++i) {
+ if (copy_from_user(&uk, arg, sizeof(struct ukevent))) {
+ cerr = -EFAULT;
+ break;
+ }
+ arg += sizeof(struct ukevent);
+
+ err = kevent_user_add_ukevent(&uk, u);
+ if (err) {
+ kevent_stat_im(u);
+ if (copy_to_user(orig, &uk, sizeof(struct ukevent))) {
+ cerr = -EFAULT;
+ break;
+ }
+ orig += sizeof(struct ukevent);
+ rnum++;
+ } else
+ knum++;
+ }
+
+out_setup:
+ if (cerr < 0) {
+ err = cerr;
+ goto out_remove;
+ }
+
+ err = rnum;
+out_remove:
+ mutex_unlock(&u->ctl_mutex);
+
+ return err;
+}
+
+/*
+ * In nonblocking mode it returns as many events as possible, but not more than @max_nr.
+ * In blocking mode it waits until the timeout expires or at least @min_nr events are ready.
+ */
+static int kevent_user_wait(struct file *file, struct kevent_user *u,
+ unsigned int min_nr, unsigned int max_nr, unsigned int timeout,
+ void __user *buf)
+{
+ struct kevent *k;
+ int num = 0;
+
+ if (!(file->f_flags & O_NONBLOCK)) {
+ wait_event_interruptible_timeout(u->wait,
+ u->ready_num >= min_nr, msecs_to_jiffies(timeout));
+ }
+
+ while (num < max_nr && ((k = kqueue_dequeue_ready(u)) != NULL)) {
+ if (copy_to_user(buf + num*sizeof(struct ukevent),
+ &k->event, sizeof(struct ukevent)))
+ break;
+
+ /*
+ * If it is one-shot kevent, it has been removed already from
+ * origin's queue, so we can easily free it here.
+ */
+ if (k->event.req_flags & KEVENT_REQ_ONESHOT)
+ kevent_finish_user(k, 1);
+ ++num;
+ kevent_stat_wait(u);
+ }
+
+ return num;
+}
+
+/*
+ * Userspace control block creation and initialization.
+ */
+static int kevent_ctl_init(void)
+{
+ struct kevent_user *u;
+ struct file *file;
+ int fd, ret;
+
+ fd = get_unused_fd();
+ if (fd < 0)
+ return fd;
+
+ file = get_empty_filp();
+ if (!file) {
+ ret = -ENFILE;
+ goto out_put_fd;
+ }
+
+ u = kevent_user_alloc();
+ if (unlikely(!u)) {
+ ret = -ENOMEM;
+ goto out_put_file;
+ }
+
+ file->f_op = &kevent_user_fops;
+ file->f_vfsmnt = mntget(kevent_mnt);
+ file->f_dentry = dget(kevent_mnt->mnt_root);
+ file->f_mapping = file->f_dentry->d_inode->i_mapping;
+ file->f_mode = FMODE_READ;
+ file->f_flags = O_RDONLY;
+ file->private_data = u;
+
+ fd_install(fd, file);
+
+ return fd;
+
+out_put_file:
+ put_filp(file);
+out_put_fd:
+ put_unused_fd(fd);
+ return ret;
+}
+
+static int kevent_ctl_process(struct file *file, unsigned int cmd, unsigned int num, void __user *arg)
+{
+ int err;
+ struct kevent_user *u = file->private_data;
+
+ if (!u || num > KEVENT_MAX_EVENTS)
+ return -EINVAL;
+
+ switch (cmd) {
+ case KEVENT_CTL_ADD:
+ err = kevent_user_ctl_add(u, num, arg);
+ break;
+ case KEVENT_CTL_REMOVE:
+ err = kevent_user_ctl_remove(u, num, arg);
+ break;
+ case KEVENT_CTL_MODIFY:
+ err = kevent_user_ctl_modify(u, num, arg);
+ break;
+ default:
+ err = -EINVAL;
+ break;
+ }
+
+ return err;
+}
+
+/*
+ * Used to get ready kevents from queue.
+ * @ctl_fd - kevent control descriptor which must be obtained through kevent_ctl(KEVENT_CTL_INIT).
+ * @min_nr - minimum number of ready kevents.
+ * @max_nr - maximum number of ready kevents.
+ * @timeout - timeout in milliseconds to wait until some events are ready.
+ * @buf - buffer to place ready events.
+ * @flags - unused for now (will be used for the mmap implementation).
+ */
+asmlinkage long sys_kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr,
+ unsigned int timeout, void __user *buf, unsigned flags)
+{
+ int err = -EINVAL;
+ struct file *file;
+ struct kevent_user *u;
+
+ file = fget(ctl_fd);
+ if (!file)
+ return -ENODEV;
+
+ if (file->f_op != &kevent_user_fops)
+ goto out_fput;
+ u = file->private_data;
+
+ err = kevent_user_wait(file, u, min_nr, max_nr, timeout, buf);
+out_fput:
+ fput(file);
+ return err;
+}
+
+/*
+ * This syscall is used to perform various control operations
+ * on a given kevent queue, which is obtained through the kevent file descriptor @fd.
+ * @cmd - type of operation.
+ * @num - number of kevents to be processed.
+ * @arg - pointer to array of struct ukevent.
+ */
+asmlinkage long sys_kevent_ctl(int fd, unsigned int cmd, unsigned int num, void __user *arg)
+{
+ int err = -EINVAL;
+ struct file *file;
+
+ if (cmd == KEVENT_CTL_INIT)
+ return kevent_ctl_init();
+
+ file = fget(fd);
+ if (!file)
+ return -ENODEV;
+
+ if (file->f_op != &kevent_user_fops)
+ goto out_fput;
+
+ err = kevent_ctl_process(file, cmd, num, arg);
+
+out_fput:
+ fput(file);
+ return err;
+}
+
+/*
+ * Kevent subsystem initialization - create kevent cache and register
+ * filesystem to get control file descriptors from.
+ */
+static int __init kevent_user_init(void)
+{
+ int err = 0;
+
+ err = kevent_sys_init();
+ if (err)
+ panic("%s: failed to initialize kevent: err=%d.\n", kevent_name, err);
+
+ err = register_filesystem(&kevent_fs_type);
+ if (err)
+ panic("%s: failed to register filesystem: err=%d.\n",
+ kevent_name, err);
+
+ kevent_mnt = kern_mount(&kevent_fs_type);
+ if (IS_ERR(kevent_mnt))
+ panic("%s: failed to mount filesystem: err=%ld.\n",
+ kevent_name, PTR_ERR(kevent_mnt));
+
+ err = misc_register(&kevent_miscdev);
+ if (err) {
+ printk(KERN_ERR "Failed to register kevent miscdev: err=%d.\n", err);
+ goto err_out_exit;
+ }
+
+ printk(KERN_INFO "KEVENT subsystem has been successfully registered.\n");
+
+ return 0;
+
+err_out_exit:
+ mntput(kevent_mnt);
+ unregister_filesystem(&kevent_fs_type);
+
+ return err;
+}
+
+static void __exit kevent_user_fini(void)
+{
+ misc_deregister(&kevent_miscdev);
+ mntput(kevent_mnt);
+ unregister_filesystem(&kevent_fs_type);
+}
+
+module_init(kevent_user_init);
+module_exit(kevent_user_fini);
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 6991bec..8d3769b 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -122,6 +122,9 @@ cond_syscall(ppc_rtas);
cond_syscall(sys_spu_run);
cond_syscall(sys_spu_create);

+cond_syscall(sys_kevent_get_events);
+cond_syscall(sys_kevent_ctl);
+
/* mmu depending weak syscall entries */
cond_syscall(sys_mprotect);
cond_syscall(sys_msync);
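
For illustration, a minimal userspace sketch of driving the two syscalls above.
The __NR_* numbers below are placeholders (the real values depend on the
architecture syscall tables in the full patchset), and the struct ukevent
layout, in particular the .type field, is assumed from the rest of the
patchset, so treat this as a sketch rather than a reference:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/kevent.h>	/* assumed to export struct ukevent and the KEVENT_* constants */

/* Placeholder syscall numbers - not the real ones. */
#define __NR_kevent_get_events	318
#define __NR_kevent_ctl		319

static long kevent_ctl(int fd, unsigned int cmd, unsigned int num, void *arg)
{
	return syscall(__NR_kevent_ctl, fd, cmd, num, arg);
}

static long kevent_get_events(int fd, unsigned int min_nr, unsigned int max_nr,
		unsigned int timeout, void *buf, unsigned flags)
{
	return syscall(__NR_kevent_get_events, fd, min_nr, max_nr, timeout, buf, flags);
}

int main(void)
{
	struct ukevent uk[16];
	long num, i;
	int fd;

	/* Get a kevent control descriptor; fd/num/arg are ignored for INIT. */
	fd = kevent_ctl(-1, KEVENT_CTL_INIT, 0, NULL);
	if (fd < 0)
		return 1;

	/* Ask for a timer notification; id.raw[0] is the period in msecs. */
	memset(&uk[0], 0, sizeof(uk[0]));
	uk[0].type = KEVENT_TIMER;	/* field name assumed from the core patch */
	uk[0].id.raw[0] = 1000;
	if (kevent_ctl(fd, KEVENT_CTL_ADD, 1, uk) < 0)
		return 1;

	/* Wait up to 2000 msecs for at least one ready event. */
	num = kevent_get_events(fd, 1, 16, 2000, uk, 0);
	for (i = 0; i < num; ++i)
		printf("ready: type %u, ret_data[0] %u\n", uk[i].type, uk[i].ret_data[0]);

	close(fd);
	return 0;
}

As in the kernel code above, negative return values are errors and positive
values are counts of events written back to the user buffer.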

2006-08-16 13:26:54

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [take9 0/2] kevent: Generic event handling mechanism.

On Mon, Aug 14, 2006 at 10:21:36AM +0400, Evgeniy Polyakov wrote:
>
> Generic event handling mechanism.

Hi, I've just started looking into this, so some comments here first
on the submission process:

- could you send new revisions of the patches in a new thread so one can
easily find them?
- the patch split is not very nice: your first patch adds Makefile and
  Kconfig entries for files that are only in the second patch, or not actually
  submitted at all. That's a big no-no.

2006-08-16 13:30:18

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

On Mon, Aug 14, 2006 at 10:21:36AM +0400, Evgeniy Polyakov wrote:
>
> poll/select() notifications. Timer notifications.
>
> This patch includes generic poll/select and timer notifications.
>
> kevent_poll works simialr to epoll and has the same issues (callback
> is invoked not from internal state machine of the caller, but through
> process awake).

I'm not a big fan of duplicating code over and over. kevent is a candidate
for a generic event delivery mechanism, which is a _very_ good thing. But
starting that system by duplicating existing functionality is not very nice.

What speaks against a patch that replaces the epoll core with something that
builds on kevent while still supporting the epoll interface as a compatibility
shim?

> Timer notifications can be used for fine grained per-process time
> management, since interval timers are very inconvenient to use,
> and they are limited.

I have similar reservations about this one. Having timers as part of a
generic events system is very nice, but having so much duplicated functionality
is not. Cc'ed Thomas on behalf of the Timer cabal in case there's a point in
integrating this into a larger framework of timer code.


Also it would be nice if you could submit each of the notifications as a patch
on its own.

> diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c
> new file mode 100644
> index 0000000..8a4f863
> --- /dev/null
> +++ b/kernel/kevent/kevent_poll.c
> @@ -0,0 +1,220 @@
> +/*
> + * kevent_poll.c
> + *
> + * 2006 Copyright (c) Evgeniy Polyakov <[email protected]>
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/timer.h>
> +#include <linux/file.h>
> +#include <linux/kevent.h>
> +#include <linux/poll.h>
> +#include <linux/fs.h>
> +
> +static kmem_cache_t *kevent_poll_container_cache;
> +static kmem_cache_t *kevent_poll_priv_cache;
> +
> +struct kevent_poll_ctl
> +{
> + struct poll_table_struct pt;
> + struct kevent *k;
> +};
> +
> +struct kevent_poll_wait_container
> +{
> + struct list_head container_entry;
> + wait_queue_head_t *whead;
> + wait_queue_t wait;
> + struct kevent *k;
> +};
> +
> +struct kevent_poll_private
> +{
> + struct list_head container_list;
> + spinlock_t container_lock;
> +};
> +
> +static int kevent_poll_enqueue(struct kevent *k);
> +static int kevent_poll_dequeue(struct kevent *k);
> +static int kevent_poll_callback(struct kevent *k);
> +
> +static int kevent_poll_wait_callback(wait_queue_t *wait,
> + unsigned mode, int sync, void *key)
> +{
> + struct kevent_poll_wait_container *cont =
> + container_of(wait, struct kevent_poll_wait_container, wait);
> + struct kevent *k = cont->k;
> + struct file *file = k->st->origin;
> + u32 revents;
> +
> + revents = file->f_op->poll(file, NULL);
> +
> + kevent_storage_ready(k->st, NULL, revents);
> +
> + return 0;
> +}
> +
> +static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead,
> + struct poll_table_struct *poll_table)
> +{
> + struct kevent *k =
> + container_of(poll_table, struct kevent_poll_ctl, pt)->k;
> + struct kevent_poll_private *priv = k->priv;
> + struct kevent_poll_wait_container *cont;
> + unsigned long flags;
> +
> + cont = kmem_cache_alloc(kevent_poll_container_cache, SLAB_KERNEL);
> + if (!cont) {
> + kevent_break(k);
> + return;
> + }
> +
> + cont->k = k;
> + init_waitqueue_func_entry(&cont->wait, kevent_poll_wait_callback);
> + cont->whead = whead;
> +
> + spin_lock_irqsave(&priv->container_lock, flags);
> + list_add_tail(&cont->container_entry, &priv->container_list);
> + spin_unlock_irqrestore(&priv->container_lock, flags);
> +
> + add_wait_queue(whead, &cont->wait);
> +}
> +
> +static int kevent_poll_enqueue(struct kevent *k)
> +{
> + struct file *file;
> + int err, ready = 0;
> + unsigned int revents;
> + struct kevent_poll_ctl ctl;
> + struct kevent_poll_private *priv;
> +
> + file = fget(k->event.id.raw[0]);
> + if (!file)
> + return -ENODEV;
> +
> + err = -EINVAL;
> + if (!file->f_op || !file->f_op->poll)
> + goto err_out_fput;
> +
> + err = -ENOMEM;
> + priv = kmem_cache_alloc(kevent_poll_priv_cache, SLAB_KERNEL);
> + if (!priv)
> + goto err_out_fput;
> +
> + spin_lock_init(&priv->container_lock);
> + INIT_LIST_HEAD(&priv->container_list);
> +
> + k->priv = priv;
> +
> + ctl.k = k;
> + init_poll_funcptr(&ctl.pt, &kevent_poll_qproc);
> +
> + err = kevent_storage_enqueue(&file->st, k);
> + if (err)
> + goto err_out_free;
> +
> + revents = file->f_op->poll(file, &ctl.pt);
> + if (revents & k->event.event) {
> + ready = 1;
> + kevent_poll_dequeue(k);
> + }
> +
> + return ready;
> +
> +err_out_free:
> + kmem_cache_free(kevent_poll_priv_cache, priv);
> +err_out_fput:
> + fput(file);
> + return err;
> +}
> +
> +static int kevent_poll_dequeue(struct kevent *k)
> +{
> + struct file *file = k->st->origin;
> + struct kevent_poll_private *priv = k->priv;
> + struct kevent_poll_wait_container *w, *n;
> + unsigned long flags;
> +
> + kevent_storage_dequeue(k->st, k);
> +
> + spin_lock_irqsave(&priv->container_lock, flags);
> + list_for_each_entry_safe(w, n, &priv->container_list, container_entry) {
> + list_del(&w->container_entry);
> + remove_wait_queue(w->whead, &w->wait);
> + kmem_cache_free(kevent_poll_container_cache, w);
> + }
> + spin_unlock_irqrestore(&priv->container_lock, flags);
> +
> + kmem_cache_free(kevent_poll_priv_cache, priv);
> + k->priv = NULL;
> +
> + fput(file);
> +
> + return 0;
> +}
> +
> +static int kevent_poll_callback(struct kevent *k)
> +{
> + struct file *file = k->st->origin;
> + unsigned int revents = file->f_op->poll(file, NULL);
> + return (revents & k->event.event);
> +}
> +
> +static int __init kevent_poll_sys_init(void)
> +{
> + struct kevent_callbacks *pc = &kevent_registered_callbacks[KEVENT_POLL];
> +
> + kevent_poll_container_cache = kmem_cache_create("kevent_poll_container_cache",
> + sizeof(struct kevent_poll_wait_container), 0, 0, NULL, NULL);
> + if (!kevent_poll_container_cache) {
> + printk(KERN_ERR "Failed to create kevent poll container cache.\n");
> + return -ENOMEM;
> + }
> +
> + kevent_poll_priv_cache = kmem_cache_create("kevent_poll_priv_cache",
> + sizeof(struct kevent_poll_private), 0, 0, NULL, NULL);
> + if (!kevent_poll_priv_cache) {
> + printk(KERN_ERR "Failed to create kevent poll private data cache.\n");
> + kmem_cache_destroy(kevent_poll_container_cache);
> + kevent_poll_container_cache = NULL;
> + return -ENOMEM;
> + }
> +
> + pc->enqueue = &kevent_poll_enqueue;
> + pc->dequeue = &kevent_poll_dequeue;
> + pc->callback = &kevent_poll_callback;
> +
> + printk(KERN_INFO "Kevent poll()/select() subsystem has been initialized.\n");
> + return 0;
> +}
> +
> +static struct lock_class_key kevent_poll_key;
> +
> +void kevent_poll_reinit(struct file *file)
> +{
> + lockdep_set_class(&file->st.lock, &kevent_poll_key);
> +}
> +
> +static void __exit kevent_poll_sys_fini(void)
> +{
> + kmem_cache_destroy(kevent_poll_priv_cache);
> + kmem_cache_destroy(kevent_poll_container_cache);
> +}
> +
> +module_init(kevent_poll_sys_init);
> +module_exit(kevent_poll_sys_fini);
> diff --git a/kernel/kevent/kevent_timer.c b/kernel/kevent/kevent_timer.c
> new file mode 100644
> index 0000000..fe39b4e
> --- /dev/null
> +++ b/kernel/kevent/kevent_timer.c
> @@ -0,0 +1,108 @@
> +/*
> + * kevent_timer.c
> + *
> + * 2006 Copyright (c) Evgeniy Polyakov <[email protected]>
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/timer.h>
> +#include <linux/jiffies.h>
> +#include <linux/kevent.h>
> +
> +struct kevent_timer
> +{
> + struct timer_list ktimer;
> + struct kevent_storage ktimer_storage;
> +};
> +
> +static void kevent_timer_func(unsigned long data)
> +{
> + struct kevent *k = (struct kevent *)data;
> + struct timer_list *t = k->st->origin;
> +
> + kevent_storage_ready(k->st, NULL, KEVENT_MASK_ALL);
> + mod_timer(t, jiffies + msecs_to_jiffies(k->event.id.raw[0]));
> +}
> +
> +static struct lock_class_key kevent_timer_key;
> +
> +static int kevent_timer_enqueue(struct kevent *k)
> +{
> + int err;
> + struct kevent_timer *t;
> +
> + t = kmalloc(sizeof(struct kevent_timer), GFP_KERNEL);
> + if (!t)
> + return -ENOMEM;
> +
> + setup_timer(&t->ktimer, &kevent_timer_func, (unsigned long)k);
> +
> + err = kevent_storage_init(&t->ktimer, &t->ktimer_storage);
> + if (err)
> + goto err_out_free;
> + lockdep_set_class(&t->ktimer_storage.lock, &kevent_timer_key);
> +
> + err = kevent_storage_enqueue(&t->ktimer_storage, k);
> + if (err)
> + goto err_out_st_fini;
> +
> + mod_timer(&t->ktimer, jiffies + msecs_to_jiffies(k->event.id.raw[0]));
> +
> + return 0;
> +
> +err_out_st_fini:
> + kevent_storage_fini(&t->ktimer_storage);
> +err_out_free:
> + kfree(t);
> +
> + return err;
> +}
> +
> +static int kevent_timer_dequeue(struct kevent *k)
> +{
> + struct kevent_storage *st = k->st;
> + struct kevent_timer *t = container_of(st, struct kevent_timer, ktimer_storage);
> +
> + del_timer_sync(&t->ktimer);
> + kevent_storage_dequeue(st, k);
> + kfree(t);
> +
> + return 0;
> +}
> +
> +static int kevent_timer_callback(struct kevent *k)
> +{
> + k->event.ret_data[0] = (__u32)jiffies;
> + return 1;
> +}
> +
> +static int __init kevent_init_timer(void)
> +{
> + struct kevent_callbacks *tc = &kevent_registered_callbacks[KEVENT_TIMER];
> +
> + tc->enqueue = &kevent_timer_enqueue;
> + tc->dequeue = &kevent_timer_dequeue;
> + tc->callback = &kevent_timer_callback;
> +
> + return 0;
> +}
> +module_init(kevent_init_timer);
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
---end quoted text---

2006-08-16 13:39:19

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 0/2] kevent: Generic event handling mechanism.

On Wed, Aug 16, 2006 at 02:26:31PM +0100, Christoph Hellwig ([email protected]) wrote:
> On Mon, Aug 14, 2006 at 10:21:36AM +0400, Evgeniy Polyakov wrote:
> >
> > Generic event handling mechanism.
>
> Hi, I've just started looking into this, so some comments here first
> on the submission process:
>
> - could you send new revisions of the patches in a new thread so one can
> easily find them?

Ok.

> - the patch split is not very nice, your first patch adds Makefile and
> Kconfig entries for files only in the second patch or not actually
> submitted at all, that's a big no-no.

The split is done by scripts using the list of files generated by git-diff, but I
can reformat the patches to be split this way:
core files
poll/select
timer
any other
main Kconfig/Makefile

Kevent's Makefile still contains entries for files added later; is
that a big problem right now?
I can split the patches manually, but it would be much better to do that once
the decision about inclusion is made; until the review and feature
addition process is complete I would rather generate the patches as is...

--
Evgeniy Polyakov

2006-08-16 13:41:18

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

On Wed, Aug 16, 2006 at 02:30:14PM +0100, Christoph Hellwig ([email protected]) wrote:
> On Mon, Aug 14, 2006 at 10:21:36AM +0400, Evgeniy Polyakov wrote:
> >
> > poll/select() notifications. Timer notifications.
> >
> > This patch includes generic poll/select and timer notifications.
> >
> > kevent_poll works simialr to epoll and has the same issues (callback
> > is invoked not from internal state machine of the caller, but through
> > process awake).
>
> I'm not a big fan of duplicating code over and over. kevent is a candidate
> for a generic event devlivery mechanisms which is a _very_ good thing. But
> starting that system by duplicating existing functionality is not very nice.
>
> What speaks against a patch the recplaces the epoll core by something that
> build on kevent while still supporting the epoll interface as a compatibility
> shim?

There is no problem from my side, but epoll and kevent_poll differ in
some aspects, so it may be better not to replace epoll for a while.

> > Timer notifications can be used for fine grained per-process time
> > management, since interval timers are very inconvenient to use,
> > and they are limited.
>
> I have similar reservations about this one. Having timers as part of a
> generic events system is very nice, but having so much duplicated functionality
> is not. Cc'ed Thomas on behalf of the Timer cabal if there's a point in
> integrating this into a larger framework of timer code.
>
>
> Also it would be nice if you could submit each of the notifications as a patch
> on it's own.

Ok.

--
Evgeniy Polyakov

2006-08-16 13:46:13

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

> diff --git a/include/linux/kevent.h b/include/linux/kevent.h
> new file mode 100644
> index 0000000..03eeeea
> --- /dev/null
> +++ b/include/linux/kevent.h
> @@ -0,0 +1,310 @@
> +/*
> + * kevent.h

Please don't put filenames in the top-of-file block comments. They're
redundant and, as history shows, far too often out of date.

> +#ifdef __KERNEL__

Please split the user/kernel ABI and kernel implementation details into
two different headers. That way we don't have to run unifdef as part of
the user headers generation process, and it's much clearer which bits are
kernel implementation details and which are the public ABI.

> +#define KEVENT_READY 0x1
> +#define KEVENT_STORAGE 0x2
> +#define KEVENT_USER 0x4

Please use enums here.

> + void *priv; /* Private data for different storages.
> + * poll()/select storage has a list of wait_queue_t containers
> + * for each ->poll() { poll_wait()' } here.
> + */

Please try to avoid spilling over the 80 chars limit. In this case it's
easy: just put the comment before the field being documented.

> +extern struct kevent_callbacks kevent_registered_callbacks[];

Having global arrays is not very nice. Any chance this could be hidden
behind proper accessor functions?
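
Something along these lines would do it (a sketch only; kevent_add_callbacks()
and KEVENT_MAX are made-up names here):

/* kevent.c keeps the array static ... */
static struct kevent_callbacks kevent_registered_callbacks[KEVENT_MAX];

/* ... and the notification types register themselves through an accessor. */
int kevent_add_callbacks(const struct kevent_callbacks *cb, unsigned int type)
{
	if (type >= KEVENT_MAX)
		return -EINVAL;

	kevent_registered_callbacks[type] = *cb;
	return 0;
}

kevent_poll.c and kevent_timer.c would then call the accessor from their init
functions instead of poking the array directly.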

> +#ifdef CONFIG_KEVENT_INODE
> +void kevent_inode_notify(struct inode *inode, u32 event);
> +void kevent_inode_notify_parent(struct dentry *dentry, u32 event);
> +void kevent_inode_remove(struct inode *inode);
> +#else
> +static inline void kevent_inode_notify(struct inode *inode, u32 event)
> +{
> +}
> +static inline void kevent_inode_notify_parent(struct dentry *dentry, u32 event)
> +{
> +}
> +static inline void kevent_inode_remove(struct inode *inode)
> +{
> +}
> +#endif /* CONFIG_KEVENT_INODE */

The code implementing these prototypes doesn't exist.

> +#ifdef CONFIG_KEVENT_SOCKET
> +#ifdef CONFIG_LOCKDEP
> +void kevent_socket_reinit(struct socket *sock);
> +void kevent_sk_reinit(struct sock *sk);
> +#else
> +static inline void kevent_socket_reinit(struct socket *sock)
> +{
> +}
> +static inline void kevent_sk_reinit(struct sock *sk)
> +{
> +}
> +#endif

Dito. Please clean the header from all this dead code.

> +int kevent_storage_init(void *origin, struct kevent_storage *st)
> +{
> + spin_lock_init(&st->lock);
> + st->origin = origin;
> + INIT_LIST_HEAD(&st->list);
> + return 0;
> +}

Why does this need a return value?

> +int kevent_sys_init(void)
> +{
> + int i;
> +
> + kevent_cache = kmem_cache_create("kevent_cache",
> + sizeof(struct kevent), 0, SLAB_PANIC, NULL, NULL);
> +
> + for (i=0; i<ARRAY_SIZE(kevent_registered_callbacks); ++i) {
> + struct kevent_callbacks *c = &kevent_registered_callbacks[i];
> +
> + c->callback = c->enqueue = c->dequeue = NULL;
> + }
> +
> + return 0;
> +}

Please make this an initcall in this file and make sure it's linked before
kevent_users.c
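
One way to do that without depending on link order (a sketch): keep the body
as it is and register it at an earlier initcall level than the module_init()
used by the user interface code, e.g.:

static int __init kevent_sys_init(void)
{
	int i;

	kevent_cache = kmem_cache_create("kevent_cache",
			sizeof(struct kevent), 0, SLAB_PANIC, NULL, NULL);

	for (i = 0; i < ARRAY_SIZE(kevent_registered_callbacks); ++i) {
		struct kevent_callbacks *c = &kevent_registered_callbacks[i];

		c->callback = c->enqueue = c->dequeue = NULL;
	}

	return 0;
}
core_initcall(kevent_sys_init);

core_initcall() runs before the device/module initcall level, so the user code
would always find the cache and the callback array initialized.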


> +static int kevent_user_open(struct inode *, struct file *);
> +static int kevent_user_release(struct inode *, struct file *);
> +static unsigned int kevent_user_poll(struct file *, struct poll_table_struct *);
> +static int kevent_user_mmap(struct file *, struct vm_area_struct *);

Could you reorder the file so these forward-declaring prototypes aren't
needed?

> + for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i)

for (i = 0; i < ARRAY_SIZE(u->kevent_list); i++)

> +static struct page *kevent_user_nopage(struct vm_area_struct *vma, unsigned long addr, int *type)
> +{
> + struct kevent_user *u = vma->vm_file->private_data;
> + unsigned long off = (addr - vma->vm_start)/PAGE_SIZE;
> + unsigned int pnum = ALIGN(KEVENT_MAX_EVENTS*sizeof(struct mukevent) + sizeof(unsigned int), PAGE_SIZE)/PAGE_SIZE;
> +
> + if (type)
> + *type = VM_FAULT_MINOR;
> +
> + if (off >= pnum)
> + goto err_out_sigbus;
> +
> + u->pring[off] = __get_free_page(GFP_KERNEL);

So we have a pagefault handler that allocates pages.

> +static int kevent_user_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + unsigned long start = vma->vm_start;
> + struct kevent_user *u = file->private_data;
> +
> + if (vma->vm_flags & VM_WRITE)
> + return -EPERM;
> +
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> + vma->vm_ops = &kevent_user_vm_ops;
> + vma->vm_flags |= VM_RESERVED;
> + vma->vm_file = file;
> +
> + if (remap_pfn_range(vma, start, virt_to_phys((void *)u->pring[0]), PAGE_SIZE,
> + vma->vm_page_prot))
> + return -EFAULT;

but you always map the first page. This model sounds odd and rather confusing.
Do we really need to avoid the cost of the pagefault just for the special
first page?

If so please at least use vm_insert_page() instead of remap_pfn_range().
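
Roughly like this, keeping the rest of the setup from the original handler
(a sketch, not tested):

static int kevent_user_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct kevent_user *u = file->private_data;

	if (vma->vm_flags & VM_WRITE)
		return -EPERM;

	vma->vm_ops = &kevent_user_vm_ops;
	vma->vm_flags |= VM_RESERVED;

	/* Map only the first, always-present page; the rest come from ->nopage(). */
	return vm_insert_page(vma, vma->vm_start,
			virt_to_page((void *)u->pring[0]));
}

vm_insert_page() takes the protection from vma->vm_page_prot, so the
pgprot_noncached() and the explicit vma->vm_file assignment from the original
should not be needed here.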

> +#if 0
> +static inline unsigned int kevent_user_hash(struct ukevent *uk)
> +{
> + unsigned int h = (uk->user[0] ^ uk->user[1]) ^ (uk->id.raw[0] ^ uk->id.raw[1]);
> +
> + h = (((h >> 16) & 0xffff) ^ (h & 0xffff)) & 0xffff;
> + h = (((h >> 8) & 0xff) ^ (h & 0xff)) & KEVENT_HASH_MASK;
> +
> + return h;
> +}
> +#else
> +static inline unsigned int kevent_user_hash(struct ukevent *uk)
> +{
> + return jhash_1word(uk->id.raw[0], 0) & KEVENT_HASH_MASK;
> +}
> +#endif

Please remove that #if 0 code.

> +static int kevent_ctl_process(struct file *file, unsigned int cmd, unsigned int num, void __user *arg)
> +{
> + int err;
> + struct kevent_user *u = file->private_data;
> +
> + if (!u || num > KEVENT_MAX_EVENTS)
> + return -EINVAL;
> +
> + switch (cmd) {
> + case KEVENT_CTL_ADD:
> + err = kevent_user_ctl_add(u, num, arg);
> + break;
> + case KEVENT_CTL_REMOVE:
> + err = kevent_user_ctl_remove(u, num, arg);
> + break;
> + case KEVENT_CTL_MODIFY:
> + err = kevent_user_ctl_modify(u, num, arg);
> + break;
> + default:
> + err = -EINVAL;
> + break;

We were rather against this kind of odd multiplexer in the past. For
these three we at least have a common type being passed down, so there's
no compat handling problem, but I'm still not very happy with it...

> +asmlinkage long sys_kevent_ctl(int fd, unsigned int cmd, unsigned int num, void __user *arg)
> +{
> + int err = -EINVAL;
> + struct file *file;
> +
> + if (cmd == KEVENT_CTL_INIT)
> + return kevent_ctl_init();

This one on the other hand is plain wrong. At least it should be a separate
syscall. But looking at the code I don't quite understand why you need
a syscall at all: why can't kevent be implemented as a cloning chardevice
(one where every open allocates a new structure and stores it in
file->private_data)?
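
The open() side of that would only be a few lines, reusing what the patch
already has (a sketch):

static int kevent_user_open(struct inode *inode, struct file *file)
{
	struct kevent_user *u = kevent_user_alloc();

	if (!u)
		return -ENOMEM;

	/* Every open() gets its own, independent event queue. */
	file->private_data = u;
	return 0;
}

Userspace would then get its control descriptor from open() on the misc device
node instead of from a KEVENT_CTL_INIT call.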

2006-08-16 13:57:30

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Wed, Aug 16, 2006 at 02:45:50PM +0100, Christoph Hellwig ([email protected]) wrote:
> > diff --git a/include/linux/kevent.h b/include/linux/kevent.h
> > new file mode 100644
> > index 0000000..03eeeea
> > --- /dev/null
> > +++ b/include/linux/kevent.h
> > @@ -0,0 +1,310 @@
> > +/*
> > + * kevent.h
>
> Please don't put filenames in the top of file block comments. They're
> redudant and as history shows out of date far too often.

Ok.

> > +#ifdef __KERNEL__
>
> Please split the user/kernel ABI and kernel implementation details into
> two different headers. That way we don't have to run unifdef as part of
> the user headers generation process and it's much cleaner what bit is a
> kernel implementation details and what's the public ABI.

ok.

> > +#define KEVENT_READY 0x1
> > +#define KEVENT_STORAGE 0x2
> > +#define KEVENT_USER 0x4
>
> Please use enums here.

I used them, but I was advised to use defines in some previous releases :)

> > + void *priv; /* Private data for different storages.
> > + * poll()/select storage has a list of wait_queue_t containers
> > + * for each ->poll() { poll_wait()' } here.
> > + */
>
> Please try to avoid spilling over the 80 chars limit. In this case it's
> easy, just put the comment before the field beeing documented.

Ok.

> > +extern struct kevent_callbacks kevent_registered_callbacks[];
>
> Having global arrays is not very nice. Any chance this could be hidden
> behind proper accessor functions?

Ok.

> > +#ifdef CONFIG_KEVENT_INODE
> > +void kevent_inode_notify(struct inode *inode, u32 event);
> > +void kevent_inode_notify_parent(struct dentry *dentry, u32 event);
> > +void kevent_inode_remove(struct inode *inode);
> > +#else
> > +static inline void kevent_inode_notify(struct inode *inode, u32 event)
> > +{
> > +}
> > +static inline void kevent_inode_notify_parent(struct dentry *dentry, u32 event)
> > +{
> > +}
> > +static inline void kevent_inode_remove(struct inode *inode)
> > +{
> > +}
> > +#endif /* CONFIG_KEVENT_INODE */
>
> The code implementing these prototypes doesn't exist.

It exists; you suggested not including that code in the patchset right
now, so this file has not yet been updated to drop the AIO stuff.

> > +#ifdef CONFIG_KEVENT_SOCKET
> > +#ifdef CONFIG_LOCKDEP
> > +void kevent_socket_reinit(struct socket *sock);
> > +void kevent_sk_reinit(struct sock *sk);
> > +#else
> > +static inline void kevent_socket_reinit(struct socket *sock)
> > +{
> > +}
> > +static inline void kevent_sk_reinit(struct sock *sk)
> > +{
> > +}
> > +#endif
>
> Dito. Please clean the header from all this dead code.
>
> > +int kevent_storage_init(void *origin, struct kevent_storage *st)
> > +{
> > + spin_lock_init(&st->lock);
> > + st->origin = origin;
> > + INIT_LIST_HEAD(&st->list);
> > + return 0;
> > +}
>
> Why does this need a return value?

Initialization in general can fail; this one cannot, but I prefer to
leave room for that possibility.

> > +int kevent_sys_init(void)
> > +{
> > + int i;
> > +
> > + kevent_cache = kmem_cache_create("kevent_cache",
> > + sizeof(struct kevent), 0, SLAB_PANIC, NULL, NULL);
> > +
> > + for (i=0; i<ARRAY_SIZE(kevent_registered_callbacks); ++i) {
> > + struct kevent_callbacks *c = &kevent_registered_callbacks[i];
> > +
> > + c->callback = c->enqueue = c->dequeue = NULL;
> > + }
> > +
> > + return 0;
> > +}
>
> Please make this an initcall in this file and make sure it's linked before
> kevent_users.c

Ok.

> > +static int kevent_user_open(struct inode *, struct file *);
> > +static int kevent_user_release(struct inode *, struct file *);
> > +static unsigned int kevent_user_poll(struct file *, struct poll_table_struct *);
> > +static int kevent_user_mmap(struct file *, struct vm_area_struct *);
>
> Could you reorder the file so these forward-declaring prototypes aren't
> needed?

I prefer structures to be placed at the beginning of the file, which
requires forward declarations.
But I have no strong feeling about that, so I will put them at the end.

> > + for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i)
>
> for (i = 0; i < ARRAY_SIZE(u->kevent_list); i++)

Ugh, no. It reduces readability due to an excessive number of spaces.

> > +static struct page *kevent_user_nopage(struct vm_area_struct *vma, unsigned long addr, int *type)
> > +{
> > + struct kevent_user *u = vma->vm_file->private_data;
> > + unsigned long off = (addr - vma->vm_start)/PAGE_SIZE;
> > + unsigned int pnum = ALIGN(KEVENT_MAX_EVENTS*sizeof(struct mukevent) + sizeof(unsigned int), PAGE_SIZE)/PAGE_SIZE;
> > +
> > + if (type)
> > + *type = VM_FAULT_MINOR;
> > +
> > + if (off >= pnum)
> > + goto err_out_sigbus;
> > +
> > + u->pring[off] = __get_free_page(GFP_KERNEL);
>
> So we have a pagefault handler that allocates pages.

It is fixed in the take10 patchset; that is why it was released.

> > +static int kevent_user_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + unsigned long start = vma->vm_start;
> > + struct kevent_user *u = file->private_data;
> > +
> > + if (vma->vm_flags & VM_WRITE)
> > + return -EPERM;
> > +
> > + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > + vma->vm_ops = &kevent_user_vm_ops;
> > + vma->vm_flags |= VM_RESERVED;
> > + vma->vm_file = file;
> > +
> > + if (remap_pfn_range(vma, start, virt_to_phys((void *)u->pring[0]), PAGE_SIZE,
> > + vma->vm_page_prot))
> > + return -EFAULT;
>
> but you always map the first page. This model sounds odd and rather confusing.
> Do we really need to avoid of the cost of the pagefault just for the special
> first page?

->nopage() is fixed in the take10 patchset to return the next page.
The number of pages can grow over time up to a limit (which is quite high,
and I was advised not to allocate them all at startup).

> If so please at least use vm_insert_page() instead of remap_pfn_range().

Ok.

> > +#if 0
> > +static inline unsigned int kevent_user_hash(struct ukevent *uk)
> > +{
> > + unsigned int h = (uk->user[0] ^ uk->user[1]) ^ (uk->id.raw[0] ^ uk->id.raw[1]);
> > +
> > + h = (((h >> 16) & 0xffff) ^ (h & 0xffff)) & 0xffff;
> > + h = (((h >> 8) & 0xff) ^ (h & 0xff)) & KEVENT_HASH_MASK;
> > +
> > + return h;
> > +}
> > +#else
> > +static inline unsigned int kevent_user_hash(struct ukevent *uk)
> > +{
> > + return jhash_1word(uk->id.raw[0], 0) & KEVENT_HASH_MASK;
> > +}
> > +#endif
>
> Please remove that #if 0 code.

Ok.

> +static int kevent_ctl_process(struct file *file, unsigned int cmd, unsigned int num, void __user *arg)
> > +{
> > + int err;
> > + struct kevent_user *u = file->private_data;
> > +
> > + if (!u || num > KEVENT_MAX_EVENTS)
> > + return -EINVAL;
> > +
> > + switch (cmd) {
> > + case KEVENT_CTL_ADD:
> > + err = kevent_user_ctl_add(u, num, arg);
> > + break;
> > + case KEVENT_CTL_REMOVE:
> > + err = kevent_user_ctl_remove(u, num, arg);
> > + break;
> > + case KEVENT_CTL_MODIFY:
> > + err = kevent_user_ctl_modify(u, num, arg);
> > + break;
> > + default:
> > + err = -EINVAL;
> > + break;
>
> We were rather against these kind of odd multiplexers in the past. For
> these three we at least have a common type beeing passed down so there's
> not compat handling problem, but I'm still not very happy with it..

I use one syscall for add/remove/modify, so it requires a multiplexer.

> > +asmlinkage long sys_kevent_ctl(int fd, unsigned int cmd, unsigned int num, void __user *arg)
> > +{
> > + int err = -EINVAL;
> > + struct file *file;
> > +
> > + if (cmd == KEVENT_CTL_INIT)
> > + return kevent_ctl_init();
>
> This one on the other hand is plain wrong. At least it should be a separate
> syscall. But looking at the code I don't quite understand why you need
> a syscall at all, why can't kevent be implemented as a cloning chardevice
> (on where every open allocates a new structure and stores it into
> file->private_data?)

That requires a separate syscall.

I created a char device in the first releases and was forced not to use it
at all.

--
Evgeniy Polyakov

2006-08-16 18:09:29

by Zach Brown

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.


>>> + for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i)
>> for (i = 0; i < ARRAY_SIZE(u->kevent_list); i++)
>
> Ugh, no. It reduces readability due to exessive number of spaces.

Ihavetoverystronglydisagree.

- z

2006-08-16 18:11:24

by Zach Brown

[permalink] [raw]
Subject: Re: [take9 0/2] kevent: Generic event handling mechanism.


> It is done by scripts using list of files generated by git-diff, but I
> can reformat them to be in a way:

Perhaps you should think about maintaining them as an explicit series of
patches, say with quilt or mq, instead of as one repository that you try
and cut up into separate patches.

- z

2006-08-16 19:25:20

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Wed, Aug 16, 2006 at 11:08:41AM -0700, Zach Brown ([email protected]) wrote:
>
> >>> + for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i)
> >> for (i = 0; i < ARRAY_SIZE(u->kevent_list); i++)
> >
> > Ugh, no. It reduces readability due to exessive number of spaces.
>
> Ihavetoverystronglydisagree.

W e l l , i f y o u i n s i s t a n d a b s o l u t e l y s u r e.

> - z

--
Evgeniy Polyakov

2006-08-16 19:45:55

by David Miller

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

From: Zach Brown <[email protected]>
Date: Wed, 16 Aug 2006 11:08:41 -0700

> >>> + for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i)
> >> for (i = 0; i < ARRAY_SIZE(u->kevent_list); i++)
> >
> > Ugh, no. It reduces readability due to exessive number of spaces.
>
> Ihavetoverystronglydisagree.

Metoo. :-)

Spaces help humans parse out the syntactic structure of
multi-token expressions.

2006-08-16 20:07:44

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Wed, Aug 16, 2006 at 12:45:54PM -0700, David Miller ([email protected]) wrote:
> > >>> + for (i=0; i<ARRAY_SIZE(u->kevent_list); ++i)
> > >> for (i = 0; i < ARRAY_SIZE(u->kevent_list); i++)
> > >
> > > Ugh, no. It reduces readability due to exessive number of spaces.
> >
> > Ihavetoverystronglydisagree.
>
> Metoo. :-)
>
> Spaces help humans parse out the syntactic structure of
> multi-token expressions.

There is an anecdote:
in the near future, when a world crisis has killed the economy,
Russian scientists find a way to build a time machine, so they
fetch some dictator from the past and ask him how to improve the situation.
He quickly answers that all that is needed is to shoot the opposition, kill
freedom, and repaint the Kremlin and Red Square blue.
People wonder: "such tragic steps, so much blood, but why Red Square?"
The dictator answers: "Well, if there are no objections to the other issues,
we will not change Red Square."

I'm telling it just because I would like to know if there are any issues
which must be fixed in tomorrow's patchset besides those mentioned in
previous e-mails... :)

--
Evgeniy Polyakov

2006-08-18 10:42:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

On Wed, Aug 16, 2006 at 05:40:32PM +0400, Evgeniy Polyakov wrote:
> > What speaks against a patch the recplaces the epoll core by something that
> > build on kevent while still supporting the epoll interface as a compatibility
> > shim?
>
> There is no problem from my side, but epoll and kevent_poll differs on
> some aspects, so it can be better to not replace them for a while.

Please explain the differences and why they are important. We really
shouldn't keep on adding code without being able to replace older bits.
If there's a really good reason we can keep things separate, but

"epoll and kevent_poll differs on some aspects"

is not one :)

2006-08-18 10:46:26

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

> > > +#define KEVENT_READY 0x1
> > > +#define KEVENT_STORAGE 0x2
> > > +#define KEVENT_USER 0x4
> >
> > Please use enums here.
>
> I used, but I was sugested to use define in some previous releases :)

defines make some sense for userspace-visible ABIs because then people
can test for features with ifdef. It doesn't make any sense for constants
that are used purely in-kernel. For those, enums make more sense because
you can, for example, look at the symbolic names with a debugger.

> > We were rather against these kind of odd multiplexers in the past. For
> > these three we at least have a common type beeing passed down so there's
> > not compat handling problem, but I'm still not very happy with it..
>
> I use one syscall for add/remove/modify, so it requires multiplexer.

I noticed that you do it, but it's not exactly considered a nice design.

> > > +asmlinkage long sys_kevent_ctl(int fd, unsigned int cmd, unsigned int num, void __user *arg)
> > > +{
> > > + int err = -EINVAL;
> > > + struct file *file;
> > > +
> > > + if (cmd == KEVENT_CTL_INIT)
> > > + return kevent_ctl_init();
> >
> > This one on the other hand is plain wrong. At least it should be a separate
> > syscall. But looking at the code I don't quite understand why you need
> > a syscall at all, why can't kevent be implemented as a cloning chardevice
> > (on where every open allocates a new structure and stores it into
> > file->private_data?)
>
> That requires separate syscall.

Yes, it requires a separate syscall.

> I created a char device in first releases and was forced to not use it
> at all.

Do you have a reference to it? In this case a char device makes a lot of
sense because you get a file descriptor and have operations defined only on
it. In fact, given that you have a multiplexer anyway, there's really no
point in adding a syscall for that as well; you could rather use the existing
and debugged ioctl() multiplexer. Sure, it's still not what we consider
nice, but better than adding even more odd multiplexer syscalls.

2006-08-18 11:15:28

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

On Fri, Aug 18, 2006 at 11:41:20AM +0100, Christoph Hellwig ([email protected]) wrote:
> On Wed, Aug 16, 2006 at 05:40:32PM +0400, Evgeniy Polyakov wrote:
> > > What speaks against a patch the recplaces the epoll core by something that
> > > build on kevent while still supporting the epoll interface as a compatibility
> > > shim?
> >
> > There is no problem from my side, but epoll and kevent_poll differs on
> > some aspects, so it can be better to not replace them for a while.
>
> Please explain the differences and why they are important. We really
> shouldn't keep on adding code without beeing able to replace older bits.
> If there's a really good reason we can keep things separate, but
>
> "epoll and kevent_poll differs on some aspects"
>
> is not one :)

kevent_poll uses a hash table (actually it is kevent that uses the table),
the locking is simpler, and part of it is hidden in the kevent core.
Actually kevent_poll is just a container allocator for the poll wait queue.
So epoll does not differ from kevent + kevent_poll, except for hash vs. tree
and the locking, which is based on locks for paths that kevent shares with
those that can be called from irq/bh context.
And since kevent_poll can be deselected while epoll is always there
(unless the embedded config is turned on), I recommend having both.
Or always turning kevent on :)

--
Evgeniy Polyakov

2006-08-18 11:39:40

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Fri, Aug 18, 2006 at 11:46:07AM +0100, Christoph Hellwig ([email protected]) wrote:
> > > > +#define KEVENT_READY 0x1
> > > > +#define KEVENT_STORAGE 0x2
> > > > +#define KEVENT_USER 0x4
> > >
> > > Please use enums here.
> >
> > I used, but I was sugested to use define in some previous releases :)
>
> defines make some sense for userspace-visible ABIs because then people
> can test for features with ifdef. It doesn't make any sense for constants
> that are used purely in-kernel. For those enums make more sense because
> you can for example looks at the symbolic names with a debugger.

Enums are only useful when the value increases by one with each new
member.

> > > We were rather against these kind of odd multiplexers in the past. For
> > > these three we at least have a common type beeing passed down so there's
> > > not compat handling problem, but I'm still not very happy with it..
> >
> > I use one syscall for add/remove/modify, so it requires multiplexer.
>
> I noticed that you do it, but it's not exactly considered a nice design.

There will be either several syscalls or a multiplexer...
I prefer to have one syscall and a lot of multiplexing inside.

> > > > +asmlinkage long sys_kevent_ctl(int fd, unsigned int cmd, unsigned int num, void __user *arg)
> > > > +{
> > > > + int err = -EINVAL;
> > > > + struct file *file;
> > > > +
> > > > + if (cmd == KEVENT_CTL_INIT)
> > > > + return kevent_ctl_init();
> > >
> > > This one on the other hand is plain wrong. At least it should be a separate
> > > syscall. But looking at the code I don't quite understand why you need
> > > a syscall at all, why can't kevent be implemented as a cloning chardevice
> > > (on where every open allocates a new structure and stores it into
> > > file->private_data?)
> >
> > That requires separate syscall.
>
> Yes, it requires a separate syscall.
>
> > I created a char device in first releases and was forced to not use it
> > at all.
>
> Do you have a reference to it? In this case a char devices makes a lot of
> sense because you get a filedescriptor and have operations only defined on
> it. In fact given that you have a multiplexer anyway there's really no
> point in adding a syscall for that aswell, you could rather use the existing
> and debugged ioctl() multiplexer. Sure, it's still not what we consider
> nice, but better than adding even more odd multiplexer syscalls.

Somewhere in February.
Here is a link to the initial announcement, which used ioctl, a raw char
device, and enums for all constants.

http://marc.theaimsgroup.com/?l=linux-netdev&m=113949344414464&w=2

--
Evgeniy Polyakov

2006-08-21 10:57:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Fri, Aug 18, 2006 at 03:23:36PM +0400, Evgeniy Polyakov wrote:
> > defines make some sense for userspace-visible ABIs because then people
> > can test for features with ifdef. It doesn't make any sense for constants
> > that are used purely in-kernel. For those enums make more sense because
> > you can for example looks at the symbolic names with a debugger.
>
> Enums are only usefull when value is increased with each new member by
> one.

No, they are not. Please search the lkml archives, this came up multiple
times.

> There will be either several syscalls or multiplexer...
> I prefer to have one syscall and a lot of multiplexers inside.

That makes it hard for everyone to untangle the mess. Please at least
try to follow existing design principles.

> > > I created a char device in first releases and was forced to not use it
> > > at all.
> >
> > Do you have a reference to it? In this case a char devices makes a lot of
> > sense because you get a filedescriptor and have operations only defined on
> > it. In fact given that you have a multiplexer anyway there's really no
> > point in adding a syscall for that aswell, you could rather use the existing
> > and debugged ioctl() multiplexer. Sure, it's still not what we consider
> > nice, but better than adding even more odd multiplexer syscalls.
>
> Somewhere in february.
> Here is link to initial anounce which used ioctl and raw char device and
> enums for all constants.
>
> http://marc.theaimsgroup.com/?l=linux-netdev&m=113949344414464&w=2

That thread only shows your patch but no comments on it. Do you have
a URL for the complaint about this design? And please include its author
in the Cc list of your reply so we can settle the argument.

2006-08-21 11:01:08

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

On Fri, Aug 18, 2006 at 02:59:34PM +0400, Evgeniy Polyakov wrote:
> > If there's a really good reason we can keep things separate, but
> >
> > "epoll and kevent_poll differs on some aspects"
> >
> > is not one :)
>
> kevent_poll uses hash table (actually it is kevent that uses table),
> locking is simpler and part of it is hidden in kevent core.
> Actually kevent_poll is just a container allocator for poll wait queue.
> So epoll does not differ (except hash/tree and locking,
> which is based on locks for pathes which are shared in kevent with those
> ones which can be called from irq/bh context) from kevent + kevent_poll.
> And since kevent_poll can be not selected while epoll is always there
> (until embedded config is turned on), I recommend to have them both.
> Or always turn kevent on :)

You mention a lot of implementation details that absolutely shouldn't
matter to the userspace interface.

I might not have explained enough what the point behind all this is, so
I'll try to explain it again:

- the fate of aio, inotify, epoll, etc. shows we badly need a generic
  event mechanism that unifies the event-based interfaces of the various
  subsystems. Only having a single mechanism allows things like unified event
  loops and gives application programmers the chance to learn that one
  interface for real and get it right.
- kevent looks like the right way to do this. But to show it can really
  achieve this it needs to show it can do the things the existing event
  systems can do at least as well. Reimplementing their user interfaces
  on top of kevent is the best (or maybe only) way to show that.
  epoll is probably the easiest of the ones we have, so I'd suggest starting
  with it. inotify will be a lot harder, but we'll need that as well.
  The kevent inode hooks you had in your earlier patches will never ever
  get in.

Was this clear enough?

2006-08-21 11:14:45

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Mon, Aug 21, 2006 at 11:56:37AM +0100, Christoph Hellwig ([email protected]) wrote:
> On Fri, Aug 18, 2006 at 03:23:36PM +0400, Evgeniy Polyakov wrote:
> > > defines make some sense for userspace-visible ABIs because then people
> > > can test for features with ifdef. It doesn't make any sense for constants
> > > that are used purely in-kernel. For those enums make more sense because
> > > you can for example looks at the symbolic names with a debugger.
> >
> > Enums are only usefull when value is increased with each new member by
> > one.
>
> No, they are not. Please search the lkml archives, this came up multiple
> times.

Enums, when OR'ed and AND'ed together, can in theory produce a value outside
the enum set.
And what is the difference between
#define A 1
#define B 2
#define C 4
and
enum {
A = 1,
B = 2,
C = 4,
}
?

> > There will be either several syscalls or multiplexer...
> > I prefer to have one syscall and a lot of multiplexers inside.
>
> To make life for everyone to detangle the mess hard. Please at least
> try to follow existing design principles.

I added as few syscalls as possible: a control one and a read one.
The former can initialize/add/remove/modify kevents, the latter reads
ready events.

> > > > I created a char device in first releases and was forced to not use it
> > > > at all.
> > >
> > > Do you have a reference to it? In this case a char devices makes a lot of
> > > sense because you get a filedescriptor and have operations only defined on
> > > it. In fact given that you have a multiplexer anyway there's really no
> > > point in adding a syscall for that aswell, you could rather use the existing
> > > and debugged ioctl() multiplexer. Sure, it's still not what we consider
> > > nice, but better than adding even more odd multiplexer syscalls.
> >
> > Somewhere in february.
> > Here is link to initial anounce which used ioctl and raw char device and
> > enums for all constants.
> >
> > http://marc.theaimsgroup.com/?l=linux-netdev&m=113949344414464&w=2
>
> That thread only shows your patch but no comments to it. Do you have
> an url for the complaint about this design? And please include the author
> of it in the cc list of your reply so we can settle the arguments.

Enums were changed to defines after David Miller's suggestion, since the
whole network stack and epoll use defines extensively, and kevent was
built for them in the first place.

I do not remember who suggested not to use a char device, but I saw that
inotify specifically mentions ioctl/syscall issues; there is an old discussion
about event handling mechanisms somewhere in this thread:
http://uwsg.iu.edu/hypermail/linux/kernel/0010.3/0002.html

It looks a bit illogical to have epoll/poll syscalls and then read data from
a char device and its ioctls for network socket or timer
notifications.

--
Evgeniy Polyakov

2006-08-21 11:26:54

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

On Mon, Aug 21, 2006 at 12:01:04PM +0100, Christoph Hellwig ([email protected]) wrote:
> On Fri, Aug 18, 2006 at 02:59:34PM +0400, Evgeniy Polyakov wrote:
> > > If there's a really good reason we can keep things separate, but
> > >
> > > "epoll and kevent_poll differs on some aspects"
> > >
> > > is not one :)
> >
> > kevent_poll uses hash table (actually it is kevent that uses table),
> > locking is simpler and part of it is hidden in kevent core.
> > Actually kevent_poll is just a container allocator for poll wait queue.
> > So epoll does not differ (except hash/tree and locking,
> > which is based on locks for pathes which are shared in kevent with those
> > ones which can be called from irq/bh context) from kevent + kevent_poll.
> > And since kevent_poll can be not selected while epoll is always there
> > (until embedded config is turned on), I recommend to have them both.
> > Or always turn kevent on :)
>
> You mention a lot of implementation details that absoultely shouldn't
> matter to the userspace interface.
>
> I might not have explained enough what the point behind all this is, so
> I'll try to explain it again:
>
> - the fate of aio, inotify, epoll, etc shows we badly need a generic
> event mechnism that unifies event based interfaces of various subsystem.
> Only having a single mechanisms allows things like unified event loops
> and gives application progreammers the chance to learn that one interface
> for real and get it right.
> - kevent looks like the right way to do this. but to show it can really
> archive this it needs to show it can do the things the existing event
> systems can do at least as good. reimplementing their user interfaces
> ontop of kevent is the best (or maybe only) way to show that.
> epoll is probably the easiest of the ones we have, so I'd suggest starting
> with it. inotify will be a lot harder, but we'll need that aswell.
> the kevent inode hooks you had in your earlier patches will never ever
> get in.
>
> Was this clear enough?

Sure, but if I said that it would sound like advertisement :)
Some inotify notifications (inode create/remove) are already implemented
in the (dropped) FS notification patchset.

--
Evgeniy Polyakov

2006-08-21 12:59:23

by Bernd Petrovitsch

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Mon, 2006-08-21 at 15:13 +0400, Evgeniy Polyakov wrote:
[...]
> And what is the difference between

As others already pointed out in this thread:

These are not seen by the C compiler.
> #define A 1
> #define B 2
> #define C 4
> and

These are known by the C compiler and thus usable/viewable in a
debugger.
> enum {
> A = 1,
> B = 2,
> C = 4,
> }
> ?
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2006-08-21 13:02:12

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Mon, Aug 21, 2006 at 02:53:25PM +0200, Bernd Petrovitsch ([email protected]) wrote:
> On Mon, 2006-08-21 at 15:13 +0400, Evgeniy Polyakov wrote:
> [...]
> > And what is the difference between
>
> As others already pointed out in this thread:
>
> These are not seen by the C compiler.
> > #define A 1
> > #define B 2
> > #define C 4
> > and
>
> These are known by the C compiler and thus usable/viewable in a
> debugger.
> > enum {
> > A = 1,
> > B = 2,
> > C = 4,
> > }
> > ?

:) And I pointed out quite a few other issues about enums vs. defines.
As for this one: no one wants to look at enums in a debugger.

And, ugh:

(gdb) list
1 enum {
2 A = 1,
3 B = 2,
4 };
5
6 int main()
7 {
8 printf("%x\n", A | B);
9 }
(gdb) bre 8
Breakpoint 1 at 0x4004ac: file ./test.c, line 8.
(gdb) r
Starting program: /tmp/test

Breakpoint 1, main () at ./test.c:8
8 printf("%x\n", A | B);
(gdb) p A
No symbol "A" in current context.


Actually I completely do not care about defines or enums, it is a really
silly dispute; I just do not want to rewrite a bunch of code _again_ and
then _again_ when someone decides that defines are better.

--
Evgeniy Polyakov

2006-08-21 13:54:44

by Bernd Petrovitsch

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

On Mon, 2006-08-21 at 17:01 +0400, Evgeniy Polyakov wrote:
[ #define vs enum { } ]
> And, ugh:
>
> (gdb) list
> 1 enum {
> 2 A = 1,
> 3 B = 2,
> 4 };
> 5
> 6 int main()
> 7 {
> 8 printf("%x\n", A | B);
> 9 }
> (gdb) bre 8
> Breakpoint 1 at 0x4004ac: file ./test.c, line 8.
> (gdb) r
> Starting program: /tmp/test
>
> Breakpoint 1, main () at ./test.c:8
> 8 printf("%x\n", A | B);
> (gdb) p A
> No symbol "A" in current context.

Oops, I stand corrected.

> Actually I completely do not care about define or enums, it is really
> silly dispute, I just do not want to rewrite bunch of code _again_ and
> then _again_ when someone decide that defines are better.

ACK. Personally I also do not care that much - as long as it doesn't
change with the phase of the moon.
And we probably do not want
---- snip ----
#ifdef CONFIG_I_LOVE_ENUMS
#define A 1
#define B 2
#define C 4
#else
enum {
A = 1,
B = 2,
C = 4,
};
#endif
---- snip ----
either.

Bernd, shutting now up on this thread
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2006-08-21 19:09:25

by David Miller

[permalink] [raw]
Subject: Re: [take9 1/2] kevent: Core files.

From: Evgeniy Polyakov <[email protected]>
Date: Mon, 21 Aug 2006 17:01:21 +0400

> Actually I completely do not care about define or enums, it is really
> silly dispute, I just do not want to rewrite bunch of code _again_ and
> then _again_ when someone decide that defines are better.

I totally agree.

What in the world is wrong with you people arguing over stuff like
this?

If the goal is to discourage Evgeniy and his work, you might just
get your wish if you keep up with this silly coding style and
enumeration crap!

I can't even stand to read these kevent threads any longer, the
sanity in them has long gone out the window.

2006-08-22 14:36:13

by Davide Libenzi

[permalink] [raw]
Subject: Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

On Wed, 16 Aug 2006, Christoph Hellwig wrote:

> On Mon, Aug 14, 2006 at 10:21:36AM +0400, Evgeniy Polyakov wrote:
>>
>> poll/select() notifications. Timer notifications.
>>
>> This patch includes generic poll/select and timer notifications.
>>
>> kevent_poll works simialr to epoll and has the same issues (callback
>> is invoked not from internal state machine of the caller, but through
>> process awake).
>
> I'm not a big fan of duplicating code over and over. kevent is a candidate
> for a generic event devlivery mechanisms which is a _very_ good thing. But
> starting that system by duplicating existing functionality is not very nice.
>
> What speaks against a patch the recplaces the epoll core by something that
> build on kevent while still supporting the epoll interface as a compatibility
> shim?

Sorry, I'm catching up with a huge post-vacation backlog, so I didn't have
the time to look at the source code. But if kevent performance is the same or
better, and the external epoll interface is fully supported, then I think
the shim layer idea is a good one. Provided the shim ends up smaller than
eventpoll.c :)



- Davide