2007-02-01 09:39:09

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 0/7][AIO] - AIO completion signal notification v6


Hi Andrew,

here is the latest rework of the aio notification and listio support
patches with comments from Oleg Nesterov and you addressed.

S?bastien.

This set now consists in 7 patches:

1. rework-compat-sys-io-submit: cleanup the sys_io_submit() compat layer,
making it more efficient and laying out the base for the following patches

2. aio-header-fix-includes: fixes the double inclusion of uio.h in aio.h

3. aio-fix-access_ok-check: fixes the access_ok() checks

4. make-good_sigevent-non-static: move good_sigevent into signal.c, rename
it to something more meaningfull and make it non-static

5. make-__sigqueue_alloc-free-non-static: make __sigqueue_alloc() and
__sigqueue_free() non static

6. aio-notify-sig: the AIO completion signal notification

7. add-listio-syscall-support: adds listio support via a new syscall

Description are in the individual patches.


Changes from v5: based on comments from Oleg Nesterov and Andrew Morton

- Fix for signo range not checked in the SIGEV_THREAD_ID case in
good_sigevent()
- renamed good_sigevent() to sigevent_find_task()

- Fixed the access_ok checks in sys_io_submit() and
compat_sys_io_submit()

- Made __sigqueue_alloc() and __sigqueue_free() non static
- Reworked the aio code to directly use __sigqueue_alloc() and
__sigqueue_free() and avoid messing with SIGQUEUE_PREALLOC

- Removed the unneeded PF_EXITING check in aio_setup_sigevent()
- Added a comment in aio_setup_sigevent() for why there is no task
ref leak
- Changed read_lock(&tasklist_lock) to rcu_read_lock() in
aio_setup_sigevent()

- Fixed an ioctx reference leak in compat_sys_lio_submit()
- Renamed lio_check() as lio_notify()
- Added comments to lio_notify() and lio_create()
- Replaced sigqueue_free() by __sigqueue_free() in lio_notify()

Changes from v4:
No comments received for v4, so it must be perfect ;-)
replaced the IOCB_CMD_GROUP listio implementation with Bharata's
syscall approach which I find much cleaner.

Changes from v3:

- added justification for the compat_sys_io_submit() cleanup - Zach Brown
- more cleanups in compat_sys_io_submit() to put it in line with
sys_io_submit() - Zach Brown

- Changed "Export good_sigevent()" patch name to "Make good_sigevent()
non-static" to better describe what it does. - Christoph Hellwig
- Reworked good_sigevent() to make it more readable. - Christoph Hellwig

- Simplified the use of the SIGEV_* constants in signal notification -
Christoph Hellwig
- Take a reference on the target task both for the SIGEV_THREAD_ID and
SIGEV_SIGNAL cases.

Changes from v2:
- rebased to 2.6.19-rc6-mm2
- reworked the sys_io_submit() compat layer as suggested by Zach Brown
- fixed saving of a pointer to a task struct in aio-notify-sig as
pointed out by Andrew Morton

Changes from v1:
- cleanups suggested by Christoph Hellwig, Badari Pulavarty and
Zach Brown
- added lisio patch



2007-02-01 09:39:06

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 6/7][AIO] - AIO completion signal notification


From: S?bastien Dugu? <[email protected]>

AIO completion signal notification

The current 2.6 kernel does not support notification of user space via
an RT signal upon an asynchronous IO completion. The POSIX specification
states that when an AIO request completes, a signal can be delivered to
the application as notification.

This patch adds a struct sigevent *aio_sigeventp to the iocb.
The relevant fields (pid, signal number and value) are stored in the kiocb
for use when the request completes.

That sigevent structure is filled by the application as part of the AIO
request preparation. Upon request completion, the kernel notifies the
application using those sigevent parameters. If SIGEV_NONE has been specified,
then the old behaviour is retained and the application must rely on polling
the completion queue using io_getevents().


A struct sigevent *aio_sigeventp field is added to struct iocb in
include/linux/aio_abi.h

A struct aio_notify containing the sigevent parameters is defined in aio.h:

struct aio_notify {
struct task_struct *target;
__u16 signo;
__u16 notify;
sigval_t value;
};

A struct aio_notify ki_notify is added to struct kiocb in include/linux/aio.h

In io_submit_one(), if the application provided a sigevent then
aio_setup_sigevent() is called which does the following:

- check access to the user sigevent and make a local copy

- if the requested notification is SIGEV_NONE, then nothing to do

- fill in the kiocb->ki_notify fields (notify, signo, value)

- check sigevent consistency, get the signal target task and
save it in kiocb->ki_notify.target

- preallocate a sigqueue for this event using __sigqueue_alloc()

Upon request completion, in aio_complete(), if notification is needed for
this request (iocb->ki_notify.notify != SIGEV_NONE), then aio_send_signal()
is called to signal the target task as follows:

- fill in the siginfo struct to be sent to the task

- if notify is SIGEV_THREAD_ID then send signal to specific task
using send_sigqueue()

- else send signal to task group using send_5group_sigqueue()



Notes concerning sigqueue preallocation:

To ensure reliable delivery of completion notification, the sigqueue is
preallocated in the submission path so that there is no chance it can fail
in the completion path.

Unlike the posix-timers case (currently the single other user of sigqueue
preallocation), where the sigqueue is allocated for the lifetime of the timer
and freed at timer destruction time, the aio case is a bit more tricky due to
the async nature of the whole thing.

In the aio case, the sigqueue exists for the lifetime of the request, therefore
it must be freed only once the signal for the request completion has been
delivered. The sigqueue is therefore freed by the signal delivery mechanism
when the signal is collected. The aio submission and completion error paths
do an explicit __sigqueue_free() to free the sigqueue.


fs/aio.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++--
fs/compat.c | 18 +++++++
include/linux/aio.h | 12 +++++
include/linux/aio_abi.h | 3 -
kernel/signal.c | 4 -
5 files changed, 144 insertions(+), 7 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c 2007-01-30 11:32:42.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c 2007-01-30 11:41:41.000000000 +0100
@@ -419,6 +419,7 @@ static struct kiocb fastcall *__aio_get_
req->ki_dtor = NULL;
req->private = NULL;
req->ki_iovec = NULL;
+ req->ki_notify.sigq = NULL;
INIT_LIST_HEAD(&req->ki_run_list);

/* Check if the completion queue has enough free space to
@@ -465,6 +466,12 @@ static inline void really_put_req(struct
req->ki_dtor(req);
if (req->ki_iovec != &req->ki_inline_vec)
kfree(req->ki_iovec);
+
+ /* Release task ref */
+ if (req->ki_notify.notify == SIGEV_THREAD_ID ||
+ req->ki_notify.notify == SIGEV_SIGNAL)
+ put_task_struct(req->ki_notify.target);
+
kmem_cache_free(kiocb_cachep, req);
ctx->reqs_active--;

@@ -916,6 +923,78 @@ void fastcall kick_iocb(struct kiocb *io
}
EXPORT_SYMBOL(kick_iocb);

+static int aio_send_signal(struct aio_notify *notify)
+{
+ struct sigqueue *sigq = notify->sigq;
+ struct siginfo *info = &sigq->info;
+ int ret;
+
+ memset(info, 0, sizeof(struct siginfo));
+
+ info->si_signo = notify->signo;
+ info->si_errno = 0;
+ info->si_code = SI_ASYNCIO;
+ info->si_pid = 0;
+ info->si_uid = 0;
+ info->si_value = notify->value;
+
+ if (notify->notify & SIGEV_THREAD_ID)
+ ret = send_sigqueue(notify->signo, sigq, notify->target);
+ else
+ ret = send_group_sigqueue(notify->signo, sigq, notify->target);
+
+ return ret;
+}
+
+static long aio_setup_sigevent(struct aio_notify *notify,
+ struct sigevent __user *user_event)
+{
+ sigevent_t event;
+ struct task_struct *target;
+
+ if (copy_from_user(&event, user_event, sizeof (event)))
+ return -EFAULT;
+
+ if (event.sigev_notify == SIGEV_NONE)
+ return 0;
+
+ notify->notify = event.sigev_notify;
+ notify->signo = event.sigev_signo;
+ notify->value = event.sigev_value;
+
+ rcu_read_lock();
+ target = sigevent_find_task(&event);
+
+ if (unlikely(!target))
+ goto out_unlock;
+
+ /*
+ * At this point, we know that notify is either SIGEV_SIGNAL or
+ * SIGEV_THREAD_ID and the target task is valid. So get a reference
+ * on the task, it will be dropped in really_put_req() when
+ * we're done with the request.
+ */
+ get_task_struct(target);
+ notify->target = target;
+ rcu_read_unlock();
+
+ notify->sigq = __sigqueue_alloc(current, GFP_KERNEL, 0);
+
+ /*
+ * The task ref will be released in really_put_req()
+ * when we dispose of the iocb later on in the submission
+ * path.
+ */
+ if (unlikely(!notify->sigq))
+ return -EAGAIN;
+
+ return 0;
+
+out_unlock:
+ read_unlock(&tasklist_lock);
+ return -EINVAL;
+}
+
/* aio_complete
* Called when the io request on the given iocb is complete.
* Returns true if this is the last user of the request. The
@@ -963,8 +1042,11 @@ int fastcall aio_complete(struct kiocb *
* cancelled requests don't get events, userland was given one
* when the event got cancelled.
*/
- if (kiocbIsCancelled(iocb))
+ if (kiocbIsCancelled(iocb)) {
+ if (iocb->ki_notify.sigq)
+ __sigqueue_free(iocb->ki_notify.sigq);
goto put_rq;
+ }

ring = kmap_atomic(info->ring_pages[0], KM_IRQ1);

@@ -994,6 +1076,15 @@ int fastcall aio_complete(struct kiocb *
kunmap_atomic(ring, KM_IRQ1);

pr_debug("added to ring %p at [%lu]\n", iocb, tail);
+
+ if (iocb->ki_notify.notify != SIGEV_NONE) {
+ ret = aio_send_signal(&iocb->ki_notify);
+
+ /* If signal generation failed, release the sigqueue */
+ if (ret)
+ __sigqueue_free(iocb->ki_notify.sigq);
+ }
+
put_rq:
/* everything turned out well, dispose of the aiocb. */
ret = __aio_put_req(ctx, iocb);
@@ -1545,8 +1636,7 @@ int fastcall io_submit_one(struct kioctx
ssize_t ret;

/* enforce forwards compatibility on users */
- if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2 ||
- iocb->aio_reserved3)) {
+ if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved3)) {
pr_debug("EINVAL: io_submit: reserve field set\n");
return -EINVAL;
}
@@ -1555,6 +1645,7 @@ int fastcall io_submit_one(struct kioctx
if (unlikely(
(iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
(iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
+ (iocb->aio_sigeventp != (unsigned long)iocb->aio_sigeventp) ||
((ssize_t)iocb->aio_nbytes < 0)
)) {
pr_debug("EINVAL: io_submit: overflow check\n");
@@ -1588,11 +1679,21 @@ int fastcall io_submit_one(struct kioctx
init_waitqueue_func_entry(&req->ki_wait.wait, aio_wake_function);
INIT_LIST_HEAD(&req->ki_wait.wait.task_list);
req->ki_run_list.next = req->ki_run_list.prev = NULL;
+ /* handle setting up the sigevent for POSIX AIO signals */
+ req->ki_notify.notify = SIGEV_NONE;
+
+ if (iocb->aio_sigeventp) {
+ ret = aio_setup_sigevent(&req->ki_notify,
+ (struct sigevent __user *)(unsigned long)
+ iocb->aio_sigeventp);
+ if (ret)
+ goto out_put_req;
+ }

ret = aio_setup_iocb(req);

if (ret)
- goto out_put_req;
+ goto out_sigqfree;

spin_lock_irq(&ctx->ctx_lock);
aio_run_iocb(req);
@@ -1605,6 +1706,11 @@ int fastcall io_submit_one(struct kioctx
aio_put_req(req); /* drop extra ref to req */
return 0;

+out_sigqfree:
+ /* Undo the sigqueue alloc if someting went bad */
+ if (req->ki_notify.sigq)
+ __sigqueue_free(req->ki_notify.sigq);
+
out_put_req:
aio_put_req(req); /* drop extra ref to req */
aio_put_req(req); /* drop i/o ref to req */
Index: linux-2.6.20-rc6-mm3/include/linux/aio_abi.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio_abi.h 2007-01-30 11:28:56.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio_abi.h 2007-01-30 11:41:41.000000000 +0100
@@ -82,8 +82,9 @@ struct iocb {
__u64 aio_nbytes;
__s64 aio_offset;

+ __u64 aio_sigeventp; /* pointer to struct sigevent */
+
/* extra parameters */
- __u64 aio_reserved2; /* TODO: use this for a (struct sigevent *) */
__u64 aio_reserved3;
}; /* 64 bytes */

Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h 2007-01-30 11:30:50.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h 2007-01-30 11:41:41.000000000 +0100
@@ -7,6 +7,7 @@
#include <linux/uio.h>

#include <asm/atomic.h>
+#include <asm/siginfo.h>

#define AIO_MAXSEGS 4
#define AIO_KIOGRP_NR_ATOMIC 8
@@ -54,6 +55,14 @@ struct kioctx;
#define kiocbIsCancelled(iocb) test_bit(KIF_CANCELLED, &(iocb)->ki_flags)
#define kiocbIsRestarted(iocb) test_bit(KIF_RESTARTED, &(iocb)->ki_flags)

+struct aio_notify {
+ struct task_struct *target;
+ __u16 signo;
+ __u16 notify;
+ sigval_t value;
+ struct sigqueue *sigq;
+};
+
/* is there a better place to document function pointer methods? */
/**
* ki_retry - iocb forward progress callback
@@ -123,6 +132,9 @@ struct kiocb {

struct list_head ki_list; /* the aio core uses this
* for cancellation */
+
+ /* to notify a process on I/O event */
+ struct aio_notify ki_notify;
};

#define is_sync_kiocb(iocb) ((iocb)->ki_key == KIOCB_SYNC_KEY)
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c 2007-01-30 11:31:55.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c 2007-01-30 11:41:41.000000000 +0100
@@ -665,6 +665,7 @@ compat_sys_io_submit(aio_context_t ctx_i
compat_uptr_t uptr;
struct iocb __user *user_iocb;
struct iocb tmp;
+ struct compat_sigevent __user *uevent;

if (unlikely(get_user(uptr, iocb + i))) {
ret = -EFAULT;
@@ -678,6 +679,23 @@ compat_sys_io_submit(aio_context_t ctx_i
break;
}

+ uevent = (struct compat_sigevent __user *)tmp.aio_sigeventp;
+
+ if (uevent) {
+ struct sigevent __user *event = NULL;
+ struct sigevent kevent;
+
+ event = compat_alloc_user_space(sizeof(*event));
+
+ if (get_compat_sigevent(&kevent, uevent) ||
+ copy_to_user(event, &kevent, sizeof(*event))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ tmp.aio_sigeventp = (__u64)event;
+ }
+
ret = io_submit_one(ctx, user_iocb, &tmp);
if (ret)
break;
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c 2007-01-30 11:41:39.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c 2007-01-30 11:41:41.000000000 +0100
@@ -1517,7 +1517,7 @@ int send_sigqueue(int sig, struct sigque
unsigned long flags;
int ret = 0;

- BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
+ BUG_ON(!(q->flags & SIGQUEUE_PREALLOC) && q->info.si_code != SI_ASYNCIO);

/*
* The rcu based delayed sighand destroy makes it possible to
@@ -1568,7 +1568,7 @@ send_group_sigqueue(int sig, struct sigq
unsigned long flags;
int ret = 0;

- BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
+ BUG_ON(!(q->flags & SIGQUEUE_PREALLOC) && q->info.si_code != SI_ASYNCIO);

read_lock(&tasklist_lock);
/* Since it_lock is held, p->sighand cannot be NULL. */

2007-02-01 09:39:33

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static


From: S?bastien Dugu? <[email protected]>

Make good_sigevent() non-static and rename it


Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
subsystems.


include/linux/signal.h | 1 +
kernel/posix-timers.c | 19 +------------------
kernel/signal.c | 24 ++++++++++++++++++++++++
3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h 2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-01-30 11:41:36.000000000 +0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *,

struct pt_regs;
extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);

extern struct kmem_cache *sighand_cachep;

Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c 2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c 2007-01-30 11:41:36.000000000 +0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
return ret;
}

-static struct task_struct * good_sigevent(sigevent_t * event)
-{
- struct task_struct *rtn = current->group_leader;
-
- if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
- (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
- rtn->tgid != current->tgid ||
- (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
- return NULL;
-
- if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
- ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
- return NULL;
-
- return rtn;
-}
-
void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
{
if ((unsigned) clock_id >= MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
new_timer->it_sigev_value = event.sigev_value;

read_lock(&tasklist_lock);
- if ((process = good_sigevent(&event))) {
+ if ((process = sigevent_find_task(&event))) {
/*
* We may be setting up this process for another
* thread. It may be exiting. To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c 2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c 2007-01-30 11:41:36.000000000 +0100
@@ -1213,6 +1213,30 @@ int group_send_sig_info(int sig, struct
return ret;
}

+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+ struct task_struct *task = NULL;
+
+ if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
+ return NULL;
+
+ if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
+ task = find_task_by_pid(event->sigev_notify_thread_id);
+
+ if (!task || task->tgid != current->tgid)
+ task = NULL;
+ } else if (event->sigev_notify == SIGEV_SIGNAL)
+ task = current->group_leader;
+
+ return task;
+}
+
/*
* kill_pgrp_info() sends a signal to a process group: this is what the tty
* control characters do (^C, ^Z etc)

2007-02-01 09:39:42

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit


From: S?bastien Dugu? <[email protected]>

compat_sys_io_submit() cleanup


Cleanup compat_sys_io_submit by duplicating some of the native syscall
logic in the compat layer and directly calling io_submit_one() instead
of fooling the syscall into thinking it is called from a native 64-bit
caller.

This eliminates:

- the overhead of copying the nr iocb pointers on the userspace stack

- the PAGE_SIZE/(sizeof(void *)) limit on the number of iocbs that
can be submitted.

This is also needed for the completion notification patch to avoid having
to rewrite each iocb on the caller stack for io_submit_one() to find the
sigevents.


compat.c | 61 ++++++++++++++++++++++++++++++++++---------------------------
1 file changed, 34 insertions(+), 27 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>

Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c 2007-01-30 10:05:08.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c 2007-01-30 11:30:29.000000000 +0100
@@ -644,40 +644,47 @@ out:
return ret;
}

-static inline long
-copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
-{
- compat_uptr_t uptr;
- int i;
-
- for (i = 0; i < nr; ++i) {
- if (get_user(uptr, ptr32 + i))
- return -EFAULT;
- if (put_user(compat_ptr(uptr), ptr64 + i))
- return -EFAULT;
- }
- return 0;
-}
-
-#define MAX_AIO_SUBMITS (PAGE_SIZE/sizeof(struct iocb *))
-
asmlinkage long
compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
{
- struct iocb __user * __user *iocb64;
- long ret;
+ struct kioctx *ctx;
+ long ret = 0;
+ int i;

if (unlikely(nr < 0))
return -EINVAL;

- if (nr > MAX_AIO_SUBMITS)
- nr = MAX_AIO_SUBMITS;
-
- iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
- ret = copy_iocb(nr, iocb, iocb64);
- if (!ret)
- ret = sys_io_submit(ctx_id, nr, iocb64);
- return ret;
+ if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
+ return -EFAULT;
+
+ ctx = lookup_ioctx(ctx_id);
+ if (unlikely(!ctx))
+ return -EINVAL;
+
+ for (i=0; i<nr; i++) {
+ compat_uptr_t uptr;
+ struct iocb __user *user_iocb;
+ struct iocb tmp;
+
+ if (unlikely(get_user(uptr, iocb + i))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ user_iocb = compat_ptr(uptr);
+
+ if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ ret = io_submit_one(ctx, user_iocb, &tmp);
+ if (ret)
+ break;
+ }
+
+ put_ioctx(ctx);
+ return i ? i: ret;
}

struct compat_ncp_mount_data {

2007-02-01 09:40:14

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 7/7][AIO] - Add listio syscall support


From: Bharata B Rao <[email protected]>

This patch provides POSIX listio support by means of a new system call.

long lio_submit(aio_context_t ctx_id, int mode, long nr,
struct iocb __user * __user *iocbpp, struct sigevent __user *event)

This system call is similar to the io_submit() system call, but takes
two more arguments.

'mode' argument can be LIO_WAIT or LIO_NOWAIT.

'event' argument specifies the signal notification mechanism.

This patch is built on support provided by the aio signal notification
patch by Sebastien. The following two structures together provide
the support for grouping iocbs belonging to a list (lio).

struct aio_notify {
struct task_struct *target;
__u16 signo;
__u16 notify;
sigval_t value;
struct sigqueue *sigq;
};

struct lio_event {
atomic_t lio_users;
struct aio_notify lio_notify;
};

A single lio_event struct is maintained for the list of iocbs.
lio_users holds the number of requests attached to this lio and lio_notify
has the necessary information for generating completion notification
signal.

If the mode is LIO_WAIT, the event argument is ignored and the system
call waits until all the requests of the lio are completed.

If the mode is LIO_NOWAIT, the system call returns immediately after
submitting the io requests and may optionally notify the process on
list io completion depending on the event argument.

Signed-off-by: S?bastien Dugu? <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Bharata B Rao <[email protected]>
---

arch/i386/kernel/syscall_table.S | 1
arch/x86_64/ia32/ia32entry.S | 5 -
fs/aio.c | 193 ++++++++++++++++++++++++++++++++++-----
fs/compat.c | 125 +++++++++++++++++++++----
include/asm-i386/unistd.h | 3
include/asm-x86_64/unistd.h | 4
include/linux/aio.h | 14 ++
include/linux/aio_abi.h | 5 +
include/linux/syscalls.h | 2
9 files changed, 307 insertions(+), 45 deletions(-)

Index: linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S
===================================================================
--- linux-2.6.20-rc6-mm3.orig/arch/i386/kernel/syscall_table.S 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S 2007-01-31 15:58:55.000000000 +0100
@@ -320,3 +320,4 @@ ENTRY(sys_call_table)
.long sys_getcpu
.long sys_epoll_pwait
.long sys_lutimesat /* 320 */
+ .long sys_lio_submit
Index: linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S
===================================================================
--- linux-2.6.20-rc6-mm3.orig/arch/x86_64/ia32/ia32entry.S 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S 2007-01-31 15:58:55.000000000 +0100
@@ -714,8 +714,11 @@ ia32_sys_call_table:
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
- .quad sys_tee
+ .quad sys_tee /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
+ .quad quiet_ni_syscall /* sys_epoll_wait */
+ .quad quiet_ni_syscall /* 320: sys_lutimesat */
+ .quad compat_sys_lio_submit
ia32_syscall_end:
Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c 2007-01-31 16:00:57.000000000 +0100
@@ -416,6 +416,7 @@ static struct kiocb fastcall *__aio_get_
req->ki_ctx = ctx;
req->ki_cancel = NULL;
req->ki_retry = NULL;
+ req->ki_lio = NULL;
req->ki_dtor = NULL;
req->private = NULL;
req->ki_iovec = NULL;
@@ -995,6 +996,72 @@ out_unlock:
return -EINVAL;
}

+/*
+ * lio_notify
+ * When all iocbs belonging to an lio are complete, this initiates
+ * the completion notification for the lio.
+ *
+ * NOTE: lio is freed here after the last iocb completes, but only
+ * for LIO_NOWAIT case. In case of LIO_WAIT, the submitting thread
+ * waits for iocbs to complete and later frees the lio.
+ */
+void lio_notify(struct lio_event *lio)
+{
+ int ret;
+
+ ret = atomic_dec_and_test(&lio->lio_users);
+
+ if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) {
+ /* last one -> notify process */
+ if (aio_send_signal(&lio->lio_notify))
+ __sigqueue_free(lio->lio_notify.sigq);
+ kfree(lio);
+ }
+}
+
+/*
+ * lio_create
+ * Allocates a lio_event structure which groups the iocbs belonging
+ * to the lio and sets up the completion notification.
+ *
+ * Case 1: LIO_NOWAIT and NULL sigevent - No need to group the iocbs
+ * and hence lio_event is not created.
+ * Case 2: LIO_NOWAIT and sigvent - lio_event is created and notification
+ * mechanism is setup.
+ * Case 3: LIO_WAIT - lio_event is created and sigevent is ignored.
+ */
+struct lio_event *lio_create(struct sigevent __user *user_event,
+ int mode)
+{
+ int ret = 0;
+ struct lio_event *lio = NULL;
+
+ if (unlikely((mode == LIO_NOWAIT) && !user_event))
+ return lio;
+
+ lio = kzalloc(sizeof(*lio), GFP_KERNEL);
+
+ if (!lio)
+ return ERR_PTR(-EAGAIN);
+
+ /*
+ * Grab an initial ref on the lio to avoid races between
+ * submission and completion.
+ */
+ atomic_set(&lio->lio_users, 1);
+
+ lio->lio_notify.notify = SIGEV_NONE;
+
+ if (user_event && (mode == LIO_NOWAIT)) {
+ ret = aio_setup_sigevent(&lio->lio_notify, user_event);
+ if (ret) {
+ kfree(lio);
+ return ERR_PTR(ret);
+ }
+ }
+ return lio;
+}
+
/* aio_complete
* Called when the io request on the given iocb is complete.
* Returns true if this is the last user of the request. The
@@ -1043,6 +1110,9 @@ int fastcall aio_complete(struct kiocb *
* when the event got cancelled.
*/
if (kiocbIsCancelled(iocb)) {
+ if (iocb->ki_lio)
+ lio_notify(iocb->ki_lio);
+
if (iocb->ki_notify.sigq)
__sigqueue_free(iocb->ki_notify.sigq);
goto put_rq;
@@ -1085,6 +1155,8 @@ int fastcall aio_complete(struct kiocb *
__sigqueue_free(iocb->ki_notify.sigq);
}

+ if (iocb->ki_lio)
+ lio_notify(iocb->ki_lio);
put_rq:
/* everything turned out well, dispose of the aiocb. */
ret = __aio_put_req(ctx, iocb);
@@ -1629,7 +1701,7 @@ static int aio_wake_function(wait_queue_
}

int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
- struct iocb *iocb)
+ struct iocb *iocb, struct lio_event *lio)
{
struct kiocb *req;
struct file *file;
@@ -1690,6 +1762,9 @@ int fastcall io_submit_one(struct kioctx
goto out_put_req;
}

+ /* Attach this iocb to its lio */
+ req->ki_lio = lio;
+
ret = aio_setup_iocb(req);

if (ret)
@@ -1717,6 +1792,48 @@ out_put_req:
return ret;
}

+static int io_submit_group(struct kioctx *ctx, long nr,
+ struct iocb __user * __user *iocbpp, struct lio_event *lio)
+{
+ int i;
+ long ret = 0;
+
+ /*
+ * AKPM: should this return a partial result if some of the IOs were
+ * successfully submitted?
+ */
+ for (i = 0; i < nr; i++) {
+ struct iocb __user *user_iocb;
+ struct iocb tmp;
+
+ if (unlikely(__get_user(user_iocb, iocbpp + i))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ if (lio)
+ atomic_inc(&lio->lio_users);
+
+ ret = io_submit_one(ctx, user_iocb, &tmp, lio);
+ if (ret) {
+ if (lio) {
+ /*
+ * In case of listio, continue with
+ * the subsequent requests
+ */
+ atomic_dec(&lio->lio_users);
+ } else
+ break;
+ }
+ }
+ return i ? i : ret;
+}
+
/* sys_io_submit:
* Queue the nr iocbs pointed to by iocbpp for processing. Returns
* the number of iocbs queued. May return -EINVAL if the aio_context
@@ -1734,7 +1851,6 @@ asmlinkage long sys_io_submit(aio_contex
{
struct kioctx *ctx;
long ret = 0;
- int i;

if (unlikely((nr*sizeof(*iocbpp)) < 0))
return -EINVAL;
@@ -1748,31 +1864,61 @@ asmlinkage long sys_io_submit(aio_contex
return -EINVAL;
}

- /*
- * AKPM: should this return a partial result if some of the IOs were
- * successfully submitted?
- */
- for (i=0; i<nr; i++) {
- struct iocb __user *user_iocb;
- struct iocb tmp;
+ ret = io_submit_group(ctx, nr, iocbpp, NULL);

- if (unlikely(__get_user(user_iocb, iocbpp + i))) {
- ret = -EFAULT;
- break;
- }
+ put_ioctx(ctx);
+ return ret;
+}

- if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
- ret = -EFAULT;
- break;
- }
+asmlinkage long sys_lio_submit(aio_context_t ctx_id, int mode, long nr,
+ struct iocb __user * __user *iocbpp, struct sigevent __user *event)
+{
+ struct kioctx *ctx;
+ struct lio_event *lio = NULL;
+ long ret = 0;

- ret = io_submit_one(ctx, user_iocb, &tmp);
- if (ret)
- break;
+ if (unlikely((nr*sizeof(*iocbpp)) < 0))
+ return -EINVAL;
+
+ if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
+ return -EFAULT;
+
+ ctx = lookup_ioctx(ctx_id);
+ if (unlikely(!ctx)) {
+ pr_debug("EINVAL: lio_submit: invalid context id\n");
+ return -EINVAL;
+ }
+
+ lio = lio_create(event, mode);
+
+ ret = PTR_ERR(lio);
+ if (IS_ERR(lio))
+ goto out_put_ctx;
+
+ ret = io_submit_group(ctx, nr, iocbpp, lio);
+
+ /* If we failed to submit even one request just return */
+ if (ret < 0 ) {
+ if (lio)
+ kfree(lio);
+ goto out_put_ctx;
+ }
+
+ /*
+ * Drop extra ref on the lio now that we're done submitting requests.
+ */
+ if (lio)
+ lio_notify(lio);
+
+ if (mode == LIO_WAIT) {
+ wait_event(ctx->wait, atomic_read(&lio->lio_users) == 0);
+ kfree(lio);
}

+out_put_ctx:
put_ioctx(ctx);
- return i ? i : ret;
+
+ return ret;
}

/* lookup_kiocb
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c 2007-01-31 16:03:02.000000000 +0100
@@ -644,24 +644,13 @@ out:
return ret;
}

-asmlinkage long
-compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
+static int compat_io_submit_group(struct kioctx *ctx, long nr,
+ u32 __user *iocb, struct lio_event *lio)
{
- struct kioctx *ctx;
- long ret = 0;
int i;
+ long ret = 0;

- if (unlikely((nr * sizeof(u32)) < 0))
- return -EINVAL;
-
- if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
- return -EFAULT;
-
- ctx = lookup_ioctx(ctx_id);
- if (unlikely(!ctx))
- return -EINVAL;
-
- for (i=0; i<nr; i++) {
+ for (i = 0; i < nr; i++) {
compat_uptr_t uptr;
struct iocb __user *user_iocb;
struct iocb tmp;
@@ -696,13 +685,105 @@ compat_sys_io_submit(aio_context_t ctx_i
tmp.aio_sigeventp = (__u64)event;
}

- ret = io_submit_one(ctx, user_iocb, &tmp);
- if (ret)
- break;
+ if (lio)
+ atomic_inc(&lio->lio_users);
+
+ ret = io_submit_one(ctx, user_iocb, &tmp, lio);
+ if (ret) {
+ if (lio) {
+ /*
+ * In case of listio, continue with
+ * the subsequent requests
+ */
+ atomic_dec(&lio->lio_users);
+ } else
+ break;
+ }
+
+
}
+ return i ? i : ret;
+}

+asmlinkage long
+compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
+{
+ struct kioctx *ctx;
+ long ret = 0;
+
+ if (unlikely((nr*sizeof(u32)) < 0))
+ return -EINVAL;
+
+ if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
+ return -EFAULT;
+
+ ctx = lookup_ioctx(ctx_id);
+ if (unlikely(!ctx))
+ return -EINVAL;
+
+ ret = compat_io_submit_group(ctx, nr, iocb, NULL);
put_ioctx(ctx);
- return i ? i: ret;
+ return ret;
+}
+
+asmlinkage long
+compat_sys_lio_submit(aio_context_t ctx_id, int mode, int nr, u32 __user *iocb,
+ struct compat_sigevent __user *sig_user)
+{
+ struct kioctx *ctx;
+ struct lio_event *lio = NULL;
+ struct sigevent __user *event = NULL;
+ long ret = 0;
+
+ if (unlikely((nr*sizeof(u32)) < 0))
+ return -EINVAL;
+
+ if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
+ return -EFAULT;
+
+ ctx = lookup_ioctx(ctx_id);
+ if (unlikely(!ctx))
+ return -EINVAL;
+
+ if (sig_user) {
+ struct sigevent kevent;
+ event = compat_alloc_user_space(sizeof(struct sigevent));
+ if (get_compat_sigevent(&kevent, sig_user) ||
+ copy_to_user(event, &kevent, sizeof(struct sigevent))) {
+ ret = -EFAULT;
+ goto out_put_ctx;
+ }
+ }
+
+ lio = lio_create(event, mode);
+
+ ret = PTR_ERR(lio);
+ if (IS_ERR(lio))
+ goto out_put_ctx;
+
+ ret = compat_io_submit_group(ctx, nr, iocb, lio);
+
+ /* If we failed to submit even one request just return */
+ if (ret < 0) {
+ if (lio)
+ kfree(lio);
+ goto out_put_ctx;
+ }
+
+ /*
+ * Drop extra ref on the lio now that we're done submitting requests.
+ */
+ if (lio)
+ lio_notify(lio);
+
+ if (mode == LIO_WAIT) {
+ wait_event(ctx->wait, atomic_read(&lio->lio_users) == 0);
+ kfree(lio);
+ }
+
+out_put_ctx:
+ put_ioctx(ctx);
+ return ret;
}

struct compat_ncp_mount_data {
Index: linux-2.6.20-rc6-mm3/include/asm-i386/unistd.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/asm-i386/unistd.h 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/asm-i386/unistd.h 2007-01-31 15:58:55.000000000 +0100
@@ -326,10 +326,11 @@
#define __NR_getcpu 318
#define __NR_epoll_pwait 319
#define __NR_lutimesat 320
+#define __NR_lio_submit 321

#ifdef __KERNEL__

-#define NR_syscalls 321
+#define NR_syscalls 322

#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.20-rc6-mm3/include/asm-x86_64/unistd.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/asm-x86_64/unistd.h 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/asm-x86_64/unistd.h 2007-01-31 15:58:55.000000000 +0100
@@ -619,8 +619,10 @@ __SYSCALL(__NR_sync_file_range, sys_sync
__SYSCALL(__NR_vmsplice, sys_vmsplice)
#define __NR_move_pages 279
__SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_lio_submit 280
+__SYSCALL(__NR_lio_submit, sys_lio_submit)

-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_lio_submit

#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.20-rc6-mm3/include/linux/aio_abi.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio_abi.h 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio_abi.h 2007-01-31 15:58:55.000000000 +0100
@@ -45,6 +45,11 @@ enum {
IOCB_CMD_PWRITEV = 8,
};

+enum {
+ LIO_WAIT = 0,
+ LIO_NOWAIT = 1,
+};
+
/* read() from /dev/aio returns these structures. */
struct io_event {
__u64 data; /* the data field from the iocb */
Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h 2007-01-31 15:58:55.000000000 +0100
@@ -63,6 +63,11 @@ struct aio_notify {
struct sigqueue *sigq;
};

+struct lio_event {
+ atomic_t lio_users;
+ struct aio_notify lio_notify;
+};
+
/* is there a better place to document function pointer methods? */
/**
* ki_retry - iocb forward progress callback
@@ -117,6 +122,8 @@ struct kiocb {
__u64 ki_user_data; /* user's data for completion */
struct wait_bit_queue ki_wait;
loff_t ki_pos;
+ /* lio this iocb might be attached to */
+ struct lio_event *ki_lio;

atomic_t ki_bio_count; /* num bio used for this iocb */
void *private;
@@ -225,12 +232,15 @@ struct mm_struct;
extern void FASTCALL(exit_aio(struct mm_struct *mm));
extern struct kioctx *lookup_ioctx(unsigned long ctx_id);
extern int FASTCALL(io_submit_one(struct kioctx *ctx,
- struct iocb __user *user_iocb, struct iocb *iocb));
+ struct iocb __user *user_iocb, struct iocb *iocb,
+ struct lio_event *lio));
+struct lio_event *lio_create(struct sigevent __user *user_event, int mode);
+void lio_notify(struct lio_event *lio);

/* semi private, but used by the 32bit emulations: */
struct kioctx *lookup_ioctx(unsigned long ctx_id);
int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
- struct iocb *iocb));
+ struct iocb *iocb, struct lio_event *lio));

#define get_ioctx(kioctx) do { \
BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
Index: linux-2.6.20-rc6-mm3/include/linux/syscalls.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/syscalls.h 2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/syscalls.h 2007-01-31 15:58:55.000000000 +0100
@@ -317,6 +317,8 @@ asmlinkage long sys_io_getevents(aio_con
struct timespec __user *timeout);
asmlinkage long sys_io_submit(aio_context_t, long,
struct iocb __user * __user *);
+asmlinkage long sys_lio_submit(aio_context_t, int, long,
+ struct iocb __user * __user *, struct sigevent __user *);
asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
struct io_event __user *result);
asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd,

2007-02-01 09:40:14

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 2/7][AIO] - fix aio.h includes


From: S?bastien Dugu? <[email protected]>

Fix the double inclusion of linux/uio.h in linux/aio.h



aio.h | 1 -
1 file changed, 1 deletion(-)

Signed-off-by: S?bastien Dugu? <[email protected]>

Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h 2007-01-30 10:05:08.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h 2007-01-30 10:11:22.000000000 +0100
@@ -7,7 +7,6 @@
#include <linux/uio.h>

#include <asm/atomic.h>
-#include <linux/uio.h>

#define AIO_MAXSEGS 4
#define AIO_KIOGRP_NR_ATOMIC 8

2007-02-01 09:40:16

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 3/7][AIO] - Fix access_ok() checks


From: S?bastien Dugu? <[email protected]>

Fix access_ok() checks in sys_io_submit() and
compat_sys_io_submit()


sys_io_submit() and compat_sys_io_submit() check for the number of
requests (nr) being positive, but the following access_ok() call uses
nr * sizeof(ptr) which can overlow to a negative number.

Just make sure that nr * sizeof(ptr) is positive in the first place.


aio.c | 2 +-
compat.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c 2007-01-30 11:28:56.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c 2007-01-30 11:32:42.000000000 +0100
@@ -1630,7 +1630,7 @@ asmlinkage long sys_io_submit(aio_contex
long ret = 0;
int i;

- if (unlikely(nr < 0))
+ if (unlikely((nr*sizeof(*iocbpp)) < 0))
return -EINVAL;

if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c 2007-01-30 11:30:29.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c 2007-01-30 11:31:55.000000000 +0100
@@ -651,7 +651,7 @@ compat_sys_io_submit(aio_context_t ctx_i
long ret = 0;
int i;

- if (unlikely(nr < 0))
+ if (unlikely((nr * sizeof(u32)) < 0))
return -EINVAL;

if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))

2007-02-01 09:40:14

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 5/7][AIO] - Make __sigqueue_free() and __sigqueue_alloc() non static


From: S?bastien Dugu? <[email protected]>

Make __sigqueue_alloc() and __sigqueue_free() non static

Allow subsystems to directly call into __sigqueue_alloc() and __sigqueue_free.
This is used by the AIO signal notification patch.


include/linux/signal.h | 3 +++
kernel/signal.c | 6 +++---
2 files changed, 6 insertions(+), 3 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h 2007-01-30 11:41:36.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-01-30 11:41:39.000000000 +0100
@@ -241,6 +241,9 @@ extern int sigprocmask(int, sigset_t *,
struct pt_regs;
extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct pt_regs *regs, void *cookie);
extern struct task_struct * sigevent_find_task(sigevent_t *);
+extern struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+ int override_rlimit);
+extern void __sigqueue_free(struct sigqueue *q);

extern struct kmem_cache *sighand_cachep;

Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c 2007-01-30 11:41:36.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c 2007-01-30 11:41:39.000000000 +0100
@@ -268,8 +268,8 @@ next_signal(struct sigpending *pending,
return sig;
}

-static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
- int override_rlimit)
+struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+ int override_rlimit)
{
struct sigqueue *q = NULL;
struct user_struct *user;
@@ -295,7 +295,7 @@ static struct sigqueue *__sigqueue_alloc
return(q);
}

-static void __sigqueue_free(struct sigqueue *q)
+void __sigqueue_free(struct sigqueue *q)
{
if (q->flags & SIGQUEUE_PREALLOC)
return;

2007-02-02 18:00:54

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static

On 02/01, S?bastien Dugu? wrote:
>
> +struct task_struct * sigevent_find_task(sigevent_t * event)
> +{
> + struct task_struct *task = NULL;
> +
> + if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> + return NULL;
> +
> + if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
> + task = find_task_by_pid(event->sigev_notify_thread_id);
> +
> + if (!task || task->tgid != current->tgid)
> + task = NULL;
> + } else if (event->sigev_notify == SIGEV_SIGNAL)
> + task = current->group_leader;
> +
> + return task;
> +}

I am afraid this is still not right. Consider

->sigev_notify == SIGEV_THREAD_ID | RANDOM_BIT

Now, the second "if (SIGEV_THREAD_ID)" returns a valid task. However,

really_put_req:

if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
put_task_struct();

doesn't work, so we have task_struct leak.

Worse, this breaks posix-timers. Note that posix-timers allow SIGEV_NONE,
the timer is not queued in that case, we shouldn't do ->sigev_signo check.
This means that aio should check SIGEV_NONE itself.

Also, it is critical for posix-timers that SIGEV_THREAD_ID doesn't come
with another bit (like in the example below), note the code like

if (sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
...

IOW: good_sigevent() in its current form is very cryptic, and it _really_
needs a cleanup, but we should not change its behaviour.

Apart from this, I don't see other problems in the signal related code in
this series.

Oleg.

2007-02-05 11:10:45

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static


Hi Oleg,

thanks for your comments.

On Fri, 2 Feb 2007 21:00:39 +0300 Oleg Nesterov <[email protected]> wrote:

> On 02/01, S?bastien Dugu? wrote:
> >
> > +struct task_struct * sigevent_find_task(sigevent_t * event)
> > +{
> > + struct task_struct *task = NULL;
> > +
> > + if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> > + return NULL;
> > +
> > + if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
> > + task = find_task_by_pid(event->sigev_notify_thread_id);
> > +
> > + if (!task || task->tgid != current->tgid)
> > + task = NULL;
> > + } else if (event->sigev_notify == SIGEV_SIGNAL)
> > + task = current->group_leader;
> > +
> > + return task;
> > +}
>
> I am afraid this is still not right. Consider
>
> ->sigev_notify == SIGEV_THREAD_ID | RANDOM_BIT
>
> Now, the second "if (SIGEV_THREAD_ID)" returns a valid task. However,
>
> really_put_req:
>
> if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
> put_task_struct();
>
> doesn't work, so we have task_struct leak.

Right, I'll revert back to the old code with cleanups.

>
> Worse, this breaks posix-timers. Note that posix-timers allow SIGEV_NONE,
> the timer is not queued in that case, we shouldn't do ->sigev_signo check.
> This means that aio should check SIGEV_NONE itself.
>
> Also, it is critical for posix-timers that SIGEV_THREAD_ID doesn't come
> with another bit (like in the example below), note the code like
>
> if (sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
> ...
>
> IOW: good_sigevent() in its current form is very cryptic, and it _really_
> needs a cleanup, but we should not change its behaviour.

Yep. I must admit that I didn't pay enough attention to this 10-liner,
but in the end it rightfully backfired on me.

>
> Apart from this, I don't see other problems in the signal related code in
> this series.
>
> Oleg.
>

Thanks again for your review.

S?bastien.

2007-02-05 12:19:55

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm 4/7][AIO] Resend - Make good_sigevent() non-static


Andrew,

Could you replace make-good_sigevent-non-static.patch with this
patch, as it fixes some problems reported by Oleg Nesterov.

Oleg, chime in if you still see something I missed.

Hopefully I will have gotten it right this time.

Sorry for the hassle.

S?bastien.


From: S?bastien Dugu? <[email protected]>

Make good_sigevent() non-static and rename it


Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
subsystems.


include/linux/signal.h | 1 +
kernel/posix-timers.c | 19 +------------------
kernel/signal.c | 24 ++++++++++++++++++++++++
3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h 2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-02-05 12:25:12.000000000 +0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *,

struct pt_regs;
extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);

extern struct kmem_cache *sighand_cachep;

Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c 2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c 2007-01-30 11:41:36.000000000 +0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
return ret;
}

-static struct task_struct * good_sigevent(sigevent_t * event)
-{
- struct task_struct *rtn = current->group_leader;
-
- if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
- (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
- rtn->tgid != current->tgid ||
- (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
- return NULL;
-
- if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
- ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
- return NULL;
-
- return rtn;
-}
-
void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
{
if ((unsigned) clock_id >= MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
new_timer->it_sigev_value = event.sigev_value;

read_lock(&tasklist_lock);
- if ((process = good_sigevent(&event))) {
+ if ((process = sigevent_find_task(&event))) {
/*
* We may be setting up this process for another
* thread. It may be exiting. To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c 2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c 2007-02-05 12:27:33.000000000 +0100
@@ -1213,6 +1213,34 @@ int group_send_sig_info(int sig, struct
return ret;
}

+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+ struct task_struct *task = current->group_leader;
+
+ if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
+ if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
+ return NULL;
+ }
+
+ if (event->sigev_notify & SIGEV_THREAD_ID) {
+ if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)
+ return NULL;
+
+ task = find_task_by_pid(event->sigev_notify_thread_id);
+
+ if (!task || task->tgid != current->tgid)
+ task = NULL;
+ }
+
+ return task;
+}
+
/*
* kill_pgrp_info() sends a signal to a process group: this is what the tty
* control characters do (^C, ^Z etc)

2007-02-05 13:41:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH -mm 4/7][AIO] Resend - Make good_sigevent() non-static

On 02/05, S?bastien Dugu? wrote:
>
> +struct task_struct * sigevent_find_task(sigevent_t * event)
> +{
> + struct task_struct *task = current->group_leader;
> +
> + if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
> + if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> + return NULL;
> + }
> +
> + if (event->sigev_notify & SIGEV_THREAD_ID) {
> + if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)
> + return NULL;
> +
> + task = find_task_by_pid(event->sigev_notify_thread_id);
> +
> + if (!task || task->tgid != current->tgid)
> + task = NULL;
> + }
> +
> + return task;
> +}

I think this patch is correct, and the code is much more readable than
the old good_sigevet().

But please don't forget that PATCH 6/7 still needs a small fixup, we can
leak a task_struct in really_put_req()

if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
put_task_struct(req->ki_notify.target);

because without SIGEV_THREAD_ID ->sigev_notify can have any bit/bits set,
even SIGEV_NONE (along with some other bit, aio_setup_sigevent checks for
== SIGEV_NONE).

Oleg.

2007-02-05 16:02:14

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak


Andrew,

here is an incremental patch to fix a possible ref leak when users
do weird things with the ->sigev_notify flags.

Thanks Oleg for spotting this.

S?bastien.


From: S?bastien Dugu? <[email protected]>

Fix AIO completion signal notification possible ref leak

Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.

aio.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

Signed-off-by: S?bastien Dugu? <[email protected]>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c 2007-02-05 16:50:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c 2007-02-05 16:53:43.000000000 +0100
@@ -939,7 +939,7 @@ static int aio_send_signal(struct aio_no
info->si_uid = 0;
info->si_value = notify->value;

- if (notify->notify & SIGEV_THREAD_ID)
+ if (notify->notify == SIGEV_THREAD_ID)
ret = send_sigqueue(notify->signo, sigq, notify->target);
else
ret = send_group_sigqueue(notify->signo, sigq, notify->target);
@@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
if (event.sigev_notify == SIGEV_NONE)
return 0;

+ if (event.sigev_notify != SIGEV_SIGNAL &&
+ event.sigev_notify != SIGEV_THREAD_ID)
+ return -EINVAL;
+
notify->notify = event.sigev_notify;
notify->signo = event.sigev_signo;
notify->value = event.sigev_value;

2007-02-05 17:11:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak

On 02/05, S?bastien Dugu? wrote:
>
> Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
> namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.

I think this is correct, but I have another concern (most probably I just
confused looking at non-applied patch), could you re-check?

> @@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
> if (event.sigev_notify == SIGEV_NONE)
> return 0;
>
> + if (event.sigev_notify != SIGEV_SIGNAL &&
> + event.sigev_notify != SIGEV_THREAD_ID)
> + return -EINVAL;
> +
> notify->notify = event.sigev_notify;
> notify->signo = event.sigev_signo;
> notify->value = event.sigev_value;

Ok. But what if sigevent_find_task() fails after that? Doesn't this mean
that really_put_req() will do put_task_struct(NULL) ?

Oleg.

2007-02-06 08:33:08

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak

On Mon, 5 Feb 2007 20:13:35 +0300 Oleg Nesterov <[email protected]> wrote:

> On 02/05, S?bastien Dugu? wrote:
> >
> > Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
> > namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.
>
> I think this is correct, but I have another concern (most probably I just
> confused looking at non-applied patch), could you re-check?
>
> > @@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
> > if (event.sigev_notify == SIGEV_NONE)
> > return 0;
> >
> > + if (event.sigev_notify != SIGEV_SIGNAL &&
> > + event.sigev_notify != SIGEV_THREAD_ID)
> > + return -EINVAL;
> > +
> > notify->notify = event.sigev_notify;
> > notify->signo = event.sigev_signo;
> > notify->value = event.sigev_value;
>
> Ok. But what if sigevent_find_task() fails after that? Doesn't this mean
> that really_put_req() will do put_task_struct(NULL) ?
>

Argh, right, a patch to fix that and a couple of other corner cases to
follow soon.

Thanks Oleg for looking through this.

S?bastien.

2007-02-06 09:24:13

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups


Andrew, here is another incremental patch which does a bit of cleanup
as well as fixing a possible release on a task ref that was not taken.

Thanks,

S?bastien.


From: S?bastien Dugu? <[email protected]>

AIO completion signal notification misc fixes and cleanups

This patches cleans up the notification path and fixes a possible
release on a task ref that was not taken in aio_setup_sigevent().


aio.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c 2007-02-05 16:53:43.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c 2007-02-06 09:33:55.000000000 +0100
@@ -469,8 +469,7 @@ static inline void really_put_req(struct
kfree(req->ki_iovec);

/* Release task ref */
- if (req->ki_notify.notify == SIGEV_THREAD_ID ||
- req->ki_notify.notify == SIGEV_SIGNAL)
+ if (req->ki_notify.notify != SIGEV_NONE)
put_task_struct(req->ki_notify.target);

kmem_cache_free(kiocb_cachep, req);
@@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
rcu_read_lock();
target = sigevent_find_task(&event);

- if (unlikely(!target))
+ if (unlikely(!target)) {
+ /*
+ * Revert notify to SIGEV_NONE so that really_put_req()
+ * knows that no ref has been taken on a task.
+ */
+ notify->notify = SIGEV_NONE;
goto out_unlock;
+ }

/*
* At this point, we know that notify is either SIGEV_SIGNAL or
@@ -996,7 +1001,7 @@ static long aio_setup_sigevent(struct ai
return 0;

out_unlock:
- read_unlock(&tasklist_lock);
+ rcu_read_unlock();
return -EINVAL;
}

@@ -1763,7 +1768,7 @@ int fastcall io_submit_one(struct kioctx
(struct sigevent __user *)(unsigned long)
iocb->aio_sigeventp);
if (ret)
- goto out_put_req;
+ goto out_sigqfree;
}

/* Attach this iocb to its lio */

2007-02-06 11:06:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups

On 02/06, S?bastien Dugu? wrote:
>
> @@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
> rcu_read_lock();
> target = sigevent_find_task(&event);
>
> - if (unlikely(!target))
> + if (unlikely(!target)) {
> + /*
> + * Revert notify to SIGEV_NONE so that really_put_req()
> + * knows that no ref has been taken on a task.
> + */
> + notify->notify = SIGEV_NONE;
> goto out_unlock;
> + }

Very minor nit, feel free to ignore.

Isn't it better to move "notify->* = event.*;" down, when we know that
target != NULL. Imho, a bit easier to follow. This way we don't need to
reset notify->notify = SIGEV_NONE.

aio_setup_sigevent() relies on the fact that ->notify = SIGEV_NONE on
entry anyway.

Oleg.

2007-02-06 11:41:10

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups

On Tue, 6 Feb 2007 14:05:39 +0300 Oleg Nesterov <[email protected]> wrote:

> On 02/06, S?bastien Dugu? wrote:
> >
> > @@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
> > rcu_read_lock();
> > target = sigevent_find_task(&event);
> >
> > - if (unlikely(!target))
> > + if (unlikely(!target)) {
> > + /*
> > + * Revert notify to SIGEV_NONE so that really_put_req()
> > + * knows that no ref has been taken on a task.
> > + */
> > + notify->notify = SIGEV_NONE;
> > goto out_unlock;
> > + }
>
> Very minor nit, feel free to ignore.
>
> Isn't it better to move "notify->* = event.*;" down, when we know that
> target != NULL. Imho, a bit easier to follow. This way we don't need to
> reset notify->notify = SIGEV_NONE.
>
> aio_setup_sigevent() relies on the fact that ->notify = SIGEV_NONE on
> entry anyway.

Yep, right, it will make things cleaner.

Thanks,

S?bastien.

2007-02-06 11:58:30

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH -mm][AIO] AIO completion signal notification small cleanup


Andrew, one more cleanup to aio_setup_sigevent() to make it more readable.

S?bastien.


From: S?bastien Dugu? <[email protected]>

AIO completion signal notification small cleanup

This patch cleans up aio_setup_sigevent() to make it more readable.

aio.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)

Signed-off-by: S?bastien Dugu? <[email protected]>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c 2007-02-06 09:33:55.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c 2007-02-06 12:43:52.000000000 +0100
@@ -962,21 +962,11 @@ static long aio_setup_sigevent(struct ai
event.sigev_notify != SIGEV_THREAD_ID)
return -EINVAL;

- notify->notify = event.sigev_notify;
- notify->signo = event.sigev_signo;
- notify->value = event.sigev_value;
-
rcu_read_lock();
target = sigevent_find_task(&event);

- if (unlikely(!target)) {
- /*
- * Revert notify to SIGEV_NONE so that really_put_req()
- * knows that no ref has been taken on a task.
- */
- notify->notify = SIGEV_NONE;
+ if (unlikely(!target))
goto out_unlock;
- }

/*
* At this point, we know that notify is either SIGEV_SIGNAL or
@@ -988,6 +978,9 @@ static long aio_setup_sigevent(struct ai
notify->target = target;
rcu_read_unlock();

+ notify->notify = event.sigev_notify;
+ notify->signo = event.sigev_signo;
+ notify->value = event.sigev_value;
notify->sigq = __sigqueue_alloc(current, GFP_KERNEL, 0);

/*