2004-11-03 09:01:05

by Suparna Bhattacharya

Subject: [PATCH 0/6] AIO wait bit support


The series of patches that follow integrate AIO with
William Lee Irwin's wait bit changes, to support asynchronous
page waits.

[1] modify-wait-bit-action-args.patch
Add a wait queue arg to the wait_bit action() routine

[2] lock_page_slow.patch
Rename __lock_page to lock_page_slow

[3] init-wait-bit-key.patch
Interfaces to init and to test wait bit key

[4] tsk-default-io-wait.patch
Add default io wait bit field in task struct

[5] aio-wait-bit.patch
AIO wake bit and AIO wait bit

[6] aio-wait-page.patch
AIO wait page and lock page

Regards
Suparna

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India


2004-11-03 09:06:14

by Suparna Bhattacharya

Subject: Re: [PATCH 1/6] Add a wait queue arg to the wait_bit action() routine

On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
>
> The series of patches that follow integrate AIO with
> William Lee Irwin's wait bit changes, to support asynchronous
> page waits.
>
> [1] modify-wait-bit-action-args.patch
> Add a wait queue arg to the wait_bit action() routine
>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

-------------------------------------------------------

Add a wait queue parameter to the action routine called by
__wait_on_bit to allow it to determine whether to block or
not.
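The effect of the extra parameter can be sketched in userspace. This is only an illustrative model, not kernel code: `struct wait_entry`, `MODEL_RETRY`, `demo_action` and `wait_on_bit_model` are hypothetical stand-ins, and "sleeping" is faked by clearing the word.

```c
#include <stddef.h>

/* Hypothetical stand-in for wait_queue_t; not the kernel definition. */
struct wait_entry {
	void *task;	/* non-NULL marks a waiter that is allowed to block */
};

/* Made-up status for "did not block, retry later". */
#define MODEL_RETRY (-1)

typedef int (*wait_action_t)(void *word, struct wait_entry *wait);

static int test_bit_model(int nr, const unsigned long *word)
{
	return (int)((*word >> nr) & 1UL);
}

/* The action inspects the wait entry to decide whether to block. */
static int demo_action(void *word, struct wait_entry *wait)
{
	if (wait->task == NULL)
		return MODEL_RETRY;	/* non-blocking waiter: bail out */
	/* Blocking waiter: pretend we slept until the owner cleared the word. */
	*(unsigned long *)word = 0;
	return 0;
}

/* Same loop shape as __wait_on_bit, with the wait entry passed through. */
static int wait_on_bit_model(unsigned long *word, int bit,
			     wait_action_t action, struct wait_entry *wait)
{
	int ret = 0;

	do {
		if (test_bit_model(bit, word))
			ret = action(word, wait);
	} while (test_bit_model(bit, word) && !ret);
	return ret;
}
```

With a non-blocking entry the loop ends on the first non-zero return and the bit stays set; with a blocking entry it runs until the bit clears.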

Signed-off-by: Suparna Bhattacharya <[email protected]>

diff -puN kernel/wait.c~modify-wait-bit-action-args kernel/wait.c


linux-2.6.10-rc1-suparna/fs/buffer.c | 2 +-
linux-2.6.10-rc1-suparna/fs/inode.c | 2 +-
linux-2.6.10-rc1-suparna/include/linux/wait.h | 18 ++++++++++++------
linux-2.6.10-rc1-suparna/include/linux/writeback.h | 2 +-
linux-2.6.10-rc1-suparna/kernel/wait.c | 14 ++++++++------
linux-2.6.10-rc1-suparna/mm/filemap.c | 2 +-
6 files changed, 24 insertions(+), 16 deletions(-)

diff -puN fs/buffer.c~modify-wait-bit-action-args fs/buffer.c
--- linux-2.6.10-rc1/fs/buffer.c~modify-wait-bit-action-args 2004-11-03 14:34:15.000000000 +0530
+++ linux-2.6.10-rc1-suparna/fs/buffer.c 2004-11-03 14:34:15.000000000 +0530
@@ -52,7 +52,7 @@ init_buffer(struct buffer_head *bh, bh_e
bh->b_private = private;
}

-static int sync_buffer(void *word)
+static int sync_buffer(void *word, wait_queue_t *wait)
{
struct block_device *bd;
struct buffer_head *bh
diff -puN fs/inode.c~modify-wait-bit-action-args fs/inode.c
--- linux-2.6.10-rc1/fs/inode.c~modify-wait-bit-action-args 2004-11-03 14:34:15.000000000 +0530
+++ linux-2.6.10-rc1-suparna/fs/inode.c 2004-11-03 14:34:15.000000000 +0530
@@ -1264,7 +1264,7 @@ void remove_dquot_ref(struct super_block

#endif

-int inode_wait(void *word)
+int inode_wait(void *word, wait_queue_t *wait)
{
schedule();
return 0;
diff -puN include/linux/wait.h~modify-wait-bit-action-args include/linux/wait.h
--- linux-2.6.10-rc1/include/linux/wait.h~modify-wait-bit-action-args 2004-11-03 14:34:15.000000000 +0530
+++ linux-2.6.10-rc1-suparna/include/linux/wait.h 2004-11-03 14:34:15.000000000 +0530
@@ -140,11 +140,15 @@ void FASTCALL(__wake_up(wait_queue_head_
extern void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode));
extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
void FASTCALL(__wake_up_bit(wait_queue_head_t *, void *, int));
-int FASTCALL(__wait_on_bit(wait_queue_head_t *, struct wait_bit_queue *, int (*)(void *), unsigned));
-int FASTCALL(__wait_on_bit_lock(wait_queue_head_t *, struct wait_bit_queue *, int (*)(void *), unsigned));
+int FASTCALL(__wait_on_bit(wait_queue_head_t *, struct wait_bit_queue *,
+ int (*)(void *, wait_queue_t *), unsigned));
+int FASTCALL(__wait_on_bit_lock(wait_queue_head_t *, struct wait_bit_queue *,
+ int (*)(void *, wait_queue_t *), unsigned));
void FASTCALL(wake_up_bit(void *, int));
-int FASTCALL(out_of_line_wait_on_bit(void *, int, int (*)(void *), unsigned));
-int FASTCALL(out_of_line_wait_on_bit_lock(void *, int, int (*)(void *), unsigned));
+int FASTCALL(out_of_line_wait_on_bit(void *, int, int (*)(void *,
+ wait_queue_t *), unsigned));
+int FASTCALL(out_of_line_wait_on_bit_lock(void *, int, int (*)(void *,
+ wait_queue_t *), unsigned));
wait_queue_head_t *FASTCALL(bit_waitqueue(void *, int));

#define wake_up(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, NULL)
@@ -364,7 +368,8 @@ int wake_bit_function(wait_queue_t *wait
* but has no intention of setting it.
*/
static inline int wait_on_bit(void *word, int bit,
- int (*action)(void *), unsigned mode)
+ int (*action)(void *, wait_queue_t *),
+ unsigned mode)
{
if (!test_bit(bit, word))
return 0;
@@ -388,7 +393,8 @@ static inline int wait_on_bit(void *word
* clear with the intention of setting it, and when done, clearing it.
*/
static inline int wait_on_bit_lock(void *word, int bit,
- int (*action)(void *), unsigned mode)
+ int (*action)(void *, wait_queue_t *),
+ unsigned mode)
{
if (!test_and_set_bit(bit, word))
return 0;
diff -puN include/linux/writeback.h~modify-wait-bit-action-args include/linux/writeback.h
--- linux-2.6.10-rc1/include/linux/writeback.h~modify-wait-bit-action-args 2004-11-03 14:34:15.000000000 +0530
+++ linux-2.6.10-rc1-suparna/include/linux/writeback.h 2004-11-03 14:34:15.000000000 +0530
@@ -68,7 +68,7 @@ struct writeback_control {
*/
void writeback_inodes(struct writeback_control *wbc);
void wake_up_inode(struct inode *inode);
-int inode_wait(void *);
+int inode_wait(void *, wait_queue_t *);
void sync_inodes_sb(struct super_block *, int wait);
void sync_inodes(int wait);

diff -puN kernel/wait.c~modify-wait-bit-action-args kernel/wait.c
--- linux-2.6.10-rc1/kernel/wait.c~modify-wait-bit-action-args 2004-11-03 14:34:15.000000000 +0530
+++ linux-2.6.10-rc1-suparna/kernel/wait.c 2004-11-03 14:34:15.000000000 +0530
@@ -152,14 +152,14 @@ EXPORT_SYMBOL(wake_bit_function);
*/
int __sched fastcall
__wait_on_bit(wait_queue_head_t *wq, struct wait_bit_queue *q,
- int (*action)(void *), unsigned mode)
+ int (*action)(void *, wait_queue_t *), unsigned mode)
{
int ret = 0;

do {
prepare_to_wait(wq, &q->wait, mode);
if (test_bit(q->key.bit_nr, q->key.flags))
- ret = (*action)(q->key.flags);
+ ret = (*action)(q->key.flags, &q->wait);
} while (test_bit(q->key.bit_nr, q->key.flags) && !ret);
finish_wait(wq, &q->wait);
return ret;
@@ -167,7 +167,8 @@ __wait_on_bit(wait_queue_head_t *wq, str
EXPORT_SYMBOL(__wait_on_bit);

int __sched fastcall out_of_line_wait_on_bit(void *word, int bit,
- int (*action)(void *), unsigned mode)
+ int (*action)(void *, wait_queue_t *),
+ unsigned mode)
{
wait_queue_head_t *wq = bit_waitqueue(word, bit);
DEFINE_WAIT_BIT(wait, word, bit);
@@ -178,14 +179,14 @@ EXPORT_SYMBOL(out_of_line_wait_on_bit);

int __sched fastcall
__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
- int (*action)(void *), unsigned mode)
+ int (*action)(void *, wait_queue_t *), unsigned mode)
{
int ret = 0;

do {
prepare_to_wait_exclusive(wq, &q->wait, mode);
if (test_bit(q->key.bit_nr, q->key.flags)) {
- if ((ret = (*action)(q->key.flags)))
+ if ((ret = (*action)(q->key.flags, &q->wait)))
break;
}
} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
@@ -195,7 +196,8 @@ __wait_on_bit_lock(wait_queue_head_t *wq
EXPORT_SYMBOL(__wait_on_bit_lock);

int __sched fastcall out_of_line_wait_on_bit_lock(void *word, int bit,
- int (*action)(void *), unsigned mode)
+ int (*action)(void *, wait_queue_t *wait),
+ unsigned mode)
{
wait_queue_head_t *wq = bit_waitqueue(word, bit);
DEFINE_WAIT_BIT(wait, word, bit);
diff -puN mm/filemap.c~modify-wait-bit-action-args mm/filemap.c
--- linux-2.6.10-rc1/mm/filemap.c~modify-wait-bit-action-args 2004-11-03 14:34:15.000000000 +0530
+++ linux-2.6.10-rc1-suparna/mm/filemap.c 2004-11-03 14:34:15.000000000 +0530
@@ -131,7 +131,7 @@ void remove_from_page_cache(struct page
spin_unlock_irq(&mapping->tree_lock);
}

-static int sync_page(void *word)
+static int sync_page(void *word, wait_queue_t *wait)
{
struct address_space *mapping;
struct page *page;

_

2004-11-03 09:08:00

by Suparna Bhattacharya

Subject: Re: [PATCH 2/6] Rename __lock_page to lock_page_slow

On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
>
> The series of patches that follow integrate AIO with
> William Lee Irwin's wait bit changes, to support asynchronous
> page waits.
>
> [1] modify-wait-bit-action-args.patch
> Add a wait queue arg to the wait_bit action() routine
>
> [2] lock_page_slow.patch
> Rename __lock_page to lock_page_slow
>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

------------------------------------------------------------


In order to allow for interruptible and asynchronous versions of
lock_page in conjunction with the wait_on_bit changes, we need to
define low-level lock page routines which take an additional
argument, i.e. a wait queue entry, and may return non-zero status,
e.g. -EINTR, -EIOCBRETRY, -EWOULDBLOCK etc. This patch renames
__lock_page to lock_page_slow, so that __lock_page and
__lock_page_slow can denote the versions which take a wait queue
parameter.
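As a rough userspace illustration of a lock routine that takes a wait entry and returns a status, consider the sketch below. All names (`page_model`, `wait_entry`, `lock_page_model`, `lock_page_slow_model`) are hypothetical, the `may_block` flag is an assumption made for the model, and contention is simulated rather than real.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; bit 0 of flags plays the role of PG_locked. */
struct page_model {
	unsigned long flags;
};

struct wait_entry {
	int may_block;	/* 1: caller may sleep, 0: caller wants a status back */
};

/* Set the lock bit and report whether it was already set. */
static int test_and_set_locked(struct page_model *page)
{
	int was_set = (page->flags & 1UL) != 0;

	page->flags |= 1UL;
	return was_set;
}

/* Slow path: only reached when the lock bit was already held. */
static int lock_page_slow_model(struct page_model *page,
				struct wait_entry *wait)
{
	if (!wait->may_block)
		return -EWOULDBLOCK;	/* report contention instead of sleeping */
	/* Model only: pretend the holder unlocked while we slept, then lock. */
	page->flags &= ~1UL;
	page->flags |= 1UL;
	return 0;
}

/* Fast path first, slow path with the caller's wait entry otherwise. */
static int lock_page_model(struct page_model *page, struct wait_entry *wait)
{
	if (test_and_set_locked(page))
		return lock_page_slow_model(page, wait);
	return 0;
}
```

The point of the split is that the uncontended case never looks at the wait entry, while the contended case can hand a non-zero status back to the caller.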

Signed-off-by: Suparna Bhattacharya <[email protected]>

diff -puN include/linux/pagemap.h~lock_page_slow include/linux/pagemap.h


linux-2.6.10-rc1-suparna/include/linux/pagemap.h | 4 ++--
linux-2.6.10-rc1-suparna/mm/filemap.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff -puN include/linux/pagemap.h~lock_page_slow include/linux/pagemap.h
--- linux-2.6.10-rc1/include/linux/pagemap.h~lock_page_slow 2004-11-03 14:34:57.000000000 +0530
+++ linux-2.6.10-rc1-suparna/include/linux/pagemap.h 2004-11-03 14:34:57.000000000 +0530
@@ -151,14 +151,14 @@ static inline pgoff_t linear_page_index(
return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

-extern void FASTCALL(__lock_page(struct page *page));
+extern void FASTCALL(lock_page_slow(struct page *page));
extern void FASTCALL(unlock_page(struct page *page));

static inline void lock_page(struct page *page)
{
might_sleep();
if (TestSetPageLocked(page))
- __lock_page(page);
+ lock_page_slow(page);
}

/*
diff -puN mm/filemap.c~lock_page_slow mm/filemap.c
--- linux-2.6.10-rc1/mm/filemap.c~lock_page_slow 2004-11-03 14:34:57.000000000 +0530
+++ linux-2.6.10-rc1-suparna/mm/filemap.c 2004-11-03 14:34:57.000000000 +0530
@@ -436,14 +436,14 @@ EXPORT_SYMBOL(end_page_writeback);
* chances are that on the second loop, the block layer's plug list is empty,
* so sync_page() will then return in state TASK_UNINTERRUPTIBLE.
*/
-void fastcall __lock_page(struct page *page)
+void fastcall lock_page_slow(struct page *page)
{
DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);

__wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(__lock_page);
+EXPORT_SYMBOL(lock_page_slow);

/*
* a rather lightweight function, finding and getting a reference to a

_

2004-11-03 09:13:25

by Suparna Bhattacharya

Subject: Re: [PATCH 4/6] Add default io wait bit field in task struct

On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
>
> The series of patches that follow integrate AIO with
> William Lee Irwin's wait bit changes, to support asynchronous
> page waits.
>
> [1] modify-wait-bit-action-args.patch
> Add a wait queue arg to the wait_bit action() routine
>
> [2] lock_page_slow.patch
> Rename __lock_page to lock_page_slow
>
> [3] init-wait-bit-key.patch
> Interfaces to init and to test wait bit key
>
> [4] tsk-default-io-wait.patch
> Add default io wait bit field in task struct
>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

--------------------------------------------------------


Allocates space for the default io wait queue entry (actually a wait
bit entry) in the task struct. Doing so simplifies the patches
for AIO wait page, allowing for a cleaner and more efficient
implementation, at the cost of 28 additional bytes in the task struct
vs. on-demand allocation on the stack.
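A minimal userspace sketch of the layout follows, assuming simplified stand-in types. `task_model`, `wait_bit_entry`, `task_init_io_wait` and `to_wait_bit` are hypothetical names, and the initial key values are arbitrary placeholders, not what init_wait_bit_task does.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for wait_queue_t and struct wait_bit_queue. */
struct wait_entry {
	void *task;
};

struct wait_bit_key_model {
	void *flags;
	int bit_nr;
};

struct wait_bit_entry {
	struct wait_bit_key_model key;
	struct wait_entry wait;
};

struct task_model {
	struct wait_bit_entry default_wait;	/* plays the role of __wait */
	struct wait_entry *io_wait;		/* current IO wait handle */
};

/* Point io_wait at the embedded default entry, as copy_process does. */
static void task_init_io_wait(struct task_model *t)
{
	t->default_wait.key.flags = NULL;	/* placeholder: no key yet */
	t->default_wait.key.bit_nr = -1;	/* placeholder: no bit yet */
	t->default_wait.wait.task = t;
	t->io_wait = &t->default_wait.wait;
}

/* Recover the enclosing wait bit entry from a plain wait entry pointer. */
static struct wait_bit_entry *to_wait_bit(struct wait_entry *wait)
{
	return container_of_model(wait, struct wait_bit_entry, wait);
}
```

Because the entry is embedded in the task, code that only holds the `io_wait` pointer can still reach the wait bit key, which a bare stack-allocated wait entry would not guarantee.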

Thanks to Vatsa for helping debug and fix a problem with wait bit
initialization.

Signed-off-by: Suparna Bhattacharya <[email protected]>

diff -puN include/linux/sched.h~tsk-default-io-wait include/linux/sched.h


linux-2.6.10-rc1-suparna/include/linux/sched.h | 11 +++++++----
linux-2.6.10-rc1-suparna/kernel/fork.c | 3 ++-
2 files changed, 9 insertions(+), 5 deletions(-)

diff -puN include/linux/sched.h~tsk-default-io-wait include/linux/sched.h
--- linux-2.6.10-rc1/include/linux/sched.h~tsk-default-io-wait 2004-11-03 14:35:13.000000000 +0530
+++ linux-2.6.10-rc1-suparna/include/linux/sched.h 2004-11-03 14:35:13.000000000 +0530
@@ -652,11 +652,14 @@ struct task_struct {

unsigned long ptrace_message;
siginfo_t *last_siginfo; /* For ptrace use. */
+
+/* Space for default IO wait bit entry used for synchronous IO waits */
+ struct wait_bit_queue __wait;
/*
- * current io wait handle: wait queue entry to use for io waits
- * If this thread is processing aio, this points at the waitqueue
- * inside the currently handled kiocb. It may be NULL (i.e. default
- * to a stack based synchronous wait) if its doing sync IO.
+ * Current IO wait handle: wait queue entry to use for IO waits
+ * If this thread is processing AIO, this points at the waitqueue
+ * inside the currently handled kiocb. Otherwise, points to the
+ * default IO wait field (i.e &__wait.wait above).
*/
wait_queue_t *io_wait;
#ifdef CONFIG_NUMA
diff -puN kernel/fork.c~tsk-default-io-wait kernel/fork.c
--- linux-2.6.10-rc1/kernel/fork.c~tsk-default-io-wait 2004-11-03 14:35:13.000000000 +0530
+++ linux-2.6.10-rc1-suparna/kernel/fork.c 2004-11-03 14:35:13.000000000 +0530
@@ -870,7 +870,8 @@ static task_t *copy_process(unsigned lon
do_posix_clock_monotonic_gettime(&p->start_time);
p->security = NULL;
p->io_context = NULL;
- p->io_wait = NULL;
+ init_wait_bit_task(&p->__wait, p);
+ p->io_wait = &p->__wait.wait;
p->audit_context = NULL;
#ifdef CONFIG_NUMA
p->mempolicy = mpol_copy(p->mempolicy);

_

2004-11-03 09:16:30

by Suparna Bhattacharya

Subject: Re: [PATCH 5/6] AIO wake bit and AIO wait bit

On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
>
> The series of patches that follow integrate AIO with
> William Lee Irwin's wait bit changes, to support asynchronous
> page waits.
>
> [1] modify-wait-bit-action-args.patch
> Add a wait queue arg to the wait_bit action() routine
>
> [2] lock_page_slow.patch
> Rename __lock_page to lock_page_slow
>
> [3] init-wait-bit-key.patch
> Interfaces to init and to test wait bit key
>
> [4] tsk-default-io-wait.patch
> Add default io wait bit field in task struct
>
> [5] aio-wait-bit.patch
> AIO wake bit and AIO wait bit
>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

----------------------------------------------------------

Enable wait bit based filtered wakeups to work for AIO.
Replaces the wait queue entry in the kiocb with a wait bit
structure, to allow enough space for the wait bit key.
This adds an extra level of indirection in references to the
wait queue entry in the iocb. Also, an extra check had to be
added in aio_wake_function to allow for other kinds of waiters
which do not require wait bit, based on the assumption that
the key passed in would be NULL in such cases.
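The NULL-key convention can be modelled in userspace as below. This is a sketch under assumptions: the types and `wake_function_model` are hypothetical, and `key_matches` only compares word and bit number, since the real test_wait_bit_key comes from patch 3/6, which is not shown in this thread.

```c
#include <stddef.h>

#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel structures. */
struct wait_bit_key_model {
	void *flags;
	int bit_nr;
};

struct wait_entry {
	int woken;
};

struct wait_bit_entry {
	struct wait_bit_key_model key;
	struct wait_entry wait;
};

/* Assumed filter: the waiter's key must name the same word and bit. */
static int key_matches(struct wait_entry *wait,
		       const struct wait_bit_key_model *key)
{
	struct wait_bit_entry *wb =
		container_of_model(wait, struct wait_bit_entry, wait);

	return wb->key.flags == key->flags && wb->key.bit_nr == key->bit_nr;
}

/* A NULL key means "no filtering": wake the waiter unconditionally. */
static int wake_function_model(struct wait_entry *wait,
			       const struct wait_bit_key_model *key)
{
	if (key && !key_matches(wait, key))
		return 0;	/* wakeup was meant for a different bit */
	wait->woken = 1;
	return 1;
}
```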

Thanks to Chinmay for help with testing.

Signed-off-by: Suparna Bhattacharya <[email protected]>

diff -puN fs/aio.c~aio-wait-bit fs/aio.c


linux-2.6.10-rc1-suparna/fs/aio.c | 22 ++++++++++++++--------
linux-2.6.10-rc1-suparna/include/linux/aio.h | 4 ++--
linux-2.6.10-rc1-suparna/kernel/wait.c | 17 ++++++++++++++---
3 files changed, 30 insertions(+), 13 deletions(-)

diff -puN fs/aio.c~aio-wait-bit fs/aio.c
--- linux-2.6.10-rc1/fs/aio.c~aio-wait-bit 2004-11-03 14:35:17.000000000 +0530
+++ linux-2.6.10-rc1-suparna/fs/aio.c 2004-11-03 14:35:17.000000000 +0530
@@ -725,14 +725,14 @@ static ssize_t aio_run_iocb(struct kiocb
* cause the iocb to be kicked for continuation (through
* the aio_wake_function callback).
*/
- BUG_ON(current->io_wait != NULL);
- current->io_wait = &iocb->ki_wait;
+ BUG_ON(!is_sync_wait(current->io_wait));
+ current->io_wait = &iocb->ki_wait.wait;
ret = retry(iocb);
current->io_wait = NULL;

if (-EIOCBRETRY != ret) {
if (-EIOCBQUEUED != ret) {
- BUG_ON(!list_empty(&iocb->ki_wait.task_list));
+ BUG_ON(!list_empty(&iocb->ki_wait.wait.task_list));
aio_complete(iocb, ret, 0);
/* must not access the iocb after this */
}
@@ -741,7 +741,7 @@ static ssize_t aio_run_iocb(struct kiocb
* Issue an additional retry to avoid waiting forever if
* no waits were queued (e.g. in case of a short read).
*/
- if (list_empty(&iocb->ki_wait.task_list))
+ if (list_empty(&iocb->ki_wait.wait.task_list))
kiocbSetKicked(iocb);
}
out:
@@ -887,7 +887,7 @@ void queue_kicked_iocb(struct kiocb *ioc
unsigned long flags;
int run = 0;

- WARN_ON((!list_empty(&iocb->ki_wait.task_list)));
+ WARN_ON((!list_empty(&iocb->ki_wait.wait.task_list)));

spin_lock_irqsave(&ctx->ctx_lock, flags);
run = __queue_kicked_iocb(iocb);
@@ -1473,7 +1473,13 @@ ssize_t aio_setup_iocb(struct kiocb *kio
*/
int aio_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
- struct kiocb *iocb = container_of(wait, struct kiocb, ki_wait);
+ struct wait_bit_queue *wait_bit
+ = container_of(wait, struct wait_bit_queue, wait);
+ struct kiocb *iocb = container_of(wait_bit, struct kiocb, ki_wait);
+
+ /* Assumes that a non-NULL key implies wait bit filtering */
+ if (key && !test_wait_bit_key(wait, key))
+ return 0;

list_del_init(&wait->task_list);
kick_iocb(iocb);
@@ -1529,8 +1535,8 @@ int fastcall io_submit_one(struct kioctx
req->ki_buf = (char __user *)(unsigned long)iocb->aio_buf;
req->ki_left = req->ki_nbytes = iocb->aio_nbytes;
req->ki_opcode = iocb->aio_lio_opcode;
- init_waitqueue_func_entry(&req->ki_wait, aio_wake_function);
- INIT_LIST_HEAD(&req->ki_wait.task_list);
+ init_waitqueue_func_entry(&req->ki_wait.wait, aio_wake_function);
+ INIT_LIST_HEAD(&req->ki_wait.wait.task_list);
req->ki_run_list.next = req->ki_run_list.prev = NULL;
req->ki_retry = NULL;
req->ki_retried = 0;
diff -puN include/linux/aio.h~aio-wait-bit include/linux/aio.h
--- linux-2.6.10-rc1/include/linux/aio.h~aio-wait-bit 2004-11-03 14:35:17.000000000 +0530
+++ linux-2.6.10-rc1-suparna/include/linux/aio.h 2004-11-03 14:35:17.000000000 +0530
@@ -69,7 +69,7 @@ struct kiocb {
size_t ki_nbytes; /* copy of iocb->aio_nbytes */
char __user *ki_buf; /* remaining iocb->aio_buf */
size_t ki_left; /* remaining bytes */
- wait_queue_t ki_wait;
+ struct wait_bit_queue ki_wait;
long ki_retried; /* just for testing */
long ki_kicked; /* just for testing */
long ki_queued; /* just for testing */
@@ -90,7 +90,7 @@ struct kiocb {
(x)->ki_dtor = NULL; \
(x)->ki_obj.tsk = tsk; \
(x)->ki_user_data = 0; \
- init_wait((&(x)->ki_wait)); \
+ init_wait_bit_task((&(x)->ki_wait), current);\
} while (0)

#define AIO_RING_MAGIC 0xa10a10a1
diff -puN kernel/wait.c~aio-wait-bit kernel/wait.c
--- linux-2.6.10-rc1/kernel/wait.c~aio-wait-bit 2004-11-03 14:35:17.000000000 +0530
+++ linux-2.6.10-rc1-suparna/kernel/wait.c 2004-11-03 14:35:17.000000000 +0530
@@ -132,7 +132,8 @@ EXPORT_SYMBOL(autoremove_wake_function);

int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *arg)
{
- if (!test_wait_bit_key(wait, arg))
+ /* Assumes that a non-NULL key implies wait bit filtering */
+ if (arg && !test_wait_bit_key(wait, arg))
return 0;
return autoremove_wake_function(wait, mode, sync, arg);
}
@@ -154,7 +155,12 @@ __wait_on_bit(wait_queue_head_t *wq, str
if (test_bit(q->key.bit_nr, q->key.flags))
ret = (*action)(q->key.flags, &q->wait);
} while (test_bit(q->key.bit_nr, q->key.flags) && !ret);
- finish_wait(wq, &q->wait);
+ /*
+ * AIO retries require the wait queue entry to remain queued
+ * for async notification
+ */
+ if (ret != -EIOCBRETRY)
+ finish_wait(wq, &q->wait);
return ret;
}
EXPORT_SYMBOL(__wait_on_bit);
@@ -183,7 +189,12 @@ __wait_on_bit_lock(wait_queue_head_t *wq
break;
}
} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
- finish_wait(wq, &q->wait);
+ /*
+ * AIO retries require the wait queue entry to remain queued
+ * for async notification
+ */
+ if (ret != -EIOCBRETRY)
+ finish_wait(wq, &q->wait);
return ret;
}
EXPORT_SYMBOL(__wait_on_bit_lock);

_

2004-11-03 09:25:57

by Suparna Bhattacharya

Subject: Re: [PATCH 6/6] AIO wait page and AIO lock page


> [6] aio-wait-page.patch

Sorry, forgot to regenerate this one against 2.6.10-rc1.

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

--------------------------------------------


Define low-level page wait and lock page routines which take a
wait queue entry pointer as an additional parameter and
return status (which may be non-zero when the wait queue
parameter signifies an asynchronous wait, typically during
AIO).

Synchronous IO waits become a special case where the wait
queue parameter is the running task's default io wait context.
Asynchronous IO waits happen when the wait queue parameter
is the io wait context of a kiocb. Code paths which choose to
execute synchronous or asynchronous behaviour depending on the
called context specify the current io wait context (which points
to sync or async context as the case may be) as the wait
parameter.
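The sync/async split described above might be modelled in userspace roughly as follows. Everything here is a hypothetical stand-in: `MODEL_RETRY` takes the place of -EIOCBRETRY, `is_sync_wait_model` assumes that a NULL entry or one with a task attached counts as synchronous, and the sleep is omitted.

```c
#include <stddef.h>

/* Hypothetical stand-ins for wait_queue_t, task_struct and kiocb. */
struct wait_entry {
	void *task;	/* set for the task's default (sync) context */
};

struct task_model {
	struct wait_entry default_wait;
	struct wait_entry *io_wait;	/* current IO wait context */
};

struct iocb_model {
	struct wait_entry ki_wait;	/* async context, no task attached */
};

#define MODEL_RETRY (-1000)	/* made-up value standing in for -EIOCBRETRY */

/* Assumption: no entry, or an entry with a task, means a sync wait. */
static int is_sync_wait_model(const struct wait_entry *wait)
{
	return wait == NULL || wait->task != NULL;
}

/* Sleep only for sync contexts; async contexts get a retry status. */
static int io_wait_schedule_model(struct wait_entry *wait)
{
	if (!is_sync_wait_model(wait))
		return MODEL_RETRY;
	/* A real implementation would sleep here until woken. */
	return 0;
}

/* Run one wait with the iocb's context installed, then restore. */
static int run_with_iocb(struct task_model *t, struct iocb_model *iocb)
{
	struct wait_entry *saved = t->io_wait;
	int ret;

	t->io_wait = &iocb->ki_wait;
	ret = io_wait_schedule_model(t->io_wait);
	t->io_wait = saved;
	return ret;
}
```

A code path that passes `t->io_wait` therefore behaves synchronously when called from ordinary task context and asynchronously while an iocb is being processed, without needing two versions of the path.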

Signed-off-by: Suparna Bhattacharya <[email protected]>


linux-2.6.10-rc1-suparna/include/linux/pagemap.h | 29 ++++++++++++++++-------
linux-2.6.10-rc1-suparna/kernel/sched.c | 14 +++++++++++
linux-2.6.10-rc1-suparna/mm/filemap.c | 28 ++++++++++++----------
3 files changed, 51 insertions(+), 20 deletions(-)

diff -puN include/linux/pagemap.h~aio-wait-page include/linux/pagemap.h
--- linux-2.6.10-rc1/include/linux/pagemap.h~aio-wait-page 2004-11-03 14:58:42.000000000 +0530
+++ linux-2.6.10-rc1-suparna/include/linux/pagemap.h 2004-11-03 14:58:42.000000000 +0530
@@ -151,21 +151,25 @@ static inline pgoff_t linear_page_index(
return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

-extern void FASTCALL(lock_page_slow(struct page *page));
+extern int FASTCALL(lock_page_slow(struct page *page, wait_queue_t *wait));
extern void FASTCALL(unlock_page(struct page *page));

-static inline void lock_page(struct page *page)
+static inline int __lock_page(struct page *page, wait_queue_t *wait)
{
might_sleep();
if (TestSetPageLocked(page))
- lock_page_slow(page);
+ return lock_page_slow(page, wait);
+ return 0;
}
+
+#define lock_page(page) __lock_page(page, &current->__wait.wait)

/*
* This is exported only for wait_on_page_locked/wait_on_page_writeback.
* Never use this directly!
*/
-extern void FASTCALL(wait_on_page_bit(struct page *page, int bit_nr));
+extern int FASTCALL(wait_on_page_bit(struct page *page, int bit_nr,
+ wait_queue_t *wait));

/*
* Wait for a page to be unlocked.
@@ -174,21 +178,30 @@ extern void FASTCALL(wait_on_page_bit(st
* ie with increased "page->count" so that the page won't
* go away during the wait..
*/
-static inline void wait_on_page_locked(struct page *page)
+static inline int __wait_on_page_locked(struct page *page, wait_queue_t *wait)
{
if (PageLocked(page))
- wait_on_page_bit(page, PG_locked);
+ return wait_on_page_bit(page, PG_locked, wait);
+ return 0;
}

+#define wait_on_page_locked(page) \
+ __wait_on_page_locked(page, &current->__wait.wait)
+
/*
* Wait for a page to complete writeback
*/
-static inline void wait_on_page_writeback(struct page *page)
+static inline int __wait_on_page_writeback(struct page *page,
+ wait_queue_t *wait)
{
if (PageWriteback(page))
- wait_on_page_bit(page, PG_writeback);
+ return wait_on_page_bit(page, PG_writeback, wait);
+ return 0;
}

+#define wait_on_page_writeback(page) \
+ __wait_on_page_writeback(page, &current->__wait.wait)
+
extern void end_page_writeback(struct page *page);

/*
diff -puN kernel/sched.c~aio-wait-page kernel/sched.c
--- linux-2.6.10-rc1/kernel/sched.c~aio-wait-page 2004-11-03 14:58:42.000000000 +0530
+++ linux-2.6.10-rc1-suparna/kernel/sched.c 2004-11-03 14:58:42.000000000 +0530
@@ -3444,6 +3444,20 @@ long __sched io_schedule_timeout(long ti
return ret;
}

+/*
+ * Sleep only if the wait context passed is not async,
+ * otherwise return so that a retry can be issued later.
+ */
+int __sched io_wait_schedule(wait_queue_t *wait)
+{
+ if (!is_sync_wait(wait))
+ return -EIOCBRETRY;
+ io_schedule();
+ return 0;
+}
+
+EXPORT_SYMBOL(io_wait_schedule);
+
/**
* sys_sched_get_priority_max - return maximum RT priority.
* @policy: scheduling class.
diff -puN mm/filemap.c~aio-wait-page mm/filemap.c
--- linux-2.6.10-rc1/mm/filemap.c~aio-wait-page 2004-11-03 14:58:42.000000000 +0530
+++ linux-2.6.10-rc1-suparna/mm/filemap.c 2004-11-03 14:58:42.000000000 +0530
@@ -145,8 +145,7 @@ static int sync_page(void *word, wait_qu
mapping = page_mapping(page);
if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
mapping->a_ops->sync_page(page);
- io_schedule();
- return 0;
+ return io_wait_schedule(wait);
}

/**
@@ -376,13 +375,17 @@ static inline void wake_up_page(struct p
__wake_up_bit(page_waitqueue(page), &page->flags, bit);
}

-void fastcall wait_on_page_bit(struct page *page, int bit_nr)
+int fastcall wait_on_page_bit(struct page *page, int bit_nr,
+ wait_queue_t *wait)
{
- DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
-
- if (test_bit(bit_nr, &page->flags))
- __wait_on_bit(page_waitqueue(page), &wait, sync_page,
+ if (test_bit(bit_nr, &page->flags)) {
+ struct wait_bit_queue *wait_bit
+ = container_of(wait, struct wait_bit_queue, wait);
+ init_wait_bit_key(wait_bit, &page->flags, bit_nr);
+ return __wait_on_bit(page_waitqueue(page), wait_bit, sync_page,
TASK_UNINTERRUPTIBLE);
+ }
+ return 0;
}
EXPORT_SYMBOL(wait_on_page_bit);

@@ -411,7 +414,6 @@ void fastcall unlock_page(struct page *p
}

EXPORT_SYMBOL(unlock_page);
-EXPORT_SYMBOL(lock_page);

/*
* End writeback against a page.
@@ -429,18 +431,20 @@ void end_page_writeback(struct page *pag
EXPORT_SYMBOL(end_page_writeback);

/*
- * Get a lock on the page, assuming we need to sleep to get it.
+ * Get a lock on the page, assuming we need to wait to get it.
*
* Ugly: running sync_page() in state TASK_UNINTERRUPTIBLE is scary. If some
* random driver's requestfn sets TASK_RUNNING, we could busywait. However
* chances are that on the second loop, the block layer's plug list is empty,
* so sync_page() will then return in state TASK_UNINTERRUPTIBLE.
*/
-void fastcall lock_page_slow(struct page *page)
+int fastcall lock_page_slow(struct page *page, wait_queue_t *wait)
{
- DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+ struct wait_bit_queue *wait_bit
+ = container_of(wait, struct wait_bit_queue, wait);

- __wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
+ init_wait_bit_key(wait_bit, &page->flags, PG_locked);
+ return __wait_on_bit_lock(page_waitqueue(page), wait_bit, sync_page,
TASK_UNINTERRUPTIBLE);
}
EXPORT_SYMBOL(lock_page_slow);

_

2004-11-03 09:31:29

by William Lee Irwin III

Subject: Re: [PATCH 0/6] AIO wait bit support

On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
> The series of patches that follow integrate AIO with
> William Lee Irwin's wait bit changes, to support asynchronous
> page waits.

Thank you for pursuing this. I apologize for not participating more
directly in the follow-up.

I also apologize for mentioning this, but I'm disturbed by current
events right now, so I won't be evaluating these in any technical
sense for at least a few days.


-- wli

2004-11-03 09:18:36

by Suparna Bhattacharya

Subject: Re: [PATCH 6/6] AIO wait page and AIO lock page

On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
>
> The series of patches that follow integrate AIO with
> William Lee Irwin's wait bit changes, to support asynchronous
> page waits.
>
> [1] modify-wait-bit-action-args.patch
> Add a wait queue arg to the wait_bit action() routine
>
> [2] lock_page_slow.patch
> Rename __lock_page to lock_page_slow
>
> [3] init-wait-bit-key.patch
> Interfaces to init and to test wait bit key
>
> [4] tsk-default-io-wait.patch
> Add default io wait bit field in task struct
>
> [5] aio-wait-bit.patch
> AIO wake bit and AIO wait bit
>
> [6] aio-wait-page.patch
> AIO wait page and lock page
>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

---------------------------------------------------


Define low-level page wait and lock page routines which take a
wait queue entry pointer as an additional parameter and
return status (which may be non-zero when the wait queue
parameter signifies an asynchronous wait, typically during
AIO).

Synchronous IO waits become a special case where the wait
queue parameter is the running task's default io wait context.
Asynchronous IO waits happen when the wait queue parameter
is the io wait context of a kiocb. Code paths which choose to
execute synchronous or asynchronous behaviour depending on the
called context specify the current io wait context (which points
to sync or async context as the case may be) as the wait
parameter.

Signed-off-by: Suparna Bhattacharya <[email protected]>

linux-2.6.9-rc1-mm4-suparna/include/linux/pagemap.h | 38 ++++++++++++++------
linux-2.6.9-rc1-mm4-suparna/mm/filemap.c | 27 ++++++++------
2 files changed, 44 insertions(+), 21 deletions(-)

diff -urp linux-2.6.9-rc3/kernel/sched.c linux-2.6.9-rc3-mm2/kernel/sched.c
--- linux-2.6.9-rc3/kernel/sched.c 2004-10-07 13:19:10.000000000 +0530
+++ linux-2.6.9-rc3-mm2/kernel/sched.c 2004-10-08 11:53:18.000000000 +0530
@@ -4428,6 +4428,20 @@ long __sched io_schedule_timeout(long ti
return ret;
}

+/*
+ * Sleep only if the wait context passed is not async,
+ * otherwise return so that a retry can be issued later.
+ */
+int __sched io_wait_schedule(wait_queue_t *wait)
+{
+ if (!is_sync_wait(wait))
+ return -EIOCBRETRY;
+ io_schedule();
+ return 0;
+}
+
+EXPORT_SYMBOL(io_wait_schedule);
+
/**
* sys_sched_get_priority_max - return maximum RT priority.
* @policy: scheduling class.
diff -puN mm/filemap.c~aio-wait-page mm/filemap.c
--- linux-2.6.9-rc1-mm4/mm/filemap.c~aio-wait-page 2004-09-17 09:25:48.000000000 +0530
+++ linux-2.6.9-rc1-mm4-suparna/mm/filemap.c 2004-09-20 22:57:37.000000000 +0530
@@ -146,8 +146,7 @@ static int sync_page(void *word, wait_qu
mapping = page_mapping(page);
if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
mapping->a_ops->sync_page(page);
- io_schedule();
- return 0;
+ return io_wait_schedule(wait);
}

/**
@@ -378,13 +377,17 @@ static inline void wake_up_page(struct p
__wake_up_bit(page_waitqueue(page), &page->flags, bit);
}

-void fastcall wait_on_page_bit(struct page *page, int bit_nr)
+int fastcall wait_on_page_bit(struct page *page, int bit_nr,
+ wait_queue_t *wait)
{
- DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
-
- if (test_bit(bit_nr, &page->flags))
- __wait_on_bit(page_waitqueue(page), &wait, sync_page,
+ if (test_bit(bit_nr, &page->flags)) {
+ struct wait_bit_queue *wait_bit
+ = container_of(wait, struct wait_bit_queue, wait);
+ init_wait_bit_key(wait_bit, &page->flags, bit_nr);
+ return __wait_on_bit(page_waitqueue(page), wait_bit, sync_page,
TASK_UNINTERRUPTIBLE);
+ }
+ return 0;
}
EXPORT_SYMBOL(wait_on_page_bit);

@@ -414,7 +417,6 @@ void fastcall unlock_page(struct page *p
}

EXPORT_SYMBOL(unlock_page);
-EXPORT_SYMBOL(lock_page);

/*
* End writeback against a page.
@@ -431,18 +434,20 @@ void end_page_writeback(struct page *pag
EXPORT_SYMBOL(end_page_writeback);

/*
- * Get a lock on the page, assuming we need to sleep to get it.
+ * Get a lock on the page, assuming we need to wait to get it.
*
* Ugly: running sync_page() in state TASK_UNINTERRUPTIBLE is scary. If some
* random driver's requestfn sets TASK_RUNNING, we could busywait. However
* chances are that on the second loop, the block layer's plug list is empty,
* so sync_page() will then return in state TASK_UNINTERRUPTIBLE.
*/
-void fastcall lock_page_slow(struct page *page)
+int fastcall lock_page_slow(struct page *page, wait_queue_t *wait)
{
- DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+ struct wait_bit_queue *wait_bit
+ = container_of(wait, struct wait_bit_queue, wait);

- __wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
+ init_wait_bit_key(wait_bit, &page->flags, PG_locked);
+ return __wait_on_bit_lock(page_waitqueue(page), wait_bit, sync_page,
TASK_UNINTERRUPTIBLE);
}
EXPORT_SYMBOL(lock_page_slow);
diff -puN include/linux/pagemap.h~aio-wait-page include/linux/pagemap.h
--- linux-2.6.9-rc1-mm4/include/linux/pagemap.h~aio-wait-page 2004-09-17 09:25:48.000000000 +0530
+++ linux-2.6.9-rc1-mm4-suparna/include/linux/pagemap.h 2004-09-20 22:56:21.000000000 +0530
@@ -151,21 +151,25 @@ static inline pgoff_t linear_page_index(
return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

-extern void FASTCALL(lock_page_slow(struct page *page));
+extern int FASTCALL(lock_page_slow(struct page *page, wait_queue_t *wait));
extern void FASTCALL(unlock_page(struct page *page));

-static inline void lock_page(struct page *page)
+static inline int __lock_page(struct page *page, wait_queue_t *wait)
{
might_sleep();
if (TestSetPageLocked(page))
- lock_page_slow(page);
+ return lock_page_slow(page, wait);
+ return 0;
}
+
+#define lock_page(page) __lock_page(page, &current->__wait.wait)

/*
* This is exported only for wait_on_page_locked/wait_on_page_writeback.
* Never use this directly!
*/
-extern void FASTCALL(wait_on_page_bit(struct page *page, int bit_nr));
+extern int FASTCALL(wait_on_page_bit(struct page *page, int bit_nr,
+ wait_queue_t *wait));

/*
* Wait for a page to be unlocked.
@@ -174,20 +178,29 @@ extern void FASTCALL(wait_on_page_bit(st
* ie with increased "page->count" so that the page won't
* go away during the wait..
*/
-static inline void wait_on_page_locked(struct page *page)
+static inline int __wait_on_page_locked(struct page *page, wait_queue_t *wait)
{
if (PageLocked(page))
- wait_on_page_bit(page, PG_locked);
+ return wait_on_page_bit(page, PG_locked, wait);
+ return 0;
}

+#define wait_on_page_locked(page) \
+ __wait_on_page_locked(page, &current->__wait.wait)
+
/*
* Wait for a page to complete writeback
*/
-static inline void wait_on_page_writeback(struct page *page)
+static inline int __wait_on_page_writeback(struct page *page,
+ wait_queue_t *wait)
{
if (PageWriteback(page))
- wait_on_page_bit(page, PG_writeback);
+ return wait_on_page_bit(page, PG_writeback, wait);
+ return 0;
}

+#define wait_on_page_writeback(page) \
+ __wait_on_page_writeback(page, &current->__wait.wait)
+
extern void end_page_writeback(struct page *page);

/*
* Fault a userspace page into pagetables. Return non-zero on a fault.

2004-11-03 09:10:55

by Suparna Bhattacharya

Subject: Re: [PATCH 3/6] Interfaces to init and to test wait bit key

On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
>
> The series of patches that follow integrate AIO with
> William Lee Irwin's wait bit changes, to support asynchronous
> page waits.
>
> [1] modify-wait-bit-action-args.patch
> Add a wait queue arg to the wait_bit action() routine
>
> [2] lock_page_slow.patch
> Rename __lock_page to lock_page_slow
>
> [3] init-wait-bit-key.patch
> Interfaces to init and to test wait bit key
>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

-------------------------------------------------------------

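init_wait_bit_key() initializes the key field of an already
allocated wait bit structure, which is useful for async wait bit
support. init_wait_bit_task() likewise sets up the embedded wait
queue entry for a given task. Also separate out the wait bit test
into a common routine, test_wait_bit_key(), which can be used by
different kinds of wakeup callbacks.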

Signed-off-by: Suparna Bhattacharya <[email protected]>
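
For readers without the tree at hand, here is a rough userspace
model of the key init and key test described above. It is only a
sketch: struct wait_entry is a hypothetical stand-in for
wait_queue_t, and test_bit() and container_of() are simplified
local versions, not the kernel's.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Simplified container_of(): recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for wait_queue_t. */
struct wait_entry {
	void *task;
};

struct wait_bit_key {
	unsigned long *flags;	/* word being waited on */
	int bit_nr;		/* bit within that word */
};

struct wait_bit_queue {
	struct wait_bit_key key;
	struct wait_entry wait;
};

/* Non-atomic model of test_bit(). */
static int test_bit(int nr, const unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Fill in the key of an already allocated wait bit structure. */
static void init_wait_bit_key(struct wait_bit_queue *wb,
			      unsigned long *word, int bit)
{
	wb->key.flags = word;
	wb->key.bit_nr = bit;
}

/*
 * Nonzero only if this waiter is waiting on the same word and bit
 * as the wakeup key, and the bit has already been cleared.
 */
static int test_wait_bit_key(struct wait_entry *wait,
			     const struct wait_bit_key *key)
{
	struct wait_bit_queue *wb =
		container_of(wait, struct wait_bit_queue, wait);

	return wb->key.flags == key->flags &&
	       wb->key.bit_nr == key->bit_nr &&
	       !test_bit(key->bit_nr, key->flags);
}
```

A wakeup callback would call test_wait_bit_key() first and do
nothing if it returns 0, which is what wake_bit_function() does
after this patch.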


linux-2.6.10-rc1-suparna/include/linux/wait.h | 30 +++++++++++++++++++++++---
linux-2.6.10-rc1-suparna/kernel/wait.c | 11 +--------
2 files changed, 29 insertions(+), 12 deletions(-)

diff -puN include/linux/wait.h~init-wait-bit-key include/linux/wait.h
--- linux-2.6.10-rc1/include/linux/wait.h~init-wait-bit-key 2004-11-03 14:35:07.000000000 +0530
+++ linux-2.6.10-rc1-suparna/include/linux/wait.h 2004-11-03 14:35:07.000000000 +0530
@@ -103,6 +103,17 @@ static inline int waitqueue_active(wait_
return !list_empty(&q->task_list);
}

+static inline int test_wait_bit_key(wait_queue_t *wait,
+ struct wait_bit_key *key)
+{
+ struct wait_bit_queue *wait_bit
+ = container_of(wait, struct wait_bit_queue, wait);
+
+ return (wait_bit->key.flags == key->flags &&
+ wait_bit->key.bit_nr == key->bit_nr &&
+ !test_bit(key->bit_nr, key->flags));
+}
+
/*
* Used to distinguish between sync and async io wait context:
* sync i/o typically specifies a NULL wait queue entry or a wait
@@ -348,9 +359,22 @@ int wake_bit_function(wait_queue_t *wait

#define init_wait(wait) \
do { \
- wait->task = current; \
- wait->func = autoremove_wake_function; \
- INIT_LIST_HEAD(&wait->task_list); \
+ (wait)->task = current; \
+ (wait)->func = autoremove_wake_function; \
+ INIT_LIST_HEAD(&(wait)->task_list); \
+ } while (0)
+
+#define init_wait_bit_key(waitbit, word, bit) \
+ do { \
+ (waitbit)->key.flags = word; \
+ (waitbit)->key.bit_nr = bit; \
+ } while (0)
+
+#define init_wait_bit_task(waitbit, tsk) \
+ do { \
+ (waitbit)->wait.task = tsk; \
+ (waitbit)->wait.func = wake_bit_function; \
+ INIT_LIST_HEAD(&(waitbit)->wait.task_list); \
} while (0)

/**
diff -puN kernel/wait.c~init-wait-bit-key kernel/wait.c
--- linux-2.6.10-rc1/kernel/wait.c~init-wait-bit-key 2004-11-03 14:35:07.000000000 +0530
+++ linux-2.6.10-rc1-suparna/kernel/wait.c 2004-11-03 14:35:07.000000000 +0530
@@ -132,16 +132,9 @@ EXPORT_SYMBOL(autoremove_wake_function);

int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *arg)
{
- struct wait_bit_key *key = arg;
- struct wait_bit_queue *wait_bit
- = container_of(wait, struct wait_bit_queue, wait);
-
- if (wait_bit->key.flags != key->flags ||
- wait_bit->key.bit_nr != key->bit_nr ||
- test_bit(key->bit_nr, key->flags))
+ if (!test_wait_bit_key(wait, arg))
return 0;
- else
- return autoremove_wake_function(wait, mode, sync, key);
+ return autoremove_wake_function(wait, mode, sync, arg);
}
EXPORT_SYMBOL(wake_bit_function);


_

2004-11-04 04:52:12

by Suparna Bhattacharya

Subject: Re: [PATCH 0/6] AIO wait bit support

On Wed, Nov 03, 2004 at 01:23:11AM -0800, William Lee Irwin III wrote:
> On Wed, Nov 03, 2004 at 02:40:36PM +0530, Suparna Bhattacharya wrote:
> > The series of patches that follow integrate AIO with
> > William Lee Irwin's wait bit changes, to support asynchronous
> > page waits.
>
> Thank you for pursuing this. I apologize for not participating more
> directly in the follow-up.
>
> I also apologize for mentioning this, but I'm disturbed by current
> events right now, so I won't be evaluating these in any technical
> sense for at least a few days.

The main changes to the wait bit code are the addition of a wait
queue argument to the action routine and the abstraction of the wait
bit key check into a separate function. The rest is mostly in AIO land.
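
To illustrate the idea of passing the wait queue entry to the action
routine, here is a small userspace sketch. Everything in it is made
up for illustration: struct wait_entry, its is_sync field, the
WOULD_BLOCK code and the model_* functions are assumptions, not the
interfaces from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's wait queue entry. */
struct wait_entry {
	int is_sync;	/* assumption: nonzero means the caller may block */
};

#define WOULD_BLOCK (-1)	/* hypothetical "come back later" code */

/* The action routine now receives the wait entry as well as the word. */
typedef int (*wait_action_t)(void *word, struct wait_entry *wait);

static int bit_is_set(const unsigned long *word, int bit_nr)
{
	return (int)((*word >> bit_nr) & 1UL);
}

/*
 * Model action: an async waiter refuses to block and reports that
 * to the caller. A sync waiter "sleeps", which is simulated here by
 * clearing the whole word as if the I/O completed meanwhile.
 */
static int model_action(void *word, struct wait_entry *wait)
{
	if (!wait->is_sync)
		return WOULD_BLOCK;
	*(unsigned long *)word = 0;
	return 0;
}

/* Loop until the bit clears or the action asks us to stop. */
static int model_wait_on_bit(unsigned long *word, int bit_nr,
			     struct wait_entry *wait, wait_action_t action)
{
	int ret = 0;

	while (bit_is_set(word, bit_nr)) {
		ret = action(word, wait);
		if (ret)
			break;
	}
	return ret;
}
```

With a sync entry the loop returns 0 once the bit is clear; with an
async entry it returns the action's nonzero code right away and
leaves the bit untouched, so the caller can retry later.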

Regards
Suparna

>
>
> -- wli
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to [email protected]. For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"[email protected]">[email protected]</a>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India