2014-04-29 18:39:27

by Oleg Nesterov

Subject: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

Completely untested and I know nothing about aio, thus needs an ack
from maintainers or should be ignored.

But exit_aio() looks "obviously unnecessarily overcomplicated"; I was
really puzzled when I looked at this code by accident.

Kent, could you also explain why kioctx->dead is atomic_t? It looks
pointless; afaics atomic_ buys nothing in this case.

Oleg.


2014-04-29 18:39:42

by Oleg Nesterov

Subject: Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

1. We can read ->ioctx_table only once and do not need rcu_read_lock()
or even rcu_dereference().

This mm has no users, so nobody else can play with ->ioctx_table. Otherwise
the code is buggy anyway: if we needed rcu_read_lock() in the loop because
->ioctx_table could be updated, then kfree(table) would be obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
munmap(), but another reason is that we simply can't call vm_munmap() unless
current->mm == mm, and this is not true in general: the caller is mmput().

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/aio.c | 47 ++++++++++++++++++-----------------------------
1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 12a3de0..5fd1fe7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -777,40 +777,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
*/
void exit_aio(struct mm_struct *mm)
{
- struct kioctx_table *table;
- struct kioctx *ctx;
- unsigned i = 0;
-
- while (1) {
- rcu_read_lock();
- table = rcu_dereference(mm->ioctx_table);
-
- do {
- if (!table || i >= table->nr) {
- rcu_read_unlock();
- rcu_assign_pointer(mm->ioctx_table, NULL);
- if (table)
- kfree(table);
- return;
- }
-
- ctx = table->table[i++];
- } while (!ctx);
+ struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+ int i;

- rcu_read_unlock();
+ if (!table)
+ return;

+ for (i = 0; i < table->nr; ++i) {
+ struct kioctx *ctx = table->table[i];
/*
- * We don't need to bother with munmap() here -
- * exit_mmap(mm) is coming and it'll unmap everything.
- * Since aio_free_ring() uses non-zero ->mmap_size
- * as indicator that it needs to unmap the area,
- * just set it to 0; aio_free_ring() is the only
- * place that uses ->mmap_size, so it's safe.
+ * We don't need to bother with munmap() here - exit_mmap(mm)
+ * is coming and it'll unmap everything. And we simply can't,
+ * this is not necessarily our ->mm.
+ * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+ * that it needs to unmap the area, just set it to 0.
*/
- ctx->mmap_size = 0;
-
- kill_ioctx(mm, ctx);
+ if (ctx) {
+ ctx->mmap_size = 0;
+ kill_ioctx(mm, ctx);
+ }
}
+
+ rcu_assign_pointer(mm->ioctx_table, NULL);
+ kfree(table);
}

static void put_reqs_available(struct kioctx *ctx, unsigned nr)
--
1.5.5.1

2014-04-29 19:18:52

by Oleg Nesterov

Subject: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

1. We can read ->ioctx_table only once and do not need rcu_read_lock()
or even rcu_dereference().

This mm has no users, so nobody else can play with ->ioctx_table. Otherwise
the code is buggy anyway: if we needed rcu_read_lock() in the loop because
->ioctx_table could be updated, then kfree(table) would be obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
munmap(), but another reason is that we simply can't call vm_munmap() unless
current->mm == mm, and this is not true in general: the caller is mmput().

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/aio.c | 47 ++++++++++++++++++-----------------------------
1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 12a3de0..5fd1fe7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -777,40 +777,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
*/
void exit_aio(struct mm_struct *mm)
{
- struct kioctx_table *table;
- struct kioctx *ctx;
- unsigned i = 0;
-
- while (1) {
- rcu_read_lock();
- table = rcu_dereference(mm->ioctx_table);
-
- do {
- if (!table || i >= table->nr) {
- rcu_read_unlock();
- rcu_assign_pointer(mm->ioctx_table, NULL);
- if (table)
- kfree(table);
- return;
- }
-
- ctx = table->table[i++];
- } while (!ctx);
+ struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+ int i;

- rcu_read_unlock();
+ if (!table)
+ return;

+ for (i = 0; i < table->nr; ++i) {
+ struct kioctx *ctx = table->table[i];
/*
- * We don't need to bother with munmap() here -
- * exit_mmap(mm) is coming and it'll unmap everything.
- * Since aio_free_ring() uses non-zero ->mmap_size
- * as indicator that it needs to unmap the area,
- * just set it to 0; aio_free_ring() is the only
- * place that uses ->mmap_size, so it's safe.
+ * We don't need to bother with munmap() here - exit_mmap(mm)
+ * is coming and it'll unmap everything. And we simply can't,
+ * this is not necessarily our ->mm.
+ * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+ * that it needs to unmap the area, just set it to 0.
*/
- ctx->mmap_size = 0;
-
- kill_ioctx(mm, ctx);
+ if (ctx) {
+ ctx->mmap_size = 0;
+ kill_ioctx(mm, ctx);
+ }
}
+
+ rcu_assign_pointer(mm->ioctx_table, NULL);
+ kfree(table);
}

static void put_reqs_available(struct kioctx *ctx, unsigned nr)
--
1.5.5.1

2014-04-29 19:33:56

by Benjamin LaHaise

Subject: Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On Tue, Apr 29, 2014 at 08:39:15PM +0200, Oleg Nesterov wrote:
> Completely untested and I know nothing about aio, thus needs an ack
> from maintainers or should be ignored.
>
> But exit_aio() looks "obviously unnecessarily overcomplicated"; I was
> really puzzled when I looked at this code by accident.
>
> Kent, could you also explain why kioctx->dead is atomic_t? It looks
> pointless; afaics atomic_ buys nothing in this case.

If it wasn't atomic, it would need to be volatile.

-ben
--
"Thought is the essence of where you are now."

2014-04-29 19:49:12

by Oleg Nesterov

Subject: Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On 04/29, Benjamin LaHaise wrote:
>
> On Tue, Apr 29, 2014 at 08:39:15PM +0200, Oleg Nesterov wrote:
> >
> > Kent, could you also explain why kioctx->dead is atomic_t? It looks
> > pointless; afaics atomic_ buys nothing in this case.
>
> If it wasn't atomic, it would need to be volatile.

Then the ACCESS_ONCE() in aio_read_events() makes more sense, although
I can't say I understand why it is needed.

Oleg.

2014-04-29 20:42:18

by Benjamin LaHaise

Subject: Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On Tue, Apr 29, 2014 at 08:40:04PM +0200, Oleg Nesterov wrote:
> 1. We can read ->ioctx_table only once and do not need rcu_read_lock()
> or even rcu_dereference().
>
> This mm has no users, so nobody else can play with ->ioctx_table. Otherwise
> the code is buggy anyway: if we needed rcu_read_lock() in the loop because
> ->ioctx_table could be updated, then kfree(table) would be obviously wrong.
>
> 2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
> munmap(), but another reason is that we simply can't call vm_munmap() unless
> current->mm == mm, and this is not true in general: the caller is mmput().
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Your patch does not apply because it is whitespace damaged. Please resend
and verify that it applies with 'git am'.

-ben

> ---
> fs/aio.c | 47 ++++++++++++++++++-----------------------------
> 1 files changed, 18 insertions(+), 29 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 12a3de0..5fd1fe7 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -777,40 +777,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
> */
> void exit_aio(struct mm_struct *mm)
> {
> - struct kioctx_table *table;
> - struct kioctx *ctx;
> - unsigned i = 0;
> -
> - while (1) {
> - rcu_read_lock();
> - table = rcu_dereference(mm->ioctx_table);
> -
> - do {
> - if (!table || i >= table->nr) {
> - rcu_read_unlock();
> - rcu_assign_pointer(mm->ioctx_table, NULL);
> - if (table)
> - kfree(table);
> - return;
> - }
> -
> - ctx = table->table[i++];
> - } while (!ctx);
> + struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
> + int i;
>
> - rcu_read_unlock();
> + if (!table)
> + return;
>
> + for (i = 0; i < table->nr; ++i) {
> + struct kioctx *ctx = table->table[i];
> /*
> - * We don't need to bother with munmap() here -
> - * exit_mmap(mm) is coming and it'll unmap everything.
> - * Since aio_free_ring() uses non-zero ->mmap_size
> - * as indicator that it needs to unmap the area,
> - * just set it to 0; aio_free_ring() is the only
> - * place that uses ->mmap_size, so it's safe.
> + * We don't need to bother with munmap() here - exit_mmap(mm)
> + * is coming and it'll unmap everything. And we simply can't,
> + * this is not necessarily our ->mm.
> + * Since kill_ioctx() uses non-zero ->mmap_size as indicator
> + * that it needs to unmap the area, just set it to 0.
> */
> - ctx->mmap_size = 0;
> -
> - kill_ioctx(mm, ctx);
> + if (ctx) {
> + ctx->mmap_size = 0;
> + kill_ioctx(mm, ctx);
> + }
> }
> +
> + rcu_assign_pointer(mm->ioctx_table, NULL);
> + kfree(table);
> }
>
> static void put_reqs_available(struct kioctx *ctx, unsigned nr)
> --
> 1.5.5.1
>

--
"Thought is the essence of where you are now."

2014-04-29 21:22:24

by Benjamin LaHaise

Subject: Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On Tue, Apr 29, 2014 at 04:42:17PM -0400, Benjamin LaHaise wrote:
> > Signed-off-by: Oleg Nesterov <[email protected]>
>
> Your patch does not apply because it is whitespace damaged. Please resend
> and verify that it applies with 'git am'.

Whoops, it's not whitespace damage, but rather that it doesn't apply on top of
the other changes that are queued up in the aio-next tree. You can find a
copy of that tree at git://git.kvack.org/~bcrl/aio-next.git . The change
that conflicts is an additional parameter to kill_ioctx().

-ben
--
"Thought is the essence of where you are now."

2014-04-29 23:05:13

by Mateusz Guzik

Subject: Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On Tue, Apr 29, 2014 at 05:22:22PM -0400, Benjamin LaHaise wrote:
> On Tue, Apr 29, 2014 at 04:42:17PM -0400, Benjamin LaHaise wrote:
> > > Signed-off-by: Oleg Nesterov <[email protected]>
> >
> > Your patch does not apply because it is whitespace damaged. Please resend
> > and verify that it applies with 'git am'.
>
> Whoops, it's not whitespace damage, but rather that it doesn't apply on top of
> the other changes that are queued up in the aio-next tree. You can find a
> copy of that tree at git://git.kvack.org/~bcrl/aio-next.git . The change
> that conflicts is an additional parameter to kill_ioctx().

While here, is there any reason for:
rcu_assign_pointer(mm->ioctx_table, NULL);

Nothing looks at this pointer afterwards and mm is about to be freed.

I thought it would be used to sanity-check that everything was cleared
before freeing, but that is not done, and not every pointer is nullified
anyway.

--
Mateusz Guzik

2014-04-29 23:36:32

by Kent Overstreet

Subject: Re: [PATCH 0/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On Tue, Apr 29, 2014 at 08:39:30PM +0200, Oleg Nesterov wrote:
> 1. We can read ->ioctx_table only once and do not need rcu_read_lock()
> or even rcu_dereference().
>
> This mm has no users, so nobody else can play with ->ioctx_table. Otherwise
> the code is buggy anyway: if we needed rcu_read_lock() in the loop because
> ->ioctx_table could be updated, then kfree(table) would be obviously wrong.
>
> 2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
> munmap(), but another reason is that we simply can't call vm_munmap() unless
> current->mm == mm, and this is not true in general: the caller is mmput().

I'm pretty sure your analysis is all correct. IIRC there are other things in the
shutdown path we still have to be careful with, synchronization-wise, but unless
I've forgotten something you're right about ioctx_table.

If I wrote that code (I'd have to check git blame), I'd say I just wrote it that
way because I prefer to be consistent about how things like RCU-protected
pointers are accessed, even if it's not strictly necessary. Feels less tricky to
me. But I don't have a strong preference one way or the other.

W.r.t. ioctx->dead: yes, there is no need for it to be atomic. A regular int
and xchg() is probably more correct anyway, since xchg() implies memory
barriers and atomic_xchg() does not.

2014-04-30 12:13:47

by Oleg Nesterov

Subject: Re: [PATCH 1/1] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On 04/30, Mateusz Guzik wrote:
>
> On Tue, Apr 29, 2014 at 05:22:22PM -0400, Benjamin LaHaise wrote:
> > On Tue, Apr 29, 2014 at 04:42:17PM -0400, Benjamin LaHaise wrote:
> > > > Signed-off-by: Oleg Nesterov <[email protected]>
> > >
> > > Your patch does not apply because it is whitespace damaged. Please resend
> > > and verify that it applies with 'git am'.
> >
> > Whoops, it's not whitespace damage, but rather that it doesn't apply on top of
> > the other changes that are queued up in the aio-next tree. You can find a
> > copy of that tree at git://git.kvack.org/~bcrl/aio-next.git . The change
> > that conflicts is an additional parameter to kill_ioctx().
>
> While here, is there any reason for:
> rcu_assign_pointer(mm->ioctx_table, NULL);
>
> Nothing looks at this pointer afterwards and mm is about to be freed.

Yes, and initially I was going to remove it, but that needs a "3" in the
changelog and I am lazy ;)

> I thought it would be used to sanity check that everything was cleared
> before freeing,

Actually, ->mm_count can be > 1; we do not know whether we are going to free
this mm_struct or not.

We do not really need to nullify this pointer anyway, but perhaps it makes
sense to keep the initialization, I dunno. But I'll change it to use
RCU_INIT_POINTER(NULL) in v2.

Oleg.

2014-04-30 14:16:09

by Oleg Nesterov

Subject: [PATCH v2 0/2] aio: ioctx_table/rcu_read_lock cleanups

On 04/29, Benjamin LaHaise wrote:
>
> Whoops, it's not whitespace damange, but rather that it doesn't apply with
> the other changes that are queued up in the aio-next tree. You can find a
> copy of that tree at git://git.kvack.org/~bcrl/aio-next.git . The change
> that conflicts is an additional parameter to kill_ioctx().

Please see v2. Rebased, added another untested cleanup.

Kent, it seems that you agree at least with 1/2; I'd appreciate your ack ;)

And it seems that we can kill mm->ioctx_lock; with minimal complications
we could move it into kioctx_table, but this is minor.

Oleg.

2014-04-30 14:16:31

by Oleg Nesterov

Subject: [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

1. We can read ->ioctx_table only once and do not need rcu_read_lock()
or even rcu_dereference().

This mm has no users, so nobody else can play with ->ioctx_table. Otherwise
the code is buggy anyway: if we needed rcu_read_lock() in the loop because
->ioctx_table could be updated, then kfree(table) would be obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
munmap(), but another reason is that we simply can't call vm_munmap() unless
current->mm == mm, and this is not true in general: the caller is mmput().

3. We do not really need to nullify mm->ioctx_table before return; probably
the current code does this to catch potential problems. But in this
case RCU_INIT_POINTER(NULL) looks better.

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/aio.c | 47 ++++++++++++++++++-----------------------------
1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 79b7e69..3526c2b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -791,40 +791,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
*/
void exit_aio(struct mm_struct *mm)
{
- struct kioctx_table *table;
- struct kioctx *ctx;
- unsigned i = 0;
-
- while (1) {
- rcu_read_lock();
- table = rcu_dereference(mm->ioctx_table);
-
- do {
- if (!table || i >= table->nr) {
- rcu_read_unlock();
- rcu_assign_pointer(mm->ioctx_table, NULL);
- if (table)
- kfree(table);
- return;
- }
-
- ctx = table->table[i++];
- } while (!ctx);
+ struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+ int i;

- rcu_read_unlock();
+ if (!table)
+ return;

+ for (i = 0; i < table->nr; ++i) {
+ struct kioctx *ctx = table->table[i];
/*
- * We don't need to bother with munmap() here -
- * exit_mmap(mm) is coming and it'll unmap everything.
- * Since aio_free_ring() uses non-zero ->mmap_size
- * as indicator that it needs to unmap the area,
- * just set it to 0; aio_free_ring() is the only
- * place that uses ->mmap_size, so it's safe.
+ * We don't need to bother with munmap() here - exit_mmap(mm)
+ * is coming and it'll unmap everything. And we simply can't,
+ * this is not necessarily our ->mm.
+ * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+ * that it needs to unmap the area, just set it to 0.
*/
- ctx->mmap_size = 0;
-
- kill_ioctx(mm, ctx, NULL);
+ if (ctx) {
+ ctx->mmap_size = 0;
+ kill_ioctx(mm, ctx, NULL);
+ }
}
+
+ RCU_INIT_POINTER(mm->ioctx_table, NULL);
+ kfree(table);
}

static void put_reqs_available(struct kioctx *ctx, unsigned nr)
--
1.5.5.1

2014-04-30 14:16:49

by Oleg Nesterov

Subject: [PATCH v2 2/2] aio: kill the misleading rcu read locks in ioctx_add_table() and kill_ioctx()

ioctx_add_table() is the writer; it does not need rcu_read_lock() to
protect ->ioctx_table. It relies on mm->ioctx_lock, and the rcu locks
just add confusion.

And it doesn't need rcu_dereference() for the same reason: it must see
any updates previously done under the same ->ioctx_lock. We could use
rcu_dereference_protected(), but the patch uses rcu_dereference_raw();
the function is simple enough.

The same holds for kill_ioctx(), although it does not update the pointer.

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/aio.c | 14 +++-----------
1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3526c2b..a415870 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -554,8 +554,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
struct aio_ring *ring;

spin_lock(&mm->ioctx_lock);
- rcu_read_lock();
- table = rcu_dereference(mm->ioctx_table);
+ table = rcu_dereference_raw(mm->ioctx_table);

while (1) {
if (table)
@@ -563,7 +562,6 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
if (!table->table[i]) {
ctx->id = i;
table->table[i] = ctx;
- rcu_read_unlock();
spin_unlock(&mm->ioctx_lock);

/* While kioctx setup is in progress,
@@ -577,8 +575,6 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
}

new_nr = (table ? table->nr : 1) * 4;
-
- rcu_read_unlock();
spin_unlock(&mm->ioctx_lock);

table = kzalloc(sizeof(*table) + sizeof(struct kioctx *) *
@@ -589,8 +585,7 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
table->nr = new_nr;

spin_lock(&mm->ioctx_lock);
- rcu_read_lock();
- old = rcu_dereference(mm->ioctx_table);
+ old = rcu_dereference_raw(mm->ioctx_table);

if (!old) {
rcu_assign_pointer(mm->ioctx_table, table);
@@ -737,12 +732,9 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,


spin_lock(&mm->ioctx_lock);
- rcu_read_lock();
- table = rcu_dereference(mm->ioctx_table);
-
+ table = rcu_dereference_raw(mm->ioctx_table);
WARN_ON(ctx != table->table[ctx->id]);
table->table[ctx->id] = NULL;
- rcu_read_unlock();
spin_unlock(&mm->ioctx_lock);

/* percpu_ref_kill() will do the necessary call_rcu() */
--
1.5.5.1

2014-04-30 15:23:46

by Benjamin LaHaise

Subject: Re: [PATCH v2 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

Hi Oleg,

On Wed, Apr 30, 2014 at 04:16:16PM +0200, Oleg Nesterov wrote:
> 1. We can read ->ioctx_table only once and do not need rcu_read_lock()
> or even rcu_dereference().
>
> This mm has no users, so nobody else can play with ->ioctx_table. Otherwise
> the code is buggy anyway: if we needed rcu_read_lock() in the loop because
> ->ioctx_table could be updated, then kfree(table) would be obviously wrong.
>
> 2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
> munmap(), but another reason is that we simply can't call vm_munmap() unless
> current->mm == mm, and this is not true in general: the caller is mmput().
>
> 3. We do not really need to nullify mm->ioctx_table before return; probably
> the current code does this to catch potential problems. But in this
> case RCU_INIT_POINTER(NULL) looks better.

Looks pretty good. One minor style comment below.

> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> fs/aio.c | 47 ++++++++++++++++++-----------------------------
> 1 files changed, 18 insertions(+), 29 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 79b7e69..3526c2b 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -791,40 +791,29 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
> */
> void exit_aio(struct mm_struct *mm)
> {
> - struct kioctx_table *table;
> - struct kioctx *ctx;
> - unsigned i = 0;
> -
> - while (1) {
> - rcu_read_lock();
> - table = rcu_dereference(mm->ioctx_table);
> -
> - do {
> - if (!table || i >= table->nr) {
> - rcu_read_unlock();
> - rcu_assign_pointer(mm->ioctx_table, NULL);
> - if (table)
> - kfree(table);
> - return;
> - }
> -
> - ctx = table->table[i++];
> - } while (!ctx);
> + struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
> + int i;
>
> - rcu_read_unlock();
> + if (!table)
> + return;
>
> + for (i = 0; i < table->nr; ++i) {
> + struct kioctx *ctx = table->table[i];
> /*
> - * We don't need to bother with munmap() here -
> - * exit_mmap(mm) is coming and it'll unmap everything.
> - * Since aio_free_ring() uses non-zero ->mmap_size
> - * as indicator that it needs to unmap the area,
> - * just set it to 0; aio_free_ring() is the only
> - * place that uses ->mmap_size, so it's safe.
> + * We don't need to bother with munmap() here - exit_mmap(mm)
> + * is coming and it'll unmap everything. And we simply can't,
> + * this is not necessarily our ->mm.
> + * Since kill_ioctx() uses non-zero ->mmap_size as indicator
> + * that it needs to unmap the area, just set it to 0.
> */
> - ctx->mmap_size = 0;
> -
> - kill_ioctx(mm, ctx, NULL);
> + if (ctx) {
> + ctx->mmap_size = 0;
> + kill_ioctx(mm, ctx, NULL);
> + }

Rather than indenting and moving the two lines changing mmap_size and the
kill_ioctx() call, why not just do "if (!ctx) ... continue;"? That reduces
the number of lines changed and avoids excessive indentation.

-ben

> }
> +
> + RCU_INIT_POINTER(mm->ioctx_table, NULL);
> + kfree(table);
> }
>
> static void put_reqs_available(struct kioctx *ctx, unsigned nr)
> --
> 1.5.5.1
>

--
"Thought is the essence of where you are now."

2014-04-30 17:03:04

by Oleg Nesterov

Subject: [PATCH v3 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

On 04/30, Benjamin LaHaise wrote:
>
> > - ctx->mmap_size = 0;
> > -
> > - kill_ioctx(mm, ctx, NULL);
> > + if (ctx) {
> > + ctx->mmap_size = 0;
> > + kill_ioctx(mm, ctx, NULL);
> > + }
>
> Rather than indenting and moving the two lines changing mmap_size and the
> kill_ioctx() call, why not just do "if (!ctx) ... continue;"? That reduces
> the number of lines changed and avoids excessive indentation.

OK. To me the code looks better/simpler with "if (ctx)", but this is subjective
of course, so I won't argue.

The patch still removes the empty line between mmap_size = 0 and kill_ioctx();
we reset mmap_size only for kill_ioctx(). But feel free to drop this change.

-------------------------------------------------------------------------------
Subject: [PATCH v3 1/2] aio: change exit_aio() to load mm->ioctx_table once and avoid rcu_read_lock()

1. We can read ->ioctx_table only once and do not need rcu_read_lock()
or even rcu_dereference().

This mm has no users, so nobody else can play with ->ioctx_table. Otherwise
the code is buggy anyway: if we needed rcu_read_lock() in the loop because
->ioctx_table could be updated, then kfree(table) would be obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
munmap(), but another reason is that we simply can't call vm_munmap() unless
current->mm == mm, and this is not true in general: the caller is mmput().

3. We do not really need to nullify mm->ioctx_table before return; probably
the current code does this to catch potential problems. But in this
case RCU_INIT_POINTER(NULL) looks better.

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/aio.c | 42 ++++++++++++++++--------------------------
1 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 79b7e69..b67f3e2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -791,40 +791,30 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
*/
void exit_aio(struct mm_struct *mm)
{
- struct kioctx_table *table;
- struct kioctx *ctx;
- unsigned i = 0;
-
- while (1) {
- rcu_read_lock();
- table = rcu_dereference(mm->ioctx_table);
-
- do {
- if (!table || i >= table->nr) {
- rcu_read_unlock();
- rcu_assign_pointer(mm->ioctx_table, NULL);
- if (table)
- kfree(table);
- return;
- }
+ struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
+ int i;

- ctx = table->table[i++];
- } while (!ctx);
+ if (!table)
+ return;

- rcu_read_unlock();
+ for (i = 0; i < table->nr; ++i) {
+ struct kioctx *ctx = table->table[i];

+ if (!ctx)
+ continue;
/*
- * We don't need to bother with munmap() here -
- * exit_mmap(mm) is coming and it'll unmap everything.
- * Since aio_free_ring() uses non-zero ->mmap_size
- * as indicator that it needs to unmap the area,
- * just set it to 0; aio_free_ring() is the only
- * place that uses ->mmap_size, so it's safe.
+ * We don't need to bother with munmap() here - exit_mmap(mm)
+ * is coming and it'll unmap everything. And we simply can't,
+ * this is not necessarily our ->mm.
+ * Since kill_ioctx() uses non-zero ->mmap_size as indicator
+ * that it needs to unmap the area, just set it to 0.
*/
ctx->mmap_size = 0;
-
kill_ioctx(mm, ctx, NULL);
}
+
+ RCU_INIT_POINTER(mm->ioctx_table, NULL);
+ kfree(table);
}

static void put_reqs_available(struct kioctx *ctx, unsigned nr)
--
1.5.5.1