2004-11-05 19:37:26

by djwong

[permalink] [raw]
Subject: [PATCH] Oops in aio_free_ring on 2.6.9

Hi,

In pounding on various i386 machines with a random syscall generator, I
uncovered a situation in which the kernel oopses:

1. Use mmap() to map out as much of the process address
space as possible. This is about 2047M on i386.
2. Call io_setup with the first argument set to a
large (~65000) value.

(These notes reference the mainline 2.6.9 source.)

What happens is that the number of pages required to service the
io_setup request is larger than the block of internally allocated page
pointers (fs/aio.c:126), so aio_setup_ring kmalloc's a blob of struct
page pointers, and initializes these pointers to NULL. (fs/aio.c:130)

Next, the aio_setup_ring function tries to mmap a bunch of pages and
fails, because in step 1 we used up all the address space.
aio_setup_ring then calls aio_free_ring to tear all of this down.
(fs/aio.c:143)

aio_free_ring sees the block of struct page pointers and calls free_page
(fs/aio.c:88) on the pointers without checking that they're not NULL.
Unfortunately, they _are_ NULL and *oops*! My patch amends the function
to include a null pointer check.

I have a simple testcase that causes this oops; see
http://submarine.dyndns.org/~djwong/docs/iosetup_crash.c. Run
"./iosetup_crash 100 100000 1000000" to reproduce the oops.

This flaw shows up on mainline 2.6.9, Debian 2.6.8 and SLES9 2.6.5 on
four different machines. The patch is against 2.6.9 mainline and the
problem isn't fixed in 2.6.10-rc1.

Please send replies directly to me, as I'm not subscribed to
linux-kernel or linux-aio.

--Darrick

Signed-off-by: Darrick J. Wong <[email protected]>

diff -uprN linux-2.6.9-orig/fs/aio.c linux-2.6.9/fs/aio.c
--- linux-2.6.9-orig/fs/aio.c 2004-10-18 14:54:07.000000000 -0700
+++ linux-2.6.9/fs/aio.c 2004-11-03 08:47:51.000000000 -0800
@@ -85,7 +85,8 @@ static void aio_free_ring(struct kioctx
long i;

for (i=0; i<info->nr_pages; i++)
- put_page(info->ring_pages[i]);
+ if (info->ring_pages[i])
+ put_page(info->ring_pages[i]);

if (info->mmap_size) {
down_write(&ctx->mm->mmap_sem);


2004-11-07 03:43:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Oops in aio_free_ring on 2.6.9



On Fri, 5 Nov 2004, Darrick J. Wong wrote:
>
> Next, the aio_setup_ring function tries to mmap a bunch of pages and
> fails, because in step 1 we used up all the address space.
> aio_setup_ring then calls aio_free_ring to tear all of this down.
> (fs/aio.c:143)
>
> aio_free_ring sees the block of struct page pointers and calls free_page
> (fs/aio.c:88) on the pointers without checking that they're not NULL.
> Unfortunately, they _are_ NULL and *oops*! My patch amends the function
> to include a null pointer check.

I don't disagree with the bug, but I disagree with the fix.

In my opinion, the problem is that "info->nr_pages" is _wrong_. It's wrong
because it has been initialized to a bogus value.

I'd much prefer this alternate appended patch. Can you verify that it also
fixes the problem (we can drop the bogus info->nr_pages initialization,
because the context - including the info part - has been cleared when it
was allocated, so nr_pages should already have the _correct_ value of zero
at this point).

Linus

-----
===== fs/aio.c 1.60 vs edited =====
--- 1.60/fs/aio.c 2004-10-20 01:12:10 -07:00
+++ edited/fs/aio.c 2004-11-06 19:41:45 -08:00
@@ -118,8 +118,6 @@
if (nr_pages < 0)
return -EINVAL;

- info->nr_pages = nr_pages;
-
nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);

info->nr = 0;

2004-11-08 04:08:14

by Suparna Bhattacharya

[permalink] [raw]
Subject: Re: [PATCH] Oops in aio_free_ring on 2.6.9

On Sat, Nov 06, 2004 at 07:43:33PM -0800, Linus Torvalds wrote:
>
> On Fri, 5 Nov 2004, Darrick J. Wong wrote:
> >
> > Next, the aio_setup_ring function tries to mmap a bunch of pages and
> > fails, because in step 1 we used up all the address space.
> > aio_setup_ring then calls aio_free_ring to tear all of this down.
> > (fs/aio.c:143)
> >
> > aio_free_ring sees the block of struct page pointers and calls free_page
> > (fs/aio.c:88) on the pointers without checking that they're not NULL.
> > Unfortunately, they _are_ NULL and *oops*! My patch amends the function
> > to include a null pointer check.
>
> I don't disagree with the bug, but I disagree with the fix.
>
> In my opinion, the problem is that "info->nr_pages" is _wrong_. It's wrong
> because it has been initialized to a bogus value.
>
> I'd much prefer this alternate appended patch. Can you verify that it also
> fixes the problem (we can drop the bogus info->nr_pages initialization,
> because the context - including the info part - has been cleared when it
> was allocated, so nr_pages should already have the _correct_ value of zero
> at this point).

Since aio_free_ring uses comparison with info->internal_pages rather than
nr_pages to decide whether to kfree(info->ring_pages), I see your point.

Regards
Suparna

>
> Linus
>
> -----
> ===== fs/aio.c 1.60 vs edited =====
> --- 1.60/fs/aio.c 2004-10-20 01:12:10 -07:00
> +++ edited/fs/aio.c 2004-11-06 19:41:45 -08:00
> @@ -118,8 +118,6 @@
> if (nr_pages < 0)
> return -EINVAL;
>
> - info->nr_pages = nr_pages;
> -
> nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
>
> info->nr = 0;
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to [email protected]. For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"[email protected]">[email protected]</a>

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2004-11-08 19:34:28

by djwong

[permalink] [raw]
Subject: Re: [PATCH] Oops in aio_free_ring on 2.6.9

On Sat, 2004-11-06 at 19:43, Linus Torvalds wrote:

> I don't disagree with the bug, but I disagree with the fix.
>
> In my opinion, the problem is that "info->nr_pages" is _wrong_. It's wrong
> because it has been initialized to a bogus value.
>
> I'd much prefer this alternate appended patch. Can you verify that it also
> fixes the problem (we can drop the bogus info->nr_pages initialization,

You're right, that is a better fix. The aio code is not my current area
of expertise, as I've demonstrated. :)

Your patch also fixes the problem.

Thanks,

--Darrick