2009-10-14 13:53:51

by Earl Chew

[permalink] [raw]
Subject: fs/pipe.c null pointer dereference

I'm working on a 2.6.21 based kernel and received the following
oops last tonight:

> stopped custom tracer.
> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
> [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
> PGD 17d198067 PUD 17c672067 PMD 0
> Oops: 0002 [1] PREEMPT SMP
> CPU 0
> Modules linked in: jffs2 cfi_cmdset_0001 cfi_util cfi_probe gen_probe physmap_lo e1000e fakephp amp_uio uio coretemp lm90 hwmon w83627ehf ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler ppdev physmap mtdpart chipreg map_funcs mtdblock mtd_blkdevs mtdchar mtdcore
> Pid: 6928, comm: poll Not tainted 2.6.21-amp64c-10X-n2x-10X #1
> RIP: 0010:[<ffffffff802899a5>] [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
> RSP: 0018:ffff81017c583e48 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff81017c9bc490 RCX: ffffffff80e48c00
> RDX: ffff81017c583fd8 RSI: ffff81017c603040 RDI: ffff81000642bf40
> RBP: ffff81017cf2dec0 R08: ffff81017c582000 R09: 0000000000000082
> R10: ffff81017d1b6000 R11: ffffffff802985f0 R12: ffff81017c9bc550
> R13: ffffffff80289970 R14: ffff81017cea1e90 R15: ffff81017fc2c980
> FS: 0000000000000000(0000) GS:ffffffff806b30c0(0063) knlGS:00000000f7dc16c0
> CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 00000000f7dfb040 CR3: 000000017c5bc000 CR4: 00000000000006e0
> Process poll (pid: 6928, threadinfo ffff81017c582000, task ffff81017c603040)
> Stack: ffff81017cf2dec0 ffff81017c9bc490 0000000000008000 ffffffff8028125c
> ffff81017c603040 0000000000008000 ffff81017f68e000 0000000000008000
> 0000000000000004 00000000ffffff9c 0000000000000000 ffffffff8028143d
> Call Trace:
> [<ffffffff8028125c>] __dentry_open+0x13c/0x230
> [<ffffffff8028143d>] do_filp_open+0x2d/0x40
> [<ffffffff802814aa>] do_sys_open+0x5a/0x100
> [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67
>
>
> Code: 83 40 28 01 8b 45 38 a8 02 74 0b 48 8b 83 d8 01 00 00 83 40
> RIP [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
> RSP <ffff81017c583e48>
> CR2: 0000000000000028

The null dereference is happening at the ++ operator in:

> static int
> pipe_rdwr_open(struct inode *inode, struct file *filp)
> {
> mutex_lock(&inode->i_mutex);
> if (filp->f_mode & FMODE_READ)
> inode->i_pipe->readers++;

The corresponding assembler is:

> 0000000000000190 <pipe_rdwr_open>:
> 190: 48 83 ec 18 sub $0x18,%rsp
> 194: 4c 89 64 24 10 mov %r12,0x10(%rsp)
> 199: 4c 8d a7 c0 00 00 00 lea 0xc0(%rdi),%r12
> 1a0: 48 89 1c 24 mov %rbx,(%rsp)
> 1a4: 48 89 6c 24 08 mov %rbp,0x8(%rsp)
> 1a9: 48 89 fb mov %rdi,%rbx
> 1ac: 48 89 f5 mov %rsi,%rbp
> 1af: 4c 89 e7 mov %r12,%rdi
> 1b2: e8 00 00 00 00 callq 1b7 <pipe_rdwr_open+0x27>
> 1b3: R_X86_64_PC32 mutex_lock+0xfffffffffffffffc
> 1b7: 8b 45 38 mov 0x38(%rbp),%eax
> 1ba: a8 01 test $0x1,%al
> 1bc: 74 0e je 1cc <pipe_rdwr_open+0x3c>
> 1be: 48 8b 83 d8 01 00 00 mov 0x1d8(%rbx),%rax
> 1c5: 83 40 28 01 addl $0x1,0x28(%rax) <--------**** FAULT HERE ****

IOW i_pipe is NULL, apparently set by free_pipe_info()

I went trawling through the code to see if I could figure out
how this might have happened. The are mutexes of the form:

mutex_lock(&inode->i_mutex);
...
mutex_unlock(&inode->i_mutex);

throughout fs/pipe.c and fs/fifo.c so the above seems to be
an impossibility ;-)


Perhaps there is a potential window for failure in fs/fifo.c.

pipe_rdwr_open() is only accessible via rdwr_pipefifo_fops and
that is obtained via fs/fifo.c.

Looking at fs/fifo.c I see:

mutex_lock(&inode->i_mutex);
...
switch (filp->f_mode) {
case FMODE_READ:
...
if (!pipe->writers) {
wait_for_partner(inode, filp, &pipe->w_counter);
...
case FMODE_WRITE:
...
if (!pipe->readers) {
wait_for_partner(inode, filp, &pipe->r_counter);
...

case FMODE_READ | FMODE_WRITE:
filp->f_op = &rdwr_pipefifo_fops;
...
if (pipe->readers == 1 || pipe->writers == 1)
wake_up_partner(inode);
break;

}
...
mutex_unlock(&inode->i_mutex);

So it turns out that FMODE_READ|FMODE_WRITE does not block.

However, FMODE_READ alone or FMODE_WRITE alone may call
wait_for_partner(), which in turn calls pipe_wait(), which
in turn drops the mutex, then reacquires it:

> void pipe_wait(struct pipe_inode_info *pipe)
> {
> ...
> if (pipe->inode)
> mutex_unlock(&pipe->inode->i_mutex);
> ...
> if (pipe->inode)
> mutex_lock(&pipe->inode->i_mutex);
> }

So perhaps:

1. Process A calls fifo_open(FMODE_READ), then relinquishes
the mutex at pipe_wait() (readers == 1, writers == 0)

2. Process B calls fifo_open(FMODE_WRITE|FMODE_READ) and completes
(readers == 2, writers == 1)

3. Process A wakes, but finds signal pending, so goes to err_rd
and drops readers to 1

... but I couldn't figure out a way for this to fail ...

Any other ideas?

Earl


2009-10-14 14:53:42

by Frans Pop

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

Earl Chew wrote:
> I'm working on a 2.6.21 based kernel and received the following
> oops last tonight:

2.6.21 is really rather ancient for the kernel community and I doubt anyone
here will really want to look into issues with it, given that there's a
realistic chance it has already been fixed since then. The current stable
version is 2.6.31 after all.

Can you reproduce the oops with a current kernel? Have you checked if there
have been any changes in the code you identified since 2.6.21?

Cheers,
FJP

2009-10-14 22:51:55

by Earl Chew

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

Frans Pop wrote:
> 2.6.21 is really rather ancient for the kernel community and I doubt anyone
> here will really want to look into issues with it, given that there's a
> realistic chance it has already been fixed since then. The current stable
> version is 2.6.31 after all.
>
> Can you reproduce the oops with a current kernel? Have you checked if there
> have been any changes in the code you identified since 2.6.21?

I was mostly after some guidance. I'm happy to dig around to figure
out what the problem is and log a bug if indeed I find one.


Yes, I have searched for obvious patches but didn't turn up anything.

I have looked at the code for more recent 2.6.31 kernel on
lxr.linux.no and haven't turned up any appreciable difference.

I have given the matter more thought, and I believe it is unlikely
that the problem is in fs/fifo.c. As outlined in my previous
email, the code looks ok.


I'm a little confused about the switch between the fifo_open()
and the subsequent call to pipe_rdwr_open(). My understanding
is fifo_open() is called during open("/fifo", O_RDWR) which
fills in filp->f_op to point at rdwr_fifo_fops.

Thereafter, a call to close() will vector through filp->f_op
to pipe_rdwr_release().

I'm still puzzled as to how pipe_rdwr_open() is called. The
stack trace says:

Call Trace:
[<ffffffff8028125c>] __dentry_open+0x13c/0x230
[<ffffffff8028143d>] do_filp_open+0x2d/0x40
[<ffffffff802814aa>] do_sys_open+0x5a/0x100
[<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67

I figure that the fileoperation for the fifo inode must be
revectored to rdwr_fifo_fops --- but I can't see that's
done exactly.

Any hints as to where to look for this ?


In any case, my current guess as to how this happens is that
there might be window of time between do_sys_open() ... pipe_rdwr_open()
where the inode is obtained and the fileoperation pointer is
followed to get to pipe_rdwr_open().

During that window of time, might it be possible for the fifo
to be closed and the underlying pipe released?


Where can I look to see if this is possible?

Earl

2009-10-15 00:12:21

by Frans Pop

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

On Thursday 15 October 2009, Earl Chew wrote:
> Frans Pop wrote:
> > 2.6.21 is really rather ancient for the kernel community and I doubt
> > anyone here will really want to look into issues with it, given that
> > there's a realistic chance it has already been fixed since then. The
> > current stable version is 2.6.31 after all.
> >
> > Can you reproduce the oops with a current kernel? Have you checked if
> > there have been any changes in the code you identified since 2.6.21?
>
> I was mostly after some guidance. I'm happy to dig around to figure
> out what the problem is and log a bug if indeed I find one.

OK. I'm afraid I can't help you there myself, but maybe with this
additional info someone else will.

Good luck with the hunt.

2009-10-16 03:33:38

by Earl Chew

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

Earl Chew wrote:
> I have given the matter more thought, and I believe it is unlikely
> that the problem is in fs/fifo.c. As outlined in my previous
> email, the code looks ok.

I notice that the other place the rdwr_pipe_fops is set up is:

> static struct inode * get_pipe_inode(void)
> {
> ...
> inode->i_fop = &rdwr_pipe_fops;
>

and that get_pipe_inode() is called from:

> struct file *create_write_pipe(void)
> {
> ...
> inode = get_pipe_inode();
> if (!inode)
> goto err_file;

and that is called from do_pipe():

> int do_pipe(int *fd)
> {
> ...
> fw = create_write_pipe();
> if (IS_ERR(fw))
> return PTR_ERR(fw);
> fr = create_read_pipe(fw);
> error = PTR_ERR(fr);
> if (IS_ERR(fr))
> goto err_write_pipe;


... and do_pipe() is called from many places.


The stack trace I have shows:

> Call Trace:
> [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
> [<ffffffff8028125c>] __dentry_open+0x13c/0x230
> [<ffffffff8028143d>] do_filp_open+0x2d/0x40
> [<ffffffff802814aa>] do_sys_open+0x5a/0x100
> [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67


How can it be possible that sys_open (ie open(2)) can get hold
of an inode whose i_fop->open() points at pipe_rdwr_open() ?

Is this possible via /proc/pid/fd/* ?

Yes, it looks likely:

> { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
> PID=$!
> OUT=$(ps -efl | grep 'sleep 1' | grep -v grep)
> OUT=${OUT%% *}
> echo n > /proc/$OUT/fd/1

This test prints zy and zn indicating that the 2nd echo was able
to open the writing end of the pipe and inject another character.

Earl

2009-10-16 03:41:34

by Earl Chew

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

Earl Chew wrote:
> Is this possible via /proc/pid/fd/* ?

The window for failure is small. It's easiest to reproduce
this problem by stalling pipe_rdwr_open() to open up the
window:

--- pipe.c.orig 2009-10-15 20:33:53.000000000 -0700
+++ pipe.c 2009-10-15 20:17:40.000000000 -0700
@@ -736,2 +736,3 @@
{
+ msleep(100);
mutex_lock(&inode->i_mutex);


With the failure window widened, it's easy to reproduce
the failure with:

--------------------------------------------------------------
#!/bin/sh

while : ; do
{ echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
PID=$!
OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
{ read PID REST ; echo $PID; } )
OUT="${OUT%% *}"
DELAY=$((RANDOM * 1000 / 32768))
usleep $((DELAY * 1000 + RANDOM % 1000 ))
echo n > /proc/$OUT/fd/1
done
--------------------------------------------------------------

2009-10-16 04:28:18

by Earl Chew

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

Frans Pop wrote:
> Good luck with the hunt.

I've logged a defect:

http://bugzilla.kernel.org/show_bug.cgi?id=14416

which includes a proposed fix and script to reproduce.

Earl


2009-10-16 07:18:13

by Pekka Enberg

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

Hi Earl,

On Fri, Oct 16, 2009 at 7:27 AM, Earl Chew <[email protected]> wrote:
> Frans Pop wrote:
>>
>> Good luck with the hunt.
>
> I've logged a defect:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=14416
>
> which includes a proposed fix and script to reproduce.

Can you please send the patch to LKML so your fix doesn't get lost in
the noise? See Documentation/SubmittingPatches for details.

Pekka

2009-10-16 14:52:17

by Earl Chew

[permalink] [raw]
Subject: Re: fs/pipe.c null pointer dereference

Pekka Enberg wrote:
> Can you please send the patch to LKML so your fix doesn't get lost in
> the noise? See Documentation/SubmittingPatches for details.

Done.

http://lkml.org/lkml/2009/10/16/146

Earl