2004-01-28 17:17:33

by Miquel van Smoorenburg

[permalink] [raw]
Subject: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

I have a Linux 2.6.2-rc2 NFS file server and another similar
box as client. Kernel is compiled for SMP (hyperthreading).

On the NFS server I'm exporting an XFS filesystem to the client
over Gigabit ethernet. The client mounts using
mount -o nfsvers=3,intr,rsize=32768,wsize=32768 server:/export/hwr /mnt

On the client I then run a large dd to a file on the server,
like dd if=/dev/zero of=/mnt/file bs=4096 count=800000

In a few seconds, the server locks up. It spins in
generic_fillattr(), apparently in the i_size_read() inline function.
Server responds to pings and sysrq, but nothing else.

2.6.1 doesn't show this behaviour. I tested several kernels:

2.6.1 OK!
2.6.1-bk4 OK!
2.6.1-bk5 doesn't boot
2.6.1-bk6 hangs
2.6.2-rc1 hangs
2.6.2-rc2 hangs
2.6.2-rc2-gcc-2.95 hangs
2.6.2-rc1-mm3 OK!
2.6.2-rc2-mm1 OK!

I can't reproduce it on the local filesystem on the NFS server
directly, and I can't reproduce it on other filesystems than XFS.
But NFSD+XFS locks up every time.

Unfortunately the diff between 2.6.1-bk4 and bk6 is too large
for me to see what the difference is, likewise 2.6.2-rc2-mm1,
but perhaps someone has an idea what could be causing this ?

Here's the sysrq output:

Pid: 531, comm: nfsd
EIP: 0060:[<c01577c2>] CPU: 0
EIP is at generic_fillattr+0x82/0xa0
EFLAGS: 00000246 Not tainted
EAX: 00000000 EBX: 07ae7200 ECX: f650a0a0 EDX: 000113eb
ESI: 00000000 EDI: f66cfed4 EBP: f66e5804 DS: 007b ES: 007b
CR0: 8005003b CR2: 40019000 CR3: 37245000 CR4: 000006d0
Call Trace:
[<f8ab1f6b>] linvfs_getattr+0x2b/0x34 [xfs]
[<c0157805>] vfs_getattr+0x25/0x84
[<c01c19a3>] encode_post_op_attr+0x53/0x238
[<c01c1e27>] encode_wcc_data+0x29f/0x2a8
[<c01c4521>] nfs3svc_encode_commitres+0x19/0x5c
[<c01b709d>] nfsd_dispatch+0x14d/0x1a3
[<c02fb79b>] svc_process+0x3b3/0x640
[<c01b6ddc>] nfsd+0x1e4/0x358
[<c01b6bf8>] nfsd+0x0/0x358
[<c0107251>] kernel_thread_helper+0x5/0xc




(By the way, on 2.6.2-rc1-mm3 and 2.6.2-rc2-mm1 mounting an XFS
filesystem results in lots of:

Badness in interruptible_sleep_on at kernel/sched.c:2239
Call Trace:
[<c011f5a3>] interruptible_sleep_on+0xf6/0xfb
[<c011f209>] default_wake_function+0x0/0x12
[<f8ac0fa2>] pagebuf_daemon+0x231/0x24c [xfs]
[<c0339ed6>] ret_from_fork+0x6/0x14
[<f8ac0d47>] pagebuf_daemon_wakeup+0x0/0x2a [xfs]
[<f8ac0d71>] pagebuf_daemon+0x0/0x24c [xfs]
[<c0109269>] kernel_thread_helper+0x5/0xb )

Mike.


2004-01-29 06:31:20

by Nathan Scott

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Wed, Jan 28, 2004 at 05:17:27PM +0000, Miquel van Smoorenburg wrote:
> I have a Linux 2.6.2-rc2 NFS file server and another similar
> box as client. Kernel is compiled for SMP (hyperthreading).
>
> On the NFS server I'm exporting an XFS filesystem to the client
> over Gigabit ethernet. The client mounts using
> mount -o nfsvers=3,intr,rsize=32768,wsize=32768 server:/export/hwr /mnt
>
> On the client I then run a large dd to a file on the server,
> like dd if=/dev/zero of=/mnt/file bs=4096 count=800000
>
> In a few seconds, the server locks up. It spins in
> generic_fillattr(), apparently in the i_size_read() inline function.
> Server responds to pings and sysrq, but nothing else.
>
> 2.6.1 doesn't show this behaviour. I tested several kernels:
>
> 2.6.1 OK!
> 2.6.1-bk4 OK!
> 2.6.1-bk5 doesn't boot
> 2.6.1-bk6 hangs
> 2.6.2-rc1 hangs
> 2.6.2-rc2 hangs
> 2.6.2-rc2-gcc-2.95 hangs
> 2.6.2-rc1-mm3 OK!
> 2.6.2-rc2-mm1 OK!

> I can't reproduce it on the local filesystem on the NFS server
> directly, and I can't reproduce it on other filesystems than XFS.
> But NFSD+XFS locks up every time.

Hmmmm... I don't think Andrew has any XFS fixes in his tree that
might help there; and I can't think of any XFS change in -rc1 that
might have caused this (does the fs/xfs subdir from 2.6.1 plonked
down in place of the 2.6.2-rc1/2 version still have the problem?)

i_size_read seems to have a fair bit of config dependency - are you
CONFIG_SMP / CONFIG_PREEMPT / neither? and is your BITS_PER_LONG
32 or 64? thanks.

> (By the way, on 2.6.2-rc1-mm3 and 2.6.2-rc2-mm1 mounting an XFS
> filesystem results in lots of:
>
> Badness in interruptible_sleep_on at kernel/sched.c:2239
> Call Trace:
> [<c011f5a3>] interruptible_sleep_on+0xf6/0xfb
> [<c011f209>] default_wake_function+0x0/0x12
> [<f8ac0fa2>] pagebuf_daemon+0x231/0x24c [xfs]
> [<c0339ed6>] ret_from_fork+0x6/0x14
> [<f8ac0d47>] pagebuf_daemon_wakeup+0x0/0x2a [xfs]
> [<f8ac0d71>] pagebuf_daemon+0x0/0x24c [xfs]
> [<c0109269>] kernel_thread_helper+0x5/0xb )
>

There's fixes floating around for this, I'll get it merged soon.

cheers.

--
Nathan

2004-01-29 06:24:58

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

"Miquel van Smoorenburg" <[email protected]> wrote:
>
> I have a Linux 2.6.2-rc2 NFS file server and another similar
> box as client. Kernel is compiled for SMP (hyperthreading).
>
> On the NFS server I'm exporting an XFS filesystem to the client
> over Gigabit ethernet. The client mounts using
> mount -o nfsvers=3,intr,rsize=32768,wsize=32768 server:/export/hwr /mnt
>
> On the client I then run a large dd to a file on the server,
> like dd if=/dev/zero of=/mnt/file bs=4096 count=800000
>
> In a few seconds, the server locks up. It spins in
> generic_fillattr(), apparently in the i_size_read() inline function.
> Server responds to pings and sysrq, but nothing else.
>
> 2.6.1 doesn't show this behaviour. I tested several kernels:
>
> 2.6.1 OK!
> 2.6.1-bk4 OK!
> 2.6.1-bk5 doesn't boot
> 2.6.1-bk6 hangs
> 2.6.2-rc1 hangs
> 2.6.2-rc2 hangs
> 2.6.2-rc2-gcc-2.95 hangs
> 2.6.2-rc1-mm3 OK!
> 2.6.2-rc2-mm1 OK!
>
> I can't reproduce it on the local filesystem on the NFS server
> directly, and I can't reproduce it on other filesystems than XFS.
> But NFSD+XFS locks up every time.
>
> Unfortunately the diff between 2.6.1-bk4 and bk6 is too large
> for me to see what the difference is, likewise 2.6.2-rc2-mm1,
> but perhaps someone has an idea what could be causing this ?
>
> Here's the sysrq output:
>
> Pid: 531, comm: nfsd
> EIP: 0060:[<c01577c2>] CPU: 0
> EIP is at generic_fillattr+0x82/0xa0
> EFLAGS: 00000246 Not tainted
> EAX: 00000000 EBX: 07ae7200 ECX: f650a0a0 EDX: 000113eb
> ESI: 00000000 EDI: f66cfed4 EBP: f66e5804 DS: 007b ES: 007b
> CR0: 8005003b CR2: 40019000 CR3: 37245000 CR4: 000006d0
> Call Trace:
> [<f8ab1f6b>] linvfs_getattr+0x2b/0x34 [xfs]
> [<c0157805>] vfs_getattr+0x25/0x84
> [<c01c19a3>] encode_post_op_attr+0x53/0x238
> [<c01c1e27>] encode_wcc_data+0x29f/0x2a8
> [<c01c4521>] nfs3svc_encode_commitres+0x19/0x5c
> [<c01b709d>] nfsd_dispatch+0x14d/0x1a3
> [<c02fb79b>] svc_process+0x3b3/0x640
> [<c01b6ddc>] nfsd+0x1e4/0x358
> [<c01b6bf8>] nfsd+0x0/0x358
> [<c0107251>] kernel_thread_helper+0x5/0xc
>

Is the EIP _always_ inside generic_fillattr()?

If so then yes, your analysis look right. I'd say that the inode has been
corrupted and the seqcount counter has assumed an non-even value. That
will cause i_size_read() to lock up.

Are you using slab debugging? Is so, does the lockup go away if you change
mm/slab.c:POISON_FREE to an even value, say 0x6a? That would tend to
confirm a use-after-free problem. Or a totally wild pointer.




2004-01-29 23:21:41

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

According to Andrew Morton:
> "Miquel van Smoorenburg" <[email protected]> wrote:
> >
> > On the NFS server I'm exporting an XFS filesystem to the client
> > over Gigabit ethernet. The client mounts using
> > mount -o nfsvers=3,intr,rsize=32768,wsize=32768 server:/export/hwr /mnt
> >
> > On the client I then run a large dd to a file on the server,
> > like dd if=/dev/zero of=/mnt/file bs=4096 count=800000
> >
> > In a few seconds, the server locks up. It spins in
> > generic_fillattr(), apparently in the i_size_read() inline function.
> > Server responds to pings and sysrq, but nothing else.
> >
> > 2.6.1 doesn't show this behaviour. I tested several kernels:

Okay so I did some more tests today. As it turns out, all stock
2.6 kernels lock up - it's just that 2.6.2-rc* locks up faster
than earlier versions. I tested 2.6.1-rc1-mm3 and 2.6.2-rc2-mm1,
these do not lock up (test on -mm has been running for 8 hours
straight now, 2.6.* vanilla locks up in a few minutes).

> > Here's the sysrq output:
> >
> > Pid: 531, comm: nfsd
> > EIP: 0060:[<c01577c2>] CPU: 0
> > EIP is at generic_fillattr+0x82/0xa0
> > EFLAGS: 00000246 Not tainted
> > EAX: 00000000 EBX: 07ae7200 ECX: f650a0a0 EDX: 000113eb
> > ESI: 00000000 EDI: f66cfed4 EBP: f66e5804 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 40019000 CR3: 37245000 CR4: 000006d0
> > Call Trace:
> > [<f8ab1f6b>] linvfs_getattr+0x2b/0x34 [xfs]
> > [<c0157805>] vfs_getattr+0x25/0x84
> > [<c01c19a3>] encode_post_op_attr+0x53/0x238
> > [<c01c1e27>] encode_wcc_data+0x29f/0x2a8
> > [<c01c4521>] nfs3svc_encode_commitres+0x19/0x5c
> > [<c01b709d>] nfsd_dispatch+0x14d/0x1a3
> > [<c02fb79b>] svc_process+0x3b3/0x640
> > [<c01b6ddc>] nfsd+0x1e4/0x358
> > [<c01b6bf8>] nfsd+0x0/0x358
> > [<c0107251>] kernel_thread_helper+0x5/0xc
> >
>
> Is the EIP _always_ inside generic_fillattr()?

Yes, if a keep on doing BREAK-p the output is alway the same.
EIP varies a few bytes, ofcourse, not much.

> If so then yes, your analysis look right. I'd say that the inode has been
> corrupted and the seqcount counter has assumed an non-even value. That
> will cause i_size_read() to lock up.
>
> Are you using slab debugging? Is so, does the lockup go away if you change
> mm/slab.c:POISON_FREE to an even value, say 0x6a? That would tend to
> confirm a use-after-free problem. Or a totally wild pointer.

I turned on slab debugging but it didn't help, and it didn't make
any difference. Well actually it made it take longer to show the
effect - at first I thought that the problem dissapeared when I turned
on slab debugging, but when I removed the "count=bignumber" from the
dd command, it locked up eventually (in a few minutes).

I also changed the debug constanst from odd to even and the
other way around, but that had no effect - no messages logged,
and the kernel still locked up.

Note that at this heavy load, there are up to 8 nfsd's running
on the server concurrently. There must be some lock or race ...

According to Nathan Scott:
> Hmmmm... I don't think Andrew has any XFS fixes in his tree that
> might help there; and I can't think of any XFS change in -rc1 that
> might have caused this (does the fs/xfs subdir from 2.6.1 plonked
> down in place of the 2.6.2-rc1/2 version still have the problem?)

Yes, sorry for the confusion - all 2.6 (non-mm) shows this behaviour.
I tried -mm1 XFS in 2.6.1-rc2, but that did't help, it still
locks up (the diff is just a one-liner, so that was to be expected).
So it doesn't look like an XFS problem, perse.

> i_size_read seems to have a fair bit of config dependency - are you
> CONFIG_SMP / CONFIG_PREEMPT / neither? and is your BITS_PER_LONG
> 32 or 64? thanks.

CONFIG_SMP (for P IV hyperthreading), 32 bits.

Mike.

2004-01-30 16:01:32

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Thu, 29 Jan 2004 07:25:21, Andrew Morton wrote:
> "Miquel van Smoorenburg" <[email protected]> wrote:
> >
> > I have a Linux 2.6.2-rc2 NFS file server and another similar
> > box as client. Kernel is compiled for SMP (hyperthreading).
> >
> > In a few seconds, the server locks up. It spins in
> > generic_fillattr(), apparently in the i_size_read() inline function.
> > Server responds to pings and sysrq, but nothing else.
> > Here's the sysrq output:
> >
> > Pid: 531, comm: nfsd
> > EIP: 0060:[<c01577c2>] CPU: 0
> > EIP is at generic_fillattr+0x82/0xa0
> > EFLAGS: 00000246 Not tainted
> > EAX: 00000000 EBX: 07ae7200 ECX: f650a0a0 EDX: 000113eb
> > ESI: 00000000 EDI: f66cfed4 EBP: f66e5804 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 40019000 CR3: 37245000 CR4: 000006d0
> > Call Trace:
> > [<f8ab1f6b>] linvfs_getattr+0x2b/0x34 [xfs]
> > [<c0157805>] vfs_getattr+0x25/0x84
> > [<c01c19a3>] encode_post_op_attr+0x53/0x238
> > [<c01c1e27>] encode_wcc_data+0x29f/0x2a8
> > [<c01c4521>] nfs3svc_encode_commitres+0x19/0x5c
> > [<c01b709d>] nfsd_dispatch+0x14d/0x1a3
> > [<c02fb79b>] svc_process+0x3b3/0x640
> > [<c01b6ddc>] nfsd+0x1e4/0x358
> > [<c01b6bf8>] nfsd+0x0/0x358
> > [<c0107251>] kernel_thread_helper+0x5/0xc
> >
>
> Is the EIP _always_ inside generic_fillattr()?
>
> If so then yes, your analysis look right. I'd say that the inode has been
> corrupted and the seqcount counter has assumed an non-even value. That
> will cause i_size_read() to lock up.

Yes, but why does it lock up the whole machine? I'd say just that one
process should go into 'R', but other things should get scheduled right?
Or is some very common lock being held? The machine responds to pings,
so interrupts are working.

In fact as this is a hyperthreaded machine, why don't I see anything
running on the "other virtual cpu" ? I only see nfsd spinning on CPU#0

Could it be a CPU problem, the other CPU-thread is in i_size_write() but
that CPU-thread is simply never run ? To test this, I rebooted the exact
same 2.6.2-rc2 kernel with "maxcpus=1" .. and it doesn't lock up ..

(Still weird that -mm doesn't lock up)

Mike.

2004-01-30 20:22:22

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Thu, 29 Jan 2004 07:25:21, Andrew Morton wrote:
> "Miquel van Smoorenburg" <[email protected]> wrote:
> >
> > I have a Linux 2.6.2-rc2 NFS file server and another similar
> > box as client. Kernel is compiled for SMP (hyperthreading).
> > In a few seconds, the server locks up. It spins in
> > generic_fillattr(), apparently in the i_size_read() inline function.
> > Server responds to pings and sysrq, but nothing else.
>
> Is the EIP _always_ inside generic_fillattr()?
>
> If so then yes, your analysis look right. I'd say that the inode has been
> corrupted and the seqcount counter has assumed an non-even value. That
> will cause i_size_read() to lock up.

I added some extra code to i_size_read() and i_size_write(). First,
some debugging code:

--- fs.h.orig 2004-01-30 21:10:28.000000000 +0100
+++ fs.h.v1 2004-01-30 21:11:19.000000000 +0100
@@ -425,6 +425,7 @@
} u;
#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
+ pid_t seq_pid; /* XXX */
#endif
};

@@ -450,6 +451,12 @@
do {
seq = read_seqcount_begin(&inode->i_size_seqcount);
i_size = inode->i_size;
+#if 1 /* XXX HACK */
+ if ((++count & 65535) == 0) {
+ printk("i_size_read() seems to be looping - pid %d\n", inode->seq_pid);
+ mdelay(100);
+ }
+#endif
} while (read_seqcount_retry(&inode->i_size_seqcount, seq));
return i_size;
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
@@ -467,9 +474,20 @@
static inline void i_size_write(struct inode *inode, loff_t i_size)
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+#if 1 /* XXX */
+ inode->seq_pid = current->tgid;
+ write_seqcount_begin(&inode->i_size_seqcount);
+ inode->i_size = i_size;
+ write_seqcount_end(&inode->i_size_seqcount);
+ if (inode->i_size_seqcount.sequence & 1)
+ printk("i_size_write: pid %d: sequence is odd!\n",
+ current->tgid);
+ inode->seq_pid = 0;
+#else
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
write_seqcount_end(&inode->i_size_seqcount);
+#endif
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
preempt_disable();
inode->i_size = i_size;

I then started the test that locks up the kernel, and it printed this:

i_size_write: pid 542: sequence is odd!
i_size_write: pid 543: sequence is odd!
i_size_write: pid 542: sequence is odd!

i_size_read() seems to be looping - pid 0
i_size_read() seems to be looping - pid 0
[this keeps on being printed and the kernel is locked up]

It took some time for the i_size_write messages to show up, and they were
spaced 10-30 seconds apart, and during that time the server was still
up - right until the first i_size_read message.

Then I added this patch:

--- fs.h.v1 2004-01-30 21:11:19.000000000 +0100
+++ fs.h 2004-01-30 20:16:35.000000000 +0100
@@ -426,6 +426,7 @@
#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
pid_t seq_pid; /* XXX */
+ spinlock_t i_size_lock;
#endif
};

@@ -475,6 +476,7 @@
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
#if 1 /* XXX */
+ spin_lock(&inode->i_size_lock);
inode->seq_pid = current->tgid;
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
@@ -483,6 +485,7 @@
printk("i_size_write: pid %d: sequence is odd!\n",
current->tgid);
inode->seq_pid = 0;
+ spin_unlock(&inode->i_size_lock);
#else
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;

(and some code in fs/inode.c to initialize i_size_lock)

Guess what. No more debug output, no more lockups ... is there
anything else I can do to debug this ? Because I'm not really
sure what I'm doing, you see :)

Mike.

2004-01-30 22:14:07

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Fri, 30 Jan 2004 21:21:55, Miquel van Smoorenburg wrote:
> I added some extra code to i_size_read() and i_size_write(). First,
> some debugging code:
> I then started the test that locks up the kernel, and it printed this:
>
> i_size_write: pid 542: sequence is odd!
> i_size_write: pid 543: sequence is odd!
> i_size_write: pid 542: sequence is odd!
>
> i_size_read() seems to be looping - pid 0
> i_size_read() seems to be looping - pid 0
> [this keeps on being printed and the kernel is locked up]
>
> It took some time for the i_size_write messages to show up, and they were
> spaced 10-30 seconds apart, and during that time the server was still
> up - right until the first i_size_read message.
>

Okay, I added a patch to make the sequence increments atomic. Now
i_size_write() still sometimes ends up with an odd sequence, but
i_size_read() doesn't lock up anymore.

What lock exactly is supposed to protect i_size_write, since it
appears that i_size_write is being called without proper locking ?
(Am I right?)


--- fs.h.v1 2004-01-30 21:11:19.000000000 +0100
+++ fs.h 2004-01-30 21:55:17.000000000 +0100
@@ -426,6 +426,7 @@
#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
pid_t seq_pid; /* XXX */
+ spinlock_t i_size_lock;
#endif
};

@@ -441,6 +442,7 @@
*/
#include <linux/delay.h>
#include <linux/sched.h>
+#include <asm/atomic.h>
static inline loff_t i_size_read(struct inode *inode)
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
@@ -476,9 +478,11 @@
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
#if 1 /* XXX */
inode->seq_pid = current->tgid;
- write_seqcount_begin(&inode->i_size_seqcount);
+ atomic_inc((atomic_t *)&inode->i_size_seqcount.sequence);
+ smp_wmb();
inode->i_size = i_size;
- write_seqcount_end(&inode->i_size_seqcount);
+ smp_wmb();
+ atomic_inc((atomic_t *)&inode->i_size_seqcount.sequence);
if (inode->i_size_seqcount.sequence & 1)
printk("i_size_write: pid %d: sequence is odd!\n",
current->tgid);

2004-01-30 22:33:37

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

Miquel van Smoorenburg <[email protected]> wrote:
>
> On Fri, 30 Jan 2004 21:21:55, Miquel van Smoorenburg wrote:
> > I added some extra code to i_size_read() and i_size_write(). First,
> > some debugging code:
> > I then started the test that locks up the kernel, and it printed this:
> >
> > i_size_write: pid 542: sequence is odd!
> > i_size_write: pid 543: sequence is odd!
> > i_size_write: pid 542: sequence is odd!
> >
> > i_size_read() seems to be looping - pid 0
> > i_size_read() seems to be looping - pid 0
> > [this keeps on being printed and the kernel is locked up]
> >
> > It took some time for the i_size_write messages to show up, and they were
> > spaced 10-30 seconds apart, and during that time the server was still
> > up - right until the first i_size_read message.
> >
>
> Okay, I added a patch to make the sequence increments atomic. Now
> i_size_write() still sometimes ends up with an odd sequence, but
> i_size_read() doesn't lock up anymore.

Go, Miquel!

> What lock exactly is supposed to protect i_size_write, since it
> appears that i_size_write is being called without proper locking ?
> (Am I right?)

If two CPUs hit i_size_write() at the same time we have a bug. That
function requires that the caller provide external serialisation, via i_sem.

Try adding this to the start of i_size_write():

if (down_trylock(&inode->i_sem) == 0) {
printk("I am buggy\n");
dump_stack();
up(&inode->i_sem);
}

2004-01-30 22:54:14

by Christoph Hellwig

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> If two CPUs hit i_size_write() at the same time we have a bug. That
> function requires that the caller provide external serialisation, via i_sem.

O_APPEND|O_DIRECT writes could do that under XFS..

2004-01-30 23:08:28

by Nathan Scott

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> Miquel van Smoorenburg <[email protected]> wrote:
> > What lock exactly is supposed to protect i_size_write, since it
> > appears that i_size_write is being called without proper locking ?
> > (Am I right?)
>
> If two CPUs hit i_size_write() at the same time we have a bug. That
> function requires that the caller provide external serialisation, via i_sem.

Hmm... I suspect we may not always be providing that in XFS -
I'll go audit our calls.

> Try adding this to the start of i_size_write():
>
> if (down_trylock(&inode->i_sem) == 0) {
> printk("I am buggy\n");
> dump_stack();
> up(&inode->i_sem);
> }

Let me know what you hit Miquel :) - I'll run the XFS tests
next week with this too and see what I can find.

thanks.

--
Nathan

2004-01-30 23:12:05

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

Christoph Hellwig <[email protected]> wrote:
>
> On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> > If two CPUs hit i_size_write() at the same time we have a bug. That
> > function requires that the caller provide external serialisation, via i_sem.
>
> O_APPEND|O_DIRECT writes could do that under XFS..

Sigh.


diff -puN include/linux/fs.h~i_size_write-check include/linux/fs.h
--- 25/include/linux/fs.h~i_size_write-check Fri Jan 30 15:09:47 2004
+++ 25-akpm/include/linux/fs.h Fri Jan 30 15:10:28 2004
@@ -464,9 +464,11 @@ static inline loff_t i_size_read(struct
#endif
}

+void i_size_write_check(struct inode *inode);

static inline void i_size_write(struct inode *inode, loff_t i_size)
{
+ i_size_write_check(inode);
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
diff -puN mm/filemap.c~i_size_write-check mm/filemap.c
--- 25/mm/filemap.c~i_size_write-check Fri Jan 30 15:10:23 2004
+++ 25-akpm/mm/filemap.c Fri Jan 30 15:11:41 2004
@@ -2010,3 +2010,18 @@ out:
}

EXPORT_SYMBOL_GPL(generic_file_direct_IO);
+
+void i_size_write_check(struct inode *inode)
+{
+ static int count = 0;
+
+ if (down_trylock(&inode->i_sem) == 0) {
+ if (count < 10) {
+ count++;
+ printk("i_size_write() called without i_sem\n");
+ dump_stack();
+ }
+ up(&inode->i_sem);
+ }
+}
+EXPORT_SYMBOL(i_size_write_check);

_

2004-01-31 01:25:24

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Sat, 31 Jan 2004 00:13:16, Andrew Morton wrote:
> Christoph Hellwig <[email protected]> wrote:
> >
> > On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> > > If two CPUs hit i_size_write() at the same time we have a bug. That
> > > function requires that the caller provide external serialisation, via i_sem.
> >
> > O_APPEND|O_DIRECT writes could do that under XFS..
>
> Sigh.
>
> diff -puN mm/filemap.c~i_size_write-check mm/filemap.c
> --- 25/mm/filemap.c~i_size_write-check Fri Jan 30 15:10:23 2004
> +++ 25-akpm/mm/filemap.c Fri Jan 30 15:11:41 2004
> @@ -2010,3 +2010,18 @@ out:
> }
>
> EXPORT_SYMBOL_GPL(generic_file_direct_IO);
> +
> +void i_size_write_check(struct inode *inode)
> +{
> + static int count = 0;
> +
> + if (down_trylock(&inode->i_sem) == 0) {
> + if (count < 10) {

You want to set this to 100 at least, since at boot time the message
happens _often_ even without XFS.

It's caused by sysfs:

Jan 31 00:48:33 meghan kernel: i_size_write() called without i_sem
Jan 31 00:48:33 meghan kernel: Call Trace:
Jan 31 00:48:33 meghan kernel: [i_size_write_check+95/97] i_size_write_check+0x5f/0x61
Jan 31 00:48:33 meghan kernel: [simple_commit_write+74/148] simple_commit_write+0x4a/0x94
Jan 31 00:48:33 meghan kernel: [page_symlink+210/414] page_symlink+0xd2/0x19e
Jan 31 00:48:33 meghan kernel: [sysfs_symlink+107/126] sysfs_symlink+0x6b/0x7e
Jan 31 00:48:33 meghan kernel: [sysfs_create_link+313/317] sysfs_create_link+0x139/0x13d
Jan 31 00:48:33 meghan kernel: [bus_add_device+143/161] bus_add_device+0x8f/0xa1
[etc etc]

Also, bd_set_size runs unlocked:

Jan 31 02:02:39 meghan kernel: I am buggy
Jan 31 02:02:39 meghan kernel: Call Trace:
Jan 31 02:02:39 meghan kernel: [bd_set_size+201/218] bd_set_size+0xc9/0xda
Jan 31 02:02:39 meghan kernel: [do_open+282/1014] do_open+0x11a/0x3f6
Jan 31 02:02:39 meghan kernel: [blkdev_get+114/128] blkdev_get+0x72/0x80
Jan 31 02:02:39 meghan kernel: [do_open+603/1014] do_open+0x25b/0x3f6
Jan 31 02:02:39 meghan kernel: [blkdev_get+114/128] blkdev_get+0x72/0x80
Jan 31 02:02:39 meghan kernel: [open_bdev_excl+87/172] open_bdev_excl+0x57/0xacJan 31 02:02:39 meghan kernel: [get_sb_bdev+55/344] get_sb_bdev+0x37/0x158
Jan 31 02:02:39 meghan kernel: [ext3_get_sb+47/51] ext3_get_sb+0x2f/0x33
Jan 31 02:02:39 meghan kernel: [ext3_fill_super+0/2949] ext3_fill_super+0x0/0xb85
Jan 31 02:02:39 meghan kernel: [do_kern_mount+95/213] do_kern_mount+0x5f/0xd5
Jan 31 02:02:39 meghan kernel: [do_add_mount+120/334] do_add_mount+0x78/0x14e
Jan 31 02:02:39 meghan kernel: [do_mount+289/359] do_mount+0x121/0x167
Jan 31 02:02:39 meghan kernel: [__copy_from_user_ll+104/108] __copy_from_user_ll+0x68/0x6c
Jan 31 02:02:39 meghan kernel: [copy_mount_options+128/234] copy_mount_options+0x80/0xea
Jan 31 02:02:39 meghan kernel: [sys_mount+203/309] sys_mount+0xcb/0x135
Jan 31 02:02:39 meghan kernel: [do_mount_root+47/156] do_mount_root+0x2f/0x9c
Jan 31 02:02:39 meghan kernel: [mount_block_root+84/285] mount_block_root+0x54/0x11d
Jan 31 02:02:39 meghan kernel: [mount_root+94/102] mount_root+0x5e/0x66
Jan 31 02:02:39 meghan kernel: [prepare_namespace+38/195] prepare_namespace+0x26/0xc3
Jan 31 02:02:39 meghan kernel: [init+80/334] init+0x50/0x14e
Jan 31 02:02:39 meghan kernel: [init+0/334] init+0x0/0x14e
Jan 31 02:02:39 meghan kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb

But the XFS problem appears to be vn_revalidate which calls i_size_write()
without holding i_sem:

Jan 31 02:04:00 meghan kernel: I am buggy
Jan 31 02:04:00 meghan kernel: Call Trace:
Jan 31 02:04:00 meghan kernel: [__crc_pm_idle+1813107/4617778] vn_revalidate+0x18e/0x1a9 [xfs]
Jan 31 02:04:00 meghan kernel: [nfsd_commit+205/219] nfsd_commit+0xcd/0xdb
Jan 31 02:04:00 meghan kernel: [__crc_pm_idle+1798087/4617778] linvfs_getattr+0x39/0x3f [xfs]
Jan 31 02:04:00 meghan kernel: [vfs_getattr+57/154] vfs_getattr+0x39/0x9a
Jan 31 02:04:00 meghan kernel: [encode_post_op_attr+104/577] encode_post_op_attr+0x68/0x241
Jan 31 02:04:00 meghan kernel: [nfs3svc_encode_commitres+42/115] nfs3svc_encode_commitres+0x2a/0x73
Jan 31 02:04:00 meghan kernel: [nfsd_dispatch+297/485] nfsd_dispatch+0x129/0x1e5
Jan 31 02:04:00 meghan kernel: [svc_process+1170/1621] svc_process+0x492/0x655
Jan 31 02:04:00 meghan kernel: [apic_timer_interrupt+26/32] apic_timer_interrupt+0x1a/0x20
Jan 31 02:04:00 meghan kernel: [nfsd+490/915] nfsd+0x1ea/0x393
Jan 31 02:04:00 meghan kernel: [nfsd+0/915] nfsd+0x0/0x393
Jan 31 02:04:00 meghan kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb

Mike.

2004-01-31 01:38:12

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

Miquel van Smoorenburg <[email protected]> wrote:
>
> On Sat, 31 Jan 2004 00:13:16, Andrew Morton wrote:
> > Christoph Hellwig <[email protected]> wrote:
> > >
> > > On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> > > > If two CPUs hit i_size_write() at the same time we have a bug. That
> > > > function requires that the caller provide external serialisation, via i_sem.
> > >
> > > O_APPEND|O_DIRECT writes could do that under XFS..
> >
> > Sigh.
> >
> > diff -puN mm/filemap.c~i_size_write-check mm/filemap.c
> > --- 25/mm/filemap.c~i_size_write-check Fri Jan 30 15:10:23 2004
> > +++ 25-akpm/mm/filemap.c Fri Jan 30 15:11:41 2004
> > @@ -2010,3 +2010,18 @@ out:
> > }
> >
> > EXPORT_SYMBOL_GPL(generic_file_direct_IO);
> > +
> > +void i_size_write_check(struct inode *inode)
> > +{
> > + static int count = 0;
> > +
> > + if (down_trylock(&inode->i_sem) == 0) {
> > + if (count < 10) {
>
> You want to set this to 100 at least, since at boot time the message
> happens _often_ even without XFS.

fun.

> It's caused by sysfs:

OK, sysfs_symlink() needs i_sem.

> Also, bd_set_size runs unlocked:

I don't expect we'll ever see a race over the blockdev's i_size in there.
It might be simplest to just do a straight assignment. I'll work that with
viro.

> But the XFS problem appears to be vn_revalidate which calls i_size_write()
> without holding i_sem:

There's your bug.


2004-01-31 11:48:08

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

On Sat, 31 Jan 2004 02:38:51, Andrew Morton wrote:
> Miquel van Smoorenburg <[email protected]> wrote:
> > But the XFS problem appears to be vn_revalidate which calls i_size_write()
> > without holding i_sem:
>
> There's your bug.

Okay. It seems that XFS uses its own locking with xfs_ilock() etc,
so I am not sure if this should be fixed by using down(&inode->i_sem)
or by using xfs_ilock().

Perhaps xfs_ilock() should also get the inode->i_sem semaphore
in the XFS_ILOCK_EXCL case ?

Also, there are one or two more places that call i_size_write()
that should be looked at I guess.

Ofcourse I'll test any patches you send me.

Mike.

2004-01-31 17:07:13

by Christoph Hellwig

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

Here's an (untested) quickhack to take i_sem in linvfs_getattr.
I'll try to come up with a real fix next week.

--- 1.39/fs/xfs/linux/xfs_iops.c Fri Jan 9 01:07:07 2004
+++ edited/fs/xfs/linux/xfs_iops.c Sat Jan 31 18:41:44 2004
@@ -478,8 +478,11 @@
vnode_t *vp = LINVFS_GET_VP(inode);
int error = 0;

- if (unlikely(vp->v_flag & VMODIFIED))
+ if (unlikely(vp->v_flag & VMODIFIED)) {
+ down(&inode->i_sem);
error = vn_revalidate(vp);
+ up(&inode->i_sem);
+ }
if (!error)
generic_fillattr(inode, stat);
return 0;

2004-02-01 01:49:00

by Anton Blanchard

[permalink] [raw]
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()


> > + if (down_trylock(&inode->i_sem) == 0) {
> > + if (count < 10) {
>
> You want to set this to 100 at least, since at boot time the message
> happens _often_ even without XFS.

You could also just printk_ratelimit it :)

Anton