LinuxLists.cc - Oops: Quota race in 2.4.12?

2001-10-29 05:58:05

Subject: Oops: Quota race in 2.4.12?

Some of our dual CPU web servers with 2.4.12 are Oopsing while running
quotacheck. They don't seem to die immediately, but oops many times and
eventually break. The old tools didn't warn about quotachecking on a
live file system, so some of our servers were set up to run quotacheck
nightly. The new tools still allow you to do it, but warn that it may
not be consistent. We didn't have any problems with 2.2 kernels.

First oops, as already processed (grumble) by klogd:

Oct 28 04:22:32 pro kernel: remove_free_dquot: dquot not on the free list??
Oct 28 04:22:32 pro last message repeated 90 times
Oct 28 04:22:32 pro kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
...dates stripped:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c0149edc
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[dqput+148/188] Not tainted
EFLAGS: 00010246
eax: d58c8830 ebx: cf330cc0 ecx: cf330cd0 edx: 00000000
esi: cf330cc0 edi: d2847f6c ebp: 00000000 esp: d2847f30
ds: 0018 es: 0018 ss: 0018
Process quotacheck (pid: 3933, stackpage=d2847000)
Stack: 00000000 c014a93e cf330cc0 00006000 c1a58800 00000000 d2847fa4 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 c014b8f0
c1a58800 0000f465 00000000 00000004 bffffd54 d2846000 bffffd54 001e8ca0
Call Trace: [set_dqblk+390/404] [sys_quotactl+780/892] [sys_read+188/196] [system_call+51/56]

Code: 89 4a 04 89 53 10 89 41 04 89 08 ff 05 e4 ab 34 c0 8d 43 24

Perhaps there is some obviously broken locking/code in the quotactl syscall?
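
A hedged reading of the Code: bytes, assuming the dump starts at the faulting EIP: 89 4a 04 decodes to mov %ecx,0x4(%edx), and with edx = 00000000 that store targets address 00000004, which matches the oops. The four stores that follow have the shape of a doubly linked list insertion whose "next" pointer is NULL. The user-space sketch below models that shape only. list_add_between and faulting_offset are hypothetical names, and the NULL check exists only so the sketch can report the bad case instead of crashing.

```c
#include <stddef.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/*
 * Insert entry between prev and next, in the store order that the
 * Code: bytes appear to show. Returns -1 instead of faulting when
 * next is NULL (that check is an addition for this sketch only).
 */
static int list_add_between(struct list_head *entry,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next == NULL) {
        /* The real code would store to NULL + offsetof(prev) here. */
        return -1;
    }
    next->prev = entry;   /* 89 4a 04: mov %ecx,0x4(%edx)  */
    entry->next = next;   /* 89 53 10: mov %edx,0x10(%ebx) */
    entry->prev = prev;   /* 89 41 04: mov %eax,0x4(%ecx)  */
    prev->next = entry;   /* 89 08:    mov %ecx,(%eax)     */
    return 0;
}

/* Offset, relative to next, of the field written by the first store. */
static size_t faulting_offset(void)
{
    return offsetof(struct list_head, prev);
}
```

On the 32-bit machine in the trace the prev field sits at offset 4, which would explain the fault address; faulting_offset() returns sizeof(void *) on whatever machine compiles the sketch.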

The next Oops, 6 seconds later:

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c0149edc
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[dqput+148/188] Not tainted
EFLAGS: 00010246
eax: d58c8830 ebx: cf330c40 ecx: cf330c50 edx: 00000000
esi: d4a08ca4 edi: d4a08bc0 ebp: c36f5a40 esp: d16f5efc
ds: 0018 es: 0018 ss: 0018
Process mv (pid: 7146, stackpage=d16f5000)
Stack: 00000000 c014acda cf330c40 d16f4000 c1a58c00 c0155a8f d4a08bc0 d4a08bc0
d4a08bc0 c0156370 c36f5a40 c36f5a40 00000022 00000000 e2757480 c01563f7
c015641d d4a08bc0 d4a08bc0 d16f4000 c0146c19 d4a08bc0 c36f5a40 d4a08bc0
Call Trace: [dquot_drop+54/68] [ext2_free_inode+231/616] [ext2_delete_inode+0/296] [ext2_delete_inode+135/296] [ext2_delete_inode+173/296]
[iput+389/600] [d_delete+98/160] [vfs_unlink+492/540] [sys_unlink+169/288] [system_call+51/56]

Code: 89 4a 04 89 53 10 89 41 04 89 08 ff 05 e4 ab 34 c0 8d 43 24

...Many more oopses follow over time.

Simon-

[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ [email protected] ][ [email protected] ]
[ Opinions expressed are not necessarily those of my employers. ]

2001-10-29 13:44:32

by Jan Kara


Subject: Re: Oops: Quota race in 2.4.12?

Hello,

> Some of our dual CPU web servers with 2.4.12 are Oopsing while running
> quotacheck. They don't seem to die immediately, but oops many times and
> eventually break. The old tools didn't warn about quotachecking on a
> live file system, so some of our servers were set up to run quotacheck
> nightly. The new tools still allow you to do it, but warn that it may
> not be consistent. We didn't have any problems with 2.2 kernels.
>
> First oops, as already processed (grumble) by klogd:
>
> Oct 28 04:22:32 pro kernel: remove_free_dquot: dquot not on the free list??
> Oct 28 04:22:32 pro last message repeated 90 times
> Oct 28 04:22:32 pro kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
> ...dates stripped:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000004
> printing eip:
> c0149edc
> *pde = 00000000
> Oops: 0002
> CPU: 1
> EIP: 0010:[dqput+148/188] Not tainted
> EFLAGS: 00010246
> eax: d58c8830 ebx: cf330cc0 ecx: cf330cd0 edx: 00000000
> esi: cf330cc0 edi: d2847f6c ebp: 00000000 esp: d2847f30
> ds: 0018 es: 0018 ss: 0018
> Process quotacheck (pid: 3933, stackpage=d2847000)
> Stack: 00000000 c014a93e cf330cc0 00006000 c1a58800 00000000 d2847fa4 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c014b8f0
> c1a58800 0000f465 00000000 00000004 bffffd54 d2846000 bffffd54 001e8ca0
> Call Trace: [set_dqblk+390/404] [sys_quotactl+780/892] [sys_read+188/196] [system_call+51/56]
>
> Code: 89 4a 04 89 53 10 89 41 04 89 08 ff 05 e4 ab 34 c0 8d 43 24
>
> Perhaps there is some obviously broken locking/code in the quotactl syscall?
I'd also blame some SMP locking (I think that on UP everything was tested well) but
everything should be protected by lock_kernel() and it seems to me that everything really
is protected. Anyway I'll try to find the problem.

Thanks for the report
Honza

--
Jan Kara <[email protected]>
SuSE CR Labs

2001-10-29 16:30:32

by Simon Kirby


Subject: Re: Oops: Quota race in 2.4.12?

On Mon, Oct 29, 2001 at 02:44:41PM +0100, Jan Kara wrote:

> I'd also blame some SMP locking (I think that on UP everything was tested well) but
> everything should be protected by lock_kernel() and it seems to me that everything really
> is protected. Anyway I'll try to find the problem.

I notice you just recently posted a patch to fix possible list
corruption. Could this be related?

Simon-

[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ [email protected] ][ [email protected] ]
[ Opinions expressed are not necessarily those of my employers. ]

2001-10-29 23:18:50

by NeilBrown


Subject: Re: Oops: Quota race in 2.4.12?

On Sunday October 28, [email protected] wrote:
> Some of our dual CPU web servers with 2.4.12 are Oopsing while running
> quotacheck. They don't seem to die immediately, but oops many times and
> eventually break. The old tools didn't warn about quotachecking on a
> live file system, so some of our servers were set up to run quotacheck
> nightly. The new tools still allow you to do it, but warn that it may
> not be consistent. We didn't have any problems with 2.2 kernels.

quotacheck cannot be reliable on a live system as it scans through the
filesystem counting the usage for each user and then updates the
quotas file. If usage changes between scanning a file and updating
the quota record, you get an error. This is particularly a problem
if quotacheck takes a long time, and on one of my servers (heavily
loaded NFS server) quotacheck takes a *long* time if the server is
live (it isn't exactly quick if it isn't live either).

I wrote a little program which uses libext2fs to scan the block device
for inodes and add up usage that way (as opposed to walking the
filetree as I believe quotacheck does). It runs *much* faster
(minutes instead of hours).

What I have been doing lately is running it every few hours and having
it report the differences that it found, rather than actually
changing anything.
Then I look for differences that have persisted over several runs.
These differences I assume are real differences and I correct them.

I find that I often get tens of uids which have an apparent error on
a single run (due to changes happening during the run). This drops to
less than ten that appear to have an error on each of two consecutive
runs. Any uid that appears to have an error on three consecutive runs
is almost always truly in error.
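
The "persisted over several runs" filter amounts to intersecting the sets of flagged ids from consecutive runs. A minimal sketch of that idea follows, assuming each run has already been reduced to an ascending, duplicate-free array of numeric ids. intersect_ids and persistent_ids are hypothetical helpers, not part of the program or script in this message.

```c
#include <stddef.h>

/*
 * Intersect two ascending, duplicate-free id lists (the same job
 * "comm -12" does on sorted text). Writes the common ids to out and
 * returns how many there are. out may be the same array as a.
 */
static size_t intersect_ids(const unsigned *a, size_t na,
                            const unsigned *b, size_t nb,
                            unsigned *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {
            out[n++] = a[i];
            i++;
            j++;
        }
    }
    return n;
}

/*
 * Ids flagged in every one of the first nruns runs.
 * runs[k] holds lens[k] ids; out needs room for lens[0] entries.
 */
static size_t persistent_ids(const unsigned *const *runs,
                             const size_t *lens, size_t nruns,
                             unsigned *out)
{
    size_t n, k, i;

    if (nruns == 0)
        return 0;
    n = lens[0];
    for (i = 0; i < n; i++)
        out[i] = runs[0][i];
    for (k = 1; k < nruns; k++)
        n = intersect_ids(out, n, runs[k], lens[k], out);
    return n;
}
```

Calling persistent_ids with nruns set to 3 would give the ids flagged on three consecutive runs, the group described above as almost always truly in error.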

I have included below the program that scans the filesystem and a
script that I use to run it and monitor the output.

The usage of the program is

newquotacheck -[ugt] /dev/device /path/to/filesystem

It can compute one of uid (-u), gid (-g) or tid (-t) [see tree quotas
thread] quotas for the filesystem on the device. The
/path/to/filesystem is only needed for the output.

The program produces output like:

changequota -bu -12345 1002 /path/to/filesystem

which means change BlockUsage by subtracting 12345 for uid 1002 on
filesystem "/path/to/filesystem". I have a program called
changequota which does this.
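
For anyone who wants to post-process these lines, here is a rough sketch of a parser for the output format shown above. struct quota_fix and parse_fix are hypothetical names introduced only for illustration; the changequota program itself is not shown in this thread, so nothing here describes how it reads its arguments. The sketch assumes whitespace-separated fields and a path without spaces.

```c
#include <stdlib.h>
#include <string.h>

/* One correction line as printed by the scanner (hypothetical struct). */
struct quota_fix {
    long block_delta;   /* change to block usage, 0 if no -bu given */
    long inode_delta;   /* change to inode usage, 0 if no -iu given */
    unsigned id;        /* uid, gid or tid */
    char fs[256];       /* filesystem path */
};

/* Parse "changequota [-bu N] [-iu N] ID PATH". Returns 0 on success. */
static int parse_fix(const char *line, struct quota_fix *fix)
{
    char buf[512];
    char *tok;
    int have_id = 0;

    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);          /* strtok modifies its input */
    memset(fix, 0, sizeof *fix);

    tok = strtok(buf, " \t\n");
    if (tok == NULL || strcmp(tok, "changequota") != 0)
        return -1;

    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        if (strcmp(tok, "-bu") == 0 || strcmp(tok, "-iu") == 0) {
            char *val = strtok(NULL, " \t\n");
            if (val == NULL)
                return -1;
            /* strtol accepts the leading "+" or "-" of a delta */
            if (tok[1] == 'b')
                fix->block_delta = strtol(val, NULL, 10);
            else
                fix->inode_delta = strtol(val, NULL, 10);
        } else if (!have_id) {
            fix->id = (unsigned)strtoul(tok, NULL, 10);
            have_id = 1;
        } else {
            if (strlen(tok) >= sizeof fix->fs)
                return -1;
            strcpy(fix->fs, tok);
            return 0;
        }
    }
    return -1;  /* id or path missing */
}
```

Feeding it the example line above would yield block_delta -12345, id 1002 and fs "/path/to/filesystem".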

The program only copes with uids (and gids) less than 65536, and only
with ext2/ext3 filesystems.

The script runs newquotacheck every 8 hours, and keeps the output in
files called
quota.diffs.$n
where $n is 0 for the most recent, 1 for the next most recent and so
on. It then looks for uids that have changed on each of the past $n
runs, and records them in
quota.changes.$n

The script is written for a non-standard shell. You might need to
translate, but I don't know 'bash' well enough to do it for you.
$[1:20] expands to all the numbers from 1 to 20.
$[arith-expression] expands to the value of the arithmetic expression.

I have been running this for about 2 weeks and had about 5 users with
errors of some sort. I haven't looked deeply into what might have
caused the errors yet.

NeilBrown

---------------------cut------------here---------------------------
/*
 * calculate block/inode usage for each user
 * use libext2fs to walk through inodes.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/quota.h>
#include "ext2fs/ext2_fs.h"
#include "ext2fs/ext2fs.h"

/* usage totals indexed by id; i_blocks is counted in 512-byte units */
long blocks[65536];
long inodes[65536];

#ifndef TREEQUOTA
#define TREEQUOTA 2
#endif

int main(int argc, char *argv[])
{
	int i;
	ext2_filsys fs;
	ext2_inode_scan scan;
	errcode_t err;
	ext2_ino_t ino = 0;
	struct ext2_inode inode;
	char *dev, *fsn;
	int type = USRQUOTA;
	int arg = 1;

	if (arg < argc && strcmp(argv[arg], "-t") == 0) {
		type = TREEQUOTA;
		arg++;
	} else if (arg < argc && strcmp(argv[arg], "-g") == 0) {
		type = GRPQUOTA;
		arg++;
	} else if (arg < argc && strcmp(argv[arg], "-u") == 0) {
		/* user quotas are the default; accept -u as the usage text promises */
		arg++;
	}

	if (arg != argc-2) {
		fprintf(stderr, "Usage: checkquota [-t|-g|-u] dev filesys\n");
		exit(1);
	}

	dev = argv[arg];
	fsn = argv[arg+1];

	err = ext2fs_open(dev, EXT2_FLAG_FORCE, 0, 0, unix_io_manager, &fs);

	if (err) exit(1);

	err = ext2fs_open_inode_scan(fs, 10000, &scan);

	if (err) exit(2);

	/* the scan reports the end of the inode table by returning ino == 0 */
	for (;;) {
		int id = 0;

		err = ext2fs_get_next_inode(scan, &ino, &inode);
		if (err != 0) {
			printf("inode %u err %ld\n", (unsigned)ino, (long)err);
		}
		if (ino == 0) { break; }
		if (inode.i_mode == 0 ||
		    inode.i_dtime != 0) {
			/* printf("inode %d deleted\n", ino);*/
			continue;
		}
		switch (type) {
		case USRQUOTA: id = inode.i_uid; break;
		case GRPQUOTA: id = inode.i_gid; break;
		case TREEQUOTA: id = inode.i_reserved2; break;
		}
		/* only ids below 65536 fit in the tables */
		if (id < 0 || id >= 65536)
			continue;
		blocks[id] += inode.i_blocks;
		inodes[id] ++;
	}

	ext2fs_close_inode_scan(scan);

	for (i=1; i<65536; i++)
	{
		if (blocks[i] || inodes[i]) {
			struct dqblk info;

			if (quotactl(QCMD(Q_GETQUOTA,type), dev, i, (void*)&info)==0) {
				long curblocks = info.dqb_curblocks;	/* 1K blocks */
				long curinodes = info.dqb_curinodes;

				if (blocks[i] == curblocks*2
				    && inodes[i] == curinodes)
					continue;
				printf("changequota ");
				if (blocks[i] != curblocks*2)
					printf("-bu %s%ld ",
					       blocks[i]/2 > curblocks ? "+" : "",
					       blocks[i]/2 - curblocks);
				if (inodes[i] != curinodes)
					printf("-iu %s%ld ",
					       inodes[i] > curinodes ? "+" : "",
					       inodes[i] - curinodes);
			} else {
				/* no quota record: report the whole usage, in 1K blocks as above */
				printf("changequota -bu %ld -iu %ld ", blocks[i]/2, inodes[i]);
			}
			printf(" %d %s\n", i, fsn);
		}
	}
	return 0;
}

---------------------cut-----------here------------too------------------------
#!/usr/local/bin/ae

while :
do

for i in $[1:20]
do
n=$[20-i]
if [ -f quota.diffs.$n ]
then mv quota.diffs.$n quota.diffs.$[n+1]
fi
done

/root/newcheckquota /dev/md0 /export/glass/1 > quota.diffs.0
awk '{a=NF-1; print $a}' quota.diffs.0 | sort > quota.changes.0
for i in $[1:20]
do
if [ -f quota.diffs.$i ]
then
awk '{a=NF-1; print $a}' quota.diffs.$i | sort | comm -12 - quota.changes.$[i-1] > quota.changes.$i
else break;
fi
done
sleep $[8*3600]
done

2001-10-29 23:23:40

by NeilBrown


Subject: Re: Oops: Quota race in 2.4.12?

On Sunday October 28, [email protected] wrote:
> Some of our dual CPU web servers with 2.4.12 are Oopsing while running
> quotacheck.

And speaking of quota oopses, I have had oops while enabling quota on
an active filesystem (which admittedly isn't very smart, but shouldn't
oops).
I think the following patch fixes it for 2.4.13. I had a quick look
at the latest -ac code; it doesn't have the same problem.

--------------------------------------------------------------------
Avoid Oops when quotas turned on on active filesystem

Current code
sets quotas-enabled flag
possibly blocks on dqget or dqput
then sets dq_op

If other code calls DQUOT_INIT (for example) during the block, it will oops.

--- ./fs/dquot.c 2001/10/30 00:17:23 1.1
+++ ./fs/dquot.c 2001/10/30 00:18:26 1.2
@@ -1363,6 +1363,7 @@
inode->i_flags |= S_NOQUOTA;

dqopt->files[type] = f;
+ sb->dq_op = &dquot_operations;
set_enable_flags(dqopt, type);

dquot = dqget(sb, 0, type);
@@ -1370,7 +1371,6 @@
dqopt->block_expire[type] = (dquot != NODQUOT) ? dquot->dq_btime : MAX_DQ_TIME;
dqput(dquot);

- sb->dq_op = &dquot_operations;
add_dquot_ref(sb, type);

up(&dqopt->dqoff_sem);

-------------------------------------------------------------------
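
The ordering problem the patch addresses can be modelled in user space. This is only a sketch under stated assumptions: sb_model, dquot_ops_model, dquot_init_model and quota_on_step are hypothetical stand-ins rather than kernel code, and the model assumes a DQUOT_INIT-style caller that checks the enabled flag and then calls through dq_op. "step 1" represents the moment at which the quota-on path may block between its two stores.

```c
#include <stddef.h>

/* Stand-in for the operations table that sb->dq_op points to. */
struct dquot_ops_model {
    int (*initialize)(void);
};

/* Stand-in for the two superblock fields involved in the race. */
struct sb_model {
    int quota_enabled;                    /* models the enable flags */
    const struct dquot_ops_model *dq_op;  /* models sb->dq_op */
};

static int init_ok(void) { return 0; }
static const struct dquot_ops_model ops_model = { init_ok };

/* A caller that trusts the flag and then calls through dq_op. */
static int dquot_init_model(const struct sb_model *sb)
{
    if (!sb->quota_enabled)
        return 0;               /* quotas off: nothing to do */
    if (sb->dq_op == NULL)
        return -1;              /* the kernel would oops here */
    return sb->dq_op->initialize();
}

/*
 * Put sb into the state the quota-on path has reached:
 * step 0 = nothing done, 1 = first store done, 2 = both stores done.
 * patched selects the store order from the patch (dq_op first).
 */
static void quota_on_step(struct sb_model *sb, int patched, int step)
{
    sb->quota_enabled = 0;
    sb->dq_op = NULL;
    if (step >= 1) {
        if (patched)
            sb->dq_op = &ops_model;
        else
            sb->quota_enabled = 1;
    }
    if (step >= 2) {
        sb->dq_op = &ops_model;
        sb->quota_enabled = 1;
    }
}
```

In this model the unpatched order is the only one that exposes a state where the flag is set but dq_op is still NULL; with the patched order, any caller that sees the flag also sees a valid dq_op.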

2001-10-30 12:12:24

by Jan Kara


Subject: Re: Oops: Quota race in 2.4.12?

> On Mon, Oct 29, 2001 at 02:44:41PM +0100, Jan Kara wrote:
>
> > I'd also blame some SMP locking (I think that on UP everything was tested well) but
> > everything should be protected by lock_kernel() and it seems to me that everything really
> > is protected. Anyway I'll try to find the problem.
>
> I notice you just recently posted a patch to fix possible list
> corruption. Could this be related?
Nope. That was a fix specific to code in the -ac kernel.

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2001-10-30 12:14:24

by Jan Kara


Subject: Re: Oops: Quota race in 2.4.12?

Hello,

> On Sunday October 28, [email protected] wrote:
> > Some of our dual CPU web servers with 2.4.12 are Oopsing while running
> > quotacheck. They don't seem to die immediately, but oops many times and
> > eventually break. The old tools didn't warn about quotachecking on a
> > live file system, so some of our servers were set up to run quotacheck
> > nightly. The new tools still allow you to do it, but warn that it may
> > not be consistent. We didn't have any problems with 2.2 kernels.
>
> quotacheck cannot be reliable on a live system as it scans through the
> filesystem counting the usage for each user and then updates the
> quotas file. If usage changes between scanning a file and updating
> the quota record, you get an error. This is particularly a problem
> if quotacheck takes a long time, and on one of my servers (heavily
> loaded NFS server) quotacheck takes a *long* time if the server is
> live (it isn't exactly quick if it isn't live either).
>
> I wrote a little program which uses libext2fs to scan the block device
> for inodes and add up usage that way (as opposed to walking the
> filetree as I believe quotacheck does). It runs *much* faster
> (minutes instead of hours).
Note that quotacheck(8) uses e2fslib too if compiled properly...

Honza

>

2001-10-31 13:47:16

by Jan Kara


Subject: Re: Oops: Quota race in 2.4.12?

Hi,

> Some of our dual CPU web servers with 2.4.12 are Oopsing while running
> quotacheck. They don't seem to die immediately, but oops many times and
> eventually break. The old tools didn't warn about quotachecking on a
> live file system, so some of our servers were set up to run quotacheck
> nightly. The new tools still allow you to do it, but warn that it may
> not be consistent. We didn't have any problems with 2.2 kernels.
>
> First oops, as already processed (grumble) by klogd:
>
> Oct 28 04:22:32 pro kernel: remove_free_dquot: dquot not on the free list??
> Oct 28 04:22:32 pro last message repeated 90 times
> Oct 28 04:22:32 pro kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
> ...dates stripped:
<snip>

Attached is a patch against 2.4.13 which should solve some SMP races... Can you try
it and see if it fixes your problems? I also know about one possible race during quotaoff()
which I'll fix tonight, but that shouldn't be your case :).
Honza

Attachments:

(No filename) (1.00 kB)
quota-fix-2.4.13.diff (3.14 kB)