2008-05-05 21:11:49

by Geert Uytterhoeven

Subject: Problem mounting ext2 using ext3?

Hi,

When trying today's kernel (f74d505b58d36ceeef10e459094f0eb760681165) on
ARAnyM (emulated m68k Atari), I get

| Unable to handle kernel NULL pointer dereference at virtual address 000000b4
| Oops: 00000000
| Modules linked in:
| PC: [<000aad9e>] ext3_sync_fs+0x16/0x4a
| SR: 2304 SP: 00c09cc0 a2: 00c07a80
| d0: 00000001 d1: 00002300 d2: 00000002 d3: 00677600
| d4: 00677752 d5: 006776d2 a0: 00000000 a1: 00677600
| Process swapper (pid: 1, task=00c07a80)
| Frame format=7 eff addr=000000b4 ssw=0505 faddr=000000b4
| wb 1 stat/addr/data: 0000 00000000 00000000
| wb 2 stat/addr/data: 0000 00000000 00000000
| wb 3 stat/addr/data: 0000 000000b4 00000000
| push data: 00000000 00000000 00000000 00000000
| Stack from 00c09d28:
| 00c09d30 00677600 40000000 00c09df0 000909fe 00677600 00000001 00000001
| 00a02cd0 00000040 00000000 00000000 00000000 00677600 0022671c 00677600
| 00c09df0 00c09d7c 00677666 006776be 006776b2 00001000 00000000 00000000
| 00000000 00000001 00000000 00000000 00065b7a 00677600 ffffffff 00000000
| 00a02ce8 ffffffea 00065e5c 00677600 0067763a 0067b000 0067b000 0080de20
| 0022671c 00c09ea4 68646131 000e38de 00241072 00c01f70 000000d0 00241072
| Call Trace: [<000909fe>] vfs_quota_off+0xb4/0x562
| [<00001000>] kernel_pg_dir+0x0/0x1000
| [<00065b7a>] deactivate_super+0x44/0x72
| [<00065e5c>] get_sb_bdev+0x12a/0x138
| [<000e38de>] idr_pre_get+0x32/0x44
| [<000e37f8>] ida_get_new+0x10/0x16
| [<000a9930>] ext3_get_sb+0x1e/0x24

when mounting the root file system, which is ext2 (has_journal is not set).
Apparently it crashes in ext3_sync_fs because EXT3_SB(sb)->s_journal is NULL.
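
A minimal user-space sketch of the failure mode described above, assuming
simplified stand-in structures (model_super, model_sb_info and model_journal
are hypothetical names, not the kernel's real types or layouts). It shows a
sync hook that checks each pointer before dereferencing it instead of assuming
a fully set-up superblock. It is an illustration only, not a proposed kernel
change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_journal { int commits; };
struct model_sb_info { struct model_journal *journal; };
struct model_super   { struct model_sb_info *fs_info; int dirty; };

/*
 * Model of a sync_fs-style hook that tolerates a half-built superblock.
 * Returns 0 if a commit was started, -1 if there was nothing to sync.
 */
static int model_sync_fs(struct model_super *sb)
{
	assert(sb != NULL);
	sb->dirty = 0;

	/* Either pointer may be NULL when setup did not complete. */
	if (sb->fs_info == NULL || sb->fs_info->journal == NULL)
		return -1;

	sb->fs_info->journal->commits++;
	return 0;
}
```

Without the NULL checks, the last two statements would dereference a NULL
pointer at a small offset, which is the kind of fault address seen in the oops.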

At first I thought it was an issue with the byteswapped IDE bus on Atari (a
new and different solution to handle this just went into mainline), but if I
disable CONFIG_EXT3 support, it boots up fine.

Is this a known problem?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


2008-05-05 22:26:48

by Theodore Ts'o

Subject: Re: Problem mounting ext2 using ext3?

On Mon, May 05, 2008 at 11:11:46PM +0200, Geert Uytterhoeven wrote:
> when mounting the root file system, which is ext2 (has_journal is not set).
> Apparently it crashes in ext3_sync_fs because EXT3_SB(sb)->s_journal is NULL.
>
> At first I thought it was an issue with the byteswapped IDE bus on Atari (a
> new and different solution to handle this just went into mainline), but if I
> disable CONFIG_EXT3 support, it boots up fine.
>
> Is this a known problem?

I can confirm this as a regression. You don't even need to mount it
as a root filesystem, or do this on a 68k system. On my x86 system,
using a kernel based off of git commit afa26be8 (6 commits after
2.6.26-rc1), you can cause an oops by taking an ext2 filesystem and
forcing a mount as ext3 ("mount -t ext3 /dev/closure/textext2fs /mnt");
see below for my oops. This does not occur with a kernel based off of
2.6.25, so it's a definite regression.

Looks like the problem is some of the recent quota cleanups. The
problem is that ext3_fill_super is returning an error, because the
journal is missing. get_sb_bdev() calls ext3_fill_super, and upon
receiving an error, it is calling deactivate_super(), which calls:

DQUOT_OFF(s, 0);

(line 182 in fs/super.c, in deactivate_super(), recently modified just
after 2.6.25, at commit 0ff5af8340aa6be44220d7237ef4a654314cf795,
although I'm not sure this is actually the problem commit).

The blow up is happening because the superblock was not fully
set up, and the comment in the commit involved mentioned cleaning up
what is supposed to happen when remounting a filesystem turning quota
on or off. I'm guessing that the changes didn't take into account
that DQUOT_OFF() can get called with a partially set-up superblock,
which will happen when the filesystem-specific get_sb() code refuses a
mount and returns an error.
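
The error path described above can be sketched as a small user-space model.
All names here (model_get_sb, model_fill_super and so on) are hypothetical
stand-ins chosen for illustration; the real kernel functions take different
arguments and do far more. The sketch only records whether the sync hook is
reached while the superblock is still partially set up.

```c
#include <assert.h>
#include <stddef.h>

struct model_super {
	void *fs_info;        /* stays NULL until fill_super succeeds */
	int sync_calls;       /* how often the sync hook was reached */
	int sync_on_partial;  /* sync hook reached while fs_info was NULL */
};

/* Stand-in for ->sync_fs(); the real hook would dereference fs_info. */
static void model_sync_fs(struct model_super *sb)
{
	sb->sync_calls++;
	if (sb->fs_info == NULL)
		sb->sync_on_partial++;
}

/* Quota-off without any "is there anything to do?" check. */
static void model_quota_off(struct model_super *sb)
{
	model_sync_fs(sb);
}

/* Teardown always turns quotas off. */
static void model_deactivate_super(struct model_super *sb)
{
	model_quota_off(sb);
}

/* Refuses the mount when no journal is present. */
static int model_fill_super(struct model_super *sb, int has_journal)
{
	static int dummy_info;

	if (!has_journal)
		return -1;
	sb->fs_info = &dummy_info;
	return 0;
}

/* On failure, tears down the half-built superblock. */
static int model_get_sb(struct model_super *sb, int has_journal)
{
	int err = model_fill_super(sb, has_journal);

	if (err) {
		model_deactivate_super(sb);
		return err;
	}
	return 0;
}
```

Calling model_get_sb() with has_journal set to 0 leaves sync_on_partial at 1,
which corresponds to the point where the real code would oops.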

Jan, can you take a look at this and confirm whether or not this is
the root cause of the crash?

Thanks!!

- Ted


Pid: 6738, comm: mount Tainted: G W (2.6.26-rc1-01265-g1f94101 #12)
EIP: 0060:[<f8980f89>] EFLAGS: 00010286 CPU: 0
EIP is at ext3_sync_fs+0x19/0x47 [ext3]
EAX: 00000000 EBX: f619e400 ECX: f8980f70 EDX: ed029dd8
ESI: 00000001 EDI: f8990520 EBP: ed029de4 ESP: ed029dd8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount (pid: 6738, ti=ed029000 task=f36eaea0 task.ti=ed029000)
Stack: f8990520 f619e400 f619e400 ed029e44 c01b4218 00000000 ffffffff f619e560
ed029e18 00000246 f619e514 00000246 ed029e38 f619e65c f619e664 c03ccd80
00000002 c03ccd90 00000246 c018890e 00000246 c03ccd80 00000000 00000000
Call Trace:
[vfs_quota_off+1051/1287] ? vfs_quota_off+0x41b/0x507
[deactivate_super+51/106] ? deactivate_super+0x33/0x6a
[vfs_quota_off+0/1287] ? vfs_quota_off+0x0/0x507
[deactivate_super+74/106] ? deactivate_super+0x4a/0x6a
[get_sb_bdev+230/275] ? get_sb_bdev+0xe6/0x113
[alloc_vfsmnt+225/265] ? alloc_vfsmnt+0xe1/0x109
[<f897fe47>] ? ext3_get_sb+0x13/0x15 [ext3]
[<f89817c1>] ? ext3_fill_super+0x0/0x14d9 [ext3]
[vfs_kern_mount+129/247] ? vfs_kern_mount+0x81/0xf7
[do_kern_mount+50/186] ? do_kern_mount+0x32/0xba
[do_new_mount+70/113] ? do_new_mount+0x46/0x71
[do_mount+407/437] ? do_mount+0x197/0x1b5
[down+43/47] ? down+0x2b/0x2f
[down+43/47] ? down+0x2b/0x2f
[sys_mount+100/156] ? sys_mount+0x64/0x9c
[sysenter_past_esp+120/209] ? sysenter_past_esp+0x78/0xd1
=======================
Code: 00 8b 80 c4 11 00 00 c6 42 11 00 e8 5f 20 fc ff 5d c3 55 89 e5 56 89 d6 53 89 c3 83 ec 04 c6 40 11 00 8b 80 b0 02 00 00 8d 55 f4 <8b> 80 c4 11 00 00 e8 ff 62 fc ff 85 c0 74 18 85 f6 74 14 8b 83
EIP: [<f8980f89>] ext3_sync_fs+0x19/0x47 [ext3] SS:ESP 0068:ed029dd8
May 5 17:42:53 closure kernel: [ 102.207975] ---[ end trace ac590292814c8102 ]



2008-05-06 07:11:15

by Geert Uytterhoeven

Subject: Re: Problem mounting ext2 using ext3?

Hi Ted,

On Mon, 5 May 2008, Theodore Tso wrote:
> On Mon, May 05, 2008 at 11:11:46PM +0200, Geert Uytterhoeven wrote:
> > when mounting the root file system, which is ext2 (has_journal is not set).
> > Apparently it crashes in ext3_sync_fs because EXT3_SB(sb)->s_journal is NULL.
> >
> > At first I thought it was an issue with the byteswapped IDE bus on Atari (a
> > new and different solution to handle this just went into mainline), but if I
> > disable CONFIG_EXT3 support, it boots up fine.
> >
> > Is this a known problem?
>
> I can confirm this as a regression. You don't even need to mount it

Thanks for confirming!

> as a root filesystem, or do this on a 68k system. On my x86 system,

That's all I had available for a quick test ;-)

Gr{oetje,eeting}s,

Geert


2008-05-06 07:26:30

by Vegard Nossum

Subject: Re: Problem mounting ext2 using ext3?

Hi,

On Tue, May 6, 2008 at 12:26 AM, Theodore Tso <[email protected]> wrote:
> On Mon, May 05, 2008 at 11:11:46PM +0200, Geert Uytterhoeven wrote:
> > when mounting the root file system, which is ext2 (has_journal is not set).
> > Apparently it crashes in ext3_sync_fs because EXT3_SB(sb)->s_journal is NULL.
> >
> > At first I thought it was an issue with the byteswapped IDE bus on Atari (a
> > new and different solution to handle this just went into mainline), but if I
> > disable CONFIG_EXT3 support, it boots up fine.
> >
> > Is this a known problem?
>
> I can confirm this as a regression. You don't even need to mount it
> as a root filesystem, or do this on a 68k system. On my x86 system,
> using a kernel based off of git commit afa26be8 (6 commits after
> 2.6.26-rc1), you can cause an oops by taking an ext2 filesystem and
> forcing a mount as ext3 ("mount -t ext3 /dev/closure/textext2fs /mnt");
> see below for my oops. This does not occur with a kernel based off of
> 2.6.25, so it's a definite regression.

Hi,

I posted a very similar problem a couple of days ago:
http://www.nabble.com/BUG-in-ext3_sync_fs-td16999997.html

to which I got zero replies. Can I close this in my internal bugzilla
as dup/"not my fault"? The stacktrace looks very similar. This was
also ext2 fs mounted (apparently) by ext3 code.

Thanks.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-05-06 09:50:28

by Jan Kara

Subject: Re: Problem mounting ext2 using ext3?

On Tue 06-05-08 09:26:30, Vegard Nossum wrote:
> Hi,
>
> On Tue, May 6, 2008 at 12:26 AM, Theodore Tso <[email protected]> wrote:
> > On Mon, May 05, 2008 at 11:11:46PM +0200, Geert Uytterhoeven wrote:
> > > when mounting the root file system, which is ext2 (has_journal is not set).
> > > Apparently it crashes in ext3_sync_fs because EXT3_SB(sb)->s_journal is NULL.
> > >
> > > At first I thought it was an issue with the byteswapped IDE bus on Atari (a
> > > new and different solution to handle this just went into mainline), but if I
> > > disable CONFIG_EXT3 support, it boots up fine.
> > >
> > > Is this a known problem?
> >
> > I can confirm this as a regression. You don't even need to mount it
> > as a root filesystem, or do this on a 68k system. On my x86 system,
> > using a kernel based off of git commit afa26be8 (6 commits after
> > 2.6.26-rc1), you can cause an oops by taking an ext2 filesystem and
> > forcing a mount as ext3 ("mount -t ext3 /dev/closure/textext2fs /mnt");
> > see below for my oops. This does not occur with a kernel based off of
> > 2.6.25, so it's a definite regression.
>
> Hi,
>
> I posted a very similar problem a couple of days ago:
> http://www.nabble.com/BUG-in-ext3_sync_fs-td16999997.html
>
> to which I got zero replies. Can I close this in my internal bugzilla
> as dup/"not my fault"? The stacktrace looks very similar. This was
> also ext2 fs mounted (apparently) by ext3 code.
Yes, this looks like the same problem. I'll take care of that.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-05-06 10:02:18

by Jan Kara

Subject: Re: Problem mounting ext2 using ext3?

On Mon 05-05-08 18:26:23, Theodore Tso wrote:
> On Mon, May 05, 2008 at 11:11:46PM +0200, Geert Uytterhoeven wrote:
> > when mounting the root file system, which is ext2 (has_journal is not set).
> > Apparently it crashes in ext3_sync_fs because EXT3_SB(sb)->s_journal is NULL.
> >
> > At first I thought it was an issue with the byteswapped IDE bus on Atari (a
> > new and different solution to handle this just went into mainline), but if I
> > disable CONFIG_EXT3 support, it boots up fine.
> >
> > Is this a known problem?
>
> I can confirm this as a regression. You don't even need to mount it
> as a root filesystem, or do this on a 68k system. On my x86 system,
> using a kernel based off of git commit afa26be8 (6 commits after
> 2.6.26-rc1), you can cause an oops by taking an ext2 filesystem and
> forcing a mount as ext3 ("mount -t ext3 /dev/closure/textext2fs /mnt");
> see below for my oops. This does not occur with a kernel based off of
> 2.6.25, so it's a definite regression.
>
> Looks like the problem is some of the recent quota cleanups. The
> problem is that ext3_fill_super is returning an error, because the
> journal is missing. get_sb_bdev() calls ext3_fill_super, and upon
> receiving an error, it is calling deactivate_super(), which calls:
>
> DQUOT_OFF(s, 0);
>
> (line 182 in fs/super.c, in deactivate_super(), recently modified just
> after 2.6.25, at commit 0ff5af8340aa6be44220d7237ef4a654314cf795,
> although I'm not sure this is actually the problem commit).
>
> The blow up is happening because the superblock was not fully
> set up, and the comment in the commit involved mentioned cleaning up
> what is supposed to happen when remounting a filesystem turning quota
> on or off. I'm guessing that the changes didn't take into account
> that DQUOT_OFF() can get called with a partially set-up superblock,
> which will happen when the filesystem-specific get_sb() code refuses a
> mount and returns an error.
>
> Jan, can you take a look at this and confirm whether or not this is
> the root cause of the crash?
Thanks Ted for looking into this. Yes, the problem is caused by my
modifications to quota code... The patch below fixes it for me and I've
also added a comment so that someone does not remove the check again in
future ;).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
---

From af9d1ac1db9acea1516350d4269796061e0f2ab5 Mon Sep 17 00:00:00 2001
From: Jan Kara <[email protected]>
Date: Tue, 6 May 2008 11:33:00 +0200
Subject: [PATCH] quota: Don't call sync_fs() from vfs_quota_off() when there's no quota turn off

Sometimes, vfs_quota_off() is called on a partially set up super block (for example
when fill_super() fails for some reason). In such cases we cannot call ->sync_fs()
because it can Oops because of not properly filled in super block. So in case we
find there's not quota to turn off, we just skip everything and return which fixes
the above problem.

Signed-off-by: Jan Kara <[email protected]>
---
fs/dquot.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/dquot.c b/fs/dquot.c
index dfba162..bdcb15e 100644
--- a/fs/dquot.c
+++ b/fs/dquot.c
@@ -1491,6 +1491,16 @@ int vfs_quota_off(struct super_block *sb, int type, int remount)

 	/* We need to serialize quota_off() for device */
 	mutex_lock(&dqopt->dqonoff_mutex);
+
+	/*
+	 * Skip everything if there's nothing to do. We have to do this becase
+	 * sometimes we are called when fill_super() failed and calling
+	 * sync_fs() in such cases does no good.
+	 */
+	if (!sb_any_quota_enabled(sb) && !sb_any_quota_suspended(sb)) {
+		mutex_unlock(&dqopt->dqonoff_mutex);
+		return 0;
+	}
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
 		toputinode[cnt] = NULL;
 		if (type != -1 && cnt != type)
--
1.5.2.4
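
The early-return logic of the patch can be sketched in user space as follows.
This is a simplified model under stated assumptions: model_super and the
model_* helpers are hypothetical names, the per-type flags stand in for the
kernel's quota state, and the dqonoff_mutex locking from the patch is left out.
It only shows that the sync step is skipped when no quota is enabled or
suspended.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAXQUOTAS 2

struct model_super {
	bool enabled[MODEL_MAXQUOTAS];    /* quota type currently on */
	bool suspended[MODEL_MAXQUOTAS];  /* quota type suspended */
	int sync_calls;                   /* how often the sync step ran */
};

static bool model_any_quota_enabled(const struct model_super *sb)
{
	for (int cnt = 0; cnt < MODEL_MAXQUOTAS; cnt++)
		if (sb->enabled[cnt])
			return true;
	return false;
}

static bool model_any_quota_suspended(const struct model_super *sb)
{
	for (int cnt = 0; cnt < MODEL_MAXQUOTAS; cnt++)
		if (sb->suspended[cnt])
			return true;
	return false;
}

/* Turns all quota types off; returns 0 on success. */
static int model_quota_off(struct model_super *sb)
{
	/* Nothing to do: return before touching anything else. */
	if (!model_any_quota_enabled(sb) && !model_any_quota_suspended(sb))
		return 0;

	for (int cnt = 0; cnt < MODEL_MAXQUOTAS; cnt++) {
		sb->enabled[cnt] = false;
		sb->suspended[cnt] = false;
	}

	/* Stand-in for the ->sync_fs() call that is unsafe on a partial sb. */
	sb->sync_calls++;
	return 0;
}
```

A zero-initialized model_super, like a superblock whose fill_super() failed
before any quota was set up, passes through model_quota_off() without ever
reaching the sync step.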


2008-05-06 11:20:30

by Geert Uytterhoeven

Subject: Re: Problem mounting ext2 using ext3?

On Tue, 6 May 2008, Jan Kara wrote:
> Thanks Ted for looking into this. Yes, the problem is caused by my
> modifications to quota code... The patch below fixes it for me and I've
> also added a comment so that someone does not remove the check again in
> future ;).

I'll give it a try this evening...

> + * Skip everything if there's nothing to do. We have to do this becase
                                                                   ^^^^^^
                                                                   because

Gr{oetje,eeting}s,

Geert


2008-05-06 12:06:03

by Theodore Ts'o

Subject: Re: Problem mounting ext2 using ext3?

On Tue, May 06, 2008 at 09:26:30AM +0200, Vegard Nossum wrote:
>
> I posted a very similar problem a couple of days ago:
> http://www.nabble.com/BUG-in-ext3_sync_fs-td16999997.html
>
> to which I got zero replies. Can I close this in my internal bugzilla
> as dup/"not my fault"? The stacktrace looks very similar. This was
> also ext2 fs mounted (apparently) by ext3 code.

Yeah, sorry, I didn't have network access when I looked at your
e-mail, and so I didn't see the stack trace until you pointed at the
mail message again. Because you mentioned a USB stick, I assumed it
was the problem with a USB stick getting pulled or being loose, causing
an I/O error that leads to an oops, and I missed the hint of the
problem occurring in the ext3 code when you were using an
ext2-formatted filesystem. Fortunately, when Geert reported the bug a
second time, I was less dense and figured it out quickly. :-)

- Ted

2008-05-06 19:05:07

by Geert Uytterhoeven

Subject: Re: Problem mounting ext2 using ext3?

On Tue, 6 May 2008, Jan Kara wrote:
> On Mon 05-05-08 18:26:23, Theodore Tso wrote:
> > On Mon, May 05, 2008 at 11:11:46PM +0200, Geert Uytterhoeven wrote:
> > > when mounting the root file system, which is ext2 (has_journal is not set).
> > > Apparently it crashes in ext3_sync_fs because EXT3_SB(sb)->s_journal is NULL.
> > >
> > > At first I thought it was an issue with the byteswapped IDE bus on Atari (a
> > > new and different solution to handle this just went into mainline), but if I
> > > disable CONFIG_EXT3 support, it boots up fine.
> > >
> > > Is this a known problem?
> >
> > I can confirm this as a regression. You don't even need to mount it
> > as a root filesystem, or do this on a 68k system. On my x86 system,
> > using a kernel based off of git commit afa26be8 (6 commits after
> > 2.6.26-rc1), you can cause an oops by taking an ext2 filesystem and
> > forcing a mount as ext3 ("mount -t ext3 /dev/closure/textext2fs /mnt");
> > see below for my oops. This does not occur with a kernel based off of
> > 2.6.25, so it's a definite regression.
> >
> > Looks like the problem is some of the recent quota cleanups. The
> > problem is that ext3_fill_super is returning an error, because the
> > journal is missing. get_sb_bdev() calls ext3_fill_super, and upon
> > receiving an error, it is calling deactivate_super(), which calls:
> >
> > DQUOT_OFF(s, 0);
> >
> > (line 182 in fs/super.c, in deactivate_super(), recently modified just
> > after 2.6.25, at commit 0ff5af8340aa6be44220d7237ef4a654314cf795,
> > although I'm not sure this is actually the problem commit).
> >
> > The blow up is happening because the superblock was not fully
> > set up, and the comment in the commit involved mentioned cleaning up
> > what is supposed to happen when remounting a filesystem turning quota
> > on or off. I'm guessing that the changes didn't take into account
> > that DQUOT_OFF() can get called with a partially set-up superblock,
> > which will happen when the filesystem-specific get_sb() code refuses a
> > mount and returns an error.
> >
> > Jan, can you take a look at this and confirm whether or not this is
> > the root cause of the crash?
> Thanks Ted for looking into this. Yes, the problem is caused by my
> modifications to quota code... The patch below fixes it for me and I've
> also added a comment so that someone does not remove the check again in
> future ;).

Thanks Jan! Your patch fixed my problem.

Gr{oetje,eeting}s,

Geert
