2000-11-28 20:03:15

by Petr Vandrovec

Subject: 2.4.0-test11 ext2 fs corruption

Hi Al,
during the weekend I was uncompressing XFree (Debian's 4.0.1-7) at home,
with 2.4.0-test11 running on a Celeron 300A, 128MB RAM, SMP kernel on UP.
It failed to compile lbxproxy/di/main.c. After some investigation I found
that files in that directory were overwritten by font source data. fsck did
not reveal any cross-linked clusters, nothing. The filesystem uses 4KB clusters.

Today I found some spare time and investigated it further. The same
data appears in both places:

programs/lbxproxy/di/init.c      0-8720         fonts/bdf/75dpi/lubR24.bdf   0x5000-0x7210
                     lbxfuncs.c  0x0000-0x0EC0                  lubR24.bdf   0x8000-0x8EC0
                                 0x0EC1-0x0FFF  zero
                                 0x1000-0x5ABC                  lutBS08.bdf  0x0000-0x4ABC
                                 0x5ABD-0x5FFF  zero
                                 0x6000-0x92C1                  lutBS10.bdf  0x0000-0x32C1
                     lbxutil.c   0x0000-0x1E27                  lutBS10.bdf  0x4000-0x5E27
                                 0x1E28-0x1FFF  zero
                                 0x2000-0x3452                  lutBS12.bdf  0x0000-0x1452
                     main.c      0-4614                         lutBS12.bdf  0x2000-0x3206
                     options.c   0x0000-0x222E                  lutBS12.bdf  0x4000-0x622E
                                 0x222F-0x2FFF  zero
                                 0x3000-0x4E30                  lutBS14.bdf  0x0000-0x1E30
                     pm.c        0-11706                        lutBS14.bdf  0x2000-0x4DA8
(blocks 722433-722459)                          (blocks 558899-~558927)

Other files are intact. As you can see, somehow disk blocks ended up
somewhere other than they should, in addition to the correct place.
I also found that data after end of file in the di/*.c files is not
cleared, so maybe the IDE driver made a mistake? But I was not able
to find how to convert either block address, or LBA address, or CHS
address (drive uses 839/240/63, but I hope that it runs in LBA) to
get 558899 from 722433 or vice versa.
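
The address conversions mentioned above can be sketched roughly as follows. This is only an illustration: it assumes 512-byte sectors, the 4KB block size and the 839/240/63 geometry from this message, and a partition start of 0 because the real start sector of the partition is not given here. The function names are made up for the example.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 4096
HEADS = 240               # from the reported 839/240/63 geometry
SECTORS_PER_TRACK = 63


def block_to_lba(block, partition_start=0):
    """First 512-byte sector of a filesystem block.

    partition_start is the first LBA of the partition; 0 is a placeholder,
    the real value is not stated in the report.
    """
    return partition_start + block * (BLOCK_SIZE // SECTOR_SIZE)


def lba_to_chs(lba):
    """Classic LBA -> (cylinder, head, sector) mapping; sectors count from 1."""
    cylinder, rest = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector + 1


for block in (722433, 558899):
    lba = block_to_lba(block)
    print(block, lba, lba_to_chs(lba))

# Distance between the two ranges, in blocks and in sectors.
print(722433 - 558899, block_to_lba(722433) - block_to_lba(558899))
```

Printing the values side by side makes it easier to look for a shared cylinder, head, or power-of-two offset between the two ranges.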

Motherboard is i440BX, HDD was IDE TOSHIBA MK6409MAV on secondary IDE,
running UDMA2.

Nobody complained - neither IDE nor kernel nor ext2, just data were
damaged. Machine does not have any other problems, so I have no idea
what caused this incident. Maybe I stressed MM system too much with
some gnome app during untar?

And last note, according to debian/scripts/source.unpack, programs/lbxproxy
was created first, and fonts/bdf/... was created after that (i.e.
X401src-1 was decompressed first, X401src-2_debian was decompressed
second). This also agrees with zeroed bytes in these datablocks.
Thanks,
Petr Vandrovec
[email protected]

P.S.: Ted, why field 'Blocks: XXX' in debugfs (1.19) is 'Sectors: '
in reality (it reports blocks * 8, so I assume (as I have 4KB clusters)
that it converts it to sector count)?
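
A small sketch of the arithmetic behind that P.S., assuming the figure shown is the inode's block count kept in 512-byte units (the ext2 on-disk i_blocks field is counted that way). The helper name is hypothetical.

```python
def blocks_field_for(fs_blocks, block_size=4096, unit=512):
    """Value expected in a counter kept in 512-byte units for fs_blocks filesystem blocks."""
    return fs_blocks * (block_size // unit)


# With 4KB blocks every filesystem block contributes 8 to the counter.
print(blocks_field_for(1), blocks_field_for(27))
```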


2000-11-28 20:32:55

by Alexander Viro

Subject: Re: 2.4.0-test11 ext2 fs corruption



On Tue, 28 Nov 2000, Petr Vandrovec wrote:

> Hi Al,
> during weekend I was uncompressing XFree (Debian's 4.0.1-7) at home,
> with 2.4.0-test11 running on Celeron 300A, 128MB RAM, SMP kernel on up.
> It failed to compile lbxproxy/di/main.c. After some investigation I found
> that they were overwritten by some source font data. fsck did not reveal
> any croslinked clusters, nothing. Filesystem itself uses 4KB clusters.
>
> Today I found some spare time and investigated it further. There is
> same data contents in:
>
> programs/lbxproxy/di/init.c 0-8720 fonts/bdf/75dpi/lubR24.bdf 0x5000-0x7210
> lbxfuncs.c 0x0000-0x0EC0 lubR24.bdf 0x8000-0x8EC0
> 0x0EC1-0x0FFF zero
> 0x1000-0x5ABC lutBS08.bdf 0x0000-0x4ABC
> 0x5ABD-0x5FFF zero
> 0x6000-0x92C1 lutBS10.bdf 0x0000-0x32C1
> lbxutil.c 0x0000-0x1E27 lutBS10.bdf 0x4000-0x5E27
> 0x1E28-0x1FFF zero
> 0x2000-0x3452 lutBS12.bdf 0x0000-0x1452
> main.c 0-4614 lutBS12.bdf 0x2000-0x3206
> options.c 0x0000-0x222E lutBS12.bdf 0x4000-0x622E
> 0x222F-0x2FFF zero
> 0x3000-0x4E30 lutBS14.bdf 0x0000-0x1E30
> pm.c 0-11706 lutBS14.bdf 0x2000-0x4DA8
> (blocks 722433-722459) (blocks 558899-~558927)
>
> Other files are intouch. As you can see, somewhat disk blocks
> ended somewhere else than they should in addition to correct place.
> I also found that data after end of file in di/*.c files are not
> cleared, so maybe that ide driver did a mistake? But I was not able
> to find how to convert either block address, or LBA adress, or CHS
> address (drive uses 839/240/63, but I hope that it runs in LBA) to
> get 558899 from 722433 or vice versa.

Erm... Do you mean that you've got a 1-1 correspondence in data between these
two ranges? Then it looks like something way below the fs level... Weird.
Could you verify it with dd?

2000-11-28 20:41:27

by Petr Vandrovec

Subject: Re: 2.4.0-test11 ext2 fs corruption

On 28 Nov 00 at 15:02, Alexander Viro wrote:
> On Tue, 28 Nov 2000, Petr Vandrovec wrote:
>
> > Hi Al,
> > during weekend I was uncompressing XFree (Debian's 4.0.1-7) at home,
> > with 2.4.0-test11 running on Celeron 300A, 128MB RAM, SMP kernel on up.
> > It failed to compile lbxproxy/di/main.c. After some investigation I found
> > that they were overwritten by some source font data. fsck did not reveal
> > any croslinked clusters, nothing. Filesystem itself uses 4KB clusters.
> >
> > Today I found some spare time and investigated it further. There is
> > same data contents in:
> >
> > programs/lbxproxy/di/init.c 0-8720 fonts/bdf/75dpi/lubR24.bdf 0x5000-0x7210
> > lbxfuncs.c 0x0000-0x0EC0 lubR24.bdf 0x8000-0x8EC0
> > 0x0EC1-0x0FFF zero
> > 0x1000-0x5ABC lutBS08.bdf 0x0000-0x4ABC
> > 0x5ABD-0x5FFF zero
> > 0x6000-0x92C1 lutBS10.bdf 0x0000-0x32C1
> > lbxutil.c 0x0000-0x1E27 lutBS10.bdf 0x4000-0x5E27
> > 0x1E28-0x1FFF zero
> > 0x2000-0x3452 lutBS12.bdf 0x0000-0x1452
> > main.c 0-4614 lutBS12.bdf 0x2000-0x3206
> > options.c 0x0000-0x222E lutBS12.bdf 0x4000-0x622E
> > 0x222F-0x2FFF zero
> > 0x3000-0x4E30 lutBS14.bdf 0x0000-0x1E30
> > pm.c 0-11706 lutBS14.bdf 0x2000-0x4DA8
> > (blocks 722433-722459) (blocks 558899-~558927)
> >
> > Other files are intouch. As you can see, somewhat disk blocks
> > ended somewhere else than they should in addition to correct place.
> > I also found that data after end of file in di/*.c files are not
> > cleared, so maybe that ide driver did a mistake? But I was not able
> > to find how to convert either block address, or LBA adress, or CHS
> > address (drive uses 839/240/63, but I hope that it runs in LBA) to
> > get 558899 from 722433 or vice versa.
>
> Erm... Do you mean that you've got a 1-1 correspondence in data between these
> two ranges? Then it looks like something way below the fs level... Weird.
> Could you verify it with dd?

Yes, it is identical copy. But I do not think that hdd can write same
data into two places with one command...

vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=722433 | md5sum
27+0 records in
27+0 records out
613de4a7ea664ce34b2a9ec8203de0f4
vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=558899 | md5sum
27+0 records in
27+0 records out
613de4a7ea664ce34b2a9ec8203de0f4
vana:/#

I found the match by searching for the contents of init.c in other files.

It is just these 27 blocks; the blocks before and after the range differ.
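
The same check as the dd | md5sum pair above, as a minimal sketch. It assumes the device or an image of it can be read as a regular file; the function names are made up for the example.

```python
import hashlib


def range_digest(path, first_block, count, block_size=4096):
    """MD5 of `count` blocks starting at `first_block`, like dd skip=... count=... | md5sum."""
    with open(path, "rb") as dev:
        dev.seek(first_block * block_size)
        data = dev.read(count * block_size)
    return hashlib.md5(data).hexdigest()


def ranges_identical(path, block_a, block_b, count, block_size=4096):
    """True when the two block ranges hash to the same digest."""
    return (range_digest(path, block_a, count, block_size)
            == range_digest(path, block_b, count, block_size))
```

For the case in this thread the call would be ranges_identical("/dev/hdd1", 722433, 558899, 27), run as root. Repeating it with the start shifted by one block in either direction is a quick way to confirm that the match stops at the edges of the range.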
Best regards,
Petr Vandrovec
[email protected]

2000-11-28 20:49:50

by David Miller

Subject: Re: 2.4.0-test11 ext2 fs corruption

From: "Petr Vandrovec" <[email protected]>
Date: Tue, 28 Nov 2000 21:10:36 MET-1

Yes, it is identical copy. But I do not think that hdd can write same
data into two places with one command...

Petr, did the af_inet.c assertions get triggered on this
same machine?

If yes, you seem to have some crazy kernel data corruptions
going on, and whatever it is would seem to be the cause of
both these problems you are reporting.

Later,
David S. Miller
[email protected]

2000-11-28 21:12:42

by Alexander Viro

Subject: Re: 2.4.0-test11 ext2 fs corruption



On Tue, 28 Nov 2000, Petr Vandrovec wrote:

> > two ranges? Then it looks like something way below the fs level... Weird.
> > Could you verify it with dd?
>
> Yes, it is identical copy. But I do not think that hdd can write same
> data into two places with one command...
>
> vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=722433 | md5sum
> 27+0 records in
> 27+0 records out
> 613de4a7ea664ce34b2a9ec8203de0f4
> vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=558899 | md5sum
> 27+0 records in
> 27+0 records out
> 613de4a7ea664ce34b2a9ec8203de0f4
> vana:/#

Bloody hell... OK, let's see. Both ranges are covered by multiple files
and are way larger than one page. I.e. anything on pagecache level is
extremely unlikely - pages are not searched by physical location on
disk. And I really doubt that it's ext2_get_block() - we would have
to get a systematic error (constant offset), then read the data in
for no good reason, then forget the page->buffers, then get the right
values from ext2_get_block(), leave the data unmodified _and_ write it.

It almost looks like a request in queue got fscked up retaining the
->bh from one of the previous (also coalesced) requests and having
correct ->sector. Weird.
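
A toy model of that hypothesis, to show why it would leave the metadata clean. Nothing here is kernel code: the disk is a dict, a "request" is a start block plus a list of buffers, and all names are invented for the illustration.

```python
def submit(disk, start_block, buffers):
    """Write a coalesced request: one buffer per consecutive block."""
    for offset, data in enumerate(buffers):
        disk[start_block + offset] = data


def simulate():
    disk = {}
    font_buffers = [f"font-{i}" for i in range(27)]
    source_buffers = [f"src-{i}" for i in range(27)]  # what should land at 722433

    # An earlier request writes the font data where it belongs.
    submit(disk, 558899, font_buffers)

    # Hypothesised failure: the target is right, but the request still
    # carries the buffers of the earlier request instead of source_buffers.
    submit(disk, 722433, font_buffers)
    return disk


disk = simulate()
print(all(disk[558899 + i] == disk[722433 + i] for i in range(27)))
```

In this model no block pointer or bitmap is touched, only file contents, which would fit fsck finding nothing wrong.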

Linus, Andrea - any ideas? Situation looks so: after massive file creation
a range of disk with the data from new files (many new files) got
duplicated over another range - one with the data from older files
(also many of them). 27 blocks, block size == 4Kb. No intersection
between inodes, fsck is happy with fs, just a data ending up in two
places on disk. No warnings from IDE or ext2 drivers.

Kernel: test11 built with 2.95.2, so gcc bug may very well be there.
However, I really wonder what could trigger it in ll_rw_blk.c - 5:1
that shit had hit the fan there.

2000-11-28 21:17:33

by Petr Vandrovec

Subject: Re: 2.4.0-test11 ext2 fs corruption

On 28 Nov 00 at 12:04, David S. Miller wrote:
>
> Yes, it is identical copy. But I do not think that hdd can write same
> data into two places with one command...
>
> Petr, did the af_inet.c assertions get triggered on this
> same machine?

No, ext2fs is at home, and af_inet is at work... At work I'm using
vmware, at home I do not use it... But kernel sources are same
(g450 patch for matroxfb, ncpfs supporting device nodes, threaded ipx;
but neither ncpfs nor ipx is compiled at home).
Petr Vandrovec
[email protected]


2000-11-29 01:14:09

by Jens Axboe

Subject: Re: 2.4.0-test11 ext2 fs corruption

On Tue, Nov 28 2000, Petr Vandrovec wrote:
> On 28 Nov 00 at 12:04, David S. Miller wrote:
> >
> > Yes, it is identical copy. But I do not think that hdd can write same
> > data into two places with one command...
> >
> > Petr, did the af_inet.c assertions get triggered on this
> > same machine?
>
> No, ext2fs is at home, and af_inet is at work... At work I'm using
> vmware, at home I do not use it... But kernel sources are same
> (g450 patch for matroxfb, ncpfs supporting device nodes, threaded ipx;
> but neither ncpfs nor ipx is compiled at home).
> Petr Vandrovec
> [email protected]

Petr,

Could you try and reproduce with attached patch? If this would trigger
I would assume fs corruption as well (which doesn't seem to be the
case for you), but it's worth a shot.

--- drivers/block/ll_rw_blk.c~	Wed Nov 29 01:30:22 2000
+++ drivers/block/ll_rw_blk.c	Wed Nov 29 01:33:00 2000
@@ -684,7 +684,7 @@
 	int max_segments = MAX_SEGMENTS;
 	struct request * req = NULL, *freereq = NULL;
 	int rw_ahead, max_sectors, el_ret;
-	struct list_head *head = &q->queue_head;
+	struct list_head *head;
 	int latency;
 	elevator_t *elevator = &q->elevator;

@@ -734,6 +734,7 @@
 	 */
 again:
 	spin_lock_irq(&io_request_lock);
+	head = &q->queue_head;

 	/*
 	 * skip first entry, for devices with active queue head

--
* Jens Axboe <[email protected]>
* SuSE Labs

2000-11-29 01:39:19

by Andrea Arcangeli

Subject: Re: 2.4.0-test11 ext2 fs corruption

Side note: that could generate mem/io corruption only on headactive devices
(like IDE).

Andrea

2000-11-29 01:42:29

by Jens Axboe

Subject: Re: 2.4.0-test11 ext2 fs corruption

On Wed, Nov 29 2000, Andrea Arcangeli wrote:
> Side note: that could generate mem/io corruption only on headactive devices
> (like IDE).

Yep, that's why I told Linus it was a long shot and couldn't possibly
account for all the corruption cases reported. And one would expect
fs corruption to go with that as well. So it's of course a long shot,
but still worth trying for Petr.

--
* Jens Axboe <[email protected]>
* SuSE Labs

2000-11-29 02:05:27

by Andre Hedrick

Subject: Re: 2.4.0-test11 ext2 fs corruption

On Wed, 29 Nov 2000, Jens Axboe wrote:

> On Wed, Nov 29 2000, Andrea Arcangeli wrote:
> > Side note: that could generate mem/io corruption only on headactive devices
> > (like IDE).
>
> Yep, that's why I told Linus it was a long shot and couldn't possibly
> account for all the corruption cases reported. And one would expect
> fs corruption to go with that as well. So it's of course a long shot,
> but still worth trying for Petr.

Okay, I have spent part of the afternoon kicking my FW around and have not
followed all of the thread. However we are talking FSC and ATA so what
are the details? And where are we poking into the driver.

Andre Hedrick
Linux ATA Development

2000-11-29 12:12:42

by Petr Vandrovec

Subject: Re: 2.4.0-test11 ext2 fs corruption

On 29 Nov 00 at 1:43, Jens Axboe wrote:

> Could you try and reproduce with attached patch? If this would trigger
> I would assume fs corruption as well (which doesn't seem to be the
> case for you), but it's worth a shot.

I'll try, but it is not easily reproducible. Fortunately.

BTW, during the night it occurred to me that my original diagnosis
(something written twice) may have been biased, as XF4.0.1-0phase?v27 was
unpacked on the same disk about 3 weeks ago.

As the font data did not change between these two versions, it is possible
that one 27-block chunk (the *.c files) was lost (or written somewhere
I have not found yet), rather than another one (the fonts) being duplicated.
Thanks,
Petr Vandrovec
[email protected]

2000-11-29 13:00:05

by Martin Schwidefsky

Subject: Re: 2.4.0-test11 ext2 fs corruption



>--- drivers/block/ll_rw_blk.c~ Wed Nov 29 01:30:22 2000
>+++ drivers/block/ll_rw_blk.c Wed Nov 29 01:33:00 2000
>@@ -684,7 +684,7 @@
> int max_segments = MAX_SEGMENTS;
> struct request * req = NULL, *freereq = NULL;
> int rw_ahead, max_sectors, el_ret;
>- struct list_head *head = &q->queue_head;
>+ struct list_head *head;
> int latency;
> elevator_t *elevator = &q->elevator;

head = &q->queue_head is a simple offset calculation in the request
queue structure. Moving this into the spinlock won't change anything,
since q->queue_head isn't a pointer that can change.

Independent of that I can second the observation that test11 can corrupt
ext2 in memory. I think that this is related to the memory management
problems I see but I can't prove it yet.

blue skies,
Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [email protected]


2000-11-29 13:13:58

by Alexander Viro

Subject: Re: 2.4.0-test11 ext2 fs corruption



On Wed, 29 Nov 2000 [email protected] wrote:

>
>
> >--- drivers/block/ll_rw_blk.c~ Wed Nov 29 01:30:22 2000
> >+++ drivers/block/ll_rw_blk.c Wed Nov 29 01:33:00 2000
> >@@ -684,7 +684,7 @@
> > int max_segments = MAX_SEGMENTS;
> > struct request * req = NULL, *freereq = NULL;
> > int rw_ahead, max_sectors, el_ret;
> >- struct list_head *head = &q->queue_head;
> >+ struct list_head *head;
> > int latency;
> > elevator_t *elevator = &q->elevator;
>
> head = &q->queue_head is a simple offset calculation in the request
> queue structure. Moving this into the spinlock won't change anything,
> since q->queue_head isn't a pointer that can change.

That's fine, but head is _re_assigned later. Grep for 'head =' and 'again'
in __make_request().
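
A rough model of why the position of that assignment matters. It is not the kernel code: the queue head is modelled as index 0, each loop pass stands for arriving at the again: label, and the "skip first entry" step is taken from the comment visible in the patch context. The function and parameter names are invented.

```python
def head_after_retries(head_active, retries, reset_at_again):
    """Index that `head` ends up at after the first pass plus `retries` jumps back to again:."""
    head = 0                        # initializer at declaration: the queue head
    for _ in range(retries + 1):
        if reset_at_again:
            head = 0                # the patch: recompute head after taking the lock
        if head_active:
            head += 1               # skip first entry on devices with an active queue head
    return head


print(head_after_retries(True, 0, False))   # no retry
print(head_after_retries(True, 1, False))   # one retry without the patch
print(head_after_retries(True, 1, True))    # one retry with the patch
print(head_after_retries(False, 1, False))  # device without an active queue head
```

Without the reset, a retry on a head-active device skips one entry too many, while a device without an active head is unaffected, in line with the side note about head-active devices such as IDE.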

2000-11-29 16:57:31

by Daniel Phillips

Subject: Re: 2.4.0-test11 ext2 fs corruption

Alexander Viro wrote:
>
> On Tue, 28 Nov 2000, Petr Vandrovec wrote:
>
> > > two ranges? Then it looks like something way below the fs level... Weird.
> > > Could you verify it with dd?
> >
> > Yes, it is identical copy. But I do not think that hdd can write same
> > data into two places with one command...
> >
> > vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=722433 | md5sum
> > 27+0 records in
> > 27+0 records out
> > 613de4a7ea664ce34b2a9ec8203de0f4
> > vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=558899 | md5sum
> > 27+0 records in
> > 27+0 records out
> > 613de4a7ea664ce34b2a9ec8203de0f4
> > vana:/#
>
> Bloody hell... OK, let's see. Both ranges are covered by multiple files
> and are way larger than one page. I.e. anything on pagecache level is
> extremely unlikely - pages are not searched by physical location on
> disk. And I really doubt that it's ext2_get_block() - we would have
> to get a systematic error (constant offset), then read the data in
> for no good reason, then forget the page->buffers, then get the right
> values fro ext2_get_block(), leave the data unmodified _and_ write it.
>
> It almost looks like a request in queue got fscked up retaining the
> ->bh from one of the previous (also coalesced) requests and having
> correct ->sector. Weird.
>
> Linus, Andrea - any ideas? Situation looks so: after massive file creation
> a range of disk with the data from new files (many new files) got
> duplicated over another range - one with the data from older files
> (also many of them). 27 blocks, block size == 4Kb. No intersection
> between inodes, fsck is happy with fs, just a data ending up in two
> places on disk. No warnings from IDE or ext2 drivers.
>
> Kernel: test11 built with 2.95.2, so gcc bug may very well be there.
> However, I really wonder what could trigger it in ll_rw_blk.c - 5:1
> that shit had hit the fan there.

I picked up a bug in ialloc.c back in February, for which I submitted a
poorly constructed patch (for which I was privately and properly flamed;
as I recall my subsequent attempts to post an improved version failed
for various reasons which may or may not include ORBS). Anyway, the
basic idea is clear:

http://marc.theaimsgroup.com/?l=linux-kernel&m=95162877201890&w=2

I'll make a proper patch out of this if you like. This *could* cause
the effect we're seeing here.

--
Daniel

2000-11-30 00:34:41

by Daniel Phillips

Subject: [PATCH] Re: 2.4.0-test11 ext2 fs corruption

Alexander Viro wrote:
> Bloody hell...

I don't know if this is the bug he's got, in fact I doubt it, but it's a
bug and it needs fixing. The problem is, ext2_get_group_desc
effectively returns two results; one of them is assigned only on the
conditional paths and the other isn't. This bug will cause - on very
rare occasions - the wrong group descriptor block to be marked dirty,
and changes might be lost. I think what we'd see as a result is wrong
block, inode and directory counts.
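
The pattern can be sketched outside the kernel like this. It is a simplified model, not the ext2 code: the second result is returned through a dict standing in for the &bh2 argument, and each group gets its own buffer so the mismatch shows every time, whereas the real case is described above as very rare.

```python
def get_group_desc(groups, index, out):
    """Two results: returns the descriptor and stores its buffer in out['bh']."""
    desc = groups[index]
    out["bh"] = desc["buffer"]
    return desc


def pick_group_buggy(groups):
    out, chosen = {}, None
    for i in range(len(groups)):
        tmp = get_group_desc(groups, i, out)   # overwrites out['bh'] on every call
        if tmp["free_inodes"]:
            if chosen is None or tmp["free_blocks"] > chosen["free_blocks"]:
                chosen = tmp                   # only the first result is kept conditionally
    return chosen, out.get("bh")


def pick_group_fixed(groups):
    out, chosen, chosen_bh = {}, None, None
    for i in range(len(groups)):
        tmp = get_group_desc(groups, i, out)
        if tmp["free_inodes"]:
            if chosen is None or tmp["free_blocks"] > chosen["free_blocks"]:
                chosen, chosen_bh = tmp, out["bh"]   # keep both results together
    return chosen, chosen_bh


groups = [
    {"free_inodes": 5, "free_blocks": 100, "buffer": "buffer-A"},
    {"free_inodes": 0, "free_blocks": 0, "buffer": "buffer-B"},
]
print(pick_group_buggy(groups)[1], pick_group_fixed(groups)[1])
```

The buggy variant selects group 0 but is left holding the buffer of the last group it looked at, which is the one that would then be marked dirty.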

The fix below is kind of gross. The way I really want to do the fix is
to remove one parameter from ext2_get_group_desc and thereby get rid of
the troublesome side effect for good, but that kind of change isn't
compatible with 'code freeze'.

(linux.2.4.0-test11)

--- fs/ext2/ialloc.c.old	Thu Nov 30 00:36:02 2000
+++ fs/ext2/ialloc.c	Thu Nov 30 00:36:39 2000
@@ -260,7 +260,7 @@
 {
 	struct super_block * sb;
 	struct buffer_head * bh;
-	struct buffer_head * bh2;
+	struct buffer_head * bh2, * tmpbh2;
 	int i, j, avefreei;
 	struct inode * inode;
 	int bitmap_nr;
@@ -293,10 +293,11 @@
 /* I am not yet convinced that this next bit is necessary.
 		i = dir->u.ext2_i.i_block_group;
 		for (j = 0; j < sb->u.ext2_sb.s_groups_count; j++) {
-			tmp = ext2_get_group_desc (sb, i, &bh2);
+			tmp = ext2_get_group_desc (sb, i, &tmpbh2);
 			if (tmp &&
 			    (le16_to_cpu(tmp->bg_used_dirs_count) << 8) <
 			     le16_to_cpu(tmp->bg_free_inodes_count)) {
+				bh2 = tmpbh2;
 				gdp = tmp;
 				break;
 			}
@@ -306,7 +307,7 @@
 */
 		if (!gdp) {
 			for (j = 0; j < sb->u.ext2_sb.s_groups_count; j++) {
-				tmp = ext2_get_group_desc (sb, j, &bh2);
+				tmp = ext2_get_group_desc (sb, j, &tmpbh2);
 				if (tmp &&
 				    le16_to_cpu(tmp->bg_free_inodes_count) &&
 				    le16_to_cpu(tmp->bg_free_inodes_count) >= avefreei) {
@@ -314,6 +315,7 @@
 					    (le16_to_cpu(tmp->bg_free_blocks_count) >
 					     le16_to_cpu(gdp->bg_free_blocks_count))) {
 						i = j;
+						bh2 = tmpbh2;
 						gdp = tmp;
 					}
 				}
@@ -326,11 +328,11 @@
 		 * Try to place the inode in its parent directory
 		 */
 		i = dir->u.ext2_i.i_block_group;
-		tmp = ext2_get_group_desc (sb, i, &bh2);
-		if (tmp && le16_to_cpu(tmp->bg_free_inodes_count))
+		tmp = ext2_get_group_desc (sb, i, &tmpbh2);
+		if (tmp && le16_to_cpu(tmp->bg_free_inodes_count)) {
+			bh2 = tmpbh2;
 			gdp = tmp;
-		else
-		{
+		} else {
 			/*
 			 * Use a quadratic hash to find a group with a
 			 * free inode
@@ -339,9 +341,10 @@
 				i += j;
 				if (i >= sb->u.ext2_sb.s_groups_count)
 					i -= sb->u.ext2_sb.s_groups_count;
-				tmp = ext2_get_group_desc (sb, i, &bh2);
+				tmp = ext2_get_group_desc (sb, i, &tmpbh2);
 				if (tmp &&
 				    le16_to_cpu(tmp->bg_free_inodes_count)) {
+					bh2 = tmpbh2;
 					gdp = tmp;
 					break;
 				}
@@ -355,9 +358,10 @@
 			for (j = 2; j < sb->u.ext2_sb.s_groups_count; j++) {
 				if (++i >= sb->u.ext2_sb.s_groups_count)
 					i = 0;
-				tmp = ext2_get_group_desc (sb, i, &bh2);
+				tmp = ext2_get_group_desc (sb, i, &tmpbh2);
 				if (tmp &&
 				    le16_to_cpu(tmp->bg_free_inodes_count)) {
+					bh2 = tmpbh2;
 					gdp = tmp;
 					break;
 				}

--
Daniel

2000-11-30 01:04:19

by Alexander Viro

Subject: Re: [PATCH] Re: 2.4.0-test11 ext2 fs corruption



On Thu, 30 Nov 2000, Daniel Phillips wrote:

> Alexander Viro wrote:
> > Bloody hell...
>
> I don't know if this is the bug he's got, in fact I doubt it, but it's a
> bug and it needs fixing. The problem is, ext2_get_group_desc
> effectively returns two results; one of them is being assigned from on
> conditional paths and the other isn't. This bug will cause - on very
> rare occasions - the wrong group descriptor block to be marked dirty,
> and changes might be lost. I think what we'd see as a result is wrong
> block, inode and directory counts.
>
> The fix below is kind of gross. The way I really want to do the fix is
> to remove one parameter from ext2_get_group_desc and thereby get rid of
> the troublesome side effect for good, but that kind of change isn't
> compatible with 'code freeze'.

Yes, it is. Moreover, correct solution is slightly different and changes
ext2_get_group_desc() semantics. Wait until tomorrow, OK?

However, it's not the source of reported problems. We clearly have b0rken
data in bitmaps themselves.