2008-05-20 09:04:32

by Bas van Schaik

[permalink] [raw]
Subject: Reoccurring ext3 errors: attempt to access beyond end of device, freeing blocks not in datazone

Dear all,

First of all, I'm very sorry to post such a vague question. I've been
working this it for months now, but the ext3 errors keep reoccurring
every once in a while. That is, every week or sometimes once a month.
The errors I get in such a situation:

> May 20 09:13:14 infinity kernel: attempt to access beyond end of device
> May 20 09:13:14 infinity kernel: loop0: rw=0, want=15629775440, limit=4404019200
> May 20 09:13:14 infinity kernel: attempt to access beyond end of device
> May 20 09:13:14 infinity kernel: loop0: rw=0, want=13075964688, limit=4404019200
> May 20 09:13:14 infinity kernel: attempt to access beyond end of device
> May 20 09:13:14 infinity kernel: loop0: rw=0, want=15354014352, limit=4404019200
> (...)
>
and a little bit later:
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1953721929, count = 1
> May 20 09:15:07 infinity kernel: Remounting filesystem read-only
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1634495585, count = 1
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for block 543517044
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1919251793, count = 1
> (...)
>

After running e2fsck everything is fine again for a week or a little
longer...

The situation is not really trivial: I'm importing block devices from
several other machines using ATA over Ethernet (AoE). These block
devices are combined in a set of RAID5 arrays, each of them a physical
volume for LVM. Inside one large LVM volume group there exists a
cryptoloop encrypted logical volume, of which the 'plain' version is
accessible via /dev/loop0.

As you can see, there are a lot of possible failure points, I don't
expect you to identify those for me. I already performed thorough tests
on all servers (from memtest to I/O stressing to networking) and another
point is: this configuration did work flawlessly for about two years!

What I would like to know: what are the possible underlying causes for
the "attempt to access beyond en of device" error? Does anyone see any
meaning in the block (?) numbers mentioned in my syslog?

I would really like to hear from you, this problem is really causing me
headaches!

Regards,

Bas



2008-05-20 12:35:29

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Reoccurring ext3 errors: attempt to access beyond end of device, freeing blocks not in datazone

> After running e2fsck everything is fine again for a week or a little
> longer...

Did e2fsck report any errors? You didn't say; but if this kind of
corruption were really on the hard drive itself, it would cause e2fsck
to scream bloody murder.

If e2fsck didn't report any problems, then the problem is certainly on
the read path, and your system is sometimes returning corrupted data
--- probably because in some cases when the system asked for a disk
block #1234, it somehow got disk block #1257, or something like that.
In any case, you should have gotten some errors that inodes were
referring to invalid block numbers; that's what the Linux kernel was
complaining about.

> What I would like to know: what are the possible underlying causes for
> the "attempt to access beyond en of device" error? Does anyone see any
> meaning in the block (?) numbers mentioned in my syslog?

The errors you have here indicate a corrupt indirect block. So when
trying to read from an inode with the corrupted indirect block, it
asked the block I/O layer to read from a block far larger than the
size of the block device.

The errors from the block I/O error are in units of sectors, so you
need to divide them by 8 to get 4k block numbers:

> > May 20 09:13:14 infinity kernel: attempt to access beyond end of device
> > May 20 09:13:14 infinity kernel: loop0: rw=0, want=15629775440, limit=4404019200

So for these, you had

15629775440 / 8 = 0x74736E4A (or in ascii 'Jnst')
13075964688 / 8 = 0x616C6C62 (or in ascii 'blla')
15354014352 / 8 = 0x72657552 (or in ascii 'Ruer')

These errors errors happen when deleting a file with a bogus indirect
block (or garbage in the inode, but that's much less likely and
probably would have triggered additional complaints).

> > May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1953721929, count = 1

Converting these numbers to hex:

1953721929 = 0x74736E49 (or in ascii 'Jnst')
1634495585 = 0x616C6C61 (or in ascii 'alla'
543517044 = 0x20656974 (or in ascii 'tie ')
1919251793 = 0x72657551 (or in ascii 'Quer')

Given that it's all ascii, it looks like the indirect block somehow
was overwritten, or was substituted by text.

- Ted

2008-05-20 22:02:47

by Bas van Schaik

[permalink] [raw]
Subject: Re: Reoccurring ext3 errors: attempt to access beyond end of device, freeing blocks not in datazone

Theodore Tso wrote:
>> After running e2fsck everything is fine again for a week or a little
>> longer...
>>
>
> Did e2fsck report any errors? You didn't say; but if this kind of
> corruption were really on the hard drive itself, it would cause e2fsck
> to scream bloody murder.
>
Yeah, I'm sorry: e2fsck did scream bloody murder. After the check, the
filesystem is clean again and remains clean for about a week. I still
haven't found out what operation exactly is causing the problems, the
only thing I can think of is rsync eating too much memory. The system is
used for backups of about 50 rsync clients, but the system should be
able to handle this even under high load.

>>> May 20 09:13:14 infinity kernel: attempt to access beyond end of device
>>> May 20 09:13:14 infinity kernel: loop0: rw=0, want=15629775440, limit=4404019200
>>>
> So for these, you had
>
> 15629775440 / 8 = 0x74736E4A (or in ascii 'Jnst')
> 13075964688 / 8 = 0x616C6C62 (or in ascii 'blla')
> 15354014352 / 8 = 0x72657552 (or in ascii 'Ruer')
>
> These errors errors happen when deleting a file with a bogus indirect
> block (or garbage in the inode, but that's much less likely and
> probably would have triggered additional complaints).
>
>
>>> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1953721929, count = 1
>>>
> Converting these numbers to hex:
>
> 1953721929 = 0x74736E49 (or in ascii 'Jnst')
> 1634495585 = 0x616C6C61 (or in ascii 'alla'
> 543517044 = 0x20656974 (or in ascii 'tie ')
> 1919251793 = 0x72657551 (or in ascii 'Quer')
>
> Given that it's all ascii, it looks like the indirect block somehow
> was overwritten, or was substituted by text.
>
Ah, such a lead was exactly what I was looking for, now I at least know
where those bogus numbers were coming from. Maybe a very dump question:
you seem to have reverse the ascii "translation", why? And shouldn't
"Jnst" be "Inst"? Note also that the "translations" seem to resemble
each other a little bit: "Jnst" = "Jnst", "alla" looks like "blla" and
"Ruer" looks like "Quer". Coincidence?

Summarizing all this: there is clearly something writing garbage to the
wrong place. It must be something above the encryption layer, since
that's the only way ascii can be written to the device.

Remember the different layers:
ext3 on decrypted /dev/loop0
LVM logical volume (encrypted)
RAID5 arrays
Imported AoE-devices
Physical disks

This conclusion kind of worries me, I was assuming that there was
something wrong at the networking level (AoE) or below. If that were the
case, the encrypted data would get modified and the corruptions would
look totally different. Or am I missing something?

Some other "magic numbers" to test the ascii hypothesis:
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1279611658, count = 1
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 779051620, count = 1
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1701409346, count = 1
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1634366310, count = 1
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1869573218, count = 1
> May 20 09:15:07 infinity kernel: EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks not in datazone - block = 1783836270, count = 1
1279611658 = 0x4C45530A = "SEL"
779051620 = 0x2E6F6264 = "dbo."
1701409346 = 0x65697242 = "Brie"
1634366310 = 0x616A7366 = "fsja"
1869573218 = 0x6F6F6C62 = "bloo"
1783836270 = 0x6A532E6E = "n.Sj"

If you ask me: enough certainty that it is indeed ascii text that was
written to the wrong place. Point is that I have another storage
"cluster" with almost exactly the same setup (main difference: LUKS is
used in favour of some AES cryptoloop). This cluster appears to be
having exactly the same problems:

> May 20 01:09:09 dust kernel: EXT3-fs error (device dm-10): ext3_free_blocks: Freeing blocks not in datazone - block = 4292870144, count = 1
> May 20 01:09:09 dust kernel: EXT3-fs error (device dm-10): ext3_free_blocks: Freeing blocks not in datazone - block = 3137339670, count = 1
> May 20 01:09:09 dust kernel: EXT3-fs error (device dm-10): ext3_free_blocks: Freeing blocks not in datazone - block = 892220416, count = 1
> May 20 01:09:09 dust kernel: EXT3-fs error (device dm-10): ext3_free_blocks: Freeing blocks not in datazone - block = 1919240224, count = 1
> May 20 01:09:09 dust kernel: EXT3-fs error (device dm-10): ext3_free_blocks: Freeing blocks not in datazone - block = 1768710504, count = 1
> May 20 01:09:09 dust kernel: EXT3-fs error (device dm-10): ext3_free_blocks: Freeing blocks not in datazone - block = 1920165742, count = 1
Convert these numbers to hex and ascii and you'll see that the outcome
doesn't look like "plain" ascii. This can mean two things (correct me if
I'm wrong):
1) there is a totally different problem
2) the data which was written to the wrong place wasn't plain ascii

I would really like to hear your thoughts on all this, maybe there is a
simple explanation out there after all...

-- Bas
**

2008-05-21 11:39:15

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Reoccurring ext3 errors: attempt to access beyond end of device, freeing blocks not in datazone

On Wed, May 21, 2008 at 12:02:42AM +0200, Bas van Schaik wrote:
> Ah, such a lead was exactly what I was looking for, now I at least know
> where those bogus numbers were coming from. Maybe a very dump question:
> you seem to have reverse the ascii "translation", why?

x86 (and the ext3 indirect blocks) are stored in little endian format.
If you doubt me, try running this program:

main(int argc, char **argv)
{
char a[5];
int *b;

b = (int *) a;
*b = 0x61626364;
a[4] = 0;
printf("%s\n", a);
}

> Summarizing all this: there is clearly something writing garbage to the
> wrong place. It must be something above the encryption layer, since
> that's the only way ascii can be written to the device.
>
> Remember the different layers:
> ext3 on decrypted /dev/loop0
> LVM logical volume (encrypted)
> RAID5 arrays
> Imported AoE-devices
> Physical disks
>
> This conclusion kind of worries me, I was assuming that there was
> something wrong at the networking level (AoE) or below. If that were the
> case, the encrypted data would get modified and the corruptions would
> look totally different. Or am I missing something?

Not necessarily, this could be simply valid data getting written to
the wrong place.

How are you encrypting your loop device, and what encryption system
are you using?

What sort of workload are you using with your filesystem, what version
of the kernel are your running, and does the machine crash often
(i.e., forcing journal replays)?

- Ted

2008-05-21 22:27:11

by Bas van Schaik

[permalink] [raw]
Subject: Re: Reoccurring ext3 errors: attempt to access beyond end of device, freeing blocks not in datazone

Theodore Tso wrote:
> On Wed, May 21, 2008 at 12:02:42AM +0200, Bas van Schaik wrote:
>
>> Ah, such a lead was exactly what I was looking for, now I at least know
>> where those bogus numbers were coming from. Maybe a very dump question:
>> you seem to have reverse the ascii "translation", why?
>>
>
> x86 (and the ext3 indirect blocks) are stored in little endian format.
> If you doubt me, try running this program:
>
> main(int argc, char **argv)
> {
> char a[5];
> int *b;
>
> b = (int *) a;
> *b = 0x61626364;
> a[4] = 0;
> printf("%s\n", a);
> }
>
No, I certainly do most certainly not doubt you. I was just wondering...

>> Summarizing all this: there is clearly something writing garbage to the
>> wrong place. It must be something above the encryption layer, since
>> that's the only way ascii can be written to the device.
>>
>> Remember the different layers:
>> ext3 on decrypted /dev/loop0
>> LVM logical volume (encrypted)
>> RAID5 arrays
>> Imported AoE-devices
>> Physical disks
>>
>> This conclusion kind of worries me, I was assuming that there was
>> something wrong at the networking level (AoE) or below. If that were the
>> case, the encrypted data would get modified and the corruptions would
>> look totally different. Or am I missing something?
>>
>
> Not necessarily, this could be simply valid data getting written to
> the wrong place.
>
Of course, but there are no processes performing direct I/O to one of
the underlying block devices. So how could plain ascii data get written
to the wrong place and still appear as plain ascii after decrypting it?

> How are you encrypting your loop device, and what encryption system
> are you using?
>
I think this tells you everything:
> cat $KEYFILE | losetup -e aes128 -p0 /dev/loop0 /dev/vg_backups2/backups

However, the other system I was mentioning is using LUKS (dm-crypt) to
achieve the same goal.

> What sort of workload are you using with your filesystem, what version
> of the kernel are your running, and does the machine crash often
> (i.e., forcing journal replays)?
The system is under high load: sometimes there are about 20 rsync server
processes fighting for some time. As you might know, rsync is not really
thrifty with claiming resoures, especially not when building file lists.
The machine itself doesn't crash, it seems to be perfectly stable. These
corruptions are the only problem...

-- Bas

2008-05-22 18:18:33

by Andreas Dilger

[permalink] [raw]
Subject: Re: Reoccurring ext3 errors: attempt to access beyond end of device, freeing blocks not in datazone

On May 21, 2008 00:02 +0200, Bas van Schaik wrote:
> Theodore Tso wrote:
> > 15629775440 / 8 = 0x74736E4A (or in ascii 'Inst')
> > 13075964688 / 8 = 0x616C6C62 (or in ascii 'blla')
> > 15354014352 / 8 = 0x72657552 (or in ascii 'Ruer')
> >
> > Converting these numbers to hex:
> >
> > 1953721929 = 0x74736E49 (or in ascii 'Jnst')
> > 1634495585 = 0x616C6C61 (or in ascii 'alla'
> > 543517044 = 0x20656974 (or in ascii 'tie ')
> > 1919251793 = 0x72657551 (or in ascii 'Quer')
> >
> > Given that it's all ascii, it looks like the indirect block somehow
> > was overwritten, or was substituted by text.
>
> Ah, such a lead was exactly what I was looking for, now I at least know
> where those bogus numbers were coming from. Maybe a very dump question:
> you seem to have reverse the ascii "translation", why? And shouldn't
> "Jnst" be "Inst"? Note also that the "translations" seem to resemble
> each other a little bit: "Jnst" = "Jnst", "alla" looks like "blla" and
> "Ruer" looks like "Quer". Coincidence?

Or it is possible you are getting single-bit errors somewhere along
your IO path, and another single-bit error has resulted in this data
being written to the wrong block to begin with...


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-05-23 14:41:15

by Bas van Schaik

[permalink] [raw]
Subject: Re: Reoccurring ext3 errors: attempt to access beyond end of device, freeing blocks not in datazone

Andreas Dilger wrote:
> On May 21, 2008 00:02 +0200, Bas van Schaik wrote:
>
>> Theodore Tso wrote:
>>
>>> 15629775440 / 8 = 0x74736E4A (or in ascii 'Inst')
>>> 13075964688 / 8 = 0x616C6C62 (or in ascii 'blla')
>>> 15354014352 / 8 = 0x72657552 (or in ascii 'Ruer')
>>>
>>> Converting these numbers to hex:
>>>
>>> 1953721929 = 0x74736E49 (or in ascii 'Jnst')
>>> 1634495585 = 0x616C6C61 (or in ascii 'alla'
>>> 543517044 = 0x20656974 (or in ascii 'tie ')
>>> 1919251793 = 0x72657551 (or in ascii 'Quer')
>>>
>>> Given that it's all ascii, it looks like the indirect block somehow
>>> was overwritten, or was substituted by text.
>>>
>> Ah, such a lead was exactly what I was looking for, now I at least know
>> where those bogus numbers were coming from. Maybe a very dump question:
>> you seem to have reverse the ascii "translation", why? And shouldn't
>> "Jnst" be "Inst"? Note also that the "translations" seem to resemble
>> each other a little bit: "Jnst" = "Jnst", "alla" looks like "blla" and
>> "Ruer" looks like "Quer". Coincidence?
>>
>
> Or it is possible you are getting single-bit errors somewhere along
> your IO path, and another single-bit error has resulted in this data
> being written to the wrong block to begin with...
>
I don't think I understand your statement fully... Do you mean that some
bitflip caused the filesystem to initiate a write to the wrong blocks?

-- Bas