by Ric Wheeler


Subject: Re: Regular ext4 error warning with HD in USB dock

On 12/28/2010 05:30 AM, Amir Goldstein wrote:
> On Tue, Dec 28, 2010 at 10:19 AM, Rogier Wolff<[email protected]> wrote:
>> On Mon, Dec 27, 2010 at 09:53:43PM -0500, Ted Ts'o wrote:
>>> On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
>>>> [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode. Opts: (null)
>>>> [1048702.736011] EXT4-fs (sde8): error count: 3
>>>> [1048702.736016] EXT4-fs (sde8): initial error at 1289053677: ext4_journal_start_sb:251
>>>> [1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719
>>> That's actually not an error. It's a report which is generated every
>>> 24 hours, indicating that there have been 3 errors since the last time
>>> the error count has been cleared, with the first error taking place at
>>> Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
>>> ext4_journal_start_sb(), at line 251, and the most recent error taking
>>> place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
>>> ext4_put_super() at line 719. This is a new feature which was added
>>> in 2.6.36.
>> Nice. But the issue you're not mentioning is: What errors could have
>> happened on the 6th of November? Should Con worry about those errors?
>>
> Ted,
>
> I would like to use this opportunity to remind you about my
> record_journal_errstr() implementation, see:
> https://github.com/amir73il/ext4-snapshots/blob/next3-stable/fs/next3/super.c#L157
>
> It records the initial error messages (which I found to be the most
> interesting) in a message buffer in the unused space after the journal
> superblock (3K on a 4K block fs).
>
> fsck prints out those messages and clears the buffer.
> In under a year of Next3 fs in beta and production, it has helped me many times
> to analyse bugs post-mortem and find the problem.
>
> If there is demand, I can post the patch for Ext4.
>
> Amir.

I do think that this sounds like a useful addition - should be very useful in
doing post mortem analysis...

Thanks!

Ric

2011-01-02 19:23:21


Subject: Re: Regular ext4 error warning with HD in USB dock

On Sun, Jan 09, 2011 at 09:58:38AM -0500, Ted Ts'o wrote:
> On Sun, Jan 09, 2011 at 09:12:49AM +0100, Rogier Wolff wrote:
> > > No. Neither the superblock nor its offset will ever change. It's like the
> > > syscall ABI, only worse. If we changed it would break *everybody*.
> > > Fortunately there is a huge amount of space left over in the 1024 byte
> > > superblock.
> >
> > It's called defensive programming. It prevents bugs before they
> > happen. By your reasoning you could've written 2048 or 0x800 there.
>
> Defensive programming would be something like
>
> BUG_ON(sizeof(struct ext4_super_block) != 1024);

It is one form of "defense", but not what I call defensive
programming. Defensive programming means making things robust in
the face of unexpected changes.

If you add the BUG_ON, and do this throughout the code, one day your
grandson will increase the superblock size. He'll fix all the
BUG_ONs and your other "defensive" measures. But lo and behold, he's
human, and misses one or two. The run-time checks that only get
called occasionally (like this one, which only runs in an error
situation) in particular might take a while to be noticed.

What use is it to turn a "we've found a serious error in your
filesystem, we strongly recommend you no longer write to it and run
fsck first" into a "system halted"?

What is wrong with just putting the right formula where it belongs?

We need to set the variable "len" to the amount of free space beyond
the superblock in the first block of the filesystem. So we take the
size of the first block, subtract the size of the superblock, and
subtract the offset at which the superblock starts. It's as simple as that.

> We could add that, if people like. I do have regression tests (i.e.,
> boot a system with ext4) which would die if anything like that
> changed, though.

How about

Makefile:
ext4.o: ...the objects.... testsbsize.out

testsbsize.out: testsbsize
	./testsbsize

(oh, and something about using "hostcc" for testsbsize).

with testsbsize.c:
#include <stdio.h>
#include <stdlib.h>
#include <...ext4....>

/* Build-time check: exits non-zero if the on-disk superblock size changes. */
int main (int argc, char **argv)
{
	if (sizeof (struct ext4_super_block) != 1024) {
		fprintf (stderr, "Superblock size is %zu, should be 1024.\n",
			 sizeof (struct ext4_super_block));
		exit (1);
	}
	exit (0);
}

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ