Hi all
Since I moved my internal HD into a USB dock externally and mount the ext4
filesystem on it, I regularly get the following errors after it has been
mounted for a while (see timecode). It doesn't seem to matter which recent
kernel I use.
[1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode.
Opts: (null)
[1048702.736011] EXT4-fs (sde8): error count: 3
[1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
ext4_journal_start_sb:251
[1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719
I've forced an fs check numerous times but nothing comes up. There was only
one occasion where the filesystem mounted RO but that was months ago and has
since been fine.
This does not appear to cause any actual problems and happens it seems at
_precisely_ the moment the hard drive decides to go into sleep mode as I hear
it spin down. hdparm is unable to extract details from the hard drive when it
is plugged into the usb dock.
The options used on this are as follows:
tune2fs 1.41.12 (17-May-2010)
Filesystem volume name: backup
Last mounted on: /media/backup
Filesystem UUID: 4eff5eb7-89fb-4db1-8cfd-f09db598a00b
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype
needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: journal_data_writeback
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 22593536
Block count: 90355576
Reserved block count: 0
Free blocks: 30330219
Free inodes: 21828725
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1002
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Fri Jun 19 22:38:36 2009
Last mount time: Tue Dec 28 09:36:34 2010
Last write time: Tue Dec 28 09:36:34 2010
Mount count: 8
Maximum mount count: -1
Last checked: Sun Dec 19 21:50:05 2010
Check interval: 0 (<none>)
Lifetime writes: 1992 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 5cdfe619-0de5-4bcd-85d2-5fa5f4f13da7
Journal backup: inode blocks
lsusb
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 008: ID 046d:c70a Logitech, Inc. MX5000 Cordless Desktop
Bus 002 Device 007: ID 046d:c70e Logitech, Inc. MX1000 Bluetooth Laser Mouse
Bus 002 Device 006: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card
Reader
Bus 002 Device 005: ID 046d:0b02 Logitech, Inc. BT Mini-Receiver (HID proxy
mode)
Bus 002 Device 004: ID 04b8:012c Seiko Epson Corp. Perfection V350 (GT-F700)
Bus 002 Device 003: ID 0424:2602 Standard Microsystems Corp. USB 2.0 Hub
Bus 002 Device 002: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 1759:5000 LucidPort Technology, Inc.
Bus 001 Device 002: ID 1058:1100 Western Digital Technologies, Inc.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
lspci
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev
02)
00:01.0 PCI bridge: Intel Corporation 82P965/G965 PCI Express Root Port (rev
02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI
Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio
Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1
(rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4
(rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5
(rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI
Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI
Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface
Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HB (ICH8) 4 port SATA AHCI
Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev
02)
01:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9400 GT]
(rev a1)
02:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA
Controller (rev 02)
02:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA
Controller (rev 02)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI
Express Gigabit Ethernet controller (rev 01)
05:02.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture
(rev 11)
05:02.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev
11)
Pointers would be appreciated, even if it's just "it's harmless, ignore it".
Thanks,
Con
--
-ck
On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
>
> Since I moved my internal HD into a USB dock externally and mount the ext4
> filesystem on it, I regularly get the following errors after it has been
> mounted for a while (see timecode). It doesn't seem to matter which recent
> kernel I use.
>
> [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode.
> Opts: (null)
> [1048702.736011] EXT4-fs (sde8): error count: 3
> [1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
> ext4_journal_start_sb:251
> [1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719
That's actually not an error. It's a report which is generated every
24 hours, indicating that there has been 3 errors since the last time
the error count has been cleared, with the first error taking place at
Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
ext4_journal_start_sb(), at line 251, and the most recent error taking
place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
ext4_put_super() at line 719. This is a new feature which was added
in 2.6.36.
This information (which is stored in the ext4 superblock) gets cleared
by e2fsck version 1.41.13 or newer (1.41.14 is the most recently
released version). If you are using an older version of e2fsck, this
information won't get cleared, which is not a disaster, although it
could potentially get annoying. The solution is to simply to upgrade
to the newest version of e2fsprogs if you are using the 2.6.36 or
newer kernel.
The idea behind this feature is to make sure that if there was a file
system corruption detected, it wouldn't be lost due to log files
getting overflowed. It's also helpful when users report problems to
distribution support lines, since it makes it more likely support
personnel will be able to determine that there has been some file
system corrutpion issues, perhaps caused by hardware problems, or some
other cause, including kernel bugs.
Hope this helps explain what you're seeing!
- Ted
On Mon, Dec 27, 2010 at 09:53:43PM -0500, Ted Ts'o wrote:
> On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
> > [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode.
> > Opts: (null)
> > [1048702.736011] EXT4-fs (sde8): error count: 3
> > [1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
> > ext4_journal_start_sb:251
> > [1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719
>
> That's actually not an error. It's a report which is generated every
> 24 hours, indicating that there has been 3 errors since the last time
> the error count has been cleared, with the first error taking place at
> Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
> ext4_journal_start_sb(), at line 251, and the most recent error taking
> place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
> ext4_put_super() at line 719. This is a new feature which was added
> in 2.6.36.
Nice. But the issue you're not mentioning is: What errors could have
happened on the 6th of november? Should Con worry about those errors?
OK, the chances are that he has rebooted since november, and that an
older fsck fixed the errors, but not cleared the "fs errror log". Would
these errors have triggered a "remount-readonly" if the fs was mounted
like that?
I don't reboot that often:
obelix:~> uptime
09:16:19 up 175 days, 19:04, 19 users, load average: 110.10, 110.61, 111.22
(and yes, the load is quite high on that machine. I won't polute this
thread about that....)
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
On Tue, 28 Dec 2010 19:19:17 Rogier Wolff wrote:
> On Mon, Dec 27, 2010 at 09:53:43PM -0500, Ted Ts'o wrote:
> > On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
> > > [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data
> > > mode. Opts: (null)
> > > [1048702.736011] EXT4-fs (sde8): error count: 3
> > > [1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
> > > ext4_journal_start_sb:251
> > > [1048702.736018] EXT4-fs (sde8): last error at 1289080948:
> > > ext4_put_super:719
> >
> > That's actually not an error. It's a report which is generated every
> > 24 hours, indicating that there has been 3 errors since the last time
> > the error count has been cleared, with the first error taking place at
> > Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
> > ext4_journal_start_sb(), at line 251, and the most recent error taking
> > place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
> > ext4_put_super() at line 719. This is a new feature which was added
> > in 2.6.36.
>
> Nice. But the issue you're not mentioning is: What errors could have
> happened on the 6th of november? Should Con worry about those errors?
>
> OK, the chances are that he has rebooted since november, and that an
> older fsck fixed the errors, but not cleared the "fs errror log". Would
> these errors have triggered a "remount-readonly" if the fs was mounted
> like that?
Indeed the FS was remounted as RO at the time. I had to power down the USB
device and power it up again after umounting the FS and then doing an FSCK
before it was usable again, and the machine had about a month of uptime when
it happened. It has not caused a problem since.
Thanks,
Con
--
-ck
On Tue, Dec 28, 2010 at 10:19 AM, Rogier Wolff <[email protected]> wrote:
> On Mon, Dec 27, 2010 at 09:53:43PM -0500, Ted Ts'o wrote:
>> On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
>> > [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode.
>> > Opts: (null)
>> > [1048702.736011] EXT4-fs (sde8): error count: 3
>> > [1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
>> > ext4_journal_start_sb:251
>> > [1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719
>>
>> That's actually not an error. ?It's a report which is generated every
>> 24 hours, indicating that there has been 3 errors since the last time
>> the error count has been cleared, with the first error taking place at
>> Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
>> ext4_journal_start_sb(), at line 251, and the most recent error taking
>> place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
>> ext4_put_super() at line 719. ?This is a new feature which was added
>> in 2.6.36.
>
> Nice. But the issue you're not mentioning is: What errors could have
> happened on the 6th of november? Should Con worry about those errors?
>
Ted,
I would like to use this opportunity to remind you about my
record_journal_errstr()
implementation, see:
https://github.com/amir73il/ext4-snapshots/blob/next3-stable/fs/next3/super.c#L157
It records the initial errors messages (which I found to be the most
interesting),
in a message buffer on the unused space after the journal super block
(3K on a 4K block fs).
fsck prints out those messages and clears the buffer.
In under a year of Next3 fs in beta and production, it has helped me many times
to analyse bugs post-mortem and find the problem.
If there is demand, I can post the patch for Ext4.
Amir.
> OK, the chances are that he has rebooted since november, and that an
> older fsck fixed the errors, but not cleared the "fs errror log". Would
> these errors have triggered a "remount-readonly" if the fs was mounted
> like that?
>
> I don't reboot that often:
>
> obelix:~> uptime
> ?09:16:19 up 175 days, 19:04, 19 users, ?load average: 110.10, 110.61, 111.22
>
> (and yes, the load is quite high on that machine. I won't polute this
> thread about that....)
>
> ? ? ? ?Roger.
>
> --
> ** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
> ** ? ?Delftechpark 26 2628 XH ?Delft, The Netherlands. KVK: 27239233 ? ?**
> *-- BitWizard writes Linux device drivers for any device you may have! --*
> Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
> Does it sit on the couch all day? Is it unemployed? Please be specific!
> Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>
On 12/28/2010 03:53 AM, Ted Ts'o wrote:
> On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
>
>> Since I moved my internal HD into a USB dock externally and mount the ext4
>> filesystem on it, I regularly get the following errors after it has been
>> mounted for a while (see timecode). It doesn't seem to matter which recent
>> kernel I use.
>>
>> [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode.
>> Opts: (null)
>> [1048702.736011] EXT4-fs (sde8): error count: 3
>> [1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
>> ext4_journal_start_sb:251
>> [1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719
>>
> That's actually not an error. It's a report which is generated every
> 24 hours, indicating that there has been 3 errors since the last time
> the error count has been cleared, with the first error taking place at
> Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
> ext4_journal_start_sb(), at line 251, and the most recent error taking
> place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
> ext4_put_super() at line 719. This is a new feature which was added
> in 2.6.36.
>
This is going to be a faq...
I suppose the datetime is encoded (what format is that?) in that long
number after "at".
May I suggest the datetime gets decoded in the printing?
Also may I suggest that the error happens immediately after mount and
not after 300 seconds from mount?
I just subscribed to this list exactly to report the same kind of error.
Last week I was doing reliability tests for open-iscsi and this error
drove me crazy. I spent days in tests where I thought I could reproduce
an error in open-iscsi by disconnecting and reconnecting the network; I
even reported this to the open-iscsi mailing list, but in fact it was an
old error of the filesystem and it was not getting cleared by my older
e2fsck 1.41.11 .
If that happened immediately after mount or if it spitted the datetime
in human readable format, I would have immediately guessed it was due to
an existing "filesystem problem" (even though I was running fsck.ext4 -f
prior to each mount) but instead I thought it was due to my 5 minutes of
"networks disconnection tests" I was doing after each mount. DOH!
Anyway thanks for your work: excellent filesystem.
PS: I have a question for you regarding ext4 behaviour with SCSI
commands resubmissions... ok I am opening another thread for that.
On Tue, Dec 28, 2010 at 11:41:53AM +0100, torn5 wrote:
>
> This is going to be a faq...
> I suppose the datetime is encoded (what format is that?) in that
> long number after "at".
Number of seconds since January 1st, 1970 midnight GMT. (Standard
Unix-time format). It's hard for the kernel to decode it since it
doesn't know what time-zone you are in, or what your local legislature
has done vis-a-vis arbitrary and cabritious adjustments to the
start/stop date of daylight savings time, etc.
It's easy enough to decode it using perl:
perl -e '$t = localtime(1289053677); print "$t\n"'
> May I suggest the datetime gets decoded in the printing?
> Also may I suggest that the error happens immediately after mount
> and not after 300 seconds from mount?
Hmmm.... maybe. The idea behind the 300 second delay was for the root
file system, to let e2fsck have a chance to fix the file system before
reporting the errors at mount time.
- Ted
On Tue, Dec 28, 2010 at 09:19:17AM +0100, Rogier Wolff wrote:
> > That's actually not an error. It's a report which is generated every
> > 24 hours, indicating that there has been 3 errors since the last time
> > the error count has been cleared, with the first error taking place at
> > Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
> > ext4_journal_start_sb(), at line 251, and the most recent error taking
> > place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
> > ext4_put_super() at line 719. This is a new feature which was added
> > in 2.6.36.
>
> Nice. But the issue you're not mentioning is: What errors could have
> happened on the 6th of november? Should Con worry about those errors?
It looks like those errors were probably caused by I/O errors while
writing to the journal. Maybe due to a flakey USB connection,
perhaps? It would be useful to look at the /var/log/messagfes around
that date to get a more detailed look.
> OK, the chances are that he has rebooted since november, and that an
> older fsck fixed the errors, but not cleared the "fs errror log". Would
> these errors have triggered a "remount-readonly" if the fs was mounted
> like that?
Yes, errors that get recorded into the superblock are also errors that
would cause a read-only remount if the file system is set that way.
- Ted
Ted Ts'o <[email protected]> writes:
> On Tue, Dec 28, 2010 at 11:41:53AM +0100, torn5 wrote:
>>
>> This is going to be a faq...
>> I suppose the datetime is encoded (what format is that?) in that
>> long number after "at".
>
> Number of seconds since January 1st, 1970 midnight GMT. (Standard
> Unix-time format). It's hard for the kernel to decode it since it
> doesn't know what time-zone you are in, or what your local legislature
> has done vis-a-vis arbitrary and cabritious adjustments to the
> start/stop date of daylight savings time, etc.
>
> It's easy enough to decode it using perl:
>
> perl -e '$t = localtime(1289053677); print "$t\n"'
The "date" program can decode these values too:
date -d @1289053677
--
Ben Pfaff
http://benpfaff.org
On 12/28/2010 03:32 PM, Ted Ts'o wrote:
> On Tue, Dec 28, 2010 at 11:41:53AM +0100, torn5 wrote:
>
>> This is going to be a faq...
>> I suppose the datetime is encoded (what format is that?) in that
>> long number after "at".
>>
> Number of seconds since January 1st, 1970 midnight GMT. (Standard
> Unix-time format). It's hard for the kernel to decode it since it
> doesn't know what time-zone you are in, or what your local legislature
> has done vis-a-vis arbitrary and cabritious adjustments to the
> start/stop date of daylight savings time, etc.
>
I suggest to decode it in GMT then.
Like this it's not possible to guess it's a "time" value, and hence that
it's an *old* error. I thought it was a pointer to something, and so I
thought the error was happening now, or more likely (since it was
triggered after 300 seconds) that it was some inconsistency in the
filesystem being detected now, in some kind of lazy check.
On 12/28/2010 05:30 AM, Amir Goldstein wrote:
> On Tue, Dec 28, 2010 at 10:19 AM, Rogier Wolff<[email protected]> wrote:
>> On Mon, Dec 27, 2010 at 09:53:43PM -0500, Ted Ts'o wrote:
>>> On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
>>>> [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data mode.
>>>> Opts: (null)
>>>> [1048702.736011] EXT4-fs (sde8): error count: 3
>>>> [1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
>>>> ext4_journal_start_sb:251
>>>> [1048702.736018] EXT4-fs (sde8): last error at 1289080948: ext4_put_super:719
>>> That's actually not an error. It's a report which is generated every
>>> 24 hours, indicating that there has been 3 errors since the last time
>>> the error count has been cleared, with the first error taking place at
>>> Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
>>> ext4_journal_start_sb(), at line 251, and the most recent error taking
>>> place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
>>> ext4_put_super() at line 719. This is a new feature which was added
>>> in 2.6.36.
>> Nice. But the issue you're not mentioning is: What errors could have
>> happened on the 6th of november? Should Con worry about those errors?
>>
> Ted,
>
> I would like to use this opportunity to remind you about my
> record_journal_errstr()
> implementation, see:
> https://github.com/amir73il/ext4-snapshots/blob/next3-stable/fs/next3/super.c#L157
>
> It records the initial errors messages (which I found to be the most
> interesting),
> in a message buffer on the unused space after the journal super block
> (3K on a 4K block fs).
>
> fsck prints out those messages and clears the buffer.
> In under a year of Next3 fs in beta and production, it has helped me many times
> to analyse bugs post-mortem and find the problem.
>
> If there is demand, I can post the patch for Ext4.
>
> Amir.
I do think that this sounds like a useful addition - should be very useful in
doing post mortem analysis...
Thanks!
Ric
On Sat, Jan 1, 2011 at 7:20 PM, Ric Wheeler <[email protected]> wrote:
> On 12/28/2010 05:30 AM, Amir Goldstein wrote:
>>
>> On Tue, Dec 28, 2010 at 10:19 AM, Rogier Wolff<[email protected]>
>> ?wrote:
>>>
>>> On Mon, Dec 27, 2010 at 09:53:43PM -0500, Ted Ts'o wrote:
>>>>
>>>> On Tue, Dec 28, 2010 at 09:53:45AM +1100, Con Kolivas wrote:
>>>>>
>>>>> [1048401.773270] EXT4-fs (sde8): mounted filesystem with writeback data
>>>>> mode.
>>>>> Opts: (null)
>>>>> [1048702.736011] EXT4-fs (sde8): error count: 3
>>>>> [1048702.736016] EXT4-fs (sde8): initial error at 1289053677:
>>>>> ext4_journal_start_sb:251
>>>>> [1048702.736018] EXT4-fs (sde8): last error at 1289080948:
>>>>> ext4_put_super:719
>>>>
>>>> That's actually not an error. ?It's a report which is generated every
>>>> 24 hours, indicating that there has been 3 errors since the last time
>>>> the error count has been cleared, with the first error taking place at
>>>> Sat Nov 6 10:27:57 2010 (US/Eastern) in the function
>>>> ext4_journal_start_sb(), at line 251, and the most recent error taking
>>>> place at Sat Nov 6 18:02:28 2010 (US/Eastern), in the function
>>>> ext4_put_super() at line 719. ?This is a new feature which was added
>>>> in 2.6.36.
>>>
>>> Nice. But the issue you're not mentioning is: What errors could have
>>> happened on the 6th of november? Should Con worry about those errors?
>>>
>> Ted,
>>
>> I would like to use this opportunity to remind you about my
>> record_journal_errstr()
>> implementation, see:
>>
>> https://github.com/amir73il/ext4-snapshots/blob/next3-stable/fs/next3/super.c#L157
>>
>> It records the initial errors messages (which I found to be the most
>> interesting),
>> in a message buffer on the unused space after the journal super block
>> (3K on a 4K block fs).
>>
>> fsck prints out those messages and clears the buffer.
>> In under a year of Next3 fs in beta and production, it has helped me many
>> times
>> to analyse bugs post-mortem and find the problem.
>>
>> If there is demand, I can post the patch for Ext4.
>>
>> Amir.
>
> I do think that this sounds like a useful addition - should be very useful
> in doing post mortem analysis...
>
> Thanks!
>
> Ric
>
>
Hi Ric,
Attached are the patches for ext4 (against Ted's next branch) and
e2fsck (against 1.41.14).
e2fsck patch is running in production with e2fsprogs-1.41.9 and the
porting to 1.41.14 was trivial.
ext4 patch is a 'trivial+' port of the Next3 patch running in production.
Do you think you guys can take the patches for a test drive?
Thanks,
Amir.
ext4: record error messages in message buffer
Ext4 error messages are stored in the free space after the journal super block.
After journal recovery, the message buffer is copied from the journal super
block to the file system super block.
When error behavior is configured to remount-ro, the message buffer can be read
and displayed at a later time by e2fsck. The messages from the time of the
error are very helpful in post-mortem bug analysis.
Signed-off-by: Amir Goldstein <[email protected]>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7728a4c..8888815 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -290,10 +290,53 @@ int __ext4_journal_stop(const char *where,
unsigned int line, handle_t *handle)
return err;
}
+/* record error messages after journal super block */
+static void ext4_record_journal_err(struct super_block *sb, const char *where,
+ const char *function, const char *fmt, va_list args)
+{
+#define MSGLEN 256
+ journal_t *journal = EXT4_SB(sb)->s_journal;
+ char *buf;
+ unsigned long offset;
+ int len;
+
+ if (!journal)
+ return;
+
+ buf = (char *)journal->j_superblock;
+ offset = (unsigned long)buf % sb->s_blocksize;
+ buf += sizeof(journal_superblock_t);
+ offset += sizeof(journal_superblock_t);
+
+ /* seek to end of message buffer */
+ while (offset < sb->s_blocksize && *buf) {
+ buf += MSGLEN;
+ offset += MSGLEN;
+ }
+
+ if (offset+MSGLEN > sb->s_blocksize)
+ /* no space left in message buffer */
+ return;
+
+ len = snprintf(buf, MSGLEN, "%s: %s: ", where, function);
+ len += vsnprintf(buf+len, MSGLEN-len, fmt, args);
+}
+
+static void ext4_record_journal_errstr(struct super_block *sb,
+ const char *where, const char *function, ...)
+{
+ va_list args;
+
+ va_start(args, function);
+ ext4_record_journal_err(sb, where, function, "%s\n", args);
+ va_end(args);
+}
+
void ext4_journal_abort_handle(const char *caller, unsigned int line,
const char *err_fn, struct buffer_head *bh,
handle_t *handle, int err)
{
+ struct super_block *sb = handle->h_transaction->t_journal->j_private;
char nbuf[16];
const char *errstr = ext4_decode_error(NULL, err, nbuf);
@@ -311,6 +354,9 @@ void ext4_journal_abort_handle(const char *caller,
unsigned int line,
printk(KERN_ERR "%s:%d: aborting transaction: %s in %s\n",
caller, line, errstr, err_fn);
+ /* record error message in journal super block */
+ ext4_record_journal_errstr(sb, caller, err_fn, errstr);
+
jbd2_journal_abort_handle(handle);
}
@@ -398,6 +444,11 @@ void __ext4_error(struct super_block *sb, const
char *function,
sb->s_id, function, line, current->comm, &vaf);
va_end(args);
+ /* record error message in journal super block */
+ va_start(args, fmt);
+ ext4_record_journal_err(sb, __func__, function, fmt, args);
+ va_end(args);
+
ext4_handle_error(sb);
}
@@ -422,6 +473,11 @@ void ext4_error_inode(struct inode *inode, const
char *function,
printk(KERN_CONT "comm %s: %pV\n", current->comm, &vaf);
va_end(args);
+ /* record error message in journal super block */
+ va_start(args, fmt);
+ ext4_record_journal_err(inode->i_sb, __func__, function, fmt, args);
+ va_end(args);
+
ext4_handle_error(inode->i_sb);
}
@@ -452,6 +508,11 @@ void ext4_error_file(struct file *file, const
char *function,
printk(KERN_CONT "comm %s: path %s: %pV\n", current->comm, path, &vaf);
va_end(args);
+ /* record error message in journal super block */
+ va_start(args, fmt);
+ ext4_record_journal_err(inode->i_sb, __func__, function, fmt, args);
+ va_end(args);
+
ext4_handle_error(inode->i_sb);
}
@@ -510,6 +571,9 @@ void __ext4_std_error(struct super_block *sb,
const char *function,
sb->s_id, function, line, errstr);
save_error_info(sb, function, line);
+ /* record error message in journal super block */
+ ext4_record_journal_errstr(sb, __func__, function, errstr);
+
ext4_handle_error(sb);
}
@@ -536,6 +600,11 @@ void __ext4_abort(struct super_block *sb, const
char *function,
printk("\n");
va_end(args);
+ /* record error message in journal super block */
+ va_start(args, fmt);
+ ext4_record_journal_err(sb, __func__, function, fmt, args);
+ va_end(args);
+
if ((sb->s_flags & MS_RDONLY) == 0) {
ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
sb->s_flags |= MS_RDONLY;
@@ -602,6 +671,11 @@ __acquires(bitlock)
printk(KERN_CONT "%pV\n", &vaf);
va_end(args);
+ /* record error message in journal super block */
+ va_start(args, fmt);
+ ext4_record_journal_err(sb, __func__, function, fmt, args);
+ va_end(args);
+
if (test_opt(sb, ERRORS_CONT)) {
ext4_commit_super(sb, 0);
return;
@@ -4067,12 +4141,35 @@ static void ext4_clear_journal_err(struct
super_block *sb,
j_errno = jbd2_journal_errno(journal);
if (j_errno) {
char nbuf[16];
+ char *buf1, *buf2;
+ unsigned long offset1, offset2;
+ int len1, len2;
+
+ /* copy message buffer from journal to super block */
+ buf1 = (char *)journal->j_superblock;
+ offset1 = (unsigned long)buf1 % sb->s_blocksize;
+ buf1 += sizeof(journal_superblock_t);
+ offset1 += sizeof(journal_superblock_t);
+ len1 = sb->s_blocksize - offset1;
+ buf2 = (char *)EXT4_SB(sb)->s_es;
+ offset2 = (unsigned long)buf2 % sb->s_blocksize;
+ buf2 += sizeof(struct ext4_super_block);
+ offset2 += sizeof(struct ext4_super_block);
+ len2 = sb->s_blocksize - offset2;
+ if (len2 > len1)
+ len2 = len1;
+ if (len2 > 0 && *buf1)
+ memcpy(buf2, buf1, len2);
errstr = ext4_decode_error(sb, j_errno, nbuf);
ext4_warning(sb, "Filesystem error recorded "
"from previous mount: %s", errstr);
ext4_warning(sb, "Marking fs in need of filesystem check.");
+ /* clear journal message buffer */
+ if (len1 > 0)
+ memset(buf1, 0, len1);
+
EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
ext4_commit_super(sb, 1);
Am I missing something? The kernel stores up to 3k worth of data, on
a 4k block file system. Whereas e2fsck patch blindly assume 2k worhth
of data regardless of the block size. The kernel patch looks ok, but
the e2fsprogs patch seems badly broken....
- Ted
Also, the kernel patch seems to write the messages in 256-byte
records, whereas the e2fsck patch assumes the messages are packed
together in null-terminated packed lines....
- Ted
On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <[email protected]> wrote:
> Am I missing something? ?The kernel stores up to 3k worth of data, on
> a 4k block file system. ?Whereas e2fsck patch blindly assume 2k worhth
> of data regardless of the block size. ?The kernel patch looks ok, but
> the e2fsprogs patch seems badly broken....
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
>
I thought that ext4 super block on 4K block is 1K at offset 1K, so
kernel only uses the remaining 2K.
It may very well be that e2fsck patch only works with 4K block size properly,
because I never tested it on other block sizes.
Amir.
On Fri, Jan 7, 2011 at 7:28 AM, Ted Ts'o <[email protected]> wrote:
> Also, the kernel patch seems to write the messages in 256-byte
> records, whereas the e2fsck patch assumes the messages are packed
> together in null-terminated packed lines....
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
>
I guess that is the reason I always see only the first message on fsck :-)
It was always enough information for me though...
feel free to amend to e2fsck patch to match the kernel side.
Thanks,
Amir.
On Fri, Jan 7, 2011 at 9:43 PM, Amir Goldstein <[email protected]> wrote:
> On Fri, Jan 7, 2011 at 7:28 AM, Ted Ts'o <[email protected]> wrote:
>> Also, the kernel patch seems to write the messages in 256-byte
>> records, whereas the e2fsck patch assumes the messages are packed
>> together in null-terminated packed lines....
I checked again. e2fsck patch DOES read messages in 256-byte records:
+static void e2fsck_print_message_buffer(e2fsck_t ctx)
...
+#define MSGLEN 256
...
+ fputs(buf+offset, stdout);
+ offset += MSGLEN;
>>
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
>>
>
> I guess that is the reason I always see only the first message on fsck :-)
> It was always enough information for me though...
> feel free to amend to e2fsck patch to match the kernel side.
>
> Thanks,
> Amir.
>
On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <[email protected]> wrote:
> On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <[email protected]> wrote:
>> Am I missing something? ?The kernel stores up to 3k worth of data, on
>> a 4k block file system. ?Whereas e2fsck patch blindly assume 2k worhth
>> of data regardless of the block size. ?The kernel patch looks ok, but
>> the e2fsprogs patch seems badly broken....
So it's not badly broken, it copies blocksize-2K, which is clumsily
written like this:
+ int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET;
After that, only 4K block and 8K block will have a leftover,
which will be copied from journal sb+1K to ext4 sb+1K.
Yes, 2K blocks - no message buffer for you!
The reason I am only copying 2K and throwing the extra K is this:
print_message_buffer() is called from check_super_block(),
*after* journal recovery, which was executed either by e2fsck itself
or (and more likely in a errors=remount-ro situation) by ext4
on read-only mount.
In the latter case, the extra K must be discarded, so I saw no reason
to write special code for the first case.
Neither did I find a good reason to complicate the recording code
and limit it to record only blocksize-2K.
>>
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
>>
>
> I thought that ext4 super block on 4K block is 1K at offset 1K, so
> kernel only uses the remaining 2K.
>
> It may very well be that e2fsck patch only works with 4K block size properly,
> because I never tested it on other block sizes.
>
> Amir.
>
On Fri, Jan 7, 2011 at 11:07 PM, Amir Goldstein <[email protected]> wrote:
> On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <[email protected]> wrote:
>> On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <[email protected]> wrote:
>>> Am I missing something? ?The kernel stores up to 3k worth of data, on
>>> a 4k block file system. ?Whereas e2fsck patch blindly assume 2k worhth
>>> of data regardless of the block size. ?The kernel patch looks ok, but
>>> the e2fsprogs patch seems badly broken....
>
> So it's not badly broken, it copies blocksize-2K, which is clumsily
> written like this:
> + ? ? ? int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET;
>
> After that, only 4K block and 8K block will have a leftover,
> which will be copied from journal sb+1K to ext4 sb+1K.
> Yes, 2K blocks - no message buffer for you!
>
> The reason I am only copying 2K and throwing the extra K is this:
> print_message_buffer() is called from check_super_block(),
> *after* journal recovery, which was executed either by e2fsck itself
> or (and more likely in a errors=remount-ro situation) by ext4
> on read-only mount.
> In the latter case, the extra K must be discarded, so I saw no reason
> to write special code for the first case.
> Neither did I find a good reason to complicate the recording code
> and limit it to record only blocksize-2K.
>
>
Ted,
I have a suggestion how to use the wasted extra K.
As I pointed out in the past, the first/last_error_xxx statistics are most
likely to be lost in errors=panic and errors=remount-ro (journal
recovery will override super block)
If you record this information in the last K of journal sb (even copy
the entire ext4 sb),
you can then override ext4 sb with the most up-to-date error stats
after journal recovery.
Amir.
On Fri, Jan 07, 2011 at 11:07:23PM +0200, Amir Goldstein wrote:
> On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <[email protected]> wrote:
> > On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <[email protected]> wrote:
> >> Am I missing something? ?The kernel stores up to 3k worth of data, on
> >> a 4k block file system. ?Whereas e2fsck patch blindly assume 2k worhth
> >> of data regardless of the block size. ?The kernel patch looks ok, but
> >> the e2fsprogs patch seems badly broken....
>
> So it's not badly broken, it copies blocksize-2K, which is clumsily
> written like this:
> + int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET;
So this should be:
int len = ctx->fs->blocksize - SUPERBLOCK_OFFSET - sizeof (<superblock>);
Although those two numbers are equal right now, there is no reason to
assume that they will remain so in the future. So if the superblock
size (or the offset) changes in the future, it's much better to have
programmed this so that it will keep on working as opposed to getting
to deal with ugly bugs in code that hasn't changed in years...
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
On Sat, Jan 8, 2011 at 10:05 AM, Rogier Wolff <[email protected]> wrote:
> On Fri, Jan 07, 2011 at 11:07:23PM +0200, Amir Goldstein wrote:
>> On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <[email protected]> wrote:
>> > On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <[email protected]> wrote:
>> >> Am I missing something? ?The kernel stores up to 3k worth of data, on
>> >> a 4k block file system. ?Whereas e2fsck patch blindly assume 2k worhth
>> >> of data regardless of the block size. ?The kernel patch looks ok, but
>> >> the e2fsprogs patch seems badly broken....
>>
>> So it's not badly broken, it copies blocksize-2K, which is clumsily
>> written like this:
>> + ? ? int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET;
>
> So this should be:
>
> ?int len = ctx->fs->blocksize - SUPERBLOCK_OFFSET - sizeof (<superblock>);
>
> Although those two numbers are equal right now, there is no reason to
> assume that they will remain so in the future. So if the superblock
> size (or the offset) changes in the future, it's much better to have
> programmed this so that it will keep on working as opposed to getting
> to deal with ugly bugs in code that hasn't changed in years...
>
Very well, here is a less ugly version of the e2fsck patch:
e2fsck: display recorded error messages
Ext4 error messages are recorded in a message buffer after the
journal super block. After journal recovery, the journal message buffer
is copied to the file system message buffer. On fsck, if the message
buffer is not empty, the recorded messages are printed to stdout and
the buffer is cleared.
Signed-off-by: Amir Goldstein <[email protected]>
diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index 93f685c..00d4d8f 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -837,6 +837,22 @@ static errcode_t recover_ext3_journal(e2fsck_t ctx)
if (journal->j_superblock->s_errno) {
+ /* journal message buffer at journal super block + 1K */
+ char *buf = ((char *) journal->j_superblock) +
+ SUPERBLOCK_OFFSET;
+ int n, len = ctx->fs->blocksize - MSGBUF_OFFSET;
+
+ if (len >= MSGBUF_OFFSET && *buf) {
+ /* keep it simple - write in MSGBUF_OFFSET blocksize */
+ io_channel_set_blksize(ctx->fs->io, MSGBUF_OFFSET);
+ n = len / MSGBUF_OFFSET;
+ /* write journal message buffer to super block + 2K */
+ retval = io_channel_write_blk(ctx->fs->io, 1, n, buf);
+ io_channel_set_blksize(ctx->fs->io, ctx->fs->blocksize);
+ /* clear journal message buffer */
+ memset(buf, 0, len);
+ }
+
ctx->fs->super->s_state |= EXT2_ERROR_FS;
ext2fs_mark_super_dirty(ctx->fs);
journal->j_superblock->s_errno = 0;
diff --git a/e2fsck/super.c b/e2fsck/super.c
index b6923c6..f3274df 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -452,6 +452,45 @@ static void e2fsck_fix_dirhash_hint(e2fsck_t ctx)
}
}
+/*
+ * This function prints the message buffer at the end of super block.
+ */
+static void e2fsck_print_message_buffer(e2fsck_t ctx)
+{
+ char *buf;
+ int n, len = ctx->fs->blocksize - MSGBUF_OFFSET;
+ unsigned offset = 0;
+ int retval;
+
+ if (len < MSGBUF_OFFSET)
+ /* 1K or 2K fs->blocksize */
+ return;
+
+ buf = (char *) e2fsck_allocate_memory(ctx, len, "message buffer");
+
+ /* keep it simple - write in MSGBUF_OFFSET blocksize */
+ io_channel_set_blksize(ctx->fs->io, MSGBUF_OFFSET);
+ n = len / MSGBUF_OFFSET;
+ /* read message buffer from super block + 2K */
+ retval = io_channel_read_blk(ctx->fs->io, 1, n, buf);
+ if (retval || !*buf)
+ goto out;
+
+ /* print messages in buffer */
+ puts("Error messages recorded in message buffer:");
+ while (offset < len && buf[offset]) {
+ puts(buf+offset);
+ offset += MSGBUF_RECLEN;
+ }
+ /* clear message buffer */
+ memset(buf, 0, len);
+ retval = io_channel_write_blk(ctx->fs->io, 2, 2, buf);
+ puts("End of message buffer.");
+out:
+ io_channel_set_blksize(ctx->fs->io, ctx->fs->blocksize);
+ ext2fs_free_mem(&buf);
+}
+
void check_super_block(e2fsck_t ctx)
{
@@ -874,6 +913,11 @@ void check_super_block(e2fsck_t ctx)
*/
e2fsck_fix_dirhash_hint(ctx);
+ /*
+ * Print message buffer if necessary
+ */
+ e2fsck_print_message_buffer(ctx);
+
return;
}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index a204eb7..5b73c2a 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -38,6 +38,12 @@ extern "C" {
*/
#define SUPERBLOCK_OFFSET 1024
#define SUPERBLOCK_SIZE 1024
+/*
+ * When blocksize > 2K, the space after the superblock is used as
+ * a buffer to record error messages (in 256 bytes records).
+ */
+#define MSGBUF_OFFSET (SUPERBLOCK_OFFSET+SUPERBLOCK_SIZE)
+#define MSGBUF_RECLEN 256
/*
* The last ext2fs revision level that this version of the library is
On Sat, Jan 8, 2011 at 12:12 AM, Amir Goldstein <[email protected]> wrote:
> On Fri, Jan 7, 2011 at 11:07 PM, Amir Goldstein <[email protected]> wrote:
>> On Fri, Jan 7, 2011 at 9:41 PM, Amir Goldstein <[email protected]> wrote:
>>> On Fri, Jan 7, 2011 at 7:26 AM, Ted Ts'o <[email protected]> wrote:
>>>> Am I missing something? ?The kernel stores up to 3k worth of data, on
>>>> a 4k block file system. ?Whereas e2fsck patch blindly assume 2k worhth
>>>> of data regardless of the block size. ?The kernel patch looks ok, but
>>>> the e2fsprogs patch seems badly broken....
>>
>> So it's not badly broken, it copies blocksize-2K, which is clumsily
>> written like this:
>> + ? ? ? int len = ctx->fs->blocksize - 2*SUPERBLOCK_OFFSET;
>>
>> After that, only 4K block and 8K block will have a leftover,
>> which will be copied from journal sb+1K to ext4 sb+1K.
>> Yes, 2K blocks - no message buffer for you!
>>
>> The reason I am only copying 2K and throwing the extra K is this:
>> print_message_buffer() is called from check_super_block(),
>> *after* journal recovery, which was executed either by e2fsck itself
>> or (and more likely in a errors=remount-ro situation) by ext4
>> on read-only mount.
>> In the latter case, the extra K must be discarded, so I saw no reason
>> to write special code for the first case.
>> Neither did I find a good reason to complicate the recording code
>> and limit it to record only blocksize-2K.
>>
>>
>
> Ted,
>
> I have a suggestion how to use the wasted extra K.
>
> As I pointed out in the past, the first/last_error_xxx statistics are most
> likely to be lost in errors=panic and errors=remount-ro (journal
> recovery will override super block)
> If you record this information in the last K of journal sb (even copy
> the entire ext4 sb),
> you can then override ext4 sb with the most up-to-date error stats
> after journal recovery.
>
> Amir.
>
Ted,
I just realize you did implement save&update of super block s_error_xxx fields.
However, I wonder if it is not a bug to call ext4_commit_super() from
save_error_info()
to commit the s_error_xxx fields in the first place.
The super block buffer is most likely participating in the current
transaction and should
not be committed to disk.
The alleged bug is likely to be hidden by the fact that the super
block has most likely
participated also in the last committed transaction, so a valid
version of it will most
likely override the invalid version.
I can imagine there could be a corner case, though, when committing
the super block
a head of transaction commit will result in inconsistencies.
Am I missing something?
Amir.
On Sat, Jan 08, 2011 at 09:05:20AM +0100, Rogier Wolff wrote:
> Although those two numbers are equal right now, there is no reason to
> assume that they will remain so in the future. So if the superblock
> size (or the offset) changes in the future, it's much better to have
> programmed this so that it will keep on working as opposed to getting
> to deal with ugly bugs in code that hasn't changed in years...
No. The superblock nor its offset will never change. It's like the
syscall ABI, only worse. If we changed it would break *everybody*.
Fortunately there is a huge amount of space left over in the 1024 byte
superblock.
- Ted
On Sat, Jan 08, 2011 at 05:00:59PM -0500, Ted Ts'o wrote:
> On Sat, Jan 08, 2011 at 09:05:20AM +0100, Rogier Wolff wrote:
> > Although those two numbers are equal right now, there is no reason to
> > assume that they will remain so in the future. So if the superblock
> > size (or the offset) changes in the future, it's much better to have
> > programmed this so that it will keep on working as opposed to getting
> > to deal with ugly bugs in code that hasn't changed in years...
>
> No. The superblock nor its offset will never change. It's like the
> syscall ABI, only worse. If we changed it would break *everybody*.
> Fortunately there is a huge amount of space left over in the 1024 byte
> superblock.
It's called defensive programming. It prevents bugs before they
happen. By your reasoning you could've written 2048 or 0x800 there.
My version:
- documents why something is subtracted from the blockszize and how
much.
- keeps on working even if the superblock would oddly change size
in the future. Even if now you don't expect that to happen ever.
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
On Sun, Jan 09, 2011 at 09:12:49AM +0100, Rogier Wolff wrote:
> > No. The superblock nor its offset will never change. It's like the
> > syscall ABI, only worse. If we changed it would break *everybody*.
> > Fortunately there is a huge amount of space left over in the 1024 byte
> > superblock.
>
> It's called defensive programming. It prevents bugs before they
> happen. By your reasoning you could've written 2048 or 0x800 there.
Defensive programming would be something like
BUG_ON(sizeof(struct ext4_super_block) != 1024);
(unfortunately #error sizeof(struct ext4_super_block) != 1024 won't
work since #error is handled by the preprocessor, and I don't think we
can trigger a compile-time warning for a structure size issue).
We could add that, if people like. I do have regression tests (i.e.,
boot a system with ext4) which would die if anything like that
changed, though.
And yes, I have similar regression tests in e2fsprogs that would
trigger if the superblock size were to ever change.
- Ted
P.S. The only way I can think of to do it at compile time would be to
build a test .o file with -g, and then use a program like pahole that
pulls the information out of the DWARF information. Might actually be
a good thing to do that, since it could also be useful for automating
searches for unoptimize structures. Unfortunately, many developers
don't have the DWARF utilities installed, so that would add a
dependency on the kernel build.
On 2011-01-09, at 07:58, Ted Ts'o wrote:
> Defensive programming would be something like
>
> BUG_ON(sizeof(struct ext4_super_block) != 1024);
>
> (unfortunately #error sizeof(struct ext4_super_block) != 1024 won't
> work since #error is handled by the preprocessor, and I don't think we
> can trigger a compile-time warning for a structure size issue).
We actually use a compile-time assertion in the Lustre code:
/*
* compile-time assertions. @cond has to be constant expression.
* ISO C Standard:
*
* 6.8.4.2 The switch statement
*
* ....
*
* [#3] The expression of each case label shall be an integer
* constant expression and no two of the case constant expressions
* in the same switch statement shall have the same value after
* after conversion...
*/
#define CLASSERT(cond) ({ switch(42) { case (cond): case 0: break; } })
Of course, the "cond" expression must be resolvable at compile time, or
it will turn into a compile error because it is not legal to have a
variable "case".
Cheers, Andreas
On Sun, Jan 09, 2011 at 09:58:38AM -0500, Ted Ts'o wrote:
> On Sun, Jan 09, 2011 at 09:12:49AM +0100, Rogier Wolff wrote:
> > > No. The superblock nor its offset will never change. It's like the
> > > syscall ABI, only worse. If we changed it would break *everybody*.
> > > Fortunately there is a huge amount of space left over in the 1024 byte
> > > superblock.
> >
> > It's called defensive programming. It prevents bugs before they
> > happen. By your reasoning you could've written 2048 or 0x800 there.
>
> Defensive programming would be something like
>
> BUG_ON(sizeof(struct ext4_super_block) != 1024);
It is one form of "defense", but not what I call defensive
programming. Defensive programming, is that you make things robust in
the the face of unexpected changes.
If you do the BUG_ON and do this throughout the code, one day your
grandson will be increasing the superblock size. He'll fix all the
BUG_ON and your other "defensive" measures. But lo and behold. He's
human, and forgot one or two. Especially the run-time detections that
only get called occasionally (like in this case on an error
sitatuation) might take a while before they are noticed.
What use is it to turn a "we've found a serious error in your
filesystem, we strongly recommend you no longer write to it and run
fsck first" into a "system halted"?
What is wrong with just putting the right formula where it belongs?
We need to set the variable "len" to the amount of free space beyond
the superblock in the first block of the filesystem. So we take the
size of the first block, subtract the size of the superblock and we
subtract the start of the superblock. It's as simple as that.
> We could add that, if people like. I do have regression tests (i.e.,
> boot a system with ext4) which would die if anything like that
> changed, though.
How about
Makefile:
ext4.o: ...the objects.... testsbsize.out
testsbsize.out: testsbsize
./testsbsize
(oh and something about useing "hostcc" for testsbsize).
with testsbsize.c:
#include <stdio.h>
#include <...ext4....>
int main (int argc, char **argv)
{
if (sizeof (struct ext4_super_block) != 1024) {
fprintf (stderr, "Superblock size is %d, should be 1024.\n", sizeof (struct ext4_super_block));
exit (1);
}
exit (0);
}
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ