On Tue, Mar 17, 2009 at 11:40:19AM +0200, Denis Karpov wrote:
> Hello,
>
> first off, sorry if you are getting this email twice.
No problem, I'm not exactly able to reproduce it myself, but Jan Kara
has just fixed some issues which could explain it: they happen under
memory pressure so I may not have triggered it if I didn't put it
under pressure.
Jan's fixes are here:
http://marc.info/?l=linux-ext4&m=123731584711382&w=2
It would be interesting to try them, and if they don't work maybe
he's also interested so I cc'ed him.
> I also tried to do ext3/ext4 fs smoketesting and used Adraian's
> script. I am consistently getting the same results - filesystem gets
> corrupted.
> I tested on quad Xeon, with patches posted in this thread.
>
> 1. tests with brd:
> - ext3fs on brd
> corruption (see attached ext3fs.brd.corruption.txt)
> - ext4fs on brd
> corruption (see attached ext4fs.brd.corruption.txt)
>
> In both cases I saw some complaints from JBD/JBD2:
> JBD: Detected IO errors while flushing file data on
>
> 2. I enabled JBD debugging, re-ran the tests. Console was
> flooded with messages and in the end I got a soft lockup.
> I cannot consistently reproduce this (see attached
> brd.ext3fs.softlock.txt).
>
> Just to be sure, I re-ran the tests on a real block device (usb stick)
>
> 3. tests with real block device (usb stick)
> - ext3fs
> no fs corruption (overnight run)
> - ext4fs
> no fs corruption (overnight run)
It's possible the real block device is not fast enough to trigger
it, or different timings don't trigger it (brd requests complete
immediately whereas real devices tend to complete afterwards,
from (soft)interrupt context).
Or it could be that brd is consuming some more memory to push
the system into reclaim and exposing those bugs Jan has fixed...
> Any ideas what else can be done here? I'd like to find out if this is
> filesystem or brd related fault.
Yes, thanks for persisting. Could you test the patches and see
if they help? If not, does ext2 show corruption? How about ext3
on a loop device (with the backing file on tmpfs/ramfs for speed)?
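
For reference, the loop-device setup suggested above could be sketched
roughly as follows. This is only an illustration, not something taken from
the thread: the backing file path (/dev/shm is assumed to be a tmpfs mount),
the image size, the mount point and the loop device name are all
hypothetical defaults. The helper only prints the commands unless dry_run
is turned off, and actually running them needs root.

```python
import shlex
import subprocess


def loop_ext3_commands(backing_file="/dev/shm/ext3-test.img",
                       size_mb=512,
                       mount_point="/mnt/ext3-loop",
                       loop_dev="/dev/loop0"):
    """Return argv lists that put ext3 on a loop device backed by a tmpfs file.

    All default paths and sizes are assumptions chosen for illustration.
    """
    return [
        # Create a sparse backing file on tmpfs.
        ["truncate", "-s", f"{size_mb}M", backing_file],
        # Attach the file to the loop device.
        ["losetup", loop_dev, backing_file],
        # Make an ext3 filesystem on the loop device.
        ["mkfs.ext3", "-q", loop_dev],
        # Mount it so the test script can run on top of it.
        ["mkdir", "-p", mount_point],
        ["mount", "-t", "ext3", loop_dev, mount_point],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(loop_ext3_commands())
```

After the test run, unmounting and checking the device with a read-only
fsck (e2fsck -f -n on the loop device) would show whether the corruption
also appears without brd.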
Thanks,
Nick
> On Tue, Mar 17, 2009 at 11:40:19AM +0200, Denis Karpov wrote:
> > Hello,
> >
> > first off, sorry if you are getting this email twice.
>
> No problem, I'm not exactly able to reproduce it myself, but Jan Kara
> has just fixed some issues which could explain it: they happen under
> memory pressure so I may not have triggered it if I didn't put it
> under pressure.
>
> Jan's fixes are here:
>
> http://marc.info/?l=linux-ext4&m=123731584711382&w=2
>
> It would be interesting to try them, and if they don't work maybe
> he's also interested so I cc'ed him.
>
>
> > I also tried to do ext3/ext4 fs smoketesting and used Adraian's
> > script. I am consistently getting the same results - filesystem gets
> > corrupted.
> > I tested on quad Xeon, with patches posted in this thread.
> >
> > 1. tests with brd:
> > - ext3fs on brd
> > corruption (see attached ext3fs.brd.corruption.txt)
> > - ext4fs on brd
> > corruption (see attached ext4fs.brd.corruption.txt)
> >
> > In both cases I saw some complaints from JBD/JBD2:
> > JBD: Detected IO errors while flushing file data on
Yes, my patches fix exactly this problem. So please try running with
them. I'm not sure about that HTREE corruption you see during fsck. That
seems to be a separate issue.
> > 2. I enabled JBD debugging, re-ran the tests. Console was
> > flooded with messages and in the end I got a soft lockup.
> > I cannot consistently reproduce this (see attached
> > brd.ext3fs.softlock.txt).
Yes, this usually produces far too many messages. The soft lockup was
probably caused by the machine being too busy logging all the messages
(log files are synced which adds much more to the load of the
filesystem). I'd probably leave that aside for now and concentrate on
the corruption problem.
Honza
--
Jan Kara
SuSE CR Labs
> > Jan's fixes are here:
> > http://marc.info/?l=linux-ext4&m=123731584711382&w=2
> > It would be interesting to try them, and if they don't work maybe
> > he's also interested so I cc'ed him.
Hi,
thank you for the reply. I re-ran the tests with this patch.
> > >
> > > In both cases I saw some complaints from JBD/JBD2:
> > > JBD: Detected IO errors while flushing file data on
> Yes, my patches fix exactly this problem. So please try running with
> them. I'm not sure about that HTREE corruption you see during fsck. That
> seems to be a separate issue.
Unfortunately it looks like the problem is not fixed - JBD still complains
and in the end HTREE is getting damaged, in both ext3 and ext4 tests (see
attached logs).
Denis
On Fri, Mar 20, 2009 at 02:24:05PM +0200, Denis Karpov wrote:
> Unfortunately it looks like the problem is not fixed - JBD still complains
> and in the end HTREE is getting damaged, in both ext3 and ext4 tests (see
> attached logs).
>
> Denis
Please disregard the previous message; I tested with the wrong patchset.
Sorry for the hassle.
Denis
On Wed, Mar 18, 2009 at 02:42:02PM +0100, ext Jan Kara wrote:
> > On Tue, Mar 17, 2009 at 11:40:19AM +0200, Denis Karpov wrote:
> > Jan's fixes are here:
> > http://marc.info/?l=linux-ext4&m=123731584711382&w=2
> > It would be interesting to try them, and if they don't work maybe
> > he's also interested so I cc'ed him.
Hello,
I've re-run the tests (with Jan's patches and also Nick's "fs: new inode
i_state corruption fix" patch).
> > > In both cases I saw some complaints from JBD/JBD2:
> > > JBD: Detected IO errors while flushing file data on
> Yes, my patches fix exactly this problem. So please try running with
> them. I'm not sure about that HTREE corruption you see during fsck. That
> seems to be a separate issue.
The issue with JBD seems to be gone, but the problem with HTREE being
corrupted still remains (see attached logs).
Denis