2006-02-04 23:43:43

by Ed Tomlinson

[permalink] [raw]
Subject: Re: 2.6.16-rc1-mm2 (mm5 too) panics

Hi,

I need some help figuring this one out. I cannot use git bisect and applying the 2.6.15-1 reiserfs4
patch does not work with newer 2.6.16 levels - looks like the mutex conversion hits it hard. Looking
in mm there are about 50 reiser4 patches...

To reproduce this problem I need reiser4 though the issue maybe in libata or scsi. Does anyone
have an idea how I can proceed to find what makes newer kernels get io errors which trigger a reiser4
panic? The panic is a zam-597 (fs/reiser4/txnmgr.c) with an rc=-5

Anyone have a git tree tracking linus with reiser4 patches applied staring before 2.6.15 - if so git
bisect could be used to find the change causing the problem.

One datapoint. In 2.6.15 + reiser4 2.6.15-1 works fine - the resiser4 filesystem is heavily used with
no errors. Smart report no problems with the drive being used. The libata driver used is sata_nv.

Please reply to my email - I am only subscribed to lkml.

Help!
Ed Tomlinson

ret = reiser4_write_logs(nr_submitted);
if (ret < 0)
reiser4_panic("zam-597", "write log failed (%ld)\n", ret);

On Saturday 28 January 2006 08:46, Ed Tomlinson wrote:
> On Friday 27 January 2006 09:53, you wrote:
> > Ed Tomlinson wrote:
> > > Summarizing all this. There are two problems here.
> > >
> > > 1. reserifs4 panics when it gets io errors - I remember this was an issue that
> > > needed to be fixed in the R4 code before it moves to mainline...
> > >
> > > 2. Why does a drive which is fine with 2.6.15-rc5-mm3, return a -5 with 2.6.16-mm3
> > > and above? Smart reports no problems with the drive hardware. What has changed
> > > in the libata/scsi stacks?
> >
> > That's a long answer. Could you assist in narrowing down the versions
> > which are affected?
> >
> > It would also be useful if you could try vanilla kernels, and help us
> > discover whether problems surfaces in 2.6.15, 2.6.15-git[1234],
> > 2.6.16-rc1, etc.
>
> Jeff,
>
> I'll see what I can do with git bisect. Given that reserifs4 is in the picture this may be
> fun... I expect it will be a slow process (kernels take 40min to build here).
>
> Thanks
> Ed Tomlinson
>
>


2006-02-24 01:58:18

by Ed Tomlinson

[permalink] [raw]
Subject: 2.6.16-rc4-mm1 & 2.6.16-rc1-mm2 (mm5 too) panics

Hi,

2.6.16-rc4-mm1 panics with a zam-597 (fs/reiser4/txnmgr.c) with an rc=-5

This has been happening since 2.6.16-rc1-mm2. It makes testing with reiserfs4 impossible here.

I have not been able to figure out how to use bisect etc to find out what is triggering this. I had
hoped that reversing "jbd: split checkpoint lists" as per the fix for ocfs2 might help (its in rc4-mm1)
but no such luck.

Does anyone have a suggestion on how I can help find what is broken here?

TIA,
Ed Tomlinson

On Saturday 04 February 2006 18:43, Ed Tomlinson wrote:

> I need some help figuring this one out. I cannot use git bisect and applying the 2.6.15-1 reiserfs4
> patch does not work with newer 2.6.16 levels - looks like the mutex conversion hits it hard. Looking
> in mm there are about 50 reiser4 patches...
>
> To reproduce this problem I need reiser4 though the issue maybe in libata or scsi. Does anyone
> have an idea how I can proceed to find what makes newer kernels get io errors which trigger a reiser4
> panic? The panic is a zam-597 (fs/reiser4/txnmgr.c) with an rc=-5
>
> Anyone have a git tree tracking linus with reiser4 patches applied staring before 2.6.15 - if so git
> bisect could be used to find the change causing the problem.
>
> One datapoint. In 2.6.15 + reiser4 2.6.15-1 works fine - the resiser4 filesystem is heavily used with
> no errors. Smart report no problems with the drive being used. The libata driver used is sata_nv.
>
> Please reply to my email - I am only subscribed to lkml.
>
> Help!
> Ed Tomlinson
>
> ret = reiser4_write_logs(nr_submitted);
> if (ret < 0)
> reiser4_panic("zam-597", "write log failed (%ld)\n", ret);
>
> On Saturday 28 January 2006 08:46, Ed Tomlinson wrote:
> > On Friday 27 January 2006 09:53, you wrote:
> > > Ed Tomlinson wrote:
> > > > Summarizing all this. There are two problems here.
> > > >
> > > > 1. reserifs4 panics when it gets io errors - I remember this was an issue that
> > > > needed to be fixed in the R4 code before it moves to mainline...
> > > >
> > > > 2. Why does a drive which is fine with 2.6.15-rc5-mm3, return a -5 with 2.6.16-mm3
> > > > and above? Smart reports no problems with the drive hardware. What has changed
> > > > in the libata/scsi stacks?
> > >
> > > That's a long answer. Could you assist in narrowing down the versions
> > > which are affected?
> > >
> > > It would also be useful if you could try vanilla kernels, and help us
> > > discover whether problems surfaces in 2.6.15, 2.6.15-git[1234],
> > > 2.6.16-rc1, etc.
> >
> > Jeff,
> >
> > I'll see what I can do with git bisect. Given that reserifs4 is in the picture this may be
> > fun... I expect it will be a slow process (kernels take 40min to build here).
> >
> > Thanks
> > Ed Tomlinson
> >
> >
>
>