Hi,
I've undone aic7xxx changes which were locking up some machines on
initialization.
The new driver is now named drivers/scsi/aic79xx and is under
CONFIG_AIC79XX.
Justin, unfortunately I can't even THINK about updating aic7xxx to your
new driver at the current release stage. I will do so in the 2.4.22.
The update also contains a PCI posting flush fix from Arjan.
People, please test the driver.
> Hi,
>
> I've undone aic7xxx changes which were locking up some machines on
> initialization.
Hmm. It would have been nice to have had the opportunity to fix this correctly.
As it stands now, I have really no idea what people were testing or not
since by taking Alan's patch you have lost the complete change history
and the ability to step people through the changes. I have preserved
this history in the bk send output that is available on my site if at
some point that is useful to you.
> The new driver is now named drivers/scsi/aic79xx and is under
> CONFIG_AIC79XX.
So we now have an extra copy of the assembler, the Config files, and
the aiclib files. This is not a solution. If you wanted to selectively
update the aic79xx driver, all you had to do was ask me for the requisite
change sets. This is what a maintainer is for.
> Justin, unfortunately I can't even THINK about updating aic7xxx to your
> new driver at the current release stage. I will do so in the 2.4.22.
Does this mean that you will actually take BK changes from me instead of
from just about anyone else that sends you aic7xxx driver updates? I had
pretty much given up on this.
> The update also contains a PCI posting flush fix from Arjan.
Which is completely unnecessary and in fact will cause hangs and crashes
on many Dell servers. The "fix" for the VIA systems that violate the
PCI spec is to either:
1) Update the driver correctly so that its detection logic will
automatically disable memory-mapped I/O for these broken systems
(see the sketch below).
or
2) Just disable the BIOS options that configure the system to violate
the PCI prefetching rules.
Slowing down all systems, even the ones that are *not broken*, by doing
extra, random PCI read cycles is not a fix.
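For illustration only (this is not the driver code, and the IDs in the table
are placeholders, not a real blacklist), option 1 boils down to a
table-driven check along these lines:

/* user-space sketch of a chipset quirk table */
#include <stdio.h>

struct mmio_quirk {
        unsigned short vendor;
        unsigned short device;
};

/* placeholder entries -- a real table would list the offending bridges */
static const struct mmio_quirk broken_bridges[] = {
        { 0x1106, 0x0305 },
        { 0x1106, 0x0691 },
};

/* Returns 1 if memory-mapped I/O is believed safe, 0 if the driver should
 * fall back to programmed I/O for a chipset known to violate the PCI
 * prefetching rules. */
static int mmio_is_safe(unsigned short vendor, unsigned short device)
{
        size_t i;

        for (i = 0; i < sizeof(broken_bridges) / sizeof(broken_bridges[0]); i++)
                if (broken_bridges[i].vendor == vendor &&
                    broken_bridges[i].device == device)
                        return 0;
        return 1;
}

int main(void)
{
        printf("bridge 1106:0691 -> %s\n",
               mmio_is_safe(0x1106, 0x0691) ? "use MMIO" : "fall back to PIO");
        return 0;
}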
If you want some verification of the Dell issue (which I'm sure will
cause problems on other "fast" systems too), just ask Matt Domsch.
Again, if you have concerns about the aic7xxx or aic79xx drivers, my
mail box is always open. Waiting to contact me until the last minute
where I can only sit on the sidelines and watch another train wreck is
not the best way to ensure that the drivers function correctly in 2.4.X.
What this basically boils down to is trust. If you don't trust me,
tell me how I can build that trust. Without it, I can only continue
to tell most people that contact me with bug reports, "It's already
fixed in the official driver. You can pull the latest from ..."
--
Justin
On Thu, 08 May 2003 18:45:42 -0600
"Justin T. Gibbs" <[email protected]> wrote:
> > Hi,
> > [...]
> > Justin, unfortunately I can't even THINK about updating aic7xxx to your
> > new driver at the current release stage. I will do so in the 2.4.22.
>
> [...]
> Again, if you have concerns about the aic7xxx or aic79xx drivers, my
> mail box is always open. Waiting to contact me until the last minute
> where I can only sit on the sidelines and watch another train wreck is
> not the best way to ensure that the drivers function correctly in 2.4.X.
>
> What this basically boils down to is trust. If you don't trust me,
> tell me how I can build that trust. Without it, I can only continue
> to tell most people that contact me with bug reports, "It's already
> fixed in the official driver. You can pull the latest from ..."
Justin, just to complete the picture: as I wrote some days ago concerning your
hint to "use the latest from ...", your latest driver does not complete booting
on (at least) my system but freezes, which I reported to LKML. I have not yet
heard anything about this issue. You cannot expect a newer driver that obviously
performs worse in some cases to be included.
"Worse" here means "fails", not "performs badly". Marcelo's decision on the
topic looks pretty reasonable to me...
Regards,
Stephan
On Fri, May 09, 2003 at 12:06:48PM +0200, Stephan von Krawczynski wrote:
> Justin, just to complete the picture: as I wrote some days ago concerning your
> hint to "use the latest from ..." your latest driver does not complete booting
> on (at least) my system but freezes - which I wrote to LKML. I have not yet
> heard
> anything about this issue. You cannot expect to include a newer driver which
> performs obviously worse in some cases.
> "Worse" here means "fails" and not "performs bad". Marcelos' decision on the
> topic looks pretty reasonable to me...
What's your setup? Are you running SMP? I was hit by a lock bug introduced near
6.2.30, which Justin fixed recently and included in his latest driver
(20030502). Justin suggested I try the NMI watchdog to find out what was
wrong, and it pointed us to a spinlock problem. Have you tried to debug
anything? I must say that this driver seems really robust now on my setup
(dual Athlon), but perhaps your problem is of the same kind and could be fixed
easily with some help, which would be good for you and everyone else.
Regards,
Willy
On Fri, 9 May 2003 14:06:59 +0200
Willy Tarreau <[email protected]> wrote:
> On Fri, May 09, 2003 at 12:06:48PM +0200, Stephan von Krawczynski wrote:
>
> > Justin, just to complete the picture: as I wrote some days ago concerning
> > your hint to "use the latest from ..." your latest driver does not complete
> > booting on (at least) my system but freezes - which I wrote to LKML. I have
> > not yet heard
> > anything about this issue. You cannot expect to include a newer driver
> > which performs obviously worse in some cases.
> > "Worse" here means "fails" and not "performs bad". Marcelos' decision on
> > the topic looks pretty reasonable to me...
>
> What's your setup ? Are you in SMP ?
SMP PIII 1.4 GHz, dual Adaptec AIC-7899P U160/m (rev 01)
> I was hit by a lock bug introduced near
> 6.2.30, which Justin fixed recently and included in his latest driver
> (20030502). Justin suggested to me to try the NMI watchdog to find what was
> wrong and it pointed us to a spinlock problem. Have you tried to debug
> something ?
I cannot say which version of the driver it was, the only thing I can tell you
is that the archive was called aic79xx-linux-2.4-20030410-tar.gz.
> I must say that this driver seems really robust now on my setup
> (dual athlon), but perhaps your problem is of the same order and could be
> fixed easily with some help, which would be good for you and everyone else.
I can't tell; the basic problem in my setup is that it seems virtually impossible
to bring some 100GB of data onto a streamer connected to the above aic. It
crashes almost every day with a freeze and no oops or other message. For the
moment I am willing to await 2.4.21 and see, and if that does not solve it, then
I will probably go back to a dual Symbios controller, which I used before and
never had any glitches with.
This is a production system and not particularly suited to a lot of debugging
and the corresponding downtime.
Regards,
Stephan
On Fri, May 09, 2003 at 03:02:07PM +0200, Stephan von Krawczynski wrote:
> I cannot say which version of the driver it was, the only thing I can tell you
> is that the archive was called aic79xx-linux-2.4-20030410-tar.gz.
That's really interesting, because I have had the bug since around this version
(20030417 IIRC), and it locked up only on SMP, sometimes during boot, or
during heavy disk accesses caused by "updatedb" and "make -j dep". It's
fixed in 20030502 from http://people.freebsd.org/~gibbs/linux/SRC/
> I can't tell, basic problem in my setup is that it seems virtually impossible
> to bring some 100GB of data onto a streamer connected to the above aic. It
> crashes almost every day with a freeze and no oops or other message.
I had the same symptom, which is very frustrating, I agree. I even had
difficulty catching the NMI watchdog output, which was often truncated.
> I am at the moment willing to await 2.4.21 and see, and if that does not solve it,
Well, would you at least agree to retest the current version from the above URL?
I find it a bit of a shame that the driver was reverted at the -rc stage.
Marcelo, do you have some information about the setups of the people who reported
hangs to you? Perhaps we could even ask them to confirm that Justin's updated
driver fixes their problems?
> This is a system in production and not particularly useful for debugging a lot
> and correspoding downtime.
I certainly can understand ;-)
Regards,
Willy
On Fri, 9 May 2003 15:27:57 +0200
Willy Tarreau <[email protected]> wrote:
> On Fri, May 09, 2003 at 03:02:07PM +0200, Stephan von Krawczynski wrote:
>
> > I cannot say which version of the driver it was, the only thing I can tell
> > you is that the archive was called aic79xx-linux-2.4-20030410-tar.gz.
>
> That's really interesting, because I got the bug since around this version
> (20030417 IIRC), and it locked up only on SMP, sometimes during boot, or
> during heavy disk accesses caused by "updatedb" and "make -j dep". It's
> fixed in 20030502 from http://people.freebsd.org/~gibbs/linux/SRC/
I tried to merge the latest aic archive into 2.4.21-rc2; besides the "usual"
signed/unsigned warnings I got this one:
aic7xxx_osm.c: In function `ahc_linux_map_seg':
aic7xxx_osm.c:770: warning: integer constant is too large for "long" type
FYI
--
Regards,
Stephan
On Fri, 9 May 2003 15:27:57 +0200
Willy Tarreau <[email protected]> wrote:
> Well, would you at least agree to retest current version from the above URL ?
> I find it a bit of a shame that the driver goes back in -rc stage.
Ok, I can tell you at least this: it boots. Just did it. I can tell tomorrow
how it behaves with my specific problem.
This is a setup with 2.4.21-rc2 and aic79xx-linux-2.4-20030502-tar.gz.
--
Regards,
Stephan
On Fri, May 09, 2003 at 04:11:06PM +0200, Stephan von Krawczynski wrote:
> On Fri, 9 May 2003 15:27:57 +0200
> Willy Tarreau <[email protected]> wrote:
>
> > Well, would you at least agree to retest current version from the above URL ?
> > I find it a bit of a shame that the driver goes back in -rc stage.
>
> Ok, I can tell you at least this: it boots. Just did it. I can tell tomorrow
> how it behaves with my specific problem.
Thanks for having tried ;-)
Willy
On Fri, May 09, 2003 at 03:46:37PM +0200, Stephan von Krawczynski wrote:
> On Fri, 9 May 2003 15:27:57 +0200
> Willy Tarreau <[email protected]> wrote:
>
> > On Fri, May 09, 2003 at 03:02:07PM +0200, Stephan von Krawczynski wrote:
> >
> > > I cannot say which version of the driver it was, the only thing I can tell
> > > you is that the archive was called aic79xx-linux-2.4-20030410-tar.gz.
> >
> > That's really interesting, because I got the bug since around this version
> > (20030417 IIRC), and it locked up only on SMP, sometimes during boot, or
> > during heavy disk accesses caused by "updatedb" and "make -j dep". It's
> > fixed in 20030502 from http://people.freebsd.org/~gibbs/linux/SRC/
>
> I tried to merge the latest aic archive into 2.4.21-rc2, besides the "usual"
> signed/unsigned warnings I got this one:
>
> aic7xxx_osm.c: In function `ahc_linux_map_seg':
> aic7xxx_osm.c:770: warning: integer constant is too large for "long" type
Good catch, but in fact it's this line which worries me more:
758: if ((addr ^ (addr + len - 1)) & ~0xFFFFFFFF) {
I don't see how ~0xFFFFFFFF can be non-zero on 32-bit archs, because addr is
a bus_addr_t, which is in turn dma_addr_t, which itself is u32. So unless I'm
missing a trick, this would mean that this code should never be executed. Perhaps
~0xFFFFFFFFULL would be more appropriate, or even a comparison against 0xFFFFFFFF,
since the crossing can be detected with u32 using the carry left by the addition.
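A quick user-space test makes the promotion issue visible (illustration only,
not the driver code; the example addresses are chosen to straddle the 4GB
boundary):

#include <stdio.h>

int main(void)
{
        unsigned long long addr = 0xFFFFF000ULL, len = 0x2000ULL;

        /* 0xFFFFFFFF has type unsigned int, so ~0xFFFFFFFF is 0 and the
         * test can never fire, whatever the width of addr */
        printf("%llx\n", (addr ^ (addr + len - 1)) & ~0xFFFFFFFF);

        /* with a 64-bit constant the boundary crossing is detected */
        printf("%llx\n", (addr ^ (addr + len - 1)) & ~0xFFFFFFFFULL);
        return 0;
}

The first line prints 0, the second prints 100000000.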
Regards,
Willy
> ull on 32 bits archs, because addr is
> a bus_addr_t which is in turn dma_addr_t which itself is u32. So unless I don't
> find the trick this would mean that this code should never be executed. Perhaps
dma_addr_t is either u32 or u64 on x86
On Fri, May 09, 2003 at 04:56:21PM +0200, Willy Tarreau wrote:
> I don't see how ~0xFFFFFFFF can be non-null on 32 bits archs, because addr is
> a bus_addr_t which is in turn dma_addr_t which itself is u32. So unless I don't
> find the trick this would mean that this code should never be executed. Perhaps
> ~0xFFFFFFFFULL would be more appropriate, or even >0xFFFFFFFF, since this can be
> detected with u32 using the carry left by the addition.
include/asm-i386/types.h line 55
#ifdef CONFIG_HIGHMEM
typedef u64 dma_addr_t;
#else
typedef u32 dma_addr_t;
#endif
-- wli
Willy Tarreau <[email protected]> writes:
|> On Fri, May 09, 2003 at 03:46:37PM +0200, Stephan von Krawczynski wrote:
|> > On Fri, 9 May 2003 15:27:57 +0200
|> > Willy Tarreau <[email protected]> wrote:
|> >
|> > > On Fri, May 09, 2003 at 03:02:07PM +0200, Stephan von Krawczynski wrote:
|> > >
|> > > > I cannot say which version of the driver it was, the only thing I can tell
|> > > > you is that the archive was called aic79xx-linux-2.4-20030410-tar.gz.
|> > >
|> > > That's really interesting, because I got the bug since around this version
|> > > (20030417 IIRC), and it locked up only on SMP, sometimes during boot, or
|> > > during heavy disk accesses caused by "updatedb" and "make -j dep". It's
|> > > fixed in 20030502 from http://people.freebsd.org/~gibbs/linux/SRC/
|> >
|> > I tried to merge the latest aic archive into 2.4.21-rc2, besides the "usual"
|> > signed/unsigned warnings I got this one:
|> >
|> > aic7xxx_osm.c: In function `ahc_linux_map_seg':
|> > aic7xxx_osm.c:770: warning: integer constant is too large for "long" type
|>
|> Good catch, but in fact, it's more this line which worries me :
|>
|> 758: if ((addr ^ (addr + len - 1)) & ~0xFFFFFFFF) {
|>
|> I don't see how ~0xFFFFFFFF can be non-null on 32 bits archs
It will always be zero, even on 64-bit archs, because ~0xFFFFFFFF is of
type unsigned int. The context doesn't matter.
Andreas.
--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
On Fri, May 09, 2003 at 05:08:03PM +0200, Arjan van de Ven wrote:
> > ull on 32 bits archs, because addr is
> > a bus_addr_t which is in turn dma_addr_t which itself is u32. So unless I don't
> > find the trick this would mean that this code should never be executed. Perhaps
>
> dma_addr_t is either u32 or u64 on x86
Yes Arjan, but it's u64 only if CONFIG_HIGHMEM is set. So let me rephrase my
question: is this code supposed to be executed when CONFIG_HIGHMEM=n, given
that (u32)(~0xFFFFFFFF) = 0?
Regards,
Willy
On Fri, 9 May 2003 16:57:38 +0200
Willy Tarreau <[email protected]> wrote:
> On Fri, May 09, 2003 at 04:11:06PM +0200, Stephan von Krawczynski wrote:
> > On Fri, 9 May 2003 15:27:57 +0200
> > Willy Tarreau <[email protected]> wrote:
> >
> > > Well, would you at least agree to retest current version from the above
> > > URL ? I find it a bit of a shame that the driver goes back in -rc stage.
> >
> > Ok, I can tell you at least this: it boots. Just did it. I can tell
> > tomorrow how it behaves with my specific problem.
>
> Thanks for having tried ;-)
Hello all,
I have tried 2.4.21-rc2 with aic79xx-linux-2.4-20030502-tar.gz for three days
now and have to say it performs well. I have had no more freezes and nothing
weird has happened. Everything is smooth and OK. This is the best behaviour I
have seen across all the 2.4.21-X versions tested.
Thanks a lot.
I will proceed with further stress tests...
Regards,
Stephan
On Monday 12 May 2003 11:02, Stephan von Krawczynski wrote:
> I have tried 2.4.21-rc2 with aic79xx-linux-2.4-20030502-tar.gz for three
> days now and have to say it performs well. I had no freezes any more and
> nothing weird happening. Everything is smooth and ok. This is the best
> performance I have seen comparing all 2.4.21-X versions tested.
>
> Thanks a lot.
Same here. No problems at all.
ciao, Marc
Hi All,
On Mon, May 12, 2003 at 11:02:18AM +0200, Stephan von Krawczynski wrote:
> I have tried 2.4.21-rc2 with aic79xx-linux-2.4-20030502-tar.gz for three days
> now and have to say it performs well. I had no freezes any more and nothing
> weird happening. Everything is smooth and ok. This is the best performance I
> have seen comparing all 2.4.21-X versions tested.
Same here, it seems rock solid on my dual Athlon and has survived several
hours of 5 simultaneous "make -j 8 bzImage modules" runs with swapping. Definitely
the most stable for me since I switched from Doug's to Justin's driver.
Marcelo, would it be unreasonable to include it in -rc3? After all, it would
not be a radical update, since it was removed from -rc2 - just a few bug fixes.
What do you think?
Regards,
Willy
On Mon, 12 May 2003 11:02:18 +0200
Stephan von Krawczynski <[email protected]> wrote:
> On Fri, 9 May 2003 16:57:38 +0200
> Willy Tarreau <[email protected]> wrote:
>
> > On Fri, May 09, 2003 at 04:11:06PM +0200, Stephan von Krawczynski wrote:
> > > On Fri, 9 May 2003 15:27:57 +0200
> > > Willy Tarreau <[email protected]> wrote:
> > >
> > > > Well, would you at least agree to retest current version from the above
> > > > URL ? I find it a bit of a shame that the driver goes back in -rc
> > > > stage.
> > >
> > > Ok, I can tell you at least this: it boots. Just did it. I can tell
> > > tomorrow how it behaves with my specific problem.
> >
> > Thanks for having tried ;-)
>
> Hello all,
>
> I have tried 2.4.21-rc2 with aic79xx-linux-2.4-20030502-tar.gz for three days
> now and have to say it performs well. I had no freezes any more and nothing
> weird happening. Everything is smooth and ok. This is the best performance I
> have seen comparing all 2.4.21-X versions tested.
>
> Thanks a lot.
>
> I will proceed with further stress tests...
Ok, I managed to crash the test machine after 14 days now. The crash itself
is exactly like with former 2.4.21-X: it just freezes, no oops, nothing. It looks
like things got better, but the problem is not solved.
Regards,
Stephan
> Ok. I managed to crash the tested machine after 14 days now. The crash itself
> is exactly like former 2.4.21-X. It just freezes, no oops no nothing. It looks
> like things got better, but not solved.
What is telling you that the freeze is SCSI related? Are you running
with the nmi watchdog and have a trace? Do you have driver messages
that you aren't sharing?
--
Justin
On Fri, 23 May 2003 06:58:41 -0600
"Justin T. Gibbs" <[email protected]> wrote:
> > Ok. I managed to crash the tested machine after 14 days now. The crash
> > itself is exactly like former 2.4.21-X. It just freezes, no oops no
> > nothing. It looks like things got better, but not solved.
>
> What is telling you that the freeze is SCSI related? Are you running
> with the nmi watchdog and have a trace? Do you have driver messages
> that you aren't sharing?
Hello Justin,
to make that clear: I am in no way sure _what_ is causing the problem. I am
only updating the (very little) information I gave/could give during the last weeks.
From looking at how things have gone I would say your driver patch (URL already sent
several times) made things better. This obviously does not mean that the
kernel-included aic driver is the sole cause of the trouble.
I am in fact very pleased that rc2/aic-20030502 made things quite noticeably
better than every 21-rc/pre before.
What I am giving is positive feedback, but I have as few logs for it as I had
for the very negative feedback I sent a while ago.
Anyway, I am continuing with stress-tests on rc3/aic-20030520.
Regards,
Stephan
On Fri, 23 May 2003, Stephan von Krawczynski wrote:
> On Mon, 12 May 2003 11:02:18 +0200
> Stephan von Krawczynski <[email protected]> wrote:
>
> > On Fri, 9 May 2003 16:57:38 +0200
> > Willy Tarreau <[email protected]> wrote:
> >
> > > On Fri, May 09, 2003 at 04:11:06PM +0200, Stephan von Krawczynski wrote:
> > > > On Fri, 9 May 2003 15:27:57 +0200
> > > > Willy Tarreau <[email protected]> wrote:
> > > >
> > > > > Well, would you at least agree to retest current version from the above
> > > > > URL ? I find it a bit of a shame that the driver goes back in -rc
> > > > > stage.
> > > >
> > > > Ok, I can tell you at least this: it boots. Just did it. I can tell
> > > > tomorrow how it behaves with my specific problem.
> > >
> > > Thanks for having tried ;-)
> >
> > Hello all,
> >
> > I have tried 2.4.21-rc2 with aic79xx-linux-2.4-20030502-tar.gz for three days
> > now and have to say it performs well. I had no freezes any more and nothing
> > weird happening. Everything is smooth and ok. This is the best performance I
> > have seen comparing all 2.4.21-X versions tested.
> >
> > Thanks a lot.
> >
> > I will proceed with further stress tests...
>
> Ok. I managed to crash the tested machine after 14 days now. The crash itself
> is exactly like former 2.4.21-X. It just freezes, no oops no nothing. It looks
> like things got better, but not solved.
>
What about rc3?
On Fri, 23 May 2003 15:30:33 -0300 (BRT)
Marcelo Tosatti <[email protected]> wrote:
> What about rc3?
I will inform you if anything bad happens :-)
rc3+aic20030520 tests started today.
Regards,
Stephan
Hello !
On Fri, May 23, 2003 at 06:58:41AM -0600, Justin T. Gibbs wrote:
> > Ok. I managed to crash the tested machine after 14 days now. The crash itself
> > is exactly like former 2.4.21-X. It just freezes, no oops no nothing. It looks
> > like things got better, but not solved.
>
> What is telling you that the freeze is SCSI related? Are you running
> with the nmi watchdog and have a trace? Do you have driver messages
> that you aren't sharing?
Stephan,
Justin is right, you should run it with the NMI watchdog, in the hope of
finding something useful. If it hangs again in 14 days, you won't know why and
that may be frustrating. With the NMI watchdog you at least have a chance to
see where it locks up, and you may find it to be within the driver, which would
help Justin stabilize it, or within some other kernel subsystem.
I had to use nmi_watchdog=2 at boot time, but other people use 1.
Regards,
Willy
On Fri, 23 May 2003 21:57:57 +0200
Willy Tarreau <[email protected]> wrote:
> Hello !
>
> On Fri, May 23, 2003 at 06:58:41AM -0600, Justin T. Gibbs wrote:
> > > Ok. I managed to crash the tested machine after 14 days now. The crash
> > > itself is exactly like former 2.4.21-X. It just freezes, no oops no
> > > nothing. It looks like things got better, but not solved.
> >
> > What is telling you that the freeze is SCSI related? Are you running
> > with the nmi watchdog and have a trace? Do you have driver messages
> > that you aren't sharing?
>
> Stephen,
>
> Justin is right, you should run it through the NMI watchdog, in the hope to
> find something useful. If it hangs again in 14 days, you won't know why and
> that may be frustrating. With the NMI watchdog, you at least have a chance to
> see where it locks up, and you may find it to be within the driver, which
> would help Justin stabilize it, or within any other kernel subsystem.
>
> I had to use nmi_watchdog=2 at boot time, but other people use 1.
>
> Regards,
> Willy
Hello Willy,
I will do that, but I am not so confident about this, because the box runs X,
and console oops output from the NMI watchdog may well be neither visible nor
written to disk.
Regards,
Stephan
On Sat, May 24, 2003 at 12:52:52PM +0200, Stephan von Krawczynski wrote:
> On Fri, 23 May 2003 21:57:57 +0200
> Willy Tarreau <[email protected]> wrote:
>
> > Hello !
> >
> > On Fri, May 23, 2003 at 06:58:41AM -0600, Justin T. Gibbs wrote:
> > > > Ok. I managed to crash the tested machine after 14 days now. The crash
> > > > itself is exactly like former 2.4.21-X. It just freezes, no oops no
> > > > nothing. It looks like things got better, but not solved.
> > >
> > > What is telling you that the freeze is SCSI related? Are you running
> > > with the nmi watchdog and have a trace? Do you have driver messages
> > > that you aren't sharing?
> >
> > Stephen,
> >
> > Justin is right, you should run it through the NMI watchdog, in the hope to
> > find something useful. If it hangs again in 14 days, you won't know why and
> > that may be frustrating. With the NMI watchdog, you at least have a chance to
> > see where it locks up, and you may find it to be within the driver, which
> > would help Justin stabilize it, or within any other kernel subsystem.
> >
> > I had to use nmi_watchdog=2 at boot time, but other people use 1.
> >
> > Regards,
> > Willy
>
> Hello Willy,
>
> I will do that, but I am not so confident about this, because the box runs X
> and a console oops output from nmi may as well not be visible nor written to
> disk.
OK, I understand. Other options are: a serial console (worked for me after
several retries), remote syslogd (sometimes works if the system can still
schedule a bit), or patches such as netconsole, which sends the logs to a
remote host, and kmsgdump, which tries to get them onto a floppy after a
panic or a forced dump.
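For reference, a minimal remote-syslog setup (assuming sysklogd; "loghost" is
just a placeholder name): on the crashing box add a line like

*.*	@loghost

to /etc/syslog.conf (use tabs between the selector and the action), and start
syslogd with the -r option on the receiving host so that it accepts messages
from the network.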
Regards,
Willy
On Sat, 24 May 2003 13:16:08 +0200
Willy Tarreau <[email protected]> wrote:
> > Hello Willy,
> >
> > I will do that, but I am not so confident about this, because the box runs
> > X and a console oops output from nmi may as well not be visible nor written
> > to disk.
>
> OK, I understand. Other options are : serial console (worked for me after
> several retries), remote syslogd (sometimes works if the system can still
> schedule a bit), or patches such as netconsole, which sends the logs to a
> remote host, and kmsgdump which tries to get them onto a floppy after a
> panic or a forced dump.
>
> Regards,
> Willy
Hello all,
it did not take very long for rc3+aic20030520 to freeze - exactly one day.
Though I used nmi_watchdog there is no presentable output. As I expected, the
screen is simply black and there are no messages in any logfiles.
Again it froze while tar-ing about 80 GB of data onto an aic-driven SDLT. The data
comes from an IDE drive connected to a 3ware 7500-8 (though in no RAID
configuration).
I conclude that rc2+aic20030502 was way better.
Ah yes, one more thing: I can ping the box, but keyboard, mouse and display are
dead and normally running processes have stopped (like snmp).
Willy: I am willing to try a serial console setup (as it does not interfere
with X). I have tried this before with no luck. Can you provide some hints on how
you got that working (yes, I read Documentation/serial-console.txt, but I could
not manage to get any output on the serial line)?
Regards,
Stephan
Hello !
On Sun, May 25, 2003 at 12:58:11PM +0200, Stephan von Krawczynski wrote:
> it did not take really long for rc3+aic20030520 to freeze - exactly one day.
Well, in some ways, it will be easier to debug it than when it took 14 days, if
it's the same bug, of course.
> Though I used nmi_watchdog there are no presentable outputs. As I expected the
> screen simply is black and no messages are in any logfiles.
> Again it froze while tar-ing about 80 GB of data onto an aic-driven SDLT. Data
> is coming from IDE drive connected to a 3ware 7500-8 (though no raid
> configuration).
OK, so there's a high probability that the problem is related to either SCSI or
IDE (or both), and less likely involves any other part.
> Ah yes, one more thing: I can ping the box, but keyboard, mouse, display is
> dead and usually working processes stopped (like snmp).
That's surprising, mine was completely dead IIRC. It's as if it doesn't schedule
anymore but still processes interrupts. I don't know whether a deadlock can cause
this behaviour.
> Willy: I am willing to try a serial console setup (as it does not interfere
> with X). I have tried this before with no luck. Can you provide some hints how
> you got that working (yes, I read Documentation/serial-console.txt, but I could
> not manage any output on the serial line).
I had to try several times, because the freeze was so sudden that I often
caught only a few chars. Justin didn't even believe me. First, you have to
check that CONFIG_SERIAL_CONSOLE is enabled. After that, you'll need a remote
console which can work at high speeds (I could get interesting results at 38400
bps). Surprisingly, above that I got mangled output. Perhaps my cable wasn't good
enough (flat Cisco RJ45 console cable). I also disabled hard and soft flow
control. But as I already stated, in my case it was easier because it froze
every 2-3 boots, and when it didn't I only had to start a "make -j dep" to get
it. So if it froze with no messages, I simply hit the reset button and tried
again. It seems more complicated in your case (although your big tar may
help).
When your setup seems OK, you should test it to be sure. I often use "mdir"
with nothing in the drive, or AltGr-SysRq-P to get console messages. If you
don't see anything on your serial console, then your setup is not ready yet
for a test.
Oh, and by the way, if you're using modules, you may find it useful to keep
copies of the lsmod output and /proc/ksyms, to get more accurate decoding with
a later ksymoops run.
If you really cannot catch anything, I suggest one of these solutions:
- apply the netconsole patch and have a Linux box on the same LAN running the
netconsole server. You can find it in -aa kernels, for example.
- apply the kmsgdump patch, only if you have a floppy drive or a parallel
printer. It will try to reset the system after a panic, and use BIOS calls
to write the kernel message buffer to the media. This usually works, but
there are some corner cases where it doesn't. But it's easy to try with
AltGr-SysRq-D. Download it from http://w.ods.org/tools/kmsgdump/
Good luck !
Willy
On Sunday 25 May 2003 12:58, Stephan von Krawczynski wrote:
Hi Stephan,
> Though I used nmi_watchdog there are no presentable outputs. As I expected
> the screen simply is black and no messages are in any logfiles.
> Again it froze while tar-ing about 80 GB of data onto an aic-driven SDLT.
> Data is coming from IDE drive connected to a 3ware 7500-8 (though no raid
> configuration).
>
> I conclude that rc2+aic20030502 was way better.
> Ah yes, one more thing: I can ping the box, but keyboard, mouse, display is
> dead and usually working processes stopped (like snmp).
> Willy: I am willing to try a serial console setup (as it does not interfere
> with X). I have tried this before with no luck. Can you provide some hints
> how you got that working (yes, I read Documentation/serial-console.txt, but
> I could not manage any output on the serial line).
before trying this, could you please update to aic20030523? Thank you.
ciao, Marc
On Sun, 25 May 2003 14:47:56 +0200
Marc-Christian Petersen <[email protected]> wrote:
> On Sunday 25 May 2003 12:58, Stephan von Krawczynski wrote:
>
> Hi Stephan,
> before trying this, could you please update to aic20030523? Thank you.
Is there a changelog somewhere? What is the difference between 20030520 and 20030523 ?
Regards,
Stephan
On Sun, 25 May 2003, Stephan von Krawczynski wrote:
> Is there a changelog somewhere? What is the difference between 20030520
> and 20030523 ?
See drivers/scsi/aic7xxx/CHANGELOG
Geller Sandor <[email protected]>
On Sunday 25 May 2003 15:50, Stephan von Krawczynski wrote:
Hi Stephan,
> > before trying this, could you please update to aic20030523? Thank you.
> Is there a changelog somewhere? What is the difference between 20030520 and
> 20030523 ?
Yes, there is a changelog. Unfortunately it's only in the tar.gz package, because
the one on Justin's website isn't up to date. I've made it available on my website.
http://wolk.sf.net/tmp/AIC-CHANGELOG
ciao, Marc
> Willy: I am willing to try a serial console setup (as it does not interfere
> with X).
Are you still running all of your tests with X up? You then have no chance
of getting any useful diagnostics without a serial console. Can't you switch
back to a vty while the test is running?
>I have tried this before with no luck. Can you provide some hints how
> you got that working (yes, I read Documentation/serial-console.txt, but
> I could not manage any output on the serial line).
You will need a null modem cable. Configure a kernel with serial console
support enabled. Use a fairly high speed for your console (115200). To
enable your first serial port as a console, add something like the following
to your kernel command line:
console=ttyS0,115200 console=tty0
This will retain console output on the local VGA console too.
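Something like the following LILO stanza would do it (image path, label and
root device are placeholders; with GRUB, put the same parameters on the kernel
line; nmi_watchdog=2 is the option Willy mentioned earlier):

image=/boot/vmlinuz-2.4.21-rc7
        label=serial-test
        root=/dev/sda1
        read-only
        append="console=ttyS0,115200 console=tty0 nmi_watchdog=2"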
--
Justin
On Sun, 25 May 2003 14:47:56 +0200
Marc-Christian Petersen <[email protected]> wrote:
> On Sunday 25 May 2003 12:58, Stephan von Krawczynski wrote:
>
> Hi Stephan,
> before trying this, could you please update to aic20030523? Thank you.
>
>
> ciao, Marc
Hello Marc,
I did this. The combination rc3+aic20030523 survived the first day of tests. So
it seems at least better than rc3+aic20030520.
I'll keep you informed.
Regards,
Stephan
On Mon, May 26, 2003 at 05:00:58PM +0200, Stephan von Krawczynski wrote:
> On Sun, 25 May 2003 14:47:56 +0200
> Marc-Christian Petersen <[email protected]> wrote:
>
> > On Sunday 25 May 2003 12:58, Stephan von Krawczynski wrote:
> >
> > Hi Stephan,
> > before trying this, could you please update to aic20030523? Thank you.
> >
> >
> > ciao, Marc
>
> Hello Marc,
>
> I did this. The combination rc3+aic20030523 survived the first day of tests. So
> it seems at least better than rc3+aic20030520.
The same has been running on my Alpha since yesterday evening on a 54GB raid0
which I transformed to raid5 (39 GB backed up to IDE ; mkraid ; 39GB restored).
Still alive.
Cheers,
Willy
Hello Marcelo,
I tried plain rc6 now and have to tell you it does not survive a single day of
my usual tests. It freezes during a tar from 3ware-driven IDE to the aic-driven SDLT.
This is identical to all previous rc (and some pre) releases of 2.4.21. So far
I can tell you that the only thing that has recently cured this problem is
replacing the aic driver with the latest of Justin's releases.
As plain rc6 definitely does not work I will now switch over to
rc6+aic-20030523. Remember that rc3+aic-20030523 already worked quite OK (it
survived a 4-day test).
My personal opinion is that a known-to-be-broken 2.4.21 should not be released, as a
lot of people only try/use the releases, and therefore an immediately released
2.4.22-pre1 with Justin's driver would not be a good solution.
Regards,
Stephan
On Friday 30 May 2003 10:09, Stephan von Krawczynski wrote:
Hi Stephan,
> I tried plain rc6 now and have to tell you it does not survive a single day
> of my usual tests. It freezes during tar from 3ware-driven IDE to
> aic-driven SDLT. This is identical to all previous rc (and some pre)
> releases of 2.4.21. So far I can tell you that the only thing that has
> recently cured this problem is replacing the aic-driver with latest of
> justins' releases.
> As plain rc6 does definitely not work I will now switch over to
> rc6+aic-20030523. Remember that rc3+aic-20030523 already worked quite ok (4
> days test survived).
Same experience on my boxen (very much so, with AIC).
> My personal opinion is a known-to-be-broken 2.4.21 should not be released,
> as a lot of people only try/use the releases and therefore an immediately
> released 2.4.22-pre1 with justins driver will not be a good solution.
ACK!
Maybe we should disable the AIC Config option and instead add a comment like:
comment 'For AICXXXX, please go to http://people.freebsd.org/~gibbs/linux/'
comment 'and download the latest tar.gz and unpack these drivers!'
comment 'After unpacking, enable Config.in option in drivers/scsi/Config.in'
*scnr* ;)
ciao, Marc
> My personal opinion is a known-to-be-broken 2.4.21 should not be released, as a
> lot of people only try/use the releases and therefore an immediately released
> 2.4.22-pre1 with justins driver will not be a good solution.
I think you missed the point entirely before. Above all, 2.4.21 CANNOT cause
regressions. At this point there is no way to know whether the
thing that fixes your machine breaks hundreds of others that DO work
correctly in 2.4.20. Even if it fixed hundreds and broke one, it would still
not be acceptable for a stable kernel release.
On 30 May 2003 10:21:33 +0200
Arjan van de Ven <[email protected]> wrote:
>
>
> > My personal opinion is a known-to-be-broken 2.4.21 should not be released,
> > as a lot of people only try/use the releases and therefore an immediately
> > released 2.4.22-pre1 with justins driver will not be a good solution.
>
> I think you missed the point entirely before. 2.4.21 CANNOT cause
> regressions most of all. At this point there is no way to know if the
> thing that fixes your machine breaks on 100s others that DO work
> correctly in 2.4.20. Even if it would fix 100s and break 1 it's still
> not acceptable for stable kernel releases.
Unfortunately you miss my point (which is probably too simple to be clearly
visible):
I want to give some feedback on a topic/problem I have been experiencing for a _long_
time. I was _asked_ to do so. Additionally I am stating my _opinion_. I am _not_
telling anybody what to do. I am not in a position to do so. Very likely only
a _few_ people are in such a position, most likely the maintainer of aic and
hopefully Marcelo.
Have you read all the available bug reports Justin has received? If you have not,
don't play with numbers.
Another personal opinion: software development tends to make possible things
that "cannot be". ;-)
Regards,
Stephan
On Fri, May 30, 2003 at 10:09:00AM +0200, Stephan von Krawczynski wrote:
> Hello Marcelo,
>
> I tried plain rc6 now and have to tell you it does not survive a single day of
> my usual tests. It freezes during tar from 3ware-driven IDE to aic-driven SDLT.
> This is identical to all previous rc (and some pre) releases of 2.4.21. So far
> I can tell you that the only thing that has recently cured this problem is
> replacing the aic-driver with latest of justins' releases.
> As plain rc6 does definitely not work I will now switch over to
> rc6+aic-20030523. Remember that rc3+aic-20030523 already worked quite ok (4
> days test survived).
Also, does the aic7xxx_old driver work for you?
The "old" part is only in regards to lack of support for very-new
aic7xxx hardware.
Jeff
On Fri, May 30, 2003 at 10:09:00AM +0200, Stephan von Krawczynski wrote:
> Hello Marcelo,
>
> I tried plain rc6 now and have to tell you it does not survive a single day of
> my usual tests. It freezes during tar from 3ware-driven IDE to aic-driven SDLT.
> This is identical to all previous rc (and some pre) releases of 2.4.21. So far
> I can tell you that the only thing that has recently cured this problem is
> replacing the aic-driver with latest of justins' releases.
So Justin's driver fixes your 3ware problems???
And exactly what -rc/-pre release stopped working for you?
Jeff
On Fri, 30 May 2003 09:34:56 -0400
Jeff Garzik <[email protected]> wrote:
> On Fri, May 30, 2003 at 10:09:00AM +0200, Stephan von Krawczynski wrote:
> > Hello Marcelo,
> >
> > I tried plain rc6 now and have to tell you it does not survive a single day
> > of my usual tests. It freezes during tar from 3ware-driven IDE to
> > aic-driven SDLT. This is identical to all previous rc (and some pre)
> > releases of 2.4.21. So far I can tell you that the only thing that has
> > recently cured this problem is replacing the aic-driver with latest of
> > justins' releases.
>
> So Justin's driver fixes your 3ware problems???
This is _not_ a 3ware problem. As I told you, the data comes from the 3ware and goes to
the aic. The problem occurs when using the plain in-kernel aic driver and is gone when
using Justin's latest releases.
As long as we do nothing with the aic driver there is no problem at all (the 3ware
works fine here).
> And exactly what -rc/-pre release stopped working for you?
Very good question. I can check, but I need one day per version to do so. It
may well be that in fact none of the pre/rc releases worked; we have had this box
since about pre3 and to my knowledge we have always had the problem. Boy, we were
quite happy when we found out that Justin's stuff got it going - it had already
gotten on our nerves quite a bit ;-)
If you want to know about a specific kernel release, just tell me and I will
try it.
Maybe I should repeat the details of the test setup, as not everyone may remember
them in this long-lasting thread.
Basically the problem seldom arises right after booting. I have the impression that
this in fact got better over the releases; earlier pre's froze earlier.
What we do:
1) copy around 50-100 GB of data via NFS to a 3ware drive (always works well)
2) tar this data on the NFS server from the 3ware drive to the aic-driven SDLT
(Quantum)
3) verify the archived data via tar
Freezes happen during 2) or 3). If you reboot after 1) they are very rare, never
on any later rc release.
As this whole thing takes time we do it overnight and have a look at the box
the next morning. Not a single plain release is OK the next morning; checking
the logs we find that it froze in 2) or 3).
If you do exactly the same thing on exactly the same box with exactly the same
data, but with Justin's driver, everything is OK (aic-20030523). It was not OK with
aic-20030520 (just to mention this); aic-20030502 was quite OK (it survived 14 days).
What else can I tell you?
Regards,
Stephan
Hello all,
It took some days to produce output for my freezing problem. This one is rc7+aic20030603:
Jun 5 16:53:55 admin kernel: Unable to handle kernel paging request at virtual address 8e30a7c5
Jun 5 16:53:55 admin kernel: printing eip:
Jun 5 16:53:55 admin kernel: c013755e
Jun 5 16:53:55 admin kernel: *pde = 00000000
Jun 5 16:53:55 admin kernel: Oops: 0000
Jun 5 16:53:55 admin kernel: CPU: 0
Jun 5 16:53:55 admin kernel: EIP: 0010:[kmem_cache_alloc_batch+78/272] Not tainted
Jun 5 16:53:55 admin kernel: EIP: 0010:[<c013755e>] Not tainted
Jun 5 16:53:55 admin kernel: EFLAGS: 00010006
Jun 5 16:53:55 admin kernel: eax: e62d70eb ebx: e62d70eb ecx: f57ae401 edx: 00000020
Jun 5 16:53:55 admin kernel: esi: 00000043 edi: 0000003a ebp: c342b060 esp: e5e63a28
Jun 5 16:53:55 admin kernel: ds: 0018 es: 0018 ss: 0018
Jun 5 16:53:55 admin kernel: Process tar (pid: 7112, stackpage=e5e63000)
Jun 5 16:53:55 admin kernel: Stack: c342b068 c342b070 c342b060 00000246 00000020 e7420000 c01382eb c342b060
Jun 5 16:53:55 admin kernel: c3461000 00000020 00000000 c342bdb8 00000000 e7420000 c013749c c342b060
Jun 5 16:53:55 admin kernel: 00000020 d3d05ec0 00000003 00000020 c342bdb8 00000246 00000020 e5e63b14
Jun 5 16:53:55 admin kernel: Call Trace: [__kmem_cache_alloc+107/304] [kmem_cache_grow+508/624] [__kmem_cache_alloc+125/304] [get_mem_for_virtual_node+87/224] [fix_nodes+198/1008]
Jun 5 16:53:55 admin kernel: Call Trace: [<c01382eb>] [<c013749c>] [<c01382fd>] [<c01846a7>] [<c0184bc6>]
Jun 5 16:53:55 admin kernel: [reiserfs_paste_into_item+147/304] [reiserfs_get_block+1989/4800] [bh_action+106/112] [tasklet_hi_action+83/160] [smp_apic_timer_interrupt+264/304] [.text.lock.buffer+191/610]
Jun 5 16:53:55 admin kernel: [<c0191ae3>] [<c017cca5>] [<c012252a>] [<c01223b3>] [<c0115d88>] [<c01474bd>]
Jun 5 16:53:55 admin kernel: [getblk+109/128] [is_tree_node+100/112] [search_by_key+1824/3792] [__block_prepare_write+479/880] [block_prepare_write+51/144] [reiserfs_get_block+0/4800]
Jun 5 16:53:55 admin kernel: [<c014447d>] [<c018e8f4>] [<c018f020>] [<c014503f>] [<c0145a23>] [<c017c4e0>]
Jun 5 16:53:55 admin kernel: [generic_file_write+970/2128] [reiserfs_get_block+0/4800] [sys_write+155/384] [system_call+51/56]
Jun 5 16:53:55 admin kernel: [<c013397a>] [<c017c4e0>] [<c0141d8b>] [<c010782f>]
Jun 5 16:53:55 admin kernel:
Jun 5 16:53:55 admin kernel: Code: 8b 44 81 18 0f af da 8b 51 0c 89 41 14 01 d3 40 0f 84 89 00
Does this help?
Regards,
Stephan
On Thu, Jun 05, 2003 at 05:05:51PM +0200, Stephan von Krawczynski wrote:
> Hello all,
>
> It took some days to produce output for my freezing problem. This one is rc7+aic20030603:
Good!
It seems that it crashed in the reiserfs code rather than in aic7xxx! Perhaps
you hit two different bugs, or perhaps there's a race that only newer code can
trigger, or there's a leak somewhere. You may want to forward the oops to the
reiserfs team too.
> Jun 5 16:53:55 admin kernel: Call Trace: [<c01382eb>] [<c013749c>] [<c01382fd>] [<c01846a7>] [<c0184bc6>]
> Jun 5 16:53:55 admin kernel: [reiserfs_paste_into_item+147/304] [reiserfs_get_block+1989/4800] [bh_action+106/112] [tasklet_hi_action+83/160] [smp_apic_timer_interrupt+264/304] [.text.lock.buffer+191/610]
> Jun 5 16:53:55 admin kernel: [<c0191ae3>] [<c017cca5>] [<c012252a>] [<c01223b3>] [<c0115d88>] [<c01474bd>]
> Jun 5 16:53:55 admin kernel: [getblk+109/128] [is_tree_node+100/112] [search_by_key+1824/3792] [__block_prepare_write+479/880] [block_prepare_write+51/144] [reiserfs_get_block+0/4800]
> Jun 5 16:53:55 admin kernel: [<c014447d>] [<c018e8f4>] [<c018f020>] [<c014503f>] [<c0145a23>] [<c017c4e0>]
> Jun 5 16:53:55 admin kernel: [generic_file_write+970/2128] [reiserfs_get_block+0/4800] [sys_write+155/384] [system_call+51/56]
> Jun 5 16:53:55 admin kernel: [<c013397a>] [<c017c4e0>] [<c0141d8b>] [<c010782f>]
> Jun 5 16:53:55 admin kernel:
> Jun 5 16:53:55 admin kernel: Code: 8b 44 81 18 0f af da 8b 51 0c 89 41 14 01 d3 40 0f 84 89 00
Cheers and thanks for the test !
Willy
Hello!
On Thu, Jun 05, 2003 at 08:14:23PM +0200, Willy Tarreau wrote:
> > It took some days to produce output for my freezing problem. This one is rc7+aic20030603:
> Good !
> It seems that it crashed in the reiserfs code rather than in aic7xxx ! perhaps
> you hit 2 different bugs, or perhaps there's a race that only newer code can
> trigger, or there's a leak somewhere. You may want to forward the oops to the
> reiserfs team too.
No, it crashed in the allocation code (you skipped one trace line):
Jun 5 16:53:55 admin kernel: Call Trace: [__kmem_cache_alloc+107/304] [kmem_cache_grow+508/624] [__kmem_cache_alloc+125/304]
+[get_mem_for_virtual_node+87/224] [fix_nodes+198/1008]
And the EIP is in kmem_cache_alloc_batch; it sounds like it tripped over a bad pointer or something like that.
So it seems something is corrupting the slab lists.
Bye,
Oleg
On Fri, 6 Jun 2003 12:17:12 +0400
Oleg Drokin <[email protected]> wrote:
> Hello!
>
> On Thu, Jun 05, 2003 at 08:14:23PM +0200, Willy Tarreau wrote:
> > > It took some days to produce output for my freezing problem. This one is
> > > rc7+aic20030603:
> > Good !
> > It seems that it crashed in the reiserfs code rather than in aic7xxx !
> > perhaps you hit 2 different bugs, or perhaps there's a race that only newer
> > code can trigger, or there's a leak somewhere. You may want to forward the
> > oops to the reiserfs team too.
>
> No, it did crashed in allocation code (you skipped one trace line):
> Jun 5 16:53:55 admin kernel: Call Trace: [__kmem_cache_alloc+107/304]
> [kmem_cache_grow+508/624]
> [__kmem_cache_alloc+125/304]+[get_mem_for_virtual_node+87/224]
> [fix_nodes+198/1008]
>
> And the EIP is in kmem_cache_alloc_batch, sounds like it tripped on bad
> pointer or something like this. So something is corrupting slab lists it
> seems.
>
> Bye,
> Oleg
I agree with you. The only problem is: how can I find out what caused it?
The only thing I can tell you is that the box never hangs when using only hard
disks on the aic & 3ware controllers. As soon as I start to use an SDLT drive
on the aic, things get fishy.
Regards,
Stephan
Hello!
On Fri, Jun 06, 2003 at 11:04:08AM +0200, Stephan von Krawczynski wrote:
> > No, it did crashed in allocation code (you skipped one trace line):
> > Jun 5 16:53:55 admin kernel: Call Trace: [__kmem_cache_alloc+107/304]
> > [kmem_cache_grow+508/624]
> > [__kmem_cache_alloc+125/304]+[get_mem_for_virtual_node+87/224]
> > [fix_nodes+198/1008]
> >
> > And the EIP is in kmem_cache_alloc_batch, sounds like it tripped on bad
> > pointer or something like this. So something is corrupting slab lists it
> > seems.
> I agree with you. Only problem is: how can I find out what caused the problem.
Probably by careful code inspection.
> The only thing I can tell is that the box never hangs when using only HDs on
> the aic & 3ware controllers. As soon as I begin to use a SDLT drive on aic
> things get fishy.
You do not have a reiserfs filesystem on the tape drive, right? ;)
But that reduces the region to review to the parts that deal with tape devices
and tape-specific stuff, it seems.
Bye,
Oleg
Hello Oleg,
while experimenting around my other problem I noticed that my box freezes for some
seconds while tar is re-creating an archive of around 70 GB on a reiserfs
on a 3ware-connected device.
This is experienced with 2.4.21-rc7. Reproducible via:
create a BIG tar archive file (in my case 70 GB) on a reiserfs;
re-create the same archive and watch the box go dead while the old archive is zapped.
(Gone dead means: mouse frozen, keyboard frozen, X frozen.)
The effect is visible for several seconds, then everything is back to normal.
It's no big deal if you are interactively dealing with the cause (tar). But if
you deal with background processes in a server environment, where your primary
process suddenly goes dead for seconds, you are probably not amused...
Can you verify this? Is it device- or fs-dependent?
Regards,
Stephan
Hello!
On Fri, Jun 06, 2003 at 05:24:54PM +0200, Stephan von Krawczynski wrote:
> while experimenting around my other problem I noticed my box freezes for some
> seconds while tar is re-creating an archive of around 70 GB size on a reiserfs
> with 3ware-connected device.
> This is experienced with 2.4.21-rc7. Reproducable via:
> create BIG tar archive file (my size 70 GB) on a reiserfs
> re-create same archive and watch box gone dead while the old archive is zapped.
> (Gone dead means: mouse froze, keyboard froze, X froze)
Hm, I will try.
Wild guess: does this patch help? (untested, not even compiled, but should be safe)
Bye,
Oleg
===== stree.c 1.21 vs edited =====
--- 1.21/fs/reiserfs/stree.c Tue Mar 4 19:48:52 2003
+++ edited/fs/reiserfs/stree.c Fri Jun 6 20:01:29 2003
@@ -1773,6 +1773,8 @@
journal_begin(th, p_s_inode->i_sb, orig_len_alloc) ;
reiserfs_update_inode_transaction(p_s_inode) ;
}
+ if (current->need_resched)
+ schedule() ;
} while ( n_file_size > ROUND_UP (n_new_file_size) &&
search_for_position_by_key(p_s_inode->i_sb, &s_item_key, &s_search_path) == POSITION_FOUND ) ;
On Fri, 2003-06-06 at 12:02, Oleg Drokin wrote:
> Hello!
>
> On Fri, Jun 06, 2003 at 05:24:54PM +0200, Stephan von Krawczynski wrote:
>
> > while experimenting around my other problem I noticed my box freezes for some
> > seconds while tar is re-creating an archive of around 70 GB size on a reiserfs
> > with 3ware-connected device.
> > This is experienced with 2.4.21-rc7. Reproducable via:
> > create BIG tar archive file (my size 70 GB) on a reiserfs
> > re-create same archive and watch box gone dead while the old archive is zapped.
> > (Gone dead means: mouse froze, keyboard froze, X froze)
>
> Hm, I will try .
>
> Wild guess: does this patch helps? (untessted, not even compiled, but should be safe )
>
There are still some latency issues with I/O in rc7; it could be a
general problem.
-chris
Hello!
On Fri, Jun 06, 2003 at 03:00:54PM -0400, Chris Mason wrote:
> There are still some latency issues with io in rc7, it could be a
> general problem.
Hm. But I would think that everything that does not need disk I/O (e.g. the mouse)
should not be affected?
Bye,
Oleg
On Fri, 2003-06-06 at 15:10, Oleg Drokin wrote:
> Hello!
>
> On Fri, Jun 06, 2003 at 03:00:54PM -0400, Chris Mason wrote:
>
> > There are still some latency issues with io in rc7, it could be a
> > general problem.
>
> Hm. But I think everything that was not needing disk io (i.e. mouse stuff)
> should not be affected?
>
It shouldn't ;-) But the problems are still not completely understood.
This particular problem could still be reiserfs, it's hard to say right
now.
-chris
> -----Original Message-----
> From: Oleg Drokin [mailto:[email protected]]
> Sent: Friday, June 06, 2003 3:10 PM
> To: Chris Mason
> Cc: Stephan von Krawczynski; linux-kernel
> Subject: Re: short freezing while file re-creation
>
> Hello!
>
> On Fri, Jun 06, 2003 at 03:00:54PM -0400, Chris Mason wrote:
>
> > There are still some latency issues with io in rc7, it could be a
> > general problem.
>
> Hm. But I think everything that was not needing disk io (i.e. mouse
stuff)
> should not be affected?
There are still some *serious* issues on machines with large amounts of
memory and the default VM. Maybe they happen on smaller-memory
systems too, and smaller boxes just recover quicker?
On Fri, 6 Jun 2003 13:17:59 +0400
Oleg Drokin <[email protected]> wrote:
> Hello!
>
> On Fri, Jun 06, 2003 at 11:04:08AM +0200, Stephan von Krawczynski wrote:
> > > No, it did crashed in allocation code (you skipped one trace line):
> > > Jun 5 16:53:55 admin kernel: Call Trace: [__kmem_cache_alloc+107/304]
> > > [kmem_cache_grow+508/624]
> > > [__kmem_cache_alloc+125/304]+[get_mem_for_virtual_node+87/224]
> > > [fix_nodes+198/1008]
> > >
> > > And the EIP is in kmem_cache_alloc_batch, sounds like it tripped on bad
> > > pointer or something like this. So something is corrupting slab lists it
> > > seems.
> > I agree with you. Only problem is: how can I find out what caused the problem.
>
> Probably by careful code observations.
>
> > The only thing I can tell is that the box never hangs when using only HDs on
> > the aic & 3ware controllers. As soon as I begin to use a SDLT drive on aic
> > things get fishy.
>
> You do not have reiserfs filesystem on a tape drive, right? ;)
> But thhat reduces the region to review to parts thqt deal with tape devices and
> tape-specific stuff, it seems.
>
> Bye,
> Oleg
Hello all,
in the meantime I got another oops and it looks like this:
ksymoops 2.4.8 on i686 2.4.21-rc7-aic. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-rc7-aic/ (default)
-m /boot/System.map-2.4.21-rc7-aic (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Jun 8 10:48:49 linux kernel: Oops: 0000
Jun 8 10:48:49 linux kernel: CPU: 1
Jun 8 10:48:49 linux kernel: EIP: 0010:[<c013755e>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jun 8 10:48:49 linux kernel: EFLAGS: 00010006
Jun 8 10:48:49 linux kernel: eax: 5a005139 ebx: 5a005139 ecx: edb89c21 edx: 00000060
Jun 8 10:48:49 linux kernel: esi: 00000021 edi: 0000005c ebp: c342fecc esp: e4007d74
Jun 8 10:48:49 linux kernel: ds: 0018 es: 0018 ss: 0018
Jun 8 10:48:49 linux kernel: Process tar (pid: 17369, stackpage=e4007000)
Jun 8 10:48:49 linux kernel: Stack: c342fed4 c342fedc c342fecc 00000246 00000070 effa58a0 c01382eb c342fecc
Jun 8 10:48:49 linux kernel: c3467800 00000070 00000000 c1000020 effa58a0 effa58a0 c013f7d9 c342fecc
Jun 8 10:48:49 linux kernel: 00000070 00000000 c013f8a5 c349d418 f6fc1200 00000000 00000000 c1000020
Jun 8 10:48:49 linux kernel: Call Trace: [<c01382eb>] [<c013f7d9>] [<c013f8a5>] [<c01b8f73>] [<c01b929e>]
Jun 8 10:48:49 linux kernel: [<c01b936c>] [<c0145596>] [<c0139fc2>] [<c013069e>] [<c017c4e0>] [<c013124f>]
Jun 8 10:48:49 linux kernel: [<c0131531>] [<c0131ad0>] [<c0131d20>] [<c0131ad0>] [<c0141c0b>] [<c010782f>]
Jun 8 10:48:49 linux kernel: Code: 8b 44 81 18 0f af da 8b 51 0c 89 41 14 01 d3 40 0f 84 89 00
>>EIP; c013755e <kmem_cache_alloc_batch+4e/110> <=====
>>ecx; edb89c21 <_end+2d7f78e1/38547d20>
>>ebp; c342fecc <_end+309db8c/38547d20>
>>esp; e4007d74 <_end+23c75a34/38547d20>
Trace; c01382eb <__kmem_cache_alloc+6b/130>
Trace; c013f7d9 <alloc_bounce_bh+19/a0>
Trace; c013f8a5 <create_bounce+45/190>
Trace; c01b8f73 <__make_request+3d3/640>
Trace; c01b929e <generic_make_request+be/140>
Trace; c01b936c <submit_bh+4c/70>
Trace; c0145596 <block_read_full_page+2c6/2e0>
Trace; c0139fc2 <__alloc_pages+42/190>
Trace; c013069e <generic_buffer_fdatasync+5e/110>
Trace; c017c4e0 <reiserfs_get_block+0/12c0>
Trace; c013124f <generic_file_readahead+af/1a0>
Trace; c0131531 <do_generic_file_read+1c1/470>
Trace; c0131ad0 <file_read_actor+0/110>
Trace; c0131d20 <generic_file_read+140/160>
Trace; c0131ad0 <file_read_actor+0/110>
Trace; c0141c0b <sys_read+9b/180>
Trace; c010782f <system_call+33/38>
Code; c013755e <kmem_cache_alloc_batch+4e/110>
00000000 <_EIP>:
Code; c013755e <kmem_cache_alloc_batch+4e/110> <=====
0: 8b 44 81 18 mov 0x18(%ecx,%eax,4),%eax <=====
Code; c0137562 <kmem_cache_alloc_batch+52/110>
4: 0f af da imul %edx,%ebx
Code; c0137565 <kmem_cache_alloc_batch+55/110>
7: 8b 51 0c mov 0xc(%ecx),%edx
Code; c0137568 <kmem_cache_alloc_batch+58/110>
a: 89 41 14 mov %eax,0x14(%ecx)
Code; c013756b <kmem_cache_alloc_batch+5b/110>
d: 01 d3 add %edx,%ebx
Code; c013756d <kmem_cache_alloc_batch+5d/110>
f: 40 inc %eax
Code; c013756e <kmem_cache_alloc_batch+5e/110>
10: 0f 84 89 00 00 00 je 9f <_EIP+0x9f>
1 warning issued. Results may not be reliable.
This is the second oops inside kmem_cache_alloc_batch, so the problem can be considered reproducible.
This is a 2.4.21-rc7+aic20030603 kernel.
Regards,
Stephan
Hello all,
looking at code around my problem I discovered this:
static inline void * __kmem_cache_alloc (kmem_cache_t *cachep, int flags)
{
        unsigned long save_flags;
        void* objp;

        kmem_cache_alloc_head(cachep, flags);
try_again:
        local_irq_save(save_flags);
#ifdef CONFIG_SMP
        {
                cpucache_t *cc = cc_data(cachep);

                if (cc) {
                        if (cc->avail) {
                                STATS_INC_ALLOCHIT(cachep);
                                objp = cc_entry(cc)[--cc->avail];
                        } else {
                                STATS_INC_ALLOCMISS(cachep);
                                objp = kmem_cache_alloc_batch(cachep,cc,flags);
                                if (!objp)
                                        goto alloc_new_slab_nolock;
                        }
                } else {
                        spin_lock(&cachep->spinlock);
                        objp = kmem_cache_alloc_one(cachep);
                        spin_unlock(&cachep->spinlock);
                }
        }
#else
        objp = kmem_cache_alloc_one(cachep);
#endif
        local_irq_restore(save_flags);
        return objp;
alloc_new_slab:
#ifdef CONFIG_SMP
        spin_unlock(&cachep->spinlock);
alloc_new_slab_nolock:
#endif
        local_irq_restore(save_flags);
        if (kmem_cache_grow(cachep, flags))
                /* Someone may have stolen our objs. Doesn't matter, we'll
                 * just come back here again.
                 */
                goto try_again;
        return NULL;
}
I suggest it for the most-absurd-goto-usage award.
1) There seems to be no reference to the symbol "alloc_new_slab"
2) "spin_unlock" (right below it) is never reached
3) The non-ifdef'ed code below is only used if CONFIG_SMP
4) The code under "alloc_new_slab_nolock" is referenced only once, by a goto
(why not simply paste it there?)
This does not look like a problem, it is only damn ugly. I have no idea
what this code actually does, but it looks patched to the limit. Has
anybody reviewed slab with regard to CONFIG_SMP?
Regards,
Stephan
Hello author,
shoot me for the last comment regarding __kmem_cache_alloc (which means: forget
it).
Still, there is significant source code duplication between "#define
kmem_cache_alloc_one" and "void* kmem_cache_alloc_batch".
How about an exit-symbol parameter? (One alternative is sketched after this message.)
Regards,
Stephan
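[Editorial aside, not part of the thread: one way to read the "exit-symbol parameter" suggestion is to replace the label-jumping macro with a small helper that returns NULL when both slab lists are empty, which both the UP path and kmem_cache_alloc_batch() could then share. The sketch below is an illustration only; the kmem_cache_t fields and kmem_cache_alloc_one_tail() are recalled from the 2.4 slab code and may not match it exactly.]

static inline void *kmem_cache_try_alloc_one(kmem_cache_t *cachep)
{
        struct list_head *entry = cachep->slabs_partial.next;
        slab_t *slabp;

        /* Prefer a partially filled slab; fall back to a free one. */
        if (entry == &cachep->slabs_partial) {
                entry = cachep->slabs_free.next;
                if (entry == &cachep->slabs_free)
                        return NULL;    /* out of slabs: caller decides whether to grow */
                list_del(entry);
                list_add(entry, &cachep->slabs_partial);
        }
        slabp = list_entry(entry, slab_t, list);
        return kmem_cache_alloc_one_tail(cachep, slabp);
}

With a helper of this shape the alloc_new_slab / alloc_new_slab_nolock label pair in __kmem_cache_alloc could collapse into a single "if (!objp) grow" path.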
Hello Justin,
another thing I stumbled across: if you compile the latest aic-driver
(20030603) for SMP, but boot the kernel with the nosmp flag, the driver hangs
during the device scan.
Regards,
Stephan
Hello all,
I just finished another round of tests around the discussed issue and it is
coming to an end.
Yesterday I started using the test box with a UP kernel instead of SMP, because I
have the feeling the whole problem is somewhere around an SMP race condition.
As far as I can see now the box runs stable for 24h _and_ (and this is the
important part) one problem I did not talk about till now is completely gone:
During the whole testing with SMP I recognised that the tar-verify always
brought up "content differs" warnings. Which basically means that the filesize
is ok but the content is not. As there might be various causes for this (bad
tape, bad drive, bad cabling) I did not make much of it. But it turns
out there are no more such warnings when using a UP kernel (on the same box
with the exact same hardware including tapes).
From this experience I would conclude the following (for my personal test
case):
1) aic-driver has problems with smp/up switching (meaning crashes when trying
an SMP build with nosmp). This is completely reproducible.
2) aic-driver (almost no matter what version) has problems with SMP setup and
tape drives. Obviously data integrity is not guaranteed. This is completely
reproducible in my test setup.
For Marcelo:
It seems you can take any version of the aic driver for small box setups with
UP; I never saw any trouble with it. As soon as you look at SMP, flush it down
the t..let.
For Justin:
Thank you for your continuous openness and support in the whole issue in the form of
exactly _zero_ comments (besides "how do you know aic is to blame?").
For Willy:
I honour your efforts, but we are not capable of solving the issue.
For Oleg:
Stay tuned, I will test the re-creation issue and your patch.
And now I will go and buy a Symbios controller and retry.
Regards,
Stephan
> For Justin:
> Thank you for your continuous openness and support in the whole issue in the
> form of exactly _zero_ comments (besides "how do you know aic is to blame?").
Stephan,
Other than your most recent complaint that the driver doesn't function
correctly in an SMP kernel when you specify the nosmp option, you have
yet to provide any information that points to a problem in the aic7xxx
driver. Without such information, I'm at a loss to help you. One thing
that you forgot to mention in your "report" is that data corruption can
happen in many more places than just in the aic7xxx driver. The data
could be corrupted by a VM bug, a buffer layer bug, or a filesystem
bug. When testing our drivers against RHAS2.1 we found that the stock
kernel had data corruption issues very similar to what you are talking
about when run on very fast, hyperthreading, SMP machines. The data
corruption occurred with any SCSI controller we tried, regardless of vendor.
If you continue to feel that the aic7xxx driver is at fault, I encourage you
to try to reproduce this failure with someone else's card. I think you'll
find that the problem persists even with this change.
I will be more than happy to look into why the aic7xxx driver may not
operate correctly in an SMP kernel with the nosmp option. Considering
that your complaint about this failure came into my email box just
yesterday, perhaps you can give me just a few days to look into this
before you decide to call me unresponsive. Since I'm attending a
conference this whole week, I won't even be able to look at this
until I return on Monday of next week.
I'm sorry that you are experiencing data corruption. I take those
issues very seriously, but all of your panics and other reports point
to issues elsewhere in the kernel that should be resolved before you
conclude that the data corruption you are experiencing is somehow
the aic7xxx driver's fault. I'll be more than happy to fess up to
and correct any defect that is found in the driver, but I cannot fix
bugs that I cannot reproduce and that have no usable debugging information
associated with them.
--
Justin
On Mon, 9 Jun 2003, Stephan von Krawczynski wrote:
> During the whole testing with SMP I recognised that the tar-verify always
> brought up "content differs" warnings. Which basically means that the filesize
> is ok but the content is not. As there might be various causes for this (bad
> tape, bad drive, bad cabling) I did not make much of it. But it turns
> out there are no more such warnings when using a UP kernel (on the same box
> with the exact same hardware including tapes).
>
> From this experience I would conclude the following (for my personal test
> case):
Can you also try this with 2.5?
> 1) aic-driver has problems with smp/up switching (meaning crashes when trying
> an SMP build with nosmp). This is completely reproducible.
Can you also try an SMP kernel with noapic?
> 2) aic-driver (almost no matter what version) has problems with SMP setup and
> tape drives. Obviously data integrity is not guaranteed. This is completely
> reproducible in my test setup.
I have had problems with symmetric interrupt handling but can normally get
it working with noapic. And no, it doesn't appear to be an interrupt
routing problem on my box (if it is, someone please clearly state what the
exact problem is to me).
Zwane
--
function.linuxpower.ca
On Mon, 09 Jun 2003 15:32:11 +0000
"Justin T. Gibbs" <[email protected]> wrote:
> > For Justin:
> > Thank you for your continuous openness and support in the whole issue in
> > the form of exactly _zero_ comments (besides "how do you know aic is to
> > blame?").
>
> Stephan,
>
> Other than your most recent complaint that the driver doesn't function
> correctly in an SMP kernel when you specify the nosmp option, you have
> yet to provide any information that points to a problem in the aic7xxx
> driver.
Dear Justin,
I am really not complaining about you not helping specifically _me_; I am
complaining about your quite visible general opinion that this whole thing is
really not serious. Or maybe it is only that you are not making your efforts
transparent to others, I don't know.
> Without such information, I'm at a loss to help you. One thing
> that you forgot to mention in your "report" is that data corruption can
> happen in many more places than just in the aic7xxx driver.
<sarcasm>Did I mention the big magnet right beside the tape?</sarcasm>
> The data
> could be corrupted by a VM bug,
VM is quite the same, tar'ing to /dev/tape or /var/bak/mybackfile.tar.
> a buffer layer bug, or a filesystem
> bug.
/dev/tape with a filesystem? Have you read what we are talking about?
> When testing our drivers against RHAS2.1 we found that the stock
> kernel had data corruption issues very similar to what you are talking
> about when run on very fast, hyperthreading, SMP machines. The data
> corruption occurred with any SCSI controller we tried, regardless of vendor.
My question is: is it solved?
> If you continue to feel that the aic7xxx driver is at fault, I encourage you
> to try to reproduce this failure with someone else's card. I think you'll
> find that the problem persists even with this change.
This is not the first discussion about an instability in aic. We had the same
thing months ago for another setup (where btw you said the same thing). Back
then I switched to Symbios and everything went ok from then on. Thing is: I am
not a big learner, I just re-tried with aic now, and it happened again. I will
do the same thing now as back then: switch to Symbios. Be sure I am going
to share my experiences. Be aware that I have already received reports from
others with the same problem who solved it the same way - by switching away from aic.
> I will be more than happy to look into why the aic7xxx driver may not
> operate correctly in an SMP kernel with the nosmp option. Considering
> that your complaint about this failure came into my email box just
> yesterday, perhaps you can give me just a few days to look into this
> before you decide to call me unresponsive. Since I'm attending a
> conference this whole week, I won't even be able to look at this
> until I return on Monday of next week.
Justin, this is nothing really serious; I just mentioned it to get feedback on
something _simple_.
> I'm sorry that you are experiencing data corruption. I take those
> issues very seriously, but all of your panics and other reports point
> to issues elsewhere in the kernel that should be resolved before you
> conclude that the data corruption you are experiencing is somehow
> the aic7xxx driver's fault. I'll be more than happy to fess up to
> and correct any defect that is found in the driver, but I cannot fix
> bugs that I cannot reproduce and that have no usable debugging information
> associated with them.
What exactly is "elsewhere" if your data is bogus when tar'ing onto /dev/tape
via aic and it is completely ok when tar'ing into a file via reiserfs/3ware?
There is not really much left between tar and the aic-driver and the tape.
Which is your favourite in this game?
Regards,
Stephan
On Mon, 9 Jun 2003 21:38:16 -0400 (EDT)
Zwane Mwaikambo <[email protected]> wrote:
> On Mon, 9 Jun 2003, Stephan von Krawczynski wrote:
>
> > During the whole testing with SMP I recognised that the tar-verify always
> > brought up "content differs" warnings. Which basically means that the
> > filesize is ok but the content is not. As there might be various causes for
> > this (bad tape, bad drive, bad cabling) I did not make much of it.
> > But it turns out there are no more such warnings when using a UP kernel
> > (on the same box with the exact same hardware including tapes).
> >
> > From this experience I would conclude the following (for my personal test
> > case):
>
> Can you also try this with 2.5?
Uh, do I trust Linus? ;-) Well, probably I am going to take a look. The whole
story eats a lot of time as I have to deal with GBs of data for every single
test.
> > 1) aic-driver has problems with smp/up switching (meaning crashes when
> > trying an SMP build with nosmp). This is completely reproducible.
>
> Can you also try an SMP kernel with noapic?
Can you clarify? Do you mean options "nosmp noapic" or just "noapic" on SMP
kernel?
> > 2) aic-driver (almost no matter what version) has problems with SMP setup
> > and tape drives. Obviously data integrity is not guaranteed. This is completely
> > reproducible in my test setup.
>
> I have had problems with symmetric interrupt handling but can normally get
> it working with noapic. And no, it doesn't appear to be an interrupt
> routing problem on my box (if it is, someone please clearly state what the
> exact problem is to me).
Hm, my question is: if it were exclusively an apic problem, why do other
controllers (in a filesystem environment) work flawlessly? Maybe the driver and
apic simply have differing opinions in certain race cases, but that does not
mean that apic is always to blame, does it?
Regards,
Stephan
On Tue, 10 Jun 2003, Stephan von Krawczynski wrote:
> Uh, do I trust Linus? ;-) Well, probably I am going to take a look. The whole
> story eats a lot of time as I have to deal with GBs of data for every single
> test.
Cool, I'll wait on that then.
> Can you clarify? Do you mean options "nosmp noapic" or just "noapic" on SMP
> kernel?
Kernel built with CONFIG_SMP and booted with 'noapic' kernel parameter
> Hm, my question is: if it were exclusively an apic problem, why do other
> controllers (in a filesystem environment) work flawlessly? Maybe the driver and
> apic simply have differing opinions in certain race cases, but that does not
> mean that apic is always to blame, does it?
I'm a bit wary of blaming the interrupt routing setup, as I have also
noted that other devices work fine. But we have to be objective and try
to isolate things first. You seem to have a good head start on that.
Zwane
--
function.linuxpower.ca
On Tue, 10 Jun 2003 08:51:35 -0400 (EDT)
Zwane Mwaikambo <[email protected]> wrote:
> > Can you clarify? Do you mean options "nosmp noapic" or just "noapic" on SMP
> > kernel?
>
> Kernel built with CONFIG_SMP and booted with 'noapic' kernel parameter
Ok. To speed up the tests I call it "ok" if there are no verify errors within
70 GB and "fail" if there are one or more.
I have tried rc7+aic20030603 SMP with noapic and it is ok.
/proc/interrupts:
           CPU0       CPU1
  0:    1061143          0    XT-PIC  timer
  1:       6582          0    XT-PIC  keyboard
  2:          0          0    XT-PIC  cascade
  5:       1229          0    XT-PIC  EMU10K1
  9:    9269694          0    XT-PIC  aic7xxx, aic7xxx, 3ware Storage Controller, fcpcipnp, eth0, eth1, eth2
 12:     129555          0    XT-PIC  PS/2 Mouse
 15:          4          0    XT-PIC  ide1
NMI:          0          0
LOC:    1061054    1061028
ERR:          1
MIS:          0
Reading around the whole interrupt stuff I came across a very simple idea which
I am going to test right now. See you in a few hours ;-)
Regards,
Stephan
On Tue, 10 Jun 2003, Stephan von Krawczynski wrote:
> On Tue, 10 Jun 2003 08:51:35 -0400 (EDT)
> Zwane Mwaikambo <[email protected]> wrote:
>
> > > Can you clarify? Do you mean options "nosmp noapic" or just "noapic" on SMP
> > > kernel?
> >
> > Kernel built with CONFIG_SMP and booted with 'noapic' kernel parameter
>
> Ok. To speed up the tests I call it "ok" if there are no verify errors within
> 70 GB and "fail" if there are one or more.
> I have tried rc7+aic20030603 SMP with noapic and it is ok.
Can you also test it with an SMP kernel and only maxcpus=1?
> Reading around the whole interrupt stuff I came across a very simple idea which
> I am going to test right now. See you in a few hours ;-)
Cool
Zwane
--
function.linuxpower.ca
>> Stephan,
>>
>> Other than your most recent complaint that the driver doesn't function
>> correctly in an SMP kernel when you specify the nosmp option, you have
>> yet to provide any information that points to a problem in the aic7xxx
>> driver.
>
> Dear Justin,
>
> I am really not complaining about you not helping specifically _me_, I am
> complaining about your quite visible general opinion that this whole thing is
> really not serious, or maybe it is only that you are not making your efforts
> transparent to others, I don't know.
I never said that it wasn't serious, I just haven't seen any indication
that this problem is caused by my driver. There is a big difference.
If your complaint is that I typically help people to solve their problems
*off-list*, then I'm sorry if that offends your sensibilities.
I personally don't think that I need to CC a million people while I'm
passing back various debugging information and asking for new output. It's
just a lot of noise for the majority of people on the linux-kernel list.
>> Without such information, I'm at a loss to help you. One thing
>> that you forgot to mention in your "report" is that data corruption can
>> happen in many more places than just in the aic7xxx driver.
>
> <sarcasm>Did I mention the big magnet right beside the tape?</sarcasm>
I'm just sick of being blamed for anything that goes wrong on any system
that happens to have an aic7xxx controller in it. 99% of the time it's
not my fault, but I suppose since I debug and resolve these issues off
list for people that contact me, the general assumption is that these
issues are the aic7xxx driver's fault.
>> The data could be corrupted by a VM bug,
>
> VM is quite the same, tar'ing to /dev/tape or /var/bak/mybackfile.tar.
No, the VM activity is quite different.
>> a buffer layer bug, or a filesystem bug.
>
> /dev/tape with a filesystem? Have you read what we are talking about?
Where did you get the data to place on the tape? /dev/zero?
>> When testing our drivers against RHAS2.1 we found that the stock
>> kernel had data corruption issues very similar to what you are talking
>> about when run on very fast, hyperthreading, SMP machines. The data
>> corruption occurred with any SCSI controller we tried, regardless of vendor.
>
> My question is: is it solved?
My understanding is that it was fixed in 2.4.18 level kernels, but since
I don't know the root cause of the corruption, it could have just been
made more difficult to reproduce.
>> If you continue to feel that the aic7xxx driver is at fault, I encourage you
>> to try to reproduce this failure with someone else's card. I think you'll
>> find that the problem persists even with this change.
>
> This is not the first discussion about an instability in aic.
I'm not talking about *every case of aic7xxx driver instability*, I'm
talking about *this particular case* of driver instability. Problems
that to the naive user look similar are typically not.
>> I will be more than happy to look into why the aic7xxx driver may not
>> operate correctly in an SMP kernel with the nosmp option. Considering
>> that your complaint about this failure came into my email box just
>> yesterday, perhaps you can give me just a few days to look into this
>> before you decide to call me unresponsive. Since I'm attending a
>> conference this whole week, I won't even be able to look at this
>> until I return on Monday of next week.
>
> Justin, this is nothing really serious; I just mentioned it to get feedback on
> something _simple_.
It's the only thing you've mentioned that I have enough information to
look at.
>> I'm sorry that you are experiencing data corruption. I take those
>> issues very seriously, but all of your panics and other reports point
>> to issues elsewhere in the kernel that should be resolved before you
>> conclude that the data corruption you are experiencing is somehow
>> the aic7xxx driver's fault. I'll be more than happy to fess up to
>> and correct any defect that is found in the driver, but I cannot fix
>> bugs that I cannot reproduce and that have no usable debugging information
>> associated with them.
>
> What exactly is "elsewhere" if your data is bogus when tar'ing onto /dev/tape
> via aic and it is completely ok when tar'ing into a file via reiserfs/3ware?
> There is not really much left between tar and the aic-driver and the tape.
I suggest you go browse the code that is exercised by such an activity
before you say that.
--
Justin
On Tue, 10 Jun 2003 09:51:34 -0400 (EDT)
Zwane Mwaikambo <[email protected]> wrote:
> > Reading around the whole interrupt stuff I came across a very simple idea which
> > I am going to test right now. See you in a few hours ;-)
>
> Cool
Hoho, how about this one:
ksymoops 2.4.8 on i686 2.4.21-rc7-aic. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-rc7-aic/ (default)
-m /boot/System.map-2.4.21-rc7-aic (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Jun 10 17:50:53 admin kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000b2c
Jun 10 17:50:53 admin kernel: c0221c37
Jun 10 17:50:53 admin kernel: *pde = 00000000
Jun 10 17:50:53 admin kernel: Oops: 0000
Jun 10 17:50:53 admin kernel: CPU: 0
Jun 10 17:50:53 admin kernel: EIP: 0010:[st_do_scsi+295/384] Not tainted
Jun 10 17:50:53 admin kernel: EIP: 0010:[<c0221c37>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jun 10 17:50:53 admin kernel: EFLAGS: 00010246
Jun 10 17:50:53 admin kernel: eax: 00000000 ebx: 00000001 ecx: 00000000 edx: c34a0424
Jun 10 17:50:53 admin kernel: esi: f5f2c180 edi: 00000b00 ebp: 00008090 esp: dead5edc
Jun 10 17:50:53 admin kernel: ds: 0018 es: 0018 ss: 0018
Jun 10 17:50:53 admin kernel: Process tar (pid: 4004, stackpage=dead5000)
Jun 10 17:50:53 admin kernel: Stack: f5f2c180 00000000 c0090000 00008000 c0221a10 00015f90 00000000 dead5f7c
Jun 10 17:50:53 admin kernel: c34a0400 00000001 00008000 c0223abd 00000000 c34a0400 dead5f40 00008000
Jun 10 17:50:53 admin kernel: 00000002 00015f90 00000000 00000001 00000000 00000000 c34a04c0 c34a0450
Jun 10 17:50:53 admin kernel: Call Trace: [st_sleep_done+0/256] [read_tape+269/1024] [scsi_finish_command+152/208] [st_read+1015/1152] [sys_read+155/384]
Jun 10 17:50:53 admin kernel: Call Trace: [<c0221a10>] [<c0223abd>] [<c01ede38>] [<c02241a7>] [<c0141c0b>]
Jun 10 17:50:53 admin kernel: [<c010782f>]
Jun 10 17:50:53 admin kernel: Code: 8b 5f 2c 89 74 24 04 89 3c 24 e8 ea fb ff ff 89 43 1c eb a5
>>EIP; c0221c37 <st_do_scsi+127/180> <=====
>>edx; c34a0424 <_end+310e0e4/38547d20>
>>esi; f5f2c180 <_end+35b99e40/38547d20>
>>esp; dead5edc <_end+1e743b9c/38547d20>
Trace; c0221a10 <st_sleep_done+0/100>
Trace; c0223abd <read_tape+10d/400>
Trace; c01ede38 <scsi_finish_command+98/d0>
Trace; c02241a7 <st_read+3f7/480>
Trace; c0141c0b <sys_read+9b/180>
Trace; c010782f <system_call+33/38>
Code; c0221c37 <st_do_scsi+127/180>
00000000 <_EIP>:
Code; c0221c37 <st_do_scsi+127/180> <=====
0: 8b 5f 2c mov 0x2c(%edi),%ebx <=====
Code; c0221c3a <st_do_scsi+12a/180>
3: 89 74 24 04 mov %esi,0x4(%esp,1)
Code; c0221c3e <st_do_scsi+12e/180>
7: 89 3c 24 mov %edi,(%esp,1)
Code; c0221c41 <st_do_scsi+131/180>
a: e8 ea fb ff ff call fffffbf9 <_EIP+0xfffffbf9>
Code; c0221c46 <st_do_scsi+136/180>
f: 89 43 1c mov %eax,0x1c(%ebx)
Code; c0221c49 <st_do_scsi+139/180>
12: eb a5 jmp ffffffb9 <_EIP+0xffffffb9>
1 warning issued. Results may not be reliable.
Anybody able to comment on that?
Regards,
Stephan
Hello!
On Tue, Jun 10, 2003 at 05:55:06PM +0200, Stephan von Krawczynski wrote:
> Jun 10 17:50:53 admin kernel: Process tar (pid: 4004, stackpage=dead5000)
Hehe, with this kind of stackpage, this process was doomed just after the fork() ;)
> >>EIP; c0221c37 <st_do_scsi+127/180> <=====
It seems that in st_do_scsi, in this line
(STp->buffer)->syscall_result = st_chk_result(STp, SRpnt);
STp is garbage for some reason, though it was valid before.
Bye,
Oleg
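[Editorial aside, not from the thread: a hypothetical debugging aid for the situation Oleg describes. In the oops above STp was 0x00000b00, i.e. not a kernel address at all, so a check of the following shape placed just before the quoted line in st_do_scsi would turn the stray dereference into an immediate, attributable report. This helper is not part of drivers/scsi/st.c; PAGE_OFFSET is 0xc0000000 on a plain i386 2.4 kernel.]

#include <linux/kernel.h>       /* printk */
#include <asm/page.h>           /* PAGE_OFFSET, BUG */

/* Hypothetical sanity check, for debugging only. */
static inline void st_check_tape_ptr(const void *STp, const char *where)
{
        if (STp == NULL || (unsigned long)STp < PAGE_OFFSET) {
                printk(KERN_ERR "st: bogus Scsi_Tape pointer %p in %s\n",
                       STp, where);
                BUG();
        }
}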
On Tue, 10 Jun 2003 09:38:31 -0600
"Justin T. Gibbs" <[email protected]> wrote:
> I never said that it wasn't serious, I just haven't seen any indication
> that this problem is caused by my driver. There is a big difference.
> If your complaint is that I typically help people to solve their problems
> *off-list*, then I'm sorry if that offends your sensibilities.
It does not offend my sensibilities, it simply damages the available
information about typical problems and how they were solved. If you don't do it in
the open, there is no way for others to follow your thoughts and debugging, and
therefore you are confronted a hundred times with the same questions. People have
no choice but to ask you, because your debugging cases are hidden.
> I personally don't think that I need to CC a million people while I'm
> passing back various debugging information and asking for new output. It's
> just a lot of noise for the majority of people on the linux-kernel list.
Keep in mind the broad user base of aic controllers. Compared to other stuff in the kernel
your messages may be a whole lot more interesting to listening LKML readers
than other threads.
> I'm just sick of being blamed for anything that goes wrong on any system
> that happens to have an aic7xxx controller in it. 99% of the time it's
> not my fault, but I suppose since I debug and resolve these issues off
> list for people that contact me, the general assumption is that these
> issues are the aic7xxx driver's fault.
No, you create your own problem. You cannot help every single person who has a
problem around his box/aic. This is impossible. So you have to create a
valuable information base others can read and think about. This is most simply
done by debugging problems _openly_.
> >> a buffer layer bug, or a filesystem bug.
> >
> > /dev/tape with a filesystem? Have you read what we are talking about?
>
> Where did you get the data to place on the tape? /dev/zero?
Don't be silly. If reading a file from some HD were a problem in itself,
then we could all go home and have a beer. You are talking about the minimum
requirement for an OS.
> >> When testing our drivers against RHAS2.1 we found that the stock
> >> kernel had data corruption issues very similar to what you are talking
> >> about when run on very fast, hyperthreading, SMP machines. The data
> >> corruption occurred with any SCSI controller we tried, regardless of
> >vendor.
> >
> > My question is: is it solved?
>
> My understanding is that it was fixed in 2.4.18 level kernels, but since
> I don't know the root cause of the corruption, it could have just been
> made more difficult to reproduce.
Can you point to some URL where information about this is available?
> > This is not the first discussion about an instability in aic.
>
> I'm not talking about *every case of aic7xxx driver instability*, I'm
> talking about *this particular case* of driver instability. Problems
> that to the naive user look similar are typically not.
Sorry, I should have said: "This is not the first discussion about an
instability in aic between you and me".
> > Justin, this is nothing really serious; I just mentioned it to get feedback
> > on something _simple_.
>
> It's the only thing you've mentioned that I have enough information to
> look at.
No, it is only the simplest one. Unfortunately SCSI driver development is
anything but simple for the standard problem case. It requires the ability to
set up equipment just like in the discussed case to reproduce the problem.
Of course this applies only to cases the author cannot reproduce in his head from
the software alone.
All information needed to reproduce the main problem is available in this
thread.
> > What exactly is "elsewhere" if your data is bogus when tar'ing onto
> > /dev/tape via aic and it is completely ok when tar'ing into a file via
> > reiserfs/3ware? There is not really much left between tar and the
> > aic-driver and the tape.
>
> I suggest you go browse the code that is exercised by such an activity
> before you say that.
What kind of a statement is this? I spent days reproducing the error
case; every single test takes somewhere from 3.5 to 24 hours. And you tell me
"well, guy, if you want to know what I know go ahead and read my code", well
knowing that at least 50% of the knowledge is not in the code but in the
surrounding material you read to get where you are. I don't want to become the scsi
maintainer, I want to solve a problem - for me _and_ for others (and this is
why I do it openly).
I really have not understood what you want, besides not being spoken to.
If I were you I would try to _prove_ that it is _not_ my problem, ideally by
finding the real problem. Unfortunately I (and some others) do have the
impression that you simply live by the idea that as long as nobody can _prove_
your code has a problem, there is no problem.
This is in fact the bofh lifestyle that works for you (as long as you do not
meet an equally skilled person), but not for the users (read: "rest of us").
Back to the facts:
Simple question: you say it's not a problem inside the driver. Ok. Question: how
do you prove that? Can you specify a test setup (program or something) I can
check to see that there is no problem with the general SMP tape usage of the
aic driver? I mean you must have seen something working, or not?
Regards,
Stephan
On Tue, 10 Jun 2003 09:51:34 -0400 (EDT)
Zwane Mwaikambo <[email protected]> wrote:
> > Reading around the whole interrupt stuff I came across a very simple idea
> > which I am going to test right now. See you in a few hours ;-)
I now tried rc7+aic20030603 SMP with apic, _but_ with the interrupts from the aic
bound to a single CPU only. I did this with the help of irqbalance from Arjan.
/proc/interrupts:
           CPU0       CPU1
  0:       5148     571297    IO-APIC-edge   timer
  1:       9733         97    IO-APIC-edge   keyboard
  2:          0          0          XT-PIC   cascade
 12:      43720       1271    IO-APIC-edge   PS/2 Mouse
 15:          4          4    IO-APIC-edge   ide1
 17:       1297    1336383   IO-APIC-level   3ware Storage Controller
 18:        344      16447   IO-APIC-level   eth0, eth1
 20:        570          3   IO-APIC-level   fcpcipnp
 21:      57292        340   IO-APIC-level   eth2
 22:     443161       2776   IO-APIC-level   aic7xxx
 23:         31    2005037   IO-APIC-level   aic7xxx
 26:          0          0   IO-APIC-level   EMU10K1
NMI:     593524     582633
LOC:     576356     576330
ERR:          0
MIS:          0
The controller used is the second aic7xxx. The 31 interrupts on CPU0 have
occurred before the test. This setup fails during verify (data corruption).
I would say that the interrupt code of the aic in itself is therefore ok with
SMP. If it were an SMP race condition inside the interrupt routine this test
should have been ok (as only one CPU is used).
Regards,
Stephan
>> I never said that it wasn't serious, I just haven't seen any indication
>> that this problem is caused by my driver. There is a big difference.
>> If your complaint is that I typically help people to solve their problems
>> *off-list*, then I'm sorry if that offends your sensibilities.
>
> It does not offend my sensibilities, it simply damages the available
> information about typical problems and how they were solved. If you don't do it
> in the open, there is no way for others to follow your thoughts and debugging, and
> therefore you are confronted a hundred times with the same questions. People have no
> choice but to ask you, because your debugging cases are hidden.
99% of the problems have to do with broken interrupt routing. There is
plenty of information about this issue on the mailing lists, but people
still ask me. It seems that SCSI is suitably complex for the common
user that even when the driver explicitly tells you "your drive is dying",
I get email asking how I can fix my driver so that their drive doesn't
die. The same is true if you look at the large body of dump card state
information that people have posted from the aic7xxx and aic79xx drivers
to this list. Anyone who gets this type of output seems to think that
their problem must be the same as any other person that gets a dump
card state. I don't think that any amount of posting information about
how I decipher what the registers are telling me will cut down on this
confusion.
>> I'm just sick of being blamed for anything that goes wrong on any system
>> that happens to have an aic7xxx controller in it. 99% of the time it's
>> not my fault, but I suppose since I debug and resolve these issues off
>> list for people that contact me, the general assumption is that these
>> issues are the aic7xxx driver's fault.
>
> No, you create your own problem. You cannot help every single person who has a
> problem around his box/aic. This is impossible. So you have to create a
> valuable information base others can read and think about. This is most
> simply done by debugging problems _openly_.
I just don't believe that this is true. Most of the questions that people
email me directly are questions that are easily answered by a google search.
In other words, the information is already readily available. It is just
easier to send email than to actually investigate a potential solution
to the problem. So, people send email and ask the same questions, and
get the same answers.
>> >> a buffer layer bug, or a filesystem bug.
>> >
>> > /dev/tape with a filesystem? Have you read what we are talking about?
>>
>> Where did you get the data to place on the tape? /dev/zero?
>
> Don't be silly. If reading a file from some HD were a problem in
> itself, then we could all go home and have a beer. You are talking about
> the minimum requirement for an OS.
You're the one being silly. You are oversimplifying what it takes to
do I/O and the components that are involved in doing that I/O. If you
don't understand that the load on several components in the kernel changes,
often in subtle but important ways, when you change the target of your
I/O, then I don't know what to say to you.
>> >> When testing our drivers against RHAS2.1 we found that the stock
>> >> kernel had data corruption issues very similar to what you are talking
>> >> about when run on very fast, hyperthreading, SMP machines. The data
>> >> corruption occurred with any SCSI controller we tried, regardless of
>> > vendor.
>> >
>> > My question is: is it solved?
>>
>> My understanding is that it was fixed in 2.4.18 level kernels, but since
>> I don't know the root cause of the corruption, it could have just been
>> made more difficult to reproduce.
>
> Can you point to some URL where information about this is available?
https://rhn.redhat.com/errata/RHSA-2003-147.html
This is just the most recent attempt to fix these issues. You might
want to go back and read the other errata.
>> > Justin, this is nothing really serious; I just mentioned it to get feedback
>> > on something _simple_.
>>
>> It's the only thing you've mentioned that I have enough information to
>> look at.
>
> No, it is only the simplest one. Unfortunately SCSI driver development is
> anything but simple for the standard problem case. It requires the ability
> to set up equipment just like in the discussed case to reproduce the
> problem. Of course this applies only to cases the author cannot reproduce in
> his head from the software alone. All information needed to reproduce the main
> problem is available in this thread.
To reproduce your problem, I need the same MB, memory configuration, drive
types, a 3ware card, and the same tape drive you have. I have tried various
backup scenarios with *other hardware* and have failed to reproduce your
problem.
>> I suggest you go browse the code that is exercised by such an activity
>> before you say that.
>
> What kind of a statement is this?
It's one way of saying that you need to understand all of the code involved
with turning a write syscall into a call into the aic7xxx driver. If you
review the code path, you'll find that there are thousands of lines of
code involved that have nothing to do with SCSI or the aic7xxx driver.
To say that you have created a simple example that proves that the problem
is in the aic7xxx driver is naive at best.
> I want to solve a problem - for me _and_ for others (and this is
> why I do it openly).
> I really have not understood what you want, besides not being spoken to.
> If I were you I would try to _prove_ that it is _not_ my problem, ideally by
> finding the real problem.
As I said before, I have tried to reproduce your problem, but I cannot.
I have no hope of proving that a problem I cannot replicate is not a
problem with my driver.
Some additional things that might help:
o Characterize the type of corruption that you are seeing in a more
formal way. For example, use an easy-to-verify pattern that will
allow you to actually analyze the corruption. Is the corruption
following some pattern? (A sketch of such a tool follows this list.)
o Can you determine if the corruption is happening when writing to
the tape vs. reading from it? You might do this by writing to
the tape in an SMP mode that shows data corruption and then validate
the driver in a safe, UP, mode and vice-versa.
o What happens when you use different hardware/FS type/etc for the source
and destination?
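[Editorial aside, not from the thread: a minimal userland sketch of the kind of tool the first point above suggests. It fills every block with its own 32-bit word index, so a verify pass can report exactly which block and word went bad and what value arrived instead. The block size, block count and the program itself are assumptions; adjust them to the tape setup.]

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE (32 * 1024)          /* one tape block             */
#define NUM_BLOCKS 1024                 /* 32 MB; raise for real runs */

int main(int argc, char **argv)
{
        static unsigned char buf[BLOCK_SIZE];
        uint32_t *words = (uint32_t *)buf;
        int verify = (argc > 2 && strcmp(argv[2], "verify") == 0);
        uint32_t blk;
        int fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <device|file> [verify]\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], verify ? O_RDONLY : O_WRONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        for (blk = 0; blk < NUM_BLOCKS; blk++) {
                uint32_t i;

                if (verify) {
                        if (read(fd, buf, BLOCK_SIZE) != BLOCK_SIZE) {
                                perror("read");
                                return 1;
                        }
                        /* Every mismatch identifies block, word and bad value. */
                        for (i = 0; i < BLOCK_SIZE / 4; i++)
                                if (words[i] != blk * (BLOCK_SIZE / 4) + i)
                                        printf("block %u word %u: got %08x, expected %08x\n",
                                               blk, i, words[i],
                                               blk * (BLOCK_SIZE / 4) + i);
                } else {
                        /* Fill the block with a self-describing pattern. */
                        for (i = 0; i < BLOCK_SIZE / 4; i++)
                                words[i] = blk * (BLOCK_SIZE / 4) + i;
                        if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE) {
                                perror("write");
                                return 1;
                        }
                }
        }
        close(fd);
        return 0;
}

Running the verify pass twice over the same tape also helps separate read-side from write-side corruption, which is the split the second point above asks about; writing the same pattern both to /dev/tape and to a file on the 3ware array would exercise the two paths being compared in this thread.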
> Unfortunately I (and some others) do have the
> impression that you simply live by the idea that as long as nobody can
> _prove_ your code has a problem, there is no problem.
> This is in fact the bofh lifestyle that works for you (as long as you do not
> meet an equally skilled person), but not for the users (spell "rest of us").
In this case, the information you have so far provided points away from
the aic7xxx driver. I don't say that in all cases that I investigate,
but I believe it to be true in this case. If past experience is any guide,
80-90% of the problems like this that I have debugged (and that I could
actually replicate) were induced by using the aic7xxx driver, but turned
out to be bugs in other components in the system. The aic7xxx driver
happens to be one of the more aggressive SCSI drivers in the system and
that can often lead to finding bugs in other components.
> Back to the facts:
> Simple question: you say it's not a problem inside the driver. Ok. Question:
> how do you prove that? Can you specify a test setup (program or something) I
> can check to see that there is no problem with the general SMP tape usage of
> the aic driver? I mean you must have seen something working, or not?
The only way to do this is to find the actual bug. The problem feels like
a VM or FS race condition most likely caused by having the source controller and
the destination controller on separate interrupts in the apic case so that
you have real concurrency in the system. In the non-apic case, it looks
like everyone shares the same interrupt, so you cannot field interrupts
for both the 3ware and the aic7xxx driver at the same time. I also say
this because data corruption is something that is very difficult for the
aic7xxx driver to accomplish without there being some kind of error message
from the driver.
I have lots of test setups that show the aic7xxx and aic79xx driver working
just fine in PIII and P4 dual and quad configurations with and without apic
interrupt routing and writing to tape. There's not much more that I can
do here without having your exact system here or having more information.
--
Justin
On Tue, 10 Jun 2003, Stephan von Krawczynski wrote:
> The controller used is the second aic7xxx. The 31 interrupts on CPU0 have
> occurred before the test. This setup fails during verify (data corruption).
>
> I would say that the interrupt code of the aic in itself is therefore ok with
> SMP. If it were an SMP race condition inside the interrupt routine this test
> should have been ok (as only one CPU is used).
Thanks for verifying this, at least I know the problem isn't with
interrupt routing in your specific case.
Zwane
--
function.linuxpower.ca
On Tue, 10 Jun 2003, Stephan von Krawczynski wrote:
> occurred before the test. This setup fails during verify (data corruption).
Can you reproduce this with disks only?
Zwane
--
function.linuxpower.ca
On Tue, 10 Jun 2003 14:15:58 -0400 (EDT)
Zwane Mwaikambo <[email protected]> wrote:
> On Tue, 10 Jun 2003, Stephan von Krawczynski wrote:
>
> > The controller used is the second aic7xxx. The 31 interrupts on CPU0 have
> > occurred before the test. This setup fails during verify (data corruption).
> >
> > I would say that the interrupt code of the aic in itself is therefore ok
> > with SMP. If it were an SMP race condition inside the interrupt routine this
> > test should have been ok (as only one CPU is used).
>
> Thanks for verifying this, at least I know the problem isn't with
> interrupt routing in your specific case.
>
> Zwane
I guess your comment is a bit ahead of my tests. I just completed the test with
rc7+aic20030603 SMP, apic and maxcpus=1. It fails.
This means that although only one CPU is used throughout the whole kernel,
the data corruption occurs.
I would therefore conclude that the corruption is only possible if in fact the
standard code path is flaky in terms of data completeness per request.
Something like a broken synchronous action, or a read request coming back as
completed although it is in fact still running, or the like.
It may also be a misinterpretation of some kind of "action completed" interrupt.
Or something like one interrupt for multiple running actions with a mixup of
the various causes.
To make sure it is not a problem in the SMP code path through the driver I have
to check a UP kernel with apic support enabled. I will do this tomorrow.
If this is ok then things are simple, because it's nailed down to the SMP code
path without a concurrency cause.
Let's see ...
Regards,
Stephan
On Tue, 10 Jun 2003 12:07:00 -0600
"Justin T. Gibbs" <[email protected]> wrote:
> >> I never said that it wasn't serious, I just haven't seen any indication
> >> that this problem is caused by my driver. There is a big difference.
> >> If your complaint is that I typically help people to solve their problems
> >> *off-list*, then I'm sorry if that offends your sensibilities.
> >
> > It does not offend my sensibilities, it simply damages the available
> > information about typical problems and how they were solved. If you don't do it
> > in the open, there is no way for others to follow your thoughts and debugging, and
> > therefore you are confronted a hundred times with the same questions. People
> > have no choice but to ask you, because your debugging cases are hidden.
>
> 99% of the problems have to do with broken interrupt routing. There is
> plenty of information about this issue on the mailing lists, but people
> still ask me.
You should state an exact definition of "broken interrupt routing" in this
case. The only thing I would call broken interrupt routing is an interrupt
that does not show up at all. Everything else is in my eyes broken interrupt
handling in the driver (generally speaking). A driver has (in my programming
world) to cope with the following (a rough handler skeleton follows this list):
- interrupts showing up immediately during the currently running interrupt
handling (immediate re-causing)
- multiple interrupt causes per shot (the software or the interrupt controller was
too lazy to produce a single interrupt per cause)
- lost interrupts (may of course cause an error condition, but at least leave a message
in some log)
- continuous interrupts (the handler has to know when it has been inside the
interrupt for too long and give the rest of the system a chance to survive)
- optimistic interrupt requeuing (the handler has to know from past behaviour what the
right flow of interrupt causes is in a multiply-caused interrupt, even though the
hardware may be unable to tell it).
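[Editorial aside, not from the thread and not the aic7xxx code: the defensive handler shape the list above describes, written as a 2.4-style interrupt handler. Every register bit and the read_status()/handle_*()/ack_interrupt() helpers are hypothetical placeholders; only the loop-until-clear structure with a bounded pass count is the point.]

#define MAX_INTR_PASSES 100     /* arbitrary bound against interrupt storms */

static void example_intr(int irq, void *dev_id, struct pt_regs *regs)
{
        struct example_softc *sc = dev_id;
        unsigned int status;
        int passes = 0;

        /* Multiple causes may be latched per shot; re-read until clear. */
        while ((status = read_status(sc)) != 0) {
                if (status & STAT_CMD_COMPLETE)
                        handle_completions(sc);
                if (status & STAT_ERROR)
                        handle_error(sc, status);
                ack_interrupt(sc, status);      /* clear only what was handled */

                if (++passes > MAX_INTR_PASSES) {
                        /* Continuous interrupts: complain and give the rest
                         * of the system a chance to survive. */
                        printk(KERN_WARNING
                               "example: interrupt storm on irq %d\n", irq);
                        break;
                }
        }
}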
> I just don't believe that this is true. Most of the questions that people
> email me directly are questions that are easily answered by a google search.
> In other words, the information is already readily available. It is just
> easier to send email than to actually investigate a potential solution
> to the problem. So, people send email and ask the same questions, and
> get the same answers.
Do you have a FAQ?
> >> >> a buffer layer bug, or a filesystem bug.
> >> >
> >> > /dev/tape with a filesystem? Have you read what we are talking about?
> >>
> >> Where did you get the data to place on the tape? /dev/zero?
> >
> > Don't be silly. If reading a file from some HD were a problem in
> > itself, then we could all go home and have a beer. You are talking about
> > the minimum requirement for an OS.
>
> You're the one being silly. You are oversimplifying what it takes to
> do I/O and the components that are involved in doing that I/O. If you
> don't understand that the load on several components in the kernel changes,
> often in subtle but important ways, when you change the target of your
> I/O, then I don't know what to say to you.
Data corruption is nothing subtle. We are not talking about performance tweaks,
we are talking about the basics. Something like "a synchronous action (like
reading during a verify) has to be synchronous". We are not talking about a
hardware-related problem on the SCSI bus. We are not talking about the box
stumbling over a massive data flood. We are talking about reading a file/device
into a memory buffer and doing a cmp between two of those. If your OS is
not able to perform something like this, you can do virtually nothing, not even
boot (because your read action corrupts the data).
> >> >> When testing our drivers against RHAS2.1 we found that the stock
> >> >> kernel had data corruption issues very similar to what you are talking
> >> >> about when run on very fast, hyperthreading, SMP machines. The data
> >> >> corruption occurred with any SCSI controller we tried, regardless of
> >> > vendor.
> >> >
> >> > My question is: is it solved?
> >>
> >> My understanding is that it was fixed in 2.4.18 level kernels, but since
> >> I don't know the root cause of the corruption, it could have just been
> >> made more difficult to reproduce.
> >
> > Can you point to some URL where information about this is available?
>
> https://rhn.redhat.com/errata/RHSA-2003-147.html
The scenario described there is unlikely for my case because
a) I have only 3 GB of mem
b) no hints are available that UP can solve the problem on the same hardware
> > No, it is only the simplest one. Unfortunately SCSI driver development
> > is anything but simple for the standard problem case. It requires the
> > ability to set up equipment just like in the discussed case to reproduce
> > the problem. Of course this applies only to cases the author cannot
> > reproduce in his head from the software alone. All information needed to
> > reproduce the main problem is available in this thread.
>
> To reproduce your problem, I need the same MB, memory configuration, drive
> types, a 3ware card, and the same tape drive you have. I have tried various
> backup scenarios with *other hardware* and have failed to reproduce your
> problem.
I have talked to others with similar problems and none has the same MB or a
3ware controller. All have problems with streamers on aic. All solutions I
heard so far were done by replacing aic by whatever strange controller they got
their hands on.
> >> I suggest you go browse the code that is exercised by such an activity
> >> before you say that.
> >
> > What kind of a statement is this?
>
> It's one way of saying that you need to understand all of the code involved
> with turning a write syscall into a call into the aic7xxx driver. If you
> review the code path, you'll find that there are thousands of lines of
> code involved that have nothing to do with SCSI or the aic7xxx driver.
> To say that you have created a simple example that proves that the problem
> is in the aic7xxx driver is naive at best.
To tell me it is not is just as good.
> In this case, the information you have so far provided points away from
> the aic7xxx driver. I don't say that in all cases that I investigate,
> but I believe it to be true in this case. If past experience is any guide,
> 80-90% of the problems like this that I have debugged (and that I could
> actually replicate) were induced by using the aic7xxx driver, but turned
> out to be bugs in other components in the system. The aic7xxx driver
> happens to be one of the more aggressive SCSI drivers in the system and
> that can often lead to finding bugs in other components.
Aggressive is indeed a good term for it. And it describes exactly what I don't
like about it. The primary goal of a driver (in my eyes) is to make some
connected hardware work as expected. It is definitely not its primary goal to
be overly brilliant and thereby detect bugs in other subsystems. I have
told you months ago that a Symbios-driven system feels somehow smoother and
faster - elegant. Whereas aic gives you the feeling someone tried to kick the
system's butt with a big hammer. It's a matter of style and _defensiveness_.
As long as you ride it aggressively, don't complain that a lot of people go after you
for explanations.
And btw: you win nothing with your way, not even performance.
> > Back to the facts:
> > Simple question: you say it's not a problem inside the driver. Ok. Question:
> > how do you prove that? Can you specify a test setup (program or something)
> > I can check to see that there is no problem with the general SMP tape usage
> > of the aic driver? I mean you must have seen something working, or not?
>
> The only way to do this is to find the actual bug. The problem feels like
> a VM or FS race condition most likely caused by having the source controller
> and the destination controller on separate interrupts in the apic case so
> that you have real concurrency in the system. In the non-apic case, it looks
> like everyone shares the same interrupt, so you cannot field interrupts
> for both the 3ware and the aic7xxx driver at the same time. I also say
> this because data corruption is something that is very difficult for the
> aic7xxx driver to accomplish without there being some kind of error message
> from the driver.
Well, at least I managed to get some interesting statement from you after all.
I have to think about this a bit.
> I have lots of test setups that show the aic7xxx and aic79xx driver working
> just fine in PIII and P4 dual and quad configurations with and without apic
> interrupt routing and writing to tape.
This only means you have not yet met something similar to my setup. It does
not really prove a lot.
> There's not much more that I can
> do here without having your exact system here or having more information.
Well, the thing is, I try to gather information. But since the whole issue is
all about lots of data I try to find an intelligent way to locate the cause of
it all. I am not very confident that analysis of the trashed data will lead
somewhere. I think narrowing down the code path that leads to the problem with
multiple distinct test scenarios looks more promising and faster. Can you think of
something to reduce the test complexity (not using tar, not comparing to a file
or whatever)?
Regards,
Stephan
>> 99% of the problems have to do with broken interrupt routing. There is
>> plenty of information about this issue on the mailing lists, but people
>> still ask me.
>
> You should state an exact definition of "broken interrupt routing" in this
> case. The only thing I would call a broken interrupt routing is if an
> interrupt does not show up at all.
That's the only definition for it and 99% of the email I field about
the aic7xxx driver is due to interrupts *not arriving*.
>> I just don't believe that this is true. Most of the questions that people
>> email me directly are questions that are easily answered by a google search.
>> In other words, the information is already readily available. It is just
>> easier to send email than to actually investigate a potential solution
>> to the problem. So, people send email and ask the same questions, and
>> get the same answers.
>
> Do you have a FAQ?
It's the driver readme file.
>> You're the one being silly. You are oversimplifying what it takes to
>> do I/O and the components that are involved in doing that I/O. If you
>> don't understand that the load on several components in the kernel changes,
>> often in subtle but important ways, when you change the target of your
>> I/O, then I don't know what to say to you.
>
> Data corruption is nothing subtle. We are not talking about performance tweaks,
> we are talking about the basics. Something like "a synchronous action (like
> reading during a verify) has to be synchronous". We are not talking about a
> hardware-related problem on the SCSI bus. We are not talking about the box
> stumbling over a massive data flood. We are talking about reading a file/device
> into a memory buffer and doing a cmp between two of those. If your OS is
> not able to perform something like this, you can do virtually nothing, not even
> boot (because your read action corrupts the data).
And with any experience you will find that subtle races in all of these
"basic operations" can often only be triggered by certain scenarios. Saying
that "well my machine boots" is not enough to prove that the components
involved to that point are bug free. You may be able to operate just
fine in 99% of your test scenarios yet still have a very catastrophic
flaw in the code.
>> >> >> When testing our drivers against RHAS2.1 we found that the stock
>> >> >> kernel had data corruption issues very similar to what you are talking
>> >> >> about when run on very fast, hyperthreading, SMP machines. The data
>> >> >> corruption occurred with any SCSI controller we tried, regardless of
>> >> > vendor.
>> >> >
>> >> > My question is: is it solved?
>> >>
>> >> My understanding is that it was fixed in 2.4.18 level kernels, but since
>> >> I don't know the root cause of the corruption, it could have just been
>> >> made more difficult to reproduce.
>> >
>> > Can you point to some URL where information about this is available?
>>
>> https://rhn.redhat.com/errata/RHSA-2003-147.html
>
> The scenario described there is unlikely for my case because
> a) I have only 3 GB of mem
> b) no hints are available that UP can solve the problem on the same hardware
This is only the latest corruption bug that has been addressed. You
should really read all of the kernel errata. The one we hit originally
was this one:
https://rhn.redhat.com/errata/RHSA-2002-227.html
I'm not saying that this is your problem or even related, but just to
point out that the type of data corruption you are talking about can
occur due to bugs in core kernel functionality.
>> To reproduce your problem, I need the same MB, memory configuration, drive
>> types, a 3ware card, and the same tape drive you have. I have tried various
>> backup scenarios with *other hardware* and have failed to reproduce your
>> problem.
>
> I have talked to others with similar problems and none has the same MB or a
> 3ware controller.
Define similar. You are the only person I know of that is currently
indicating they are having *data corruption* with the aic7xxx driver.
That is, in particular, what I am trying to reproduce locally.
> All have problems with streamers on aic. All solutions I
> heard so far were done by replacing aic by whatever strange controller
> they got their hands on.
I'm glad they were able to resolve their problems.
>> >> I suggest you go browse the code that is exercised by such an activity
>> >> before you say that.
>> >
>> > What kind of a statement is this?
>>
>> It's one way of saying that you need to understand all of the code involved
>> with turning a write syscall into a call into the aic7xxx driver. If you
>> review the code path, you'll find that there are thousands of lines of
>> code involved that have nothing to do with SCSI or the aic7xxx driver.
>> To say that you have created a simple example that proves that the problem
>> is in the aic7xxx driver is naive at best.
>
> To tell me it is not is just as good.
You mean "just as naive"? Pointing your finger at the aic7xxx driver
is not going to solve your problem. Ruling out other system components
(of which there are many in your test case) also won't help find it.
>> In this case, the information you have so far provided points away from
>> the aic7xxx driver. I don't say that in all cases that I investigate,
>> but I believe it to be true in this case. If past experience is any guide,
>> 80-90% of the problems like this that I have debugged (and that I could
>> actually replicate) were induced by using the aic7xxx driver, but turned
>> out to be bugs in other components in the system. The aic7xxx driver
>> happens to be one of the more aggressive SCSI drivers in the system and
>> that can often lead to finding bugs in other components.
>
> Aggressive is indeed a good term for it. And it describes exactly what I don't
> like about it.
Then don't choose to use it.
> The primary goal of a driver (in my eyes) is to make some
> connected hardware work as expected. It is definitely not its primary goal to
> be overly brilliant and thereby detect bugs in other subsystems.
My goal is to take full advantage of the hardware I support in my drivers.
That isn't an attempt to be "brilliant", but rather just taking advantage
of the hardware you have purchased. The end result is that, for instance,
the aic79xx driver can achieve sustained random I/O throughput 40% above
its main competitor. That isn't an attempt to break the rest of Linux,
but to get the most performance possible out of Linux.
> I have
> told you months ago that a symbios-driven system feels somehow smoother and
> faster - elegant.
Which doesn't tell me anything about the relative performance of the
two drivers. Such subjective remarks do not provide any feedback that
can be turned into a concrete plan to improve the driver. They don't even
really tell me what you think is wrong with it.
> And btw: you win nothing with your way, not even performance.
Another unsubstantiated claim. Again, if you don't like the driver, or
its style, you should just use something else if it will make you happier.
It certainly sounds like that is the case.
>> I have lots of test setups that show the aic7xxx and aic79xx driver working
>> just fine in PIII and P4 dual and quad configurations with and without apic
>> interrupt routing and writing to tape.
>
> This only means you have not yet met something similar to my setup. It
> does not really prove a lot.
Which is exactly my point! You act as though I should be able to magically
reproduce and fix your problem. I've said that I can't reproduce it and
that means I can't fix it without more information. I have claimed nothing
more than that, and that your current data points do not, in my opinion,
point to an aic7xxx driver problem. That doesn't *eliminate* the aic7xxx
driver as a cause just as your test cases don't eliminate the other
components of the system.
> Well, the thing is, I am trying to gather information. But since the whole
> issue involves lots of data, I try to find an intelligent way to locate the
> cause of it all. I am not very confident that analysis of the trashed data
> will lead anywhere.
If you filter everything available down to only what you believe is relevant
to solving the problem, then you will likely filter out things that might
give others a clue as to the true cause of your problem.
> I think narrowing down the code path that leads to the problem through
> multiple distinct test scenarios looks more promising and faster. Can you
> think of something to reduce the test complexity (not using tar, not comparing
> to a file, or whatever)?
I would be analyzing the current failure modes first, but if you just want
to try to narrow the cause by varying your configuration, you could do
that by using a different source filesystem or even using /dev/zero or
a program that generates the data that will be written to tape. You might
also try to determine whether the corruption happens when the tape is written
or whether the data is corrupted during the read. You could do this by
doing multiple read sessions to see if the corruption is consistent, or by
doing the write in what appears to be a safe kernel configuration and the read
in the unsafe one, and vice versa. Etc.
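Just as an illustration, a rough, untested sketch of such a generator/checker
(the 64 KB block size is arbitrary and may need to match the st driver's block
size setting on your drive) could look like this; each 8-byte word carries its
own byte offset, so a bad block tells you exactly where and how the data went
wrong, without tar, NFS or the source filesystem being involved:
/*
 * patgen.c - write or verify a deterministic test pattern.
 *
 *   patgen write <MB> > /dev/nst0    # write <MB> megabytes of pattern
 *   patgen check <MB> < /dev/nst0    # read it back, report mismatches
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define BLOCK (64 * 1024)
int main(int argc, char **argv)
{
    unsigned long long buf[BLOCK / 8], off = 0, total;
    int check, i;
    if (argc != 3 || (strcmp(argv[1], "write") && strcmp(argv[1], "check"))) {
        fprintf(stderr, "usage: %s write|check <megabytes>\n", argv[0]);
        return 1;
    }
    check = !strcmp(argv[1], "check");
    total = strtoull(argv[2], NULL, 0) * 1024 * 1024;
    while (off < total) {
        if (check) {
            /* simplistic: a short read is treated as an error */
            if (read(0, buf, BLOCK) != BLOCK) {
                fprintf(stderr, "short read at offset %llu\n", off);
                return 2;
            }
            for (i = 0; i < BLOCK / 8; i++)
                if (buf[i] != off + i * 8)
                    fprintf(stderr, "mismatch at offset %llu: "
                            "expected %llx, got %llx\n",
                            off + i * 8, off + i * 8, buf[i]);
        } else {
            /* fill each 8-byte word with its own byte offset */
            for (i = 0; i < BLOCK / 8; i++)
                buf[i] = off + i * 8;
            if (write(1, buf, BLOCK) != BLOCK) {
                fprintf(stderr, "short write at offset %llu\n", off);
                return 2;
            }
        }
        off += BLOCK;
    }
    return 0;
}
Writing the pattern with one kernel and checking it with the other (and vice
versa) would separate a write-path problem from a read-path problem without
tar in the picture.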
--
Justin
Hello,
a short note on today's test cycles.
I switched to rc8 (SMP, apic); it took three cycles until it failed.
rc8 (SMP, apic, HIGHIO) failed on the first try.
I thought HIGHIO could make a difference if there were inherent problems with
bounce buffers. Unfortunately this does not seem to be the case.
Anyway it looks like failures have gotten fewer since rc8. I will try an
overnight stress test now to see if I get it freezing again.
Regards,
Stephan
Stephan> I switched to rc8 (SMP, apic), took three cycles until it
Stephan> failed. rc8 (SMP, apic, HIGHIO) failed on the first try. I
Stephan> thought HIGHIO could make a difference if there were inherent
Stephan> problems with bounce buffers. Unfortunately this seems not
Stephan> the case.
I'm doing testing on 2.5.70-mm3, SMP, APIC, PREEMPT with an AIC7880
driving a DLT7000 along with some idle disks on the same bus. I'm
dumping data to tape and verifying it. Once I get more data, I'll
follow up.
John
On Wed, 11 Jun 2003 22:23:46 +0200
Stephan von Krawczynski <[email protected]> wrote:
> Hello,
> [...]
> Anyway it looks like failures have gotten fewer since rc8. I will try an
> overnight stress test now to see if I get it freezing again.
Interestingly it does not freeze. One file shows data corruption, but the
system looks stable. None of the older rc's made it up to this point. Looks
like something in rc8 got better and I am in fact experiencing a set of bugs
and not only one.
Regards,
Stephan
Hello all,
this is the second day of stress-testing pure rc8 in SMP, apic mode. Today
everything is fine, no freeze, no data corruption.
current standings:
2 days continuous test, one file data corruption on day 1
Regards,
Stephan
Hello all,
this is the fourth day of stress-testing pure rc8/2.4.21 in SMP, apic mode. Today
another corruption happened.
current standings:
4 days continuous test,
one file data corruption on day 1
one file data corruption on day 4
Regards,
Stephan
Stephan> this is the fourth day of stress-testing pure rc8/2.4.21 in
Stephan> SMP, apic mode. Today another corruption happened.
Stephan> current standings:
Stephan> 4 days continuous test,
Stephan> one file data corruption on day 1
Stephan> one file data corruption on day 4
Can you define corruption? Can you tell us what commands you are
using to generate the data which is written to tape?
John
On Fri, 13 Jun 2003, Stephan von Krawczynski wrote:
> Hello all,
>
> this is the second day of stress-testing pure rc8 in SMP, apic mode. Today
> everything is fine, no freeze, no data corruption.
>
> current standings:
>
> 2 days continuous test, one file data corruption on day 1
What kind of data corruption and what tests are you doing? (sorry if you
already mentioned that on the list)
On Tue, 17 Jun 2003 17:47:02 -0300 (BRT)
Marcelo Tosatti <[email protected]> wrote:
>
>
> On Fri, 13 Jun 2003, Stephan von Krawczynski wrote:
>
> > Hello all,
> >
> > this is the second day of stress-testing pure rc8 in SMP, apic mode. Today
> > everything is fine, no freeze, no data corruption.
> >
> > current standings:
> >
> > 2 days continuous test, one file data corruption on day 1
>
>
> What kind of data corruption and what tests are you doing ? (sorry if you
> already mentionad that on the list)
Today's score:
7 days continuous test
one file data corruption on day 1
one file data corruption on day 4
two file data corruptions on day 6
Test is performed as follows:
around 70-100 GB of data is transferred to a nfs-server with rc8 onto a RAID5
on 3ware-controller.
The data is then copied via tar onto a SDLT drive connected to an aic
controller.
Afterwards the data is verified by tar.
Since rc8 this runs stable (froze before during the first day).
What's left is that the verification sometimes fails (see above). It does not
look like a write error to tape, because when retrying the verify cycle the
errors occur in other files most of the time (or even in none at all). It seems
reading back is the problem. I doubt the problem lies on the 3ware side, because
that would mean you could not use it at all (there would then be errors all over
other operations as well).
Most of the files tar'ed are beyond the 2 GB file size. They vary from
around 100 MB up to about 15 GB per file, around 70 GB minimum summed up.
Of course I exchanged the tapes and the drive. Didn't get better.
Regards,
Stephan
Stephan> 7 days continuous test
Stephan> one file data corruption on day 1
Stephan> one file data corruption on day 4
Stephan> two file data corruptions on day 6
Stephan> Test is performed as follows:
Stephan> around 70-100 GB of data is transferred to a nfs-server with
Stephan> rc8 onto a RAID5 on 3ware-controller. The data is then
Stephan> copied via tar onto a SDLT drive connected to an aic
Stephan> controller. Afterwards the data is verified by tar.
Is the data verified after the transfer to the NFS server? Does it
pass muster then using MD5 sums on the files?
What happens if you cut the tape drive out of the loop and copy the
data to another partition on the 3ware controller and do the compare
then?
I assume you're doing:
tar -c -f /dev/tape --verify /path/to/files
and that's when you get the errors? Or are you writing to tape, and
then doing a compare with:
tar -c -f /dev/tape /path/to/files
tar -d -f /dev/tape /path/to/files
Stephan> Since rc8 this runs stable (froze before during the first
Stephan> day).
How much RAM is in the box, and how much free space is on the
filesystem? I've been trying to replicate this type of test on
2.5.7x, but I've been having issues. I'm also just dumping a pile of
MP3s to tape and reading them back to check.
Stephan> Most of the several files tar'ed are beyond the 2 GB file
Stephan> size. They vary from around 100MB upto about 15 GB per file,
Stephan> around 70 GB minimum summed up. Of course I exchanged the
Stephan> tapes and the drive. Didn't get better.
This is an interesting data point. What happens if you make all the
files 2.5 GB in size - do you get corruption more consistently then?
I'm interested in this issue because I want to make sure that tape
backups work reliably on Linux.
John
On Wed, 18 Jun 2003 10:21:25 -0400
"John Stoffel" <[email protected]> wrote:
>
> Stephan> 7 days continuous test
> Stephan> one file data corruption on day 1
> Stephan> one file data corruption on day 4
> Stephan> two file data corruptions on day 6
>
> Stephan> Test is performed as follows:
>
> Stephan> around 70-100 GB of data is transferred to a nfs-server with
> Stephan> rc8 onto a RAID5 on 3ware-controller. The data is then
> Stephan> copied via tar onto a SDLT drive connected to an aic
> Stephan> controller. Afterwards the data is verified by tar.
>
> Is the data verified after the transfer to the NFS server? Does it
> pass muster then using MD5 sums on the files?
No, the content is not verified to be the same as on the nfs clients. But
this is not the point here: it could just as well be bad content that is saved
to tape, and if the verification of that content then fails, your bad data
simply got worse. Right?
> What happens if you cut the tape drive out of the loop and copy the
> data to another partition on the 3ware controller and do the compare
> then?
Up to this day I have not managed to get the corruption on archives written to
(the same) 3ware partition instead of tape.
>
> I assume you're doing:
>
> tar -c -f /dev/tape --verify /path/to/files
No. See your second guess.
> and that's when you get the errors? Or are you writing to tape, and
> then doing a compare with:
>
> tar -c -f /dev/tape /path/to/files
> tar -d -f /dev/tape /path/to/files
Yes, I am separately verifying with "-d".
> Stephan> Since rc8 this runs stable (froze before during the first
> Stephan> day).
>
> How much RAM is in the box, and how much free space is on the
> filesystem? I've been trying to replicate this type of test on
> 2.5.7x, but I've been having issues. I'm also just dumping a pile of
> MP3s to tape and reading them back to check.
See first post of the thread, in case it already vanished: 3 GB RAM, 320 GB
filesystem space, at least half free.
> Stephan> Most of the several files tar'ed are beyond the 2 GB file
> Stephan> size. They vary from around 100MB upto about 15 GB per file,
> Stephan> around 70 GB minimum summed up. Of course I exchanged the
> Stephan> tapes and the drive. Didn't get better.
>
> This is an interesting data point. What happens if you make all the
> files be 2.5gb in size, do you get corruption more consistently then?
Hm, I haven't tried this so far. My next guess would have been not to verify,
but to read the data back in completely (to disk) and then do a verification
based on a file-compare utility. If there is a difference, one can have a real
look at the data, which is a bit of a mess on tape.
> I'm interested in this issue because I want to make sure that tape
> backups work reliably on Linux.
Well, two of the same kind :-)
Regards,
Stephan
On Wed, 18 Jun 2003, Stephan von Krawczynski wrote:
> On Tue, 17 Jun 2003 17:47:02 -0300 (BRT)
> Marcelo Tosatti <[email protected]> wrote:
>
> >
> >
> > On Fri, 13 Jun 2003, Stephan von Krawczynski wrote:
> >
> > > Hello all,
> > >
> > > this is the second day of stress-testing pure rc8 in SMP, apic mode. Today
> > > everything is fine, no freeze, no data corruption.
> > >
> > > current standings:
> > >
> > > 2 days continuous test, one file data corruption on day 1
> >
> >
> > What kind of data corruption and what tests are you doing ? (sorry if you
> > already mentionad that on the list)
>
> Todays score:
>
> 7 days continuous test
> one file data corruption on day 1
> one file data corruption on day 4
> two file data corruptions on day 6
>
> Test is performed as follows:
>
> around 70-100 GB of data is transferred to a nfs-server with rc8 onto a
> RAID5 on 3ware-controller. The data is then copied via tar onto a SDLT
> drive connected to an aic controller. Afterwards the data is verified by
> tar.
So the data is intact when it arrives on the 3ware and gets corrupted
on the write to the tape?
Marcelo Tosatti wrote:
> So the data is intact when it arrives on the 3ware and gets corrupted
> on the write to the tape?
>
Actually, without another copy of the data on a different system to verify it
with, you can't know that for sure. It could easily be getting to the tape (the
actual media) just fine, but then get corrupted during the verify readback.
On Fri, 20 Jun 2003, Kevin P. Fleming wrote:
> Marcelo Tosatti wrote:
>
> > So the data is intact when it arrives on the 3ware and gets corrupted
> > on the write to the tape?
> >
>
> Actually, without another copy of the data on a different system to
> verify it with, you can't know that for sure. It could easily be getting
> to the tape (the actual media) just fine, but then get corrupted during
> the verify readback.
Right. Stephan, if you could use a bit of your time to isolate the problem
I would be VERY grateful.
Hi !
On Fri, Jun 20, 2003 at 06:13:53PM -0300, Marcelo Tosatti wrote:
> > Actually, without another copy of the data on a different system to
> > verify it with, you can't know that for sure. It could easily be getting
> > to the tape (the actual media) just fine, but then get corrupted during
> > the verify readback.
>
> Right. Stephan, if you could use a bit of your time to isolate the problem
> I would be VERY grateful.
I remember Stephan once said that he used tar to verify the tape, and that for
one backup, he did several tests showing corruption on different files. Although
that doesn't mean that the tape is written totally correctly, it at least proves
that there's a read corruption.
I think that comparing multiple reads to find a pattern in corruption offsets
(if any) is the only thing he could do (not to speak of mixing reads/writes
with good/bad kernels). Of course, storing 70 GB several times on disk is not
easy, but a 16-bit checksum for each 1 kB block would result in files of about
140 MB, which will be "easier" to compare. It could be enough to check
for empty blocks, duplicated blocks or totally random ones.
Stephan, if you're willing to do the test but don't have such a tool, I may
write a quick and dirty one tomorrow if that helps.
BTW, it could be interesting to note the read buffer's hardware address for
each test, in case it matters.
Cheers,
Willy
On Sat, 21 Jun 2003 00:03:31 +0200
Willy Tarreau <[email protected]> wrote:
> Hi !
>
> On Fri, Jun 20, 2003 at 06:13:53PM -0300, Marcelo Tosatti wrote:
> > > Actually, without another copy of the data on a different system to
> > > verify it with, you can't know that for sure. It could easily be getting
> > > to the tape (the actual media) just fine, but then get corrupted during
> > > the verify readback.
> >
> > Right. Stephan, if you could use a bit of your time to isolate the problem
> > I would be VERY grateful.
>
> I remember Stephan once said that he used tar to verify the tape, and that
> for one backup, he did several tests showing corruption on different files.
> Altough that doesn't mean that the tape is written totally correctly, it at
> proves that there's at least a read corruption.
Hello Willy, hello Marcelo,
in fact I noticed that across multiple verify cycles the so-called corruption
rarely (read: _very_ rarely) hits the same files. So it is indeed very
likely that the read path is the problem.
Another thing to note is that I did not manage to produce a failed verify on a
dataset tar'ed to the 3ware raid instead of to tape. I did not test that very
intensively, but based on the number of cycles I did on tape, I would have
expected a corruption to show up.
> I think that comparing multiple reads to find a pattern in corruption offsets
> (if any) is the only thing he could do (not speaking about mixing read/writes
> with good/bad kernels). Of course, storing several times 70GB on disk is not
> easy, but at least a 16 bits checksum for each 1kB block would result on
> about 140 MB files, which will be "easier" to compare. It could be enough to
> check for empty blocks, duplicated blocks or totally random ones.
>
> Stephan, if you're willing to do the test but don't have such a tool, I may
> write a quick dirty one tomorrow if that helps.
>
> BTW, it could be interesting to note the read buffer's hardware address for
> each test, in case it matters.
Well, in fact I am a bit lost in this case because of the sheer data volume. I
have space for several sets on disk, but it takes a damn long time to produce
one write/verify cycle. Anyway, I will do it if that helps. The big problem with
tar is that I have (to my knowledge) no way to make it save the verify-failing
data parts somewhere. I guess this could help a lot, because we could then
see what the corruption looks like, how long (in bytes) it is, and so on.
If anybody has an idea how to achieve this goal, let me know.
I am not 100% confident that the tests would look the same if I simply read the
whole tape back onto the disks and then verified via file compare, but anyway I
should try this too, several times, to complete the picture.
Ok, the weekend is here; I'll see what can be done.
Regards,
Stephan
On Sat, Jun 21, 2003 at 01:48:28AM +0200, Stephan von Krawczynski wrote:
> Well, in fact I am a bit lost in the case, because of the shere data volume, I
> have space for several sets on disk, but it takes a damn long time to produce
> one cycle write/verify. Anyway I will do if that helps. The big problem with
> tar is that I have (to my knowledge) no chance to let it somewhere save the
> verify-failing data parts. I guess this could help a lot, because we could then
> see what the corruption looks like, how long (in bytes) it is and so on.
> If anybody has an idea how to achieve this goal let me know.
I wanted to implement a compare-and-capture feature in my check tool, but
realized that it would certainly be of no help if you get duplicated blocks or
so, because you'll have no way to tell *where* the captured block should have
been. That's why I suggested the checksum instead: if you get a pattern such
as:
check1 check2
0: 1234 1234
1: 4567 4567
2: 789a 4567
3: bcde 789a
4: f012 bcde
... it will mean that block 1 was duplicated in check2. If you see:
check1 check2
0: 1234 1234
1: 4567 4567
2: 789a 4567
3: bcde bcde
4: f012 f012
... it will mean that block 1 was repeated instead of block 2 in check2.
If you see 0000, it probably means that you got a block full of zeros, since
the algorithm is only additive.
The resulting files will be 1/512th of the input; I think you'll find some
space on your disk for such a file.
It may be interesting to do regular checks during the second read, so that you
can abort after the first error and not have to do a second full read.
> Ok, weekend is here, I see what can be done.
Here is my proposed program. I tried it on my local hard disk: it took 5 min
to check the full 8 GB (30 MB/s), and I reached 123 MB/s on a 4-disk software
raid5 array with an AHA29160. It outputs the current offset every 64 MB.
Here it is running on a DDS3 :
[root@alpha /root]# ~willy/c/chkblk.alpha /dev/nst0 > nst0.chk
At offset 603979776...
I hope it can help.
Cheers,
Willy
/*
* chkblk - computes block checksums - 2003/06/21 - Willy Tarreau <[email protected]>
*
* This program is free, do what you want with it, I will not be responsible if
* it trashes all your data.
*
* Reads a file and outputs a binary 16 bit checksum for each 1KB block.
* Useful to check for data corruption. Eg :
*
* # chkblk /dev/tape > test1.chk
* # chkblk /dev/tape > test2.chk
* # cmp -l test[12].chk
*
* or :
* # chkblk /dev/sda2 |od -tx2 -Ax > test1.txt
* # chkblk /dev/sda2 |od -tx2 -Ax > test2.txt
* # diff -u test[12].txt
*
* To be able to read files bigger than 2GB, you should compile it
* with "-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64".
*
*
*/
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#define BLOCKSIZE 1024
#if _FILE_OFFSET_BITS == 64
#define OFF_T_FMT "%ll"
#else
#define OFF_T_FMT "%l"
#endif
void usage() {
    fprintf(stderr,
            "Usage: chkblk input > output\n"
            " - input is a file, device, ...\n"
            " - output will be a binary file 1/512th the size of input\n"
            );
    exit(1);
}
int main(int argc, char **argv) {
    int fd;
    int len;
    off_t inp_off;
    unsigned long *buffer;
    if (argc != 2)
        usage();
    buffer = (void *)malloc(BLOCKSIZE);
    if (buffer == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(2);
    }
    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        exit(3);
    }
    inp_off = 0;
    while ((len = read(fd, buffer, BLOCKSIZE)) > 0) {
        unsigned long sum = 0;
        int off;
        inp_off += len;
        /* displays the offset every 64 MB */
        if ((inp_off & 0x3ffffff) == 0)
            fprintf(stderr, "At offset " OFF_T_FMT "u...\r", inp_off);
        for (off = 0; off < len/sizeof(*buffer); off++)
            sum += buffer[off];
        while (sum >= (1<<16)) {
            sum = (sum & 0xffff) + (sum >> 16);
        }
        putchar(sum);
        putchar(sum >> 8);
    }
    fprintf(stderr, "At offset " OFF_T_FMT "u", inp_off);
    if (len < 0) {
        fprintf(stderr, ", read returned : \n");
        perror("");
        close(fd);
        exit(4);
    }
    else {
        fprintf(stderr, ", check completed without error\n");
    }
    close(fd);
    exit(0);
}
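If the raw byte positions printed by cmp -l are awkward to map back to tape
offsets, a small companion sketch along these lines (untested; it only assumes
chkblk's 1 kB blocks and its low-byte-first output) could print the block
number and byte offset of every differing checksum:
/*
 * chkdiff.c - compare two chkblk output files and report, for every
 * differing 16-bit checksum, the block number and the byte offset of
 * that block in the original input.
 *
 *   chkdiff test1.chk test2.chk
 */
#include <stdio.h>
#include <stdlib.h>
#define BLOCKSIZE 1024
int main(int argc, char **argv)
{
    FILE *f1, *f2;
    int c1a, c1b, c2a, c2b;
    unsigned long long block = 0, diffs = 0;
    if (argc != 3) {
        fprintf(stderr, "Usage: chkdiff file1.chk file2.chk\n");
        exit(1);
    }
    f1 = fopen(argv[1], "rb");
    f2 = fopen(argv[2], "rb");
    if (!f1 || !f2) {
        perror("fopen");
        exit(2);
    }
    /* chkblk writes the checksum low byte first, then the high byte */
    while ((c1a = getc(f1)) != EOF && (c1b = getc(f1)) != EOF &&
           (c2a = getc(f2)) != EOF && (c2b = getc(f2)) != EOF) {
        unsigned sum1 = c1a | (c1b << 8);
        unsigned sum2 = c2a | (c2b << 8);
        if (sum1 != sum2) {
            printf("block %llu (offset %llu): %04x != %04x\n",
                   block, block * BLOCKSIZE, sum1, sum2);
            diffs++;
        }
        block++;
    }
    printf("%llu blocks compared, %llu differences\n", block, diffs);
    fclose(f1);
    fclose(f2);
    return diffs != 0;
}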
Hello all,
here is the interesting result of my working weekend with intensive testing:
As 22-pre1 just came out, I decided to use it for further testing of the issue,
because I don't particularly like testing old kernels. And to my great surprise
I have not managed to break 22-pre1 so far. I have up to now moved about 1 TB
of data through the box (written to tape and verified) and have not yet
produced a single verify error.
Question is: how do I continue?
Of course the tape-writing activity will continue, so I will still keep an eye
on the issue every day.
Are we interested in finding out what particular patch in pre1 is responsible
for this?
Well, at least there is the positive result that pre1 seems significantly
better...
Regards,
Stephan
Hello again,
so we learned that working on the weekend is no good ;-)
The problem is back - still on 22-pre1. I had two failed verifications this
morning.
Now I am giving Willy's checksumming a try. I'll keep you informed.
Regards,
Stephan
Hello all, hello Willy,
I tried to produce the problem by using your chkblk tool, but have not been
successful so far. All checksums are the same. Is it possible that the
problem lies deeper in the process than expected? Remember I do:
copy data via NFS to server
tar data on server to tape
read data back for verification with tar -d
Is it possible that the verification errors do not occur because of a read
problem, but because a page-cached block gets trashed somehow between
"tar to tape" and "read from tape"? I would suspect that some blocks survive in
memory and are re-used during verification. If for some reason this data is
invalid or corrupted, the verification fails although the read was correct.
I know that this sounds weird, but it is nevertheless possible, or not?
It may even be worse: the data may also have been left over from the original
nfs action, correct?
Is there a way to completely invalidate/flush all cached blocks concerning this
fs (besides umount)?
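One crude way I can think of to at least test this theory (a rough sketch only,
assuming nothing important is running on the box at that moment) would be to
run a throwaway memory hog between the "tar to tape" and the "tar -d" pass, so
that most of the page cache has to be evicted before the verify:
/*
 * cachehog.c - touch a given number of megabytes of anonymous memory so
 * that the kernel has to evict page cache pages to make room.
 * Rough sketch only; keep the size below physical RAM to avoid heavy
 * swapping.  All memory is released implicitly on exit.
 *
 *   ./cachehog 2500    # touch ~2.5 GB
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
    long mb, i;
    char *p;
    if (argc != 2) {
        fprintf(stderr, "usage: %s <megabytes>\n", argv[0]);
        return 1;
    }
    mb = atol(argv[1]);
    for (i = 0; i < mb; i++) {
        p = malloc(1024 * 1024);
        if (p == NULL) {
            fprintf(stderr, "stopped after %ld MB\n", i);
            break;
        }
        memset(p, 0xaa, 1024 * 1024);  /* actually touch the pages */
    }
    return 0;
}
It does not guarantee that every cached block is gone, but if the corruption
disappeared after such a run, that would at least support the caching theory.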
Regards,
Stephan
Hi Stephan,
> Is it possible that the verification errors do not occur because of a read
> problem, but because of a page cached block getting trashed somehow between
> "tar to tape" and "read from tape". I would suspect that some blocks survive in
> memory and are re-used during verification. If for some reason this data is
> invalid or corrupted the verification fails although the read was correct.
That seems strange to me; I don't see how we could cache data from a char
device. It is possible that chkblk and tar don't use the same block size and
that your problem only occurs on larger transfers, or on particularly aligned
ones. You could try to increase the block size in chkblk to something bigger
than a page, for example. I don't know if tar reads your tape at full speed,
but it's possible that if it can't cope with the tape speed, an overrun occurs
and something finally gets dropped :-/
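(As a concrete example - the value is arbitrary, any multiple of the page size
should do - that would just be a one-line change in chkblk:)
/* in chkblk.c: one checksum per 64 KB block instead of per 1 KB block */
#define BLOCKSIZE (64 * 1024)
Of course .chk files produced with different block sizes cannot be compared
with each other, and the output then shrinks to 1/32768th of the input instead
of 1/512th.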
> I know that this sounds weird, but nevertheless possible, or not?
> It may even be worse, the data may have also been left from the original nfs
> action, correct?
> Is there a way to completely invalidate/flush all cached blocks concerning this
> fs (besides umount)?
I don't believe in this. But as Justin says, this card can reach very high
performance and really hassle the hardware. Perhaps you have a rare weakness in
your hardware that only shows up under these conditions, although I don't know
how this could be checked.
IIRC, you said that it works flawlessly in UP and you need SMP to hit the bug.
Perhaps your second CPU is sometimes flaky (bad cache, etc...) :-/
Cheers,
Willy
On Sat, 21 Jun 2003, Willy Tarreau wrote:
> On Fri, Jun 20, 2003 at 06:13:53PM -0300, Marcelo Tosatti wrote:
> > > Actually, without another copy of the data on a different system to
> > > verify it with, you can't know that for sure. It could easily be getting
> > > to the tape (the actual media) just fine, but then get corrupted during
> > > the verify readback.
> >
> > Right. Stephan, if you could use a bit of your time to isolate the problem
> > I would be VERY grateful.
>
> I remember Stephan once said that he used tar to verify the tape, and that for
> one backup, he did several tests showing corruption on different files. Altough
> that doesn't mean that the tape is written totally correctly, it at proves that
> there's at least a read corruption.
>
> I think that comparing multiple reads to find a pattern in corruption offsets
> (if any) is the only thing he could do (not speaking about mixing read/writes
> with good/bad kernels). Of course, storing several times 70GB on disk is not
> easy, but at least a 16 bits checksum for each 1kB block would result on about
> 140 MB files, which will be "easier" to compare. It could be enough to check
> for empty blocks, duplicated blocks or totally random ones.
Actually, to find problems like this, switching to cpio would be useful:
find /home | cpio -oB -Hcrc >/dev/st0
as an example. When reading back you will see errors from the CRC on each
file. I use cpio for this reason in some cases where knowing it's right
is critical.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Tue, 24 Jun 2003 19:43:31 +0200
Willy Tarreau <[email protected]> wrote:
> Hi Stephan,
>
> > Is it possible that the verification errors do not occur because of a read
> > problem, but because of a page cached block getting trashed somehow between
> > "tar to tape" and "read from tape". I would suspect that some blocks
> > survive in memory and are re-used during verification. If for some reason
> > this data is invalid or corrupted the verification fails although the read
> > was correct.
>
> That seems strange to me, I don't see how we could cache data from a char
> device.
Hello Willy,
sorry, you probably misunderstood my flaky explanation. What I meant was not a
cached block from the _tape_ (obviously indeed a char-type device) but from the
3ware disk (i.e. the other side of the verification). Consider the tape
completely working, but the disk data corrupt (possibly not from real reading
but from corrupted cache).
> It is possible that chkblk and tar don't use same block size and that
> your problem only occurs on larger transfers, or particularly aligned ones.
Very likely not the same block size, with tar I use -b64.
> You could try to increase the block size in chkblk to something bigger than a
> page for example. I don't know if tar reads your tape at full speed,
It does. There's no head repositioning.
> but it's
> possible that if it doesn't cope with the tape speed, an overrun occurs and
> something finally gets dropped :-/
Very unlikely - how do you create an overrun in a synchronous single read
operation?
> > I know that this sounds weird, but nevertheless possible, or not?
> > It may even be worse, the data may have also been left from the original
> > nfs action, correct?
> > Is there a way to completely invalidate/flush all cached blocks concerning
> > this fs (besides umount)?
>
> I don't believe in this. But as Justin says, this card can get very high
> performances and hassle the hardware. Perhaps you have a rare weakness in
> your hardware that only occurs under these conditions, although I don't know
> how this could be checked.
I doubt that. The reason is that, though the tape is pretty fast for a tape, it
is still pretty slow compared to a disk. Since I have been using the box for
months now, I would have expected such a hardware problem to show up for disk
access, too. And there was none.
> IIRC, you said that it works flawlessly in UP and you need SMP to hit the
> bug. Perhaps your second CPU is sometimes flaky (bad cache, etc...) :-/
Hm, interestingly the former freeze bug (solved by Marcelo through the backout
of some patch in rc8) did not show up in UP. Since then I have not tested UP any
more. The problem itself does not necessarily point to flaky hardware, as I have
no idea how a bad cache could show up only during a tape verification; that does
not sound all that reasonable.
More likely it could be an SMP race anywhere from the nfs-server or the 3ware
disk driver to the page cache, or not?
Regards,
Stephan
On Tue, Jun 24, 2003 at 11:26:09PM +0200, Stephan von Krawczynski wrote:
> sorry, you probably misunderstood my flaky explanation. What I meant was not a
> cached block from the _tape_ (obviously indeed a char-type device) but from the
> 3ware disk (i.e. the other side of the verification). Consider the tape
> completely working, but the disk data corrupt (possibly not from real reading
> but from corrupted cache).
Ah, OK! I didn't understand this. You're right, this is also a possibility.
Perhaps a tar cf - /mnt/3ware | chkblk would get evidence of some corruption?
<...snip... OK for these points ...>
> Hm, interestingly the former freeze bug (solved by marcelo through backout of
> some patch in rc8) did not show up in UP. Since then I did not test UP any
> more. The problem itself does not necessarily point to flaky hardware, as I
> would have no idea how bad cache can only show up during a tape verification,
> that does not sound all that reasonable.
OK, I agree. And right after posting, I remembered that if this were the case,
you should also see some MCEs, which does not seem to be happening for you.
> More likely could be a SMP race anywhere from nfs-server, 3ware disk driver to
> page cache, or not?
Fairly possible. That's also what Justin suggested in the past, BTW :-)
Cheers,
Willy
On Wed, 25 Jun 2003 00:03:31 +0200
Willy Tarreau <[email protected]> wrote:
> On Tue, Jun 24, 2003 at 11:26:09PM +0200, Stephan von Krawczynski wrote:
>
> > sorry, you probably misunderstood my flaky explanation. What I meant was
> > not a cached block from the _tape_ (obviously indeed a char-type device)
> > but from the 3ware disk (i.e. the other side of the verification). Consider
> > the tape completely working, but the disk data corrupt (possibly not from
> > real reading but from corrupted cache).
>
> Ah, OK ! I didn't understand this. You're right, this is also a possibility.
> Perhaps a tar cf - /mnt/3ware | chkblk would get evidence of somme corruption
> ?
Hm, probably a dumb question: does repeated tar'ing of the same files lead to
exactly the same archive? There is no timestamp inside or something equivalent?
Regards,
Stephan
On Tue, 24 Jun 2003 23:26:09 +0200, Stephan von Krawczynski said:
> sorry, you probably misunderstood my flaky explanation. What I meant was not a
> cached block from the _tape_ (obviously indeed a char-type device) but from the
> 3ware disk (i.e. the other side of the verification). Consider the tape
> completely working, but the disk data corrupt (possibly not from real reading
> but from corrupted cache).
Don't rule out odder explanations either. True story follows.. ;)
I once had the misfortune of being the admin for a Gould PN/9080. UTX/32 1.2
came out, and since it changed the inode format on disk, it's dump/mkfs/restore
time. So I take the last 3 full backups, and do 2 more complete dumps besides.
I checked, and *NO* I/O errors had been reported (and then I checked THAT by
giving it a known bad tape and seeing errors WERE reported).
Do the upgrade... and *every single* tape was 'not in dump/restore format'.
Finally traced it down (this was the days when oscilloscopes were still useful)
to a bad 7400 series chip on the tape controller. The backplane was a 32-bit
bus, the tape was an 8-bit device - so there was a 4-to-1 mux that had a bad
chip. Bit 3 would be correct for 4 bits, inverted for 4 bits, correct for
4, etc.. Tape drive *NEVER* complained, because what came over the *cable*
was correct, parity and all..
Oh, and I got the data back something like this:
cat > mangle.c
main() {
int muck[2];
while (read(0,muck,8) == 8) {
muck[1] ^= 0x20202020;
write(1,muck,8);
}
}
^D
cc -o mangle mangle.c
dd if=/dev/rmt0 bs=32k | ./mangle | restore -f -
On Wed, Jun 25, 2003 at 01:43:53AM +0200, Stephan von Krawczynski wrote:
> > Ah, OK ! I didn't understand this. You're right, this is also a possibility.
> > Perhaps a tar cf - /mnt/3ware | chkblk would get evidence of somme corruption
> > ?
>
> Hm, probably a dumb question: does repeated tar'ing of the same files lead to
> exactly the same archive? There is no timestamp inside or something equivalent
> ?
Hmmm no, you're right, I forgot about this case. I think that access time or
other time-dependent information may change often enough to make a big diff
on checksums. I have no more ideas at the moment. Or perhaps tar to a disk file
instead of the tape and check that file :-/
Cheers,
Willy
On Wed, 25 Jun 2003 21:16:55 +0200
Willy Tarreau <[email protected]> wrote:
> On Wed, Jun 25, 2003 at 01:43:53AM +0200, Stephan von Krawczynski wrote:
> > > Ah, OK ! I didn't understand this. You're right, this is also a
> > > possibility. Perhaps a tar cf - /mnt/3ware | chkblk would get evidence of
> > > somme corruption?
> >
> > Hm, probably a dumb question: does repeated tar'ing of the same files lead
> > to exactly the same archive? There is no timestamp inside or something
> > equivalent?
>
> Hmmm no, you're right, I forgot about this case. I think that access time or
> other time-dependant informations may change often enough to make a big diff
> on checksums. I have no more idea at the moment. Or perhaps tar to a disk
> file instead of the tape and check that file :-/
I have tried that already but never managed to get verification errors on tar
archives written to disk.
Maybe I'll try again some more...
Regards,
Stephan
>>>>> "Stephan" == Stephan von Krawczynski <[email protected]> writes:
Stephan> On Wed, 25 Jun 2003 21:16:55 +0200
Stephan> Willy Tarreau <[email protected]> wrote:
>> Hmmm no, you're right, I forgot about this case. I think that
>> access time or other time-dependant informations may change often
>> enough to make a big diff on checksums. I have no more idea at the
>> moment. Or perhaps tar to a disk file instead of the tape and check
>> that file :-/
Stephan> I have tried that already but never managed to get
Stephan> verification errors on tar archives written to disk. Maybe I
Stephan> try again some more...
I've been trying to get tar errors myself, while writing a 35 GB
filesystem to a DLT7000. I'm now running 2.4.21-pre5-ac1 and I
haven't seen any errors. Yet. I'm using the 6.2.8 version of the
driver as well. The filesystem is just a copy of my home directory
and some MP3s and other random files and such. Lots of text and jpeg
files, along with some other stuff.
Maybe I need to try and generate 15-18 files of 2 GB+ each and dump them
to tape with tar and see how that's handled, and whether we get errors.
Stephan, can you double-check your version info as well? And it would
be great to get some info on your 3ware setup as well, just so we can
work on narrowing down the issues.
Unfortunately, due to the way I have to set things up, the RAID array
and the tape drive are on the same channel, which slows things down,
I'm sure.
Here are some timings from dumping and verifying the data to tape:
jfsnew:/# time tar -c -W -b 128 -f /dev/st0 /scratch
tar: Removing leading `/' from member names
408.840u 869.730s 4:03:02.80 8.7% 0+0k 0+0io 258pf+0w
jfsnew:/# time tar -c -W -b 256 -f /dev/st0 /scratch
tar: Removing leading `/' from member names
443.210u 1104.930s 4:07:00.89 10.4% 0+0k 0+0io 264pf+0w
My filesystem is a as follows:
jfsnew:/home# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.00
Creation Time : Mon Jun 23 22:51:43 2003
Raid Level : raid0
Array Size : 44457600 (42.40 GiB 45.57 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Jun 23 22:51:43 2003
State : dirty, no-errors
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Chunk Size : 64K
Number Major Minor RaidDevice State
0 8 48 0 active sync /dev/sdd
1 8 64 1 active sync /dev/sde
2 8 80 2 active sync /dev/sdf
3 8 96 3 active sync /dev/sdg
4 8 112 4 active sync /dev/sdh
UUID : ffa7efb1:1c151f2d:4f6a138c:77085f29
In article <[email protected]> you wrote:
> Hmmm no, you're right, I forgot about this case. I think that access time or
> other time-dependant informations may change often enough to make a big diff
> on checksums. I have no more idea at the moment. Or perhaps tar to a disk file
> instead of the tape and check that file :-/
you can cat the tree into md5sum or run md5sum on each file in the tree:
find . -print0 | xargs -0 cat | md5sum
this will only compare file content. You could first dump it to a file and
then md5sum it, if you also want to test writes.
Greetings
Bernd
--
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/
On Wed, 25 Jun 2003 16:30:22 -0400
"John Stoffel" <[email protected]> wrote:
> >>>>> "Stephan" == Stephan von Krawczynski <[email protected]> writes:
>
> Stephan> I have tried that already but never managed to get
> Stephan> verification errors on tar archives written to disk. Maybe I
> Stephan> try again some more...
>
> I've been trying to get tar errors myself, while writing a 35gb
> filesystem to a DLT7000. I'm now running 2.4.21-pre5-ac1 and I
> haven't seen any errors. Yet. I'm using the 6.2.8 version of the
> driver as well. The filesystem is just a copy of my home directory
> and some MP3s and other random files and such. Lots of text and jpegf
> files, along with some other stuff.
>
> Maybe I need to try and generate 15-18 files 2gb+ each and dump them
> to tape with tar and see how that's handled, and if we get erorrs.
>
> Stephan, can you double check your version info as well? And it would
> be great to get some info on your 3ware setup as well, just so we can
> work on narrowing down the issues.
Hm, I guess you mean the kernel version? I have been experiencing this problem
since about the 21-rcX versions; currently I am running 22-pre1.
The 3ware setup is pretty straightforward: a RAID5 with three 160 GB disks and
no spare.
I would not rule out nfs interacting with this problem. Can you try to move the
data you back up to your tar'ing box from somewhere via nfs?
Regards,
Stephan
On Wed, 25 Jun 2003 16:30:22 -0400
"John Stoffel" <[email protected]> wrote:
> Maybe I need to try and generate 15-18 files 2gb+ each and dump them
> to tape with tar and see how that's handled, and if we get erorrs.
More data on this:
Today was a very bad day regarding the issue. I experienced three verification
errors; the file sizes were:
563162975
746555206
12679280738
So it seems it is not really linked to the file size.
Regards,
Stephan
Hello all,
it looks like the problem is currently getting worse. This is the second day in
a row that I have seen 4 verification errors. This is with kernel 2.4.22-pre2
now.
Regards,
Stephan
On Mon, 30 Jun 2003, Stephan von Krawczynski wrote:
> Hello all,
>
> it looks like the problem gets worse currently. This is the second day I see 4
> verification errors. This is with kernel 2.4.22-pre2 now.
As far as I understood, the data gets corrupted on the tape (or when writing,
or when reading back).
Is this correct?
On Mon, 30 Jun 2003 08:39:38 -0300 (BRT)
Marcelo Tosatti <[email protected]> wrote:
>
>
> On Mon, 30 Jun 2003, Stephan von Krawczynski wrote:
>
> > Hello all,
> >
> > it looks like the problem gets worse currently. This is the second day I
> > see 4 verification errors. This is with kernel 2.4.22-pre2 now.
>
>
> As far as I understood, the tape is corrupting the data (or writting, or
> when reading back).
>
> Is this correct?
Actually my guess is that the _data_ itself is not corrupt - neither the
original set located on the 3ware RAID nor the backup set on the aic-connected
SDLT. In my personal opinion the problem is in the readback that occurs while
verifying. I do not know whether the data is corrupted by the aic driver (less
probable currently) or by some flaw inside the caching of the _original_ set.
The situation is complex because of the multiple subsystems involved.
My experience is this:
If you reboot and do a backup/verify cycle from 3ware to aic/tape, everything
seems fine.
If you reboot and push data over NFS to the 3ware disk, then do the
backup/verify cycle (with this data) from 3ware to aic/tape, corruption is very
likely.
If you then try another verify run of the same data, you see corruption happen
on _other_ files than in the verify before. It is therefore unlikely that both
data "ends" are part of the problem, because then you would expect the same
corruptions to show up - at least this is my hope.
Regards,
Stephan