Hi,

As the 2.5 feature freeze comes close, I want to ask whether the driver for
the QLogic SANblade 2200/2300 series of HBAs will be included in 2.5.
Are there any plans to do so? Has it been discussed before?

I ask because I use those HBAs together with IBM's FAStT500 storage system,
and it would be nice to have this driver in the default kernel.

I have been using version 5.36.3 of the qlogic 2x00 driver in production
(vanilla kernel 2.4.17 + qlogic 2x00 driver v5.36.3) since May 2002
and have never had any problems with it.
(2 Lotus Domino servers and 1 fileserver, all 3 attached to the IBM FAStT500
storage system using QLogic SANblade 2200 cards.)

I don't know how many people use those QLogic cards; I got them together
with the FAStT500 storage system.

The current driver is available here (it's GPL'd):
http://www.qlogic.com/support/os_detail.asp?productid=112&osid=26

The qlogic 2x00 driver is also in Andrea Arcangeli's "-aa" patches, so possibly
he knows better whether it would be useful to integrate those drivers into 2.5
or not.

Thanks for your time.
Please CC me, I'm currently not subscribed to lkml.

Simon.
On Tue, 2002-10-15 at 21:20, Simon Roscic wrote:
> Hi,
>
> As the 2.5 feature freeze comes close, I want to ask whether the driver for
> the QLogic SANblade 2200/2300 series of HBAs will be included in 2.5.
> Are there any plans to do so? Has it been discussed before?
>
> I ask because I use those HBAs together with IBM's FAStT500 storage system,
> and it would be nice to have this driver in the default kernel.
>
> I have been using version 5.36.3 of the qlogic 2x00 driver in production
> (vanilla kernel 2.4.17 + qlogic 2x00 driver v5.36.3) since May 2002
> and have never had any problems with it.
Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
it has major stack hog issues, how it keeps the io request lock (and
interrupts disabled) for a WEEK?
On Tuesday 15 October 2002 21:31, Arjan van de Ven <[email protected]> wrote:
> Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
> it has major stack hog issues, how it keeps the io request lock (and
> interrupts disabled) for a WEEK?
That doesn't sound good.

As I said, I haven't had any problems (failures, data loss, etc.) with this driver,
but it sounds like I should update to a newer driver version. Which qlogic 2x00
driver version do you recommend? Or does this affect all versions of the driver?
(Performance on the machines I use is quite good; a dbench 256 run gave me
approx. 60 MB/s (IBM xSeries 342, 1x Pentium III 1.2 GHz, 512-1024 MB RAM).)

Arjan, thanks for the info. I didn't realise the driver was that bad;
I had a lot to do in the past months, so I possibly missed discussions
about the qlogic 2x00 stuff. Sorry.

Simon.
(Please CC me, I'm not subscribed to lkml.)
Version 6.1b5 does appear to be a big improvement from looking
at the code (certainly much more readable than version 4.x and earlier),
although the method used for creating the different modules for
different hardware is pretty ugly:
in qla2300.c
#define ISP2300
[snip]
#include "qla2x00.c"
in qla2200.c
#define ISP2200
[snip]
#include "qla2x00.c"
I'm sure this would have to go before it got in.
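For comparison, a minimal sketch of the obvious runtime alternative - one
shared core plus a small per-chip parameter table keyed on the PCI device ID.
(The struct, field names and values below are hypothetical, not taken from
the qla2x00 source; only the 0x2200/0x2300 device IDs are real.)

/* Hypothetical per-chip parameter table. */
struct isp_board_info {
        unsigned short device_id;       /* PCI device ID: 0x2200, 0x2300, ... */
        const char *name;
        unsigned int request_q_length;  /* example of a per-chip difference */
};

static const struct isp_board_info isp_boards[] = {
        { 0x2200, "QLA2200", 128 },
        { 0x2300, "QLA2300", 256 },
        { 0x0000, NULL, 0 }             /* terminator */
};

/* In a real driver this lookup would sit in the PCI probe routine instead
 * of compiling the whole core twice behind #define ISP2200/ISP2300. */
static const struct isp_board_info *isp_board_lookup(unsigned short device_id)
{
        const struct isp_board_info *b;

        for (b = isp_boards; b->name != NULL; b++)
                if (b->device_id == device_id)
                        return b;
        return NULL;
}

Probing both chips from one module that way would also leave a single copy of
the core code in the image instead of two.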
~mc
On 10/16/02 03:53, Simon Roscic wrote:
> On Tuesday 15 October 2002 21:31, Arjan van de Ven <[email protected]> wrote:
>
>>Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
>>it has major stack hog issues, how it keeps the io request lock (and
>>interrupts disabled) for a WEEK?
This may have been the cause of problems I had running the qla driver with
LVM and ext3 - I was getting oopses with what looked like corrupted bufferheads.

This was happening in pretty much all kernels I tried (a variety of
Red Hat kernels and aa kernels). Removing LVM has solved the problem.
Although I was blaming LVM, maybe it was a buffer overflow in the qla driver.

The RH kernel I tried had quite an old version (4.31) of the driver, which
suffered from problems recovering from LIP resets. The latest 6.x drivers
seem to handle this much better.
You might wanna look at version 6.01 instead. I say this because it's
*not* a beta driver.
On Tue, 2002-10-15 at 21:51, Michael Clark wrote:
> Version 6.1b5 does appear to be a big improvement from looking
> at the code (certainly much more readable than version 4.x and earlier).
>
> Although the method for creating the different modules for
> different hardware is pretty ugly.
>
> in qla2300.c
>
> #define ISP2300
> [snip]
> #include "qla2x00.c"
>
> in qla2200.c
>
> #define ISP2200
> [snip]
> #include "qla2x00.c"
>
> I'm sure this would have to go before it got in.
>
> ~mc
>
> On 10/16/02 03:53, Simon Roscic wrote:
> > On Tuesday 15 October 2002 21:31, Arjan van de Ven <[email protected]> wrote:
> >
> >>Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
> >>it has major stack hog issues, how it keeps the io request lock (and
> >>interrupts disabled) for a WEEK?
>
> This may have been the cause of problems I had running the qla driver with
> LVM and ext3 - I was getting oopses with what looked like corrupted bufferheads.
>
> This was happening in pretty much all kernels I tried (a variety of
> Red Hat kernels and aa kernels). Removing LVM has solved the problem.
> Although I was blaming LVM, maybe it was a buffer overflow in the qla driver.
>
> The RH kernel I tried had quite an old version (4.31) of the driver, which
> suffered from problems recovering from LIP resets. The latest 6.x drivers
> seem to handle this much better.
>
> ~mc
>
I doubt it will make a difference. LVM and the qlogic drivers seem
to be a bad mix. I've already tried beta5 of 6.01
and the same problem exists - oopses about every 5-8 days.
Removing LVM solved the problem.

The changelog only lists small changes between 6.01 and 6.01b5,
although one entry suggests a fix to a race in qla2x00_done
that would allow multiple completions on the same IO. Not sure
if this relates to my problem with LVM, as it occurred with
earlier versions of the qlogic driver without the dpc threads.

~mc
On 10/16/02 11:56, GrandMasterLee wrote:
> You might wanna look at version 6.01 instead. I say this because it's
> *not* a beta driver.
>
>
> On Tue, 2002-10-15 at 21:51, Michael Clark wrote:
>
>>Version 6.1b5 does appear to be a big improvement from looking
>>at the code (certainly much more readable than version 4.x and earlier).
>>
>>Although the method for creating the different modules for
>>different hardware is pretty ugly.
>>
>>in qla2300.c
>>
>>#define ISP2300
>>[snip]
>>#include "qla2x00.c"
>>
>>in qla2200.c
>>
>>#define ISP2200
>>[snip]
>>#include "qla2x00.c"
>>
>>I'm sure this would have to go before it got in.
>>
>>~mc
>>
>>On 10/16/02 03:53, Simon Roscic wrote:
>>
>>>On Tuesday 15 October 2002 21:31, Arjan van de Ven <[email protected]> wrote:
>>>
>>>
>>>>Oh, so you haven't noticed how it buffer-overflows the kernel stack, how
>>>>it has major stack hog issues, how it keeps the io request lock (and
>>>>interrupts disabled) for a WEEK?
>>>
>>This may have been the cause of problems I had running the qla driver with
>>LVM and ext3 - I was getting oopses with what looked like corrupted bufferheads.
>>
>>This was happening in pretty much all kernels I tried (a variety of
>>Red Hat kernels and aa kernels). Removing LVM has solved the problem.
>>Although I was blaming LVM, maybe it was a buffer overflow in the qla driver.
>>
>>The RH kernel I tried had quite an old version (4.31) of the driver, which
>>suffered from problems recovering from LIP resets. The latest 6.x drivers
>>seem to handle this much better.
>>
>>~mc
>>
>
--
Michael Clark, . . . . . . . . . . . . . . . [email protected]
Managing Director, . . . . . . . . . . . . . . . phone: +65 6395 6277
Metaparadigm Pte. Ltd. . . . . . . . . . . . . . mobile: +65 9645 9612
25F Paterson Road, Singapore 238515 . . . . . . . . fax: +65 6234 4043
I'm successful because I'm lucky. The harder I work, the luckier I get.
Just to make sure we are on the same page,
was that LVM1, LVM2, or EVMS?
Joe
Michael Clark wrote:
> I doubt it will make a difference. LVM and qlogic drivers seem
> to be a bad mix. I've already tried the beta5 of 6.01
> and the same problem exists - oopses about every 5-8 days.
> Removing LVM solved the problem.
My Dell 6650 has been doing this exact behaviour since we got on 5.38.9
and up, using LVM in a production capacity. Both servers we have have
crashed mysteriously, without any kernel dump, etc., but all hardware
diags come out clean.

All hardware configuration bits are as perfect as they can be, and we
still get this behaviour. After 5-6.5 days the box black-screens, so
badly that all the XFS volumes we have never get a clean shutdown. We
must repair them all; today this happened, and we lost one part of the
tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.
On Tue, 2002-10-15 at 23:35, J Sloan wrote:
> Just to make sure we are on the same page,
> was that LVM1, LVM2, or EVMS?
>
> Joe
>
> Michael Clark wrote:
>
> > I doubt it will make a difference. LVM and qlogic drivers seem
> > to be a bad mix. I've already tried the beta5 of 6.01
> > and the same problem exists - oopses about every 5-8 days.
> > Removing LVM solved the problem.
>
>
>
On Tue, 2002-10-15 at 23:35, J Sloan wrote:
> Just to make sure we are on the same page,
> was that LVM1, LVM2, or EVMS?
>
> Joe
>
> Michael Clark wrote:
>
Quick question on this: could this problem be exacerbated, perhaps, by
the large pagebuf usage that XFS performs, as well as the FS buffers that it
allocates, since XFS allocates a lot of read/write buffers for logging?
On Tue, 2002-10-15 at 14:20, Simon Roscic wrote:
> hi,
...
> I ask because I use those HBAs together with IBM's FAStT500 storage system,
> and it would be nice to have this driver in the default kernel.
>
> I have been using version 5.36.3 of the qlogic 2x00 driver in production
> (vanilla kernel 2.4.17 + qlogic 2x00 driver v5.36.3) since May 2002
> and have never had any problems with it.
> (2 Lotus Domino servers and 1 fileserver, all 3 attached to the IBM FAStT500
> storage system using QLogic SANblade 2200 cards.)
Do you use LVM, EVMS, MD, other, or none?
TIA
--The GrandMaster
LVM1, tried in numerous versions of 2.4.x, both aa and rh versions.
With every one I was getting oopses when used with a combination
of ext3, LVM1 and the qla2x00 driver.

Since taking LVM1 out of the picture, my oopsing problem has
gone away. This could of course not be LVM1's fault but rather the
fact that the qla driver is a stack hog or something - I don't have
enough information to draw any conclusions. At the moment
I'm too scared to try LVM again (plus there's the time it takes to
migrate a few hundred gigs of storage).
~mc
On 10/16/02 12:35, J Sloan wrote:
> Just to make sure we are on the same page,
> was that LVM1, LVM2, or EVMS?
>
> Joe
>
> Michael Clark wrote:
>
>> I doubt it will make a difference. LVM and qlogic drivers seem
>> to be a bad mix. I've already tried the beta5 of 6.01
>> and the same problem exists - oopses about every 5-8 days.
>> Removing LVM solved the problem.
On Oct 16, 2002 13:28 +0800, Michael Clark wrote:
> With every one I was getting oopses when used with a combination
> of ext3, LVM1 and the qla2x00 driver.
>
> Since taking LVM1 out of the picture, my oopsing problem has
> gone away. This could of course not be LVM1's fault but rather the
> fact that the qla driver is a stack hog or something - I don't have
> enough information to draw any conclusions. At the moment
> I'm too scared to try LVM again (plus there's the time it takes to
> migrate a few hundred gigs of storage).
Yes, we have seen that ext3 is a stack hog in some cases, and I
know there were some fixes in later LVM versions to remove some
huge stack allocations. Arjan also reported stack problems with
qla2x00, so it is not a surprise that the combination causes
problems.
In 2.5 there is the "4k IRQ stack" patch floating around, which
would avoid these problems.
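To illustrate the kind of stack hog being talked about, a minimal sketch
(the function and buffer names are invented, not taken from ext3, LVM or
the qla2x00 source):

#include <linux/slab.h>         /* kmalloc(), kfree() */

/* Problematic pattern: a large automatic buffer eats a big chunk of the
 * one fixed-size kernel stack that ext3, LVM and the SCSI driver share. */
static void log_adapter_state_bad(void)
{
        char report[4096];      /* 4KB taken straight off the kernel stack */

        /* ... format and log the report ... */
        (void)report;
}

/* Safer pattern: take the buffer off the stack entirely. */
static void log_adapter_state_ok(void)
{
        char *report = kmalloc(4096, GFP_ATOMIC);  /* GFP_ATOMIC: may run in IRQ context */

        if (report == NULL)
                return;
        /* ... format and log the report ... */
        kfree(report);
}

The 4k IRQ stack work attacks the same problem from the other side, by giving
interrupts their own stack rather than whatever is left of the current task's
8KB one.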
Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
On 10/16/02 12:43, GrandMasterLee wrote:
> My Dell 6650 has been doing this exact behaviour since we got on 5.38.9
> and up, using LVM in a production capacity. Both servers we have, have
> crashed mysteriously, without any kernel dump, etc, but all hardware
> diags come out clean.
I'll tell you my honest hunch - remove LVM and try again. This has made
my life a little more peaceful lately. Even with just a 2-3 minute outage
while our cluster automatically fails over, the hundreds of users whining
about their sessions being disconnected make you a bit depressed.
> All hardware configuration bits are as perfect as they can be, and we
> still get this behaviour. After 5-6.5 days the box black-screens, so
> badly that all the XFS volumes we have never get a clean shutdown. We
> must repair them all; today this happened, and we lost one part of the
> tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.
We had the black screen also, until we got the machines oopsing over
serial. The oops was actually showing up in ext3 with a corrupted
bufferhead. Without LVM I've measured my longest uptime yet: 17 days x
4 machines in the cluster (68 machine-days), i.e. we only removed it 17 days ago.
~mc
>
>
>
>
> On Tue, 2002-10-15 at 23:35, J Sloan wrote:
>
>>Just to make sure we are on the same page,
>>was that LVM1, LVM2, or EVMS?
>>
>>Joe
>>
>>Michael Clark wrote:
>>
>>
>>>I doubt it will make a difference. LVM and qlogic drivers seem
>>>to be a bad mix. I've already tried the beta5 of 6.01
>>>and the same problem exists - oopses about every 5-8 days.
>>>Removing LVM solved the problem.
On Wed, 2002-10-16 at 01:03, Michael Clark wrote:
> On 10/16/02 12:43, GrandMasterLee wrote:
> > My Dell 6650 has been doing this exact behaviour since we got on 5.38.9
> > and up, using LVM in a production capacity. Both servers we have, have
> > crashed mysteriously, without any kernel dump, etc, but all hardware
> > diags come out clean.
>
> I'll tell you my honest hunch - remove LVM and try again. This has made
> my life a little more peaceful lately. Even with just a 2-3 minute outage
> while our cluster automatically fails over, the hundreds of users whining
> about their sessions being disconnected make you a bit depressed.
Almost making it to your go-live date, only to have everything come
crashing down all around you is quite depressing.
> > All hardware configuration bits are as perfect as they can be, and we
> > still get this behaviour. After 5-6.5 days the box black-screens, so
> > badly that all the XFS volumes we have never get a clean shutdown. We
> > must repair them all; today this happened, and we lost one part of the
> > tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.
>
> We had the black screen also, until we got the machines oopsing over
> serial. The oops was actually showing up in ext3 with a corrupted
> bufferhead. Without LVM I've measured my longest uptime yet: 17 days x
> 4 machines in the cluster (68 machine-days), i.e. we only removed it 17 days ago.
>
> ~mc
I believe you; that was my next thought, but to be honest I didn't know if it
would really help. Thanks for the input there.

I've been going crazy trying to catch any piece of sanity out of this
thing, to understand whether this was what was happening or not. I feel a bit
dumb for not trying the serial console yet, but I knew either that or KDB
should tell us something. I will see what we can do; it will take less
time to do this than to reload everything all over again.

Should I remove LVM altogether, or just not use it? In your opinion.
On 10/16/02 14:31, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 01:03, Michael Clark wrote:
>
>>On 10/16/02 12:43, GrandMasterLee wrote:
>>
[snip]
>>>All hardware configuration bits are as perfect as they can be, and we
>>>still get this behaviour. After 5-6.5 days the box black-screens, so
>>>badly that all the XFS volumes we have never get a clean shutdown. We
>>>must repair them all; today this happened, and we lost one part of the
>>>tablespace on our beta db. We're using LVM1, on 2.4.19-aa1.
>>
>>We had the black screen also, until we got the machines oopsing over
>>serial. The oops was actually showing up in ext3 with a corrupted
>>bufferhead. Without LVM I've measured my longest uptime yet: 17 days x
>>4 machines in the cluster (68 machine-days), i.e. we only removed it 17 days ago.
>
> I believe you, that was my next thought, but I didn't know if that would
> really help just to be honest. Thanks for the input there.
>
> I've been going crazy trying to catch any piece of sanity out of this
> thing to understand if this was what was happening or not. I feel a bit
> dumb for not trying serial console yet, but I knew either that or KDB
> should tell us something. I will see what we can do, it will take less
> time to do this, than to reload everything all over again.
>
> Should I remove LVM altogether, or just not use it? In your opinion.
I just didn't load the module after migrating my volumes. If the problem
is a stack problem, then it's probably not necessarily a bug in LVM
- just that the combination of it, ext3 and the qlogic driver don't mix well
- so if it's not being used, then it won't be increasing the stack footprint.
~mc
On Wed, 2002-10-16 at 01:40, Michael Clark wrote:
...
> > Should I remove LVM altogether, or just not use it? In your opinion.
>
> I just didn't load the module after migrating my volumes. If the problem
> is a stack problem, then it's probably not necessarily a bug in LVM
> - just that the combination of it, ext3 and the qlogic driver don't mix well
> - so if it's not being used, then it won't be increasing the stack footprint.
>
> ~mc
>
Not to be dense, but it's compiled into my kernel; that's why I ask. We
try not to use modules where we can help it. So I'm thinking that if no VGs
are actively used, then LVM won't affect the stack much. I just don't
know whether that's true or not.
--The GrandMaster
On 10/16/02 14:48, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 01:40, Michael Clark wrote:
> ...
>
>>>Should I remove LVM altogether, or just not use it? In your opinion.
>>
>>I just didn't load the module after migrating my volumes. If the problem
>>is a stack problem, then it's probably not necessarily a bug in LVM
>>- just that the combination of it, ext3 and the qlogic driver don't mix well
>>- so if it's not being used, then it won't be increasing the stack footprint.
>
> Not to be dense, but it's compiled into my kernel, that's why I ask. We
> try not to use modules where we can help it. So I'm thinking, if no VG
> are actively used, then LVM won't affect the stack much. I just don't
> know if that's true or not.
Correct. It won't affect the stack at all.
~mc
On Wednesday 16 October 2002 04:51, Michael Clark <[email protected]> wrote:
> Version 6.1b5 does appear to be a big improvement from looking
> at the code (certainly much more readable than version 4.x and earlier).
I'll try version 6.01 or so next week and see what happens.
Thanks for your help.
> Although the method for creating the different modules for
> different hardware is pretty ugly.
>...
I see.
> This was happening in pretty much all kernels I tried (a variety of
> Red Hat kernels and aa kernels). Removing LVM has solved the problem.
> Although I was blaming LVM, maybe it was a buffer overflow in the qla driver.
Looks like I had a lot of luck, because my 3 servers which are using the
qla2x00 5.36.3 driver were running without problems, but I'll update to 6.01
in the next few days.

I don't use LVM; the filesystem I use is XFS, so it seems I was very lucky
not to run into this problem.

Simon.
(Please CC me, I'm not subscribed to lkml.)
On Wednesday 16 October 2002 07:02, GrandMasterLee <[email protected]> wrote:
> Do you use LVM, EVMS, MD, other, or none?
>
None.
It's an XFS filesystem with the following mount options:
rw,noatime,logbufs=8,logbsize=32768

(This applies to all 3 machines.)

Simon.
(Please CC me, I'm not subscribed to lkml.)
On 10/17/02 00:28, Simon Roscic wrote:
>>This was happening in pretty much all kernels I tried (a variety of
>>Red Hat kernels and aa kernels). Removing LVM has solved the problem.
>>Although I was blaming LVM, maybe it was a buffer overflow in the qla driver.
>
> Looks like I had a lot of luck, because my 3 servers which are using the
> qla2x00 5.36.3 driver were running without problems, but I'll update to 6.01
> in the next few days.
>
> I don't use LVM; the filesystem I use is XFS, so it seems I was very lucky
> not to run into this problem.
Seems to be the correlation so far. qlogic driver without lvm works okay.
qlogic driver with lvm, oopsorama.
~mc
On Tue, 15 Oct 2002, Andreas Dilger wrote:
> On Oct 16, 2002 13:28 +0800, Michael Clark wrote:
> > With every one I was getting oopses when used with a combination
> > of ext3, LVM1 and the qla2x00 driver.
> >
> > Since taking LVM1 out of the picture, my oopsing problem has
> > gone away. This could of course not be LVM1's fault but rather the
> > fact that the qla driver is a stack hog or something - I don't have
> > enough information to draw any conclusions. At the moment
> > I'm too scared to try LVM again (plus there's the time it takes to
> > migrate a few hundred gigs of storage).
>
> Yes, we have seen that ext3 is a stack hog in some cases, and I
> know there were some fixes in later LVM versions to remove some
> huge stack allocations. Arjan also reported stack problems with
> qla2x00, so it is not a surprise that the combination causes
> problems.
>
The stack issues were a major problem in the 5.3x series driver. I
believe (I can check tomorrow) that 5.38.9 (the driver Dell distributes)
contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
the fixes.

IAC, I believe the support tech working with MasterLee had asked
for additional information regarding the configuration as well as
some basic logs. Ideally we'd like to set up a similar configuration
in house and see what's happening...
--
Andrew Vasquez | [email protected] |
DSS: 0x508316BB, FP: 79BD 4FAC 7E82 FF70 6C2B 7E8B 168F 5529 5083 16BB
On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
> > Yes, we have seen that ext3 is a stack hog in some cases, and I
> > know there were some fixes in later LVM versions to remove some
> > huge stack allocations. Arjan also reported stack problems with
> > qla2x00, so it is not a surprise that the combination causes
> > problems.
> >
> The stack issues were a major problem in the 5.3x series driver. I
> believe, I can check tomorrow, 5.38.9 (the driver Dell distributes)
> contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
> the fixes.
Does this mean that 6.01 will NOT work either? Which drivers will be
affected? We've already made the move to remove LVM from the mix, but
your comments above give me some doubt as to how certain it is that
the stack clobbering will be fixed by doing so.
> IAC, I believe the support tech working with MasterLee had asked
> for additional information regarding the configuration as well as
> > some basic logs. Ideally we'd like to set up a similar configuration
> in house and see what's happening...
In-house? Just curious. What can "I" do to know if our configuration
won't get broken, just by removing LVM? TIA.
Do you actually get the lockups then?
On Wed, 2002-10-16 at 11:38, Simon Roscic wrote:
> On Wednesday 16 October 2002 07:02, GrandMasterLee <[email protected]> wrote:
> > Do you use LVM, EVMS, MD, other, or none?
> >
>
> None.
> It's an XFS filesystem with the following mount options:
> rw,noatime,logbufs=8,logbsize=32768
>
> (This applies to all 3 machines.)
>
> simon.
> (please CC me, i'm not subscribed to lkml)
>
On Wed, 16 Oct 2002, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
> > > Yes, we have seen that ext3 is a stack hog in some cases, and I
> > > know there were some fixes in later LVM versions to remove some
> > > huge stack allocations. Arjan also reported stack problems with
> > > qla2x00, so it is not a surprise that the combination causes
> > > problems.
> > >
> > The stack issues were a major problem in the 5.3x series driver. I
> > believe, I can check tomorrow, 5.38.9 (the driver Dell distributes)
> > contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
> > the fixes.
>
> Does this mean that 6.01 will NOT work either? What drivers will be
> affected? We've already made the move to remove LVM from the mix, but
> your comments above give me some doubt as to how definite it is, that
> the stack clobbering will be fixed by doing so.
>
The 6.x series driver basically branched from the 5.x series driver.
The changes made many moons ago are already in the 6.x series driver.
To quell your concerns: yes, stack overflow is not an issue with the
6.x series driver.

I believe that if we are to get anywhere regarding this issue, we need to
shift focus away from stack corruption in early versions of the driver.
> > IAC, I believe the support tech working with MasterLee had asked
> > for additional information regarding the configuration as well as
> > some basic logs. Ideally we'd like to set up a similar configuration
> > in house and see what's happening...
>
> In-house?
>
Sorry, a short introduction: Andrew Vasquez, Linux driver development at
QLogic.
> Just curious. What can "I" do to know if our configuration
> won't get broken, just by removing LVM? TIA.
>
I've personally never used LVM before, so I cannot even begin to
attempt to answer your question -- please work with the tech on this
one, if it's a driver problem, we'd like to fix it.
--
Andrew
On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
> On 10/17/02 00:28, Simon Roscic wrote:
>
> >>This was happening in pretty much all kernels I tried (a variety of
> >>Red Hat kernels and aa kernels). Removing LVM has solved the problem.
> >>Although I was blaming LVM, maybe it was a buffer overflow in the qla driver.
> >
> > Looks like I had a lot of luck, because my 3 servers which are using the
> > qla2x00 5.36.3 driver were running without problems, but I'll update to 6.01
> > in the next few days.
> >
> > I don't use LVM; the filesystem I use is XFS, so it seems I was very lucky
> > not to run into this problem.
So then, it seems that LVM is adding stress to the system in a way that
is bad for the kernel. Perhaps it's the read-ahead in conjunction with the
large buffers from XFS, plus the number of volumes we run (22 on the
latest machine to crash).
> Seems to be the correlation so far. qlogic driver without lvm works okay.
> qlogic driver with lvm, oopsorama.
Michael, what exactly do your servers do? Are they DB servers with ~1Tb
connected, or file-servers with hundreds of gigs, etc?
> ~mc
>
On Wed, 2002-10-16 at 22:11, Andrew Vasquez wrote:
...
> > Does this mean that 6.01 will NOT work either? What drivers will be
> > affected? We've already made the move to remove LVM from the mix, but
> > your comments above give me some doubt as to how definite it is, that
> > the stack clobbering will be fixed by doing so.
> >
I was asking because we crashed while using this driver AND LVM.
> The 6.x series driver basically branched from the 5.x series driver.
> Changes made, many moons ago, are already in the 6.x series driver.
> To quell your concerns, yes, stack overflow is not an issue with the
> 6.x series driver.
>
> I believe if we are to get anywhere regarding this issue, we need to
> shift focus from stack corruption in early versions of the driver.
In this way, you mean that it is not an issue since you guys don't try
to use LVM.
> > > IAC, I believe the support tech working with MasterLee had asked
> > > for additional information regarding the configuration as well as
> > > some basic logs. Ideally we'd like to set up a similar configuration
> > > in house and see what's happening...
> >
> > In-house?
> >
> Sorry, short introduction, Andrew Vasquez, Linux driver development at
> QLogic.
Nice to meet ya. :)
> > Just curious. What can "I" do to know if our configuration
> > won't get broken, just by removing LVM? TIA.
> >
> I've personally never used LVM before, so I cannot even begin to
> attempt to answer your question --
We've removed LVM from the config, per Michael's issue and
recommendation, but I'm just scared that we *could* see this issue with
XFS and QLogic. Since you're saying that 6.01 has no stack clobbering
issues, then is it the combination of XFS, LVM and QLogic?
> please work with the tech on this
> one, if it's a driver problem, we'd like to fix it.
I'm going to try, but we've got to get up and in production ASAP. Since
it takes *days* to cause the crash, I don't know how I can cause it and
get the stack dump.
On 10/17/02 11:12, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
>>Seems to be the correlation so far. qlogic driver without lvm works okay.
>>qlogic driver with lvm, oopsorama.
>
>
> Michael, what exactly do your servers do? Are they DB servers with ~1Tb
> connected, or file-servers with hundreds of gigs, etc?
My customer currently has about 400 GB on this particular 4-node application
cluster (actually 2 x 2-node clusters using Kimberlite HA software).

It has 11 logical hosts (services) spread over the 4 nodes, with services such
as Oracle 8.1.7, Oracle Financials (11i), a busy OpenLDAP server, busy
netatalk AppleShare servers and a Cyrus IMAP server. All are on ext3 partitions
and were previously using LVM to slice up the storage.

The cluster usually has around 200-300 active users.

We have had oopses (in ext3) on differing logical hosts which were running
different services, i.e. it has oopsed on the node mastering the fileserver
and also on the node mastering the Oracle database.

Fingers crossed, since removing LVM (which was the only change we have made,
same kernel) we have had 3 times our longest uptime and still counting.

From earlier emails I had posted, users who were also using qlogic had
responded to me and none of them had had any problems; the key factor was
that none of them were running LVM - this is what made me think to try
removing it (it was really just a hunch). We had gone through months of
changing kernel versions, GigE network adapters, driver versions, etc.,
to no avail, then finally the LVM removal.

Because this is potentially a stack problem, it really can't just be
pinned on LVM, but more on the additive effect LVM would have on some
underlying stack problem.

I believe the Red Hat kernels I tried (rh7.2 2.4.9-34 errata was the most
recent) also had this 'stack' problem. I am currently using 2.4.19pre10aa4.

I would hate to recommend you remove LVM and have it not work, but I
must say it has worked for me (I'm just glad I didn't go to XFS instead
of removing LVM as I did - as that was the other option I was pondering).

~mc
On Wed, 2002-10-16 at 22:54, Michael Clark wrote:
> On 10/17/02 11:12, GrandMasterLee wrote:
> > On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
> >>Seems to be the correlation so far. qlogic driver without lvm works okay.
> >>qlogic driver with lvm, oopsorama.
> >
> >
> > Michael, what exactly do your servers do? Are they DB servers with ~1Tb
> > connected, or file-servers with hundreds of gigs, etc?
>
> My customer currently has about 400Gb on this particular 4 node Application
> cluster (actually 2 x 2 node clusters using kimberlite HA software).
>
> It has 11 logical hosts (services) spread over the 4 nodes with services such
> as Oracle 8.1.7, Oracle Financials (11i), a busy openldap server, and busy
> netatalk AppleShare Servers, Cyrus IMAP server. All are on ext3 partitions
> and were previously using LVM to slice up the storage.
On each of the Nodes, correct?
> The cluster usually has around 200-300 active users.
>
> We have had oopses (in ext3) on differing logical hosts which were running
> different services, i.e. it has oopsed on the node mastering the fileserver
> and also on the node mastering the Oracle database.
And again, each was running LVM in a shared storage mode for failover?
> Cross fingers, since removing LVM (which was the only change we have made,
> same kernel) we have had 3 times our longest uptime and still counting.
>
> By the sounds, from earlier emails I had posted, users had responded
> to me who were also using qlogic and none of them had had any problems,
> the key factor was none of them were running LVM - this is what made
> me think to try and remove it (it was really just a hunch). We had
> gone through months of changing kernel versions, changing GigE network
> adapters, driver versions, etc, to no avail, then finally the LVM removal.
Kewl. That makes me feel much better now too.
> Due to the potential nature of it being a stack problem. The problem
> really can't just be pointed at LVM but more the additive effect this
> would have on some underlying stack problem.
>
> I believe the RedHat kernels i tried (rh7.2 2.4.9-34 errata was the most
> recent) also had this 'stack' problem. I am currently using 2.4.19pre10aa4.
Kewl. I'm using 2.4.19-aa1 (rc5-aa1, but hell, it's the same thing).
> I would hate to recommend you remove LVM and have it not work, but I
> must say it has worked for me (I'm just glad I didn't go to XFS instead
> of removing LVM as I did - as that was the other option I was pondering).
I hear you. We were pondering changing to ext3 - and not just ext3, but RHAS
also, i.e. more money, unknown kernel config, etc. I was going to be
*very* upset. Are you running FC2 (qla2300Fs in FC2 config) or FC1?
TIA
> ~mc
>
On 10/17/02 12:08, GrandMasterLee wrote:
> On Wed, 2002-10-16 at 22:54, Michael Clark wrote:
>
>>On 10/17/02 11:12, GrandMasterLee wrote:
>>
>>>On Wed, 2002-10-16 at 11:49, Michael Clark wrote:
>>>
>>>>Seems to be the correlation so far. qlogic driver without lvm works okay.
>>>>qlogic driver with lvm, oopsorama.
>>>
>>>
>>>Michael, what exactly do your servers do? Are they DB servers with ~1Tb
>>>connected, or file-servers with hundreds of gigs, etc?
>>
>>My customer currently has about 400Gb on this particular 4 node Application
>>cluster (actually 2 x 2 node clusters using kimberlite HA software).
>>
>>It has 11 logical hosts (services) spread over the 4 nodes with services such
>>as Oracle 8.1.7, Oracle Financials (11i), a busy openldap server, and busy
>>netatalk AppleShare Servers, Cyrus IMAP server. All are on ext3 partitions
>>and were previously using LVM to slice up the storage.
>
>
> On each of the Nodes, correct?
We had originally planned to split up the storage in the RAID head
using individual LUNs for each cluster logical host - so we could use SCSI
reservations - but we encountered problems with the RAID head's device queue.
The RAID head has a global queue depth of 64, and to alleviate
queue problems with the RAID head locking up we needed to minimise
the number of LUNs, so late in the piece we added LVM to split up the storage.

We are using LVM in a clustered fashion, i.e. we export most of the array
as one big LUN and slice it into LVs, each one associated with a logical host.
All LVs are accessible from all 4 physical hosts in the cluster. Care and
application locking ensure only 1 physical host mounts any LV/partition at
a time (except for the cluster quorum partitions, which need to be accessed
concurrently from 2 nodes - and for these we have separate quorum disks in
the array).

LVM metadata changes are made from one node while the others are down
(or just have their volumes deactivated, unmounted, then lvm-mod removed)
to avoid screwing up our metadata, because LVM is not cluster aware.

We are not using multipath but have the cluster arranged in a topology
such that the HA RAID head has 2 controllers, with each side of the cluster
hanging off a different one, i.e. L and R. If we have a path failure, we will
just lose CPU capacity (25-50% depending). The logical hosts will automatically
move onto a physical node which still has connectivity to the RAID head
(by the cluster software checking connectivity to the quorum partitions).
This gives us a good level of redundancy without the added cost of 2 paths
from each host, i.e. after a path failure we run with degraded performance only.

We are using vanilla 2300s.
~mc
On 10/17/02 11:11, Andrew Vasquez wrote:
> On Wed, 16 Oct 2002, GrandMasterLee wrote:
>
>
>>On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
>>
>>>The stack issues were a major problem in the 5.3x series driver. I
>>>believe, I can check tomorrow, 5.38.9 (the driver Dell distributes)
>>>contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
>>>the fixes.
>>
>>Does this mean that 6.01 will NOT work either? What drivers will be
>>affected? We've already made the move to remove LVM from the mix, but
>>your comments above give me some doubt as to how definite it is, that
>>the stack clobbering will be fixed by doing so.
>>
>
> The 6.x series driver basically branched from the 5.x series driver.
> Changes made, many moons ago, are already in the 6.x series driver.
> To quell your concerns, yes, stack overflow is not an issue with the
> 6.x series driver.
>
> I believe if we are to get anywhere regarding this issue, we need to
> shift focus from stack corruption in early versions of the driver.
Well, corruption of bufferheads was happening for me with a potentially
stack-deep setup (ext3+LVM+qlogic). Maybe it has been fixed in the
non-LVM case but is still an issue, as I have had it with 6.0.1b3 -
the stack fix is listed in 6.0b13, which is quite a few releases behind
the one I've had the problem with.

I posted the oops to lk about 3 weeks ago. I wasn't sure it was a qlogic
problem at the time, and still am not certain - maybe it's just the sum of
stack(ext3+lvm+qlogic). Even if the qla stack usage was trimmed for the common
case, it may still be a problem when LVM is active, as there would be much
deeper stacks during block io.

http://marc.theaimsgroup.com/?l=linux-kernel&m=103302016311188&w=2

The oops doesn't show qlogic at all, although it is a corrupt bufferhead
that is causing the oops, so it may have been silently corrupted earlier
by a qlogic interrupt or block io submission while deep inside lvm and
ext3 or some such. I.e. the oops is one of those difficult sorts that shows
up corruption from some earlier event that is not directly traceable from
the oops itself.
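To make that additive effect concrete, a toy sketch - every function name and
buffer size below is invented, not measured from ext3, LVM or the qlogic
driver; the point is only that individually tolerable frames share one stack:

/* On i386 2.4 the whole chain below runs on a single ~8KB kernel stack
 * (shared with the task_struct), so the frames are additive. */

static void hba_queue_command(void)    /* think: low-level SCSI driver */
{
        char cmd_scratch[2048];        /* invented size */
        (void)cmd_scratch;
}

static void volume_remap(void)         /* think: LVM remapping the block */
{
        char map_scratch[2048];        /* invented size */
        (void)map_scratch;
        hba_queue_command();
}

static void journal_write_block(void)  /* think: ext3/JBD submitting the I/O */
{
        char commit_scratch[2048];     /* invented size */
        (void)commit_scratch;
        volume_remap();
        /* ~6KB of locals are live at once here, and an interrupt taken at
         * this point runs on whatever is left of the same stack. */
}

That also fits the symptom: an overrun silently scribbles on whatever sits
next to the stack, and the eventual oops points at the victim rather than
the culprit.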
~mc
On Thursday 17 October 2002 05:08, GrandMasterLee <[email protected]> wrote:
> Do you actually get the lockups then?
No, I haven't had any lockups. Each of the machines currently has an uptime
of only 16 days, and that's because we had to shut down the power in our
whole company for half a day.

The best uptime I had was approx. 40-50 days; then I got the following
problem: the Lotus Domino server processes (not the whole machine) were
freezing every week, but that is a known problem for heavily loaded Domino
servers - you have to increase the amount of IPC memory for Java or something
(of the Domino server), and since I did that everything works without problems.

The "primary" Lotus Domino server also got quite swap-happy in the last weeks;
currently it has to serve almost everything that has to do with Notes, as the
second server isn't really in use yet.

If you are interested, procinfo -a shows this on one of the 3 machines
(all 3 are the same, except that the "primary" Lotus Domino server has 2 CPUs
and 2 GB RAM; the other 2 have 1 CPU and 1 GB RAM):
---------------- procinfo ----------------
Linux 2.4.17-xfs-smp (root@adam-neu) (gcc 2.95.3 20010315 ) #1 2CPU [adam.]
Memory: Total Used Free Shared Buffers Cached
Mem: 2061272 2050784 10488 0 2328 1865288
Swap: 1056124 265652 790472
Bootup: Tue Oct 1 17:42:07 2002 Load average: 0.14 0.08 0.02 1/445 11305
user : 1d 20:15:32.03 5.7% page in :1196633058 disk 1: 1670401r 953006w
nice : 0:00:24.69 0.0% page out:261985556 disk 2: 27762380r11039499w
system: 13:05:51.19 1.7% swap in : 5870304 disk 3: 4r 0w
idle : 29d 18:09:25.15 92.5% swap out: 5099371 disk 4: 4r 0w
uptime: 16d 1:45:36.53 context :2810591591
irq 0: 138873653 timer irq 12: 104970 PS/2 Mouse
irq 1: 5597 keyboard irq 14: 54 ide0
irq 2: 0 cascade [4] irq 18: 8659653 ips
irq 3: 1 irq 20: 421419256 e1000
irq 4: 1 irq 24: 38444870 qla2200
irq 6: 3 irq 28: 17728 e100
irq 8: 2 rtc
Kernel Command Line:
auto BOOT_IMAGE=Linux ro root=803 BOOT_FILE=/boot/vmlinuz
Modules:
24 *sg 6 lp 25 parport 59 *e100
48 *e1000 165 *qla2200
Character Devices: Block Devices:
1 mem 10 misc 2 fd
2 pty 21 sg 3 ide0
3 ttyp 29 fb 8 sd
4 ttyS 128 ptm 65 sd
5 cua 136 pts 66 sd
6 lp 162 raw
7 vcs 254 HbaApiDev
File Systems:
[rootfs] [bdev] [proc] [sockfs]
[tmpfs] [pipefs] ext3 ext2
[nfs] [smbfs] [devpts] xfs
---------------- procinfo ----------------
The kernel running on the 3 machines is a "vanilla" 2.4.17
plus XFS, plus ext3-0.9.17, plus the Intel EtherExpress 100
and EtherExpress 1000 drivers (e100 and e1000), and
the qlogic qla2x00 5.36.3 driver.

I think I will wait for 2.4.20 and then build a new kernel for the 3 machines.

Simon.
(Please CC me, I'm not subscribed to lkml.)
On Thu, 2002-10-17 at 04:40, Michael Clark wrote:
> On 10/17/02 11:11, Andrew Vasquez wrote:
> > On Wed, 16 Oct 2002, GrandMasterLee wrote:
> >
> >
> >>On Wed, 2002-10-16 at 20:59, Andrew Vasquez wrote:
> >>
> >>>The stack issues were a major problem in the 5.3x series driver. I
> >>>believe, I can check tomorrow, 5.38.9 (the driver Dell distributes)
> >>>contains fixes for the stack clobbering -- qla2x00-rh1-3 also contain
> >>>the fixes.
> >>
> >>Does this mean that 6.01 will NOT work either? What drivers will be
> >>affected? We've already made the move to remove LVM from the mix, but
> >>your comments above give me some doubt as to how definite it is, that
> >>the stack clobbering will be fixed by doing so.
> >>
> >
> > The 6.x series driver basically branched from the 5.x series driver.
> > Changes made, many moons ago, are already in the 6.x series driver.
> > To quell your concerns, yes, stack overflow is not an issue with the
> > 6.x series driver.
> >
> > I believe if we are to get anywhere regarding this issue, we need to
> > shift focus from stack corruption in early versions of the driver.
>
> Well corruption of bufferheads was happening for me with a potentially
> stack deep setup (ext3+LVM+qlogic). Maybe it has been fixed in the
> non-LVM case but is still an issue as I have had it with 6.0.1b3 -
> The stack fix is listed in 6.0b13 which is quite a few release behind
> the one i've had the problem with.
I don't disagree, but I saw the same things with XFS filesystems on LVM
also. This leads me to my next question: does anyone on this list use
XFS plus QLA2300s with 500GB+ mounted across several volumes, on qlogic
driver 5.38.x or later, and have more than 20 days of uptime to date?
> I posted the oops to lk about 3 weeks ago. Wasn't sure it was a qlogic
> problem at the time, and still am not certain - maybe just sum of
> stack(ext3+lvm+qlogic). Even if qla stack was trimmed for the common case,
> it may still be a problem when LVM is active as there would be much
> deeper stacks during block io.
>
Kewl..thanks much.
On Thu, 2002-10-17 at 12:47, Simon Roscic wrote:
> On Thursday 17 October 2002 05:08, GrandMasterLee <[email protected]> wrote:
> > Do you actually get the lockups then?
>
> No, I haven't had any lockups. Each of the machines currently has an uptime
> of only 16 days, and that's because we had to shut down the power in our
> whole company for half a day.
> ...
One question about your config: are you using, on ANY machines,
QLA2300s or PCI-X, and 5.38.x or 6.xx qlogic drivers? If so, have
you experienced no lockups with those machines too?
> if you are interested, procinfo -a shows this on one of the 3 machines:
> (all 3 are the same, except that the "primary" lotus domino server has 2 cpu's
> and 2 gb ram, the other 2, have 1 cpu and 1 gb ram)
>
> ---------------- procinfo ----------------
> Linux 2.4.17-xfs-smp (root@adam-neu) (gcc 2.95.3 20010315 ) #1 2CPU [adam.]
Thanks for the info. I will hopefully have >5 days uptime now too. If
not, anyone need a Systems Architect? :-D
--The GrandMaster
On Friday 18 October 2002 08:42, GrandMasterLee <[email protected]> wrote:
> One question about your config, are you using, on ANY machines,
> QLA2300's or PCI-X, and 5.38.x or 6.xx qlogic drivers? If so, then
> you've experienced no lockups with those machines too?
No, the 3 machines I use are basically the same (IBM xSeries 342)
and have the same qlogic cards (QLA2200); all 3 machines use the
same kernel (2.4.17+xfs+ext3-0.9.17+e100+e1000+qla2x00-5.36.3).

A few details, in case something here helps you:
---------------- lspci ----------------
00:00.0 Host bridge: ServerWorks CNB20HE (rev 23)
00:00.1 Host bridge: ServerWorks CNB20HE (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:06.0 VGA compatible controller: S3 Inc. Savage 4 (rev 06)
00:0f.0 ISA bridge: ServerWorks OSB4 (rev 51)
00:0f.1 IDE interface: ServerWorks: Unknown device 0211
00:0f.2 USB Controller: ServerWorks: Unknown device 0220 (rev 04)
01:02.0 RAID bus controller: IBM Netfinity ServeRAID controller
01:03.0 Ethernet controller: Intel Corporation 82543GC Gigabit Ethernet Controller (rev 02)
01:07.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0c)
02:05.0 SCSI storage controller: QLogic Corp. QLA2200 (rev 05)
---------------------------------------
--------- dmesg (qla stuff)--------
qla2x00: Found VID=1077 DID=2200 SSVID=1077 SSDID=2
scsi1: Found a QLA2200 @ bus 2, device 0x5, irq 24, iobase 0x2100
scsi(1): Configure NVRAM parameters...
scsi(1): Verifying loaded RISC code...
scsi(1): Verifying chip...
scsi(1): Waiting for LIP to complete...
scsi(1): LIP reset occurred
scsi(1): LIP occurred.
scsi(1): LOOP UP detected
scsi1: Topology - (Loop), Host Loop address 0x7d
scsi-qla0-adapter-port=210000e08b064002\;
scsi-qla0-tgt-0-di-0-node=200600a0b80c3d8c\;
scsi-qla0-tgt-0-di-0-port=200600a0b80c3d8d\;
scsi-qla0-tgt-0-di-0-control=00\;
scsi-qla0-tgt-0-di-0-preferred=ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff\;
scsi1 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 2 device 5 irq 24
Firmware version: 2.01.37, Driver version 5.36.3
Vendor: IBM Model: 3552 Rev: 0401
Type: Direct-Access ANSI SCSI revision: 03
Vendor: IBM Model: 3552 Rev: 0401
Type: Direct-Access ANSI SCSI revision: 03
Vendor: IBM Model: 3552 Rev: 0401
Type: Direct-Access ANSI SCSI revision: 03
Vendor: IBM Model: 3552 Rev: 0401
Type: Direct-Access ANSI SCSI revision: 03
scsi(1:0:0:0): Enabled tagged queuing, queue depth 16.
scsi(1:0:0:1): Enabled tagged queuing, queue depth 16.
scsi(1:0:0:2): Enabled tagged queuing, queue depth 16.
scsi(1:0:0:3): Enabled tagged queuing, queue depth 16.
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Attached scsi disk sdc at scsi1, channel 0, id 0, lun 1
Attached scsi disk sdd at scsi1, channel 0, id 0, lun 2
Attached scsi disk sde at scsi1, channel 0, id 0, lun 3
SCSI device sdb: 125829120 512-byte hdwr sectors (64425 MB)
sdb: sdb1
SCSI device sdc: 125829120 512-byte hdwr sectors (64425 MB)
sdc: sdc1
SCSI device sdd: 125829120 512-byte hdwr sectors (64425 MB)
sdd: sdd1
SCSI device sde: 48599040 512-byte hdwr sectors (24883 MB)
sde: sde1
---------------------------------------
All 3 machines have the same filesystem concept:
internal storage -> ext3 (Linux and programs)
storage on FAStT500 -> XFS (data only)
except for the mount options: because the fileserver also needs
quotas, it has:
rw,noatime,quota,usrquota,grpquota,logbufs=8,logbsize=32768
I consider the primary Lotus Domino server to be the machine which
has to handle the highest load of the 3, because it currently has
to handle almost everything that has to do with Lotus Notes in our
company. It's Friday afternoon here, so the load isn't really high,
but it's possibly nice for you to know how much load the machine
has to handle:
---------------------------------------
4:59pm up 16 days, 23:17, 3 users, load average: 0.43, 0.18, 0.11
445 processes: 441 sleeping, 4 running, 0 zombie, 0 stopped
CPU0 states: 6.13% user, 1.57% system, 0.0% nice, 91.57% idle
CPU1 states: 26.19% user, 5.27% system, 0.0% nice, 68.17% idle
Mem: 2061272K av, 2049636K used, 11636K free, 0K shrd, 1604K buff
Swap: 1056124K av, 262532K used, 793592K free 1856420K cached
---------------------------------------
(/local/notesdata is on the fastt500)
adam:/ # lsof |grep /local/notesdata/ |wc -l
33076
---------------------------------------
adam:/ # lsof |wc -l
84604
---------------------------------------
Simon.
(Please CC me, I'm not subscribed to lkml.)
> short introduction, Andrew Vasquez, Linux driver development at
> QLogic.
I'd like to see the QLogic 6.x driver in the Linus and Marcelo
trees. In the tests I've run, it has better throughput and
lower latency than the standard driver. The improvement
on journaled filesystems and synchronous I/O is where the 6.x
driver shines.
How about the latest final 6.x driver for 2.4.21-pre,
and the 6.x beta driver for 2.5.x? :)
For people who like numbers...
tiobench-0.3.3 is a multithreaded I/O benchmark.
Unit information
================
File size = 12888 megabytes
Blk Size = 4096 bytes
Num Thr = number of threads
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
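As a worked example of the CPU Eff column: in the first ext3 sequential-read
row below, a rate of 50.34 MB/s at 28.23% CPU gives 50.34 / 0.2823, i.e.
about 178 - the value shown in the CPU Eff column.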
2.4.19-pre10-aa4 has the QLogic 6.x driver.
2.4.19-pre10-aa4-oql has the old (standard) QLogic driver.
The ext3 fs shows dramatic improvements in throughput and generally
much lower average and maximum latency with the QLogic 6.x driver.
Sequential Reads ext3
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency 2s 10s Eff
-------------------- --- --------------------------------------------------------
2.4.19-pre10-aa4 1 50.34 28.23% 0.230 159.48 0.00000 0.00000 178
2.4.19-pre10-aa4-oql 1 49.26 28.80% 0.235 305.75 0.00000 0.00000 171
2.4.19-pre10-aa4 32 7.24 4.34% 50.964 10279.10 1.72278 0.00000 167
2.4.19-pre10-aa4-oql 32 5.00 2.90% 73.941 16359.07 1.99270 0.00000 173
2.4.19-pre10-aa4 64 7.13 4.32% 102.581 21062.55 2.13490 0.00000 165
2.4.19-pre10-aa4-oql 64 4.79 2.78% 152.879 32394.43 2.12285 0.00318 172
2.4.19-pre10-aa4 128 6.92 4.20% 209.363 41943.96 2.22813 1.24283 165
2.4.19-pre10-aa4-oql 128 4.56 2.68% 317.688 66597.95 2.25661 1.98520 170
2.4.19-pre10-aa4 256 6.82 4.09% 418.905 87535.80 2.27172 2.11080 167
2.4.19-pre10-aa4-oql 256 4.43 2.62% 645.476 133976.17 2.30475 2.13893 169
Random Reads ext3
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency 2s 10s Eff
-------------------- --- --------------------------------------------------------
2.4.19-pre10-aa4 1 0.64 0.73% 18.338 111.72 0.00000 0.00000 87
2.4.19-pre10-aa4-oql 1 0.63 0.64% 18.482 99.78 0.00000 0.00000 100
2.4.19-pre10-aa4 32 2.38 2.58% 122.580 14073.47 0.65000 0.00000 92
2.4.19-pre10-aa4-oql 32 1.60 1.64% 179.073 19904.25 0.75000 0.00000 98
2.4.19-pre10-aa4 64 2.36 3.05% 202.490 15891.16 3.04939 0.00000 78
2.4.19-pre10-aa4-oql 64 1.60 2.00% 292.337 25545.70 3.20061 0.00000 80
2.4.19-pre10-aa4 128 2.38 2.72% 355.104 17775.10 6.07358 0.00000 88
2.4.19-pre10-aa4-oql 128 1.56 2.25% 536.262 27685.78 6.95565 0.00000 69
2.4.19-pre10-aa4 256 2.39 3.66% 667.890 18035.51 13.02083 0.00000 65
2.4.19-pre10-aa4-oql 256 1.59 2.19% 995.664 27016.55 15.13020 0.00000 73
Sequential Writes ext3
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency 2s 10s Eff
-------------------- --- --------------------------------------------------------
2.4.19-pre10-aa4 1 44.19 56.57% 0.243 5740.37 0.00003 0.00000 78
2.4.19-pre10-aa4-oql 1 36.93 47.25% 0.291 6984.53 0.00010 0.00000 78
2.4.19-pre10-aa4 32 20.07 130.9% 8.552 11636.29 0.04701 0.00000 15
2.4.19-pre10-aa4-oql 32 18.21 120.4% 10.188 14347.38 0.09546 0.00000 15
2.4.19-pre10-aa4 64 17.02 115.5% 19.166 37065.25 0.30819 0.00019 15
2.4.19-pre10-aa4-oql 64 14.64 102.4% 21.978 28398.20 0.42292 0.00000 14
2.4.19-pre10-aa4 128 14.50 100.8% 44.053 51945.59 0.87108 0.00175 14
2.4.19-pre10-aa4-oql 128 11.34 86.43% 54.214 48119.96 1.15720 0.00410 13
2.4.19-pre10-aa4 256 11.70 78.84% 104.914 54905.07 2.27391 0.02009 15
2.4.19-pre10-aa4-oql 256 9.13 60.18% 131.867 60897.68 2.84818 0.03341 15
Random Writes ext3
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency 2s 10s Eff
-------------------- --- --------------------------------------------------------
2.4.19-pre10-aa4 1 4.42 4.14% 0.086 1.57 0.00000 0.00000 107
2.4.19-pre10-aa4-oql 1 3.54 3.17% 0.086 1.17 0.00000 0.00000 112
2.4.19-pre10-aa4 32 4.24 11.74% 0.280 13.71 0.00000 0.00000 36
2.4.19-pre10-aa4-oql 32 3.47 9.46% 0.294 69.40 0.00000 0.00000 37
2.4.19-pre10-aa4 64 4.25 12.05% 0.283 10.86 0.00000 0.00000 35
2.4.19-pre10-aa4-oql 64 3.47 10.30% 0.356 41.28 0.00000 0.00000 34
2.4.19-pre10-aa4 128 4.38 105.7% 19.575 2590.92 0.75605 0.00000 4
2.4.19-pre10-aa4-oql 128 3.48 11.37% 0.433 97.01 0.00000 0.00000 31
2.4.19-pre10-aa4 256 4.19 11.36% 0.269 9.55 0.00000 0.00000 37
2.4.19-pre10-aa4-oql 256 3.44 10.09% 0.270 8.84 0.00000 0.00000 34
Sequential Reads ext2
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency 2s 10s Eff
-------------------- --- --------------------------------------------------------
2.4.19-pre10-aa4 1 50.22 27.99% 0.231 916.73 0.00000 0.00000 179
2.4.19-pre10-aa4-oql 1 49.31 28.56% 0.235 671.86 0.00000 0.00000 173
2.4.19-pre10-aa4 32 40.58 25.30% 8.851 14977.86 0.13511 0.00000 160
2.4.19-pre10-aa4-oql 32 36.54 21.75% 9.534 33524.30 0.15471 0.00009 168
2.4.19-pre10-aa4 64 40.30 24.97% 17.409 29836.18 0.39867 0.00009 161
2.4.19-pre10-aa4-oql 64 36.75 22.04% 18.318 66908.84 0.17503 0.07566 167
2.4.19-pre10-aa4 128 40.58 25.06% 33.476 59144.35 0.40223 0.03750 162
2.4.19-pre10-aa4-oql 128 36.65 21.80% 35.286 116917.79 0.19392 0.16241 168
2.4.19-pre10-aa4 256 40.43 25.21% 64.505 116303.68 0.42705 0.36046 160
2.4.19-pre10-aa4-oql 256 36.67 22.05% 66.686 247490.25 0.22520 0.19671 166
Random Reads ext2
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency 2s 10s Eff
-------------------- --- --------------------------------------------------------
2.4.19-pre10-aa4 1 0.73 0.79% 16.139 122.84 0.00000 0.00000 92
2.4.19-pre10-aa4-oql 1 0.74 0.71% 15.813 108.18 0.00000 0.00000 104
2.4.19-pre10-aa4 32 5.21 6.12% 65.858 263.66 0.00000 0.00000 85
2.4.19-pre10-aa4-oql 32 3.49 3.65% 96.096 622.30 0.00000 0.00000 96
2.4.19-pre10-aa4 64 5.34 8.04% 124.586 1666.34 0.00000 0.00000 66
2.4.19-pre10-aa4-oql 64 3.66 3.85% 161.338 9160.40 0.47883 0.00000 95
2.4.19-pre10-aa4 128 5.31 9.59% 188.362 6900.68 1.23488 0.00000 55
2.4.19-pre10-aa4-oql 128 3.69 5.39% 256.389 10303.55 3.70464 0.00000 68
2.4.19-pre10-aa4 256 5.35 7.01% 321.321 7466.05 4.06250 0.00000 76
2.4.19-pre10-aa4-oql 256 3.70 5.84% 445.043 11064.58 8.25521 0.00000 63
Sequential Writes ext2
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency 2s 10s Eff
-------------------- --- --------------------------------------------------------
2.4.19-pre10-aa4 1 44.74 29.79% 0.237 10862.42 0.00022 0.00000 150
2.4.19-pre10-aa4-oql 1 39.82 26.00% 0.266 12820.23 0.00136 0.00000 153
2.4.19-pre10-aa4 32 37.54 46.78% 8.435 12528.86 0.02540 0.00000 80
2.4.19-pre10-aa4-oql 32 32.76 38.44% 9.718 12559.96 0.12649 0.00000 85
2.4.19-pre10-aa4 64 37.22 46.35% 16.613 21438.79 0.54842 0.00000 80
2.4.19-pre10-aa4-oql 64 32.77 38.17% 18.967 29070.67 0.52241 0.00003 86
2.4.19-pre10-aa4 128 37.11 45.77% 32.377 48430.99 0.55281 0.00200 81
2.4.19-pre10-aa4-oql 128 32.71 38.04% 37.046 56332.65 0.52779 0.00315 86
2.4.19-pre10-aa4 256 36.97 46.31% 62.846 84414.17 0.58346 0.42848 80
2.4.19-pre10-aa4-oql 256 33.06 38.38% 70.013 93041.19 0.53021 0.45341 86
Random Writes ext2
Kernel                Thr  Rate(MB/s)  CPU%    AvgLat(ms)  MaxLat(ms)  %Lat>2s  %Lat>10s  CPU-Eff
--------------------  ---  ----------  ------  ----------  ----------  -------  --------  -------
2.4.19-pre10-aa4 1 4.60 3.73% 0.071 11.32 0.00000 0.00000 123
2.4.19-pre10-aa4-oql 1 3.71 2.38% 0.063 1.39 0.00000 0.00000 156
2.4.19-pre10-aa4 32 4.62 8.18% 0.183 17.48 0.00000 0.00000 56
2.4.19-pre10-aa4-oql 32 3.85 6.81% 0.184 13.01 0.00000 0.00000 56
2.4.19-pre10-aa4 64 4.62 8.75% 0.179 11.23 0.00000 0.00000 53
2.4.19-pre10-aa4-oql 64 3.83 6.01% 0.185 11.79 0.00000 0.00000 64
2.4.19-pre10-aa4 128 4.49 7.82% 0.181 11.25 0.00000 0.00000 57
2.4.19-pre10-aa4-oql 128 3.90 7.64% 0.186 13.04 0.00000 0.00000 51
2.4.19-pre10-aa4 256 4.41 8.92% 0.180 10.56 0.00000 0.00000 49
2.4.19-pre10-aa4-oql 256 3.70 7.32% 0.178 11.26 0.00000 0.00000 51
Sequential Reads reiserfs
Kernel                Thr  Rate(MB/s)  CPU%    AvgLat(ms)  MaxLat(ms)  %Lat>2s  %Lat>10s  CPU-Eff
--------------------  ---  ----------  ------  ----------  ----------  -------  --------  -------
2.4.19-pre10-aa4 1 47.63 30.34% 0.244 150.93 0.00000 0.00000 157
2.4.19-pre10-aa4-oql 1 47.99 30.42% 0.242 153.50 0.00000 0.00000 158
2.4.19-pre10-aa4 32 36.75 25.80% 9.761 12904.09 0.08792 0.00000 142
2.4.19-pre10-aa4-oql 32 30.68 20.76% 11.026 47422.16 0.15427 0.00591 148
2.4.19-pre10-aa4 64 34.26 24.31% 20.720 22812.18 0.60685 0.00000 141
2.4.19-pre10-aa4-oql 64 31.50 20.98% 20.887 74077.79 0.17828 0.09984 150
2.4.19-pre10-aa4 128 35.94 25.46% 37.882 50116.12 0.53921 0.00388 141
2.4.19-pre10-aa4-oql 128 32.41 21.82% 39.056 137240.55 0.20548 0.17551 149
2.4.19-pre10-aa4 256 35.28 25.17% 74.221 102660.73 0.59475 0.48787 140
2.4.19-pre10-aa4-oql 256 31.67 21.85% 73.905 248764.61 0.26824 0.23384 145
Random Reads reiserfs
Kernel                Thr  Rate(MB/s)  CPU%    AvgLat(ms)  MaxLat(ms)  %Lat>2s  %Lat>10s  CPU-Eff
--------------------  ---  ----------  ------  ----------  ----------  -------  --------  -------
2.4.19-pre10-aa4 1 0.59 0.86% 19.727 129.29 0.00000 0.00000 69
2.4.19-pre10-aa4-oql 1 0.60 0.75% 19.634 124.21 0.00000 0.00000 79
2.4.19-pre10-aa4 32 4.23 5.23% 80.588 363.42 0.00000 0.00000 81
2.4.19-pre10-aa4-oql 32 2.98 3.50% 107.835 352.72 0.00000 0.00000 85
2.4.19-pre10-aa4 64 4.28 5.16% 139.187 7890.49 0.45363 0.00000 83
2.4.19-pre10-aa4-oql 64 3.12 5.36% 184.145 10398.28 0.78125 0.00000 58
2.4.19-pre10-aa4 128 4.51 6.99% 213.281 8206.57 1.71370 0.00000 65
2.4.19-pre10-aa4-oql 128 3.13 4.57% 296.529 12265.36 4.43549 0.00000 68
2.4.19-pre10-aa4 256 4.51 7.61% 378.383 8946.18 6.09375 0.00000 59
2.4.19-pre10-aa4-oql 256 3.10 6.34% 539.497 13397.75 10.57291 0.00000 49
Sequential Writes reiserfs
Kernel                Thr  Rate(MB/s)  CPU%    AvgLat(ms)  MaxLat(ms)  %Lat>2s  %Lat>10s  CPU-Eff
--------------------  ---  ----------  ------  ----------  ----------  -------  --------  -------
2.4.19-pre10-aa4 1 30.24 49.54% 0.373 33577.32 0.00201 0.00124 61
2.4.19-pre10-aa4-oql 1 28.25 46.58% 0.400 37683.70 0.00204 0.00137 61
2.4.19-pre10-aa4 32 35.33 158.8% 8.978 27421.64 0.13778 0.00102 22
2.4.19-pre10-aa4-oql 32 30.49 145.0% 10.526 48710.93 0.14690 0.00973 21
2.4.19-pre10-aa4 64 32.43 156.6% 19.320 88360.82 0.29164 0.02301 21
2.4.19-pre10-aa4-oql 64 29.31 126.4% 21.320 75316.32 0.33388 0.01316 23
2.4.19-pre10-aa4 128 32.93 140.0% 35.576 95977.80 0.43586 0.11368 24
2.4.19-pre10-aa4-oql 128 29.18 116.8% 40.995 92350.00 0.46883 0.15281 25
2.4.19-pre10-aa4 256 31.38 138.1% 68.583 175532.20 0.67412 0.22071 23
2.4.19-pre10-aa4-oql 256 28.33 117.1% 79.807 128832.50 0.75308 0.32549 24
Random Writes reiserfs
Kernel                Thr  Rate(MB/s)  CPU%    AvgLat(ms)  MaxLat(ms)  %Lat>2s  %Lat>10s  CPU-Eff
--------------------  ---  ----------  ------  ----------  ----------  -------  --------  -------
2.4.19-pre10-aa4 1 4.29 4.12% 0.094 0.26 0.00000 0.00000 104
2.4.19-pre10-aa4-oql 1 3.47 3.33% 0.095 0.93 0.00000 0.00000 104
2.4.19-pre10-aa4 32 4.36 10.78% 0.256 107.72 0.00000 0.00000 40
2.4.19-pre10-aa4-oql 32 3.51 8.38% 0.226 8.23 0.00000 0.00000 42
2.4.19-pre10-aa4 64 4.41 11.08% 0.231 7.93 0.00000 0.00000 40
2.4.19-pre10-aa4-oql 64 3.52 8.87% 0.231 7.92 0.00000 0.00000 40
2.4.19-pre10-aa4 128 4.36 10.60% 0.244 55.45 0.00000 0.00000 41
2.4.19-pre10-aa4-oql 128 3.61 9.54% 0.378 434.79 0.00000 0.00000 38
2.4.19-pre10-aa4 256 4.21 10.46% 0.857 494.67 0.00000 0.00000 40
2.4.19-pre10-aa4-oql 256 3.40 8.78% 0.665 711.39 0.00000 0.00000 39
Dbench throughput improves with the QLogic 6.x driver; on the averages below the gain
works out to roughly 4-8% (a quick calculation is sketched after the table).
2.4.19-pre10-aa4 has the QLogic 6.x driver; 2.4.19-pre10-aa4-oql has the old QLogic driver.
dbench average throughput (MB/s):

Filesystem  Processes       2.4.19-pre10-aa4  2.4.19-pre10-aa4-oql
reiserfs    192 (5 runs)          47.98             45.24
reiserfs     64                   65.19             61.84
ext3        192                   71.62             66.19
ext3         64                   91.68             86.16
ext2        192                  153.43            147.71
ext2         64                  184.89            178.27
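For reference, a minimal sketch (Python) of how those gains work out; the averages are
taken verbatim from the table above:

  # dbench averages (MB/s): (new 6.x driver, old driver)
  results = {
      ("reiserfs", 192): (47.98, 45.24),
      ("reiserfs",  64): (65.19, 61.84),
      ("ext3",     192): (71.62, 66.19),
      ("ext3",      64): (91.68, 86.16),
      ("ext2",     192): (153.43, 147.71),
      ("ext2",      64): (184.89, 178.27),
  }
  for (fs, procs), (new, old) in results.items():
      gain = (new - old) / old * 100.0
      print(f"{fs:8s} {procs:3d} procs: +{gain:.1f}%")
  # prints gains between roughly +3.7% and +8.2%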
Bonnie++ (average of 3 runs) shows an improvement in most metrics with the QLogic 6.x
driver, including roughly 15% faster block writes on ext3 (a quick check of that figure
is sketched after the tables).
As above, 2.4.19-pre10-aa4 has the QLogic 6.x driver and 2.4.19-pre10-aa4-oql has the
old QLogic driver.
bonnie++-1.02a on ext2
Kernel                Size(MB)  [Seq Output, Block: MB/sec %CPU Eff]  [Seq Output, Rewrite: MB/sec %CPU Eff]  [Seq Input, Block: MB/sec %CPU Eff]  [Random Seeks: /sec %CPU Eff]
2.4.19-pre10-aa4 8192 52.00 29.0 179 22.35 20.7 108 51.78 26.3 197 435.9 2.00 21797
2.4.19-pre10-aa4-oql 8192 47.36 26.0 182 21.85 20.0 109 50.68 25.0 203 440.9 1.67 26452
Kernel                Files  [Seq Create: /sec %CPU Eff]  [Seq Delete: /sec %CPU Eff]  [Random Create: /sec %CPU Eff]  [Random Delete: /sec %CPU Eff]
2.4.19-pre10-aa4 65536 174 99.0 175 87563 98.7 8874 170 99.0 172 574 99.0 580
2.4.19-pre10-aa4-oql 65536 172 99.0 174 86867 99.0 8774 170 99.0 172 582 99.0 588
bonnie++-1.02a on ext3
Kernel                Size(MB)  [Seq Output, Block: MB/sec %CPU Eff]  [Seq Output, Rewrite: MB/sec %CPU Eff]  [Seq Input, Block: MB/sec %CPU Eff]  [Random Seeks: /sec %CPU Eff]
2.4.19-pre10-aa4 8192 49.99 55.3 90 22.31 23.0 97 51.81 25.3 205 362.3 2.00 18115
2.4.19-pre10-aa4-oql 8192 42.36 46.0 92 21.12 21.7 97 50.75 25.0 203 363.4 1.67 21806
Kernel                Files  [Seq Create: /sec %CPU Eff]  [Seq Delete: /sec %CPU Eff]  [Random Create: /sec %CPU Eff]  [Random Delete: /sec %CPU Eff]
2.4.19-pre10-aa4 65536 127 99.0 128 27237 96.0 2837 129 99.0 130 481 96.3 499
2.4.19-pre10-aa4-oql 65536 125 99.0 127 28173 96.3 2924 128 99.0 129 478 96.0 498
bonnie++-1.02a on reiserfs
Kernel                Size(MB)  [Seq Output, Block: MB/sec %CPU Eff]  [Seq Output, Rewrite: MB/sec %CPU Eff]  [Seq Input, Block: MB/sec %CPU Eff]  [Random Seeks: /sec %CPU Eff]
2.4.19-pre10-aa4 8192 32.98 49.0 67 22.65 24.7 92 49.03 28.0 175 363.0 2.00 18152
2.4.19-pre10-aa4-oql 8192 30.34 45.0 67 21.57 23.0 94 49.09 27.7 177 365.2 2.33 15650
Kernel                Files  [Seq Create: /sec %CPU Eff]  [Seq Delete: /sec %CPU Eff]  [Random Create: /sec %CPU Eff]  [Random Delete: /sec %CPU Eff]
2.4.19-pre10-aa4 131072 3634 41.7 8722 2406 33.7 7147 3280 39.7 8270 977 18.3 5331
2.4.19-pre10-aa4-oql 131072 3527 40.3 8745 2295 32.0 7170 3349 40.7 8234 871 16.0 5446
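As a quick check of the ext3 block-write figure quoted above, a minimal sketch (Python)
using the two MB/sec block-output values from the ext3 table:

  # ext3 sequential block output (MB/sec) from the bonnie++ table above
  new_drv, old_drv = 49.99, 42.36
  print(f"vs. the old driver:  +{(new_drv - old_drv) / old_drv * 100:.1f}%")  # ~18.0%
  print(f"share of new figure: +{(new_drv - old_drv) / new_drv * 100:.1f}%")  # ~15.3%
  # so the quoted 15% corresponds to taking the difference against the new-driver figure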
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html