On Friday 27 February 2004 19:52, Marcelo Tosatti wrote:
> Hi,
>
> Not known to me...
>
> Can you get any traces from the lockup? NMI watchdog or sysrq+p and +t?
>
> Did any previous 2.4.x work reliably?
OK, I have pinpointed the bug. This time I could not reproduce the full
lockup, the one that prevents me from issuing any further commands, but I
think the only reason is that everything I needed was already
running.
Everything I describe below happens whether or not irqbalance is running.
The attached logs and outputs are from my first test, with irqbalance. The
second test, without it, is 100% the same except that all interrupts land on
CPU0.
These are the disks:
megaraid: v1.18k (Release Date: Thu Aug 28 10:05:11 EDT 2003)
megaraid: found 0x1000:0x1960:idx 0:bus 4:slot 2:func 0
scsi2 : Found a MegaRAID controller at 0xf8842000, IRQ: 52
scsi2 : Enabling 64 bit support
megaraid: [1L19:1.04] detected 2 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
scsi2 : LSI Logic MegaRAID 1L19 254 commands 15 targs 5 chans 7 luns
scsi2: scanning virtual channel 0 for logical drives.
Vendor: MegaRAID Model: LD0 RAID1 35002R Rev: 1L19
Type: Direct-Access ANSI SCSI revision: 02
blk: queue f7dff018, I/O limit 4095Mb (mask 0xffffffff)
Vendor: MegaRAID Model: LD1 RAID5 70004R Rev: 1L19
Type: Direct-Access ANSI SCSI revision: 02
blk: queue f7bd3c18, I/O limit 4095Mb (mask 0xffffffff)
scsi2: scanning virtual channel 1 for logical drives.
scsi2: scanning virtual channel 2 for logical drives.
scsi2: scanning physical channel 0 for devices.
scsi2: scanning physical channel 1 for devices.
Attached scsi disk sda at scsi2, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi2, channel 0, id 1, lun 0
SCSI device sda: 71684096 512-byte hdwr sectors (36702 MB)
Partition check:
sda: sda1
SCSI device sdb: 143368192 512-byte hdwr sectors (73405 MB)
sdb: sdb1 sdb2 sdb3 sdb4
What I did was start two instances of tar archiving the root fs (sda) to the
home fs (sdb). While the tars are working, I try to run lilo. It then stays
stuck until the tars finish their jobs.
I interrupted them with two kills to prove that. After the tars vanish, it
takes some time until the cache gets written out to the disk (see vmstat), and
only then does lilo finish its job.
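For anyone who wants to repeat the test, the procedure above can be roughly sketched as a small script. This is only a sketch under assumptions: the function name time_probe_under_load, the archive paths and the tar options are hypothetical, since the exact commands used are not given in this mail. It starts the load commands, runs a probe command while they are active, and reports how long the probe took.

```python
import subprocess
import sys
import time

# Hypothetical load commands: archive names and tar options are assumptions,
# not the commands actually used in the test described above.
EXAMPLE_LOADS = [
    ["tar", "--one-file-system", "-cf", "/home/root-copy-1.tar", "/"],
    ["tar", "--one-file-system", "-cf", "/home/root-copy-2.tar", "/"],
]


def time_probe_under_load(load_cmds, probe_cmd):
    """Start every command in load_cmds, run probe_cmd while they are
    active, and return the number of seconds probe_cmd took."""
    loads = [subprocess.Popen(cmd) for cmd in load_cmds]
    try:
        start = time.monotonic()
        # Blocks until the probe returns, however long it is stalled.
        subprocess.run(probe_cmd, check=False)
        return time.monotonic() - start
    finally:
        # Stop any load still running once the probe has returned.
        for proc in loads:
            if proc.poll() is None:
                proc.terminate()
        for proc in loads:
            proc.wait()
```

Run as root, time_probe_under_load(EXAMPLE_LOADS, ["lilo"]) should give a rough figure for how long lilo is held up. Note that it waits for the probe instead of killing the tars early, so it measures the stall and does not cut it short the way the two kills did.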
You can see detailed logs attached below. I think everything will be clear to
you after that.
If you need more hw info on our server, I will be glad to get it for you.
Regards,
Tvrtko A. Ursulin