2007-02-26 14:00:26

by Andre Noll

Subject: qla2xxx BUG: workqueue leaked lock or atomic

Hi

On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems
connected to a qla2xxx card and used as a single volume via lvm.
The system seems to lock up only if data gets written to both raid
systems at the same time.

On a standard kernel nothing makes it to the log, the system just
freezes. So we tried a lockdep kernel which reports two BUGs during
boot, see below.

Could this be related to our problem?

Thanks
Andre


[ 64.150773] Loading iSCSI transport class v2.0-724.
[ 64.151096] QLogic Fibre Channel HBA Driver
[ 64.151405] ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 32 (level, low) -> IRQ 32
[ 64.151821] qla2xxx 0000:05:08.0: Found an ISP2422, irq 32, iobase 0xffffc20000006000
[ 64.152231] qla2xxx 0000:05:08.0: Configuring PCI space...
[ 64.152498] qla2xxx 0000:05:08.0: Configure NVRAM parameters...
[ 64.159088] qla2xxx 0000:05:08.0: Verifying loaded RISC code...
[ 74.169623] qla2xxx 0000:05:08.0: Firmware image unavailable.
[ 74.169737] qla2xxx 0000:05:08.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
[ 74.169902] qla2xxx 0000:05:08.0: Attempting to load (potentially outdated) firmware from flash.
[ 74.760935] qla2xxx 0000:05:08.0: Allocated (64 KB) for EFT...
[ 74.761186] qla2xxx 0000:05:08.0: Allocated (1413 KB) for firmware dump...
[ 74.776988] scsi0 : qla2xxx
[ 74.961451] qla2xxx 0000:05:08.0:
[ 74.961452] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 74.961453] QLogic HP AE369-60001 - QLA2340
[ 74.961454] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.0 hdma+, host#=0, fw=4.00.70 [IP]
[ 74.961970] ACPI: PCI Interrupt 0000:05:08.1[B] -> GSI 33 (level, low) -> IRQ 33
[ 74.962296] qla2xxx 0000:05:08.1: Found an ISP2422, irq 33, iobase 0xffffc20000172000
[ 74.962662] qla2xxx 0000:05:08.1: Configuring PCI space...
[ 74.962914] qla2xxx 0000:05:08.1: Configure NVRAM parameters...
[ 74.969494] qla2xxx 0000:05:08.1: Verifying loaded RISC code...
[ 75.353426] qla2xxx 0000:05:08.0: LIP reset occured (f7f7).
[ 75.385670] qla2xxx 0000:05:08.0: LIP occured (f7f7).
[ 75.388282] qla2xxx 0000:05:08.0: LOOP UP detected (2 Gbps).
[ 75.778656] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
[ 75.778771]
[ 75.778772] Call Trace:
[ 75.778967] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
[ 75.779154] [<ffffffff8052bc1b>] _spin_unlock_irq+0x2b/0x40
[ 75.779271] [<ffffffff804605d7>] qla2x00_process_completed_request+0x137/0x1d0
[ 75.779424] [<ffffffff804606f2>] qla2x00_status_entry+0x82/0xa40
[ 75.779541] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
[ 75.779657] [<ffffffff8052bcb2>] _spin_unlock_irqrestore+0x42/0x60
[ 75.779775] [<ffffffff8046228e>] qla24xx_intr_handler+0x4e/0x2b0
[ 75.779892] [<ffffffff804613e1>] qla24xx_process_response_queue+0xc1/0x1c0
[ 75.780012] [<ffffffff80462414>] qla24xx_intr_handler+0x1d4/0x2b0
[ 75.780131] [<ffffffff8025e950>] handle_IRQ_event+0x20/0x60
[ 75.780270] [<ffffffff802604ad>] handle_fasteoi_irq+0xbd/0x110
[ 75.780411] [<ffffffff8020cf62>] do_IRQ+0x132/0x1a0
[ 75.780545] [<ffffffff80208430>] default_idle+0x0/0x60
[ 75.780682] [<ffffffff8020a236>] ret_from_intr+0x0/0xf
[ 75.780818] <EOI> [<ffffffff80208467>] default_idle+0x37/0x60
[ 75.781021] [<ffffffff80208469>] default_idle+0x39/0x60
[ 75.781156] [<ffffffff80208467>] default_idle+0x37/0x60
[ 75.781294] [<ffffffff802084f1>] cpu_idle+0x61/0x90
[ 75.781429] [<ffffffff806d6f8b>] start_secondary+0x51b/0x530
[ 75.781569]
[ 75.781873] scsi 0:0:0:0: Direct-Access transtec T6100F16R1-E 342I PQ: 0 ANSI: 5
[ 75.782532] BUG: workqueue leaked lock or atomic: scsi_wq_0/0x00000000/362
[ 75.782678] last function: fc_scsi_scan_rport+0x0/0x90
[ 75.782878] 1 lock held by scsi_wq_0/362:
[ 75.783008] #0: (&shost->scan_mutex){--..}, at: [<ffffffff80529fe5>] mutex_lock+0x25/0x30
[ 75.783517]
[ 75.783518] Call Trace:
[ 75.783754] [<ffffffff80248319>] debug_show_held_locks+0x9/0x10
[ 75.783896] [<ffffffff8023eb49>] run_workqueue+0x149/0x1a0
[ 75.784036] [<ffffffff802427c0>] keventd_create_kthread+0x0/0x90
[ 75.784180] [<ffffffff8023edc1>] worker_thread+0x151/0x190
[ 75.784322] [<ffffffff80227e80>] default_wake_function+0x0/0x10
[ 75.784463] [<ffffffff8023ec70>] worker_thread+0x0/0x190
[ 75.784600] [<ffffffff80242a2a>] kthread+0xda/0x110
[ 75.784737] [<ffffffff8020ab08>] child_rip+0xa/0x12
[ 75.784875] [<ffffffff8052bc1b>] _spin_unlock_irq+0x2b/0x40
[ 75.785014] [<ffffffff8020a28c>] restore_args+0x0/0x30
[ 75.785149] [<ffffffff80242950>] kthread+0x0/0x110
[ 75.785285] [<ffffffff8020aafe>] child_rip+0x0/0x12
[ 75.785417]
[ 84.980341] qla2xxx 0000:05:08.1: Firmware image unavailable.
[ 84.980455] qla2xxx 0000:05:08.1: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
[ 84.980620] qla2xxx 0000:05:08.1: Attempting to load (potentially outdated) firmware from flash.
[ 85.571726] qla2xxx 0000:05:08.1: Allocated (64 KB) for EFT...
[ 85.571956] qla2xxx 0000:05:08.1: Allocated (1413 KB) for firmware dump...
[ 85.587766] scsi1 : qla2xxx
[ 85.718476] qla2xxx 0000:05:08.1:
[ 85.718478] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 85.718479] QLogic HP AE369-60001 - QLA2340
[ 85.718480] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.1 hdma+, host#=1, fw=4.00.70 [IP]
[ 85.719505] sda : very big device. try to use READ CAPACITY(16).
[ 85.719727] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 85.720114] sda: Write Protect is off
[ 85.720219] sda: Mode Sense: 9b 00 00 08
[ 85.720608] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 85.721008] sda : very big device. try to use READ CAPACITY(16).
[ 85.721206] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 85.721552] sda: Write Protect is off
[ 85.721680] sda: Mode Sense: 9b 00 00 08
[ 85.722088] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 85.722298] sda: unknown partition table
[ 85.722897] sd 0:0:0:0: Attached scsi disk sda
[ 85.723205] sd 0:0:0:0: Attached scsi generic sg0 type 0



2007-02-26 18:26:32

by Andrew Vasquez

Subject: Re: qla2xxx BUG: workqueue leaked lock or atomic

On Mon, 26 Feb 2007, Andre Noll wrote:

> On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems
> connected to a qla2xxx card and used as a single volume via lvm.
> The system seems to lock up only if data gets written to both raid
> systems at the same time.
>
> On a standard kernel nothing makes it to the log, the system just
> freezes. So we tried a lockdep kernel which reports two BUGs during
> boot, see below.
>
> Could this be related to our problem?

Before we proceed further, could you retrieve the latest firmware
release for 24xx type HBAs:

> [ 64.151096] QLogic Fibre Channel HBA Driver
> [ 64.151405] ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 32 (level, low) -> IRQ 32
> [ 64.151821] qla2xxx 0000:05:08.0: Found an ISP2422, irq 32, iobase 0xffffc20000006000
> [ 64.152231] qla2xxx 0000:05:08.0: Configuring PCI space...
> [ 64.152498] qla2xxx 0000:05:08.0: Configure NVRAM parameters...
> [ 64.159088] qla2xxx 0000:05:08.0: Verifying loaded RISC code...
> [ 74.169623] qla2xxx 0000:05:08.0: Firmware image unavailable.
> [ 74.169737] qla2xxx 0000:05:08.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
> [ 74.169902] qla2xxx 0000:05:08.0: Attempting to load (potentially outdated) firmware from flash.
> [ 74.760935] qla2xxx 0000:05:08.0: Allocated (64 KB) for EFT...
> [ 74.761186] qla2xxx 0000:05:08.0: Allocated (1413 KB) for firmware dump...
> [ 74.776988] scsi0 : qla2xxx
> [ 74.961451] qla2xxx 0000:05:08.0:
> [ 74.961452] QLogic Fibre Channel HBA Driver: 8.01.07-k4
> [ 74.961453] QLogic HP AE369-60001 - QLA2340
> [ 74.961454] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.0 hdma+, host#=0, fw=4.00.70 [IP]

You are loading some stale firmware that's left over on the card --
I'm not even sure what 4.00.70 is, as the latest release firmware is
4.00.27. You can retrieve the image here:

ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin

Let's start there... before we move on to this:

> [ 75.778656] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
> [ 75.778771]
> [ 75.778772] Call Trace:
> [ 75.778967] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
> [ 75.779154] [<ffffffff8052bc1b>] _spin_unlock_irq+0x2b/0x40
> [ 75.779271] [<ffffffff804605d7>] qla2x00_process_completed_request+0x137/0x1d0
> [ 75.779424] [<ffffffff804606f2>] qla2x00_status_entry+0x82/0xa40
> [ 75.779541] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
> [ 75.779657] [<ffffffff8052bcb2>] _spin_unlock_irqrestore+0x42/0x60
> [ 75.779775] [<ffffffff8046228e>] qla24xx_intr_handler+0x4e/0x2b0
> [ 75.779892] [<ffffffff804613e1>] qla24xx_process_response_queue+0xc1/0x1c0
> [ 75.780012] [<ffffffff80462414>] qla24xx_intr_handler+0x1d4/0x2b0
> [ 75.780131] [<ffffffff8025e950>] handle_IRQ_event+0x20/0x60

Hmm....

Regards,
Andrew Vasquez

2007-02-27 10:11:17

by Andre Noll

Subject: Re: qla2xxx BUG: workqueue leaked lock or atomic

On 10:26, Andrew Vasquez wrote:
> You are loading some stale firmware that's left over on the card --
> I'm not even sure what 4.00.70 is, as the latest release firmware is
> 4.00.27.

That's the firmware which came with the card. Anyway, I just upgraded
the firmware, but the bug remains. The backtrace differs a bit though
as now the tg3 network driver seems to be involved as well.
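
A note on reading the trace below: as far as I understand the x86_64
stack dump format, frames between <IRQ> and <EOI> belong to the
interrupt handler, while frames after <EOI> show whatever code was
interrupted. The following is a minimal sketch, not part of the original
report, that splits a trace on those markers; split_trace, FRAME_RE and
the "kernel" label for built-in symbols are hypothetical names chosen
for illustration.

```python
import re

# Matches one frame such as "[<ffffffff88006737>] :tg3:tg3_chip_reset+0x547/0x670".
# The ":module:" prefix only appears for symbols that live in a loadable module.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(?P<module>\w+):)?(?P<symbol>[\w.]+)\+0x"
)


def split_trace(lines):
    """Return (irq_frames, task_frames) as lists of (module, symbol) tuples."""
    irq_frames, task_frames = [], []
    current = task_frames  # a trace without markers is treated as task context
    for line in lines:
        if "<IRQ>" in line:
            current = irq_frames   # frames from here on ran in the handler
        if "<EOI>" in line:
            current = task_frames  # frames from here on were interrupted
        match = FRAME_RE.search(line)
        if match:
            current.append((match.group("module") or "kernel",
                            match.group("symbol")))
    return irq_frames, task_frames


# A few lines copied from the trace in this message.
sample = [
    "[ 68.532979] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180",
    "[ 68.533168] [<ffffffff80511f5b>] _spin_unlock_irq+0x2b/0x40",
    "[ 68.533295] [<ffffffff88032747>] :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0",
    "[ 68.534574] [<ffffffff8020a236>] ret_from_intr+0x0/0xf",
    "[ 68.534687] <EOI> [<ffffffff803ad15c>] __delay+0xc/0x20",
    "[ 68.535451] [<ffffffff88006737>] :tg3:tg3_chip_reset+0x547/0x670",
]

irq, interrupted = split_trace(sample)
print("IRQ context:", sorted({module for module, _ in irq}))
print("interrupted:", sorted({module for module, _ in interrupted}))
```

Read this way, the tg3 frames would only be the code that happened to
be running when the qla2xxx interrupt arrived, which may explain why
they show up here but not in the first trace.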

Thanks for your help
Andre

[ 67.511167] qla2xxx 0000:05:08.0: Allocated (64 KB) for EFT...
[ 67.511434] qla2xxx 0000:05:08.0: Allocated (1413 KB) for firmware dump...
[ 67.531231] scsi0 : qla2xxx
[ 67.854344] qla2xxx 0000:05:08.0:
[ 67.854346] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 67.854347] QLogic HP AE369-60001 - QLA2340
[ 67.854348] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.0 hdma+, host#=0, fw=4.00.27 [IP]
[ 67.854881] ACPI: PCI Interrupt 0000:05:08.1[B] -> GSI 33 (level, low) -> IRQ 33
[ 67.855230] qla2xxx 0000:05:08.1: Found an ISP2422, irq 33, iobase 0xffffc20000012000
[ 67.855645] qla2xxx 0000:05:08.1: Configuring PCI space...
[ 67.855907] qla2xxx 0000:05:08.1: Configure NVRAM parameters...
[ 67.862486] qla2xxx 0000:05:08.1: Verifying loaded RISC code...
[ 68.106663] qla2xxx 0000:05:08.1: Allocated (64 KB) for EFT...
[ 68.107058] qla2xxx 0000:05:08.1: Allocated (1413 KB) for firmware dump...
[ 68.126759] scsi1 : qla2xxx
[ 68.196783] Adding 6540152k swap on /dev/md2. Priority:-1 extents:1 across:6540152k
[ 68.260645] qla2xxx 0000:05:08.0: LIP reset occured (f8f7).
[ 68.296027] qla2xxx 0000:05:08.0: LIP occured (f8f7).
[ 68.298214] qla2xxx 0000:05:08.0: LOOP UP detected (2 Gbps).
[ 68.326627] qla2xxx 0000:05:08.1:
[ 68.326628] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 68.326630] QLogic HP AE369-60001 - QLA2340
[ 68.326631] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.1 hdma+, host#=1, fw=4.00.27 [IP]
[ 68.504335] EXT3 FS on md1, internal journal
[ 68.524627] PM: Writing back config space on device 0000:03:06.0 at offset b (was 164814e4, writing d00e11)
[ 68.524644] PM: Writing back config space on device 0000:03:06.0 at offset 3 (was 804000, writing 804010)
[ 68.524650] PM: Writing back config space on device 0000:03:06.0 at offset 2 (was 2000000, writing 2000010)
[ 68.524657] PM: Writing back config space on device 0000:03:06.0 at offset 1 (was 2b00000, writing 2b00146)
[ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
[ 68.532784]
[ 68.532785] Call Trace:
[ 68.532979] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
[ 68.533168] [<ffffffff80511f5b>] _spin_unlock_irq+0x2b/0x40
[ 68.533295] [<ffffffff88032747>] :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
[ 68.533457] [<ffffffff88032862>] :qla2xxx:qla2x00_status_entry+0x82/0xa40
[ 68.533577] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
[ 68.533693] [<ffffffff80511ff2>] _spin_unlock_irqrestore+0x42/0x60
[ 68.533816] [<ffffffff880343fe>] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
[ 68.533942] [<ffffffff88033551>] :qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
[ 68.534102] [<ffffffff88034584>] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0
[ 68.534224] [<ffffffff8025e950>] handle_IRQ_event+0x20/0x60
[ 68.534339] [<ffffffff802604ad>] handle_fasteoi_irq+0xbd/0x110
[ 68.534459] [<ffffffff8020cf62>] do_IRQ+0x132/0x1a0
[ 68.534574] [<ffffffff8020a236>] ret_from_intr+0x0/0xf
[ 68.534687] <EOI> [<ffffffff803ad15c>] __delay+0xc/0x20
[ 68.534862] [<ffffffff803ad1a7>] __const_udelay+0x37/0x40
[ 68.534982] [<ffffffff88006737>] :tg3:tg3_chip_reset+0x547/0x670
[ 68.535103] [<ffffffff8800df2d>] :tg3:tg3_reset_hw+0x5d/0x1790
[ 68.535218] [<ffffffff803ad1e7>] __udelay+0x37/0x40
[ 68.535333] [<ffffffff8800408d>] :tg3:_tw32_flush+0x6d/0x80
[ 68.535451] [<ffffffff88012196>] :tg3:tg3_open+0x2d6/0x610
[ 68.535569] [<ffffffff8800f6a2>] :tg3:tg3_init_hw+0x42/0x50
[ 68.535687] [<ffffffff880121a3>] :tg3:tg3_open+0x2e3/0x610
[ 68.535804] [<ffffffff804b36e3>] dev_open+0x43/0x90
[ 68.535917] [<ffffffff804b2814>] dev_change_flags+0x74/0x160
[ 68.536034] [<ffffffff804f3e66>] devinet_ioctl+0x2e6/0x730
[ 68.536149] [<ffffffff804b4bc2>] dev_ioctl+0x302/0x340
[ 68.536264] [<ffffffff803aa71b>] __up_read+0x9b/0xb0
[ 68.536378] [<ffffffff804f42fc>] inet_ioctl+0x4c/0x70
[ 68.536494] [<ffffffff804a73ec>] sock_ioctl+0x1fc/0x230
[ 68.536610] [<ffffffff8029c701>] do_ioctl+0x31/0xa0
[ 68.536722] [<ffffffff8029ca2b>] vfs_ioctl+0x2bb/0x2e0
[ 68.536836] [<ffffffff8029ca9a>] sys_ioctl+0x4a/0x80
[ 68.536948] [<ffffffff80209cee>] system_call+0x7e/0x83
[ 68.537059]
[ 68.712832] scsi 0:0:0:0: Direct-Access transtec T6100F16R1-E 342I PQ: 0 ANSI: 5
[ 68.713384] sda : very big device. try to use READ CAPACITY(16).
[ 68.713594] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 68.713976] sda: Write Protect is off
[ 68.714079] sda: Mode Sense: 9b 00 00 08
[ 68.714483] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 68.714876] sda : very big device. try to use READ CAPACITY(16).
[ 68.715080] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 68.715436] sda: Write Protect is off
[ 68.715539] sda: Mode Sense: 9b 00 00 08
[ 68.715944] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 68.718244] sda: unknown partition table
[ 68.718707] sd 0:0:0:0: Attached scsi disk sda
[ 68.718945] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 68.719413] BUG: workqueue leaked lock or atomic: scsi_wq_0/0x00000000/2138
[ 68.719556] last function: fc_scsi_scan_rport+0x0/0x90
[ 68.719754] 1 lock held by scsi_wq_0/2138:
[ 68.719878] #0: (&shost->scan_mutex){--..}, at: [<ffffffff80510325>] mutex_lock+0x25/0x30
[ 68.720380]
[ 68.720381] Call Trace:
[ 68.720616] [<ffffffff80248319>] debug_show_held_locks+0x9/0x10
[ 68.720757] [<ffffffff8023eb49>] run_workqueue+0x149/0x1a0
[ 68.720891] [<ffffffff802427c0>] keventd_create_kthread+0x0/0x90
[ 68.721030] [<ffffffff8023edc1>] worker_thread+0x151/0x190
[ 68.721167] [<ffffffff80227e80>] default_wake_function+0x0/0x10
[ 68.721307] [<ffffffff8023ec70>] worker_thread+0x0/0x190
[ 68.721443] [<ffffffff80242a2a>] kthread+0xda/0x110
[ 68.721575] [<ffffffff8020ab08>] child_rip+0xa/0x12
[ 68.721709] [<ffffffff80511f5b>] _spin_unlock_irq+0x2b/0x40
[ 68.721842] [<ffffffff8020a28c>] restore_args+0x0/0x30
[ 68.721973] [<ffffffff80242950>] kthread+0x0/0x110
[ 68.722106] [<ffffffff8020aafe>] child_rip+0x0/0x12
[ 68.722240]
[ 68.762666] qla2xxx 0000:05:08.1: LIP reset occured (f7f7).
[ 68.797954] qla2xxx 0000:05:08.1: LIP occured (f7f7).
[ 68.800134] qla2xxx 0000:05:08.1: LOOP UP detected (2 Gbps).
[ 69.127937] scsi 1:0:0:0: Direct-Access ADVUNI OXYGENRAID 416F 341B PQ: 0 ANSI: 3
[ 69.128528] sdb : very big device. try to use READ CAPACITY(16).
[ 69.128777] SCSI device sdb: 9370656768 512-byte hdwr sectors (4797776 MB)
[ 69.129220] sdb: Write Protect is off
[ 69.129326] sdb: Mode Sense: 8f 00 00 08
[ 69.129878] SCSI device sdb: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 69.130342] sdb : very big device. try to use READ CAPACITY(16).
[ 69.130585] SCSI device sdb: 9370656768 512-byte hdwr sectors (4797776 MB)
[ 69.131006] sdb: Write Protect is off
[ 69.131110] sdb: Mode Sense: 8f 00 00 08
[ 69.131660] SCSI device sdb: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 69.131843] sdb: unknown partition table
[ 69.132401] sd 1:0:0:0: Attached scsi disk sdb
[ 69.132624] sd 1:0:0:0: Attached scsi generic sg1 type 0



2007-02-27 14:35:59

by Andre Noll

Subject: Re: qla2xxx BUG: workqueue leaked lock or atomic

On 11:11, Andre Noll wrote:
> On 10:26, Andrew Vasquez wrote:
> > You are loading some stale firmware that's left over on the card --
> > I'm not even sure what 4.00.70 is, as the latest release firmware is
> > 4.00.27.
>
> That's the firmware which came with the card. Anyway, I just upgraded
> the firmware, but the bug remains.

The system crashed again, btw, this time resulting in a kernel panic
instead of just locking up silently. Here's a screenshot:

http://systemlinux.org/~maan/shots/qla2xxx-crash-huangho2.png

Regards
Andre



2007-02-27 18:51:43

by Andrew Vasquez

Subject: Re: qla2xxx BUG: workqueue leaked lock or atomic

On Tue, 27 Feb 2007, Andre Noll wrote:

> On 10:26, Andrew Vasquez wrote:
> > You are loading some stale firmware that's left over on the card --
> > I'm not even sure what 4.00.70 is, as the latest release firmware is
> > 4.00.27.
>
> That's the firmware which came with the card. Anyway, I just upgraded
> the firmware, but the bug remains. The backtrace differs a bit though
> as now the tg3 network driver seems to be involved as well.
>
> Thanks for your help
> Andre
...
> [ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
> [ 68.532784]
> [ 68.532785] Call Trace:
> [ 68.532979] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
> [ 68.533168] [<ffffffff80511f5b>] _spin_unlock_irq+0x2b/0x40
> [ 68.533295] [<ffffffff88032747>] :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
> [ 68.533457] [<ffffffff88032862>] :qla2xxx:qla2x00_status_entry+0x82/0xa40
> [ 68.533577] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
> [ 68.533693] [<ffffffff80511ff2>] _spin_unlock_irqrestore+0x42/0x60
> [ 68.533816] [<ffffffff880343fe>] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
> [ 68.533942] [<ffffffff88033551>] :qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
> [ 68.534102] [<ffffffff88034584>] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0

Ok, since 2.6.20, there has been a patch added to qla2xxx which drops
the spin_unlock_irq() call while attempting to ramp up the queue depth:

commit befede3dabd204e9c546cbfbe391b29286c57da2
Author: Seokmann Ju <[email protected]>
Date: Tue Jan 9 11:37:52 2007 -0800

[SCSI] qla2xxx: correct locking while call starget_for_each_device()

Removed spin_unlock_irq()/spin_lock_irq() pairs surrounding
starget_for_each_device() calls.
As Matthew W. pointed out, starget_for_each_device() can be called under
a spinlock being held.
The change has been tested and verified on qla2xxx.ko module.
Thanks Matthew W. and Hisashi H. for help.

Signed-off-by: Andrew Vasquez <[email protected]>
Signed-off-by: Seokmann Ju <[email protected]>
Signed-off-by: James Bottomley <[email protected]>

http://marc.theaimsgroup.com/?l=linux-scsi&m=116837234904583&w=2

Could you try the latest 2.6.21-rc which contains the correction?
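
To illustrate why the removed unlock/lock pair trips lockdep, here is a
toy user-space model, not kernel code. It assumes the handler is entered
with interrupts off and takes its lock with spin_lock_irqsave(); ToyCpu
and intr_handler are hypothetical names, and the real driver logic is
reduced to the locking calls only.

```python
class ToyCpu:
    """Tracks only two things: are IRQs enabled, and are we in a hardirq?"""

    def __init__(self):
        self.irqs_enabled = True
        self.in_hardirq = False
        self.warnings = []

    def spin_lock_irqsave(self):
        flags = self.irqs_enabled  # remember the previous IRQ state
        self.irqs_enabled = False
        return flags

    def spin_unlock_irqrestore(self, flags):
        self.irqs_enabled = flags  # put back whatever state was saved

    def spin_lock_irq(self):
        self.irqs_enabled = False

    def spin_unlock_irq(self):
        # Re-enables IRQs unconditionally, ignoring any saved state.
        if self.in_hardirq:
            self.warnings.append("IRQs enabled inside hardirq handler")
        self.irqs_enabled = True


def intr_handler(cpu, drop_lock_in_completion):
    """Model of the handler path; the flag selects the pre-patch behaviour."""
    cpu.in_hardirq = True
    cpu.irqs_enabled = False  # assumption: handler entered with IRQs off
    flags = cpu.spin_lock_irqsave()
    if drop_lock_in_completion:
        cpu.spin_unlock_irq()  # the pair the patch removes
        # starget_for_each_device() would run here
        cpu.spin_lock_irq()
    cpu.spin_unlock_irqrestore(flags)
    cpu.in_hardirq = False
    cpu.irqs_enabled = True   # returning to the interrupted task


old = ToyCpu()
intr_handler(old, drop_lock_in_completion=True)
new = ToyCpu()
intr_handler(new, drop_lock_in_completion=False)
print("pre-patch warnings: ", old.warnings)
print("post-patch warnings:", new.warnings)
```

In this model only the pre-patch path records a warning, which mirrors
the trace_hardirqs_on() complaint reached via _spin_unlock_irq() in the
backtrace above.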

Regards,
Andrew Vasquez

2007-02-28 15:18:43

by Andre Noll

Subject: Re: qla2xxx BUG: workqueue leaked lock or atomic

On 10:51, Andrew Vasquez wrote:
> On Tue, 27 Feb 2007, Andre Noll wrote:
> > [ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
>
> Ok, since 2.6.20, there has been a patch added to qla2xxx which drops
> the spin_unlock_irq() call while attempting to ramp up the queue depth:
>
> Could you try the latest 2.6.21-rc which contains the correction?

With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
writing to both raid systems at the same time via lvm still locks up
the system within minutes.

As lockdep revealed another dm-related lock problem on this kernel,
I guess I'll have to bother the lvm people on this.

Thanks
Andre



2007-02-28 15:37:35

by Andre Noll

Subject: Re: qla2xxx BUG: workqueue leaked lock or atomic

On 16:18, Andre Noll wrote:

> With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> writing to both raid systems at the same time via lvm still locks up
> the system within minutes.

Screenshot of the resulting kernel panic:

http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png

Andre

