Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161808AbbKTBln (ORCPT ); Thu, 19 Nov 2015 20:41:43 -0500 Received: from kvm5.telegraphics.com.au ([98.124.60.144]:42398 "EHLO kvm5.telegraphics.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161124AbbKTBlk (ORCPT ); Thu, 19 Nov 2015 20:41:40 -0500 Date: Fri, 20 Nov 2015 12:41:19 +1100 (AEDT) From: Finn Thain To: Ondrej Zary cc: Sam Creasey , Michael Schmitz , "James E.J. Bottomley" , linux-m68k@vger.kernel.org, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers In-Reply-To: <201511192350.58253.linux@rainbow-software.org> Message-ID: References: <20151118083455.331768508@telegraphics.com.au> <201511181235.15663.linux@rainbow-software.org> <201511192350.58253.linux@rainbow-software.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7264 Lines: 139 On Thu, 19 Nov 2015, Ondrej Zary wrote: > On Thursday 19 November 2015 03:24:56 Finn Thain wrote: > > > On Wed, 18 Nov 2015, Ondrej Zary wrote: > > > > > > > > I have some NCR5380 ISA cards and can test them. > > > > Thanks Ondrej. I've no idea which ISA drivers are presently working in > > mainline. Finding regressions may be more difficult than usual ;-) > > You're right... looks very broken: > > [ 62.577194] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, > sg_tablesize 128, this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, USLEEP_POLL 3, USLEEP_WAITLONG 1250, options { AUTOPROBE_IRQ PSEUDO_DMA > NCR53C400 } > [ 62.796635] scsi 2:0:0:0: Direct-Access IBM 0663 e PQ: 0 ANSI: 2 > [ 63.878494] sd 2:0:0:0: Attached scsi generic sg1 type 0 > [ 95.848260] sd 2:0:0:0: aborting command > > And the system hangs completely. > Yes. That was the usual failure mode. The old EH abort routine is fatal. Up until I disabled PDMA by default for mac_scsi (in v3.19), that driver would do the same thing. > It's much better with your patches, but still not great :) > Pleased to hear it :) > [ 93.963264] pnp 01:01.00: [io 0x0240-0x025f] > [ 93.963493] pnp 01:01.00: [irq 5] > [ 93.965768] pnp 01:01.00: activated > [ 93.977147] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, > sg_tablesize 128, this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, options { AUTOPROBE_IRQ PSEUDO_DMA } > [ 93.987527] scsi host2: rejecting message > [ 93.987647] Synchronous Data Transfer Request period = 100 ns offset = 12 > [ 94.001219] scsi 2:0:0:0: Direct-Access IBM 0663 e PQ: 0 ANSI: 2 > [ 113.000794] sd 2:0:0:0: Attached scsi generic sg1 type 0 I'd be interested to know what commands were in play in that 19 second interval. Might need to use scsi_logging_level to figure that out. My tests involved 3 different scsi targets (two disks and a CD-ROM) but none of these send a SDTR. Your log says the driver correctly rejected the SDTR message but that doesn't mean the target actually went to MSG IN phase and got the message. Do you have any older targets you can test? > [ 144.852432] sd 2:0:0:0: [sdb] Unit Not Ready > [ 144.852574] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] > [ 144.852713] sd 2:0:0:0: [sdb] Add. Sense: Select or reselect failure AFAIK, the target should not have to abort any commands. Moreover, the target should never experience a select/reselect failure, because you have irq == 0 (see above) and that implies that the target is never permitted the disconnect privilege. > [ 240.108292] INFO: task modprobe:1957 blocked for more than 120 seconds. > [ 240.108418] Not tainted 4.3.0-rc1+ #74 Why not use v4.3? > [ 240.108501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 240.108597] modprobe D 0000001a 0 1957 1950 0x00000000 > [ 240.108790] ce0fad00 00000086 53881781 0000001a c1525f88 4edbe39c 0000001a 04ac33e5 > [ 240.109246] 00000000 ccd54000 ffffffff ffffffff d204b280 c139c504 00000000 c104416d > [ 240.109699] 00000000 ce0fad00 c1054a45 c151fd8c c151fd8c d204b280 00000000 ccd6d100 > [ 240.110156] Call Trace: > [ 240.110295] [] ? schedule+0x5b/0x67 > [ 240.110430] [] ? async_synchronize_cookie_domain+0x73/0x9f > [ 240.110569] [] ? abort_exclusive_wait+0x6e/0x6e > [ 240.110699] [] ? do_init_module+0xa4/0x1a3 > [ 240.110824] [] ? load_module+0x14de/0x18ca > [ 240.110948] [] ? SyS_finit_module+0x47/0x56 > [ 240.111068] [] ? sysenter_do_call+0x12/0x12 Not sure what module was being probed here. I presume it was g_NCR5380 or g_NCR5380_mmio. Neither of these calls 'scsi_scan_host'. I'm not sure what the implications are (?) > [ 240.852458] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_SENSE > [ 240.852620] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] > [ 240.852760] sd 2:0:0:0: [sdb] Add. Sense: Select or reselect failure > [ 272.852471] sd 2:0:0:0: [sdb] Write Protect is off > [ 272.852614] sd 2:0:0:0: [sdb] Mode Sense: 00 00 00 00 > [ 304.084452] sd 2:0:0:0: [sdb] Asking for cache data failed > [ 304.084592] sd 2:0:0:0: [sdb] Assuming drive cache: write through This looks like nonsense to me ... I don't think the target actually aborted the reselection phase of a read capacity command. I'm out of ideas here. Can anyone else make sense of this? > [ 360.108284] INFO: task modprobe:1957 blocked for more than 120 seconds. > [ 360.108409] Not tainted 4.3.0-rc1+ #74 > [ 360.108492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 360.108591] modprobe D 0000001a 0 1957 1950 0x00000000 > [ 360.108787] ce0fad00 00000086 53881781 0000001a c1525f88 4edbe39c 0000001a 04ac33e5 > [ 360.109248] 00000000 ccd54000 ffffffff ffffffff d204b280 c139c504 00000000 c104416d > [ 360.109703] 00000000 ce0fad00 c1054a45 c151fd8c c151fd8c d204b280 00000000 ccd6d100 > [ 360.110158] Call Trace: > [ 360.110296] [] ? schedule+0x5b/0x67 > [ 360.110430] [] ? async_synchronize_cookie_domain+0x73/0x9f > [ 360.110568] [] ? abort_exclusive_wait+0x6e/0x6e > [ 360.110699] [] ? do_init_module+0xa4/0x1a3 > [ 360.110823] [] ? load_module+0x14de/0x18ca > [ 360.110945] [] ? SyS_finit_module+0x47/0x56 > [ 360.111065] [] ? sysenter_do_call+0x12/0x12 > [ 431.060488] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_SENSE > [ 431.060650] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current] > [ 431.060791] sd 2:0:0:0: [sdb] Add. Sense: Select or reselect failure > [ 480.108282] INFO: task modprobe:1957 blocked for more than 120 seconds. > [ 480.108405] Not tainted 4.3.0-rc1+ #74 > [ 480.108488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 480.108585] modprobe D 0000001a 0 1957 1950 0x00000000 > [ 480.108779] ce0fad00 00000086 53881781 0000001a c1525f88 4edbe39c 0000001a 04ac33e5 > [ 480.109236] 00000000 ccd54000 ffffffff ffffffff d204b280 c139c504 00000000 c104416d > [ 480.109689] 00000000 ce0fad00 c1054a45 c151fd8c c151fd8c d204b280 00000000 ccd6d100 > [ 480.110145] Call Trace: > [ 480.110282] [] ? schedule+0x5b/0x67 > [ 480.110417] [] ? async_synchronize_cookie_domain+0x73/0x9f > [ 480.110556] [] ? abort_exclusive_wait+0x6e/0x6e > [ 480.110685] [] ? do_init_module+0xa4/0x1a3 > [ 480.110810] [] ? load_module+0x14de/0x18ca > [ 480.110932] [] ? SyS_finit_module+0x47/0x56 > [ 480.111052] [] ? sysenter_do_call+0x12/0x12 > [ 495.062082] sd 2:0:0:0: [sdb] Attached SCSI disk > > -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/