Date: Thu, 3 Nov 2016 13:17:04 +1100 (AEDT)
From: Finn Thain
To: Ondrej Zary
cc: Christoph Hellwig, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/6] g_NCR5380: Test the IRQ before accepting it
In-Reply-To: <201611022016.00556.linux@rainbow-software.org>
References: <1477945112-25659-1-git-send-email-linux@rainbow-software.org>
 <1477945112-25659-3-git-send-email-linux@rainbow-software.org>
 <201611022016.00556.linux@rainbow-software.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2 Nov 2016, Ondrej Zary wrote:

> On Wednesday 02 November 2016 08:45:26 Finn Thain wrote:
> > On Mon, 31 Oct 2016, Ondrej Zary wrote:
> > > Trigger an IRQ first with a test IRQ handler to find out if it
> > > really works. Disable the IRQ if not.
> > >
> > > This prevents hang when incorrect IRQ was specified by user.
> >
> > Once again, how does it cause a hang?
>
> Kernel scans the bus, finds a HDD, then attempts to read MBR. modprobe
> process is stuck but the system is still running. Then the transfer
> probably times out and everything locks up hard, even fbcon cursor stops
> blinking. I guess that kernel is trying to abort or reset.

I don't think this issue relates to the patch, because the chip irq is not
needed for exception handling. A backtrace from the soft lockup detector
should help explain this.

> BTW. rescan-scsi-bus also causes hang, anytime, even without IRQ.

I would try "scsi_logging_level -s -a 7" to find out what is going on
during the bus scan (for modprobe or rescan-scsi-bus).

The polling loops in generic_NCR5380_pread/pwrite can cause a lockup
because they lack timeouts. Better to call NCR5380_poll_politely, as in
macscsi_pread/pwrite.
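
To illustrate the difference between an unbounded polling loop and one with a
timeout, here is a minimal userspace sketch. It is not the driver code and not
the implementation of NCR5380_poll_politely; the names poll_reg_timeout,
reg_read_fn, fake_status, stuck_status and the 0x40 status bit are all
hypothetical, chosen only for the example. Real kernel code would measure time
in jiffies and might sleep between reads, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register accessor: stands in for reading a chip register. */
typedef uint8_t (*reg_read_fn)(void *ctx);

/*
 * Busy-poll until (reg & mask) == val or until timeout_ms of CPU time
 * has elapsed. Returns 0 on success and -1 on timeout, so the caller
 * can fail the transfer instead of spinning forever.
 */
static int poll_reg_timeout(reg_read_fn read_reg, void *ctx,
                            uint8_t mask, uint8_t val, int timeout_ms)
{
    assert(read_reg != NULL);

    clock_t deadline = clock() + (clock_t)timeout_ms * CLOCKS_PER_SEC / 1000;

    do {
        if ((read_reg(ctx) & mask) == val)
            return 0;
    } while (clock() < deadline);

    return -1; /* the condition never became true in time */
}

/* Hypothetical fake device: reports "ready" (bit 0x40) after a few reads. */
struct fake_dev {
    unsigned reads_until_ready;
};

static uint8_t fake_status(void *ctx)
{
    struct fake_dev *dev = ctx;

    if (dev->reads_until_ready > 0) {
        dev->reads_until_ready--;
        return 0x00;
    }
    return 0x40;
}

/* Hypothetical stuck device: the ready bit never sets. */
static uint8_t stuck_status(void *ctx)
{
    (void)ctx;
    return 0x00;
}
```

With a fake device that becomes ready after three reads, the call
poll_reg_timeout(fake_status, &dev, 0x40, 0x40, 50) returns 0. With
stuck_status it returns -1 once the deadline passes, whereas a loop of the
form "while (!(status & 0x40)) ;" would never return.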