Message-ID: <54629CAE.2000207@laposte.net>
Date: Wed, 12 Nov 2014 00:33:02 +0100
From: Barto
To: linux-kernel@vger.kernel.org
Subject: BUG in scsi_lib.c due to a bad commit

Hello everyone,

Since kernel 3.17 (and also with the 3.18 branch) I have been seeing a bug: a random hang at boot on some PC configurations. I ran "git bisect" and found that the culprit is:

[045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=045065d8a300a37218c548e9aa7becd581c6a0e8

The author of this commit chose to invert the logic of the if statement in drivers/scsi/scsi_lib.c in order to solve an issue with qemu:

--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
 	blk_requeue_request(q, req);
 	atomic_dec(&sdev->device_busy);
 out_delay:
-	if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
+	if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }

This change triggers a bug on my PC. I don't have any SCSI devices, only 3 SATA hard disks and 2 IDE hard disks: the SATA disks are on the ICH7 SATA controller of the motherboard (Gigabyte GA-P31-DS3L), and the IDE disks are on a JMicron JMB363/368 SATA/IDE PCIe card. Every 5 to 10 boots, the boot suddenly stops because of this commit; if I revert the commit, the bug is gone. More details, including a patch I created that reverts commit 045065d8, can be found here:

https://bugzilla.kernel.org/show_bug.cgi?id=87581

My question: why did Guenter Roeck (the author of the bad commit) choose to invert the logic of the if statement? Before his commit, the if statement looked like this:

	if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
		blk_delay_queue(q, SCSI_QUEUE_DELAY);

If his decision to invert the "atomic_read(&sdev->device_busy)" test is acceptable, then the real bug is probably located elsewhere in the SCSI code, and we must solve this mystery, because there is obviously a regression in the SCSI code: with older kernels (3.16.7 and lower) I don't get the random boot hang with my configuration.

Another user on the Arch Linux forums also has this bug, on a more modern PC (Intel Core i7 CPU, SSD). My fear is that when Linux distros move to kernel 3.17, more users will hit this weird random bug, which occurs only at boot and only with specific PC configurations. If the boot gets past that point, the bug does not occur afterwards; it only happens during the boot process, which probably means that the faulty code is only called during boot.

Thanks to anyone who wants to dig into this problem with me.
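To make the inversion in commit 045065d8 concrete, here is a minimal user-space model of the out_delay condition. It is only a sketch under stated assumptions: delay_before(), delay_after(), check_inversion() and print_truth_table() are hypothetical helpers, the value of atomic_read(&sdev->device_busy) is modelled as a plain int and the result of scsi_device_blocked(sdev) as a plain bool. It shows which input combinations lead to blk_delay_queue(q, SCSI_QUEUE_DELAY) being called before and after the commit; it does not say which version is correct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * User-space model, NOT kernel code. "device_busy" stands in for
 * atomic_read(&sdev->device_busy) and "blocked" for scsi_device_blocked(sdev).
 * All function names here are hypothetical.
 */

/* Condition before commit 045065d8: delay only if device_busy is non-zero. */
static bool delay_before(int device_busy, bool blocked)
{
	return device_busy && !blocked;
}

/* Condition after commit 045065d8: delay only if device_busy is zero. */
static bool delay_after(int device_busy, bool blocked)
{
	return !device_busy && !blocked;
}

/*
 * For a blocked device neither version delays; for an unblocked device
 * exactly one of the two versions delays (they are complements).
 */
static void check_inversion(void)
{
	for (int busy = 0; busy <= 1; busy++) {
		assert(!delay_before(busy, true) && !delay_after(busy, true));
		assert(delay_before(busy, false) != delay_after(busy, false));
	}
}

/* Print whether blk_delay_queue() would be called for each input combination. */
static void print_truth_table(void)
{
	printf("busy blocked | before after\n");
	for (int busy = 0; busy <= 1; busy++)
		for (int b = 0; b <= 1; b++) {
			bool blocked = b;
			printf("%4d %7d | %6d %5d\n", busy, blocked,
			       delay_before(busy, blocked),
			       delay_after(busy, blocked));
		}
}
```

Only zero versus non-zero matters for device_busy in this condition, so a busy count above 1 behaves like 1 here. For an unblocked device, the old code delayed the queue only while device_busy was non-zero and the new code delays it only when device_busy is zero; for a blocked device neither version delays.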