Date: Fri, 3 Aug 2018 00:27:03 +0800
From: Ming Lei
To: Guenter Roeck
Cc: Ming Lei, linux-ide@vger.kernel.org, Tejun Heo, James Bottomley,
 Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List,
 linux-scsi, Christoph Hellwig, Josef Bacik, Jens Axboe
Subject: Re: linux-next: Tree for Aug 1
Message-ID: <20180802162654.GA8928@ming.t460p>
References: <20180801175852.36549130@canb.auug.org.au>
 <20180801224813.GA13074@roeck-us.net>
 <1533163965.3158.1.camel@HansenPartnership.com>
 <20180801234727.GA3762@roeck-us.net>
 <1533168205.3158.12.camel@HansenPartnership.com>
 <171b2cdc-2e74-2b3c-e5f5-c656a196601a@roeck-us.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 02, 2018 at 06:05:16AM -0700, Guenter Roeck wrote:
> On 08/02/2018 04:35 AM, Ming Lei wrote:
> > On Thu, Aug 2, 2018 at 12:58 PM, Guenter Roeck wrote:
> > > On 08/01/2018 05:03 PM, James Bottomley wrote:
> > > >
> > > > On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote:
> > > > >
> > > > > On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck wrote:
> > > > > >
> > > > > > On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote:
> > > > > > >
> > > > > > > On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote:
> > > > > > > >
> > > > > > > > On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Changes since 20180731:
> > > > > > > > >
> > > > > > > > > The pci tree gained a conflict against the pci-current tree.
> > > > > > > > >
> > > > > > > > > The net-next tree gained a conflict against the bpf tree.
> > > > > > > > >
> > > > > > > > > The block tree lost its build failure.
> > > > > > > > >
> > > > > > > > > The staging tree still had its build failure due to an interaction with
> > > > > > > > > the vfs tree for which I disabled CONFIG_EROFS_FS.
> > > > > > > > >
> > > > > > > > > The kspp tree lost its build failure.
> > > > > > > > >
> > > > > > > > > Non-merge commits (relative to Linus' tree): 10070
> > > > > > > > >  9137 files changed, 417605 insertions(+), 179996 deletions(-)
> > > > > > > > >
> > > > > > > > > ----------------------------------------------------------------------------
> > > > > > > > >
> > > > > > > >
> > > > > > > > The widespread kernel hang issues are still seen. I managed
> > > > > > > > to bisect it after working around the transient build failures.
> > > > > > > > Bisect log is attached below. Unfortunately, it doesn't help much.
> > > > > > > > The culprit is reported as:
> > > > > > > >
> > > > > > > > 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next'
> > > > > > > >
> > > > > > > > The preceding merge,
> > > > > > > >
> > > > > > > > 453f1d821165 Merge remote-tracking branch 'cgroup/for-next'
> > > > > > > >
> > > > > > > > checks out fine, as does the tip of scsi-next (commit 103c7b7e0184,
> > > > > > > > "Merge branch 'misc' into for-next"). No idea how to proceed.
> > > > > > > >
> > > > > > >
> > > > > > > This sounds like you may have a problem with this patch:
> > > > > > >
> > > > > > > commit d5038a13eca72fb216c07eb717169092e92284f1
> > > > > > > Author: Johannes Thumshirn
> > > > > > > Date: Wed Jul 4 10:53:56 2018 +0200
> > > > > > >
> > > > > > >     scsi: core: switch to scsi-mq by default
> > > > > > >
> > > > > > > To verify, boot with the additional kernel parameter
> > > > > > >
> > > > > > > scsi_mod.use_blk_mq=0
> > > > > > >
> > > > > > > Which will reverse the effect of the above patch.
> > > > > > >
> > > > > > >
> > > > > > Yes, that fixes the problem.
> > > > >
> > > > >
> > > > > That may not the root cause, given this issue is only started to
> > > > > see from next-20180731, but d5038a13eca7 (scsi: core: switch to
> > > > > scsi-mq by default)
> > > > > has been in -next for quite a while.
> > > > >
> > > > > Seems something new causes this issue.
> > > >
> > > >
> > > > Read my other email about how to find this.
> > > >
> > > > https://marc.info/?l=linux-scsi&m=153316446223676
> > > >
> > > > Now that we've confirmed the issue, Gunter, could you attempt to bisect
> > > > it as that email describes?
> > > >
> > > >
> > > So, I am more and more baffled.
> > >
> > > I ran another round of bisect, this time each test executing twice,
> > > once with "scsi_mod.use_blk_mq=1" and once with "scsi_mod.use_blk_mq=0",
> > > requiring both to pass. Bisect still points to the merge as culprit.
> > >
> > > Ok, one step further: Actually _revert_ commit d5038a13eca72 before running
> > > each test, meaning the default is use_blk_mq=0. Still run both tests.
> > > Bisect _still_ points to the merge of scsi-next as culprit.
> > >
> > > So, to me it looks like the problem is triggered by _something_ in
> > > scsi-next, combined with _something_ in -next prior to the merge,
> > > not specifically associated with use_blk_mq=[0|1] or d5038a13eca72,
> > > but to a combination of some patch in scsi-next and some other patch.
> >
> > Today I am a bit busy, and not trace it much.
> >
> > So far, I found the code hangs in scsi_test_unit_ready()
> > <-get_capabilities()<-sr_probe(), and scsi_queue_rq()/ata_scsi_queuecmd()
> > has queued the command successfully, but never completed.
> >
> > Also tried to revert commits merged to ata tree on 30th, 31th,
> > but no difference.
> >
>
> Looking at my commit logs, the problem started to happen after various DMA
> changes were introduced. The boot tests fail on ppc (few), mips (all 32 bit,
> most 64 bit), i386 (all), x86_64 (most). All other platform pass, even with
> the same type of boot tests. Here is an example from alpha:
>
> Building alpha:defconfig:initrd ... running .... passed
> Building alpha:defconfig:sata:rootfs ... running ..... passed
> Building alpha:defconfig:usb:rootfs ... running ..... passed
> Building alpha:defconfig:usb-uas:rootfs ... running ...... passed
> Building alpha:defconfig:scsi[AM53C974]:rootfs ... running ....... passed
> Building alpha:defconfig:scsi[DC395]:rootfs ... running ....... passed
> Building alpha:defconfig:scsi[MEGASAS]:rootfs ... running ...... passed
> Building alpha:defconfig:scsi[MEGASAS2]:rootfs ... running ...... passed
> Building alpha:defconfig:scsi[FUSION]:rootfs ... running ...... passed
> Building alpha:defconfig:nvme:rootfs ... running ..... passed
>
> arm64:
>
> Building arm64:virt:defconfig:smp:initrd ... running ..... passed
> Building arm64:virt:defconfig:smp:usb:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:usb-uas:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:virtio:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:nvme:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:mmc:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:scsi[DC395]:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:scsi[AM53C974]:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:scsi[MEGASAS]:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:scsi[MEGASAS2]:rootfs ... running ..... passed
> Building arm64:virt:defconfig:smp:scsi[53C810]:rootfs ... running ...... passed
> Building arm64:virt:defconfig:smp:scsi[53C895A]:rootfs ... running ...... passed
> Building arm64:virt:defconfig:smp:scsi[FUSION]:rootfs ... running ...... passed
> Skipping arm64:xlnx-zcu102:defconfig:smp:initrd:xilinx/zynqmp-ep108 ...
> Skipping arm64:xlnx-zcu102:defconfig:smp:sd:rootfs:xilinx/zynqmp-ep108 ...
> Skipping arm64:xlnx-zcu102:defconfig:smp:sata:rootfs:xilinx/zynqmp-ep108 ...
> Building arm64:xlnx-zcu102:defconfig:smp:initrd:xilinx/zynqmp-zcu102-rev1.0 ... running ....... passed
> Building arm64:xlnx-zcu102:defconfig:smp:sd1:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed
> Building arm64:xlnx-zcu102:defconfig:smp:sata:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ...... passed
> Building arm64:raspi3:defconfig:smp:initrd:broadcom/bcm2837-rpi-3-b ... running ..... passed
> Building arm64:raspi3:defconfig:smp:sd:rootfs:broadcom/bcm2837-rpi-3-b ... running ........ passed
> Building arm64:virt:defconfig:nosmp:initrd ... running ..... passed
> Skipping arm64:xlnx-zcu102:defconfig:nosmp:initrd:xilinx/zynqmp-ep108 ...
> Skipping arm64:xlnx-zcu102:defconfig:nosmp:sd:rootfs:xilinx/zynqmp-ep108 ...
> Building arm64:xlnx-zcu102:defconfig:nosmp:initrd:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed
> Building arm64:xlnx-zcu102:defconfig:nosmp:sd1:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed
>
> ppc:
>
> Building powerpc:mac99:qemu_ppc_book3s_defconfig:nosmp:rootfs ... running ....... passed
> Building powerpc:g3beige:qemu_ppc_book3s_defconfig:nosmp:rootfs ... running ...... passed
> Building powerpc:mac99:qemu_ppc_book3s_defconfig:smp:rootfs ... running ....... passed
> Building powerpc:virtex-ml507:44x/virtex5_defconfig:devtmpfs:initrd ... running .... passed
> Building powerpc:mpc8544ds:mpc85xx_defconfig:initrd ... running .... passed
> Building powerpc:mpc8544ds:mpc85xx_defconfig:scsi:rootfs ... running ..... passed
> Building powerpc:mpc8544ds:mpc85xx_defconfig:sata:rootfs ... running .... passed
> Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:initrd ... running .... passed
> Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:scsi:rootfs ... running ..... passed
> Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:sata:rootfs ... running .... passed
> Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:initrd ... running .... passed
> Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:scsi[AM53C974]:rootfs ... running ..... passed
> Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:smp:initrd ... running .... passed
> Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:smp:scsi[AM53C974]:rootfs ... running ..... passed
> Building powerpc:sam460ex:44x/canyonlands_defconfig:devtmpfs:initrd ... running ..... passed
> Building powerpc:sam460ex:44x/canyonlands_defconfig:devtmpfs:usbdisk:rootfs ... running ...... passed
> Building powerpc:mac99:pmac32_defconfig:devtmpfs:zilog:initrd ... running .................................. failed (timeout)
> Building powerpc:mac99:pmac32_defconfig:devtmpfs:zilog:rootfs ... running .................................. failed (timeout)
>
> Maybe that is a coincidence, but it is at least suspicious.

This issue can be fixed by reverting d250bf4e776ff09d5 ("blk-mq: only
iterate over inflight requests in blk_mq_tagset_busy_iter").

That patch looks wrong, because 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'
isn't exactly the same as 'blk_mq_request_started(req)'.

Thanks,
Ming
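
To illustrate why the two checks named above can disagree, here is a small user-space model in C. It is only a sketch and not kernel code. It assumes the 4.18-era request states (MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE) and assumes that blk_mq_request_started() is true for any state other than MQ_RQ_IDLE. The struct and all model_* helpers are hypothetical names invented for this example.

```c
#include <assert.h>   /* used by the checks in the usage note below */
#include <stdbool.h>

/* Assumed request states, modelled on the 4.18-era enum mq_rq_state. */
enum mq_rq_state {
	MQ_RQ_IDLE = 0,      /* allocated, not yet issued to the driver */
	MQ_RQ_IN_FLIGHT = 1, /* issued to the driver, not yet completed */
	MQ_RQ_COMPLETE = 2,  /* completion seen, request not yet freed */
};

/* Hypothetical stand-in for struct request; only the state matters here. */
struct model_request {
	enum mq_rq_state state;
};

static enum mq_rq_state model_rq_state(const struct model_request *rq)
{
	return rq->state;
}

/* Models blk_mq_request_started(): true once the request has left IDLE. */
static bool model_request_started(const struct model_request *rq)
{
	return model_rq_state(rq) != MQ_RQ_IDLE;
}

/* Models the stricter 'state == MQ_RQ_IN_FLIGHT' check quoted above. */
static bool model_request_in_flight(const struct model_request *rq)
{
	return model_rq_state(rq) == MQ_RQ_IN_FLIGHT;
}

/* Returns true when the two predicates disagree for the given state. */
static bool model_predicates_differ(enum mq_rq_state state)
{
	struct model_request rq = { .state = state };

	return model_request_started(&rq) != model_request_in_flight(&rq);
}
```

Under these assumptions the two predicates agree for MQ_RQ_IDLE and MQ_RQ_IN_FLIGHT and differ only for MQ_RQ_COMPLETE, so an iterator that switches from one check to the other would skip requests that are started but already marked complete. Whether that is the exact path behind the hang is not established in this thread.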