Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
From: Jens Axboe
To: Christian Borntraeger, Bart Van Assche,
 "virtualization@lists.linux-foundation.org",
 "linux-block@vger.kernel.org", "mst@redhat.com", "jasowang@redhat.com",
 "linux-kernel@vger.kernel.org", Christoph Hellwig, Greg Kroah-Hartman,
 stable@vger.kernel.org
Date: Tue, 21 Nov 2017 13:39:11 -0700
Message-ID: <08e6f35a-4f49-973e-99f7-6087b44337c4@kernel.dk>
In-Reply-To: <1ddd1cd4-2862-849e-7849-82544bcb86be@de.ibm.com>
References: <9c5eec5d-f542-4d76-6933-6fe31203ce09@de.ibm.com>
 <5c9f2228-0a8b-8225-7038-e6cb3f31ca0b@kernel.dk>
 <2e44dbd3-2f90-c267-560c-91d1d4b0e892@de.ibm.com>
 <823b9dd5-7781-5a72-03ff-bc931433fc19@kernel.dk>
 <15f232d2-2aaa-df7c-57e8-2f710e051e84@de.ibm.com>
 <055f040d-3f9a-a8fd-e8e2-326c6b9094a1@kernel.dk>
 <1aeecf2e-a68e-4c18-5912-2473f457e6ea@de.ibm.com>
 <8fedc2ad-d775-7789-742c-92ca928a3aee@kernel.dk>
 <276625a9-44fb-719d-9281-caacefdbb99f@de.ibm.com>
 <1ddd1cd4-2862-849e-7849-82544bcb86be@de.ibm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/21/2017 01:31 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 09:21 PM, Jens Axboe wrote:
>> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>>
>>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>>> Author: Christoph Hellwig
>>>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>>
>>>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>>> the code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Christoph Hellwig
>>>>>>>>>>>>> Reviewed-by: Jens Axboe
>>>>>>>>>>>>> Cc: Keith Busch
>>>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>>> Signed-off-by: Thomas Gleixner
>>>>>>>>>>>>> Cc: Oleksandr Natalenko
>>>>>>>>>>>>> Cc: Mike Galbraith
>>>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>>> take a look.
>>>>>>>>>>>
>>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>>
>>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>>
>>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>>> not available CPU.
>>>>>>>>>>
>>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>>> 4
>>>>>>>>>
>>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>>> but becomes present and online afterwards.
>>>>>>>>>
>>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>>
>>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>>
>>>>>>>> Can you try the below?
>>>>>>>
>>>>>>>
>>>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>>>
>>>>>>>
>>>>>>> output with 2 cpus:
>>>>>>> /sys/kernel/debug/block/vda
>>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>>> /sys/kernel/debug/block/vda/state
>>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>>
>>>>>> Try this, basically just a revert.
>>>>>
>>>>> Yes, seems to work.
>>>>>
>>>>> Tested-by: Christian Borntraeger
>>>>
>>>> Great, thanks for testing.
>>>>
>>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>>> it has no Fixes tag and no cc stable-
>>>>
>>>> I was wondering the same thing when you said it was in 4.12.stable and
>>>> not in 4.12 release. That patch should absolutely not have gone into
>>>> stable, it's not marked as such and it's not fixing a problem that is
>>>> stable worthy. In fact, it's causing a regression...
>>>>
>>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>>
>>>
>>>
>>> Forgot to cc Greg?
>>
>> I did, thanks for doing that. Now I wonder how to mark this patch,
>> as we should revert it from kernels that have the bad commit. 4.12
>> is fine, 4.12.later-stable is not.
>>
>
> I think we should tag it with:
>
> Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
>
> which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.

Yeah, I think so too. But thinking more about this, I'm pretty sure this
adds a bad lock dependency with hotplug. Need to verify so we ensure we
don't introduce a potential deadlock here...

-- 
Jens Axboe
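
The case discussed in this thread (a CPU that is not present when the queue is set up, then becomes present and online later) can be spotted from userspace by comparing the kernel's CPU masks. What follows is only a minimal sketch, not anything posted in the thread: it assumes the standard sysfs files /sys/devices/system/cpu/possible and /sys/devices/system/cpu/present, and the helper names parse_cpulist and late_hotplug_candidates are hypothetical, chosen for illustration.

```python
from pathlib import Path

# Standard sysfs location of the kernel's CPU masks (assumed to be mounted).
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpulist(text):
    """Parse a cpulist string such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty mask reads back as an empty string
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def late_hotplug_candidates(possible, present):
    """Return CPUs that are possible but not currently present.

    These are the CPUs that could still be added after a block queue
    has already been mapped for the present set.
    """
    return sorted(parse_cpulist(possible) - parse_cpulist(present))


if __name__ == "__main__":
    possible = (SYSFS_CPU / "possible").read_text()
    present = (SYSFS_CPU / "present").read_text()
    print("possible but not present:",
          late_hotplug_candidates(possible, present))
```

On a guest whose possible mask were 0-3 and whose present mask were 0-1, this would list CPUs 2 and 3. An empty list means every possible CPU is already present, which matches the setup where the warning could not be reproduced.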