Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
To: Christian Borntraeger, Bart Van Assche,
 "virtualization@lists.linux-foundation.org",
 "linux-block@vger.kernel.org", "mst@redhat.com", "jasowang@redhat.com",
 "linux-kernel@vger.kernel.org", Christoph Hellwig, Greg Kroah-Hartman,
 stable@vger.kernel.org
References: <9c5eec5d-f542-4d76-6933-6fe31203ce09@de.ibm.com>
 <1511205644.2396.32.camel@wdc.com>
 <04526c98-ffc5-1eca-3aa8-50f9212c4323@de.ibm.com>
 <5c9f2228-0a8b-8225-7038-e6cb3f31ca0b@kernel.dk>
 <2e44dbd3-2f90-c267-560c-91d1d4b0e892@de.ibm.com>
 <823b9dd5-7781-5a72-03ff-bc931433fc19@kernel.dk>
 <15f232d2-2aaa-df7c-57e8-2f710e051e84@de.ibm.com>
 <055f040d-3f9a-a8fd-e8e2-326c6b9094a1@kernel.dk>
 <1aeecf2e-a68e-4c18-5912-2473f457e6ea@de.ibm.com>
 <8fedc2ad-d775-7789-742c-92ca928a3aee@kernel.dk>
 <276625a9-44fb-719d-9281-caacefdbb99f@de.ibm.com>
From: Jens Axboe
Date: Tue, 21 Nov 2017 13:21:08 -0700
MIME-Version: 1.0
In-Reply-To: <276625a9-44fb-719d-9281-caacefdbb99f@de.ibm.com>
Content-Type: text/plain;
 charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>
> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>> Bisect points to
>>>>>>>>>>>
>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>> Author: Christoph Hellwig
>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>
>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>
>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>
>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>> the code.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Christoph Hellwig
>>>>>>>>>>> Reviewed-by: Jens Axboe
>>>>>>>>>>> Cc: Keith Busch
>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>> Signed-off-by: Thomas Gleixner
>>>>>>>>>>> Cc: Oleksandr Natalenko
>>>>>>>>>>> Cc: Mike Galbraith
>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman
>>>>>>>>>>
>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>> take a look.
>>>>>>>>>
>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>
>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>
>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>> not available CPU.
>>>>>>>>
>>>>>>>> in libvirt/virsh speak:
>>>>>>>> 4
>>>>>>>
>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>> but becomes present and online afterwards.
>>>>>>>
>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>
>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>
>>>>>> Can you try the below?
>>>>>
>>>>>
>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>
>>>>>
>>>>> output with 2 cpus:
>>>>> /sys/kernel/debug/block/vda
>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>> /sys/kernel/debug/block/vda/sched
>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>> /sys/kernel/debug/block/vda/state
>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>
>>>> Try this, basically just a revert.
>>>
>>> Yes, seems to work.
>>>
>>> Tested-by: Christian Borntraeger
>>
>> Great, thanks for testing.
>>
>>> Do you know why the original commit made it into 4.12 stable?
>>> After all it has no Fixes tag and no cc stable-
>>
>> I was wondering the same thing when you said it was in 4.12.stable and
>> not in 4.12 release. That patch should absolutely not have gone into
>> stable, it's not marked as such and it's not fixing a problem that is
>> stable worthy. In fact, it's causing a regression...
>>
>> Greg? Upstream commit is mentioned higher up, start of the email.
>>
>
>
> Forgot to cc Greg?

I did, thanks for doing that. Now I wonder how to mark this patch, as we
should revert it from kernels that have the bad commit. 4.12 is fine,
4.12.later-stable is not.

-- 
Jens Axboe
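
Illustration (not part of the original message): the failure mode discussed in the thread, where queue-to-CPU masks are built once from the CPUs present at registration and a CPU added later is never mapped, can be sketched as a toy model. This is a rough sketch under stated assumptions, not kernel code: the names map_queues and unmapped_cpus, the round-robin assignment, and the CPU counts are all hypothetical and chosen only to make the reasoning concrete.

```python
# Toy model (NOT kernel code): hardware-queue CPU masks are built once from
# the CPUs that are "present" when the queue is registered.
# All names and the round-robin policy below are illustrative assumptions.

def map_queues(present_cpus, nr_hw_queues):
    """Assign each present CPU to a hardware queue, round-robin."""
    masks = [set() for _ in range(nr_hw_queues)]
    for i, cpu in enumerate(sorted(present_cpus)):
        masks[i % nr_hw_queues].add(cpu)
    return masks

def unmapped_cpus(masks, online_cpus):
    """Return the online CPUs that no hardware queue's mask covers."""
    covered = set().union(*masks)
    return sorted(set(online_cpus) - covered)

# The guest starts with 2 CPUs present; the device registers one queue.
masks = map_queues(present_cpus={0, 1}, nr_hw_queues=1)

# Two CPUs are added afterwards. They were not present at registration,
# so a mask built only at that time does not include them.
stale = unmapped_cpus(masks, online_cpus={0, 1, 2, 3})
print(stale)

# Rebuilding the map when the new CPUs appear covers them again.
masks = map_queues(present_cpus={0, 1, 2, 3}, nr_hw_queues=1)
print(unmapped_cpus(masks, online_cpus={0, 1, 2, 3}))
```

In this model, work submitted from a CPU in the stale list has no queue whose mask contains it, which corresponds to the situation described above where a CPU "is not present when we load the device, but becomes present and online afterwards". Remapping on the hotplug event is the part the thread suggests may need to be re-introduced.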