Subject: Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU
To: Ming Lei
Cc: Keith Busch, Sagi Grimberg, Christoph Hellwig, Jens Axboe, Stefan Haberland, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, James Smart, linux-block@vger.kernel.org, Christian Borntraeger, Thomas Gleixner, Christoph Hellwig
From: "jianchao.wang"
Message-ID: <53da00dc-3d46-dcdb-2be4-277f79a9888b@oracle.com>
Date: Fri, 19 Jan 2018 11:05:35 +0800
In-Reply-To: <20180117095744.GF9487@ming.t460p>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Ming,

Sorry for the delayed report.

On 01/17/2018 05:57 PM, Ming Lei wrote:
> 2) hctx->next_cpu can become offline from online before __blk_mq_run_hw_queue
> is run, there isn't warning, but once the IO is submitted to hardware,
> after it is completed, how does the HBA/hw queue notify CPU since CPUs
> assigned to this hw queue(irq vector) are offline? blk-mq's timeout
> handler may cover that, but looks too tricky.

In theory, the irq affinity will be migrated to another CPU. This is done by
fixup_irqs() in the context of stop_machine.
However, in my test, I found this log:

[ 267.161043] do_IRQ: 7.33 No irq handler for vector

33 is the vector used by the nvme cq.
The irq seems to be missed, and sometimes an IO hang occurred.
The hang does not occur every time, I think maybe due to nvme_process_cq in nvme_queue_rq.
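As an illustration of the message above, here is a toy user-space model in C. It is only a sketch under assumptions, not kernel code: the table `vector_installed` and the helpers `reset_vectors()`, `set_vector()`, `migrate_vector()` and `deliver_irq()` are hypothetical names made up for this example, and it assumes the "7.33" in the log reads as CPU 7, vector 33. It shows how an interrupt that still arrives on the old CPU after its vector entry was cleared would find no handler. The real fixup_irqs() path is more involved than this.

```c
#include <stdio.h>
#include <string.h>

#define NR_CPUS    8
#define NR_VECTORS 256

/* Toy stand-in for a per-CPU vector table: 1 = a handler is installed. */
static unsigned char vector_installed[NR_CPUS][NR_VECTORS];

/* Return 1 if (cpu, vector) is inside the toy table. */
static int in_range(int cpu, int vector)
{
    return cpu >= 0 && cpu < NR_CPUS && vector >= 0 && vector < NR_VECTORS;
}

/* Clear every entry in the toy table. */
static void reset_vectors(void)
{
    memset(vector_installed, 0, sizeof(vector_installed));
}

/* Install (installed != 0) or remove the handler for (cpu, vector). */
static void set_vector(int cpu, int vector, int installed)
{
    if (in_range(cpu, vector))
        vector_installed[cpu][vector] = installed ? 1 : 0;
}

/*
 * Model moving an interrupt from one CPU to another. Simplification:
 * the same vector number is reused on the target CPU.
 */
static void migrate_vector(int from_cpu, int to_cpu, int vector)
{
    set_vector(from_cpu, vector, 0);
    set_vector(to_cpu, vector, 1);
}

/*
 * Model interrupt delivery.
 * Returns 1 if a handler is installed, 0 if none is (buf then holds a
 * message in the same "<cpu>.<vector>" layout as the log line above),
 * and -1 if the arguments are outside the toy table.
 */
static int deliver_irq(int cpu, int vector, char *buf, size_t len)
{
    if (!in_range(cpu, vector))
        return -1;
    if (vector_installed[cpu][vector]) {
        snprintf(buf, len, "handled on cpu %d", cpu);
        return 1;
    }
    snprintf(buf, len, "do_IRQ: %d.%d No irq handler for vector",
             cpu, vector);
    return 0;
}
```

For example, after set_vector(7, 33, 1) and migrate_vector(7, 0, 33), calling deliver_irq(7, 33, buf, sizeof(buf)) returns 0 and fills buf with a line in the same layout as the log above, while deliver_irq(0, 33, buf, sizeof(buf)) returns 1.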
I added a dump_stack() after the error log and got the following:

[ 267.161043] do_IRQ: 7.33 No irq handler for vector migration/7
[ 267.161045] CPU: 7 PID: 52 Comm: migration/7 Not tainted 4.15.0-rc7+ #27
[ 267.161045] Hardware name: LENOVO 10MLS0E339/3106, BIOS M1AKT22A 06/27/2017
[ 267.161046] Call Trace:
[ 267.161047]  <IRQ>
[ 267.161052]  dump_stack+0x7c/0xb5
[ 267.161054]  do_IRQ+0xb9/0xf0
[ 267.161056]  common_interrupt+0xa2/0xa2
[ 267.161057]  </IRQ>
[ 267.161059] RIP: 0010:multi_cpu_stop+0xb0/0x120
[ 267.161060] RSP: 0018:ffffbb6c81af7e70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
[ 267.161061] RAX: 0000000000000001 RBX: 0000000000000004 RCX: 0000000000000000
[ 267.161062] RDX: 0000000000000006 RSI: ffffffff898c4591 RDI: 0000000000000202
[ 267.161063] RBP: ffffbb6c826e7c88 R08: ffff991abc1256bc R09: 0000000000000005
[ 267.161063] R10: ffffbb6c81af7db8 R11: ffffffff89c91d20 R12: 0000000000000001
[ 267.161064] R13: ffffbb6c826e7cac R14: 0000000000000003 R15: 0000000000000000
[ 267.161067]  ? cpu_stop_queue_work+0x90/0x90
[ 267.161068]  cpu_stopper_thread+0x83/0x100
[ 267.161070]  smpboot_thread_fn+0x161/0x220
[ 267.161072]  kthread+0xf5/0x130
[ 267.161073]  ? sort_range+0x20/0x20
[ 267.161074]  ? kthread_associate_blkcg+0xe0/0xe0
[ 267.161076]  ret_from_fork+0x24/0x30

The irq occurred just after irqs were re-enabled in multi_cpu_stop.

0xffffffff8112d655 is in multi_cpu_stop (/home/will/u04/source_code/linux-block/kernel/stop_machine.c:223).
218                      */
219                     touch_nmi_watchdog();
220             }
221     } while (curstate != MULTI_STOP_EXIT);
222
223     local_irq_restore(flags);
224     return err;
225 }

Thanks
Jianchao