Subject: Re: Question on handling managed IRQs when hotplugging CPUs
From: John Garry
To: Thomas Gleixner
CC: Keith Busch, Christoph Hellwig, Marc Zyngier, axboe@kernel.dk,
    Peter Zijlstra, Michael Ellerman, Linuxarm,
    linux-kernel@vger.kernel.org, Hannes Reinecke,
    linux-scsi@vger.kernel.org, linux-block@vger.kernel.org
Date: Thu, 31 Jan 2019 17:48:02 +0000
Message-ID: <86d5028d-44ab-3696-f7fe-828d7655faa9@huawei.com>

On 30/01/2019 12:43, Thomas Gleixner wrote:
> On Wed, 30 Jan 2019, John Garry wrote:
>> On 29/01/2019 17:20, Keith Busch wrote:
>>> On Tue, Jan 29, 2019 at 05:12:40PM +0000, John Garry wrote:
>>>> On 29/01/2019 15:44, Keith Busch wrote:
>>>>>
>>>>> Hm, we used to freeze the queues with CPUHP_BLK_MQ_PREPARE callback,
>>>>> which would reap all outstanding commands before the CPU and IRQ are
>>>>> taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
>>>>> Create hctx for each present CPU"). It sounds like we should bring
>>>>> something like that back, but make it more fine-grained to the per-cpu
>>>>> context.
>>>>>
>>>>
>>>> Seems reasonable. But we would need it to deal with drivers where they
>>>> only expose a single queue to BLK MQ, but use many queues internally.
>>>> I think megaraid sas does this, for example.
>>>>
>>>> I would also be slightly concerned with commands being issued from the
>>>> driver unknown to blk mq, like SCSI TMF.
>>>
>>> I don't think either of those descriptions sound like good candidates
>>> for using managed IRQ affinities.
>>
>> I wouldn't say that this behaviour is obvious to the developer. I can't see
>> anything in Documentation/PCI/MSI-HOWTO.txt
>>
>> It also seems that this policy to rely on upper layer to flush+freeze queues
>> would cause issues if managed IRQs are used by drivers in other subsystems.
>> Network controllers may have multiple queues and unsolicited interrupts.
>
> It doesn't matter which part is managing flush/freeze of queues as long
> as something (either common subsystem code, upper layers or the driver
> itself) does it.
>
> So for the megaraid SAS example the BLK MQ layer obviously can't do
> anything because it only sees a single request queue.
> But the driver could, if the hardware supports it, tell the device to
> stop queueing completions on the completion queue which is associated
> with a particular CPU (or set of CPUs) during offline and then wait for
> the in-flight stuff to be finished. If the hardware does not allow that,
> then managed interrupts can't work for it.
>

A rough audit of current SCSI drivers suggests that these set
PCI_IRQ_AFFINITY in some path but don't set Scsi_Host.nr_hw_queues at all:
aacraid, be2iscsi, csiostor, megaraid, mpt3sas

I don't know the specific driver details, such as how each would change
its completion queue handling.

Thanks,
John

> Thanks,
>
> 	tglx
>
> .
>
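
A rough audit like the one above can be approximated with a plain text scan of the driver sources. The following is only a minimal sketch under stated assumptions, not the method used for the list in this mail: the function names `source_text` and `audit` are hypothetical, each top-level entry under the scanned directory is assumed to be one driver, and a bare string match stands in for real analysis (it also matches comments, and says nothing about which code path actually runs).

```python
from pathlib import Path

# Identifiers named in the mail above.
AFFINITY_FLAG = "PCI_IRQ_AFFINITY"
HW_QUEUES_FIELD = "nr_hw_queues"


def source_text(entry: Path) -> str:
    """Concatenate the .c and .h files belonging to one driver entry.

    Assumption: a directory is one driver; a lone top-level file is its
    own single-file driver.
    """
    candidates = [entry] if entry.is_file() else sorted(entry.rglob("*"))
    return "\n".join(
        f.read_text(errors="replace")
        for f in candidates
        if f.is_file() and f.suffix in (".c", ".h")
    )


def audit(root: Path) -> list[str]:
    """Return entries that mention the affinity flag but never nr_hw_queues."""
    flagged = []
    for entry in sorted(root.iterdir()):
        text = source_text(entry)
        if AFFINITY_FLAG in text and HW_QUEUES_FIELD not in text:
            flagged.append(entry.name)
    return flagged
```

From the top of a kernel tree this could be called as `audit(Path("drivers/scsi"))`. Any hit would still need a manual look at the driver, since the scan cannot tell whether the flag is reachable or whether queues are exposed some other way.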