Date: Sat, 7 Sep 2019 07:13:22 +0800
From: Ming Lei
To: Keith Busch
Cc: Long Li, Daniel Lezcano, Keith Busch, Hannes Reinecke, Bart Van Assche,
    linux-scsi@vger.kernel.org, Peter Zijlstra, John Garry, LKML,
    linux-nvme@lists.infradead.org, Jens Axboe, Ingo Molnar,
    Thomas Gleixner, Christoph Hellwig, Sagi Grimberg
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Message-ID: <20190906231321.GB12290@ming.t460p>
References: <6f3b6557-1767-8c80-f786-1ea667179b39@acm.org>
 <2a8bd278-5384-d82f-c09b-4fce236d2d95@linaro.org>
 <20190905090617.GB4432@ming.t460p>
 <6a36ccc7-24cd-1d92-fef1-2c5e0f798c36@linaro.org>
 <20190906014819.GB27116@ming.t460p>
 <20190906141858.GA3953@localhost.localdomain>
 <20190906221920.GA12290@ming.t460p>
 <20190906222555.GB4260@localhost.localdomain>
In-Reply-To: <20190906222555.GB4260@localhost.localdomain>
User-Agent: Mutt/1.11.3 (2019-02-01)
List-ID: linux-kernel@vger.kernel.org

On Fri, Sep 06, 2019 at 04:25:55PM -0600, Keith Busch wrote:
> On Sat, Sep 07, 2019 at 06:19:21AM +0800, Ming Lei wrote:
> > On Fri, Sep 06, 2019 at 05:50:49PM +0000, Long Li wrote:
> > > > Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> > > >
> > > > Why are all 8 nvmes sharing the same CPU for interrupt handling?
> > > > Shouldn't matrix_find_best_cpu_managed() handle selecting the least used
> > > > CPU from the cpumask for the effective interrupt handling?
> > >
> > > The tests run on 10 NVMe disks on a system of 80 CPUs. Each NVMe disk
> > > has 32 hardware queues.
> >
> > Then there are 320 NVMe MSI-X vectors in total for 80 CPUs, so the irq
> > matrix can't avoid overlapping effective CPUs at all.
>
> Sure, but it's at most half; meanwhile the CPU that's dispatching requests
> would naturally be throttled by the other half, whose completions are
> interrupting that CPU, no?

The root cause is multiple submitters vs. a single completion CPU.
Let's look at two cases:

1) 10 NVMe drives, each with 8 queues, 80 CPU cores
- suppose the genirq matrix can avoid effective-CPU overlap, so each CPU
  handles interrupts from only one NVMe drive
- there can be concurrent submissions from 10 CPUs, and all may be
  completed on one CPU
- an IRQ flood can't happen in this case, since each CPU only handles
  completions from one NVMe drive, which shouldn't be faster than the CPU

2) 10 NVMe drives, each with 32 queues, 80 CPU cores
- one CPU may handle 4 NVMe interrupt vectors, each from a different
  NVMe drive
- then there may be 4*3 submissions aimed at a single completion CPU, so
  an IRQ flood is easily triggered on the CPU handling 4 NVMe interrupts,
  because IO from 4 NVMe drives may be quicker than one CPU can handle

I can observe an IRQ flood in case #2, while there are still CPUs handling
only 2 NVMe interrupts, for the reason Long mentioned. We could improve
this case.

Thanks,
Ming