From: Ming Lei
To: Long Li
Cc: Keith Busch, Daniel Lezcano, Keith Busch, Hannes Reinecke,
	Bart Van Assche, linux-scsi@vger.kernel.org, Peter Zijlstra,
	John Garry, LKML, linux-nvme@lists.infradead.org, Jens Axboe,
	Ingo Molnar, Thomas Gleixner, Christoph Hellwig, Sagi Grimberg
Date: Tue, 10 Sep 2019 08:24:34 +0800
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Message-ID: <20190910002433.GA20557@ming.t460p>
In-Reply-To: <20190906221920.GA12290@ming.t460p>

On Sat, Sep 07, 2019 at 06:19:20AM +0800, Ming Lei wrote:
> On Fri, Sep 06, 2019 at 05:50:49PM +0000, Long Li wrote:
> > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> > >
> > >On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote:
> > >> When one IRQ flood happens on one CPU:
> > >>
> > >> 1) softirq handling on this CPU can't make progress
> > >>
> > >> 2) a kernel thread bound to this CPU can't make progress
> > >>
> > >> For example, the network stack may need softirq to xmit packets, an
> > >> irq thread may be handling keyboards/mice or whatever, or rcu_sched
> > >> may depend on that CPU to make progress, so the irq flood stalls
> > >> the whole system.
> > >>
> > >> > AFAIU, there are fast mediums where the responses to requests are
> > >> > faster than the time to process them, right?
> > >>
> > >> Usually the medium is not faster than the CPU, but here we are
> > >> talking about interrupts, which can originate from lots of devices
> > >> concurrently; for example, in Long Li's test, there are 8 NVMe
> > >> drives involved.
> > >
> > >Why are all 8 nvmes sharing the same CPU for interrupt handling?
> > >Shouldn't matrix_find_best_cpu_managed() handle selecting the least
> > >used CPU from the cpumask for the effective interrupt handling?
> >
> > The tests run on 10 NVMe disks on a system of 80 CPUs. Each NVMe disk
> > has 32 hardware queues.
>
> Then there are 320 NVMe MSI-X vectors in total, and only 80 CPUs, so
> the irq matrix can't avoid overlapping effective CPUs at all.
>
> > It seems matrix_find_best_cpu_managed() has done its job, but we may
> > still have CPUs that service several hardware queues mapped from
> > other issuing CPUs.
> >
> > Another thing to consider is that there may be other managed
> > interrupts on the system, so NVMe interrupts may not end up evenly
> > distributed on such a system.
>
> Another improvement could be to avoid overlapping effective CPUs among
> the vectors of fast devices first, while still allowing overlap
> between slow vectors and fast vectors.
>
> This could help when the total number of fast vectors is <=
> nr_cpu_cores.

For this particular case, that can't be done, because:

1) this machine has 10 NUMA nodes, and each NVMe has 8 hw queues, so
too many CPUs are assigned to the first two hw queues; see the code
branch of 'if (numvecs <= nodes)' in __irq_build_affinity_masks()

2) fewer CPUs are then assigned to the other 6 hw queues

3) finally, the same effective CPU ends up shared by two IRQ vectors

Also it looks like matrix_find_best_cpu_managed() has been doing well
enough at choosing the best effective CPU.

Thanks,
Ming
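P.S. To make point 1) concrete, below is a quick standalone sketch
(plain userspace C, not the kernel code itself) of the node-to-vector
wrap-around done by the 'if (numvecs <= nodes)' branch of
__irq_build_affinity_masks(). It assumes the 80 CPUs are split evenly
as 8 CPUs per node, which is only an illustrative assumption; the real
code ORs each node's cpumask into the current vector's mask and wraps
back to the first vector once every vector has been visited:

#include <stdio.h>

#define NR_NODES	10	/* NUMA nodes on this machine */
#define NR_VECS		8	/* hw queues (vectors) per NVMe */
#define CPUS_PER_NODE	8	/* assumed even split: 80 CPUs / 10 nodes */

int main(void)
{
	int cpus_in_vec[NR_VECS] = { 0 };
	int curvec = 0, n;

	for (n = 0; n < NR_NODES; n++) {
		/* stands in for OR-ing node n's cpumask into this vector */
		cpus_in_vec[curvec] += CPUS_PER_NODE;
		if (++curvec == NR_VECS)	/* wrap to the 1st vector */
			curvec = 0;
	}

	for (n = 0; n < NR_VECS; n++)
		printf("vector %d: %d CPUs\n", n, cpus_in_vec[n]);
	return 0;
}

With 10 nodes and 8 vectors the loop wraps around twice, so vectors 0
and 1 end up covering 16 CPUs each while vectors 2-7 get 8, which is
why the first two hw queues are over-subscribed.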