Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp698779ybl; Wed, 28 Aug 2019 04:08:03 -0700 (PDT) X-Google-Smtp-Source: APXvYqxv/cN0940VXW9innBaEnbCPAJ1+kJfEfH8DX6Z9/A7u1MIFTTTB8JSdVazORRFLoj1Zb+9 X-Received: by 2002:a63:d315:: with SMTP id b21mr3016016pgg.326.1566990482891; Wed, 28 Aug 2019 04:08:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1566990482; cv=none; d=google.com; s=arc-20160816; b=UINPMZmeBHs0NPsAmIU5J4GpBTTKzXQVYHtXASuBzpd2SfYooPLPPVdIt+rtetILaO Yz4JU3wQstSnw5hr5ISosHa6q2bQDajr/msEKxJIS60vFH3IZatctBiMn01CAJbJS8Pu tqkC5rogl/i/DsIxwsXG3PtVYdSc8PaOv59qugs0jSpI0zGkRN1KhcUOn6xCujVvLWYS 8ZJqPFukLnL1YsVUFIeFHSAdMXROf/Cf3y6QipZ8o6KGmSLJ9cApMjMA8iFi1Va3ZN3i pcPUTo2wh74A5hrDxmecpxCVyT1o8ljDwgb+R+KUAVld4+YY5nkXmzveC3dY/UdA1DdQ FtAw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=sDrewxE3bseXl9Vh1tX9A6x1jUOJ7d81q2bYV0LudOY=; b=OhTP2Fc5gwzWHpFe4DsIs8bDGb9k6Hcik3I25sAssp1eyfvFbfiPiN7c9HRrmZW8s5 VFEZpoTIULM0sv9LGk52IiOctUP1UAThjr09ZstFHUaYXc8MnRloOLxSEpHc98S5Iv5S YVqHxK3xFV9QTdxYQivM2kR+rLmswjRv1NQAQkElMBWAADhfg0WeDuW1RWRqUdN2HDhL WTVii4ffGeUnNJaX7dXf1KrB30IrcYiFyX9qLbeLDKEfDuIeb8o5Kn0IW5ttgnwKfE4F VTfg13ieENLy9aNa+s8QOw+sXEv7Rk/utmrRbcp9BYDh6P2H6fyFktsAcSzMKuqyPELa RL+w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k11si1841184pgt.593.2019.08.28.04.07.46; Wed, 28 Aug 2019 04:08:02 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726378AbfH1LGu (ORCPT + 99 others); Wed, 28 Aug 2019 07:06:50 -0400 Received: from mx1.redhat.com ([209.132.183.28]:56100 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725991AbfH1LGt (ORCPT ); Wed, 28 Aug 2019 07:06:49 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 436273087958; Wed, 28 Aug 2019 11:06:49 +0000 (UTC) Received: from ming.t460p (ovpn-8-32.pek2.redhat.com [10.72.8.32]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C6336194B2; Wed, 28 Aug 2019 11:06:39 +0000 (UTC) Date: Wed, 28 Aug 2019 19:06:34 +0800 From: Ming Lei To: Thomas Gleixner Cc: linux-kernel@vger.kernel.org, Long Li , Ingo Molnar , Peter Zijlstra , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , John Garry , Hannes Reinecke , linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism Message-ID: <20190828110633.GC15524@ming.t460p> References: <20190827085344.30799-1-ming.lei@redhat.com> <20190827085344.30799-2-ming.lei@redhat.com> <20190827225827.GA5263@ming.t460p> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.11.3 (2019-02-01) X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Wed, 28 Aug 2019 11:06:49 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote: > On Wed, 28 Aug 2019, Ming Lei wrote: > > On Tue, Aug 27, 2019 at 04:42:02PM +0200, Thomas Gleixner wrote: > > > On Tue, 27 Aug 2019, Ming Lei wrote: > > > > + > > > > + int cpu = raw_smp_processor_id(); > > > > + struct irq_interval *inter = per_cpu_ptr(&avg_irq_interval, cpu); > > > > + u64 delta = sched_clock_cpu(cpu) - inter->last_irq_end; > > > > > > Why are you doing that raw_smp_processor_id() dance? The call site has > > > interrupts and preemption disabled. > > > > OK, will change to __smp_processor_id(). > > You do not need smp_processor_id() as all. OK. > > > > Also how is that supposed to work when sched_clock is jiffies based? > > > > Good catch, looks ktime_get_ns() is needed. > > And what is ktime_get_ns() returning when the only available clocksource is > jiffies? IMO, it isn't one issue. If the only clocksource is jiffies, we needn't to expect high IO performance. Then it is fine to always handle the irq in interrupt context or thread context. However, if it can be recognized runtime, irq_flood_detected() can always return true or false. > > > > > > > > + inter->avg = (inter->avg * IRQ_INTERVAL_EWMA_PREV_FACTOR + > > > > + delta * IRQ_INTERVAL_EWMA_CURR_FACTOR) / > > > > + IRQ_INTERVAL_EWMA_WEIGHT; > > > > > > We definitely are not going to have a 64bit multiplication and division on > > > every interrupt. Asided of that this breaks 32bit builds all over the place. > > > > I will convert the above into add/subtract/shift only. > > No. Talk to Daniel. There is not going to be yet another ad hoc thingy just > to make block happy. I just take a first glance at the code of 'interrupt timing', and its motivation is to predicate of the next occurrence of interested interrupt for use cases of PM, such as predicate wakeup time. And predication could be one much more difficult thing, and its implementation requires to record more info: such as irq number, recent multiple irq timestamps, that means its overhead is a bit high. Meantime IRQS_TIMINGS should only be set on interested interrupt(s). Yeah, irq timing uses the Exponential Weighted Moving Average(EWMA) for computing average irq interval for each interrupt. So either motivation or approaches taken are totally different between irq timing and irq flood detection. Daniel, correct me if I am wrong. For the case of detecting IRQ flood, we only need to sum the average interval of any do_IRQ() on each CPU, and the overhead is quite low since just two read & write on one percpu varible is required. We don't care what the exact irq number is. However, we have to account time taken by softirq handler, which can't be covered by interrupt timing. Also it is quite simple to use EWMA to compute average interrupt interval, however we can't reuse irq timing's result which is done on each irq. IRQ flood detection simply computes the average interval of any do_IRQ() on each CPU for covering handling interrupt and softirq. > > And aside of that why does block not have a "NAPI" mode which was > explicitely designed to avoid all that? Block layer knows nothing about interrupt, even don't know which context is run for completing IO request, which is decided by driver, could be interrupt context, softirq context, or process context. Also it is hard for block layer to figure out when IRQ flood is triggered. CPU is shared resource, all kinds of interrupt sources may contribute some for IRQ flood. That is why this patch implements the detection mechanism in genirq/softirq code. In theory, this patch provides one simple generic mechanism for avoiding IRQ flood/CPU lockup, which could be used for any devices, not only high performance storage. Thanks, Ming