Date: Tue, 3 Sep 2019 11:30:08 +0800
From: Ming Lei
To: Thomas Gleixner
Cc: LKML, Long Li, Ingo Molnar, Peter Zijlstra, Keith Busch, Jens Axboe,
    Christoph Hellwig, Sagi Grimberg, John Garry, Hannes Reinecke,
    linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
    Daniel Lezcano
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Message-ID: <20190903033001.GB23861@ming.t460p>
References: <20190827085344.30799-1-ming.lei@redhat.com>
    <20190827085344.30799-2-ming.lei@redhat.com>
    <20190827225827.GA5263@ming.t460p>
    <20190828110633.GC15524@ming.t460p>
    <20190828135054.GA23861@ming.t460p>

On Wed, Aug 28, 2019 at 04:07:19PM +0200, Thomas Gleixner wrote:
> On Wed, 28 Aug 2019, Ming Lei wrote:
> > On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote:
> > > On Wed, 28 Aug 2019, Ming Lei wrote:
> > > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> > > > > > > Also how is that supposed to work when sched_clock is jiffies based?
> > > > > >
> > > > > > Good catch, looks ktime_get_ns() is needed.
> > > > >
> > > > > And what is ktime_get_ns() returning when the only available clocksource is
> > > > > jiffies?
> > > >
> > > > IMO, it isn't one issue. If the only clocksource is jiffies, we needn't to
> > > > expect high IO performance. Then it is fine to always handle the irq in
> > > > interrupt context or thread context.
> > > >
> > > > However, if it can be recognized runtime, irq_flood_detected() can
> > > > always return true or false.
> > >
> > > Right. The clocksource is determined at runtime. And if there is no high
> > > resolution clocksource then that function will return crap.
> >
> > This patch still works even though the only clocksource is jiffies.
>
> Works by some definition of works, right?

I am not sure any such system exists: one that provides no high
resolution clocksource, yet has a high performance storage device
attached and is expected to reach top IO performance.
Suppose there is such a system. What I mean is that irq_flood_detected()
can then simply always return true, or always return false, and from the
user's point of view the resulting IO performance should be acceptable
on a system without a high resolution clocksource.

> > > Well, yes. But it's trivial enough to utilize parts of it for your
> > > purposes.
> >
> > From the code of kernel/irq/timing.c:
> >
> > 1) record_irq_time() only records the start time of one irq, and not
> > consider the time taken in interrupt handler, so we can't figure out
> > the real interval between two do_IRQ() on one CPU
>
> I said utilize and that means that the infrastructure can be used and
> extended. I did not say that it solves your problem, right?

The infrastructure is for predicting when the next interrupt will
arrive, which is used for PM (usually mobile phones or other power
sensitive cases). IRQ flood detection, however, is for high performance
systems (usually enterprise cases). The two use cases are actually
orthogonal. Also:

1) if the irq timing infrastructure is used, irq flood detection
inherits its management code: for example, the irq timing code has to be
built into the kernel and enabled. That might cause a performance
regression for enterprise applications.

2) irq timing's runtime overhead is much higher: irq_timings_push()
touches a much larger memory footprint, since it records the timestamps
of the most recent 32 irqs. That isn't what IRQ flood detection wants,
and it also isn't enough for flood detection.

3) irq flood detection itself is very simple, just one EWMA calculation.
See the following code:

	irq_update_interval() (called from irq_enter())
		int cpu = raw_smp_processor_id();
		struct irq_interval *inter = per_cpu_ptr(&avg_irq_interval, cpu);
		u64 delta = sched_clock_cpu(cpu) - inter->last_irq_end;

		inter->avg = (inter->avg * IRQ_INTERVAL_EWMA_PREV_FACTOR +
			delta * IRQ_INTERVAL_EWMA_CURR_FACTOR) /
			IRQ_INTERVAL_EWMA_WEIGHT;

	bool irq_flood_detected(void) (called from __handle_irq_event_percpu())
	{
		return raw_cpu_ptr(&avg_irq_interval)->avg <= IRQ_FLOOD_THRESHOLD_NS;
	}

	irq_exit()
		inter->last_irq_end = sched_clock_cpu(smp_processor_id());

So there is basically nothing shared between the two; only one percpu
variable is needed for detecting irq flood.

> > 2) irq/timing doesn't cover softirq
>
> That's solvable, right?

Yeah, we can extend irq/timing, but that would be ugly for irq/timing,
since it focuses on hardirq prediction and softirq isn't involved in
that purpose.

> > Daniel, could you take a look and see if irq flood detection can be
> > implemented easily by irq/timing.c?
>
> I assume you can take a look as well, right?

Yeah, I have looked at the code for a while, but I think irq/timing
would become unnecessarily complicated if it had to cover irq flood
detection, and it would also be much less efficient at detecting IRQ
floods.

thanks,
Ming
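
For anyone who wants to experiment with the EWMA check from point 3)
outside the kernel, here is a minimal userspace sketch. It is not the
patch itself. The constant values, the explicit timestamp parameters
(standing in for sched_clock_cpu()) and the helpers irq_interval_init()
and irq_record_exit() are assumptions made for illustration, and a
single struct replaces the percpu variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values for illustration only; the mail does not state the
 * constants used by the patch.
 */
#define IRQ_INTERVAL_EWMA_WEIGHT	8u
#define IRQ_INTERVAL_EWMA_PREV_FACTOR	7u
#define IRQ_INTERVAL_EWMA_CURR_FACTOR \
	(IRQ_INTERVAL_EWMA_WEIGHT - IRQ_INTERVAL_EWMA_PREV_FACTOR)
#define IRQ_FLOOD_THRESHOLD_NS		1000u
#define IRQ_INTERVAL_INIT_NS		1000000000u	/* hypothetical start value */

struct irq_interval {
	uint64_t last_irq_end;	/* time (ns) at which the previous handler finished */
	uint64_t avg;		/* EWMA of the gap between handlers, in ns */
};

/* Hypothetical helper: start with a large average so no flood is reported. */
static void irq_interval_init(struct irq_interval *inter, uint64_t now_ns)
{
	inter->last_irq_end = now_ns;
	inter->avg = IRQ_INTERVAL_INIT_NS;
}

/* Mirrors the irq_enter() side: fold the latest gap into the average. */
static void irq_update_interval(struct irq_interval *inter, uint64_t now_ns)
{
	uint64_t delta;

	assert(now_ns >= inter->last_irq_end);	/* timestamps must not go backwards */
	delta = now_ns - inter->last_irq_end;

	inter->avg = (inter->avg * IRQ_INTERVAL_EWMA_PREV_FACTOR +
		      delta * IRQ_INTERVAL_EWMA_CURR_FACTOR) /
		     IRQ_INTERVAL_EWMA_WEIGHT;
}

/* Mirrors the irq_exit() side: remember when the handler finished. */
static void irq_record_exit(struct irq_interval *inter, uint64_t now_ns)
{
	inter->last_irq_end = now_ns;
}

/* A flood is reported while the average gap is at or below the threshold. */
static bool irq_flood_detected(const struct irq_interval *inter)
{
	return inter->avg <= IRQ_FLOOD_THRESHOLD_NS;
}
```

With these assumed constants, a long run of interrupts separated by
100 ns gaps pulls the average below the threshold, and a single 1 ms gap
afterwards lifts it back above. The state needed is two u64 values,
which is the point made above about needing only one percpu variable.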