Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Date:   Tue, 3 Sep 2019 17:24:54 +0800
From:   Ming Lei <ming.lei@redhat.com>
To:     Thomas Gleixner <tglx@linutronix.de>
Cc:     Daniel Lezcano <daniel.lezcano@linaro.org>,
        LKML <linux-kernel@vger.kernel.org>,
        Long Li <longli@microsoft.com>, Ingo Molnar <mingo@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Keith Busch <keith.busch@intel.com>, Jens Axboe <axboe@fb.com>,
        Christoph Hellwig <hch@lst.de>,
        Sagi Grimberg <sagi@grimberg.me>,
        John Garry <john.garry@huawei.com>,
        Hannes Reinecke <hare@suse.com>,
        linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Message-ID: <20190903092453.GA22399@ming.t460p>
References: <20190828110633.GC15524@ming.t460p>
 <alpine.DEB.2.21.1908281316230.1869@nanos.tec.linutronix.de>
 <20190828135054.GA23861@ming.t460p>
 <alpine.DEB.2.21.1908281605190.23149@nanos.tec.linutronix.de>
 <20190903033001.GB23861@ming.t460p>
 <299fb6b5-d414-2e71-1dd2-9d6e34ee1c79@linaro.org>
 <20190903063125.GA21022@ming.t460p>
 <6b88719c-782a-4a63-db9f-bf62734a7874@linaro.org>
 <20190903072848.GA22170@ming.t460p>
 <alpine.DEB.2.21.1909031000460.1880@nanos.tec.linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.DEB.2.21.1909031000460.1880@nanos.tec.linutronix.de>
User-Agent: Mutt/1.11.3 (2019-02-01)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

On Tue, Sep 03, 2019 at 10:09:57AM +0200, Thomas Gleixner wrote:
> On Tue, 3 Sep 2019, Ming Lei wrote:
> > Scheduler can do nothing if the CPU is taken completely by handling
> > interrupt & softirq, so seems not a scheduler problem, IMO.
> 
> Well, but thinking more about it, the solution you are proposing is more a
> bandaid than anything else.
> 
> If you look at the networking NAPI mechanism. It handles that situation
> gracefully by:
> 
>   - Disabling the interrupt at the device level

I guess you mean we disable the interrupt in the softirq context.

IO performance could be affected by the extra action of disabling/enabling
interrupt every time.

IOPS for the discussed device is several millions.

> 
>   - Polling the device in softirq context until empty and then reenabling
>     interrupts

blk-mq switches to complete req in interrupt context for avoiding extra
performance loss, so switching back to softirq context every time may
cause performance regression.

> 
>   - In case the softirq handles more packets than a defined budget it
>     forces the softirq into the softirqd thread context which also
>     allows rescheduling once the budget is completed.


It can be hard to figure out one perfect defined budget.

In the patchset of V2[1], IRQF_ONESHOT is applied on the irq thread,
and interrupt isn't enabled until the interrupt has been handled in
the irq thread context.

[1] https://github.com/ming1/linux/commits/v5.3-genirq-for-5.4

The approach in this patchset is actually very similar with the above
NAPI based way. The difference is that softirq is avoided, and interrupt
is always handled in interrupt context in case that CPU won't be stalled,
so performance won't be affected. And we only switch to handle interrupt
in thread context if CPU stall is going to happen.

> 
> With your adhoc workaround you handle one specific case. But it does not
> work at all when an overload situation occurs in a case where the queues
> are truly per cpu simply.

There isn't such CPU stall issue in case of single submission vs. single
completion, because submission side and completion side share same single
CPU, and the submission side will slow down if completion side takes all
the CPU. 

> Because then the interrupt and the thread
> affinity are the same and single CPU targets and you replace the interrupt
> with a threaded handler which runs by default with RT priority.

Even though the threaded handler is RT priority and the thread is run
on same CPU with the interrupt, CPU/rcu stall still can be avoided.

Also we can switch to use irq affinity for the irq thread instead of effective
affinity.

> 
> So instead of hacking something half baken into the hard/softirq code, why
> can't block do a budget limitation and once that is reached switch to
> something NAPI like as a general solution?

Another big reason is that multiple submission vs. single completion isn't
common case, I knew that there are only small number of such device,
so re-inventing NAPI based approach may takes lots of effort, meantime
only small number of devices can get the benefit, not sure if block
community would like to consider that.

IMO, it might be the simplest generic way to solve the problem from genirq.


Thanks,
Ming