Subject: Re: [PATCH 3/3] blk-mq: Use llist_head for blk_cpu_done
From: Sagi Grimberg
To: Christoph Hellwig, Sebastian Andrzej Siewior
Cc: Jens Axboe, linux-block@vger.kernel.org, Thomas Gleixner,
    David Runge, linux-rt-users@vger.kernel.org,
    linux-kernel@vger.kernel.org, Peter Zijlstra, Daniel Wagner,
    Mike Galbraith
Date: Wed, 4 Nov 2020 11:15:27 -0800
Message-ID: <75970f9d-7e59-5fba-280a-d0d935fc4d2f@grimberg.me>
In-Reply-To: <20201102181238.GA17806@infradead.org>

>>> There really aren't any rules for this, and it's perfectly legit to
>>> complete from process context. Maybe you're a kthread-driven driver
>>> and that's how you handle completions. The block completion path has
>>> always been hard IRQ safe, but possible to call from anywhere.
>>
>> I'm not trying to put restrictions on, or forbid, completions from a
>> kthread. I'm trying to avoid the pointless softirq dance for no added
>> value. We could:
>
>> to not break that assumption you just mentioned and provide
>> |static inline void blk_mq_complete_request_local(struct request *rq)
>> |{
>> |	rq->q->mq_ops->complete(rq);
>> |}
>>
>> so that completions issued from process context (like those from
>> usb-storage) don't end up waking `ksoftirqd' (running at SCHED_OTHER)
>> to complete the requests, but rather perform the completion right
>> away. The softirq dance makes no sense here.
>
> Agreed. But I don't think your above blk_mq_complete_request_local
> is all that useful either, as ->complete is defined by the caller,
> so we could just do a direct call. Basically we should just
> return false from blk_mq_complete_request_remote after updating
> the state when called from process context.

Agreed.

> But given that IIRC we are not supposed to check what context we are
> called from, we'll need a helper just for updating the state instead,
> and ensure the driver uses the right helper. Now of course we might
> have process context callers that still want to bounce to the
> submitting CPU, but in that case we should go directly to a workqueue
> or similar.

This would mean that it may be suboptimal for nvme-tcp to complete
requests in softirq context from the network context (determined by
NIC steering), because in this case it would trigger a workqueue
schedule on a per-request basis rather than once per .data_ready call
like we do today.

Is that correct?

What has been observed is that completing commands in softirq context
(on the network-determined CPU) works well, because basically the
completion does IPI + local complete, not IPI + softirq or IPI +
workqueue.
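For illustration, a minimal sketch of the quoted suggestion above (a
blk_mq_complete_request_remote() that updates the request state and
returns false when no cross-CPU bounce is needed) -- not the code that
eventually landed; blk_mq_complete_need_ipi() and
blk_mq_complete_send_ipi() are assumed helper names used only to keep
the sketch short:

    static inline bool blk_mq_complete_request_remote(struct request *rq)
    {
    	/* Mark the request completed before deciding where ->complete() runs. */
    	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);

    	/* Completion wants to run on the submitting CPU: bounce via IPI. */
    	if (blk_mq_complete_need_ipi(rq)) {
    		blk_mq_complete_send_ipi(rq);
    		return true;
    	}

    	/*
    	 * No bounce needed: return false so a process-context caller can
    	 * invoke rq->q->mq_ops->complete(rq) directly, with no softirq
    	 * or ksoftirqd wakeup involved.
    	 */
    	return false;
    }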
> Either way doing this properly will probably involve an audit of all
> drivers, but I think that is worth it.

Agree.
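As a rough sketch of the per-driver change such an audit would imply
(the foo_* names below are placeholders, not a real driver), a driver
that may complete from process context would call the remote-completion
helper and finish the request itself when no bounce is required:

    /* Driver's existing per-request completion work (placeholder name). */
    static void foo_complete_rq(struct request *rq)
    {
    	/* ... release driver resources for this request, then ... */
    	blk_mq_end_request(rq, BLK_STS_OK);
    }

    /* Called from the driver's kthread / process-context completion path. */
    static void foo_handle_completion(struct request *rq)
    {
    	/*
    	 * Let blk-mq update the request state and bounce to the
    	 * submitting CPU only when actually needed; otherwise complete
    	 * right here, without waking ksoftirqd.
    	 */
    	if (!blk_mq_complete_request_remote(rq))
    		foo_complete_rq(rq);
    }

Drivers that only complete from hard-IRQ context could keep calling the
existing blk_mq_complete_request() path unchanged.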