Subject: Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme
From: "jianchao.wang"
To: Peter Zijlstra, Bart Van Assche
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, kernel-team@fb.com, oleg@redhat.com, hch@lst.de, axboe@kernel.dk, osandov@fb.com, tj@kernel.org
Date: Fri, 15 Dec 2017 10:12:50 +0800
Message-ID: <007e5a56-83fb-23b0-64d9-4725f15c596d@oracle.com>
In-Reply-To: <20171214215404.GK3326@worktop>
References: <20171212190134.535941-1-tj@kernel.org> <20171212190134.535941-3-tj@kernel.org> <1513277469.2475.43.camel@wdc.com> <20171214202042.GG3326@worktop> <1513287766.2475.73.camel@wdc.com> <20171214215404.GK3326@worktop>

On 12/15/2017 05:54 AM, Peter Zijlstra wrote:
> On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote:
>> On Thu, 2017-12-14 at 21:20 +0100, Peter Zijlstra wrote:
>>> On Thu, Dec 14, 2017 at 06:51:11PM +0000, Bart Van Assche wrote:
>>>> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
>>>>> +        write_seqcount_begin(&rq->gstate_seq);
>>>>> +        blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
>>>>> +        blk_add_timer(rq);
>>>>> +        write_seqcount_end(&rq->gstate_seq);
>>>>
>>>> My understanding is that both write_seqcount_begin() and write_seqcount_end()
>>>> trigger a write memory barrier. Is a seqcount really faster than a spinlock?
>>>
>>> Yes, lots: no atomic operations and no waiting.
>>>
>>> The only constraint for write_seqlock is that there must not be any
>>> concurrency.
>>>
>>> But now that I look at this again, TJ, why can't the below happen?
>>>
>>>        write_seqlock_begin();
>>>        blk_mq_rq_update_state(rq, IN_FLIGHT);
>>>        blk_add_timer(rq);
>>>
>>>                read_seqcount_begin()
>>>                while (seq & 1)
>>>                        cpu_relax();
>>>                // live-lock
>>>
>>>        write_seqlock_end();
>>
>> Hello Peter,
>>
>> Some time ago the block layer was changed to handle timeouts in thread context
>> instead of interrupt context. See also commit 287922eb0b18 ("block: defer
>> timeouts to a workqueue").
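
For reference, the construct being debated above is a plain seqcount. A rough
userspace sketch of the same counting protocol, using C11 atomics, is shown
below; the seq_t type, the helper names and the fences are invented stand-ins
that only approximate seqcount_t and smp_wmb()/smp_rmb(), not the kernel
implementation. The point to note is that the read side has to wait out an odd
(write-in-progress) count before it can take a snapshot, which is exactly the
window the scenario below is about.

/*
 * Rough userspace sketch of the seqcount protocol; seq_t and the
 * helpers are invented stand-ins, not the kernel implementation.
 */
#include <stdatomic.h>

typedef struct { atomic_uint sequence; } seq_t;

static void seq_write_begin(seq_t *s)
{
        /* Count becomes odd: a write is now in flight. */
        atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);      /* ~ smp_wmb() */
}

static void seq_write_end(seq_t *s)
{
        atomic_thread_fence(memory_order_release);      /* ~ smp_wmb() */
        /* Count becomes even again: the protected data is stable. */
        atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
}

static unsigned seq_read_begin(seq_t *s)
{
        unsigned start;

        /* Wait until no write is in flight (count is even). */
        while ((start = atomic_load_explicit(&s->sequence,
                                             memory_order_relaxed)) & 1)
                ;       /* cpu_relax() in the kernel */
        atomic_thread_fence(memory_order_acquire);      /* ~ smp_rmb() */
        return start;
}

static int seq_read_retry(seq_t *s, unsigned start)
{
        atomic_thread_fence(memory_order_acquire);      /* ~ smp_rmb() */
        /* Retry if the count moved at all since seq_read_begin(). */
        return atomic_load_explicit(&s->sequence,
                                    memory_order_relaxed) != start;
}
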
>
> That only makes it a little better:
>
>   Task-A                                Worker
>
>   write_seqcount_begin()
>   blk_mq_rq_update_state(rq, IN_FLIGHT)
>   blk_add_timer(rq)
>
>     schedule_work()
>
>                                        read_seqcount_begin()
>                                        while (seq & 1)
>                                                cpu_relax();
>

Hi Peter,

The current seqcount read side is as below:

        do {
                start = read_seqcount_begin(&rq->gstate_seq);
                gstate = READ_ONCE(rq->gstate);
                deadline = rq->deadline;
        } while (read_seqcount_retry(&rq->gstate_seq, start));

read_seqcount_retry() doesn't check bit 0; it checks whether the value saved by
read_seqcount_begin() is still equal to the current value of the seqcount.
Please refer to:

        static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
        {
                return unlikely(s->sequence != start);
        }

Thanks
Jianchao

>
> Now normally this isn't fatal, because the Worker will simply spin its entire
> time slice away and we'll eventually schedule our Task-A back in, which
> will complete the seqcount and things will work.
>
> But if, for some reason, our Worker were to have RT priority higher than
> our Task-A, we'd be up some creek without a paddle.
>
> We don't happen to have preemption or IRQs off here? That would fix
> things nicely.
>
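
To make the retry rule above concrete, here is a tiny standalone demo of the
same counting scheme in plain C; the bare counter and the demo_* helpers are
made-up stand-ins, not the kernel API. The reader retries because the counter
changed since its snapshot, even though bit 0 is clear again:

/* Standalone demo: the reader retries when the counter has changed since
 * the snapshot, whether or not bit 0 is currently set. The counter and
 * the demo_* helpers are stand-ins, not the kernel API. */
#include <assert.h>
#include <stdio.h>

static unsigned seq;    /* even: idle, odd: write in flight */

static unsigned demo_read_begin(void)           { return seq; }
static int demo_read_retry(unsigned start)      { return seq != start; }
static void demo_write_begin(void)              { seq++; }      /* -> odd  */
static void demo_write_end(void)                { seq++; }      /* -> even */

int main(void)
{
        unsigned start = demo_read_begin();     /* snapshot: 0 (even) */

        /* A complete write slips in between begin and retry. */
        demo_write_begin();
        demo_write_end();                       /* seq is now 2, even again */

        assert((seq & 1) == 0);                 /* bit 0 is clear ...         */
        assert(demo_read_retry(start));         /* ... yet the reader retries */
        printf("retry: snapshot %u vs current %u\n", start, seq);
        return 0;
}

Built with assertions enabled, this prints "retry: snapshot 0 vs current 2".
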