Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp4319919imu; Mon, 14 Jan 2019 20:25:34 -0800 (PST) X-Google-Smtp-Source: ALg8bN6rJzay9CHFQXgLGf6Sm7bD68aLoBMPtwq9fK5qgdsPmgDufNRafXbcwwcXr5Rbnr5s+/IG X-Received: by 2002:a62:cec6:: with SMTP id y189mr2091568pfg.100.1547526334522; Mon, 14 Jan 2019 20:25:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547526334; cv=none; d=google.com; s=arc-20160816; b=qAOtQfdIWm6x27B8H7kK42DoWds2hzT1/Wx/OCs/3GrmWLGHYW/794+yEn9ykxJqTT JzaoV28pn97rFW40nV/dcXH9Rb7bKa8ZLTZ4QSuHMLhCjmEq9+9TXMsfM/kEKLm6WFRg 6r49kDpGCUtMef+ql4pfXQzWE+ZOP98qpBj6P5OjwOuBrUveBQg4y1tXxLsJUjG6qddB +bGolziHzAt6FcSaZVbf0UKgPuJVs0vxCaW/cuA8AlS1sWI4/ZJxhrGJwToyCU06Q62g p7I/gIzCNf6BFgnyqCjv00EVKS+8h8oK5VL5X/7wpp01vzddkJM/HxO3PNJgwSaTEgKV nYWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=mWsKvsN1KpcCFhSkdRH/HkWi9klXka1o3aY7/5LJy4o=; b=EljvSoBSTTzXnrz82xSm7jeV57IW2u6WfxSq5wd2rB3IzCmQjtOMIfOi2neEX5Pkz6 EIFMZea3yq8mPZYNh6PUhzJvOlqcml5hLYuwJHtVVxixl3IX2+dMvsQZXGV3JOgFkokd kc5hWokU/GMJHwabEeLiTqsvXPAuN9jbKy1hZDFj07JcjjpCQ1AZ1KsZatTeVexmg3u9 RIBWxEulao3k1ROEAjojw/8xwV/d8ZWE0WLBW1MO0rKaOhojxbXB1x8DWjVxqKXQyhKl 0uF/SUyJeuUBS8zi2vcd6fsw5+LM5x9JAg6Vx1T3kxLbHZvFJd5i6tS37rWeAFY5bYIb 3FdQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 63si312459pfv.38.2019.01.14.20.25.18; Mon, 14 Jan 2019 20:25:34 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728054AbfAODq0 (ORCPT + 99 others); Mon, 14 Jan 2019 22:46:26 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33734 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726769AbfAODq0 (ORCPT ); Mon, 14 Jan 2019 22:46:26 -0500 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E3C9A2D7E4; Tue, 15 Jan 2019 03:46:25 +0000 (UTC) Received: from ming.t460p (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 9E07619C7E; Tue, 15 Jan 2019 03:46:18 +0000 (UTC) Date: Tue, 15 Jan 2019 11:46:14 +0800 From: Ming Lei To: Jens Axboe Cc: Steven Rostedt , LKML , Linus Torvalds , Andrew Morton , Peter Zijlstra , Thomas Gleixner , Ingo Molnar , Clark Williams , Bart Van Assche Subject: Re: Real deadlock being suppressed in sbitmap Message-ID: <20190115034612.GF10121@ming.t460p> References: <20190114121414.450ab4ea@gandalf.local.home> <20190115032355.GE10121@ming.t460p> <31b9c76b-6f70-0af4-d854-e02bda25e4c0@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <31b9c76b-6f70-0af4-d854-e02bda25e4c0@kernel.dk> User-Agent: Mutt/1.9.1 (2017-09-22) X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 15 Jan 2019 03:46:26 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 14, 2019 at 08:41:16PM -0700, Jens Axboe wrote: > On 1/14/19 8:23 PM, Ming Lei wrote: > > Hi Steven, > > > > On Mon, Jan 14, 2019 at 12:14:14PM -0500, Steven Rostedt wrote: > >> It was brought to my attention (by this creating a splat in the RT tree > >> too) this code: > >> > >> static inline bool sbitmap_deferred_clear(struct sbitmap *sb, int index) > >> { > >> unsigned long mask, val; > >> unsigned long __maybe_unused flags; > >> bool ret = false; > >> > >> /* Silence bogus lockdep warning */ > >> #if defined(CONFIG_LOCKDEP) > >> local_irq_save(flags); > >> #endif > >> spin_lock(&sb->map[index].swap_lock); > >> > >> Commit 58ab5e32e6f ("sbitmap: silence bogus lockdep IRQ warning") > >> states the following: > >> > >> For this case, it's a false positive. The swap_lock is used from process > >> context only, when we swap the bits in the word and cleared mask. We > >> also end up doing that when we are getting a driver tag, from the > >> blk_mq_mark_tag_wait(), and from there we hold the waitqueue lock with > >> IRQs disabled. However, this isn't from an actual IRQ, it's still > >> process context. > >> > >> The thing is, lockdep doesn't define a lock as "irq-safe" based on it > >> being taken under interrupts disabled or not. It detects when locks are > >> used in actual interrupts. Further in that commit we have this: > >> > >> [ 106.097386] fio/1043 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: > >> [ 106.098231] 000000004c43fa71 > >> (&(&sb->map[i].swap_lock)->rlock){+.+.}, at: sbitmap_get+0xd5/0x22c > >> [ 106.099431] > >> [ 106.099431] and this task is already holding: > >> [ 106.100229] 000000007eec8b2f > >> (&(&hctx->dispatch_wait_lock)->rlock){....}, at: > >> blk_mq_dispatch_rq_list+0x4c1/0xd7c > >> [ 106.101630] which would create a new lock dependency: > >> [ 106.102326] (&(&hctx->dispatch_wait_lock)->rlock){....} -> > >> (&(&sb->map[i].swap_lock)->rlock){+.+.} > >> > >> Saying that you are trying to take the swap_lock while holding the > >> dispatch_wait_lock. > >> > >> > >> [ 106.103553] but this new dependency connects a SOFTIRQ-irq-safe lock: > >> [ 106.104580] (&sbq->ws[i].wait){..-.} > >> > >> Which means that there's already a chain of: > >> > >> sbq->ws[i].wait -> dispatch_wait_lock > >> > >> [ 106.104582] > >> [ 106.104582] ... which became SOFTIRQ-irq-safe at: > >> [ 106.105751] _raw_spin_lock_irqsave+0x4b/0x82 > >> [ 106.106284] __wake_up_common_lock+0x119/0x1b9 > >> [ 106.106825] sbitmap_queue_wake_up+0x33f/0x383 > >> [ 106.107456] sbitmap_queue_clear+0x4c/0x9a > >> [ 106.108046] __blk_mq_free_request+0x188/0x1d3 > >> [ 106.108581] blk_mq_free_request+0x23b/0x26b > >> [ 106.109102] scsi_end_request+0x345/0x5d7 > >> [ 106.109587] scsi_io_completion+0x4b5/0x8f0 > >> [ 106.110099] scsi_finish_command+0x412/0x456 > >> [ 106.110615] scsi_softirq_done+0x23f/0x29b > >> [ 106.111115] blk_done_softirq+0x2a7/0x2e6 > >> [ 106.111608] __do_softirq+0x360/0x6ad > >> [ 106.112062] run_ksoftirqd+0x2f/0x5b > >> [ 106.112499] smpboot_thread_fn+0x3a5/0x3db > >> [ 106.113000] kthread+0x1d4/0x1e4 > >> [ 106.113457] ret_from_fork+0x3a/0x50 > >> > >> > >> We see that sbq->ws[i].wait was taken from a softirq context. > > > > Actually sbq->ws[i].wait is taken from a softirq context only in case > > of single-queue, see __blk_mq_complete_request(). For multiple queue, > > sbq->ws[i].wait is taken from hardirq context. > > That's a good point, but that's just current implementation, we can't > assume any of those relationsships. Any completion can happen from > softirq or hardirq. So the patch is inadequate. > > > Sounds the correct fix may be the following one, and the irqsave cost > > should be fine given sbitmap_deferred_clear is only triggered when one > > word is run out of. > > Yes, the _bh() variant isn't going to cut it. Can you send this patch > against Linus's master? OK, will post it out soon. Thanks, Ming