Date: Mon, 14 Jan 2019 12:14:14 -0500
From: Steven Rostedt
To: Jens Axboe
Cc: LKML, Linus Torvalds, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
 Ingo Molnar, Clark Williams, Bart Van Assche, Ming Lei
Subject: Real deadlock being suppressed in sbitmap
Message-ID: <20190114121414.450ab4ea@gandalf.local.home>

It was brought to my attention (by this creating a splat in the RT tree
too) that we have this code:

static inline bool sbitmap_deferred_clear(struct sbitmap *sb, int index)
{
	unsigned long mask, val;
	unsigned long __maybe_unused flags;
	bool ret = false;

	/* Silence bogus lockdep warning */
#if defined(CONFIG_LOCKDEP)
	local_irq_save(flags);
#endif
	spin_lock(&sb->map[index].swap_lock);
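As an aside on the snippet above: because the local_irq_save() is only
compiled in when CONFIG_LOCKDEP is defined, lockdep-enabled kernels
always see swap_lock acquired with interrupts disabled, while production
kernels take it with interrupts enabled. For reference, here is a
minimal sketch (hypothetical lock and function names, not kernel
source) of the single-lock version of the problem this kind of report
is about:

#include <linux/spinlock.h>

/* Hypothetical example, not from sbitmap. */
static DEFINE_SPINLOCK(lock_a);

/* Process context: takes lock_a with IRQs still enabled. */
static void process_path(void)
{
	spin_lock(&lock_a);
	/*
	 * If an interrupt fires here on this CPU and its handler runs
	 * irq_path(), the handler spins on lock_a forever: self-deadlock.
	 */
	spin_unlock(&lock_a);
}

/* Interrupt context: the same lock taken from a hardirq handler. */
static void irq_path(void)
{
	spin_lock(&lock_a);
	spin_unlock(&lock_a);
}

Lockdep reports this because it actually saw irq_path() run in
interrupt context, not because anyone happened to have interrupts
disabled around a spin_lock() call.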
Commit 58ab5e32e6f ("sbitmap: silence bogus lockdep IRQ warning")
states the following:

    For this case, it's a false positive. The swap_lock is used from
    process context only, when we swap the bits in the word and cleared
    mask. We also end up doing that when we are getting a driver tag,
    from the blk_mq_mark_tag_wait(), and from there we hold the
    waitqueue lock with IRQs disabled. However, this isn't from an
    actual IRQ, it's still process context.

The thing is, lockdep doesn't define a lock as "irq-safe" based on it
being taken with interrupts disabled or not. It detects when locks are
used in actual interrupts. Further in that commit we have this:

 [  106.097386] fio/1043 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 [  106.098231] 000000004c43fa71 (&(&sb->map[i].swap_lock)->rlock){+.+.}, at: sbitmap_get+0xd5/0x22c
 [  106.099431]
 [  106.099431] and this task is already holding:
 [  106.100229] 000000007eec8b2f (&(&hctx->dispatch_wait_lock)->rlock){....}, at: blk_mq_dispatch_rq_list+0x4c1/0xd7c
 [  106.101630] which would create a new lock dependency:
 [  106.102326]  (&(&hctx->dispatch_wait_lock)->rlock){....} -> (&(&sb->map[i].swap_lock)->rlock){+.+.}

That is saying that you are trying to take the swap_lock while holding
the dispatch_wait_lock.

 [  106.103553] but this new dependency connects a SOFTIRQ-irq-safe lock:
 [  106.104580]  (&sbq->ws[i].wait){..-.}

Which means that there's already a chain of:

 sbq->ws[i].wait -> dispatch_wait_lock

 [  106.104582]
 [  106.104582] ... which became SOFTIRQ-irq-safe at:
 [  106.105751]   _raw_spin_lock_irqsave+0x4b/0x82
 [  106.106284]   __wake_up_common_lock+0x119/0x1b9
 [  106.106825]   sbitmap_queue_wake_up+0x33f/0x383
 [  106.107456]   sbitmap_queue_clear+0x4c/0x9a
 [  106.108046]   __blk_mq_free_request+0x188/0x1d3
 [  106.108581]   blk_mq_free_request+0x23b/0x26b
 [  106.109102]   scsi_end_request+0x345/0x5d7
 [  106.109587]   scsi_io_completion+0x4b5/0x8f0
 [  106.110099]   scsi_finish_command+0x412/0x456
 [  106.110615]   scsi_softirq_done+0x23f/0x29b
 [  106.111115]   blk_done_softirq+0x2a7/0x2e6
 [  106.111608]   __do_softirq+0x360/0x6ad
 [  106.112062]   run_ksoftirqd+0x2f/0x5b
 [  106.112499]   smpboot_thread_fn+0x3a5/0x3db
 [  106.113000]   kthread+0x1d4/0x1e4
 [  106.113457]   ret_from_fork+0x3a/0x50

We see that sbq->ws[i].wait was taken from a softirq context.

 [  106.131226] Chain exists of:
 [  106.131226]   &sbq->ws[i].wait --> &(&hctx->dispatch_wait_lock)->rlock --> &(&sb->map[i].swap_lock)->rlock

This is telling us that we now have a chain of:

 sbq->ws[i].wait -> dispatch_wait_lock -> swap_lock

 [  106.131226]
 [  106.132865]  Possible interrupt unsafe locking scenario:
 [  106.132865]
 [  106.133659]        CPU0                    CPU1
 [  106.134194]        ----                    ----
 [  106.134733]   lock(&(&sb->map[i].swap_lock)->rlock);
 [  106.135318]                                local_irq_disable();
 [  106.136014]                                lock(&sbq->ws[i].wait);
 [  106.136747]                                lock(&(&hctx->dispatch_wait_lock)->rlock);
 [  106.137742]   <Interrupt>
 [  106.138110]     lock(&sbq->ws[i].wait);
 [  106.138625]
 [  106.138625]  *** DEADLOCK ***
 [  106.138625]

I need to make this more than just two levels deep to show the real
problem. Here's the issue:

        CPU0                 CPU1                 CPU2
        ----                 ----                 ----
  lock(swap_lock)
                     local_irq_disable()
                     lock(dispatch_lock);
                                          local_irq_disable()
                                          lock(sbq->ws[i].wait)
                                          lock(dispatch_lock)
                     lock(swap_lock)
  <interrupt>
  lock(sbq->ws[i].wait)

 DEADLOCK!

In other words, it is not bogus, and there is real potential for a
deadlock.
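For illustration only, here is a minimal sketch of the direction a fix
could take (my sketch, not a tested patch; the elided body is the
existing bit-swapping code): take swap_lock with interrupts disabled on
all kernels, not just lockdep-enabled ones.

static inline bool sbitmap_deferred_clear(struct sbitmap *sb, int index)
{
	unsigned long flags;
	bool ret = false;

	/*
	 * Disable IRQs unconditionally, not only under CONFIG_LOCKDEP,
	 * so an interrupt can never take sbq->ws[i].wait on this CPU
	 * while swap_lock is held -- which is what completes the cycle
	 * shown above.
	 */
	spin_lock_irqsave(&sb->map[index].swap_lock, flags);

	/* ... swap the bits in the word and cleared mask, as before ... */

	spin_unlock_irqrestore(&sb->map[index].swap_lock, flags);
	return ret;
}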
Please talk with the lockdep maintainers before saying a reported
deadlock is bogus, because lockdep is seldom wrong.

-- Steve