Date: Thu, 25 Jan 2018 04:16:05 +0000
From: Al Viro
To: Jia-Ju Bai
Cc: axboe@kernel.dk, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] block: blk-mq-sched: Replace GFP_ATOMIC with GFP_KERNEL in blk_mq_sched_assign_ioc
Message-ID: <20180125041605.GV13338@ZenIV.linux.org.uk>
References: <1516848386-5720-1-git-send-email-baijiaju1990@gmail.com> <20180125025811.GT13338@ZenIV.linux.org.uk> <70efc238-9517-7cfa-03ce-ba9c3ba0ebd4@gmail.com>
In-Reply-To: <70efc238-9517-7cfa-03ce-ba9c3ba0ebd4@gmail.com>

On Thu, Jan 25, 2018 at 11:13:56AM +0800, Jia-Ju Bai wrote:
> I have checked the given call chain, and find that nvme_dev_disable in
> nvme_timeout calls mutex_lock that can sleep.
> Thus, I suppose this call chain is not in atomic context.

... or it is broken.

> Besides, how do you find that "function (nvme_timeout()) strongly suggests
> that it *is* meant to be called from bloody atomic context"?
> I check the comments in nvme_timeout, and do not find related description...

Anything that reads registers for controller state presumably won't be happy
if it can happen in parallel with other threads poking the same hardware.
Not 100% guaranteed, but it's a fairly strong sign that there's some kind of
exclusion between whatever's submitting requests / handling interrupts and
the caller of that thing.  And such exclusion is likely to be
spin_lock_irqsave()-based.  Again, that does not _prove_ it's called from
atomic contexts, but it does suggest such a possibility.

Looking through the callers of that method, blk_abort_request() certainly
*is* called from under the queue lock.  Different drivers, though.  No idea
whether the blocking case in nvme_timeout() is broken - I'm not familiar
with that code.  That question should go to the nvme maintainers...

However, digging through other call chains, there's this:

drivers/md/dm-mpath.c:530:
	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);

in multipath_clone_and_map(), aka ->clone_and_map_rq(), called at

drivers/md/dm-rq.c:480:
	r = ti->type->clone_and_map_rq(ti, rq, &tio->info, &clone);

in map_request(), which is called from dm_mq_queue_rq(), aka ->queue_rq(),
which is called from blk_mq_dispatch_rq_list(), called from
blk_mq_do_dispatch_sched(), called from blk_mq_sched_dispatch_requests(),
called under rcu_read_lock().  Not a context where you want GFP_KERNEL
allocations...
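To make the shape of that concrete, here is a minimal sketch - *not* the
real blk-mq/dm code; all names below are invented for illustration:

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	/* Stand-in for the deep end of the call chain above. */
	static void *clone_one_request(void)
	{
		/*
		 * GFP_KERNEL may sleep (direct reclaim, waiting for
		 * writeback); sleeping inside an RCU read-side critical
		 * section is a bug, and one CONFIG_DEBUG_ATOMIC_SLEEP
		 * will catch.  GFP_ATOMIC never sleeps, which is why
		 * the existing code uses it here.
		 */
		return kmalloc(64, GFP_ATOMIC);
	}

	/* Stand-in for the dispatch loop: the whole thing runs inside
	 * rcu_read_lock(), so nothing it calls may block. */
	static void dispatch_pending(void)
	{
		void *clone;

		rcu_read_lock();	/* non-blocking section begins */
		clone = clone_one_request();
		rcu_read_unlock();	/* non-blocking section ends */
		kfree(clone);
	}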
> By the way, do you mean that I should add "My tool has proved that this
> function is never called in atomic context" in the description?

I mean that the proof itself should at least be outlined.  Crediting the
tool for finding it is fine *IF* that's done along with the proof itself.
You want to convince the people applying the patch that it is correct.
Leaving out something trivial to verify is fine - "foo_bar_baz() has no
callers" doesn't require grep output plus quotes of the area around each
instance to prove that all of them are in comments, etc.; readers can
bloody well check the accuracy of that claim themselves.  This kind of
analysis, however, is decidedly *NOT* trivial to verify from scratch.

Moreover, if what you've proven is that for each call chain leading to that
place there is a blocking operation nearby, there is still a possibility
that some of those chains *are* entered while in non-blocking context.  In
that case you've found real bugs, and strictly speaking your change doesn't
break correct code.  However, that does not make the change itself correct -
if you have something like

	enter non-blocking section
	.....
	in very unusual cases grab a mutex (or panic, or...)
	.....
	do GFP_ATOMIC allocation
	.....
	leave non-blocking section

changing that to GFP_KERNEL will turn "we deadlock in a very hard to hit
case" into "we deadlock easily"...

At the very least, I'd like to see those cutoffs - i.e. the places on those
call chains that could already block.  You might very well have found
actual bugs there.
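PS: spelled out in C, the sketch above looks something like this - again,
all names invented, not taken from any real driver:

	#include <linux/mutex.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct my_dev {
		spinlock_t lock;
		struct mutex reset_mutex;
		bool hw_wedged;
		void *scratch;
	};

	static void handle_request(struct my_dev *dev)
	{
		spin_lock_irq(&dev->lock);	/* enter non-blocking section */

		/* Pre-existing bug: mutex_lock() may sleep under the
		 * spinlock, but only when the hardware misbehaves, so
		 * it is very rarely hit. */
		if (unlikely(dev->hw_wedged)) {
			mutex_lock(&dev->reset_mutex);
			/* ... reset the hardware ... */
			mutex_unlock(&dev->reset_mutex);
		}

		/*
		 * Must stay GFP_ATOMIC: a tool that sees the mutex_lock()
		 * above and concludes "blocking is allowed on this chain"
		 * gets it backwards - with GFP_KERNEL, *every* allocation
		 * that hits reclaim now sleeps with the lock held and
		 * interrupts off, not just the rare broken path.
		 */
		dev->scratch = kmalloc(64, GFP_ATOMIC);

		spin_unlock_irq(&dev->lock);	/* leave non-blocking section */
	}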