Date: Thu, 03 Sep 2009 20:34:58 +0400
From: Vladislav Bolkhovitin
To: Jens Axboe
Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
    scst-devel@lists.sourceforge.net, Tejun Heo, Boaz Harrosh,
    James Bottomley, FUJITA Tomonori, Joe Eykholt
Subject: Re: [PATCH]: Implementation of blk_rq_map_kern_sg() (aka New implementation of scsi_execute_async() v3)
Message-ID: <4A9FF032.2020000@vlnb.net>
In-Reply-To: <20090815082220.GJ12579@kernel.dk>
References: <4A563368.5040407@vlnb.net> <4A830016.5020304@vlnb.net> <20090815082220.GJ12579@kernel.dk>

Jens Axboe, on 08/15/2009 12:22 PM wrote:
> On Wed, Aug 12 2009, Vladislav Bolkhovitin wrote:
>> This patch implements the function blk_rq_map_kern_sg(), which allows
>> mapping a kernel-originated SG vector to a block request. It is needed
>> to execute SCSI commands with an SG buffer that originates in the
>> kernel. At the moment SCST is the only user of this functionality. It
>> needs it because its target drivers, which are, basically, SCSI
>> drivers, can deal only with SGs, not with BIOs. But, according to the
>> latest discussions, there can be other potential users of this
>> functionality, so I'm sending this patch in the hope that it will also
>> be useful for them and eventually be merged into the mainline kernel.
>>
>> In the previous submissions this patch was called "New implementation
>> of scsi_execute_async()", but since in this version scsi_execute_async()
>> was removed from it at the request of Boaz Harrosh, the name was
>> changed accordingly.
>
> Generally this patch looks great, I just have one little thing I'd like
> to point out:
>
>> +	while (hbio != NULL) {
>> +		bio = hbio;
>> +		hbio = hbio->bi_next;
>> +		bio->bi_next = NULL;
>> +
>> +		blk_queue_bounce(q, &bio);
>> +
>> +		res = blk_rq_append_bio(q, rq, bio);
>> +		if (unlikely(res != 0)) {
>> +			bio->bi_next = hbio;
>> +			hbio = bio;
>> +			/* We can have one or more bios bounced */
>> +			goto out_unmap_bios;
>> +		}
>> +	}
>
> Constructs like this are always dangerous, because of how mempools work.
> __blk_queue_bounce() will internally do:
>
>	bio = bio_alloc(GFP_NOIO, cnt);
>
> so you could potentially enter a deadlock if a) you are the only one
> allocating a bio currently, and b) the alloc fails and we wait for a bio
> to be returned to the pool. This is highly unlikely and requires other
> conditions to be dire, but it is a problem. This is not restricted to
> the swap out path, the problem is purely lack of progress. So the golden
> rule is always that you either allocate these units from a private pool
> (which is hard for bouncing, since it does both page and bio allocations
> from a mempool), or that you always ensure that a previously allocated
> bio is in flight before attempting a new alloc.

Sorry for the late reply, I was on vacation.

I see your concerns. Since in __blk_rq_map_kern_sg() all the bios are
first allocated and only then submitted for I/O, bio_alloc() in
__blk_queue_bounce() can potentially deadlock if it is called with
GFP_NOIO (i.e. with __GFP_WAIT) and its mempool gets empty. The fact
that __blk_rq_map_kern_sg() originally allocates the bios using
bio_kmalloc() doesn't fundamentally change that, it only lowers the
failure probability. (Just to make sure I understand everything
correctly.)
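In condensed form, the ordering I'm worried about looks roughly like
this. It is only a sketch of my reading of the code, not the actual
patch: the function name is made up, I use one bio per SG element for
brevity, and the error unwinding is simplified.

	/*
	 * Sketch only: allocate-everything-first, bounce-and-append-later,
	 * which is the pattern the mempool warning is about.
	 */
	static int sketch_map_kern_sg(struct request_queue *q, struct request *rq,
				      struct scatterlist *sgl, int nents, gfp_t gfp)
	{
		struct bio *bio, *hbio = NULL, *tbio = NULL;
		struct scatterlist *sg;
		int i, res;

		/*
		 * Phase 1: allocate ALL bios up front; nothing is submitted
		 * yet. bio_kmalloc() is plain kmalloc-backed, so no mempool
		 * is involved here (bio_add_pc_page() return value ignored
		 * for brevity).
		 */
		for_each_sg(sgl, sg, nents, i) {
			bio = bio_kmalloc(gfp, 1);
			if (bio == NULL) {
				res = -ENOMEM;
				goto out_free;
			}
			bio_add_pc_page(q, bio, sg_page(sg), sg->length, sg->offset);
			if (hbio == NULL) {
				hbio = tbio = bio;
			} else {
				tbio->bi_next = bio;
				tbio = bio;
			}
		}

		/*
		 * Phase 2: bounce and append. blk_queue_bounce() may
		 * internally call bio_alloc(GFP_NOIO, cnt) from the bounce
		 * mempool. If that pool is empty it sleeps until some bio
		 * completes, but none of the bios allocated in phase 1 is
		 * in flight yet, so there may be nothing to wait for, i.e.
		 * exactly the lack of progress you describe.
		 */
		while (hbio != NULL) {
			bio = hbio;
			hbio = hbio->bi_next;
			bio->bi_next = NULL;

			blk_queue_bounce(q, &bio);

			res = blk_rq_append_bio(q, rq, bio);
			if (unlikely(res != 0)) {
				bio->bi_next = hbio;
				hbio = bio;
				goto out_free;
			}
		}
		return 0;

	out_free:
		/* Bios already appended to rq are left to the caller's
		 * unmap path, as in the patch. */
		while (hbio != NULL) {
			bio = hbio;
			hbio = hbio->bi_next;
			bio_put(bio);
		}
		return res;
	}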
Potentially this can be a problem, since SCST nearly always uses
GFP_KERNEL as the mask, i.e. has __GFP_WAIT set, although, I agree, the
deadlock is very unlikely.

To address it and other similar cases, which, I guess, should exist, I
see the following 2 ways:

1. Increase BIO_POOL_SIZE from the current 2 to a value large enough to
satisfy such full allocations for the biggest requests. Ideally, for the
worst case it should cover something like 2MB of outstanding data per
CPU, which is 2MB / (BIO_MAX_PAGES * PAGE_SIZE) * NR_CPUS = 2 * NR_CPUS
with 4K pages. But in practice, possibly something like 10-20 would be
sufficient?

2. Modify blk_queue_bounce() so that it can fail when the bounce buffer
allocation fails, and handle that failure gracefully in
__blk_rq_map_kern_sg() and all other similar places. A sketch of this
option is attached below as a P.S.

Which way would you prefer? Or do you think the probability of such a
deadlock is so low that it isn't worth the effort to do anything about
it?

Thanks a lot for the review!

Vlad
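P.S. Just to illustrate what I mean by option 2. This is purely a
sketch: blk_queue_bounce_nowait() doesn't exist, it only stands for "a
bounce variant that uses non-waiting allocations and reports failure
instead of sleeping on the bounce mempool", and the loop below shows how
the corresponding hunk in __blk_rq_map_kern_sg() could then handle it.

	/* Hypothetical interface: bounce using non-waiting allocations and
	 * return -ENOMEM on failure instead of sleeping on the mempool. */
	int blk_queue_bounce_nowait(struct request_queue *q, struct bio **bio);

	/* ...and the corresponding loop in __blk_rq_map_kern_sg() would
	 * then become: */

	while (hbio != NULL) {
		bio = hbio;
		hbio = hbio->bi_next;
		bio->bi_next = NULL;

		res = blk_queue_bounce_nowait(q, &bio);
		if (unlikely(res != 0)) {
			/* Nothing was bounced for this bio; put it back
			 * and unwind, so the caller can retry or fail the
			 * command like for any other -ENOMEM. */
			bio->bi_next = hbio;
			hbio = bio;
			goto out_unmap_bios;
		}

		res = blk_rq_append_bio(q, rq, bio);
		if (unlikely(res != 0)) {
			bio->bi_next = hbio;
			hbio = bio;
			/* We can have one or more bios bounced */
			goto out_unmap_bios;
		}
	}

That way a bounce allocation failure would be handled the same way as
any other allocation failure in that path.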