Received: by 10.223.164.202 with SMTP id h10csp5918317wrb; Tue, 21 Nov 2017 18:32:52 -0800 (PST) X-Google-Smtp-Source: AGs4zMZdnb0gAiELVfJ6JJifKdDz9f3tGGEg82HRNVV58/PnCcbriOqQEwe7lEyRQk68YL+/yFrQ X-Received: by 10.84.131.36 with SMTP id 33mr19643342pld.348.1511317972583; Tue, 21 Nov 2017 18:32:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1511317972; cv=none; d=google.com; s=arc-20160816; b=K5Gpv8YxtemgBHbTrpd+IPa38UR044ZakdikQdNJzmGlzgVCWYPthEUVe405itipgi lHAqwDGCRGgpN64NArFeG0UHBk8Gzs9M7zm+7zdXJSt0rXIN19wPI9JJC4ujNj6N/5hf uzn7TjMsCUJm00vhweTT35AGfjPq6jidlnAijSAbslggVcf5ndocbvTI2LgH0pPpvtA4 nPn73f3B33ZFGt+8bCI7oUUKwvB7xvuQEXFn5X9nBNmc/ttdBwC/HqPs6rX8ZUv8eviN 1dxbL3ZXyclOeqYFHYSCcNYc2qsEKqS6qufA7ixwi9gZefWGSXm6g41E/YpBfcgy/0HO 428A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=8fQMPwzMRoT0SviwLMrGTV6C32ZZ6g/9s/stl3gIx3k=; b=wqb2wLnRsA1kiuWqE7cPSfJ2dgO6Mz+iuK/TO6G63ZReTj3wFeuYrz/IRApucvkedy W7AYEdtm5DQ8Ih5InIQz6TnIRIfqTZK24GSQ6RxvVYwUz4YjnCfG2cPs6/fSjeWwQ+zI fcu9BbjWDrJdksQUJ52auAufzfbBOvlAcf46Cf20FpPW/7Upt/71iDqhgf/ZcITFxX1O 37Wg3XwFWMkunvQJ3DUD1N/b80LqBFWhnTxkHHshyd46pPnSkNDzzG9IcZgOAlgfrBss aIBm8AYTF1GXgoQXbAOZ5ORYv/Sf46Yyc9QxGquw7yWRIQVpS7ctHqZ0hyFXXS4ytI5B vBhw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z8si12124931pgr.541.2017.11.21.18.32.41; Tue, 21 Nov 2017 18:32:52 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752445AbdKVCcK (ORCPT + 76 others); Tue, 21 Nov 2017 21:32:10 -0500 Received: from mx1.redhat.com ([209.132.183.28]:53508 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752425AbdKVCcI (ORCPT ); Tue, 21 Nov 2017 21:32:08 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6F13B83F3D; Wed, 22 Nov 2017 02:32:08 +0000 (UTC) Received: from localhost (unknown [10.18.25.149]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B847860C8B; Wed, 22 Nov 2017 02:32:04 +0000 (UTC) Date: Tue, 21 Nov 2017 21:32:03 -0500 From: Mike Snitzer To: Mikulas Patocka Cc: NeilBrown , Jens Axboe , "linux-kernel@vger.kernel.org" , linux-block@vger.kernel.org, device-mapper development , Zdenek Kabelac Subject: Re: new patchset to eliminate DM's use of BIOSET_NEED_RESCUER Message-ID: <20171122023203.GA20417@redhat.com> References: <87poe13rmm.fsf@notabene.neil.brown.name> <87a7zg31vx.fsf@notabene.neil.brown.name> <20171121013533.GA14520@redhat.com> <20171121121049.GA17014@redhat.com> <20171121124311.GA17243@redhat.com> <20171121194709.GA18903@redhat.com> <20171121225119.GA19630@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Wed, 22 Nov 2017 02:32:08 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 21 2017 at 8:21pm -0500, Mikulas Patocka wrote: > > > On Tue, 21 Nov 2017, Mike Snitzer wrote: > > > On Tue, Nov 21 2017 at 4:23pm -0500, > > Mikulas Patocka wrote: > > > > > This is not correct: > > > > > > 2206 static void dm_wq_work(struct work_struct *work) > > > 2207 { > > > 2208 struct mapped_device *md = container_of(work, struct mapped_device, work); > > > 2209 struct bio *bio; > > > 2210 int srcu_idx; > > > 2211 struct dm_table *map; > > > 2212 > > > 2213 if (!bio_list_empty(&md->rescued)) { > > > 2214 struct bio_list list; > > > 2215 spin_lock_irq(&md->deferred_lock); > > > 2216 list = md->rescued; > > > 2217 bio_list_init(&md->rescued); > > > 2218 spin_unlock_irq(&md->deferred_lock); > > > 2219 while ((bio = bio_list_pop(&list))) > > > 2220 generic_make_request(bio); > > > 2221 } > > > 2222 > > > 2223 map = dm_get_live_table(md, &srcu_idx); > > > 2224 > > > 2225 while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) { > > > 2226 spin_lock_irq(&md->deferred_lock); > > > 2227 bio = bio_list_pop(&md->deferred); > > > 2228 spin_unlock_irq(&md->deferred_lock); > > > 2229 > > > 2230 if (!bio) > > > 2231 break; > > > 2232 > > > 2233 if (dm_request_based(md)) > > > 2234 generic_make_request(bio); > > > 2235 else > > > 2236 __split_and_process_bio(md, map, bio); > > > 2237 } > > > 2238 > > > 2239 dm_put_live_table(md, srcu_idx); > > > 2240 } > > > > > > You can see that if we are in dm_wq_work in __split_and_process_bio, we > > > will not process md->rescued list. > > > > Can you elaborate further? We cannot be "in dm_wq_work in > > __split_and_process_bio" simultaneously. Do you mean as a side-effect > > of scheduling away from __split_and_process_bio? > > > > The more detail you can share the better. > > Suppose this scenario: > > * dm_wq_work calls __split_and_process_bio Right, I later realized this was the call chain you were referring to. Not sure how I missed it the first time around. > * __split_and_process_bio eventually reaches the function snapshot_map > * snapshot_map attempts to take the snapshot lock > > * the snapshot lock could be released only if some bios submitted by the > snapshot driver to the underlying device complete > * the bios submitted to the underlying device were already offloaded by > some other task and they are waiting on the list md->rescued > * the bios waiting on md->rescued are not processed, because dm_wq_work is > blocked in snapshot_map (called from __split_and_process_bio) SO you're saying the case that Neil doesn't think should happen: https://lkml.org/lkml/2017/11/21/658 ...can happen. > > > The processing of md->rescued is also wrong - bios for different devices > > > must be offloaded to different helper threads, so that processing a bio > > > for a lower device doesn't depend on processing a bio for a higher device. > > > If you offload all the bios on current->bio_list to the same thread, the > > > bios still depend on each other and the deadlock will still happen. > > > > Commit 325738403 ("dm: revise 'rescue' strategy for bio-based bioset > > allocations") speaks to this with: > > > > "Note that only current->bio_list[0] is offloaded. current->bio_list[1] > > contains bios that were scheduled *before* the current one started, so > > they must have been submitted from higher up the stack, and we cannot be > > waiting for them here (thanks to the "dm: ensure bio submission follows > > a depth-first tree walk" commit). Also, we now rescue *all* bios on the > > list as there is nothing to be gained by being more selective." > > I think you are right - if we only offload current->bio_list[0], then > mixing of dependent bios on the offloaded list won't happen. > > > And again: this patchset passes your dm-snapshot deadlock test. Is > > that test somehow lacking? > > With your patchset, the deadlock would happen only if bios are queued on > &md->deferred - and that happens only in case of resume or if we are > processing REQ_PREFLUSH with non-zero data size. > > So, the simple test that I wrote doesn't trigger it, but a more complex > test involving REQ_PREFLUSH could. Makes sense. But I need to think further about _why_ bios submitted to the snapshot driver's underlying device would end up on md->rescued (like you suggested above). Again, Neil thinks it not possible. Neil said: "they will not be recursive calls, so nothing will be added to current->bio_list[0] and nothing will be moved to md->rescued. Each generic_make_request() will completely submit the request in the lower level devel." Mike From 1584727356726608634@xxx Wed Nov 22 01:22:58 +0000 2017 X-GM-THRID: 1584717880005758635 X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread