Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752362Ab3CWW47 (ORCPT ); Sat, 23 Mar 2013 18:56:59 -0400
Received: from mx1.redhat.com ([209.132.183.28]:51298 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752220Ab3CWW45 (ORCPT ); Sat, 23 Mar 2013 18:56:57 -0400
Date: Sat, 23 Mar 2013 18:56:48 -0400
From: Mike Snitzer
To: "Darrick J. Wong"
Cc: Heinz Mauelshagen, Randy Dunlap, linux-kernel@vger.kernel.org,
	device-mapper development, Mikulas Patocka, Paul Taysom,
	Linus Torvalds, Joe Thornber
Subject: Re: dm: dm-cache fails to write the cache device in writethrough mode
Message-ID: <20130323225648.GA13583@redhat.com>
References: <20130322201151.GB5357@blackbox.djwong.org>
	<20130322223425.GA5638@redhat.com>
	<20130322231600.GD5357@blackbox.djwong.org>
	<20130323032715.GA7692@redhat.com>
	<20130323051549.GE5357@blackbox.djwong.org>
	<20130323210853.GA12164@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130323210853.GA12164@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3148
Lines: 85

On Sat, Mar 23 2013 at 5:08pm -0400, Mike Snitzer wrote:

> But even after having changed my test to use /dev/sdb for the origin
> device I cannot reproduce the problem you've reported. Do you have any
> further details on how/why the bios are being altered? Are you
> reliably hitting partial completions within the origin's driver? If so
> how?

I can easily see bio->bi_size being 0 in writethrough_endio, here is the
stack trace from a WARN_ON_ONCE(!bio->bi_size); that I added to
writethrough_endio:

Call Trace:
 [] warn_slowpath_common+0x7f/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] writethrough_endio+0x13f/0x150 [dm_cache]
 [] bio_endio+0x3d/0x90
 [] req_bio_endio+0xa3/0xe0
 [] blk_update_request+0x10f/0x480
 [] blk_update_bidi_request+0x27/0xb0
 [] blk_end_bidi_request+0x2f/0x80
 [] blk_end_request+0x10/0x20
 [] scsi_end_request+0x40/0xb0
 [] ? entity_tick+0x97/0x420
 [] scsi_io_completion+0x9f/0x660
 [] ? raise_softirq_irqoff+0x9/0x50
 [] scsi_finish_command+0xc9/0x130
 [] scsi_softirq_done+0x147/0x170
 [] blk_done_softirq+0x82/0xa0
 [] __do_softirq+0xe7/0x260
 [] ? bio_alloc_bioset+0x65/0x120
 [] call_softirq+0x1c/0x30
 [] do_softirq+0x65/0xa0
 [] irq_exit+0xbd/0xe0
 [] do_IRQ+0x66/0xe0
 [] common_interrupt+0x6d/0x6d

No idea why I was so oblivious to a bio->bi_size of 0 reflecting
completion. So nothing to do with partial completion at all.

Here is a version of the patch you posted that uses
dm_bio_{record,restore} like Alasdair suggested:

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 66120bd..90b1dd2 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -5,6 +5,7 @@
  */
 
 #include "dm.h"
+#include "dm-bio-record.h"
 #include "dm-bio-prison.h"
 #include "dm-cache-metadata.h"
 
@@ -205,6 +206,7 @@ struct per_bio_data {
 	struct cache *cache;
 	dm_cblock_t cblock;
 	bio_end_io_t *saved_bi_end_io;
+	struct dm_bio_details bio_details;
 };
 
 struct dm_cache_migration {
@@ -643,6 +645,7 @@ static void writethrough_endio(struct bio *bio, int err)
 		return;
 	}
 
+	dm_bio_restore(&pb->bio_details, bio);
 	remap_to_cache(pb->cache, bio, pb->cblock);
 
 	/*
@@ -668,6 +671,7 @@ static void remap_to_origin_then_cache(struct cache *cache, struct bio *bio,
 	pb->cblock = cblock;
 	pb->saved_bi_end_io = bio->bi_end_io;
 	bio->bi_end_io = writethrough_endio;
+	dm_bio_record(&pb->bio_details, bio);
 
 	remap_to_origin_clear_discard(pb->cache, bio, oblock);
 }
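
For illustration only, the effect of the record/restore pair can be
sketched in plain userspace C. This is not kernel code and not the
dm-bio-record.h implementation: struct fake_bio, struct fake_bio_details
and every fake_* helper below are hypothetical stand-ins, and the model
simply assumes (as the WARN_ON_ONCE above suggests) that the lower layer
leaves the size field at 0 once the origin write has completed.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct bio fields that matter here. */
struct fake_bio {
	unsigned long sector;	/* plays the role of bi_sector */
	unsigned int size;	/* plays the role of bi_size (bytes left) */
};

/* Hypothetical stand-in for struct dm_bio_details. */
struct fake_bio_details {
	unsigned long sector;
	unsigned int size;
};

/* Save the fields a lower layer may change while completing the bio. */
static void fake_bio_record(struct fake_bio_details *bd,
			    const struct fake_bio *bio)
{
	bd->sector = bio->sector;
	bd->size = bio->size;
}

/* Put the saved fields back so the bio can be submitted a second time. */
static void fake_bio_restore(const struct fake_bio_details *bd,
			     struct fake_bio *bio)
{
	bio->sector = bd->sector;
	bio->size = bd->size;
}

/* Model of a full completion: the whole bio is consumed. */
static void fake_complete(struct fake_bio *bio)
{
	bio->sector += bio->size >> 9;	/* advance by 512-byte sectors */
	bio->size = 0;
}

/*
 * Model of the writethrough path: write to the origin first, then reuse
 * the same bio for the cache device. Returns the size the cache device
 * would be asked to write.
 */
static unsigned int second_write_size(int restore)
{
	struct fake_bio bio = { .sector = 2048, .size = 4096 };
	struct fake_bio_details saved;

	fake_bio_record(&saved, &bio);	/* before remapping to the origin */
	fake_complete(&bio);		/* origin write finishes */
	assert(bio.size == 0);		/* nothing left to write as-is */

	if (restore)
		fake_bio_restore(&saved, &bio);	/* done in the endio hook */

	return bio.size;
}
```

In this model, second_write_size(0) yields 0, i.e. the cache device would
receive an empty write, while second_write_size(1) yields the original
4096 bytes.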