Date: Wed, 28 Nov 2018 16:33:03 +1100
From: Dave Chinner
To: Allison Henderson
Cc: linux-block@vger.kernel.org, linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, martin.petersen@oracle.com, shirley.ma@oracle.com, bob.liu@oracle.com
Subject: Re: [RFC PATCH v1 0/7] Block/XFS: Support alternative mirror device retry
Message-ID: <20181128053303.GL6311@dastard>
References: <1543376991-5764-1-git-send-email-allison.henderson@oracle.com>
In-Reply-To: <1543376991-5764-1-git-send-email-allison.henderson@oracle.com>

On Tue, Nov 27, 2018 at
08:49:44PM -0700, Allison Henderson wrote:
> Motivation:
> When fs data/metadata checksum mismatch, lower block devices may have other
> correct copies. e.g. If XFS successfully reads a metadata buffer off a raid1 but
> decides that the metadata is garbage, today it will shut down the entire
> filesystem without trying any of the other mirrors. This is a severe
> loss of service, and we propose these patches to have XFS try harder to
> avoid failure.
>
> This patch prototype this mirror retry idea by:
> * Adding @nr_mirrors to struct request_queue which is similar as
> blk_queue_nonrot(), filesystem can grab device request queue and check max
> mirrors this block device has.
> Helper functions were also added to get/set the nr_mirrors.
>
> * Expanding bi_write_hint to bi_rw_hint, now @bi_rw_hint has three meanings.
> 1.Original write_hint.
> 2.end_io() will update @bi_rw_hint to reflect which mirror this i/o really happened.
> 3.Fs set @bi_rw_hint to force driver e.g raid1 read from a specific mirror.
>
> * Modify md/raid1 to support this retry feature.
>
> * Add b_rw_hint to xfs_buf
> This patch adds a new field b_rw_hint to xfs_buf. We will use this to set the
> new bio->bi_rw_hint when submitting the read request, and also to store the
> returned mirror when the read compleates

One thing that is going to make this more complex at the XFS layer
is discontiguous buffers. They require multiple IOs (and therefore
bios), so we are going to need to ensure that all the bios use the
same bi_rw_hint.

This is another reason I suggest that bi_rw_hint has a magic value
for "block layer selects mirror" and that we separate the initial
read from the retry iterations. That allows us to let the block
layer pick whatever leg it wants for the initial read, but if we get
a failure we directly control the mirror we retry from, and all bios
in the buffer go to that same mirror.
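
To make the retry scheme above concrete, here is a minimal userspace
sketch. It is not kernel code and not taken from the patch set: every
name in it (HINT_MIRROR_ANY, struct sim_dev, struct sim_bio and the
sim_* functions) is hypothetical, the "block layer" is a stub that
always picks leg 1 when given the magic value, and the choice of 0 as
the magic value is an assumption. It only illustrates the control flow:
the initial read lets the lower layer choose, and each retry sends
every bio of a multi-segment buffer to the same explicitly chosen
mirror.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical, for illustration only. */
#define HINT_MIRROR_ANY 0   /* assumed magic value: block layer selects the leg */
#define MAX_SEGS        4
#define MAX_MIRRORS     4

/* Simulated mirrored device: good[m][s] says whether mirror m+1 holds
 * valid data for segment s of the buffer. */
struct sim_dev {
	int  nr_mirrors;
	bool good[MAX_MIRRORS][MAX_SEGS];
};

/* Stand-in for a bio: one per contiguous segment of the buffer. */
struct sim_bio {
	int  rw_hint;   /* in: requested mirror; out: mirror actually read */
	bool data_ok;   /* stands in for the filesystem verifier result */
};

/* Stub "block layer": honour an explicit hint, else pick a leg itself. */
static void sim_submit_bio(const struct sim_dev *dev, struct sim_bio *bio,
			   size_t seg)
{
	if (bio->rw_hint == HINT_MIRROR_ANY)
		bio->rw_hint = 1;   /* arbitrary selection policy for the sketch */
	bio->data_ok = dev->good[bio->rw_hint - 1][seg];
}

/* Read every segment with the same hint; true only if all segments verify. */
static bool sim_read_buf(const struct sim_dev *dev, size_t nsegs, int hint)
{
	bool ok = true;

	assert(nsegs <= MAX_SEGS && hint >= 0 && hint <= dev->nr_mirrors);
	for (size_t seg = 0; seg < nsegs; seg++) {
		struct sim_bio bio = { .rw_hint = hint };

		sim_submit_bio(dev, &bio, seg);
		ok = ok && bio.data_ok;
	}
	return ok;
}

/* Initial read with the magic value, then one retry per mirror.
 * Returns HINT_MIRROR_ANY if the initial read verified, the mirror
 * number (1..nr_mirrors) that produced a fully valid buffer on retry,
 * or -1 if no single mirror holds a valid copy of every segment. */
static int sim_read_buf_retry(const struct sim_dev *dev, size_t nsegs)
{
	if (sim_read_buf(dev, nsegs, HINT_MIRROR_ANY))
		return HINT_MIRROR_ANY;

	for (int m = 1; m <= dev->nr_mirrors; m++)
		if (sim_read_buf(dev, nsegs, m))
			return m;
	return -1;
}
```

Note that in this sketch a buffer whose segments are each valid on a
different leg still fails, because segments are never mixed across
mirrors within one attempt. That follows directly from requiring all
bios in the buffer to carry the same hint.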

> We're not planning to take over all 16 bits of the read hint field; just looking for
> feedback about the sanity of the overall approach.

It seems conceptually simple enough - the biggest questions I have
are:

	- how does propagation through stacked layers work?
	- is it generic/abstract enough to be able to work with
	  RAID5/6 to trigger verification/recovery from the parity
	  information in the stripe?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com