Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753557AbbGWOeG (ORCPT ); Thu, 23 Jul 2015 10:34:06 -0400
Received: from mx1.redhat.com ([209.132.183.28]:33169 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753413AbbGWOdy (ORCPT ); Thu, 23 Jul 2015 10:33:54 -0400
Date: Thu, 23 Jul 2015 10:33:52 -0400
From: Mike Snitzer
To: Dave Chinner
Cc: Eric Sandeen , axboe@kernel.dk, linux-kernel@vger.kernel.org,
	xfs@oss.sgi.com, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org,
	hch@lst.de, Vivek Goyal
Subject: Re: [RFC PATCH] block: xfs: dm thin: train XFS to give up on
	retrying IO if thinp is out of space
Message-ID: <20150723143352.GA23921@redhat.com>
References: <20150720223610.GV7943@dastard> <55AE6670.40903@redhat.com>
	<20150721174753.GA8563@redhat.com> <20150722000923.GB7943@dastard>
	<20150722010056.GC7943@dastard> <20150722014029.GA10628@redhat.com>
	<20150722023711.GD7943@dastard> <20150722133451.GB16842@redhat.com>
	<55AFC496.4000009@redhat.com> <20150723051043.GB3902@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150723051043.GB3902@dastard>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4809
Lines: 104

On Thu, Jul 23 2015 at  1:10am -0400,
Dave Chinner wrote:

> On Wed, Jul 22, 2015 at 11:28:06AM -0500, Eric Sandeen wrote:
> > On 7/22/15 8:34 AM, Mike Snitzer wrote:
> > > On Tue, Jul 21 2015 at 10:37pm -0400,
> > > Dave Chinner wrote:
> > >> On Tue, Jul 21, 2015 at 09:40:29PM -0400, Mike Snitzer wrote:
> > >>> I'm open to considering alternative interfaces for getting you the info
> > >>> you need.  I just don't have a great sense for what mechanism you'd like
> > >>> to use.  Do we invent a new block device operations table method that
> > >>> sets values in a 'struct no_space_strategy' passed in to the
> > >>> blockdevice?
> > >>
> > >> It's long been frowned on having the filesystems dig into block
> > >> device structures. We have lots of wrapper functions for getting
> > >> information from or performing operations on block devices. (e.g.
> > >> bdev_read_only(), bdev_get_queue(), blkdev_issue_flush(),
> > >> blkdev_issue_zeroout(), etc) and so I think this is the pattern we'd
> > >> need to follow. If we do that - bdev_get_nospace_strategy() - then
> > >> how that information gets to the filesystem is completely opaque
> > >> at the fs level, and the block layer can implement it in whatever
> > >> way is considered sane...
> > >>
> > >> And, realistically, all we really need returned is an enum to tell us
> > >> how the bdev behaves on enospc:
> > >> 	- bdev fails fast, (i.e. immediate ENOSPC)
> > >> 	- bdev fails slow, (i.e. queue for some time, then ENOSPC)
> > >> 	- bdev never fails (i.e. queue forever)
> > >> 	- bdev doesn't support this (i.e. EOPNOTSUPP)
> >
> > I'm not sure how this is more useful than the bdev simply responding to
> > a query of "should we keep trying IOs?"
>
> - bdev fails fast, (i.e. immediate ENOSPC)
>
> XFS should use a bounded retry behaviour to allow the possibility of
> the admin adding more space before we shut down the fs. i.e.
> XFS fails slow.
>
> - bdev fails slow, (i.e. queue for some time, then ENOSPC)
>
> We know that IOs are going to be delayed before they are failed, so
> there's no point in retrying as the admin has already had a chance
> to resolve the ENOSPC condition before failure was reported. i.e.
> XFS fails fast.
>
> - bdev never fails (i.e. queue forever)
>
> Block device will appear to hang when it runs out of space. Nothing
> XFS can do here because IOs never fail, but we need to note this in
> the log at mount time so that filesystem hangs are easily explained
> when reported to us.
>
> - bdev doesn't support this (i.e. EOPNOTSUPP)
>
> XFS uses default "retry forever" behaviour.
>
> > > This 'struct no_space_strategy' would be invented purely for
> > > informational purposes for upper layers' benefit -- I don't consider it
> > > a "block device structure" in the traditional sense.
> > >
> > > I was thinking upper layers would like to know the actual timeout value
> > > for the "fails slow" case.  As such the 'struct no_space_strategy' would
> > > have the enum and the timeout.  And would be returned with a call:
> > >     bdev_get_nospace_strategy(bdev, &no_space_strategy)
> >
> > Asking for the timeout value seems to add complexity.  It could change after
> > we ask, and knowing it now requires another layer to be handling timeouts...
>
> I don't think knowing the bdev timeout is necessary because the
> default is most likely to be "fail fast" in this case. i.e. no
> retries, just shut down. IOWs, if we describe the configs and
> actions in neutral terms, then the default configurations are easy for
> users to understand. i.e:
>
> 	bdev enospc		XFS default
> 	-----------		-----------
> 	Fail slow		Fail fast
> 	Fail fast		Fail slow
> 	Fail never		Fail never, Record in log
> 	EOPNOTSUPP		Fail never
>
> With that in mind, I'm thinking I should drop the
> "permanent/transient" error classifications, and change it to "failure
> behaviour" with the options "fast slow [never]" and only the slow
> option has retry/timeout configuration options. I think the "never"
> option still needs a "fail at unmount" config variable, but we
> enable it by default rather than hanging unmount and requiring a
> manual shutdown like we do now....

This all sounds good to me.  The simpler XFS configuration looks like a
nice improvement.  If you just want to stub out the call to
bdev_get_nospace_strategy() I can crank through implementing it once I
get a few minutes.

Btw, not sure what I was thinking when suggesting XFS would benefit from
knowing the duration of the thinp no_space_timeout.
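
For reference, the mapping in the table above could be sketched roughly
as below.  This is plain userspace C for illustration only, not kernel
code and not a proposed patch.  bdev_get_nospace_strategy() is only a
name floated in this thread and is not implemented here; the enum names,
'struct fs_nospace_policy' and fs_default_policy() are all made up for
this sketch.

```c
#include <assert.h>

/* Hypothetical: how the block device behaves when it runs out of space. */
enum bdev_nospace_strategy {
	BDEV_NOSPACE_FAIL_FAST,		/* immediate ENOSPC */
	BDEV_NOSPACE_FAIL_SLOW,		/* queue for some time, then ENOSPC */
	BDEV_NOSPACE_FAIL_NEVER,	/* queue forever */
	BDEV_NOSPACE_UNSUPPORTED,	/* query answered with EOPNOTSUPP */
};

/* Hypothetical: the filesystem's default reaction to a failed IO. */
enum fs_fail_behaviour {
	FS_FAIL_FAST,			/* no retries, just shut down */
	FS_FAIL_SLOW,			/* bounded retries, then shut down */
	FS_FAIL_NEVER,			/* retry forever */
};

/* Hypothetical container for the filesystem-side default. */
struct fs_nospace_policy {
	enum fs_fail_behaviour behaviour;
	int log_at_mount;		/* note possible hangs in the log */
};

/* Map the bdev's enospc behaviour to the filesystem default (table above). */
struct fs_nospace_policy fs_default_policy(enum bdev_nospace_strategy s)
{
	/* EOPNOTSUPP case: keep the existing "retry forever" default. */
	struct fs_nospace_policy p = { FS_FAIL_NEVER, 0 };

	switch (s) {
	case BDEV_NOSPACE_FAIL_SLOW:
		/* bdev already waited for the admin, so don't retry. */
		p.behaviour = FS_FAIL_FAST;
		break;
	case BDEV_NOSPACE_FAIL_FAST:
		/* give the admin a bounded window to add space. */
		p.behaviour = FS_FAIL_SLOW;
		break;
	case BDEV_NOSPACE_FAIL_NEVER:
		/* IOs never fail; record it so hangs can be explained. */
		p.behaviour = FS_FAIL_NEVER;
		p.log_at_mount = 1;
		break;
	case BDEV_NOSPACE_UNSUPPORTED:
		break;
	default:
		assert(!"unknown nospace strategy");
	}
	return p;
}
```

A caller would make the query once at mount time and pass the result to
fs_default_policy().  Only the "slow" filesystem behaviour would then
carry retry/timeout tunables, which is why the sketch has no timeout
field.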