From: "Verma, Vishal L"
To: "hch@infradead.org"
CC: "Wilcox, Matthew R", "linux-kernel@vger.kernel.org", "linux-block@vger.kernel.org", "xfs@oss.sgi.com", "linux-nvdimm@ml01.01.org", "jmoyer@redhat.com", "linux-mm@kvack.org", "viro@zeniv.linux.org.uk", "axboe@fb.com", "akpm@linux-foundation.org", "linux-fsdevel@vger.kernel.org", "linux-ext4@vger.kernel.org", "david@fromorbit.com", "jack@suse.cz"
Subject: Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io
Date: Mon, 25 Apr 2016 17:14:36 +0000
Message-ID: <1461604476.3106.12.camel@intel.com>
In-Reply-To: <20160425083114.GA27556@infradead.org>
References: <1459303190-20072-1-git-send-email-vishal.l.verma@intel.com> <1459303190-20072-6-git-send-email-vishal.l.verma@intel.com> <20160420205923.GA24797@infradead.org> <1461434916.3695.7.camel@intel.com> <20160425083114.GA27556@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2016-04-25 at 01:31 -0700, hch@infradead.org wrote:
> On Sat, Apr 23, 2016 at 06:08:37PM +0000, Verma, Vishal L wrote:
> >
> > direct_IO might fail with -EINVAL due to misalignment, or -ENOMEM due
> > to some allocation failing, and I thought we should return the original
> > -EIO in such cases so that the application doesn't lose the information
> > that the bad block is actually causing the error.
> EINVAL is a concern here.  Not due to the right error reported, but
> because it means your current scheme is fundamentally broken - we
> need to support I/O at any alignment for DAX I/O, and not fail due to
> alignment concerns for a highly specific degraded case.
>
> I think this whole series needs to go back to the drawing board as I
> don't think it can actually rely on using direct I/O as the EIO
> fallback.
>
Agreed that DAX I/O can happen with any size/alignment, but how else do
we send an IO through the driver without alignment restrictions? Also,
the granularity at which we store badblocks is 512B sectors, so it seems
natural that to clear such a sector, you'd expect to send a write to the
whole sector.

The expected usage flow is:

- Application hits EIO doing dax_IO or load/store io
- It checks badblocks and discovers its files have lost data
- It write()s those sectors (possibly converted to file offsets using
  fiemap)
    * This triggers the fallback path, but if the application is doing
      this level of recovery, it will know the sector is bad, and write
      the entire sector
- Or it replaces the entire file from backup also using write() (not
  mmap+stores)
    * This just frees the fs block, and the next time the block is
      reallocated by the fs, it will likely be zeroed first, and that
      will be done through the driver and will clear errors

I think if we want to keep allowing arbitrary alignments for the
dax_do_io path, we'd need:

1. To represent badblocks at a finer granularity (likely cache lines)
2. To allow the driver to do IO to a *block device* at sub-sector
   granularity

Can we do that?
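
To make the granularity point concrete, here is a rough sketch of the arithmetic involved. It assumes only the 512B badblocks granularity mentioned above; the helper names (covering_sectors, is_sector_aligned) are made up for illustration and are not existing kernel or libc interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Badblocks granularity as stated in this mail: 512-byte sectors. */
#define SECTOR_SIZE 512u

/* Hypothetical type: a run of whole sectors. */
struct sector_range {
	uint64_t first;	/* index of the first sector touched */
	uint64_t count;	/* number of sectors touched */
};

/*
 * Hypothetical helper: map the byte range [off, off + len) to the
 * smallest run of whole sectors that covers it. A recovering
 * application would rewrite this whole run, not just the bytes that
 * faulted.
 */
struct sector_range covering_sectors(uint64_t off, uint64_t len)
{
	struct sector_range r = { 0, 0 };
	uint64_t last;

	if (len == 0)
		return r;

	r.first = off / SECTOR_SIZE;
	last = (off + len - 1) / SECTOR_SIZE;
	r.count = last - r.first + 1;
	return r;
}

/*
 * Hypothetical helper: true only when the write starts and ends on
 * sector boundaries, i.e. when it rewrites every sector it touches in
 * full.
 */
bool is_sector_aligned(uint64_t off, uint64_t len)
{
	return len != 0 && off % SECTOR_SIZE == 0 && len % SECTOR_SIZE == 0;
}
```

For example, a 100-byte write at offset 1000 touches sectors 1 and 2 but fills neither, so under the scheme described above it could not be used to clear a bad sector; the application would instead write bytes 512 through 1535 in full.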