Message-ID: <4FBFEE7E.8060008@gmx.net>
Date: Fri, 25 May 2012 22:41:34 +0200
From: Arne Jansen
To: Stefan Behrens
CC: Christoph Hellwig, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 0/3] Btrfs: add IO error device stats
References: <1337954770-10086-1-git-send-email-sbehrens@giantdisaster.de> <20120525151854.GA23362@infradead.org> <4FBFC63B.6050403@giantdisaster.de>
In-Reply-To: <4FBFC63B.6050403@giantdisaster.de>

On 05/25/12 19:49, Stefan Behrens wrote:
> It would be helpful if already the generic block layer would offer
> device error counters. Then btrfs could read them, add own counters for
> its checksum detected errors, and store everything persistently in the
> filesystem.
>

I take it that you count not only I/O errors, but also corrupted blocks
and errors caused by misdirected writes. That information is not
available to the block layer.

> The goal is to replace disks that have an increased error rate with
> spare disks, and the goal is to repair this degenerated RAID state quickly.
>
>
> On 05/25/2012 17:18, Christoph Hellwig wrote:
>> Can you explain why the device error counters should be in a filesystem
>> instead of generic block layer code?
>>
>> On Fri, May 25, 2012 at 04:06:07PM +0200, Stefan Behrens wrote:
> [...]
>>> The goal is to detect when drives start to get an increased error rate,
>>> when drives should be replaced soon. Therefore statistic counters are
>>> added that count IO errors (read, write and flush). Additionally, the
>>> software detected errors like checksum errors and corrupted blocks are
>>> counted.
>>>
>>> An ioctl interface is added to get the device statistic counters.
>>> A second ioctl is added to atomically get and reset these counters.
>>>
>>> The device statistics are written into the device tree with each
>>> transaction commit. Only modified statistics are written.
>>> When a filesystem is mounted, the device statistics for each involved
>>> device are read from the device tree and used to initialize the
>>> counters.
>>>
>>> A patch for the btrfs-progs world will also be sent.
>>>
>>> Stefan Behrens (3):
>>>   Btrfs: add device counters for detected IO and checksum errors
>>>   Btrfs: add ioctl to get and reset the device stats
>>>   Btrfs: read device stats on mount, write modified ones during commit
>>>
>>>  fs/btrfs/ctree.h       |   38 ++++++
>>>  fs/btrfs/disk-io.c     |   20 +++-
>>>  fs/btrfs/extent_io.c   |   18 ++-
>>>  fs/btrfs/ioctl.c       |   26 +++++
>>>  fs/btrfs/ioctl.h       |   33 ++++++
>>>  fs/btrfs/print-tree.c  |    3 +
>>>  fs/btrfs/scrub.c       |   65 ++++++++---
>>>  fs/btrfs/transaction.c |    4 +
>>>  fs/btrfs/volumes.c     |  304 +++++++++++++++++++++++++++++++++++++++++++++++-
>>>  fs/btrfs/volumes.h     |   52 +++++++++
>>>  10 files changed, 539 insertions(+), 24 deletions(-)
>>>
>>> --
>>> 1.7.10.2
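
For readers following the thread, the counter scheme described in the quoted cover letter (per-device error counters, a plain read, an atomic read-and-reset, and writing only modified statistics at commit time) can be sketched in userspace C. This is only an illustration under stated assumptions, not the code from the patch series: every name below (dev_stats, dev_stat_inc, dev_stats_commit, and so on) is hypothetical, C11 atomics stand in for whatever primitives the kernel code uses, and the reset is shown per counter, which may differ from what the ioctl does.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical indices, one per error class named in the cover letter. */
enum dev_stat_idx {
    DEV_STAT_READ_ERRS,
    DEV_STAT_WRITE_ERRS,
    DEV_STAT_FLUSH_ERRS,
    DEV_STAT_CSUM_ERRS,
    DEV_STAT_CORRUPTION_ERRS,
    DEV_STAT_MAX
};

/* Hypothetical per-device state: the counters plus a "modified" flag. */
struct dev_stats {
    atomic_uint_fast64_t counter[DEV_STAT_MAX];
    atomic_bool dirty; /* true if a counter changed since the last commit */
};

/* Mount time: seed the counters from stored values (NULL means all zero). */
static void dev_stats_init(struct dev_stats *s, const uint64_t *stored)
{
    for (int i = 0; i < DEV_STAT_MAX; i++)
        atomic_init(&s->counter[i], stored ? stored[i] : 0);
    atomic_init(&s->dirty, false);
}

/* Error path: bump one counter and remember that it must be written out. */
static void dev_stat_inc(struct dev_stats *s, enum dev_stat_idx idx)
{
    atomic_fetch_add(&s->counter[idx], 1);
    atomic_store(&s->dirty, true);
}

/* "Get" request: read a counter without changing it. */
static uint64_t dev_stat_read(struct dev_stats *s, enum dev_stat_idx idx)
{
    return atomic_load(&s->counter[idx]);
}

/*
 * "Get and reset" request: the exchange returns the old value and stores
 * zero in one step, so an increment racing with the reset is never lost.
 */
static uint64_t dev_stat_read_and_reset(struct dev_stats *s,
                                        enum dev_stat_idx idx)
{
    uint64_t old = atomic_exchange(&s->counter[idx], 0);
    atomic_store(&s->dirty, true);
    return old;
}

/*
 * Commit time: copy the counters to `out` only if something changed.
 * Returns true if `out` was filled, false if there was nothing to write.
 */
static bool dev_stats_commit(struct dev_stats *s, uint64_t *out)
{
    if (!atomic_exchange(&s->dirty, false))
        return false;
    for (int i = 0; i < DEV_STAT_MAX; i++)
        out[i] = atomic_load(&s->counter[i]);
    return true;
}
```

Clearing the dirty flag before copying the counters means an increment that arrives during the copy sets the flag again, so the next commit picks it up; the cost is an occasional redundant write rather than a missed one.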