Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752266AbbKIQhk (ORCPT ); Mon, 9 Nov 2015 11:37:40 -0500
Received: from mx1.redhat.com ([209.132.183.28]:45662 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751841AbbKIQhh (ORCPT ); Mon, 9 Nov 2015 11:37:37 -0500
Date: Mon, 9 Nov 2015 11:37:35 -0500
From: Mike Snitzer
To: Sami Tolvanen
Cc: Milan Broz, device-mapper development, Mikulas Patocka,
	Mandeep Baines, Will Drewry, Kees Cook, linux-kernel@vger.kernel.org,
	Alasdair Kergon, Mark Salyzyn
Subject: Re: [PATCH 0/4] dm verity: add support for error correction
Message-ID: <20151109163735.GA28884@redhat.com>
References: <1446688954-29589-1-git-send-email-samitolvanen@google.com>
	<563B066C.6050202@redhat.com> <20151105173306.GA22302@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151105173306.GA22302@google.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4746
Lines: 111

On Thu, Nov 05 2015 at 12:33pm -0500,
Sami Tolvanen wrote:

> On Thu, Nov 05, 2015 at 08:34:04AM +0100, Milan Broz wrote:
> > could you please elaborate why is all this needed? To extend support
> > of some faulty flash chips?
>
> This makes dm-verity more robust against corruption caused by either
> hardware or software bugs, both of which we have seen in the past on
> actual devices.
>
> Note that unlike the error correction sometimes included in flash
> storage devices, this doesn't merely protect against random bit flips,
> it makes it possible to recover from several megabytes of corrupted
> or lost data.

Google (via Android and/or ChromeOS) is the primary consumer of
dm-verity.  As such I'm inclined to trust Google's need for this feature
and that it has been carefully designed and implemented.  But obviously
we need to verify that.

Patches 1 and 2 look fine to me (just refactoring, no functional
change).  I may find something upon closer review but we'll cross that
bridge if/when we get to it.

> > Do you have some statistics that there are really such correctable errors
> > in real devices?
>
> Sorry, I don't have statistics to share at the moment.
>
> > Anyway, I really do not understand layer separation here.
>
> I should have elaborated more on this. Implementing this without
> integrity checking would not be feasible for a few reasons:
>
>   1. Being able to detect which blocks are corrupted allows us to
>      avoid correcting valid blocks. Correcting errors is slow and
>      this is the only way to keep performance acceptable.
>
>   2. Due to a property of erasure codes, we can correct twice as
>      many errors if we know where the errors are. Using the hash
>      tree to detect corrupted blocks lets us locate erasures.
>
>   3. Error correction algorithms may not produce valid output and
>      without integrity checking, there's no reliable way to detect
>      when we actually succeeded in correcting a block.

This all makes sense to me.

So for patch 3:
I'm left wondering: can the new error correction code be made an
optional feature that is off by default, so as to preserve some
isolation of this new code from the old dm-verity behaviour?

Looking at the code it isn't immediately clear to me where any of this
is _really_ optional; the closest I see is verity_fec_decode() returning
-1 if (!v->fec_bufio)... it might be good to add a wrapper like
verity_fec_is_enabled().

The if (v->fec_dev) {} block in verity_ctr() should probably be split
out to a new function, similar to how
drivers/md/dm-thin.c:pool_create() will return an error string via
**error, etc.

In addition, the kbuild errors/warnings (reported by the kbuild test
robot) need fixing.

Also, the 2 other big questions from Mikulas need answering:
1) why aren't you actually adjusting error codes, returning success, if
   dm-verity was able to trap/correct the corruption?
2) please fix the code to preallocate all required memory, so that
   verity_fec_alloc_buffers() isn't called in map.  Any reason why you
   couldn't collect the table's fec options and determine how much
   additional memory is needed per dm_verity_io?  And then just add that
   to the per-bio-data?

> > Are we sure this combination does not create some unintended
> > gap in integrity checking?
>
> Yes, I'm sure. Corrupted blocks are integrity checked again after they
> are corrected to make sure only valid data is allowed to pass.

Makes sense.

> > Why the integrity check should even try to do some
> > error correction if there is an intentional integrity attack?
>
> Most corruption is not malicious and being able to recover from it
> makes the system more reliable. This doesn't make it any easier for an
> attacker to break dm-verity. That would still require finding a hash
> collision (or being able to modify the hash tree).
>
> > The second question - why are you writing another separate tool
> > for maintenance for dm-verity when there is veritysetup?
>
> Our tool for generating error correction metadata is independent from
> dm-verity. This data is also used by other software to correct errors
> during software updates, for example. If there's interest, I can help
> in adding this functionality to veritysetup.

If this error correction feature is going to go upstream we really
should see any associated userspace enablement also included in
veritysetup.  There is really no sense in fragmenting the utilities used
to set up a dm-verity device.
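
For illustration only, the verity_fec_is_enabled() wrapper suggested
above might look something like the userspace sketch below.  The struct
here is a pared-down, hypothetical stand-in that models nothing but the
fec_bufio field mentioned in this thread (as a plain void pointer), and
verity_fec_decode_sketch() is an invented placeholder for the real
verity_fec_decode(); neither reflects the actual definitions in the
patch set.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct dm_verity: only the fec_bufio field
 * named in the review is modelled, and its type is simplified to void *.
 */
struct dm_verity {
	void *fec_bufio;	/* NULL when no FEC device was configured */
};

/* Assumed wrapper: FEC counts as enabled once fec_bufio has been set up. */
static inline bool verity_fec_is_enabled(const struct dm_verity *v)
{
	return v != NULL && v->fec_bufio != NULL;
}

/*
 * Placeholder for verity_fec_decode(): bail out with -1 when FEC is off,
 * mirroring the check described in the review.  Decoding itself is elided.
 */
static inline int verity_fec_decode_sketch(const struct dm_verity *v)
{
	if (!verity_fec_is_enabled(v))
		return -1;

	return 0;	/* the real error correction would happen here */
}
```

Funnelling every "is FEC configured?" test through one helper like this
would make the optional paths easy to spot, and would allow the helper
to be stubbed out entirely if the feature were compiled out.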