In-Reply-To: <20101204193828.GB13871@redhat.com>
References: <4CE05A9E.9090204@redhat.com> <20101201165229.GC13415@redhat.com> <4CF692D1.1010906@redhat.com> <4CF6B3E8.2000406@redhat.com> <20101201212310.GA15648@redhat.com> <20101204193828.GB13871@redhat.com>
Date: Sun, 5 Dec 2010 00:52:36 +0100
Subject: Re: hunt for 2.6.37 dm-crypt+ext4 corruption?
 (was: Re: dm-crypt barrier support is effective)
From: Matt
To: Mike Snitzer
Cc: Milan Broz, Andi Kleen, linux-btrfs, dm-devel, Linux Kernel, htd, Chris Mason, htejun@gmail.com, linux-ext4@vger.kernel.org, Jon Nelson
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer wrote:
> On Sat, Dec 04 2010 at 2:18pm -0500,
> Matt wrote:
>
>> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote:
>> > Matt and Jon,
>> >
>> > If you'd be up to it: could you try testing your dm-crypt+ext4
>> > corruption reproducers against the following two 2.6.37-rc commits:
>> >
>> > 1) 1de3e3df917459422cb2aecac440febc8879d410
>> > then
>> > 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
>> >
>> > Then, depending on results of no corruption for those commits, bonus
>> > points for testing the same commits but with Andi and Milan's latest
>> > dm-crypt cpu scalability patch applied too:
>> > https://patchwork.kernel.org/patch/365542/
>> >
>> > Thanks!
>> > Mike
>> >
>>
>> Hi Mike,
>>
>> it seems like there isn't even much testing to do:
>>
>> I tested all 3 commits / checkouts by re-compiling gcc which was/is
>> the 2nd easy way to trigger this "corruption", compiling google's
>> chromium (v9) and looking at the output/existence of gcc, g++ and
>> eselect opengl list
>
> Can you be a bit more precise about what you're doing to reproduce?
> What sequence?  What (if any) builds are going in parallel?  Etc.
>
>> so far everything went fine
>>
>> After that I used the new patch (v6 or pre-v6), before that I had to
>>
>> replace WQ_MEM_RECLAIM with WQ_RESCUER
>>
>> and, re-compiled the kernels
>>
>> shortly after I had booted up the system with the first kernel
>> (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179)
>> the output of 'eselect opengl list' did show no opengl backend
>> selected
>>
>> so it seems to manifest itself even earlier (ext4: call
>> mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly
>> and over time -
>> I'm still currently running that kernel and posting from it & having tests run
>
> OK.
>
>> I'm not sure if it's even a problem with ext4 - I haven't had the time
>> to test with XFS yet - maybe it's also happening with that so it more
>> likely would be dm-core, like Milan suspected
>> (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :(
>
> It'd be interesting to try to reproduce with that same kernel but using
> XFS.  I'll check with Milan on what he thinks would be the best next
> steps.  Ideally we'll be able to reproduce your results to aid in
> pinpointing the issue.  I think Milan will be trying to do so shortly
> (if he hasn't started already -- using gentoo emerge, etc).
>
>> even though most of the time it's compiling I don't need to do much -
>> I need the box for work so if my time allows next tests would be next
>> weekend and I'm back to my other partition
>>
>> I really do hope that this bugger can be nailed down ASAP - I like the
>> improvements made in 2.6.37 but without the dm-crypt multi-cpu patch
>> it's only half the "fun" ;)
>
> Sure, we'll need to get to the bottom of this before we can have
> confidence sending the dm-crypt cpu scalability patch upstream.
>
> Thanks for your testing,
> Mike
>

I should have made it clear that the results I get are observed when
using the kernels/checkouts *with* the dm-crypt multi-cpu patch.
Without the patch I didn't see these kinds of problems (hardlocks,
files missing, etc.)