Date: Thu, 27 Oct 2016 13:23:32 -0400
From: Dave Jones
To: Dave Chinner
Cc: Chris Mason, Andy Lutomirski, Linus Torvalds, Jens Axboe, Al Viro,
    Josef Bacik, David Sterba, linux-btrfs, Linux Kernel
Subject: Re: bio linked list corruption.
Message-ID: <20161027172332.krdsrc5rlivq4mrv@codemonkey.org.uk>
In-Reply-To: <20161027054133.GM14023@dastard>

On Thu, Oct 27, 2016 at 04:41:33PM +1100, Dave Chinner wrote:

 > And that's indicative of a delalloc metadata reservation being
 > too small and so we're allocating unreserved blocks.
 >
 > Different symptoms, same underlying cause, I think.
 >
 > I see the latter assert from time to time in my testing, but it's
 > not common (maybe once a month) and I've never been able to track it
 > down.
 > However, it doesn't affect production systems unless they hit
 > ENOSPC hard enough that it causes the critical reserve pool to be
 > exhausted and so the allocation fails. That's extremely rare -
 > usually takes several hundred processes all trying to write as hard
 > as they can concurrently and to all slip through the ENOSPC
 > detection without the correct metadata reservations and all require
 > multiple metadata blocks to be allocated during writeback...
 >
 > If you've got a way to trigger it quickly and reliably, that would
 > be helpful...

Seems pretty quickly reproducible for me in some shape or form.
Run trinity with --enable-fds=testfile and create enough children
to create a fair bit of contention (for me -C64 seems a good fit on
spinning rust, but if you're running on shiny nvme you might have to
pump it up a bit).

	Dave
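
The reproduction recipe above could be wrapped in a small launcher so the
child count is easy to raise on faster storage. This is only a sketch: the
helper names (build_trinity_cmd, run_trinity) are hypothetical, it assumes a
trinity binary is on PATH, and it assumes it is started from a scratch
directory on the filesystem under test. Only the two options named in the
message (--enable-fds=testfile and -C) are used.

```python
import shutil
import subprocess


def build_trinity_cmd(children=64, fd_provider="testfile", binary="trinity"):
    """Return the argv list for the trinity run described in the message.

    children    -- value passed to -C (64 was reported to suit spinning disks)
    fd_provider -- value passed to --enable-fds
    binary      -- name or path of the trinity executable (assumed to be on PATH)
    """
    return [binary, f"--enable-fds={fd_provider}", f"-C{children}"]


def run_trinity(children=64):
    """Launch trinity if it is installed.

    Returns the process exit code, or None when the binary is not found.
    """
    cmd = build_trinity_cmd(children)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch is safe to try.
    print(" ".join(build_trinity_cmd()))
```

On NVMe, calling build_trinity_cmd(children=128) or higher would be the
"pump it up a bit" variant; the right value is not stated in the message and
would need to be found by experiment.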