Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752587AbcCLNs4 (ORCPT ); Sat, 12 Mar 2016 08:48:56 -0500 Received: from mail-pf0-f172.google.com ([209.85.192.172]:35821 "EHLO mail-pf0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751746AbcCLNss (ORCPT ); Sat, 12 Mar 2016 08:48:48 -0500 Date: Sat, 12 Mar 2016 04:48:43 -0900 From: Kent Overstreet To: Ming Lei Cc: Ming Lin , Jens Axboe , Linux Kernel Mailing List , torvalds@linux-foundation.org Subject: Re: e827091cb1 "block: merge: get the 1st and last bvec via helpers" broken Message-ID: <20160312134843.GA27821@kmo-pixel> References: <20160312074329.GA19066@kmo-pixel> <20160312092421.GA20839@kmo-pixel> <20160312121236.GA25403@kmo-pixel> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4568 Lines: 100 On Sat, Mar 12, 2016 at 09:33:18PM +0800, Ming Lei wrote: > On Sat, Mar 12, 2016 at 8:12 PM, Kent Overstreet > wrote: > > On Sat, Mar 12, 2016 at 06:36:56PM +0800, Ming Lei wrote: > >> Hi Kent, > >> > >> On Sat, Mar 12, 2016 at 5:24 PM, Kent Overstreet > >> wrote: > >> > On Sat, Mar 12, 2016 at 04:49:41PM +0800, Ming Lei wrote: > >> >> On Sat, Mar 12, 2016 at 3:43 PM, Kent Overstreet > >> >> wrote: > >> >> > I don't know exactly how it's broken, but with that patch segment counting is > >> >> > broken - I'm seeing blk_rq_map_sg() overrun the end of the sgtable. > >> >> > > >> >> > I suggest reverting it for 4.5... > >> >> > >> >> Kent, could you share your test case? I'd like to figure out the root cause. > >> > > >> > xfstest 036 on bcachefs. > >> > >> Given bcachefs isn't merged, could you provide one way to reproduce it > >> with clean upstream kernel? > >> > >> > > >> >> BTW, I don't object to revert it given it is close to v4.5 release, but I am > >> >> curious how it breaks segment couting. > >> > > >> > If you want to debug your version (personally I'd just revert to the simpler > >> > one), I'd start by having your helper use both methods to calculate the last > >> > biovec, and then assert that they're equal. > >> > > >> > Also make sure you're testing with a sub-page sized blocksize, if filesystem > >> > blocksize == page size you're not going to be testing the interesting cases > >> > >> I just run xfstests 036 over bcache and md, with block size 1024/2048, with > >> xfs/ext4/btrfs, looks the segment counting issue can't be reproduced. > >> > >> If the issue can only be reproduced with bcachefs, I suggest we don't revert > >> it until the root cause is figured out. > > > > Ming, if blk_rq_map_sg() is overrunning arrays and corrupting memory that's a > > bug in your code - this is certainly a bug in the core block layer - and just > > because you haven't been able to reproduce it yet does _not_ mean that no one > > else will hit it. > > I just mean given bcachefs isn't merged yet, then maybe no one can be effected, > so we have time to figure out the root cause. No - we're at rc7, and there is no good reason to think this is bcachefs specific - when you introduce a regression into mainline the responsible thing to do is revert first, then debug. Especially this close to release - also, why'd this go in after the merge window, Jens? blk-merge.c related stuff is notoriously fragile... > I will try your bcachefs to see if I can reproduce it, could you share me > one tree so that I can test bcachefs? sigh, I'll debug it tomorrow. bcachefs instructions are at the top of the bcache website, but you'll need some minor patches to xfstests. easier for me to just do it. > > > > > I think I know why bcachefs hits it - that particular test is doing DIO writes > > of sub page granularity, creating extents that are logically adjacent and > > adjacent on disk (because they were written one right after the other), but that > > don't get merged because I'm running my tests with data checksumming enabled. > > Then, when we go to read that data - with a buffered read, so page granularity - > > the read gets fragmented into multiple bios (because there's multiple extents), > > where the two bios are adjacent (and pointing to the same page!) and adjacent on > > disk - thus when they're issued you get sub page size segments from two different > > bios that are able to be merged, which is otherwise a highly unusual situation. > > That means the situation of merging two bios, but can't explain why this > commit is wrong. > > Last time we had one bug report under this kind of situation, and the test case > in following link can trigger merging two bios always: > > http://marc.info/?l=linux-scsi&m=144881373723594&w=2 > > And finally a88d32af(blk-merge: fix computing bio->bi_seg_front_size in > case of single segment) is figured out for fixing the bug. > > Today I have run this test case again, and looks everything is fine. > > > > > I'm not about to write a test case for you though, it's your job to figure out > > how to test your code. > > > > Also note that it is entirely possible that the segment counting itself is > > correct with your patch, and the bug is just that segments aren't getting merged > > that the segment counting assumed would be. If that is the case then your patch > > merely exposed the bug, but your patch still needs to be reverted in the > > meantime. > > > Thanks, > Ming Lei