Date: Sun, 20 Dec 2015 09:44:04 -0900
From: Kent Overstreet
To: Christoph Hellwig
Cc: Linus Torvalds, Ming Lin, Jens Axboe, "Artem S. Tashkinov",
    Steven Whitehouse, Tejun Heo, IDE-ML, Linux Kernel Mailing List
Subject: Re: IO errors after "block: remove bio_get_nr_vecs()"
Message-ID: <20151220184404.GA18035@kmo-pixel>
In-Reply-To: <20151220181801.GA12402@lst.de>

On Sun, Dec 20, 2015 at 07:18:01PM +0100, Christoph Hellwig wrote:
> On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
> > Kent, Jens, Christoph et al,
> >    please see this bugzilla:
> >
> >    https://bugzilla.kernel.org/show_bug.cgi?id=109661
> >
> > where Artem Tashkinov bisected his problems with 4.3 down to commit
> > b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> > signed off on.
>
> Artem,
>
> can you re-check the commits around this series again?  I would be
> extremely surprised if it's really this particular commit and not
> one just before it causing the problem - it just allocates bios
> to the biggest possible instead of only allocating up to what
> bio_add_page would accept.

pretty sure it's something with how blk_bio_segment_split() decides what
segments are mergeable and not.

bio_get_nr_vecs() was just returning nr_pages == queue_max_segments (ignoring
sectors for the moment) - so wait, wtf?
that's basically assuming no segment merging can ever happen. if it does, then
this was causing us to send smaller requests to the device than we could have
been.

so actually two possibilities I can see:

 - in blk_bio_segment_split(), something's screwed up with how it decides what
   segments are going to be mergeable or not. but I don't think that's likely
   since it's doing the exact same thing the rest of the segment merging code
   does.

 - or, the driver was lying in its queue limits, using queue_max_segments for
   "the maximum number of pages I can possibly take", and that bug lurked
   undiscovered because of the screwed-upness in bio_get_nr_vecs().

Offhand I don't know where to start digging in the driver code to look into the
second theory though. Tejun, you got any ideas?
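
To make the pages-vs-segments point concrete, here is a rough userspace sketch. It is not kernel code: count_segments() and pages_that_fit() are hypothetical helpers, pages are modelled as bare page frame numbers, and max segment size and boundary masks are ignored. It only shows that capping a bio at queue_max_segments pages is the same as assuming no two pages ever merge.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: count the segments a run of pages collapses into
 * when physically adjacent pages (pfn[i] == pfn[i-1] + 1) are merged.
 * Ignores max segment size and segment boundary limits.
 */
size_t count_segments(const unsigned long *pfn, size_t nr_pages)
{
    size_t segs = 0;

    assert(pfn != NULL || nr_pages == 0);
    for (size_t i = 0; i < nr_pages; i++) {
        /* A new segment starts at the first page and at every gap. */
        if (i == 0 || pfn[i] != pfn[i - 1] + 1)
            segs++;
    }
    return segs;
}

/*
 * Hypothetical model: how many leading pages can go into one request
 * before the merged segment count would exceed max_segments.
 */
size_t pages_that_fit(const unsigned long *pfn, size_t nr_pages,
                      size_t max_segments)
{
    size_t segs = 0;

    assert(pfn != NULL || nr_pages == 0);
    for (size_t i = 0; i < nr_pages; i++) {
        if (i == 0 || pfn[i] != pfn[i - 1] + 1) {
            if (segs == max_segments)
                return i;   /* this page would open one segment too many */
            segs++;
        }
    }
    return nr_pages;
}
```

With a made-up limit of 4 segments, eight contiguous pages (pfns 100..107) merge into 1 segment and all 8 fit, while a pages == segments cap would have stopped at 4. Eight scattered pages (pfns 100, 102, ..., 114) stay at 8 segments and only 4 fit, which is the one case where the old cap and the segment limit agree. A driver that really means "max pages" by its segment limit would only be safe in that second case.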