Date: Wed, 30 Mar 2016 10:07:36 -0700
From: Shaohua Li
To: Ming Lei
CC: Linux Kernel Mailing List, Jens Axboe, FB Kernel Team, "4.2+"
Subject: Re: [PATCH] block: don't make BLK_DEF_MAX_SECTORS too big
Message-ID: <20160330170735.GA3596724@devbig084.prn1.facebook.com>
References: <21cf85d32278bbe5acbc3def0a6db75db98a2670.1459269590.git.shli@fb.com> <20160330022651.GA2147487@devbig084.prn1.facebook.com>

On Wed, Mar 30, 2016 at 08:13:07PM +0800, Ming Lei wrote:
> Hi Shaohua,
>
> On Wed, Mar 30, 2016 at 10:27 AM, Shaohua Li wrote:
> > On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote:
> >> On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li wrote:
> >> > bio_alloc_bioset() allocates bvecs from bvec_slabs, which can
> >> > allocate at most 256 bvecs (i.e. 1M with 4k pages). We can't bump
> >> > BLK_DEF_MAX_SECTORS beyond this value, otherwise bio_alloc_bioset()
> >> > will fail.
> >> >
> >> > In the future, we can raise the size once either the bvec_slabs array
> >> > is expanded or the upcoming multipage bvec support is added (for
> >> > contiguous pages). This one is suitable for stable.
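To make the 256-bvec ceiling described in the patch concrete, here is a userspace sketch of how a bvec pool might be chosen for a requested vector count. The pool capacities (1, 4, 16, 64, 128, 256) are assumed to mirror the bvec_slabs table used by bvec_alloc() in block/bio.c of this era; `pick_bvec_pool()` and the `SKETCH_`/`NR_` names are hypothetical, not kernel identifiers. Any count above the largest pool has nowhere to go, which is the failure bio_alloc_bioset() runs into.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror BIO_MAX_PAGES in include/linux/bio.h (circa v4.5). */
#define SKETCH_BIO_MAX_PAGES 256

/*
 * Pool capacities, smallest first. Assumed to match the bvec_slabs
 * table in block/bio.c of the same era (not copied from it).
 */
static const unsigned int bvec_pool_sizes[] = {
	1, 4, 16, 64, 128, SKETCH_BIO_MAX_PAGES
};

#define NR_BVEC_POOLS (sizeof(bvec_pool_sizes) / sizeof(bvec_pool_sizes[0]))

/*
 * Hypothetical helper: return the index of the smallest pool that can
 * hold nr_vecs bio_vecs, or -1 when no pool is large enough (the case
 * in which the real allocator is assumed to give up and return NULL).
 */
static int pick_bvec_pool(unsigned int nr_vecs)
{
	size_t i;

	for (i = 0; i < NR_BVEC_POOLS; i++)
		if (nr_vecs <= bvec_pool_sizes[i])
			return (int)i;
	return -1;
}
```

With single-page bvecs, a request needing 320 vectors gets -1 here, while anything up to 256 lands in the largest pool.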
> >> >
> >> > Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
> >> > Reported-by: Sebastian Roesner
> >> > Cc: stable@vger.kernel.org (4.2+)
> >> > Cc: Ming Lei
> >> > Reviewed-by: Jeff Moyer
> >> > Signed-off-by: Shaohua Li
> >> > ---
> >> >  include/linux/blkdev.h | 6 +++++-
> >> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> >> > index 7e5d7e0..da64325 100644
> >> > --- a/include/linux/blkdev.h
> >> > +++ b/include/linux/blkdev.h
> >> > @@ -1153,7 +1153,11 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> >> >  enum blk_default_limits {
> >> >  	BLK_MAX_SEGMENTS	= 128,
> >> >  	BLK_SAFE_MAX_SECTORS	= 255,
> >> > -	BLK_DEF_MAX_SECTORS	= 2560,
> >> > +	/*
> >> > +	 * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
> >> > +	 * Otherwise bio_alloc_bioset will break.
> >> > +	 */
> >> > +	BLK_DEF_MAX_SECTORS	= BIO_MAX_SECTORS,
> >>
> >> Thinking about it further, it isn't good to change the default max
> >> sectors, because the patch affects REQ_PC bios too, which don't have
> >> the 1Mbyte limit at all.
> >
> > What breaks if REQ_PC is given a 1M limit? I can understand that a bigger
> > limit might help performance on big RAID arrays, but REQ_PC isn't that case.
>
> I mean a REQ_PC request can include up to 1024 vectors instead of 256, so it
> doesn't seem fair to impose the strict limit on all kinds of requests.
>
> More importantly, the max sectors limit caps the number of sectors in a
> request and is used for bio merging; it is not the same thing as a bio's
> 256-bvec limit.

My point is that this doesn't matter, because there is no performance issue.
2560 isn't fair either, since it uses 320 vectors. Also note that
blk_queue_max_hw_sectors() doesn't cap max_hw_sectors at BLK_DEF_MAX_SECTORS.

> >
> >> So I suggest just changing bcache's queue max sectors limit to 1M, which
> >> also means we shouldn't encourage bcache's practice of bypassing
> >> bio_add_page().
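The vector counts traded back and forth above follow from simple arithmetic, assuming 4 KiB pages, 512-byte sectors, page-aligned data and one page per bvec (no multipage bvecs). A minimal sketch; `sectors_to_bvecs()`, `fits_in_one_bio()` and the `SKETCH_*` constants are illustrative names, not kernel identifiers.

```c
#include <assert.h>

#define SKETCH_SECTOR_SHIFT  9    /* 512-byte sectors */
#define SKETCH_PAGE_SHIFT    12   /* assumed 4 KiB pages */
#define SKETCH_BIO_MAX_PAGES 256  /* assumed value of BIO_MAX_PAGES */

/* Sectors per page: 4096 >> 9 == 8. */
#define SKETCH_SECTORS_PER_PAGE (1u << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT))

/* Largest request a bio of single-page bvecs can carry, in sectors. */
#define SKETCH_BIO_MAX_SECTORS (SKETCH_BIO_MAX_PAGES * SKETCH_SECTORS_PER_PAGE)

/*
 * Single-page bvecs needed for a page-aligned request of `sectors`
 * sectors, rounding up for a partial trailing page.
 */
static unsigned int sectors_to_bvecs(unsigned int sectors)
{
	return (sectors + SKETCH_SECTORS_PER_PAGE - 1) / SKETCH_SECTORS_PER_PAGE;
}

/* Nonzero if a request of `sectors` fits under the 256-bvec ceiling. */
static int fits_in_one_bio(unsigned int sectors)
{
	return sectors_to_bvecs(sectors) <= SKETCH_BIO_MAX_PAGES;
}
```

Under these assumptions the old default of 2560 sectors (1280 KiB) needs 320 vectors, 64 more than one bio can hold, while the 1M cap works out to 2048 sectors.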
> >
> > I don't think this is a good idea. This is a limitation of the block core,
>
> This 256-bvec limit of bios comes from the block implementation; think about
> why one bvec holds just one page instead of one segment. It can certainly be
> improved in the future, which is why I said it isn't good to use
> BIO_MAX_SECTORS. You can also see that BIO_MAX_SECTORS has only one user.

I don't disagree. But when you switch to multipage bvecs, you must fix this
anyway, so let's fix the current problem now. Both 1M and 2560 sectors are
wrong in that case: the size limit could be 1M if pages are not contiguous,
or 256 * max_segment_size if they are.

> > so the block core should make sure the limitation isn't violated, not the
> > driver. On the other hand, are you going to fix all drivers? Drivers can
> > set an arbitrary max sectors limit.
>
> The issue only exists if drivers (fs, dm, md, bcache) do not use
> bio_add_page(), and that kind of usage shouldn't be encouraged.

bio_add_page() can add pages to a big bio too; it imposes no such limit.

> So how about fixing the issue by introducing the limit in get_max_io_size()?
> For example, adding something like the following at the end of that function:
>
>	sectors = min_t(unsigned, sectors, BIO_MAX_PAGES <<
>			(PAGE_CACHE_SHIFT - 9));

I can do that, I just don't see why. max_sectors is a software limit.

Thanks,
Shaohua
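Ming's proposed clamp can be illustrated in isolation. This is a userspace sketch under stated assumptions: `clamp_max_io_sectors()` is a hypothetical stand-in for the tail of get_max_io_size() in block/blk-merge.c, `SKETCH_PAGE_SHIFT` stands in for PAGE_CACHE_SHIFT (assumed to be 12, i.e. 4 KiB pages), and the kernel's min_t() is replaced by a plain comparison.

```c
#include <assert.h>

#define SKETCH_BIO_MAX_PAGES 256  /* assumed value of BIO_MAX_PAGES */
#define SKETCH_PAGE_SHIFT    12   /* stand-in for PAGE_CACHE_SHIFT, 4 KiB pages */

/*
 * Cap a per-bio sector budget at what BIO_MAX_PAGES single-page bvecs
 * can describe: BIO_MAX_PAGES << (PAGE_SHIFT - 9) sectors.
 */
static unsigned int clamp_max_io_sectors(unsigned int sectors)
{
	unsigned int bio_cap = SKETCH_BIO_MAX_PAGES << (SKETCH_PAGE_SHIFT - 9);

	return sectors < bio_cap ? sectors : bio_cap;
}
```

Placing the cap there, as Ming suggests, would leave the queue's default max_sectors (and hence REQ_PC requests) untouched; Shaohua's counterpoint is that max_sectors is already a software limit, so lowering the default is the simpler fix.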