Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756407AbcCOQGE (ORCPT ); Tue, 15 Mar 2016 12:06:04 -0400 Received: from mga02.intel.com ([134.134.136.20]:48607 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756191AbcCOQF7 (ORCPT ); Tue, 15 Mar 2016 12:05:59 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,340,1455004800"; d="scan'208";a="66802500" Date: Tue, 15 Mar 2016 16:03:22 +0000 From: Keith Busch To: Vitaly Kuznetsov Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Jens Axboe , Dan Williams , "Martin K. Petersen" , Sagi Grimberg , Mike Snitzer , "K. Y. Srinivasan" , Cathy Avery Subject: Re: [PATCH RFC] block: fix bio merge checks when virt_boundary is set Message-ID: <20160315160321.GA30533@localhost.lm.intel.com> References: <1458055076-2175-1-git-send-email-vkuznets@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1458055076-2175-1-git-send-email-vkuznets@redhat.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1229 Lines: 32 On Tue, Mar 15, 2016 at 04:17:56PM +0100, Vitaly Kuznetsov wrote: > The reason of the slowdown is the fact that bios don't get merged and we > end up sending many short requests to the host. My investigation led me to > the following code (__bvec_gap_to_prev()): > > return offset || > ((bprv->bv_offset + bprv->bv_len) & queue_virt_boundary(q)); > > Here is an example: we have two bio_vec with the following content: > bprv.bv_offset = 512 > bprv.bv_len = 512 > > bnxt.bv_offset = 1024 > bnxt.bv_len = 512 > > bprv.bv_page == bnxt.bv_page > virt_boundary is set to PAGE_SIZE-1 > > The above mentioned code will report that a gap will appear if we merge > these two (as offset = 1024) but this doesn't look sane. On top of that, > we have the following optimization in bio_add_pc_page(): > > if (page == prev->bv_page && > offset == prev->bv_offset + prev->bv_len) { > prev->bv_len += len; > bio->bi_iter.bi_size += len; > goto done; > } This part sounds odd. Why is a filesystem using bio_add_pc_page? Shouldn't these go through "bio_add_page" instead? That already has an optimization to combine bio's within the same page.