From: Dan Williams
Date: Fri, 4 Aug 2017 11:01:08 -0700
Subject: Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt
To: Minchan Kim
Cc: Ross Zwisler, Matthew Wilcox, Andrew Morton, "linux-kernel@vger.kernel.org", "karam . lee", Jerome Marchand, Nitin Gupta, seungho1.park@lge.com, Christoph Hellwig, Dave Chinner, Jan Kara, Jens Axboe, Vishal Verma, "linux-nvdimm@lists.01.org", Dave Jiang
In-Reply-To: <20170804081740.GA2083@bbox>
References: <20170728165604.10455-1-ross.zwisler@linux.intel.com> <20170728173143.GE15980@bombadil.infradead.org> <20170802221359.GA20666@linux.intel.com> <20170803001315.GF32020@bbox> <20170803211335.GA1260@linux.intel.com> <20170804035441.GA305@bbox> <20170804081740.GA2083@bbox>

[ adding Dave who is working on a blk-mq + dma offload version of the
pmem driver ]

On Fri, Aug 4, 2017 at 1:17 AM, Minchan Kim wrote:
> On Fri, Aug 04, 2017 at 12:54:41PM +0900, Minchan Kim wrote:
[..]
>> Thanks for the testing. Are your test numbers within the noise level?
>>
>> I cannot understand why pmem doesn't see much gain while BTT is a
>> significant win (8%). I guess the no-rw_page BTT test had more chances
>> to wait on dynamic bio allocation, and both my on-stack bio and the
>> rw_page tests reduced that significantly. However, in the no-rw_page
>> pmem case there weren't many waits on bio allocation because the device
>> is so fast, so the difference comes purely from the number of
>> instructions executed. At a quick glance, the bio init/submit path is
>> not trivial, so indeed I understand where the 12% improvement comes
>> from, but I'm not sure it's really a big difference in practice at the
>> cost of the maintenance burden.
>
> I tested pmbench 10 times on my local machine (4 cores) with zram-swap.
> On my machine, even the on-stack bio is faster than rw_page. Unbelievable.
>
> I guess it's really hard to get stable results under severe memory
> pressure. The result is probably within the noise level (see the stddev
> below), so I think it's hard to conclude that rw_page is far faster
> than the on-stack bio.
>
> rw_page
> avg     5.54us
> stddev  8.89%
> max     6.02us
> min     4.20us
>
> onstack bio
> avg     5.27us
> stddev  13.03%
> max     5.96us
> min     3.55us

The maintenance burden of having alternative submission paths is
significant, especially as we consider the pmem driver using more
services of the core block layer. Ideally, I'd want to complete the
rw_page removal work before we look at the blk-mq + dma offload
reworks.

The change to introduce BDI_CAP_SYNC is interesting because we might
have a use for switching between dma offload and cpu copy based on
whether the I/O is synchronous or otherwise hinted to be a low-latency
request. Right now the dma offload patches are using "bio_segments() >
1" as the gate for selecting offload vs cpu copy, which seems
inadequate.
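
To make the sync-hint idea concrete, here is a minimal sketch (the helper
name is hypothetical, not taken from the actual offload patches) of what
gating on op_is_sync() in addition to the segment count could look like
against 4.13-era block APIs:

#include <linux/bio.h>
#include <linux/blk_types.h>

/*
 * Hypothetical gate, not from the dma offload patches: keep reads and
 * explicitly sync/flush writes on the low-latency cpu copy path, and
 * only consider the dma engine for async I/O that is large enough to
 * amortize the offload setup cost.
 */
static bool pmem_use_dma_offload(struct bio *bio)
{
	/* op_is_sync() is true for reads and REQ_SYNC/REQ_FUA/REQ_PREFLUSH. */
	if (op_is_sync(bio->bi_opf))
		return false;

	/* Fall back to the segment-count heuristic for async I/O. */
	return bio_segments(bio) > 1;
}

Whether the swap path would benefit from offload at all is a separate
question; the sketch only illustrates where a BDI_CAP_SYNC-style hint
could plug into the offload-vs-copy decision.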
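
For reference, the on-stack bio path being measured in the quoted numbers
above looks roughly like the following. This is a simplified sketch with an
illustrative function name, using 4.13-era bio fields, not the actual patch:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/mm.h>

/*
 * Simplified sketch of a synchronous single-page read using a bio
 * initialized on the stack, avoiding the bio_alloc()/mempool path that
 * rw_page was originally added to bypass.
 */
static int onstack_bio_read_page(struct block_device *bdev, sector_t sector,
				 struct page *page)
{
	struct bio bio;
	struct bio_vec bvec;

	bio_init(&bio, &bvec, 1);
	bio.bi_bdev = bdev;	/* 4.13-era field; later kernels use bio_set_dev() */
	bio.bi_iter.bi_sector = sector;
	bio.bi_opf = REQ_OP_READ | REQ_SYNC;

	if (!bio_add_page(&bio, page, PAGE_SIZE, 0))
		return -EIO;

	/* Block until the I/O completes; no dynamic bio allocation involved. */
	return submit_bio_wait(&bio);
}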