Date: Mon, 7 Aug 2017 17:23:47 +0900
From: Minchan Kim <minchan@kernel.org>
To: Dan Williams
Cc: Ross Zwisler, Matthew Wilcox, Andrew Morton, linux-kernel@vger.kernel.org, "karam . lee", Jerome Marchand, Nitin Gupta, seungho1.park@lge.com, Christoph Hellwig, Dave Chinner, Jan Kara, Jens Axboe, Vishal Verma, linux-nvdimm@lists.01.org, Dave Jiang
Subject: Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt
Message-ID: <20170807082347.GA24466@bbox>

On Fri, Aug 04, 2017 at 11:24:49AM -0700, Dan Williams wrote:
> On Fri, Aug 4, 2017 at 11:21 AM, Ross Zwisler wrote:
> > On Fri, Aug 04, 2017 at 11:01:08AM -0700, Dan Williams wrote:
> >> [ adding Dave who is working on a blk-mq + dma offload version of the
> >> pmem driver ]
> >>
> >> On Fri, Aug 4, 2017 at 1:17 AM, Minchan Kim wrote:
> >> > On Fri, Aug 04, 2017 at 12:54:41PM +0900, Minchan Kim wrote:
> >> [..]
> >> >> Thanks for the testing.
> >> >> Is your test number within the noise level?
> >> >>
> >> >> I cannot understand why PMEM doesn't gain much while BTT shows a
> >> >> significant win (8%). I guess the BTT test without rw_page had more
> >> >> chances to wait on dynamic bio allocation, and both my on-stack bio
> >> >> version and rw_page reduced that significantly. However, with pmem
> >> >> and no rw_page, there weren't many cases of waiting on bio allocation
> >> >> because the device is so fast, so the difference comes purely from
> >> >> the number of instructions executed. At a quick glance, bio
> >> >> init/submit is not trivial, so I do understand where the 12%
> >> >> improvement comes from, but I'm not sure it's a really big difference
> >> >> in practice, given the maintenance burden.
> >> >
> >> > I tested pmbench 10 times on my local machine (4 cores) with zram-swap.
> >> > On my machine, on-stack bio is even faster than rw_page. Unbelievable.
> >> >
> >> > I guess it's really hard to get stable results under severe memory
> >> > pressure. The difference is likely within the noise level (see the
> >> > stddev below). So I think it's hard to conclude that rw_page is much
> >> > faster than on-stack bio.
> >> >
> >> >               avg     stddev   max     min
> >> > rw_page       5.54us  8.89%    6.02us  4.20us
> >> > onstack bio   5.27us  13.03%   5.96us  3.55us
> >>
> >> The maintenance burden of having alternative submission paths is
> >> significant, especially as we consider the pmem driver using more
> >> services of the core block layer. Ideally, I'd want to complete the
> >> rw_page removal work before we look at the blk-mq + dma offload
> >> reworks.
> >>
> >> The change to introduce BDI_CAP_SYNC is interesting because we might
> >> have use for switching between dma offload and cpu copy based on
> >> whether the I/O is synchronous or otherwise hinted to be a low latency
> >> request.
> >> Right now the dma offload patches are using "bio_segments() > 1" as
> >> the gate for selecting offload vs cpu copy, which seems inadequate.
> >
> > Okay, so based on the feedback above and from Jens[1], it sounds like we want
> > to go forward with removing the rw_page() interface, and instead optimize the
> > regular I/O path via on-stack BIOs and dma offload, correct?
> >
> > If so, I'll prepare patches that fully remove the rw_page() code, and let
> > Minchan and Dave work on their optimizations.
>
> I think the conversion to on-stack-bio should be done in the same
> patchset that removes rw_page. We don't want to leave a known
> performance regression while the on-stack-bio work is in-flight.

Okay. It seems everyone agrees on the on-stack bio approach.
I will send my formal patchset, including Ross's patches that remove rw_page.

Thanks.
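The claim that the quoted pmbench numbers are "within the noise level" can be checked with a quick back-of-the-envelope calculation. The sketch below is minimal and rests on two assumptions that the original does not state explicitly. First, it takes the reported "stddev" to be a percentage of the mean. Second, it assumes each setup was run 10 times. It also uses Welch's t statistic, which was picked here for illustration and was not part of the original measurements.

```python
import math

# Figures quoted from the pmbench runs in the thread.
# ASSUMPTION: "stddev" is a percentage of the mean, over RUNS runs per setup.
RUNS = 10
results = {
    "rw_page": {"avg_us": 5.54, "stddev_pct": 8.89},
    "onstack_bio": {"avg_us": 5.27, "stddev_pct": 13.03},
}


def abs_stddev(r):
    """Convert a relative stddev (percent of mean) into microseconds."""
    return r["avg_us"] * r["stddev_pct"] / 100.0


def welch_t(a, b, n):
    """Welch's t statistic for two samples of size n, from means and stddevs."""
    sa, sb = abs_stddev(a), abs_stddev(b)
    se = math.sqrt(sa ** 2 / n + sb ** 2 / n)
    return (a["avg_us"] - b["avg_us"]) / se


a, b = results["rw_page"], results["onstack_bio"]
diff_us = a["avg_us"] - b["avg_us"]

print(f"difference: {diff_us:.2f}us ({100 * diff_us / a['avg_us']:.1f}% of rw_page avg)")
print(f"stddev: rw_page {abs_stddev(a):.2f}us, onstack bio {abs_stddev(b):.2f}us")
print(f"Welch t = {welch_t(a, b, RUNS):.2f}")
```

Under these assumptions, the gap between the averages is about 0.27us, or roughly 5% of the rw_page average. That is smaller than the spread of either setup (about 0.49us and 0.69us). The t statistic comes out near 1.0, well below the roughly 2.1 that would be needed for 95% confidence with this many runs. This is consistent with the "within noise" reading.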