Subject: Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt
From: Jens Axboe
To: Ross Zwisler, Minchan Kim
Cc: Matthew Wilcox, Andrew Morton, linux-kernel@vger.kernel.org,
 "karam . lee", Jerome Marchand, Nitin Gupta, seungho1.park@lge.com,
 Christoph Hellwig, Dan Williams, Dave Chinner, Jan Kara, Vishal Verma,
 linux-nvdimm@lists.01.org
Date: Thu, 3 Aug 2017 15:17:04 -0600
Message-ID: <38dbe660-24fe-812f-d66f-1329c2d3aacb@kernel.dk>
In-Reply-To: <20170803211335.GA1260@linux.intel.com>
References: <20170728165604.10455-1-ross.zwisler@linux.intel.com>
 <20170728173143.GE15980@bombadil.infradead.org>
 <20170802221359.GA20666@linux.intel.com>
 <20170803001315.GF32020@bbox>
 <20170803211335.GA1260@linux.intel.com>

On 08/03/2017 03:13 PM, Ross Zwisler wrote:
> On Thu, Aug 03, 2017 at 09:13:15AM +0900, Minchan Kim wrote:
>> Hi Ross,
>>
>> On Wed, Aug 02, 2017 at 04:13:59PM -0600, Ross Zwisler wrote:
>>> On Fri, Jul 28, 2017 at 10:31:43AM -0700, Matthew Wilcox wrote:
>>>> On Fri, Jul 28, 2017 at 10:56:01AM -0600, Ross Zwisler wrote:
>>>>> Dan Williams and Christoph Hellwig have recently expressed doubt about
>>>>> whether the rw_page() interface made sense for synchronous memory drivers
>>>>> [1][2]. It's unclear whether this interface has any performance benefit
>>>>> for these drivers, but as we continue to fix bugs it is clear that it does
>>>>> have a maintenance burden. This series removes the rw_page()
>>>>> implementations in brd, pmem and btt to relieve this burden.
>>>>
>>>> Why don't you measure whether it has performance benefits? I don't
>>>> understand why zram would see performance benefits and not other drivers.
>>>> If it's going to be removed, then the whole interface should be removed,
>>>> not just have the implementations removed from some drivers.
>>>
>>> Okay, I've run a bunch of performance tests with the PMEM and with BTT entry
>>> points for rw_pages() in a swap workload, and in all cases I do see an
>>> improvement over the code when rw_pages() is removed.
>>> Here are the results from my random lab box:
>>>
>>> Average latency of swap_writepage()
>>> +------+------------+---------+-------------+
>>> |      | no rw_page | rw_page | Improvement |
>>> +------+------------+---------+-------------+
>>> | PMEM |   5.0 us   | 4.7 us  |      6%     |
>>> +------+------------+---------+-------------+
>>> | BTT  |   6.8 us   | 6.1 us  |     10%     |
>>> +------+------------+---------+-------------+
>>>
>>> Average latency of swap_readpage()
>>> +------+------------+---------+-------------+
>>> |      | no rw_page | rw_page | Improvement |
>>> +------+------------+---------+-------------+
>>> | PMEM |   3.3 us   | 2.9 us  |     12%     |
>>> +------+------------+---------+-------------+
>>> | BTT  |   3.7 us   | 3.4 us  |      8%     |
>>> +------+------------+---------+-------------+
>>>
>>> The workload was pmbench, a memory benchmark, run on a system where I had
>>> severely restricted the amount of memory in the system with the 'mem' kernel
>>> command line parameter. The benchmark was set up to test more memory than I
>>> allowed the OS to have, so it spilled over into swap.
>>>
>>> The PMEM or BTT device was set up as my swap device, and during the test I
>>> got a few hundred thousand samples of each of swap_writepage() and
>>> swap_readpage(). The PMEM/BTT device was just memory reserved with the
>>> memmap kernel command line parameter.
>>>
>>> Thanks, Matthew, for asking for performance data. It looks like removing
>>> this code would have been a mistake.
>>
>> At Christoph Hellwig's suggestion, I made a quick patch which does swap IO
>> without dynamic bio allocation. It's not yet a formal patch worth sending
>> mainline, but I believe it's enough to test the improvement.
>>
>> Could you test the patchset on pmem and btt without rw_page?
>>
>> For the patch to work, block drivers need to declare that they are
>> synchronous IO devices via BDI_CAP_SYNC. If that's difficult, you can
>> instead force every swap IO down the (sis->flags & SWP_SYNC_IO) path by
>> removing the
>>
>>     if (!(sis->flags & SWP_SYNC_IO))
>>
>> check in swap_[read|write]page.
>>
>> The patchset is based on 4.13-rc3.
>
> Thanks for the patch, here are the updated results from my test box:
>
> Average latency of swap_writepage()
> +------+------------+---------+---------+
> |      | no rw_page | minchan | rw_page |
> +------+------------+---------+---------+
> | PMEM |   5.0 us   | 4.98 us | 4.7 us  |
> +------+------------+---------+---------+
> | BTT  |   6.8 us   | 6.3 us  | 6.1 us  |
> +------+------------+---------+---------+
>
> Average latency of swap_readpage()
> +------+------------+---------+---------+
> |      | no rw_page | minchan | rw_page |
> +------+------------+---------+---------+
> | PMEM |   3.3 us   | 3.27 us | 2.9 us  |
> +------+------------+---------+---------+
> | BTT  |   3.7 us   | 3.44 us | 3.4 us  |
> +------+------------+---------+---------+
>
> I've added another digit of precision in some cases to help differentiate the
> various results.
>
> In all cases your patches did perform better than with the regularly
> allocated BIO, but again in all cases the rw_page() path was the fastest,
> even if only marginally.

IMHO, the win needs to be pretty substantial to justify keeping a parallel
read/write path in the kernel. The recent work on making O_DIRECT faster is
exactly the same as what Minchan did here for sync IO. I would greatly prefer
one fast path, instead of one fast path and one that's just a little faster
for some things. It's much better to get everyone behind one path/stack, and
make that as fast as it can be.

-- 
Jens Axboe
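
As a rough illustration of the approach Minchan describes above, here is a
minimal C sketch of the dispatch in swap_writepage(). This is not his actual
patch: BDI_CAP_SYNC and SWP_SYNC_IO are names taken from his out-of-tree patch
(they are not in mainline 4.13), the flag value and the swap_writepage_sync()
helper are hypothetical stand-ins, and only page_swap_info(),
__swap_writepage() and end_swap_bio_write() are existing kernel interfaces.

/*
 * Sketch only -- not Minchan's actual patch. It illustrates the dispatch he
 * describes above: swap IO aimed at a device that advertised itself as
 * synchronous (BDI_CAP_SYNC in his patch) is flagged SWP_SYNC_IO and is
 * submitted without allocating a bio dynamically, while everything else
 * takes the usual __swap_writepage() path.
 */
#include <linux/swap.h>
#include <linux/writeback.h>

/* Assumed flag bit; the real value would come from Minchan's patch. */
#define SWP_SYNC_IO	(1 << 12)

/*
 * Hypothetical helper standing in for the no-allocation submission path:
 * build the bio on the stack (or call straight into the driver) and wait
 * for completion inline.
 */
static int swap_writepage_sync(struct page *page,
			       struct writeback_control *wbc)
{
	return 0;
}

int swap_writepage_sketch(struct page *page, struct writeback_control *wbc)
{
	struct swap_info_struct *sis = page_swap_info(page);

	/* Device declared synchronous IO: skip dynamic bio allocation. */
	if (sis->flags & SWP_SYNC_IO)
		return swap_writepage_sync(page, wbc);

	/* Normal path: dynamically allocated bio, asynchronous completion. */
	return __swap_writepage(page, wbc, end_swap_bio_write);
}

/*
 * Driver side, per the thread: a driver like pmem would mark its queue as
 * synchronous during setup, roughly
 *
 *	q->backing_dev_info->capabilities |= BDI_CAP_SYNC;
 *
 * so that swapon on that device can set SWP_SYNC_IO.
 */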