Date: Tue, 15 Aug 2017 00:06:20 +0900
From: Minchan Kim
To: Jens Axboe
Cc: Christoph Hellwig, Dan Williams, Matthew Wilcox, Andrew Morton,
	Linux Kernel Mailing List, linux-mm, Ross Zwisler, "karam . lee",
	seungho1.park@lge.com, Dave Chinner, Jan Kara, Vishal Verma,
	"linux-nvdimm@lists.01.org", kernel-team
Subject: Re: [PATCH v1 2/6] fs: use on-stack-bio if backing device has BDI_CAP_SYNC capability
Message-ID: <20170814150620.GA12657@bgram>
In-Reply-To: <51f7472a-977b-be69-2688-48f2a0fa6fb3@kernel.dk>

On Mon, Aug 14, 2017 at 08:36:00AM -0600, Jens Axboe wrote:
> On 08/14/2017 02:50 AM, Minchan Kim wrote:
> > Hi Jens,
> >
> > On Fri, Aug 11, 2017 at 08:26:59AM -0600, Jens Axboe wrote:
> >> On 08/11/2017 04:46 AM, Christoph Hellwig wrote:
> >>> On Wed, Aug 09, 2017 at 08:06:24PM -0700, Dan Williams wrote:
> >>>> I like it, but do you think we should switch to sbvec[] to
> >>>> preclude pathological cases where nr_pages is large?
> >>>
> >>> Yes, please.
> >>>
> >>> Then I'd like to see that the on-stack bio even matters for
> >>> mpage_readpage / mpage_writepage. Compared to all the buffer head
> >>> overhead the bio allocation should not actually matter in practice.
> >>
> >> I'm skeptical for that path, too. I also wonder how far we could go
> >> with just doing a per-cpu bio recycling facility, to reduce the cost
> >> of having to allocate a bio. The on-stack bio parts are fine for
> >> simple use case, where simple means that the patch just special
> >> cases the allocation, and doesn't have to change much else.
> >>
> >> I had a patch for bio recycling and batched freeing a year or two
> >> ago, I'll see if I can find and resurrect it.
> >
> > So, you want to go with per-cpu bio recycling approach to
> > remove rw_page?
> >
> > So, do you want me to hold this patchset?
>
> I don't want to hold this series up, but I do think the recycling is
> a cleaner approach since we don't need to special case anything. I
> hope I'll get some time to dust it off, retest, and post soon.

I don't know how your bio recycling works yet, but my worry when I first
heard about per-cpu bio recycling was this: if the pool is not reserved
for BDI_CAP_SYNCHRONOUS devices (IOW, if it is shared by several storage
devices), the BIOs can be consumed by a slow device (e.g., eMMC), so a
bio for the fastest device in the system (e.g., zram in an embedded
system) can get stuck waiting until the IO for the slow device completes.
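To make the worry concrete, this is the kind of shared per-cpu cache I am
imagining. It is only a guess at the shape of your patch, not the real
thing; struct bio_cache, cpu_bio_cache and bio_cache_get() are names I
made up for illustration:

#include <linux/bio.h>
#include <linux/percpu.h>

struct bio_cache {
	struct bio_list	free;		/* recycled bios, shared by all devices */
	unsigned int	nr_free;
};
static DEFINE_PER_CPU(struct bio_cache, cpu_bio_cache);

static struct bio *bio_cache_get(gfp_t gfp)
{
	struct bio_cache *bc = get_cpu_ptr(&cpu_bio_cache);
	struct bio *bio = bio_list_pop(&bc->free);

	if (bio)
		bc->nr_free--;
	put_cpu_ptr(&cpu_bio_cache);
	if (bio)
		return bio;

	/*
	 * Cache drained, e.g., by heavy eMMC writeback: a BDI_CAP_SYNC
	 * device like zram now pays the full allocation cost (or waits
	 * behind the slow device's in-flight bios) on every swap IO.
	 */
	return bio_alloc(gfp, 1);
}

If the cache were reserved per BDI_CAP_SYNC device instead of shared,
that starvation would not happen, but then it starts to look like a
special case again.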
I guess that would not be a rare case for a swap device under severe
memory pressure, because most of the page cache has already been
reclaimed by the time anonymous pages start to be reclaimed, so many
BIOs could be consumed by eMMC to fetch code while the swap IO to fetch
heap data would be stuck, even though zram-swap is much faster than
eMMC. Likewise, time spent waiting for a BIO even among the fastest
devices is simply wasted, I guess.

To me, the on-stack bio suggested by Christoph Hellwig doesn't diverge
much from the current path and is simple enough to change (rough sketch
at the bottom of this mail).

Anyway, I'm okay with either way if we can remove rw_page without any
regression, because maintaining both rw_page and make_request is rather
a burden for zram, too.
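For what it's worth, the on-stack path I have in mind looks roughly like
this: my reading of Christoph's suggestion plus Dan's sbvec[] comment,
not the actual patch. sync_read_page() and SBVEC_SIZE are made-up names
for illustration:

#include <linux/bio.h>

#define SBVEC_SIZE	16	/* arbitrary cap so stack usage stays bounded */

static int sync_read_page(struct block_device *bdev, sector_t sector,
			  struct page *page)
{
	struct bio bio;
	struct bio_vec sbvec[SBVEC_SIZE];

	/* No bio_alloc()/mempool at all: the bio lives on the stack. */
	bio_init(&bio, sbvec, SBVEC_SIZE);
	bio.bi_bdev = bdev;
	bio.bi_iter.bi_sector = sector;
	bio_add_page(&bio, page, PAGE_SIZE, 0);
	bio_set_op_attrs(&bio, REQ_OP_READ, 0);

	/* Only safe because the bio cannot outlive this stack frame. */
	return submit_bio_wait(&bio);
}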