Date: Sat, 21 Nov 2015 07:36:02 +1100
From: Dave Chinner
To: Brian Foster
Cc: linux-fsdevel@vger.kernel.org, Octavian Purdila, linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [RFC PATCH] xfs: support for non-mmu architectures
Message-ID: <20151120203602.GA26718@dastard>
References: <1447800381-20167-1-git-send-email-octavian.purdila@intel.com> <20151119155525.GB13055@bfoster.bfoster> <20151119233547.GN14311@dastard> <20151120151118.GB60886@bfoster.bfoster>
In-Reply-To: <20151120151118.GB60886@bfoster.bfoster>

On Fri, Nov 20, 2015 at 10:11:19AM -0500, Brian Foster wrote:
> On Fri, Nov 20, 2015 at 10:35:47AM +1100, Dave Chinner wrote:
> > On Thu, Nov 19, 2015 at 10:55:25AM -0500, Brian Foster wrote:
> > > On Wed, Nov 18, 2015 at 12:46:21AM +0200, Octavian Purdila wrote:
> > > > Naive implementation for non-mmu architectures: allocate physically
> > > > contiguous xfs buffers with alloc_pages. Terribly inefficient with
> > > > memory and fragmentation on high I/O loads, but it may be good enough
> > > > for basic usage (which most non-mmu architectures will need).
> > > >
> > > > This patch was tested with lklfuse [1] and basic operations seem to
> > > > work even with 16MB allocated for LKL.
> > > >
> > > > [1] https://github.com/lkl/linux
> > > >
> > > > Signed-off-by: Octavian Purdila
> > > > ---
> > >
> > > Interesting, though this makes me wonder why we couldn't have a new
> > > _XBF_VMEM (for example) buffer type that uses vmalloc(). I'm not
> > > familiar with mmu-less contexts, but I see that mm/nommu.c has a
> > > __vmalloc() interface that looks like it ultimately translates into an
> > > alloc_pages() call. Would that accomplish what this patch is currently
> > > trying to do?
> >
> > vmalloc is always a last resort. vmalloc space on 32 bit systems is
> > extremely limited and it is easy to exhaust with XFS.
> >
>
> Sure, but my impression is that a vmalloc() buffer is roughly equivalent
> in this regard to a current !XBF_UNMAPPED && size > PAGE_SIZE buffer. We
> just do the allocation and mapping separately (presumably for other
> reasons).

Yes, it's always a last resort. We don't use vmap'd buffers very much
on block size <= page size filesystems (e.g. iclog buffers are the main
user in such cases, IIRC), so the typical 32 bit system doesn't have
major problems with vmalloc space. However, the moment you increase the
directory block size beyond the filesystem block size, that all goes
out the window...

> > Also, vmalloc limits the control we have over allocation context
> > (e.g. the hoops we jump through in kmem_alloc_large() to maintain
> > GFP_NOFS contexts), so just using vmalloc doesn't make things much
> > simpler from an XFS perspective.
> >
>
> The comment in kmem_zalloc_large() calls out some apparent hardcoded
> allocation flags down in the depths of vmalloc(). It looks to me that
> page allocation (__vmalloc_area_node()) actually uses the provided
> flags, so I'm not following the "data page" part of that comment.

You can pass gfp flags for the page allocation part of vmalloc, but not
the pte allocation part of it.
That's what the hacks in kmem_zalloc_large() are doing.

> Indeed, I do see that this is not the case down in calls like
> pmd_alloc_one(), pte_alloc_one_kernel(), etc., associated with page
> table management.

Right.

> Those latter calls are all from following down through the
> map_vm_area()->vmap_page_range() codepath from __vmalloc_area_node(). We
> call vm_map_ram() directly from _xfs_buf_map_pages(), which itself calls
> down into the same code. Indeed, we already protect ourselves here via
> the same memalloc_noio_save() mechanism that kmem_zalloc_large() uses.

Yes, we do, but that is handled separately from the allocation of the
pages, which we have to do for all types of buffers, mapped or
unmapped, because xfs_buf_ioapply_map() requires direct access to the
underlying pages to build the bio for IO. If we delegate the allocation
of pages to vmalloc, we don't have direct references to the underlying
pages, and so we'd have to do something completely different to build
the bios for the buffer....

> I suspect there's more to it than that because it does look like
> vm_map_ram() has a different mechanism for managing vmalloc space for
> certain (smaller) allocations, neither of which I'm really familiar
> with.

Yes, it manages vmalloc space quite differently, and there are
different scalability aspects to consider as well - vm_map_ram() was
pretty much written for the use XFS makes of it in xfs_buf.c...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com