Date: Fri, 20 Nov 2015 10:40:20 -0500
From: Brian Foster
To: Octavian Purdila
Cc: linux-fsdevel, lkml, xfs
Subject: Re: [RFC PATCH] xfs: support for non-mmu architectures
Message-ID: <20151120154020.GD60886@bfoster.bfoster>
References: <1447800381-20167-1-git-send-email-octavian.purdila@intel.com> <20151119155525.GB13055@bfoster.bfoster> <20151119233547.GN14311@dastard> <20151120151118.GB60886@bfoster.bfoster>

On Fri, Nov 20, 2015 at 05:35:55PM +0200, Octavian Purdila wrote:
> On Fri, Nov 20, 2015 at 5:11 PM, Brian Foster wrote:
> > On Fri, Nov 20, 2015 at 10:35:47AM +1100, Dave Chinner wrote:
> >> On Thu, Nov 19, 2015 at 10:55:25AM -0500, Brian Foster wrote:
> >> > On Wed, Nov 18, 2015 at 12:46:21AM +0200, Octavian Purdila wrote:
> >> > > Naive implementation for non-mmu architectures: allocate physically
> >> > > contiguous xfs buffers with alloc_pages. Terribly inefficient with
> >> > > memory and fragmentation on high I/O loads but it may be good enough
> >> > > for basic usage (which most non-mmu architectures will need).
> >> > >
> >> > > This patch was tested with lklfuse [1] and basic operations seems to
> >> > > work even with 16MB allocated for LKL.
> >> > >
> >> > > [1] https://github.com/lkl/linux
> >> > >
> >> > > Signed-off-by: Octavian Purdila
> >> > > ---
> >> >
> >> > Interesting, though this makes me wonder why we couldn't have a new
> >> > _XBF_VMEM (for example) buffer type that uses vmalloc(). I'm not
> >> > familiar with mmu-less context, but I see that mm/nommu.c has a
> >> > __vmalloc() interface that looks like it ultimately translates into an
> >> > alloc_pages() call. Would that accomplish what this patch is currently
> >> > trying to do?
> >>
> >> vmalloc is always a last resort. vmalloc space on 32 bit systems is
> >> extremely limited and it is easy to exhaust with XFS.
> >>
> >
> > Sure, but my impression is that a vmalloc() buffer is roughly equivalent
> > in this regard to a current !XBF_UNMAPPED && size > PAGE_SIZE buffer. We
> > just do the allocation and mapping separately (presumably for other
> > reasons).
> >
> >> Also, vmalloc limits the control we have over allocation context
> >> (e.g. the hoops we jump through in kmem_alloc_large() to maintain
> >> GFP_NOFS contexts), so just using vmalloc doesn't make things much
> >> simpler from an XFS perspective.
> >>
> >
> > The comment in kmem_zalloc_large() calls out some apparent hardcoded
> > allocation flags down in the depths of vmalloc(). It looks to me that
> > page allocation (__vmalloc_area_node()) actually uses the provided
> > flags, so I'm not following the "data page" part of that comment.
> > Indeed, I do see that this is not the case down in calls like
> > pmd_alloc_one(), pte_alloc_one_kernel(), etc., associated with page
> > table management.
> >
> > Those latter calls are all from following down through the
> > map_vm_area()->vmap_page_range() codepath from __vmalloc_area_node(). We
> > call vm_map_ram() directly from _xfs_buf_map_pages(), which itself calls
> > down into the same code. Indeed, we already protect ourselves here via
> > the same memalloc_noio_save() mechanism that kmem_zalloc_large() uses.
> >
> > I suspect there's more to it than that because it does look like
> > vm_map_ram() has a different mechanism for managing vmalloc space for
> > certain (smaller) allocations, either of which I'm not really familiar
> > with. That aside, I don't see how vmalloc() introduces any new
> > allocation context issues for those buffers where we already set up a
> > multi-page mapping.
> >
> > We still have the somewhat customized page allocation code in
> > xfs_buf_allocate_memory() to contend with. I actually think it would be
> > useful to have a DEBUG sysfs tunable to turn on vmalloc() buffers and
> > actually test how effective some of this code is.
> >
> >> > I ask because it seems like that would help clean up the code a bit, for
> >> > one. It might also facilitate some degree of testing of the XFS bits
> >> > (even if utilized sparingly in DEBUG mode if it weren't suitable enough
> >> > for generic/mmu use). We currently allocate and map the buffer pages
> >> > separately and I'm not sure if there's any particular reasons for doing
> >> > that outside of some congestion handling in the allocation code and
> >> > XBF_UNMAPPED buffers, the latter probably being irrelevant for nommu.
> >> > Any other thoughts on that?
> >>
> >> We could probably clean the code up more (the allocation logic
> >> is now largely a historic relic) but I'm not convinced yet that we
> >> should be spending any time trying to specifically support mmu-less
> >> hardware.
> >>
> >
> > Fair point, we'll see where the use case discussion goes. That said, I
> > was a little surprised that this is all that was required to enable
> > nommu support. If that is indeed the case and we aren't in for a series
> > of subsequent nommu specific changes (Octavian?) by letting this
> > through, what's the big deal? This seems fairly harmless to me as is,
> > particularly if it can be semi-tested via DEBUG mode and has potential
> > generic use down the road.
> >
>
> I don't foresee additional patches.
> I was able to use lklfuse to mount
> an XFS image and perform basic operations. Are there any xfs specific
> tests coverage tools I can use to make sure I am not missing anything?
>

Ok, well you probably want to run some of the tests in xfstests and see
if anything falls over:

https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/

Note that this has generic as well as fs-specific tests for other
filesystems (ext4, btrfs, etc.).

Brian