Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262124AbUKVQk7 (ORCPT ); Mon, 22 Nov 2004 11:40:59 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262159AbUKVQhx (ORCPT ); Mon, 22 Nov 2004 11:37:53 -0500 Received: from mx1.redhat.com ([66.187.233.31]:49038 "EHLO mx1.redhat.com") by vger.kernel.org with ESMTP id S262153AbUKVQHX (ORCPT ); Mon, 22 Nov 2004 11:07:23 -0500 From: David Howells In-Reply-To: <20041122144127.GE2714@holomorphy.com> References: <20041122144127.GE2714@holomorphy.com> <11948.1101130077@redhat.com> To: William Lee Irwin III Cc: torvalds@osdl.org, akpm@osdl.org, hch@infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] Compound page overhaul User-Agent: EMH/1.14.1 SEMI/1.14.5 (Awara-Onsen) FLIM/1.14.5 (Demachiyanagi) APEL/10.6 Emacs/21.3 (i386-redhat-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.5 - "Awara-Onsen") Content-Type: text/plain; charset=US-ASCII Date: Mon, 22 Nov 2004 16:07:06 +0000 Message-ID: <14391.1101139626@redhat.com> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5937 Lines: 149 William Lee Irwin III wrote: > > (1) A new bit flag PG_compound_slave has been added. This is used to mark > > ... > There are a lot of ways to do these things. Most of it is bitpacking > and dodging assumptions in other code about various fields always being > something or other they expect (e.g. bh's vs. page->private). I want to avoid putting magic numbers in page->private. What goes in there could be anything, as it's up to the filesystem. Do you have any preferences? I'd prefer to use page->mapping, I think, except that's used for the destructor. I should probably make a set-destructor function for hugetlbfs to call. > A generally innocuous rearrangement. Some explanation of the advantage > of these new bitpacking and field arrangements over the current > arrangement may be good to have. The only differences are: (1) PG_compound_slave now exists. (2) I'm permitting the owner of the page to do do what it will with page->private on the first page. > > (3) __page_first() is now provided to find the first page of any page set > > (even single page sets). > > I have to question the underscores. The underscores can be dropped if they're not wanted. > Also, there's a commonly-used term in the superpage literature, ``head of > the superpage'', that may be more easily recognizable for readers familiar > with that but not Linux specifics, but that's just nomenclature and not > particularly pressing or any kind of requirement, just a non-Linux > precedent. I couldn't think of a good name for it, so I settled on it being the first page. How about page_head()? > __GFP_COMP was introduced because several unusual drivers allocate > higher-order pages and then move on to free fragments of them. There's > a small danger some others may allocate higher-order pages and then > treat each piece as a separate entity (particularly in the freeing pass). I wasn't aware of that. Looking at the mm code, doing a fragmentary release would cause bad_page() to be invoked. Presumably these drivers modify the various struct pages involved directly to keep the allocator happy. It would be better, I think, to provide a page splitter function. Thus allowing pages to be cut in half, and then have the two halves made into the equivalent allocated pages. > Sweeping affected drivers to use a fragmenting primitive may help here. Do you know which drivers? > > (6) compound_page_order() is now available. This will indicate the order > ... > Possible, but it's likely a micro-optimization to cache the order in > registers across function calls. The allocator is something of a ``hot > path'' and small alterations can have noticeable effects. Yes... but the order gets examined anyway in the free page checker, and the second plus page structs get modified too, so I don't think it'll make much of a difference. Plus the filesystem or driver that owned the page won't need to keep track of the size, nor will it need to calculate it. > > (7) Trailing spaces have been cleaned up on lines in page_alloc.c. > > I like this quite a bit. =) (defun trim () "Delete trailing whitespace in buffer" (interactive) (save-excursion (goto-char (point-min)) (replace-regexp "[ \t\r]+$" "") (goto-char (point-max)) (skip-chars-backward "\n") (if (not (eobp)) (delete-region (1+ (point)) (point-max))))) > > (8) bootmem.c now has a separate path to release pages to the main > > allocator > ... > Clearly it could merely scan the bitmap for the largest properly-sized, > properly-aligned leading run of free bits beyond even that, though I > wouldn't expect you to pursue that as it's far beyond the scope of the > patch. I was hit up to deal with bootmem.c issues, and will be looking > into that and more after the current set of bootmem changes has settled > down and ia64 bootstrap has been stable for a while. I may look at doing this after this patch (or similar) goes in. If so, I'll send you the patch. > (2) The physaddr alignment comment in bootmem.c is mangled. It's not > O(LOG2(BITS_PER_LONG)) -aligned, it's exactly LOG2(BITS_PER_LONG) > aligned. But we don't have a LOG2(...) macro, we have fls()/ffs(). I suspect that's meant to be mathematical notation, not strictly compilable code, though I think there may be a missing "if" in it. > (3) page_count() probably deserves the %0*lx treatment in __bad_page(). > Conserving screenspace when possible helps some, though that's > offset a bit against predictable field alignment. Maybe putting > variable-length fields at the end of the line would help. I don't think that matters too much. This message should never be seen, after all... That said, I think I should probably provide a 64-bit version too... some of the fields will have 16-char widths there. > Also, the pfn would be great to have there, too, while you're at it. Okay. > (4) I wonder if anyone's run with CONFIG_DEBUG_PAGEALLOC recently. > bootmem.c seems a bit early for kernel_map_pages() et al. > It could be okay depending. I can try it. BTW, should the free page checking be contingent on this option? Or maybe it should have its own option. > (5) This patch does a fair number of different things and it takes a > bit of effort to wade through some of the longer rearrangements > as they overflow 80x24. It would be helpful for reviewers if you > could break this down into a somewhat more easily-digestible > series of smaller patches. It's a little tricky to break it up logically since it's mostly incredibly interrelated. I could separate out some of the cleanups: rearrangement between files, trailing space splatting. David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/