Subject: Re: [PATCH 02/22] Do not sanity check order in the fast path
From: Dave Hansen
To: Mel Gorman
Cc: Linux Memory Management List, KOSAKI Motohiro, Christoph Lameter,
    Nick Piggin, Linux Kernel Mailing List, Lin Ming, Zhang Yanmin,
    Peter Zijlstra, Pekka Enberg, Andrew Morton
Date: Wed, 22 Apr 2009 10:30:15 -0700
Message-Id: <1240421415.10627.93.camel@nimitz>
In-Reply-To: <20090422171151.GF15367@csn.ul.ie>
References: <1240408407-21848-1-git-send-email-mel@csn.ul.ie>
    <1240408407-21848-3-git-send-email-mel@csn.ul.ie>
    <1240416791.10627.78.camel@nimitz>
    <20090422171151.GF15367@csn.ul.ie>

On Wed, 2009-04-22 at 18:11 +0100, Mel Gorman wrote:
> On Wed, Apr 22, 2009 at 09:13:11AM -0700, Dave Hansen wrote:
> > On Wed, 2009-04-22 at 14:53 +0100, Mel Gorman wrote:
> > > No user of the allocator API should be passing in an order >= MAX_ORDER
> > > but we check for it on each and every allocation. Delete this check and
> > > make it a VM_BUG_ON check further down the call path.
> >
> > Should we get the check re-added to some of the upper-level functions,
> > then? Perhaps __get_free_pages() or things like alloc_pages_exact()?
>
> I don't think so, no. It just moves the source of the text bloat and
> for the few callers that are asking for something that will never
> succeed.

Well, it's a matter of figuring out when it can succeed. Some of this
stuff, we can figure out at compile-time. Others are a bit harder.

> > I'm selfishly thinking of what I did in profile_init(). Can I slab
> > alloc it? Nope. Page allocator? Nope. Oh, well, try vmalloc():
> >
> >         prof_buffer = kzalloc(buffer_bytes, GFP_KERNEL);
> >         if (prof_buffer)
> >                 return 0;
> >
> >         prof_buffer = alloc_pages_exact(buffer_bytes, GFP_KERNEL|__GFP_ZERO);
> >         if (prof_buffer)
> >                 return 0;
> >
> >         prof_buffer = vmalloc(buffer_bytes);
> >         if (prof_buffer)
> >                 return 0;
> >
> >         free_cpumask_var(prof_cpu_mask);
> >         return -ENOMEM;
> >
>
> Can this ever actually be asking for an order larger than MAX_ORDER
> though? If so, you're condemning it to always behave poorly.

Yeah. It is based on text size. Smaller kernels with trimmed configs
and no modules have no problem fitting under MAX_ORDER, as do kernels
with larger base page sizes.

> > Same thing in __kmalloc_section_memmap():
> >
> >         page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
> >         if (page)
> >                 goto got_map_page;
> >
> >         ret = vmalloc(memmap_size);
> >         if (ret)
> >                 goto got_map_ptr;
> >
>
> If I'm reading that right, the order will never be a stupid order. It can fail
> for higher orders in which case it falls back to vmalloc() . For example,
> to hit that limit, the section size for a 4K kernel, maximum usable order
> of 10, the section size would need to be 256MB (assuming struct page size
> of 64 bytes). I don't think it's ever that size and if so, it'll always be
> sub-optimal which is a poor choice to make.

I think the section size default used to be 512M on x86 because we
concentrate on removing whole DIMMs.

> > I depend on the allocator to tell me when I've fed it too high of an
> > order. If we really need this, perhaps we should do an audit and then
> > add a WARN_ON() for a few releases to catch the stragglers.
>
> I consider it buggy to ask for something so large that you always end up
> with the worst option - vmalloc(). How about leaving it as a VM_BUG_ON
> to get as many reports as possible on who is depending on this odd
> behaviour?
>
> If there are users with good reasons, then we could convert this to WARN_ON
> to fix up the callers. I suspect that the allocator can already cope with
> receiving a stupid order silently but slowly. It should go all the way to the
> bottom and just never find anything useful and return NULL. zone_watermark_ok
> is the most dangerous looking part but even it should never get to MAX_ORDER
> because it should always find there are not enough free pages and return
> before it overruns.

Whatever we do, I'd agree that it's fine that this is a degenerate case
that gets handled very slowly and as far out of hot paths as possible.
Anybody who can fall back to a vmalloc is not doing these things very
often.

-- Dave