From: Goswin von Brederlow
To: Nick Piggin
Cc: Christoph Lameter, Mel Gorman, andrea@suse.de,
    torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Christoph Hellwig,
    William Lee Irwin III, David Chinner, Jens Axboe,
    Badari Pulavarty, Maxim Levitsky, Fengguang Wu, swin wang,
    totty.lu@gmail.com, hugh@veritas.com, joern@lazybastard.org
Subject: Re: [00/41] Large Blocksize Support V7 (adds memmap support)
References: <20070911060349.993975297@sgi.com>
    <200709111606.10873.nickpiggin@yahoo.com.au>
    <200709120407.48344.nickpiggin@yahoo.com.au>
Date: Fri, 14 Sep 2007 18:10:31 +0200
In-Reply-To: <200709120407.48344.nickpiggin@yahoo.com.au> (Nick Piggin's
    message of "Wed, 12 Sep 2007 04:07:47 +1000")
Message-ID: <87zlzpqlc8.fsf@informatik.uni-tuebingen.de>

Hi,

Nick Piggin writes:

> In my attack, I cause the kernel to allocate lots of unmovable
> allocations and deplete movable groups. I theoretically then only
> need to keep a small number (1/2^N) of these allocations around in
> order to DoS a page allocation of order N.

I'm assuming that when an unmovable allocation hijacks a movable
group, any further unmovable alloc will evict movable objects out of
that group before hijacking another one. Right?

> And it doesn't even have to be a DoS. The natural fragmentation
> that occurs in a kernel today has the possibility to slowly push
> out the movable groups and give you the same situation.

How would you cause that? Say you do want to purposefully place one
unmovable 4k page into every 64k compound page. So you allocate 4k:
the first 64k page is now locked. But to get a 4k page into the
second 64k page you first have to use up all the rest of the first
64k page, meaning one more 4k chunk, one 8k chunk, one 16k chunk and
one 32k chunk. Only then will a new 64k chunk be broken up and become
locked.

So to get the last 64k chunk used, all previous 32k chunks need to be
blocked and you need to allocate 32k (or less if more is blocked).
For all previous 32k chunks to be blocked, every second 16k chunk
needs to be blocked. To block the last of those 16k chunks, all
previous 8k chunks need to be blocked and you need to allocate 8k.
For all previous 8k chunks to be blocked, every second 4k page needs
to be used. And to alloc the last of those 4k pages, all previous 4k
pages need to be used.

So to construct a situation where no continuous 64k chunk is free you
have to allocate
- 64k
- 32k
- 16k
- 8k
- 4k
(or thereabouts) of memory first. Only then could you free memory
again while still keeping every 64k page blocked. Does that occur
naturally, given enough RAM to start with?
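Note that once such a situation exists, very little memory has to
stay pinned to keep it that way: one 4k page per 64k chunk, i.e.
Nick's 1/2^N with N = 4. A quick back-of-the-envelope calculation in
C (the sizes below are my own example numbers, purely illustrative):

#include <stdio.h>

int main(void)
{
    long ram_kb  = 256 * 1024;   /* 256MB of RAM                   */
    int  page_kb = 4;            /* 4k base pages                  */
    int  order   = 4;            /* 64k chunks = order-4 blocks    */
    long blocks  = ram_kb / (page_kb << order);

    /* one pinned 4k page per 64k block denies every 64k alloc */
    printf("%ld 64k blocks, %ld kB pinned = 1/%d of RAM\n",
           blocks, blocks * (long)page_kb, 1 << order);
    return 0;
}

For 256MB that prints 4096 blocks and 16384 kB, so pinning just 16MB
(1/16 of RAM) is enough to deny every order-4 allocation.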
To see how bad fragmentation could get I wrote a little program to
simulate allocations with the following simplified algorithm:

Memory management:
- Free pages are kept in buckets, one per order, sorted by address.
- alloc() takes the front page (smallest address) out of the bucket
  of the right order, or recursively splits the next higher bucket.
- free() recursively tries to merge a page with its neighbour and
  puts the result back into the proper bucket (sorted by address).

Allocation and lifetime:
- Every tick a new page is allocated with a random order.
- The order follows a triangle distribution with its maximum at 0
  (throw 2 dice, add the eyes, subtract 7, abs() the number).
- The page is scheduled to be freed after X ticks, where X is nearly
  a Gaussian curve centered at 0 and maximum at * 1.5. (What I
  actually do is throw 8 dice, sum them up and shift the result.)

Display:
I start with a white window. Every page allocation draws a black box
starting at the address of the page and as wide as the page is big
(minus 1 pixel to give a separation to the next page). Every page
free draws a yellow box in place of the black one: yellow shows
where a page was in use at some point, while white means the page
was never used.

As time ticks, the memory fills up: quickly at first, then it comes
to a stop at around 80% full. And then something interesting
happens. The yellow regions (previously used but now free) start
drifting up. Small pages tend to end up at the lower addresses and
big pages at the higher addresses. The memory defragments itself to
some degree.

http://mrvn.homeip.net/fragment/

Simulating 256MB RAM, after 1472943 ticks and 530095 4k, 411841 8k,
295296 16k, 176647 32k and 59064 64k allocations you get this:

http://mrvn.homeip.net/fragment/256mb.png

Simulating 1GB RAM, after 5881185 ticks and 2116671 4k, 1645957 8k,
1176994 16k, 705873 32k and 235690 64k allocations you get this:

http://mrvn.homeip.net/fragment/1gb.png

MfG
        Goswin
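PS: For anyone who wants to play with this, here is a minimal C
sketch of the simulator's core as described above. The sorted
per-order buckets and the two dice distributions follow the
description; the live[] table, the tick count, the lifetime scaling
and all names are made up for illustration and not taken from the
actual program (which also does the graphical output):

#include <stdio.h>
#include <stdlib.h>

#define MAX_ORDER 4                  /* orders 0..4 = 4k..64k pages  */
#define PAGES     (64 * 1024)        /* 256MB worth of 4k pages      */
#define MAX_LIVE  PAGES              /* cap on live allocations      */

struct block { unsigned addr; struct block *next; };

static struct block *bucket[MAX_ORDER + 1];   /* sorted by address   */

static struct { unsigned addr; int order; long free_at; } live[MAX_LIVE];
static int nlive;

/* free(): merge with the buddy where possible, then insert sorted   */
static void put_block(unsigned addr, int order)
{
    struct block **p, *b;

    if (order < MAX_ORDER) {
        unsigned buddy = addr ^ (1u << order);
        for (p = &bucket[order]; *p; p = &(*p)->next)
            if ((*p)->addr == buddy) {        /* merge, go one up    */
                b = *p;
                *p = b->next;
                free(b);
                put_block(addr < buddy ? addr : buddy, order + 1);
                return;
            }
    }
    for (p = &bucket[order]; *p && (*p)->addr < addr; p = &(*p)->next)
        ;
    b = malloc(sizeof *b);
    b->addr = addr;
    b->next = *p;
    *p = b;
}

/* alloc(): take the front (lowest) block, splitting higher orders   */
static long get_block(int order)
{
    struct block *b = bucket[order];
    long addr;

    if (b) {
        addr = b->addr;
        bucket[order] = b->next;
        free(b);
        return addr;
    }
    if (order == MAX_ORDER)
        return -1;                            /* out of memory       */
    addr = get_block(order + 1);
    if (addr >= 0)                            /* upper half goes back */
        put_block((unsigned)addr + (1u << order), order);
    return addr;
}

static int dice(int n)                        /* sum of n d6 throws  */
{
    int sum = 0;
    while (n--)
        sum += rand() % 6 + 1;
    return sum;
}

int main(void)
{
    unsigned a;
    long tick;

    for (a = 0; a < PAGES; a += 1u << MAX_ORDER)
        put_block(a, MAX_ORDER);              /* all memory starts free */

    for (tick = 0; tick < 1000000; tick++) {
        int i, order = abs(dice(2) - 7);      /* triangle, max near 0 */
        long addr;

        if (order > MAX_ORDER)
            order = MAX_ORDER;                /* clamp for the sketch */

        for (i = 0; i < nlive; )              /* free expired pages   */
            if (live[i].free_at <= tick) {
                put_block(live[i].addr, live[i].order);
                live[i] = live[--nlive];
            } else {
                i++;
            }

        addr = get_block(order);
        if (addr >= 0 && nlive < MAX_LIVE) {
            live[nlive].addr = (unsigned)addr;
            live[nlive].order = order;
            live[nlive].free_at = tick + dice(8) * 40; /* ~gaussian   */
            nlive++;
        }
    }
    for (a = 0; a <= MAX_ORDER; a++) {        /* summarize free lists */
        struct block *b;
        int n = 0;
        for (b = bucket[a]; b; b = b->next)
            n++;
        printf("order %u: %d free blocks\n", a, n);
    }
    return 0;
}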