Date: 31 Jan 2005 10:55:38 -0000
From: linux@horizon.com
To: nigelenki@comcast.net
Cc: linux@horizon.com, linux-kernel@vger.kernel.org
Subject: Re: Patch 4/6 randomize the stack pointer

> Why not compromise, if possible?  256M of randomization, but move the
> split up to 3.5/0.5 gig, if possible.  I seem to recall seeing an option
> (though I think it was UML) to do 3.5/0.5 before; and I'm used to "a
> little worse" meaning "microbenches say it's worse, but you won't notice
> it," so perhaps this would be a good compromise.  How well tuned can
> 3G/1G be?  Come on, 1G is just a big friggin' even number.

Ah, grasshopper, so much you have to learn...  In particular, people
these days are more likely to want to move the split DOWN rather than UP.

First point: it is important that the split happens at an "even" boundary
for the highest-level page table.  That keeps it as simple as possible to
copy the shared global kernel mappings into each process's page tables.
On typical x86, each table is 1024 entries long, so each top-level entry
maps a 4G/1024 = 4M section.

However, with PAE (Physical Address Extension), a 32-bit page table entry
is no longer enough to hold the 36-bit physical address.  Instead, the
entries are 64 bits long, so only 512 fit into a 4K page.  With a 4K page
(12 offset bits) and 9 more bits from each of two levels of page tables,
two levels map only 30 bits of the 32-bit virtual address space, so Intel
added a small, 4-entry third-level page table.  With PAE, you are indeed
limited to 1G boundaries.  (Unless you want to seriously overhaul mm
setup and teardown.)

Second point: remember that, unless you want to pay the performance
penalty of one of the highmem options, you have to fit ALL of physical
memory, PLUS memory-mapped I/O (say around 128M), into the kernel's
portion of the address space.  512M of kernel space isn't enough unless
you have less than 512M (say 384M) of memory to keep track of.  That is
getting less common, *especially* on servers.  (Which are presumably an
important target audience for buffer-overflow defenses.)  Indeed, if you
have a lot of RAM and you don't have a big database that needs tons of
virtual address space, it's usually worth moving the split DOWN.

Now, what about the case where you have gobs of RAM and need a highmem
option anyway?  Well, there's a limit to what you can use highmem for.
Application memory and page cache, yes.  Kernel data structures, no.  You
can't use it for the dcache, inodes, network sockets, page tables, or the
struct page array.  And the mem_map array of struct pages (on x86, 32
bytes per page, or 1/128 of physical memory; 32M for a 4G machine) is a
fixed overhead that's subtracted before you even start.  Fully populated
64G x86 machines need 512M of mem_map, and the lowmem that remains isn't
enough to really run well in.  If you crunch kernel lowmem too tightly,
that becomes the performance-limiting resource.
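To put numbers on that last paragraph, here's a quick back-of-the-envelope
sketch.  It's ordinary userspace C, purely illustrative (the variable
names are mine); the only inputs are the 4K page size and the 32-byte
struct page quoted above:

/* mem_map overhead for a given amount of RAM, assuming 4K pages and a
 * 32-byte struct page, as described in the text above. */
#include <stdio.h>

int main(void)
{
        const unsigned long long page_size   = 4096; /* 4K pages */
        const unsigned long long struct_page = 32;   /* bytes per struct page */
        unsigned long long ram_mb[] = { 1024, 4096, 65536 }; /* 1G, 4G, 64G */

        for (int i = 0; i < 3; i++) {
                unsigned long long ram   = ram_mb[i] << 20;       /* bytes */
                unsigned long long pages = ram / page_size;
                unsigned long long map   = pages * struct_page;   /* mem_map size */

                printf("%6lluM RAM -> %8llu pages -> %4lluM of mem_map (1/%llu of RAM)\n",
                       ram_mb[i], pages, map >> 20, ram / map);
        }
        return 0;
}

Run it and you get 8M for 1G, 32M for 4G and 512M for 64G of RAM, i.e.
1/128 of physical memory gone before the kernel has allocated anything
else in lowmem.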
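And going back to the page-table point above, the same sort of toy
arithmetic (again plain userspace C using the classic 32-bit x86 numbers,
nothing authoritative) shows where the 4M and 1G split granularities
come from:

/* How much virtual address space one top-level page-table entry covers,
 * with and without PAE, on 32-bit x86 with 4K pages. */
#include <stdio.h>

int main(void)
{
        unsigned long long page = 4096;  /* 4K page => 12 offset bits */

        /* Non-PAE: 32-bit entries, 1024 per 4K table, two levels.
         * One top-level entry covers 1024 pages = 4M, so the user/kernel
         * split can sit on any 4M boundary. */
        unsigned long long nonpae = page * 1024;

        /* PAE: 64-bit entries, only 512 per 4K table, so two levels cover
         * just 30 bits; the extra 4-entry top level means each of its
         * entries covers 512 * 512 pages = 1G, forcing 1G boundaries. */
        unsigned long long pae = page * 512 * 512;

        printf("non-PAE granularity: %lluM\n", nonpae >> 20);
        printf("PAE granularity:     %lluG\n", pae >> 30);
        return 0;
}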
Anyway, the split between user and kernel address space is mandated by:

- Kernel space wants to be as big as physical RAM if possible, or not
  more than about 10x smaller if not.
- User space really depends on the application, but anything larger than
  2-3x physical memory is pointless, as actually trying to use it all
  will swap you to death.

So for 1G of physical RAM, 3G:1G is pretty close to perfect.  It was NOT
pulled out of a hat.  Depending on your applications, you may be able to
get away with a smaller user virtual address space, which could let you
work with more RAM without needing to slow the kernel down with highmem
code.

You'll find another discussion of the issues at
http://kerneltrap.org/node/2450
http://lwn.net/Articles/75174/

Finally, could I suggest a little more humility when addressing the
assembled linux-kernel developers?  I've seen Linus have to eat his words
a time or two, and I know I can't do as well.

http://marc.theaimsgroup.com/?m=91723854823435