Hi,
It annoys me that any 1G i386 machine ends up with 1/8th of its
memory as highmem. A patch like this one has been used in various places
since at least the early 2.4 days. Is there a reason why it isn't merged
yet? Note that I just hacked this one up, but I'm sure similar patches
abound. Bugs are mine.
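For reference, the 1/8th figure falls out of simple arithmetic. A minimal sketch, assuming the usual i386 layout where 128MiB of the kernel's address range is set aside for vmalloc and similar mappings; the helper names are made up for illustration and are not kernel symbols:

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
/* Assumption: 128MiB of kernel address space is reserved (vmalloc etc.). */
#define VMALLOC_RESERVE_MIB 128ULL

/* Largest amount of RAM (in MiB) the kernel can map directly as low memory. */
static uint64_t max_lowmem_mib(uint32_t page_offset)
{
    uint64_t kernel_space_mib = ((1ULL << 32) - page_offset) / MIB;

    assert(kernel_space_mib > VMALLOC_RESERVE_MIB);
    return kernel_space_mib - VMALLOC_RESERVE_MIB;
}

/* RAM (in MiB) that does not fit in low memory and ends up as highmem. */
static uint64_t highmem_mib(uint32_t page_offset, uint64_t ram_mib)
{
    uint64_t low = max_lowmem_mib(page_offset);

    return ram_mib > low ? ram_mib - low : 0;
}
```

Under that assumption, highmem_mib(0xC0000000, 1024) is 128 (the default split maps only 896MiB), while highmem_mib(0x80000000, 1024) is 0.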
Signed-off-by: Jens Axboe <[email protected]>
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index d849c68..0b2457b 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -444,6 +464,24 @@ config HIGHMEM64G
endchoice
+choice
+ depends on NOHIGHMEM
+ prompt "Memory split"
+ default DEFAULT_3G
+ help
+ Select the wanted split between kernel and user memory. On a 1G
+ machine, the 3G/1G default split will result in 128MiB of high
+ memory. Selecting a 2G/2G split will make all of memory available
+ as low memory. Note that this will make your kernel incompatible
+ with binary only kernel modules.
+
+ config DEFAULT_3G
+ bool "3G/1G user/kernel split"
+ config DEFAULT_2G
+ bool "2G/2G user/kernel split"
+
+endchoice
+
config HIGHMEM
bool
depends on HIGHMEM64G || HIGHMEM4G
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index 73296d9..be5f6b6 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -110,10 +110,22 @@ extern int page_is_ram(unsigned long pag
#endif /* __ASSEMBLY__ */
#ifdef __ASSEMBLY__
+#if defined(CONFIG_DEFAULT_3G)
#define __PAGE_OFFSET (0xC0000000)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET (0x80000000)
+#else
+#error "Bad memory split"
+#endif
#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
+#if defined(CONFIG_DEFAULT_3G)
#define __PAGE_OFFSET (0xC0000000UL)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET (0x80000000UL)
+#else
+#error "Bad memory split"
+#endif
#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
--
Jens Axboe
* Jens Axboe <[email protected]> wrote:
> Hi,
>
> It does annoy me that any 1G i386 machine will end up with 1/8th of
> the memory as highmem. A patch like this one has been used in various
> places since the early 2.4 days at least, is there a reason why it
> isn't merged yet? Note I just hacked this one up, but similar patches
> abound I'm sure. Bugs are mine.
yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
were possible. It was a larger patch to enable all this across x86, but
the Kconfig portion was removed a bit later because people _frequently_
misconfigured their kernels and then complained about the results.
so for now the trivial solution is to change the "C" to "8" in the
following line in include/asm-i386/page.h:
> #define __PAGE_OFFSET (0xC0000000)
instead of editing your .config :-)
Maybe we could try the Kconfig solution again, but it'll need a lot
better documentation, a dependency on KERNEL_DEBUG and some heavy warnings
all around.
Ingo
On Tue, Jan 10 2006, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > Hi,
> >
> > It does annoy me that any 1G i386 machine will end up with 1/8th of
> > the memory as highmem. A patch like this one has been used in various
> > places since the early 2.4 days at least, is there a reason why it
> > isn't merged yet? Note I just hacked this one up, but similar patches
> > abound I'm sure. Bugs are mine.
>
> yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
> were possible. It was a larger patch to enable all this across x86, but
> the Kconfig portion was removed a bit later because people _frequently_
> misconfigured their kernels and then complained about the results.
How is this different from all other sorts of misconfigurations? As far
as I can tell, the biggest "problem" for some is if they depend on some
binary module that will of course break with a different page offset.
For simplicity, I didn't add more than the 2/2 split, although we could
also add a 3/1 kernel/user or a 0.5/3.5 split (I think sles8 had this).
> so for now the trivial solution is to change the "C" to "8" in the
> following line in include/asm-i386/page.h:
>
> > #define __PAGE_OFFSET (0xC0000000)
>
> instead of editing your .config :-)
:-)
That is what I have been doing, but that requires me to carry this patch
along with me all the time. So it annoys me!
I would have posted a simple patch moving it to 0xB0000000 which would
solve the problem for me as well, but I didn't because I'm sure people
would be screaming at me...
> Maybe we could try the Kconfig solution again, but it'll need alot
> better documentation, dependency on KERNEL_DEBUG and some heavy warnings
> all around.
The help text could definitely be improved; it was a 30 second hackup.
Why would you want to make it depend on DEBUG?
--
Jens Axboe
On Tue, 10 Jan 2006, Jens Axboe wrote:
>> yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
>> were possible. It was a larger patch to enable all this across x86, but
>> the Kconfig portion was removed a bit later because people _frequently_
>> misconfigured their kernels and then complained about the results.
>
> How is this different than all other sorts of misconfigurations? As far
> as I can tell, the biggest "problem" for some is if they depend on some
> binary module that will of course break with a different page offset.
>
> For simplicity, I didn't add more than the 2/2 split, where we could add
> even a 3/1 kernel/user or a 0.5/3.5 (I think sles8 had this).
I prefer setting __PAGE_OFFSET to (0x78000000) on machines with 2GB of RAM.
This seems to let the kernel use the full 2GB of memory, rather than just
1920-1984 MB (at least back in 2.4 days).
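The 0x78000000 value can be derived rather than guessed. A rough sketch, assuming the kernel keeps 128MiB of its address range for vmalloc and similar mappings (page_offset_for_ram is a hypothetical helper, not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 128MiB of kernel address space is reserved for vmalloc etc. */
#define VMALLOC_RESERVE (128ULL << 20)

/*
 * Highest page offset that still lets `ram_bytes` of RAM be mapped as
 * low memory: 4GiB - RAM - reserve.
 */
static uint32_t page_offset_for_ram(uint64_t ram_bytes)
{
    uint64_t needed = ram_bytes + VMALLOC_RESERVE;

    assert(needed < (1ULL << 32));   /* must leave room for user space */
    return (uint32_t)((1ULL << 32) - needed);
}
```

For 2GiB of RAM this yields 0x78000000. A plain 0x80000000 leaves the kernel 128MiB short, which is consistent with the 1920MB lower bound mentioned above.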
-Byron
--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [email protected]
On Tue, Jan 10 2006, Byron Stanoszek wrote:
> On Tue, 10 Jan 2006, Jens Axboe wrote:
>
> >>yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
> >>were possible. It was a larger patch to enable all this across x86, but
> >>the Kconfig portion was removed a bit later because people _frequently_
> >>misconfigured their kernels and then complained about the results.
> >
> >How is this different than all other sorts of misconfigurations? As far
> >as I can tell, the biggest "problem" for some is if they depend on some
> >binary module that will of course break with a different page offset.
> >
> >For simplicity, I didn't add more than the 2/2 split, where we could add
> >even a 3/1 kernel/user or a 0.5/3.5 (I think sles8 had this).
>
> I prefer setting __PAGE_OFFSET to (0x78000000) on machines with 2GB of RAM.
> This seems to let the kernel use the full 2GB of memory, rather than just
> 1920-1984 MB (at least back in 2.4 days).
That might indeed be a good idea for 2G/2G, in the same sense that
0xC0000000 is a silly default because of the many 1G machines out there.
--
Jens Axboe
Jens Axboe writes:
> Hi,
>
> It does annoy me that any 1G i386 machine will end up with 1/8th of the
> memory as highmem. A patch like this one has been used in various places
> since the early 2.4 days at least, is there a reason why it isn't merged
> yet? Note I just hacked this one up, but similar patches abound I'm
> sure. Bugs are mine.
>
> Signed-off-by: Jens Axboe <[email protected]>
>
> diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> index d849c68..0b2457b 100644
> --- a/arch/i386/Kconfig
> +++ b/arch/i386/Kconfig
> @@ -444,6 +464,24 @@ config HIGHMEM64G
>
> endchoice
>
> +choice
> + depends on NOHIGHMEM
> + prompt "Memory split"
> + default DEFAULT_3G
> + help
> + Select the wanted split between kernel and user memory. On a 1G
> + machine, the 3G/1G default split will result in 128MiB of high
> + memory. Selecting a 2G/2G split will make all of memory available
> + as low memory. Note that this will make your kernel incompatible
> + with binary only kernel modules.
2G/2G is not the only viable alternative. On my 1GB x86 box I'm
using "lowmem1g" patches for both 2.4 and 2.6, which results in
2.75G for user-space. I'm sure others have other preferences.
Any standard option for this should either have several hard-coded
alternatives, or should support arbitrary values (within reason).
(See http://www.csd.uu.se/~mikpe/linux/patches/*/patch-i386-lowmem1g-*
if you're interested.)
/Mikael
On Tue, Jan 10 2006, Mikael Pettersson wrote:
> Jens Axboe writes:
> > Hi,
> >
> > It does annoy me that any 1G i386 machine will end up with 1/8th of the
> > memory as highmem. A patch like this one has been used in various places
> > since the early 2.4 days at least, is there a reason why it isn't merged
> > yet? Note I just hacked this one up, but similar patches abound I'm
> > sure. Bugs are mine.
> >
> > Signed-off-by: Jens Axboe <[email protected]>
> >
> > diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> > index d849c68..0b2457b 100644
> > --- a/arch/i386/Kconfig
> > +++ b/arch/i386/Kconfig
> > @@ -444,6 +464,24 @@ config HIGHMEM64G
> >
> > endchoice
> >
> > +choice
> > + depends on NOHIGHMEM
> > + prompt "Memory split"
> > + default DEFAULT_3G
> > + help
> > + Select the wanted split between kernel and user memory. On a 1G
> > + machine, the 3G/1G default split will result in 128MiB of high
> > + memory. Selecting a 2G/2G split will make all of memory available
> > + as low memory. Note that this will make your kernel incompatible
> > + with binary only kernel modules.
>
> 2G/2G is not the only viable alternative. On my 1GB x86 box I'm
Yes I know, as I wrote to Ingo I wanted to keep it really simple. It can
easily be extended, of course.
> using "lowmem1g" patches for both 2.4 and 2.6, which results in
> 2.75G for user-space. I'm sure others have other preferences.
> Any standard option for this should either have several hard-coded
> alternatives, or should support arbitrary values (within reason).
That's just asking for trouble, imho. We should provide some defaults
(that work well on 1G and 2G machines, for instance) and stick to that.
> (See http://www.csd.uu.se/~mikpe/linux/patches/*/patch-i386-lowmem1g-*
> if you're interested.)
It's similar to what I've been doing so far (just changing page.h to
0xb0000000). 0x80000000 might be a bad default as suggested by Byron, as
it just misses the full 2G.
0xb0000000 is a much better default, but I didn't think that would fly
as a patch.
--
Jens Axboe
Hi,
> 0xb0000000 is a much better default, but I didn't think that would fly
> as a patch.
I think that will not fly with CONFIG_X86_PAE. In PAE mode the 3rd pmd
(for the 0xc0000000 => 0xffffffff kernel address range) is shared, so
anything but 0xc0000000 most likely needs some more hackery than just
changing PAGE_OFFSET. As the whole point of this split patchery is to
avoid highmem in the first place, it maybe makes sense to have some
"optimize for 1/2/4/more GB main memory" config option which in turn
picks sane PAGE_OFFSET+HIGHMEM+PAE settings?
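To illustrate the alignment concern: with PAE the top-level table has four entries, each covering 1GiB of virtual address space, so a kernel range starting at 0xc0000000 sits entirely in the last one. A small sketch (the helper names are hypothetical, this is not kernel code) that checks whether a candidate offset falls on such a boundary:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAE_PGDIR_SHIFT 30u  /* each top-level PAE entry maps 1GiB */

/* Index (0..3) of the top-level PAE entry covering `addr`. */
static unsigned int pae_top_index(uint32_t addr)
{
    return addr >> PAE_PGDIR_SHIFT;
}

/* True if the kernel range [page_offset, 4GiB) starts on a 1GiB boundary. */
static bool pae_aligned(uint32_t page_offset)
{
    assert(page_offset != 0);   /* offset 0 would leave no user space */
    return (page_offset & ((1u << PAE_PGDIR_SHIFT) - 1u)) == 0;
}
```

0xc0000000 and 0x80000000 pass this check; 0xb0000000 and 0x78000000 do not, so with those the kernel mapping would straddle two top-level entries.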
cheers,
Gerd
--
Gerd 'just married' Hoffmann <[email protected]>
I'm the hacker formerly known as Gerd Knorr.
Jens Axboe wrote:
..
> +choice
> + depends on NOHIGHMEM
Is that dependency strictly needed here?
(Mark hurriedly applies patch and rebuilds kernel on 2G notebook..)
Cheers!
On Tue, Jan 10 2006, Gerd Hoffmann wrote:
> Hi,
>
> >0xb0000000 is a much better default, but I didn't think that would fly
> >as a patch.
>
> I think that will not fly with CONFIG_X86_PAE. In PAE mode the 3rd pmd
> (for the 0xc0000000 => 0xffffffff kernel address range) is shared,
> anything but 0xc000000 most likely needs some more hackery than just
> changing PAGE_OFFSET. As the whole point of this split patchery is to
> avoid highmem in the first place it maybe makes sense to have some
> "optimize for 1/2/4/more GB main memory" config option which in turn
> picks sane PAGE_OFFSET+HIGHMEM+PAE settings?
The patch depends on NOHIGHMEM atm, so you can't select PAE and move the
page offset anyways.
--
Jens Axboe
On Tue, Jan 10 2006, Mark Lord wrote:
> Jens Axboe wrote:
> ..
> >+choice
> >+ depends on NOHIGHMEM
>
> Is that dependency strictly needed here?
Probably not. Well at least it could be relaxed, I just wanted to be
safe and avoid any extra complications due to this.
> (Mark hurriedly applies patch and rebuilds kernel on 2G notebook..)
You may want to change the 2G split to 0x78000000 as noted by Byron,
then you can skip highmem completely if you so wanted.
--
Jens Axboe
On Tue, Jan 10 2006, Gerd Hoffmann wrote:
> changing PAGE_OFFSET. As the whole point of this split patchery is to
> avoid highmem in the first place it maybe makes sense to have some
> "optimize for 1/2/4/more GB main memory" config option which in turn
> picks sane PAGE_OFFSET+HIGHMEM+PAE settings?
Forgot to comment on that... Yes that is indeed an option, but it's a
little complicated since it depends on the role of the machine. A file
server may want a very small user land virtual memory size, whereas a
database server may want the opposite. So I think it's still saner to
expose a select range of settings.
--
Jens Axboe
On Tue, Jan 10 2006, Byron Stanoszek wrote:
> On Tue, 10 Jan 2006, Jens Axboe wrote:
>
> >>yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
> >>were possible. It was a larger patch to enable all this across x86, but
> >>the Kconfig portion was removed a bit later because people _frequently_
> >>misconfigured their kernels and then complained about the results.
> >
> >How is this different than all other sorts of misconfigurations? As far
> >as I can tell, the biggest "problem" for some is if they depend on some
> >binary module that will of course break with a different page offset.
> >
> >For simplicity, I didn't add more than the 2/2 split, where we could add
> >even a 3/1 kernel/user or a 0.5/3.5 (I think sles8 had this).
>
> I prefer setting __PAGE_OFFSET to (0x78000000) on machines with 2GB of RAM.
> This seems to let the kernel use the full 2GB of memory, rather than just
> 1920-1984 MB (at least back in 2.4 days).
A newer version, trying to cater to the various comments in here.
Changes:
- Add 3G_OPT split, meant for 1GiB machines. Uses 0xB0000000.
- Add 1G/3G split
- Move the 2G/2G a little, so the full 2GiB of ram can be mapped.
- Improve help text (I hope :)
- Make option depend on EXPERIMENTAL.
- Make the page.h a lot more readable.
---
Add an option for configuring the page offset, to better optimize the
kernel for higher memory machines. This lets users get rid of high
memory on, e.g., a 1GiB machine.
Signed-off-by: Jens Axboe <[email protected]>
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index d849c68..fcad8f7 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -444,6 +464,32 @@ config HIGHMEM64G
endchoice
+choice
+ depends on NOHIGHMEM && EXPERIMENTAL
+ prompt "Memory split"
+ default DEFAULT_3G
+ help
+ Select the wanted split between kernel and user memory.
+
+ If the address range available to the kernel is less than the
+ physical memory installed, the remaining memory will be available
+ as "high memory". Accessing high memory is a little more costly
+ than low memory, as it needs to be mapped into the kernel first.
+
+ Note that selecting anything but the default 3G/1G split will make
+ your kernel incompatible with binary only modules.
+
+ config DEFAULT_3G
+ bool "3G/1G user/kernel split"
+ config DEFAULT_3G_OPT
+ bool "3G/1G user/kernel split (for full 1G low memory)"
+ config DEFAULT_2G
+ bool "2G/2G user/kernel split"
+ config DEFAULT_1G
+ bool "1G/3G user/kernel split"
+
+endchoice
+
config HIGHMEM
bool
depends on HIGHMEM64G || HIGHMEM4G
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index 73296d9..7da50a1 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -109,11 +109,23 @@ extern int page_is_ram(unsigned long pag
#endif /* __ASSEMBLY__ */
+#if defined(CONFIG_DEFAULT_3G)
+#define __PAGE_OFFSET_RAW (0xC0000000)
+#elif defined(CONFIG_DEFAULT_3G_OPT)
+#define __PAGE_OFFSET_RAW (0xB0000000)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET_RAW (0x78000000)
+#elif defined(CONFIG_DEFAULT_1G)
+#define __PAGE_OFFSET_RAW (0x40000000)
+#else
+#error "Bad user/kernel offset"
+#endif
+
#ifdef __ASSEMBLY__
-#define __PAGE_OFFSET (0xC0000000)
+#define __PAGE_OFFSET __PAGE_OFFSET_RAW
#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
-#define __PAGE_OFFSET (0xC0000000UL)
+#define __PAGE_OFFSET ((unsigned long)__PAGE_OFFSET_RAW)
#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
--
Jens Axboe
* Jens Axboe <[email protected]> wrote:
> + Select the wanted split between kernel and user memory.
> +
> + If the address range available to the kernel is less than the
> + physical memory installed, the remaining memory will be available
> + as "high memory". Accessing high memory is a little more costly
> + than low memory, as it needs to be mapped into the kernel first.
make it _A LOT_ more clear that mere mortals should not touch this
option! Also, you do not mention the userspace-VM fragmentation issues.
Plus, if a user uses a 2G/2G split with more than 2G of RAM, the kernel
should print a warning that it's running with a non-default split. Do
the same if the user uses a non-default split with less than 960MB of
RAM.
> +
> + Note that selecting anything but the default 3G/1G split will make
> + your kernel incompatible with binary only modules.
it's not 'will' but 'may', and even then, tons of .config things can
break bin-only modules, so just skip this paragraph.
looks good to me otherwise, with the text fixes it's:
Acked-by: Ingo Molnar <[email protected]>
Ingo
On Tue, Jan 10 2006, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > + Select the wanted split between kernel and user memory.
> > +
> > + If the address range available to the kernel is less than the
> > + physical memory installed, the remaining memory will be available
> > + as "high memory". Accessing high memory is a little more costly
> > + than low memory, as it needs to be mapped into the kernel first.
>
> make it _ALOT_ more clear that mere mortals should not touch this
> option! Also, you do not mention the userspace-VM fragmentation issues.
> Plus, if a user uses a 2G/2G split with more than 2G of RAM, the kernel
> should print a warning that it's running with a non-default split. Do
> the same if the user uses a non-default split with less than 960MB of
> RAM.
I added the < 960MiB warning, but not for the 2G/2G as the option
depends on NOHIGHMEM right now.
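The check boils down to comparing the number of low-memory page frames against 960MiB worth of pages. A standalone sketch of that logic, assuming 4KiB pages as on i386; the names here are invented for illustration and only mirror the intent of the mem_init() hunk below:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4KiB pages, as on i386 */

/* Number of page frames below which a non-default split buys nothing. */
static unsigned long split_warn_threshold_pfn(void)
{
    return 960UL * 1024UL * 1024UL / SKETCH_PAGE_SIZE;
}

/* Warn only when a non-default split is used on a small-memory machine. */
static bool should_warn_about_split(unsigned long max_low_pfn,
                                    bool default_split)
{
    assert(max_low_pfn > 0);   /* a machine without low memory cannot boot */
    return !default_split && max_low_pfn < split_warn_threshold_pfn();
}
```

The threshold works out to 245760 page frames, so a 512MiB box (131072 frames) with a non-default split would trigger the message and a 1GiB box would not.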
I also changed the help text again, hope you are happy with it now.
> > +
> > + Note that selecting anything but the default 3G/1G split will make
> > + your kernel incompatible with binary only modules.
>
> it's not 'will' but 'may', and even then, tons of .config things can
> break bin-only modules, so just skip this paragraph.
Killed.
> looks good to me otherwise, with the text fixes it's:
>
> Acked-by: Ingo Molnar <[email protected]>
Thanks! Updated patch below.
---
Add an option for configuring the page offset, to better optimize the
kernel for higher memory machines. This lets users get rid of high
memory on, e.g., a 1GiB machine.
Signed-off-by: Jens Axboe <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index d849c68..20d1423 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -444,6 +464,35 @@ config HIGHMEM64G
endchoice
+choice
+ depends on NOHIGHMEM && EXPERIMENTAL
+ prompt "Memory split"
+ default DEFAULT_3G
+ help
+ Select the wanted split between kernel and user memory.
+
+ If the address range available to the kernel is less than the
+ physical memory installed, the remaining memory will be available
+ as "high memory". Accessing high memory is a little more costly
+ than low memory, as it needs to be mapped into the kernel first.
+ Note that increasing the kernel address space limits the range
+ available to user programs, making the address space there
+ tighter.
+
+ If you are not absolutely sure what you are doing, leave this
+ option alone!
+
+ config DEFAULT_3G
+ bool "3G/1G user/kernel split"
+ config DEFAULT_3G_OPT
+ bool "3G/1G user/kernel split (for full 1G low memory)"
+ config DEFAULT_2G
+ bool "2G/2G user/kernel split"
+ config DEFAULT_1G
+ bool "1G/3G user/kernel split"
+
+endchoice
+
config HIGHMEM
bool
depends on HIGHMEM64G || HIGHMEM4G
diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c
index 7df494b..67f1da0 100644
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -597,6 +597,12 @@ void __init mem_init(void)
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
#endif
+#if !defined(CONFIG_DEFAULT_3G)
+ /* if the user has less than 960MB of RAM, he should use the default */
+ if (max_low_pfn < (960 * 1024 * 1024 / PAGE_SIZE))
+ printk(KERN_INFO "Memory: less than 960MiB of RAM, you should use the default memory split setting\n");
+#endif
+
/* this will put all low memory onto the freelists */
totalram_pages += free_all_bootmem();
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index 73296d9..7da50a1 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -109,11 +109,23 @@ extern int page_is_ram(unsigned long pag
#endif /* __ASSEMBLY__ */
+#if defined(CONFIG_DEFAULT_3G)
+#define __PAGE_OFFSET_RAW (0xC0000000)
+#elif defined(CONFIG_DEFAULT_3G_OPT)
+#define __PAGE_OFFSET_RAW (0xB0000000)
+#elif defined(CONFIG_DEFAULT_2G)
+#define __PAGE_OFFSET_RAW (0x78000000)
+#elif defined(CONFIG_DEFAULT_1G)
+#define __PAGE_OFFSET_RAW (0x40000000)
+#else
+#error "Bad user/kernel offset"
+#endif
+
#ifdef __ASSEMBLY__
-#define __PAGE_OFFSET (0xC0000000)
+#define __PAGE_OFFSET __PAGE_OFFSET_RAW
#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
-#define __PAGE_OFFSET (0xC0000000UL)
+#define __PAGE_OFFSET ((unsigned long)__PAGE_OFFSET_RAW)
#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
--
Jens Axboe
This (below) is NOT a kernel patch, but anyone running VMware
will need something like it when using the 2G-2G split:
Cheers
--- vmware-config.pl.orig 2006-01-10 10:05:55.000000000 -0500
+++ vmware-config.pl 2006-01-10 10:07:29.000000000 -0500
@@ -2593,7 +2593,9 @@
my $first;
$first = lc(substr($fields[0], 0, 1));
- if ($first =~ /^[4567]$/) {
+ if (lc(substr($fields[0],0,2)) =~ /^78$/) {
+ $first = '78000000';
+ } elsif ($first =~ /^[4567]$/) {
$first = '40000000';
} elsif ($first =~ /^[89ab]$/) {
$first = '80000000';
Jens Axboe writes:
> Thanks! Updated patch below.
>
> ---
>
> Add option for configuring the page offset, to better optimize the
> kernel for higher memory machines. Enables users to get rid of high
> memory for eg a 1GiB machine.
>
> Signed-off-by: Jens Axboe <[email protected]>
> Acked-by: Ingo Molnar <[email protected]>
Acked-by: Mikael Pettersson <[email protected]>
> diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> index d849c68..20d1423 100644
> --- a/arch/i386/Kconfig
> +++ b/arch/i386/Kconfig
> @@ -444,6 +464,35 @@ config HIGHMEM64G
>
> endchoice
>
> +choice
> + depends on NOHIGHMEM && EXPERIMENTAL
> + prompt "Memory split"
> + default DEFAULT_3G
> + help
> + Select the wanted split between kernel and user memory.
> +
> + If the address range available to the kernel is less than the
> + physical memory installed, the remaining memory will be available
> + as "high memory". Accessing high memory is a little more costly
> + than low memory, as it needs to be mapped into the kernel first.
> + Note that increasing the kernel address space limits the range
> + available to user programs, making the address space there
> + tighter.
> +
> + If you are not absolutely sure what you are doing, leave this
> + option alone!
> +
> + config DEFAULT_3G
> + bool "3G/1G user/kernel split"
> + config DEFAULT_3G_OPT
> + bool "3G/1G user/kernel split (for full 1G low memory)"
> + config DEFAULT_2G
> + bool "2G/2G user/kernel split"
> + config DEFAULT_1G
> + bool "1G/3G user/kernel split"
> +
> +endchoice
> +
> config HIGHMEM
> bool
> depends on HIGHMEM64G || HIGHMEM4G
> diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c
> index 7df494b..67f1da0 100644
> --- a/arch/i386/mm/init.c
> +++ b/arch/i386/mm/init.c
> @@ -597,6 +597,12 @@ void __init mem_init(void)
> high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
> #endif
>
> +#if !defined(CONFIG_DEFAULT_3G)
> + /* if the user has less than 960MB of RAM, he should use the default */
> + if (max_low_pfn < (960 * 1024 * 1024 / PAGE_SIZE))
> + printk(KERN_INFO "Memory: less than 960MiB of RAM, you should use the default memory split setting\n");
> +#endif
> +
> /* this will put all low memory onto the freelists */
> totalram_pages += free_all_bootmem();
>
> diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
> index 73296d9..7da50a1 100644
> --- a/include/asm-i386/page.h
> +++ b/include/asm-i386/page.h
> @@ -109,11 +109,23 @@ extern int page_is_ram(unsigned long pag
>
> #endif /* __ASSEMBLY__ */
>
> +#if defined(CONFIG_DEFAULT_3G)
> +#define __PAGE_OFFSET_RAW (0xC0000000)
> +#elif defined(CONFIG_DEFAULT_3G_OPT)
> +#define __PAGE_OFFSET_RAW (0xB0000000)
> +#elif defined(CONFIG_DEFAULT_2G)
> +#define __PAGE_OFFSET_RAW (0x78000000)
> +#elif defined(CONFIG_DEFAULT_1G)
> +#define __PAGE_OFFSET_RAW (0x40000000)
> +#else
> +#error "Bad user/kernel offset"
> +#endif
> +
> #ifdef __ASSEMBLY__
> -#define __PAGE_OFFSET (0xC0000000)
> +#define __PAGE_OFFSET __PAGE_OFFSET_RAW
> #define __PHYSICAL_START CONFIG_PHYSICAL_START
> #else
> -#define __PAGE_OFFSET (0xC0000000UL)
> +#define __PAGE_OFFSET ((unsigned long)__PAGE_OFFSET_RAW)
> #define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
> #endif
> #define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
>
> --
> Jens Axboe
>
On Tue, 10 Jan 2006, Jens Axboe wrote:
>
> A newer version, trying to cater to the various comments in here.
> Changes:
Can we do one final cleanup? Do all the magic in _one_ place, namely the
x86 Kconfig file.
Also, I don't think the NOHIGHMEM dependency is necessarily correct. A
2G/2G split can be advantageous with a 16GB setup (you'll have more room
for dentries etc), but you obviously want to have HIGHMEM for that..
Do it something like this:
choice
depends on EXPERIMENTAL
prompt "Memory split"
default DEFAULT_3G
help
Select the wanted split between kernel and user memory.
If the address range available to the kernel is less than the
physical memory installed, the remaining memory will be available
as "high memory". Accessing high memory is a little more costly
than low memory, as it needs to be mapped into the kernel first.
Note that selecting anything but the default 3G/1G split will make
your kernel incompatible with binary only modules.
config DEFAULT_3G
bool "3G/1G user/kernel split"
config DEFAULT_3G_OPT
bool "3G/1G user/kernel split (for full 1G low memory)"
config DEFAULT_2G
bool "2G/2G user/kernel split"
config DEFAULT_1G
bool "1G/3G user/kernel split"
endchoice
config PAGE_OFFSET
hex
default 0xB0000000 if DEFAULT_3G_OPT
default 0x78000000 if DEFAULT_2G
default 0x40000000 if DEFAULT_1G
default 0xC0000000
and then asm-i386/page.h can just do
#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
and you're done.
If you ever want to change the offsets, you're only changing the Kconfig
file, and as you can tell, the syntax is actually much _nicer_ than using
the C preprocessor, since these kinds of choices are exactly what the
Kconfig language is all about.
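One thing to keep in mind with a list of defaults: as far as I know, Kconfig takes the first default whose condition is true, so the ordering matters and an unconditional catch-all belongs last. A toy model of that resolution in C (the struct and function names are invented for illustration, this is not how the Kconfig tools are implemented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One "default VALUE [if COND]" line; cond is already evaluated here. */
struct kconfig_default {
    uint32_t value;
    bool     cond;   /* true for an unconditional default */
};

/* Return the first default whose condition holds, 0 if none applies. */
static uint32_t resolve_default(const struct kconfig_default *d, size_t n)
{
    assert(d != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        if (d[i].cond)
            return d[i].value;
    return 0;
}

/* Hypothetical model of PAGE_OFFSET with the catch-all listed last. */
static uint32_t page_offset_for(bool opt_3g, bool split_2g, bool split_1g)
{
    const struct kconfig_default defaults[] = {
        { 0xB0000000u, opt_3g   },
        { 0x78000000u, split_2g },
        { 0x40000000u, split_1g },
        { 0xC0000000u, true     },
    };

    return resolve_default(defaults, sizeof(defaults) / sizeof(defaults[0]));
}
```

With nothing selected this resolves to 0xC0000000; if the unconditional entry were listed first, it would win regardless of which split was chosen.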
Please?
Linus
Linus Torvalds wrote:
...
> Can we do one final cleanup? Do all the magic in _one_ place, namely the
> x86 Kconfig file.
...
> config DEFAULT_3G
> bool "3G/1G user/kernel split"
> config DEFAULT_3G_OPT
> bool "3G/1G user/kernel split (for full 1G low memory)"
> config DEFAULT_2G
> bool "2G/2G user/kernel split"
> config DEFAULT_1G
> bool "1G/3G user/kernel split"
...
Are "DEFAULT_*" really the best names to assign to these options?
For these options, I'd expect something like "VMUSER_*" or "USERMEM_*".
Cheers
On Tue, 10 Jan 2006, Mark Lord wrote:
>
> Are "DEFAULT_*" really the best names to assign to these options?
> For these options, I'd expect something like "VMUSER_*" or "USERMEM_*".
Good point. I just took the naming from the original one. Especially if
all the logic is moved into Kconfig files, it has nothing to do with
DEFAULT whatsoever. More of a VMSPLIT_3G or similar..
Linus
So, the patch would now look like this:
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/arch/i386/Kconfig linux/arch/i386/Kconfig
--- linux-2.6.15/arch/i386/Kconfig 2006-01-02 22:21:10.000000000 -0500
+++ linux/arch/i386/Kconfig 2006-01-10 12:02:40.000000000 -0500
@@ -448,6 +448,43 @@
endchoice
+choice
+ depends on EXPERIMENTAL
+ prompt "Memory split"
+ default VMSPLIT_3G
+ help
+ Select the desired split between kernel and user memory.
+
+ If the address range available to the kernel is less than the
+ physical memory installed, the remaining memory will be available
+ as "high memory". Accessing high memory is a little more costly
+ than low memory, as it needs to be mapped into the kernel first.
+ Note that increasing the kernel address space limits the range
+ available to user programs, making the address space there
+ tighter. Selecting anything other than the default 3G/1G split
+ will also likely make your kernel incompatible with binary-only
+ kernel modules.
+
+ If you are not absolutely sure what you are doing, leave this
+ option alone!
+
+ config VMSPLIT_3G
+ bool "3G/1G user/kernel split"
+ config VMSPLIT_3G_OPT
+ bool "3G/1G user/kernel split (for full 1G low memory)"
+ config VMSPLIT_2G
+ bool "2G/2G user/kernel split"
+ config VMSPLIT_1G
+ bool "1G/3G user/kernel split"
+endchoice
+
+config PAGE_OFFSET
+ hex
+ default 0xB0000000 if VMSPLIT_3G_OPT
+ default 0x78000000 if VMSPLIT_2G
+ default 0x40000000 if VMSPLIT_1G
+ default 0xC0000000
+
config HIGHMEM
bool
depends on HIGHMEM64G || HIGHMEM4G
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/arch/i386/mm/init.c linux/arch/i386/mm/init.c
--- linux-2.6.15/arch/i386/mm/init.c 2006-01-02 22:21:10.000000000 -0500
+++ linux/arch/i386/mm/init.c 2006-01-10 12:06:10.000000000 -0500
@@ -597,6 +597,12 @@
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
#endif
+#if !defined(CONFIG_VMSPLIT_3G)
+ /* if the user has less than 960MB of RAM, he should use the default */
+ if (max_low_pfn < (960 * 1024 * 1024 / PAGE_SIZE))
+ printk(KERN_INFO "Memory: less than 960MiB of RAM, you should use the default memory split setting\n");
+#endif
+
/* this will put all low memory onto the freelists */
totalram_pages += free_all_bootmem();
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/include/asm-i386/page.h linux/include/asm-i386/page.h
--- linux-2.6.15/include/asm-i386/page.h 2006-01-02 22:21:10.000000000 -0500
+++ linux/include/asm-i386/page.h 2006-01-10 12:04:56.000000000 -0500
@@ -110,10 +110,10 @@
#endif /* __ASSEMBLY__ */
#ifdef __ASSEMBLY__
-#define __PAGE_OFFSET (0xC0000000)
+#define __PAGE_OFFSET CONFIG_PAGE_OFFSET
#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
-#define __PAGE_OFFSET (0xC0000000UL)
+#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
On Tue, 10 Jan 2006 15:39:31 +0100 Jens Axboe wrote:
> --- a/include/asm-i386/page.h
> +++ b/include/asm-i386/page.h
> @@ -109,11 +109,23 @@ extern int page_is_ram(unsigned long pag
>
> #endif /* __ASSEMBLY__ */
>
> +#if defined(CONFIG_DEFAULT_3G)
> +#define __PAGE_OFFSET_RAW (0xC0000000)
> +#elif defined(CONFIG_DEFAULT_3G_OPT)
> +#define __PAGE_OFFSET_RAW (0xB0000000)
> +#elif defined(CONFIG_DEFAULT_2G)
> +#define __PAGE_OFFSET_RAW (0x78000000)
> +#elif defined(CONFIG_DEFAULT_1G)
> +#define __PAGE_OFFSET_RAW (0x40000000)
> +#else
> +#error "Bad user/kernel offset"
> +#endif
> +
> #ifdef __ASSEMBLY__
> -#define __PAGE_OFFSET (0xC0000000)
> +#define __PAGE_OFFSET __PAGE_OFFSET_RAW
> #define __PHYSICAL_START CONFIG_PHYSICAL_START
> #else
> -#define __PAGE_OFFSET (0xC0000000UL)
> +#define __PAGE_OFFSET ((unsigned long)__PAGE_OFFSET_RAW)
> #define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
> #endif
> #define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
Changing PAGE_OFFSET this way would break at least Valgrind (the latest
release 3.1.0 by default is statically linked at address 0xb0000000, and
PIE support does not seem to be present in that release). I remember
that similar changes were also breaking Lisp implementations (cmucl,
sbcl); however, I am not really sure about this.
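To make the concern concrete: on i386 the user address space ends where the kernel mapping (PAGE_OFFSET) begins, so an image linked at a fixed address only fits if it lies entirely below that boundary. A minimal sketch in C, where fits_below_split() is a hypothetical helper and the image size used in any call is an assumption, not a figure from Valgrind:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does an image of `size` bytes, loaded at the fixed
 * address `load_addr`, lie entirely below the user/kernel boundary? */
bool fits_below_split(uint32_t load_addr, uint32_t size, uint32_t page_offset)
{
    assert(size > 0);
    /* 64-bit arithmetic avoids wrap-around near the top of the 4 GiB space. */
    uint64_t end = (uint64_t)load_addr + size;
    return end <= page_offset;
}
```

With the default 0xC0000000 an image at 0xb0000000 has 256 MiB of room; with PAGE_OFFSET at 0xB0000000 or lower it has none.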
On Tue, 10 Jan 2006, Mark Lord wrote:
>
> So, the patch would now look like this:
Yes, except I think we need to make the "depends on" include !X86_PAE:
> +choice
> + depends on EXPERIMENTAL
depends on EXPERIMENTAL && !X86_PAE
since PAE depends on the 3G/1G split. We could make it work for a pure
2G/2G split, but that's a separate issue, and then we'd need to change the
CONFIG_PAGE_OFFSET defaults to be something like
default 0x80000000 if VMSPLIT_2G && X86_PAE
(but that's definitely not appropriate for now - that's a separate issue,
after somebody has verified that PAE and 2G:2G works).
Also, I think the arch/i386/mm/init.c snippet should just be removed. If
we make the split configurable, I don't see that we should warn about a
configuration where you have less memory than the point where the split
makes sense. A distribution (either something like Fedora _or_ just an
internal company "standard image") might decide to use 2G:2G, but not all
machines might have lots of memory. Warning about it would be silly.
Anyway, this should go into -mm, and I'd rather have it stay there for a
while. I've got tons of stuff for 2.6.16 already, I'd prefer to not see
this kind of thing too..
Linus
On Tue, Jan 10 2006, Mark Lord wrote:
> So, the patch would now look like this:
Looks good to me!
--
Jens Axboe
Jens Axboe wrote:
> On Tue, Jan 10 2006, Mark Lord wrote:
>>So, the patch would now look like this:
> Looks good to me!
I'm just trying to save you some diff'ing, Jens!
Here it is again, with Linus's latest.
Andrew: please drop this into -mm for now, per Linus:
Chief Penguin wrote:
>Anyway, this should go into -mm, and I'd rather have it stay there for a
>while. I've got tons of stuff for 2.6.16 already, I'd prefer to not see
>this kind of thing too..
Signed-off-by: [email protected]
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/arch/i386/Kconfig linux/arch/i386/Kconfig
--- linux-2.6.15/arch/i386/Kconfig 2006-01-02 22:21:10.000000000 -0500
+++ linux/arch/i386/Kconfig 2006-01-10 12:02:40.000000000 -0500
@@ -448,6 +448,43 @@
endchoice
+choice
+ depends on EXPERIMENTAL && !X86_PAE
+ prompt "Memory split"
+ default VMSPLIT_3G
+ help
+ Select the desired split between kernel and user memory.
+
+ If the address range available to the kernel is less than the
+ physical memory installed, the remaining memory will be available
+ as "high memory". Accessing high memory is a little more costly
+ than low memory, as it needs to be mapped into the kernel first.
+ Note that increasing the kernel address space limits the range
+ available to user programs, making the address space there
+ tighter. Selecting anything other than the default 3G/1G split
+ will also likely make your kernel incompatible with binary-only
+ kernel modules.
+
+ If you are not absolutely sure what you are doing, leave this
+ option alone!
+
+ config VMSPLIT_3G
+ bool "3G/1G user/kernel split"
+ config VMSPLIT_3G_OPT
+ bool "3G/1G user/kernel split (for full 1G low memory)"
+ config VMSPLIT_2G
+ bool "2G/2G user/kernel split"
+ config VMSPLIT_1G
+ bool "1G/3G user/kernel split"
+endchoice
+
+config PAGE_OFFSET
+ hex
+ default 0xC0000000
+ default 0xB0000000 if VMSPLIT_3G_OPT
+ default 0x78000000 if VMSPLIT_2G
+ default 0x40000000 if VMSPLIT_1G
+
config HIGHMEM
bool
depends on HIGHMEM64G || HIGHMEM4G
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/include/asm-i386/page.h linux/include/asm-i386/page.h
--- linux-2.6.15/include/asm-i386/page.h 2006-01-02 22:21:10.000000000 -0500
+++ linux/include/asm-i386/page.h 2006-01-10 12:04:56.000000000 -0500
@@ -110,10 +110,10 @@
#endif /* __ASSEMBLY__ */
#ifdef __ASSEMBLY__
-#define __PAGE_OFFSET (0xC0000000)
+#define __PAGE_OFFSET CONFIG_PAGE_OFFSET
#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
-#define __PAGE_OFFSET (0xC0000000UL)
+#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
Mark Lord <[email protected]> wrote:
> So, the patch would now look like this:
Can we please state what 3G_OPT is supposed to do? Is this "optimized for 1GB real RAM"? Should this be something like "2.5G" instead?
> + config VMSPLIT_3G_OPT
> + bool "3G/1G user/kernel split (for full 1G low memory)"
> + default 0xC0000000
> + default 0xB0000000 if VMSPLIT_3G_OPT
> + default 0x78000000 if VMSPLIT_2G
> + default 0x40000000 if VMSPLIT_1G
Gruss
Bernd
Linus Torvalds wrote:
>
> On Tue, 10 Jan 2006, Jens Axboe wrote:
>
>>A newer version, trying to cater to the various comments in here.
>>Changes:
>
>
> Can we do one final cleanup? Do all the magic in _one_ place, namely the
> x86 Kconfig file.
>
> Also, I don't think the NOHIGHMEM dependency is necessarily correct. A
> 2G/2G split can be advantageous with a 16GB setup (you'll have more room
> for dentries etc), but you obviously want to have HIGHMEM for that..
>
> Do it something like this:
>
> choice
> depends on EXPERIMENTAL
> prompt "Memory split"
> default DEFAULT_3G
> help
> Select the wanted split between kernel and user memory.
> If the address range available to the kernel is less than the
> physical memory installed, the remaining memory will be available
> as "high memory". Accessing high memory is a little more costly
> than low memory, as it needs to be mapped into the kernel first.
> Note that selecting anything but the default 3G/1G split will make
> your kernel incompatible with binary only modules.
>
> config DEFAULT_3G
> bool "3G/1G user/kernel split"
> config DEFAULT_3G_OPT
> bool "3G/1G user/kernel split (for full 1G low memory)"
> config DEFAULT_2G
> bool "2G/2G user/kernel split"
> config DEFAULT_1G
> bool "1G/3G user/kernel split"
> endchoice
>
> config PAGE_OFFSET
> hex
> default 0xC0000000
> default 0xB0000000 if DEFAULT_3G_OPT
> default 0x78000000 if DEFAULT_2G
> default 0x40000000 if DEFAULT_1G
>
> and then asm-i386/page.h can just do
>
> #define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
>
> and you're done.
The non-1GB-aligned ones need to be disbarred when PAE is on, I think.
M.
Bernd Eckenfels wrote:
Here are the patches I use for the splitting. They work well. The
methods employed in Red Hat ES are far better and I am surprised
no one has simply integrated those patches into the kernel which are 4GB
/ 4GB kernel/user.
Jeff
>Mark Lord <[email protected]> wrote:
>>So, the patch would now look like this:
>
>Can we please state what 3G_OPT is supposed to do? Is this "optimized for 1GB real RAM"? Should this be something like "2.5G" instead?
>
>>+ config VMSPLIT_3G_OPT
>>+ bool "3G/1G user/kernel split (for full 1G low memory)"
>
>>+ default 0xC0000000
>>+ default 0xB0000000 if VMSPLIT_3G_OPT
>>+ default 0x78000000 if VMSPLIT_2G
>>+ default 0x40000000 if VMSPLIT_1G
2006/1/10, Jens Axboe <[email protected]>:
> On Tue, Jan 10 2006, Byron Stanoszek wrote:
> > On Tue, 10 Jan 2006, Jens Axboe wrote:
> >
> > >>yes, i made it totally configurable in 2.4 days: 1:3, 2/2 and 3:1 splits
> > >>were possible. It was a larger patch to enable all this across x86, but
> > >>the Kconfig portion was removed a bit later because people _frequently_
> > >>misconfigured their kernels and then complained about the results.
> > >
> > >How is this different than all other sorts of misconfigurations? As far
> > >as I can tell, the biggest "problem" for some is if they depend on some
> > >binary module that will of course break with a different page offset.
> > >
> > >For simplicity, I didn't add more than the 2/2 split, where we could add
> > >even a 3/1 kernel/user or a 0.5/3.5 (I think sles8 had this).
> >
> > I prefer setting __PAGE_OFFSET to (0x78000000) on machines with 2GB of RAM.
> > This seems to let the kernel use the full 2GB of memory, rather than just
> > 1920-1984 MB (at least back in 2.4 days).
>
> A newer version, trying to cater to the various comments in here.
> Changes:
>
> - Add 3G_OPT split, meant for 1GiB machines. Uses 0xB0000000
> - Add 1G/3G split
> - Move the 2G/2G a little, so the full 2GiB of ram can be mapped.
> - Improve help text (I hope :)
> - Make option depend on EXPERIMENTAL.
> - Make the page.h a lot more readable.
>
> ---
>
> Add option for configuring the page offset, to better optimize the
> kernel for higher memory machines. Enables users to get rid of high
> memory for eg a 1GiB machine.
>
> Signed-off-by: Jens Axboe <[email protected]>
>
> diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> index d849c68..fcad8f7 100644
> --- a/arch/i386/Kconfig
> +++ b/arch/i386/Kconfig
> @@ -444,6 +464,32 @@ config HIGHMEM64G
>
> endchoice
>
> +choice
> + depends on NOHIGHMEM && EXPERIMENTAL
> + prompt "Memory split"
> + default DEFAULT_3G
> + help
> + Select the wanted split between kernel and user memory.
> +
> + If the address range available to the kernel is less than the
> + physical memory installed, the remaining memory will be available
> + as "high memory". Accessing high memory is a little more costly
> + than low memory, as it needs to be mapped into the kernel first.
> +
> + Note that selecting anything but the default 3G/1G split will make
> + your kernel incompatible with binary only modules.
> +
> + config DEFAULT_3G
> + bool "3G/1G user/kernel split"
> + config DEFAULT_3G_OPT
> + bool "3G/1G user/kernel split (for full 1G low memory)"
> + config DEFAULT_2G
> + bool "2G/2G user/kernel split"
> + config DEFAULT_1G
> + bool "1G/3G user/kernel split"
I don't like these names. Can't you invent better ones?
Having multiple defaults seems odd. See these maybe:
MEMSPLIT_U3_K1
MEMSPLIT_U11_K5
MEMSPLIT_U39_K44
MEMSPLIT_U1_K3
odd too? midnight here. |-)
> +
> +endchoice
> +
> config HIGHMEM
> bool
> depends on HIGHMEM64G || HIGHMEM4G
> diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
> index 73296d9..7da50a1 100644
> --- a/include/asm-i386/page.h
> +++ b/include/asm-i386/page.h
> @@ -109,11 +109,23 @@ extern int page_is_ram(unsigned long pag
>
> #endif /* __ASSEMBLY__ */
>
> +#if defined(CONFIG_DEFAULT_3G)
> +#define __PAGE_OFFSET_RAW (0xC0000000)
> +#elif defined(CONFIG_DEFAULT_3G_OPT)
> +#define __PAGE_OFFSET_RAW (0xB0000000)
> +#elif defined(CONFIG_DEFAULT_2G)
> +#define __PAGE_OFFSET_RAW (0x78000000)
> +#elif defined(CONFIG_DEFAULT_1G)
> +#define __PAGE_OFFSET_RAW (0x40000000)
> +#else
> +#error "Bad user/kernel offset"
> +#endif
> +
> #ifdef __ASSEMBLY__
> -#define __PAGE_OFFSET (0xC0000000)
> +#define __PAGE_OFFSET __PAGE_OFFSET_RAW
> #define __PHYSICAL_START CONFIG_PHYSICAL_START
> #else
> -#define __PAGE_OFFSET (0xC0000000UL)
> +#define __PAGE_OFFSET ((unsigned long)__PAGE_OFFSET_RAW)
> #define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
> #endif
> #define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
>
> --
> Jens Axboe
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Coywolf Qi Hunt
2006/1/10, Linus Torvalds <[email protected]>:
>
>
> On Tue, 10 Jan 2006, Mark Lord wrote:
> >
> > Are "DEFAULT_*" really the best names to assign to these options?
> > For these options, I'd expect something like "VMUSER_*" or "USERMEM_*".
>
> Good point. I just took the naming from the original one. Especially if
> all the logic is moved into Kconfig files, it has nothing to do with
> DEFAULT what-so-ever. More of a VMSPLIT_3G or similar..
Good. VMSPLIT_ is better than my MEMSPLIT_.
>
> Linus
--
Coywolf Qi Hunt
On Tue, 10 Jan 2006, Martin Bligh wrote:
>
> The non-1GB-aligned ones need to be disbarred when PAE is on, I think.
Well, right now _all_ the non-3:1 cases need to be disbarred. I think we
depend on the kernel mapping only ever being the _one_ last entry in the
top-level page table, which is only true with the 3:1 mapping.
But I didn't check.
Linus
Linus Torvalds wrote:
>
> On Tue, 10 Jan 2006, Martin Bligh wrote:
>
>>The non-1GB-aligned ones need to be disbarred when PAE is on, I think.
>
>
> Well, right now _all_ the non-3:1 cases need to be disbarred. I think we
> depend on the kernel mapping only ever being the _one_ last entry in the
> top-level page table, which is only true with the 3:1 mapping.
>
> But I didn't check.
I think it was OK as of 2.6.5 or so, unless something changed recently.
Used to work unless you had PAE on, and a non-aligned split ... I had
that patch as a config option for a long time, as did SuSE, etc.
M.
Linus Torvalds wrote:
>On Tue, 10 Jan 2006, Martin Bligh wrote:
>
>
>>The non-1GB-aligned ones need to be disbarred when PAE is on, I think.
>>
>>
>
>Well, right now _all_ the non-3:1 cases need to be disbarred. I think we
>depend on the kernel mapping only ever being the _one_ last entry in the
>top-level page table, which is only true with the 3:1 mapping.
>
>But I didn't check.
>
>
No. It works fine (or seems to) with 2:2 mapping. I've tested with these
extensively and am shipping products on the 1U appliances with 2:2, and I
have never seen any problems with 2.6.9-2.6.13.
The only unpleasant side effect with 3:1 is that user apps seem to rely on
swap space a little more than I like -- perhaps this is the side effect you
are referring to?
RH ES uses 4:4 which is ideal and superior to this hack.
Jeff
> Linus
Jeff V. Merkey wrote:
> Linus Torvalds wrote:
>
>> On Tue, 10 Jan 2006, Martin Bligh wrote:
>>
>>
>>> The non-1GB-aligned ones need to be disbarred when PAE is on, I think.
>>>
>>
>> Well, right now _all_ the non-3:1 cases need to be disbarred. I think
>> we depend on the kernel mapping only ever being the _one_ last entry
>> in the top-level page table, which is only true with the 3:1 mapping.
>>
>> But I didn't check.
>>
>>
>
>
> No. It works fine (or seems to) with 2:2 mapping. I've tested with
> these extensively and am shipping products on the 1U appliances with
> 2:2, and I have never seen any problems with 2.6.9-2.6.13.
>
> The only unpleasant side effect with 3:1 is that user apps seem to rely
> on swap space a little more than I like -- perhaps this is the side
> effect you are referring to?
>
> RH ES uses 4:4 which is ideal and superior to this hack.
>
> Jeff
>
Take that back. I checked the build and 2:2 does not work correctly with
highmem enabled, so you may be correct. Highmem support is crap anyway and
a 4:4 scheme is what this should have been from the start. It's ok. The
nice thing about Linux is that if you don't like what Linus is cooking in
the kitchen, you can add your own ingredients and make something else.
Jeff
Jeff V. Merkey wrote:
> Linus Torvalds wrote:
>> On Tue, 10 Jan 2006, Martin Bligh wrote:
>>> The non-1GB-aligned ones need to be disbarred when PAE is on, I think.
>> Well, right now _all_ the non-3:1 cases need to be disbarred. I think
>> we depend on the kernel mapping only ever being the _one_ last entry
>> in the top-level page table, which is only true with the 3:1 mapping.
>>
>> But I didn't check.
..
>
> No. It works fine (or seems to) with 2:2 mapping. I've tested with these
> extensively
..
The boundary for 2:2 with the current patch is 0x78000000, not 0x80000000.
It may still work, but nobody's checked yet.
cheers
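For reference, the reason the exact boundary matters under PAE: the PAE top-level table has four entries, each covering 1 GiB, so only a split on a 1 GiB boundary gives the kernel whole top-level entries. A small C sketch of that check (the function names are invented for illustration and are not kernel symbols):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOP_LEVEL_SHIFT 30  /* one PAE top-level entry maps 1 GiB */

/* True if the user/kernel boundary falls exactly between two entries. */
bool is_1g_aligned(uint32_t page_offset)
{
    return (page_offset & ((UINT32_C(1) << TOP_LEVEL_SHIFT) - 1)) == 0;
}

/* Number of top-level entries (out of 4) that the kernel mapping,
 * running from page_offset up to 4 GiB, touches at least partially. */
unsigned kernel_top_level_entries(uint32_t page_offset)
{
    assert(page_offset != 0);
    return 4u - (unsigned)(page_offset >> TOP_LEVEL_SHIFT);
}
```

By this measure 0xC0000000, 0x80000000 and 0x40000000 are aligned, while 0x78000000 and 0xB0000000 are not; 0x78000000 leaves the kernel with two whole entries plus part of a third that it shares with user space.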
> No. It works fine (or seems to) with 2:2 mapping. I've tested with these
> extensively
> and am shipping products on the 1U appliances with 2:2 and I have never
> seen any problems
> with 2.6.9-2.6.13.
Thanks, that helps.
> The only unpleasant side effect with 3:1 is user apps seem to rely on
> swap space a little more than I like -- perhaps this is the side effect
> you are referring to?
>
> RH ES uses 4:4 which is ideal and superior to this hack.
Ideal in that it's universally slower, and most people don't need it?
;-) 4:4 is a workaround for a very specialized, and rare situation.
M.
On Tue, Jan 10, 2006 at 09:30:58AM -0700, Jeff V. Merkey wrote:
> Bernd Eckenfels wrote:
>
> Here are the patches I use for the splitting. They work well. The
> methods employed in Red Hat ES are far better and I am surprised
> no one has simply integrated those patches into the kernel which are 4GB
> / 4GB kernel/user.
I was under the impression the 4G/4G split had some non-negligible
performance penalties compared to the other options.
Len Sorensen
On Tue, 2006-01-10 at 10:34 -0800, Linus Torvalds wrote:
> On Tue, 10 Jan 2006, Martin Bligh wrote:
> >
> > The non-1GB-aligned ones need to be disbarred when PAE is on, I think.
>
> Well, right now _all_ the non-3:1 cases need to be disbarred. I think we
> depend on the kernel mapping only ever being the _one_ last entry in the
> top-level page table, which is only true with the 3:1 mapping.
It actually "just works". We have a 16GB machine that gets a lot of
filesystem activity and use a 2:2 split all the time. Appended patch is
all that we need.
It was for other reasons at the time, but I think we fixed a bunch of
the multiple kernel mapping PMDs back in 2.5. Some remnants of that
stuff are still around.
http://marc.theaimsgroup.com/?l=linux-kernel&m=104197008817507&w=2
diff -purN -X /home/dvhart/.diff.exclude /home/linux/views/linux-2.6.12/include/asm-i386/page.h 2.6.12-uptime/include/asm-i386/page.h
--- /home/linux/views/linux-2.6.12/include/asm-i386/page.h 2005-03-02 03:00:08.000000000 -0800
+++ 2.6.12-uptime/include/asm-i386/page.h 2005-07-27 11:53:40.000000000 -0700
@@ -122,9 +122,9 @@ extern int sysctl_legacy_va_layout;
#endif /* __ASSEMBLY__ */
#ifdef __ASSEMBLY__
-#define __PAGE_OFFSET (0xC0000000)
+#define __PAGE_OFFSET (0x80000000)
#else
-#define __PAGE_OFFSET (0xC0000000UL)
+#define __PAGE_OFFSET (0x80000000UL)
#endif
-- Dave
On Tue, Jan 10 2006, Jeff V. Merkey wrote:
> RH ES uses 4:4 which is ideal and superior to this hack.
This isn't a hack, it's just making the offset configurable so you can
get the best of what you want.
And 4:4 may be ideal in a peyote haze, so whatever.
--
Jens Axboe
Lennart Sorensen wrote:
>On Tue, Jan 10, 2006 at 09:30:58AM -0700, Jeff V. Merkey wrote:
>
>
>>Bernd Eckenfels wrote:
>>
>>Here are the patches I use for the splitting. They work well. The
>>methods employed in Red Hat ES are far better and I am surprised
>>no one has simply integrated those patches into the kernel which are 4GB
>>/ 4GB kernel/user.
>>
>>
>
>I was under the impression the 4G/4G split had some non-negligible
>performance penalties compared to the other options.
>
>
It does, but from my testing, the effect on I/O performance and app
performance seems negligible. I run the highest performing app/driver
on Linux for disk and network I/O loading, and ES3 and ES4 are just as
performant with 4:4 as FC2, FC3, and FC4 with 3:1.
I am now able to capture 4 x gigabit segments with 3:1 at sustained
stream-to-disk rates of 497 MB/s, and 1 x 10GbE at 517 MB/s stream to
disk. I see no appreciable performance differences 3:1 vs. 4:4. Modern
Xeon processors have gotten a lot better at dealing with TLB invalidation.
I suppose applications that remap memory all over the place or that do
tons of swapping would see some penalty, and I do see some performance
degradation when user space apps start swapping, but it's difficult to
quantify how much is related to disk I/O latency vs. TLB overhead. Most
TLB flushes will cost you 150 clocks over time as the TLB reloads itself.
Jeff
>Len Sorensen
>
>
>
Mark Lord wrote:
..
> +choice
> + depends on EXPERIMENTAL && !X86_PAE
> + prompt "Memory split"
> + default VMSPLIT_3G
> + help
> + Select the desired split between kernel and user memory.
> +
> + If the address range available to the kernel is less than the
> + physical memory installed, the remaining memory will be available
> + as "high memory". Accessing high memory is a little more costly
> + than low memory, as it needs to be mapped into the kernel first.
> + Note that increasing the kernel address space limits the range
> + available to user programs, making the address space there
> + tighter. Selecting anything other than the default 3G/1G split
> + will also likely make your kernel incompatible with binary-only
> + kernel modules.
> +
> + If you are not absolutely sure what you are doing, leave this
> + option alone!
> +
> + config VMSPLIT_3G
> + bool "3G/1G user/kernel split"
> + config VMSPLIT_3G_OPT
> + bool "3G/1G user/kernel split (for full 1G low memory)"
> + config VMSPLIT_2G
> + bool "2G/2G user/kernel split"
> + config VMSPLIT_1G
> + bool "1G/3G user/kernel split"
> +endchoice
> +
> +config PAGE_OFFSET
> + hex
> + default 0xC0000000
> + default 0xB0000000 if VMSPLIT_3G_OPT
> + default 0x78000000 if VMSPLIT_2G
> + default 0x40000000 if VMSPLIT_1G
> +
...
> #ifdef __ASSEMBLY__
> -#define __PAGE_OFFSET (0xC0000000)
> +#define __PAGE_OFFSET CONFIG_PAGE_OFFSET
> #define __PHYSICAL_START CONFIG_PHYSICAL_START
> #else
> -#define __PAGE_OFFSET (0xC0000000UL)
> +#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
> #define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
> #endif
> #define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
Mmm.. something's wrong with this version..
I have selected VMSPLIT_2G, but on rebooting the kernel
now only sees/uses 1GB of RAM (the machine has 2GB of RAM).
Almost as if those trailing "if VMSPLIT_" clauses are being ignored.
--
Mark Lord
Real-Time Remedies Inc.
[email protected]
Dave Hansen wrote:
>
> It actually "just works". We have a 16GB machine that gets a lot of
> filesystem activity and use a 2:2 split all the time. Appended patch is
> all that we need.
Your (tested) patch is not the same as what is being proposed here,
so the testing experience probably doesn't apply.
The 2:2 boundary is different here.
Jens Axboe wrote:
>On Tue, Jan 10 2006, Jeff V. Merkey wrote:
>
>
>>RH ES uses 4:4 which is ideal and superior to this hack.
>>
>>
>
>This isn't a hack, it's just making the offset configurable so you can
>get the best of what you want.
>
>And 4:4 may be ideal in a peyote haze, so whatever.
>
>
Jens,
You are just jealous because you can't take peyote. Next time you are
in Utah visiting Novell, come by Lindon at
the old keylabs building -- I'm on the second floor. I am right next
door to SCO (you can throw a rock and hit Darl's window from the parking
lot).
I don't ever talk to them and we have nothing to do with them.
Jeff
"Cry me a river(tm)" is an unregistered common law trademark of Linux
Torvalds.
:-)
On Tue, 2006-01-10 at 14:01 -0500, Mark Lord wrote:
> Dave Hansen wrote:
> >
> > It actually "just works". We have a 16GB machine that gets a lot of
> > filesystem activity and use a 2:2 split all the time. Appended patch is
> > all that we need.
>
> Your (tested) patch is not the same as what is being proposed here,
> so the testing experience probably doesn't apply.
>
> The 2:2 boundary is different here.
That'll teach me not to read the patch. That actually makes the link I
sent more topical because it allowed the user:kernel split with PAE to
occur away from hard PMD boundaries.
-- Dave
Okay, fixed the ordering of the "default" lines
so that the Kconfig actually works correctly.
Best for Andrew to soak this one in -mm.
Signed-off-by: Mark Lord <[email protected]>
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/arch/i386/Kconfig linux/arch/i386/Kconfig
--- linux-2.6.15/arch/i386/Kconfig 2006-01-02 22:21:10.000000000 -0500
+++ linux/arch/i386/Kconfig 2006-01-10 12:02:40.000000000 -0500
@@ -448,6 +448,43 @@
endchoice
+choice
+ depends on EXPERIMENTAL && !X86_PAE
+ prompt "Memory split"
+ default VMSPLIT_3G
+ help
+ Select the desired split between kernel and user memory.
+
+ If the address range available to the kernel is less than the
+ physical memory installed, the remaining memory will be available
+ as "high memory". Accessing high memory is a little more costly
+ than low memory, as it needs to be mapped into the kernel first.
+ Note that increasing the kernel address space limits the range
+ available to user programs, making the address space there
+ tighter. Selecting anything other than the default 3G/1G split
+ will also likely make your kernel incompatible with binary-only
+ kernel modules.
+
+ If you are not absolutely sure what you are doing, leave this
+ option alone!
+
+ config VMSPLIT_3G
+ bool "3G/1G user/kernel split"
+ config VMSPLIT_3G_OPT
+ bool "3G/1G user/kernel split (for full 1G low memory)"
+ config VMSPLIT_2G
+ bool "2G/2G user/kernel split"
+ config VMSPLIT_1G
+ bool "1G/3G user/kernel split"
+endchoice
+
+config PAGE_OFFSET
+ hex
+ default 0xB0000000 if VMSPLIT_3G_OPT
+ default 0x78000000 if VMSPLIT_2G
+ default 0x40000000 if VMSPLIT_1G
+ default 0xC0000000
+
config HIGHMEM
bool
depends on HIGHMEM64G || HIGHMEM4G
diff -u --recursive --new-file --exclude='.*' linux-2.6.15/include/asm-i386/page.h linux/include/asm-i386/page.h
--- linux-2.6.15/include/asm-i386/page.h 2006-01-02 22:21:10.000000000 -0500
+++ linux/include/asm-i386/page.h 2006-01-10 12:04:56.000000000 -0500
@@ -110,10 +110,10 @@
#endif /* __ASSEMBLY__ */
#ifdef __ASSEMBLY__
-#define __PAGE_OFFSET (0xC0000000)
+#define __PAGE_OFFSET CONFIG_PAGE_OFFSET
#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
-#define __PAGE_OFFSET (0xC0000000UL)
+#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
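Why the ordering matters: when a Kconfig symbol has several default lines, the first one whose condition holds is the one used, so an unconditional default listed first always wins. A toy C model of that rule (this is not the real Kconfig implementation; the struct and function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct kdefault {
    uint32_t value;
    int cond;  /* nonzero if the "if ..." clause holds; 1 when unconditional */
};

/* First default whose condition holds wins. */
uint32_t resolve_default(const struct kdefault *d, size_t n)
{
    assert(d != NULL);
    for (size_t i = 0; i < n; i++)
        if (d[i].cond)
            return d[i].value;
    return 0;  /* no default applies */
}

/* Earlier ordering: unconditional 0xC0000000 listed first. */
uint32_t page_offset_old_order(int opt_3g, int split_2g, int split_1g)
{
    const struct kdefault d[] = {
        { 0xC0000000u, 1 },
        { 0xB0000000u, opt_3g },
        { 0x78000000u, split_2g },
        { 0x40000000u, split_1g },
    };
    return resolve_default(d, sizeof d / sizeof d[0]);
}

/* Ordering from the patch above: unconditional default listed last. */
uint32_t page_offset_new_order(int opt_3g, int split_2g, int split_1g)
{
    const struct kdefault d[] = {
        { 0xB0000000u, opt_3g },
        { 0x78000000u, split_2g },
        { 0x40000000u, split_1g },
        { 0xC0000000u, 1 },
    };
    return resolve_default(d, sizeof d / sizeof d[0]);
}
```

In this model, selecting the 2G split with the earlier ordering still yields 0xC0000000, which matches the symptom reported earlier in the thread (VMSPLIT_2G selected, but only about 1GB of RAM used).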
Mark Lord wrote:
Looks good. I'll try this one.
Jeff
> Okay, fixed the ordering of the "default" lines
> so that the Kconfig actually works correctly.
>
> Best for Andrew to soak this one in -mm.
>
> Signed-off-by: Mark Lord <[email protected]>
On Tue, Jan 10 2006, Mark Lord wrote:
> Okay, fixed the ordering of the "default" lines
> so that the Kconfig actually works correctly.
>
> Best for Andrew to soak this one in -mm.
>
> Signed-off-by: Mark Lord <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
--
Jens Axboe
(please don't drop people from the cc list)
On Tue, Jan 10 2006, Bernd Eckenfels wrote:
> Mark Lord <[email protected]> wrote:
> > So, the patch would now look like this:
>
> Can we please state what 3G_OPT is supposed to do? Is
> this "optimized for 1GB real RAM"? Should this be something like "2.5G"
> instead?
Hmm I thought it was obvious with the description in parentheses after
the option. Basically the option is just an optimized default for 1GB of
RAM, like the 2G option is tailored for 2GB of low mem on a 2GB machine.
The reason the option exists is of course to leave the default at the
older not-so-great option for 1G of RAM.
--
Jens Axboe
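A rough check of the numbers behind these options: the directly mapped low memory is approximately 4 GiB minus PAGE_OFFSET minus the vmalloc reserve. The sketch below assumes the usual 128 MiB reserve on i386 and ignores smaller reservations; lowmem_mib() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default i386 vmalloc reserve of 128 MiB. */
#define SKETCH_VMALLOC_RESERVE (UINT32_C(128) << 20)

/* Approximate low memory, in MiB, that the kernel can map directly. */
uint32_t lowmem_mib(uint32_t page_offset)
{
    /* 2^32 - page_offset; unsigned subtraction wraps modulo 2^32. */
    uint32_t span = (uint32_t)(UINT32_C(0) - page_offset);
    assert(span > SKETCH_VMALLOC_RESERVE);
    return (span - SKETCH_VMALLOC_RESERVE) >> 20;
}
```

Under that assumption 0xC0000000 gives 896 MiB (hence 128 MiB of high memory on a 1 GiB machine), 0xB0000000 gives 1152 MiB, 0x78000000 gives exactly 2048 MiB, and 0x40000000 gives 2944 MiB.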
On Tue, Jan 10, 2006 at 08:42:00PM +0100, Jens Axboe wrote:
> Hmm I thought it was obvious with the description in parentheses after
> the option. Basically the option is just an optimized default for 1GB of
> RAM, like the 2G option is tailored for 2GB of low mem on a 2GB machine.
The description was (for full 1Gb Low Memory) and not (optimized for 1GB
physical RAM) which would be more obvious, yes. However the text could still
explain the consequences.
Gruss
Bernd
On Tue, Jan 10 2006, Bernd Eckenfels wrote:
> On Tue, Jan 10, 2006 at 08:42:00PM +0100, Jens Axboe wrote:
> > Hmm I thought it was obvious with the description in parentheses after
> > the option. Basically the option is just an optimized default for 1GB of
> > RAM, like the 2G option is tailored for 2GB of low mem on a 2GB machine.
>
> The description was (for full 1Gb Low Memory) and not (optimized for 1GB
> physical RAM) which would be more obvious, yes. However the text could still
> explain the consequences.
To me the former is clearer, it tells you that you have one full gig of
low memory. But maybe that's just me.
--
Jens Axboe
>2G/2G is not the only viable alternative. On my 1GB x86 box I'm
>using "lowmem1g" patches for both 2.4 and 2.6, which results in
>2.75G for user-space. I'm sure others have other preferences.
>Any standard option for this should either have several hard-coded
>alternatives, or should support arbitrary values (within reason).
>
>(See http://www.csd.uu.se/~mikpe/linux/patches/*/patch-i386-lowmem1g-*
>if you're interested.)
Hm, Con Kolivas also provided a lowmem1g patch in his set...
Jan Engelhardt
--
On Tue, 2006-01-10 at 09:56 -0700, Jeff V. Merkey wrote:
> RH ES uses 4:4 which is ideal and superior to this hack.
It's a non-trivial trade-off. 4/4 lets you run very large physical memory
systems much more efficiently than usual but you pay a cost on syscalls
and some other events when using the majority of processors. The 4/4
tricks also give most emulations (eg Qemu) serious heartburn trying to
emulate %cr3 reloading via mmap and other interfaces with high overhead
in relative terms.
Of course AMD64 kind of shot the problem in the head once and for all.
Alan
Alan Cox wrote:
>On Maw, 2006-01-10 at 09:56 -0700, Jeff V. Merkey wrote:
>
>
>>RH ES uses 4:4 which is ideal and superior to this hack.
>>
>>
>
>Its a non trivial trade-off. 4/4 lets you run very large physical memory
>systems much more efficiently than usual but you pay a cost on syscalls
>and some other events when using the majority of processors. The 4/4
>tricks also give most emulations (eg Qemu) serious heartburn trying to
>emulate %cr3 reloading via mmap and other interfaces with high overhead
>in relative terms.
>
>Of course AMD64 kind of shot the problem in the head once and for all.
>
>
>
Yep, they sure did. Seriously, the 4:4 option should also be present
along with the 3:1 and 2:2 splits. You should merge your RH work into
this patch and allow both. It would leave me one less patch to maintain
off the tree.
Alan, you're the man.
:-)
Jeff
>Alan
>
>
>
>
On Tue, Jan 10 2006, Jeff V. Merkey wrote:
> Alan Cox wrote:
>
> >On Maw, 2006-01-10 at 09:56 -0700, Jeff V. Merkey wrote:
> >
> >
> >>RH ES uses 4:4 which is ideal and superior to this hack.
> >>
> >>
> >
> >Its a non trivial trade-off. 4/4 lets you run very large physical memory
> >systems much more efficiently than usual but you pay a cost on syscalls
> >and some other events when using the majority of processors. The 4/4
> >tricks also give most emulations (eg Qemu) serious heartburn trying to
> >emulate %cr3 reloading via mmap and other interfaces with high overhead
> >in relative terms.
> >
> >Of course AMD64 kind of shot the problem in the head once and for all.
> >
> >
> >
>
> Yep, they sure did. Seriously, the 4:4 option should also be present
> along with 3:1 and 2:2
> splits. You should merge your RH work into this patch and allow both.
> It would save me one less
> patch to maintain off the tree.
You can't compare the two patches. Saying that 4:4 should go in because
configurable page offsets are merged is nonsense.
Note that I'm not advocating against 4:4 as such, I have no real
opinion on that. It has its uses for sure, while it comes with a cost
for others.
--
Jens Axboe
Jens Axboe wrote:
>On Tue, Jan 10 2006, Jeff V. Merkey wrote:
>
>
>>Alan Cox wrote:
>>
>>
>>
>>>On Maw, 2006-01-10 at 09:56 -0700, Jeff V. Merkey wrote:
>>>
>>>
>>>
>>>
>>>>RH ES uses 4:4 which is ideal and superior to this hack.
>>>>
>>>>
>>>>
>>>>
>>>Its a non trivial trade-off. 4/4 lets you run very large physical memory
>>>systems much more efficiently than usual but you pay a cost on syscalls
>>>and some other events when using the majority of processors. The 4/4
>>>tricks also give most emulations (eg Qemu) serious heartburn trying to
>>>emulate %cr3 reloading via mmap and other interfaces with high overhead
>>>in relative terms.
>>>
>>>Of course AMD64 kind of shot the problem in the head once and for all.
>>>
>>>
>>>
>>>
>>>
>>Yep, they sure did. Seriously, the 4:4 option should also be present
>>along with 3:1 and 2:2
>>splits. You should merge your RH work into this patch and allow both.
>>It would save me one less
>>patch to maintain off the tree.
>>
>>
>
>You can't compare the two patches, saying that 4:4 should go in because
>configurable page offsets is merged is nonsense.
>
>Note that I'm not advocating against 4:4 as such, I have no real
>oppinion on that. It has its uses for sure, while it comes with a cost
>for others.
>
>
>
I agree and I appreciate your recognizing this. As it stands, if I need
4:4 I just ship on ES3 and ES4. The 3:1 patch in the standard kernel is
a very good thing, and you are to be commended for finally getting it in.
P.S. Your bio stuff works great.
Jeff
On Wed, 11 Jan 2006 07:42 am, Jan Engelhardt wrote:
> >2G/2G is not the only viable alternative. On my 1GB x86 box I'm
> >using "lowmem1g" patches for both 2.4 and 2.6, which results in
> >2.75G for user-space. I'm sure others have other preferences.
> >Any standard option for this should either have several hard-coded
> >alternatives, or should support arbitrary values (within reason).
> >
> >(See http://www.csd.uu.se/~mikpe/linux/patches/*/patch-i386-lowmem1g-*
> >if you're interested.)
>
> Hm, Con Kolivas also provided a lowmem1g patch in his set...
I was under the impression that breaking the ABI was a no-no and such a patch
would never be considered for mainline. Guess I was wrong. However, mine only
offered a split suitable for 1GB of RAM whereas this is offering all that and
steak knives too.
Cheers,
Con
On Tue, 10 Jan 2006 14:16:19 -0500, Mark Lord <[email protected]> wrote:
> Okay, fixed the ordering of the "default" lines
> so that the Kconfig actually works correctly.
>
> Best for Andrew to soak this one in -mm.
>
> Signed-off-by: Mark Lord <[email protected]>
>
Working nicely on top of 2.6.15-mm2.
Even with the 'evil binary' nVidia driver 8178 ;).
In fact, I have been using the 1Gb-lowmem patch on -mm with the nVidia
driver for a long time, without problems.
I would really like to see this in -mm, and finally in mainline.
My only objection is about the menu entry names and help text. I think
people building a kernel would not understand exactly what all this is
about (I don't think I have it really clear myself).
Is there any doc which states clearly something like:
- no highmem is the fastest
- 4GB introduces one indirection, so it is slower... (really?)
- 64GB introduces two (PAE?)
mixed with
- 3G/1G standard mapping:
  - neither user nor kernel can use any memory above 860 MB
  - user processes (my number cruncher) cannot allocate more than X GB
- 2G/2G: idem:
  - max memory seen by my Linux system (not kernel, but kernel+userspace)
  - how much can I allocate for a single process (how big can my
    problem be?)
If there is already a doc like that, it would be very interesting to
have a pointer/link to it in the help text.
For example, when I read this:
+ If the address range available to the kernel is less than the
+ physical memory installed, the remaining memory will be available
+ as "high memory". Accessing high memory is a little more costly
+ than low memory, as it needs to be mapped into the kernel first.
Does this mean that with the standard 3/1 split, I can still use the lost
128 MB for something? I thought I couldn't.
Don't be too hard on me, I'm just anxious to finally understand this...
--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.15-jam2 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
Jens Axboe <[email protected]> wrote:
> To me the former is clearer, it tells you that you have one full gig of
> low memory. But maybe that's just me.
It still does not describe what the consequences are, especially the
difference from the non-optimized case. When do you want to use it, and
when not?
Gruss
Bernd
On Wed, Jan 11 2006, Bernd Eckenfels wrote:
> Jens Axboe <[email protected]> wrote:
> > To me the former is clearer, it tells you that you have one full gig of
> > low memory. But maybe that's just me.
>
> It still does not describe what the consequences are, especially the
> difference from the non-optimized case. When do you want to use it, and
> when not?
Please, I told you before in this thread, don't drop people from the cc
list!
But it does explain that, it says full 1g low memory support. So you
have one full gig of low memory, compared to the default setting.
Describing this in painstakingly more detail is of course possible, but
as the help text mentions, if you are not sure then you better leave
this option at its default setting.
--
Jens Axboe
On Wed, Jan 11 2006, J.A. Magallon wrote:
> I really like to see this in -mm, and finally in mainline.
It's in -mm now.
> My only objection is about the menu entry names and help text. I think
> people building a kernel would not understand exactly what all this is
> about (I don't think I have it really clear myself).
If they don't, they should not touch the option...
> Is there any doc which states clearly something like:
>
> - no highmem is the fastest
> - 4GB introduces one indirection, so it is slower... (really?)
> - 64GB introduces two (PAE?)
>
> mixed with
>
> - 3G/1G standard mapping:
>   - neither user nor kernel can use any memory above 860 MB
>   - user processes (my number cruncher) cannot allocate more than X GB
> - 2G/2G: idem:
>   - max memory seen by my Linux system (not kernel, but kernel+userspace)
>   - how much can I allocate for a single process (how big can my
>     problem be?)
>
> If there is already a doc like that, it would be very interesting to
> have pointer/link to it in the help text.
I think the help text is good enough, but it would definitely be nice
with a fuller description of what exactly low and high memory is and the
implications of the various settings.
> For example, when I read this:
>
> + If the address range available to the kernel is less than the
> + physical memory installed, the remaining memory will be available
> + as "high memory". Accessing high memory is a little more costly
> + than low memory, as it needs to be mapped into the kernel first.
>
> Does this mean that with the standard 3/1 split, I can still use the lost
> 128 MB for something? I thought I couldn't.
It tells you that the remaining memory is available as high memory, so
it's not lost of course. It also tells you that accessing this high
memory is indeed possible, but it's a little more costly since it needs
to be mapped temporarily into the kernel address space.
> Don't be too hard with me, just anxious to finally understand this...
No worries, perhaps you will be the one writing the Documentation/ bit
to accompany this then :-)
Basically the option boils down to how much virtual address space you
want to assign to the kernel and user space. The kernel can always
access all of memory, but in some cases part of that memory will be
available as high memory that needs to be mapped in first (see
references to kmap() and kmap_atomic() in the kernel). So whether
changing the mapping or using highmem is the best option for you
depends entirely on what you run on that machine. If you require a huge
user address space, then you don't want to change away from the 3/1
user/kernel default setting. However, if you don't need the full 3G of
address space for user apps, then you are better off increasing the kernel
address space range to get rid of the high memory mapping.
For the "typical" case of 1GB machine, using the _OPT setting to just
move the offset slightly is a really good choice as it only removes a
little bit of the user address range.
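As a rough illustration of the trade-off described above, the sketch below
computes how much RAM ends up as low memory and how much spills into high
memory for a given PAGE_OFFSET. It assumes a 4 GiB virtual address space and
a 128 MiB reserve at the top of the kernel range for vmalloc and similar
mappings (the usual i386 default, but it can be tuned). The function names
are made up for this example and are not kernel APIs.

```c
#include <stdint.h>

#define MiB             (1024ULL * 1024ULL)
#define ADDRESS_SPACE   (4096ULL * MiB)  /* 32-bit virtual address space */
#define VMALLOC_RESERVE (128ULL * MiB)   /* assumed default kernel reserve */

/* Bytes of RAM the kernel can keep permanently mapped (low memory). */
static uint64_t lowmem_bytes(uint64_t page_offset, uint64_t ram_bytes)
{
    uint64_t window = ADDRESS_SPACE - page_offset - VMALLOC_RESERVE;
    return ram_bytes < window ? ram_bytes : window;
}

/* Bytes of RAM left over as high memory. */
static uint64_t highmem_bytes(uint64_t page_offset, uint64_t ram_bytes)
{
    return ram_bytes - lowmem_bytes(page_offset, ram_bytes);
}
```

Under these assumptions, a 1 GiB machine at the default 0xC0000000 gets
896 MiB of low memory and 128 MiB of high memory, while 0xB0000000 (the _OPT
setting) leaves no high memory at all. The same arithmetic shows why the 2G
option uses 0x78000000 rather than 0x80000000: it makes room for a full
2 GiB of low memory plus the reserve.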
--
Jens Axboe
Is there any benefit/point to enabling HIGHMEM when using this patch,
assuming that physical memory is smaller than the address space? For
example, when using VMSPLIT_3G_OPT on a box with 1G of memory.
Thanx!
Greg Norris wrote:
> Is there any benefit/point to enabling HIGHMEM when using this patch,
> assuming that physical memory is smaller than the address space? For
> example, when using VMSPLIT_3G_OPT on a box with 1G of memory.
No. In fact, there should be a (very) tiny performance gain
by NOT enabling HIGHMEM -- things like kmap() should get simpler.
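To illustrate why kmap() gets simpler: when all of RAM is low memory, every
physical page has a fixed kernel virtual address at PAGE_OFFSET plus its
physical address, so no temporary mapping has to be set up. The following is
a userspace model of that arithmetic only, not kernel code; model_va() and
model_pa() are hypothetical names that mirror what __va()/__pa() do on i386.

```c
#include <stdint.h>

/* VMSPLIT_3G_OPT value from the patch in this thread. */
#define MODEL_PAGE_OFFSET UINT32_C(0xB0000000)

/* Kernel virtual address of a low-memory physical address. */
static uint32_t model_va(uint32_t phys)
{
    return phys + MODEL_PAGE_OFFSET;
}

/* Inverse: physical address behind a low-memory kernel virtual address. */
static uint32_t model_pa(uint32_t virt)
{
    return virt - MODEL_PAGE_OFFSET;
}
```

With 1G of RAM and this offset, the last byte of RAM (physical 0x3FFFFFFF)
sits at kernel virtual 0xEFFFFFFF, so the whole of memory fits in the
permanent mapping.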
Cheers
On Wed, Jan 11, 2006 at 12:13:06PM -0500, Mark Lord wrote:
> Greg Norris wrote:
> >Is there any benefit/point to enabling HIGHMEM when using this patch,
> >assuming that physical memory is smaller than the address space? For
> >example, when using VMSPLIT_3G_OPT on a box with 1G of memory.
>
> No. In fact, there should be a (very) tiny performance gain
> by NOT enabling HIGHMEM -- things like kmap() should get simpler.
That's essentially what I thought, but it's nice to have some
verification. Thanx!
On Tue, Jan 10, 2006 at 02:16:19PM -0500, Mark Lord wrote:
> Okay, fixed the ordering of the "default" lines
> so that the Kconfig actually works correctly.
>
> Best for Andrew to soak this one in -mm.
glad to see that the linux kernel is now ready for
the 'idea' I submitted a patch[1] for, more than a
year ago -- which unfortunately went unnoticed back
then ...
cheers to Jens and Mark!
best,
Herbert
[1] http://lkml.org/lkml/2004/10/9/126
> Signed-off-by: Mark Lord <[email protected]>
>
> diff -u --recursive --new-file --exclude='.*' linux-2.6.15/arch/i386/Kconfig linux/arch/i386/Kconfig
> --- linux-2.6.15/arch/i386/Kconfig 2006-01-02 22:21:10.000000000 -0500
> +++ linux/arch/i386/Kconfig 2006-01-10 12:02:40.000000000 -0500
> @@ -448,6 +448,43 @@
>
> endchoice
>
> +choice
> + depends on EXPERIMENTAL && !X86_PAE
> + prompt "Memory split"
> + default VMSPLIT_3G
> + help
> + Select the desired split between kernel and user memory.
> +
> + If the address range available to the kernel is less than the
> + physical memory installed, the remaining memory will be available
> + as "high memory". Accessing high memory is a little more costly
> + than low memory, as it needs to be mapped into the kernel first.
> + Note that increasing the kernel address space limits the range
> + available to user programs, making the address space there
> + tighter. Selecting anything other than the default 3G/1G split
> + will also likely make your kernel incompatible with binary-only
> + kernel modules.
> +
> + If you are not absolutely sure what you are doing, leave this
> + option alone!
> +
> + config VMSPLIT_3G
> + bool "3G/1G user/kernel split"
> + config VMSPLIT_3G_OPT
> + bool "3G/1G user/kernel split (for full 1G low memory)"
> + config VMSPLIT_2G
> + bool "2G/2G user/kernel split"
> + config VMSPLIT_1G
> + bool "1G/3G user/kernel split"
> +endchoice
> +
> +config PAGE_OFFSET
> + hex
> + default 0xB0000000 if VMSPLIT_3G_OPT
> + default 0x78000000 if VMSPLIT_2G
> + default 0x40000000 if VMSPLIT_1G
> + default 0xC0000000
> +
> config HIGHMEM
> bool
> depends on HIGHMEM64G || HIGHMEM4G
> diff -u --recursive --new-file --exclude='.*' linux-2.6.15/include/asm-i386/page.h linux/include/asm-i386/page.h
> --- linux-2.6.15/include/asm-i386/page.h 2006-01-02 22:21:10.000000000 -0500
> +++ linux/include/asm-i386/page.h 2006-01-10 12:04:56.000000000 -0500
> @@ -110,10 +110,10 @@
> #endif /* __ASSEMBLY__ */
>
> #ifdef __ASSEMBLY__
> -#define __PAGE_OFFSET (0xC0000000)
> +#define __PAGE_OFFSET CONFIG_PAGE_OFFSET
> #define __PHYSICAL_START CONFIG_PHYSICAL_START
> #else
> -#define __PAGE_OFFSET (0xC0000000UL)
> +#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
> #define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
> #endif
> #define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
>>>>> On Wed, 01 Feb 2006, Herbert Poetzl wrote:
> glad to see that the linux kernel is now ready for the 'idea'
> I submitted a patch[1] for, more than a year ago -- which
> unfortunately went unnoticed back then ...
> [1] http://lkml.org/lkml/2004/10/9/126
Hm, I wonder if we could have a more fine-grained choice of the
boundary? There are also systems around with e.g. 1.25G or 1.5G of
main memory.
The following patch is against 2.6.16-rc1-mm4 and allows for steps of
256M between 1G and 3G.
Cheers
Uli
Signed-off-by: Ulrich Mueller <[email protected]>
diff -Nur linux-2.6.16-rc1-mm4.orig/arch/i386/Kconfig linux-2.6.16-rc1-mm4/arch/i386/Kconfig
--- linux-2.6.16-rc1-mm4.orig/arch/i386/Kconfig 2006-01-31 16:43:11.000000000 +0100
+++ linux-2.6.16-rc1-mm4/arch/i386/Kconfig 2006-01-31 17:16:44.000000000 +0100
@@ -470,18 +470,33 @@
config VMSPLIT_3G
bool "3G/1G user/kernel split"
- config VMSPLIT_3G_OPT
- bool "3G/1G user/kernel split (for full 1G low memory)"
+ config VMSPLIT_2G75
+ bool "2.75G/1.25G user/kernel split (for full 1G low memory)"
+ config VMSPLIT_2G5
+ bool "2.5G/1.5G user/kernel split"
+ config VMSPLIT_2G25
+ bool "2.25G/1.75G user/kernel split"
config VMSPLIT_2G
bool "2G/2G user/kernel split"
+ config VMSPLIT_1G75
+ bool "1.75G/2.25G user/kernel split (for full 2G low memory)"
+ config VMSPLIT_1G5
+ bool "1.5G/2.5G user/kernel split"
+ config VMSPLIT_1G25
+ bool "1.25G/2.75G user/kernel split"
config VMSPLIT_1G
bool "1G/3G user/kernel split"
endchoice
config PAGE_OFFSET
hex
- default 0xB0000000 if VMSPLIT_3G_OPT
- default 0x78000000 if VMSPLIT_2G
+ default 0xB0000000 if VMSPLIT_2G75
+ default 0xA0000000 if VMSPLIT_2G5
+ default 0x90000000 if VMSPLIT_2G25
+ default 0x80000000 if VMSPLIT_2G
+ default 0x70000000 if VMSPLIT_1G75
+ default 0x60000000 if VMSPLIT_1G5
+ default 0x50000000 if VMSPLIT_1G25
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
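The offsets in the patch above follow a simple rule: each 256 MiB of user
address space adds 0x10000000 to PAGE_OFFSET. A small sketch of that
arithmetic follows; the helper name is hypothetical and only models the
table, it is not part of the patch.

```c
#include <stdint.h>

#define STEP_256M UINT32_C(0x10000000)

/*
 * PAGE_OFFSET for a user range of user_quarters * 256 MiB.
 * Returns 0 if the request is outside the 1G..3G range the patch offers.
 */
static uint32_t page_offset_for_user_quarters(unsigned int user_quarters)
{
    if (user_quarters < 4 || user_quarters > 12)
        return 0;
    return (uint32_t)user_quarters * STEP_256M;
}
```

For example, 11 quarters (2.75G of user space) gives 0xB0000000 and 8
quarters (2G) gives 0x80000000, matching the VMSPLIT_2G75 and VMSPLIT_2G
defaults.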
>
>> glad to see that the linux kernel is now ready for the 'idea'
>> I submitted a patch[1] for, more than a year ago -- which
>> unfortunately went unnoticed back then ...
BTW, I patched my local 2.6.16-rc1 with that patch and I run with
VMSPLIT_3G_OPT quite well, without problems. (Well, VMware gets it wrong, as
usual, but nothing that I could not solve.) Even though I do not even have
1 G but just 768 ;-)
This sort of testing reminds me of Linus's 100->1000 Hz change
("I chose 1000 originally partly as a way to make sure that people that
assumed HZ was 100 would get a swift kick in the pants.")
Could we also do that with VMSPLIT?
("Let's choose VMSPLIT_2G to make sure that i386-people that assumed
PAGE_OFFSET was 0xC0000000 would get...")
>Hm, I wonder if we could have a more fine-grained choice of the
>boundary? There are also systems around with e.g. 1.25G or 1.5G of
>main memory.
>
Maybe something like:
config VMSPLIT_1G
bool "1G/3G user/kernel split"
config VMSPLIT_X
bool "Manual split"
endchoice
config VMSPLIT_MANUAL
depends on VMSPLIT_X
hex
default 0xC0000000
prompt "Memory split address (must be aligned to 4096)"
And in include/asm/page.h:
#ifdef CONFIG_VMSPLIT_MANUAL
#define __PAGE_OFFSET ((unsigned long)CONFIG_VMSPLIT_MANUAL)
#else
#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
#endif
Not perfect, but a start.
Jan Engelhardt
--
Jan Engelhardt wrote:
>
> This sort of testing reminds me of Linus's 100->1000 Hz change
> ("I chose 1000 originally partly as a way to make sure that people that
> assumed HZ was 100 would get a swift kick in the pants.")
>
> Could we also do that with VMSPLIT?
> ("Let's choose VMSPLIT_2G to make sure that i386-people that assumed
> PAGE_OFFSET was 0xC0000000 would get...")
Mmm.. bad idea. As much as I'd like the default to be 3GB_OPT, that would
be a big impact to userspace, and there's no point in breaking everyone's
machines when advanced users can just reconfig/recompile to get what they want.
>> Hm, I wonder if we could have a more fine-grained choice of the
>> boundary? There are also systems around with e.g. 1.25G or 1.5G of
>> main memory.
>>
> Maybe something like:
> config VMSPLIT_1G
> bool "1G/3G user/kernel split"
> config VMSPLIT_X
> bool "Manual split"
> endchoice
...
Yes, that looks like a good idea.
Cheers
>>>>> On Fri, 03 Feb 2006, Mark Lord wrote:
>>> Hm, I wonder if we could have a more fine-grained choice of the
>>> boundary? There are also systems around with e.g. 1.25G or 1.5G of
>>> main memory.
>> Maybe something like:
>> config VMSPLIT_1G
>> bool "1G/3G user/kernel split"
>> config VMSPLIT_X
>> bool "Manual split"
>> endchoice
> ...
> Yes, that looks like a good idea.
Couldn't this still be implemented entirely in Kconfig, without
modifying page.h? Like in the following example:
[...]
config VMSPLIT_1G
bool "1G/3G user/kernel split"
config VMSPLIT_X
bool "Manual split"
endchoice
config PAGE_OFFSET
hex
range 0x40000000 0xC0000000
prompt "Memory split address (must be aligned to 4096)" if VMSPLIT_X
[...]
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
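One caveat with a free-form value: Kconfig's "range" appears only to bound
the value, so the 4096-byte alignment mentioned in the prompt would still
have to be checked somewhere else, for example at build time. The function
below is a hypothetical sketch of such a check under that assumption, not
existing kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPLIT_MIN   UINT32_C(0x40000000)  /* lower bound from the range line */
#define SPLIT_MAX   UINT32_C(0xC0000000)  /* upper bound from the range line */
#define SPLIT_ALIGN UINT32_C(4096)        /* one i386 page */

/* True if addr is inside the allowed range and page aligned. */
static bool split_address_ok(uint32_t addr)
{
    if (addr < SPLIT_MIN || addr > SPLIT_MAX)
        return false;
    return (addr & (SPLIT_ALIGN - 1)) == 0;
}
```

A value such as 0xB0000800 passes the range test but fails the alignment
test, which is the case the prompt text alone cannot prevent.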
Cheers
Uli
On Fri, Feb 03 2006, Mark Lord wrote:
> >Maybe something like:
> > config VMSPLIT_1G
> > bool "1G/3G user/kernel split"
> > config VMSPLIT_X
> > bool "Manual split"
> >endchoice
> ...
>
> Yes, that looks like a good idea.
Sounds like a huge mess to me. The manual kernel buffer size
configuration was bad enough, and this is much trickier. People have
enough problems even understanding what the option does, let's please
leave it as is with a few select options.
--
Jens Axboe
>
> Mmm.. bad idea. As much as I'd like the default to be 3GB_OPT, that would
> be a big impact to userspace, and there's no point in breaking everyone's
> machines when advanced users can just reconfig/recompile to get what they want.
>
What userspace programs do depend on it?
Jan Engelhardt
--
>
>> Yes, that looks like a good idea.
>
>Couldn't this still be implemented entirely in Kconfig, without
>modifying page.h? Like in the following example:
>
> [...]
> config VMSPLIT_1G
> bool "1G/3G user/kernel split"
> config VMSPLIT_X
> bool "Manual split"
>endchoice
>
>config PAGE_OFFSET
> hex
> range 0x40000000 0xC0000000
> prompt "Memory split address (must be aligned to 4096)" if VMSPLIT_X
> [...]
> default 0x40000000 if VMSPLIT_1G
> default 0xC0000000
>
Well, if kconfig is able to do that, all the better.
Jan Engelhardt
--
Jan Engelhardt wrote:
>
> What userspace programs do depend on it?
That *is* the question, isn't it.
We simply don't know, other than that this is
a visible change to any program that cares.
Gratuitous breakage of userspace is pointless.
Empirically speaking, everything I use is working fine
here with the 2G/2G split, but that's just not a good enough
reason to mindlessly break other people's userspace.
Cheers
On Sat, 04 Feb 2006 08:57:23 -0500, Mark Lord <[email protected]> wrote:
> Jan Engelhardt wrote:
> >
> > What userspace programs do depend on it?
>
> That *is* the question, isn't it.
> We simply don't know, other than that this is
> a visible change to any program that cares.
>
> Gratuitous breakage of userspace is pointless.
>
> Empirically speaking, everything I use is working fine
> here with the 2G/2G split, but that's just not good enough
> reason to mindlessly break other people's userspace.
>
The only thing I have seen that depends explicitly on this setting
is valgrind, but I think the CVS version includes some kind of runtime
configuration/selection.
--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.15-jam8 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
On Sat, 2006-02-04 at 12:05 +0100, Jan Engelhardt wrote:
> >
> > Mmm.. bad idea. As much as I'd like the default to be 3GB_OPT, that would
> > be a big impact to userspace, and there's no point in breaking everyone's
> > machines when advanced users can just reconfig/recompile to get what they want.
> >
> What userspace programs do depend on it?
There is a lot of userspace that assumes it can do 2GB or even close
to 3GB of memory allocations. Databases, Java, basically anything with
threads. Sure, for most of these it's a configuration option to reduce
this, but that still doesn't mean it's a good idea to change from the
existing behavior...
On 1/11/06, Mark Lord <[email protected]> wrote:
> Greg Norris wrote:
> > Is there any benefit/point to enabling HIGHMEM when using this patch,
> > assuming that physical memory is smaller than the address space? For
> > example, when using VMSPLIT_3G_OPT on a box with 1G of memory.
>
> No. In fact, there should be a (very) tiny performance gain
> by NOT enabling HIGHMEM -- things like kmap() should get simpler.
Actually, IIRC if you have an x86 CPU (or x86-64 running a 32-bit
kernel) which has a no-execute bit, to support that (i.e. to have a
more secure system) you need to use HIGHMEM64G, no matter how much or
how little RAM you have.
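The reason is the page table entry format: the no-execute flag is bit 63 of
a 64-bit PAE entry, and the 32-bit entries used without PAE have no such
bit. A minimal sketch of the two layouts follows, with made-up helper names
and only the bits relevant here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT UINT64_C(1)          /* bit 0 */
#define PTE_NX      (UINT64_C(1) << 63)  /* bit 63, 64-bit (PAE) entries only */

/* Build a simplified PAE page table entry for a 4 KiB aligned frame. */
static uint64_t pae_pte(uint64_t frame_addr, bool no_exec)
{
    uint64_t pte = (frame_addr & ~UINT64_C(0xFFF)) | PTE_PRESENT;
    if (no_exec)
        pte |= PTE_NX;
    return pte;
}

/* A non-PAE entry is only 32 bits wide, so bit 63 cannot be represented. */
static uint32_t legacy_pte(uint64_t pae_entry)
{
    return (uint32_t)pae_entry;
}
```

Truncating an entry to 32 bits drops the no-execute flag, which is why the
feature depends on the PAE format regardless of how much RAM is installed.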
--
-Barry K. Nathan <[email protected]>
Jan Engelhardt <[email protected]> wrote:
>> Mmm.. bad idea. As much as I'd like the default to be 3GB_OPT, that would
>> be a big impact to userspace, and there's no point in breaking everyone's
>> machines when advanced users can just reconfig/recompile to get what they
>> want.
>>
> What userspace programs do depend on it?
As far as I understand, user mode linux.
--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.
>> >
>> > Mmm.. bad idea. As much as I'd like the default to be 3GB_OPT, that would
>> > be a big impact to userspace, and there's no point in breaking everyone's
>> > machines when advanced users can just reconfig/recompile to get what they want.
>> >
>> What userspace programs do depend on it?
>
>there is a lot of userspace that assumes they can do 2Gb or even close
>to 3Gb of memory allocations. Databases, java, basically anything with
>threads. Sure for most of these its a configuration option to reduce
>this, but that still doesn't mean it's a good idea to change from the
>existing behavior...
>
Not to mention that these (almost(*)) fail anyway when you have less than 2
GB of RAM.
(*) when finally writing to overcommitted memory
Yuck. That sounds like they depend on 64G/64bit allocations on 4G/32bit
machines.
Jan Engelhardt
--
On Sun, 2006-02-05 at 22:14 +0100, Jan Engelhardt wrote:
> >> >
> >> > Mmm.. bad idea. As much as I'd like the default to be 3GB_OPT, that would
> >> > be a big impact to userspace, and there's no point in breaking everyone's
> >> > machines when advanced users can just reconfig/recompile to get what they want.
> >> >
> >> What userspace programs do depend on it?
> >
> >there is a lot of userspace that assumes they can do 2Gb or even close
> >to 3Gb of memory allocations. Databases, java, basically anything with
> >threads. Sure for most of these its a configuration option to reduce
> >this, but that still doesn't mean it's a good idea to change from the
> >existing behavior...
> >
> Not to mention that these (almost(*)) fail anyway when you have less than 2
> GB of RAM.
It's not really overcommit... it can also be file mmaps or shared mmaps
of, say, tmpfs files (the latter is common with Oracle, actually).
On Sun, Feb 05, 2006 at 09:20:55PM +0100, Bodo Eggert wrote:
> As far as I understand, user mode linux.
Nope, not any more.
UML used to load at the top of the user address space, hence a
dependency on a 3/1 split, but the default config now has UML loading
lower, where other processes load.
Jeff
>> >> What userspace programs do depend on it?
>> >
>> >there is a lot of userspace that assumes they can do 2Gb or even close
>> >to 3Gb of memory allocations. Databases, java, basically anything with
>> >threads. Sure for most of these its a configuration option to reduce
>> >this, but that still doesn't mean it's a good idea to change from the
>> >existing behavior...
>> >
>> Not to mention that these (almost(*)) fail anyway when you have less than 2
>> GB of RAM.
>
>it's not really overcommit... it can also be file mmaps or shared mmaps
>of say tmpfs files (the later is common with oracle actually)
>
So, just as I did in the sample patch, the manual split shall depend on
EMBEDDED. Those who run fat databases with big malloc/mmap assumptions
probably don't belong to the group using CONFIG_EMBEDDED.
Jan Engelhardt
--
On Mon, Feb 06, 2006 at 03:56:34PM +0100, Jan Engelhardt wrote:
> >> >> What userspace programs do depend on it?
> >> >
> >> >there is a lot of userspace that assumes they can do 2Gb or even close
> >> >to 3Gb of memory allocations. Databases, java, basically anything with
> >> >threads. Sure for most of these its a configuration option to reduce
> >> >this, but that still doesn't mean it's a good idea to change from the
> >> >existing behavior...
> >> >
> >> Not to mention that these (almost(*)) fail anyway when you have less than 2
> >> GB of RAM.
> >
> >it's not really overcommit... it can also be file mmaps or shared mmaps
> >of say tmpfs files (the later is common with oracle actually)
>
> So, just as I did in the sample patch, the manual split shall depend on
> EMBEDDED. Those who run fat databases with big malloc/mmap assumptions
> don't probably belong to the group using CONFIG_EMBEDDED.
*sigh* Well, the embedded folks are unlikely to have 1-3GB.
Why not use EXPERIMENTAL if you 'think' the option will
hurt the database folks who do not know how to configure their
kernel ...
best,
Herbert
> Jan Engelhardt
> --
On Feb 6, 2006, at 6:41 PM, Herbert Poetzl wrote:
> On Mon, Feb 06, 2006 at 03:56:34PM +0100, Jan Engelhardt wrote:
>>>>>> What userspace programs do depend on it?
>>>>>
>>>>> there is a lot of userspace that assumes they can do 2Gb or
>>>>> even close
>>>>> to 3Gb of memory allocations. Databases, java, basically
>>>>> anything with
>>>>> threads. Sure for most of these its a configuration option to
>>>>> reduce
>>>>> this, but that still doesn't mean it's a good idea to change
>>>>> from the
>>>>> existing behavior...
>>>>>
>>>> Not to mention that these (almost(*)) fail anyway when you have
>>>> less than 2
>>>> GB of RAM.
>>>
>>> it's not really overcommit... it can also be file mmaps or shared
>>> mmaps
>>> of say tmpfs files (the later is common with oracle actually)
>>
>> So, just as I did in the sample patch, the manual split shall
>> depend on
>> EMBEDDED. Those who run fat databases with big malloc/mmap
>> assumptions
>> don't probably belong to the group using CONFIG_EMBEDDED.
>
> *sigh* well, the embeded folks are unlikely to have 1-3GB
> why not use EXPERIMENTAL if you 'think' the option will
> hurt the database folks who do not know to configure their
> kernel ...
Embedded is not the same thing as small. 1GB is what the system I
work on uses and it is "embedded". This new VMSPLIT is great, BTW.
--
Mark Rustad, [email protected]
On Tue, 2006-02-07 at 01:41 +0100, Herbert Poetzl wrote:
> On Mon, Feb 06, 2006 at 03:56:34PM +0100, Jan Engelhardt wrote:
[...]
> >
> > So, just as I did in the sample patch, the manual split shall depend on
> > EMBEDDED. Those who run fat databases with big malloc/mmap assumptions
> > don't probably belong to the group using CONFIG_EMBEDDED.
>
> *sigh* well, the embeded folks are unlikely to have 1-3GB
ACK.
But don't be confused by the naming: CONFIG_EMBEDDED nowadays means
"options for people who really know what they are doing". It came originally
from the embedded world but now applies to others as well.
No one has come up with a better option name up to now ...
> why not use EXPERIMENTAL if you 'think' the option will
> hurt the database folks who do not know to configure their
> kernel ...
EXPERIMENTAL means (at least for me) "this may not really work" and is
IMHO orthogonal to CONFIG_EMBEDDED.
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
On Tue, Feb 07, 2006 at 10:38:05AM +0100, Bernd Petrovitsch wrote:
> On Tue, 2006-02-07 at 01:41 +0100, Herbert Poetzl wrote:
> > On Mon, Feb 06, 2006 at 03:56:34PM +0100, Jan Engelhardt wrote:
> [...]
> > >
> > > So, just as I did in the sample patch, the manual split shall depend on
> > > EMBEDDED. Those who run fat databases with big malloc/mmap assumptions
> > > don't probably belong to the group using CONFIG_EMBEDDED.
> >
> > *sigh* well, the embeded folks are unlikely to have 1-3GB
>
> ACK.
> But don't be confused by the naming: CONFIG_EMBEDDED nowadays means
> "options for people who know really what they do". It came originally
> from the embedded world but applies now also to others.
> No one has come up with a better option name up to now ...
>...
It's slightly different:
EMBEDDED is limited to options allowing additional space savings for
machines with strong space limits.
If you have enough RAM that VMSPLIT matters, you shouldn't enable
EMBEDDED.
What we could do is to add an additional ADVANCED_USER option that hides
options like VMSPLIT or the NAPI options for net drivers.
This would result in the following (the text for ADVANCED_USER isn't
good, but you get the idea):
config ADVANCED_USER
	bool "ask questions that require a deeper knowledge of the kernel"

config EXPERIMENTAL
	bool "Prompt for development and/or incomplete code/drivers"
	depends on ADVANCED_USER

menuconfig EMBEDDED
	bool "Configure standard kernel features (for small systems)"
	depends on ADVANCED_USER
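
As a rough illustration of what hiding VMSPLIT behind such an option
could look like (a sketch only: the DEFAULT_3G/DEFAULT_2G names are
taken from the patch at the start of the thread, its NOHIGHMEM
dependency is left out for brevity, and tying the prompt to
ADVANCED_USER is an assumption, since ADVANCED_USER does not exist yet):

choice
	# assumption: only ask the question if ADVANCED_USER is set
	prompt "Memory split" if ADVANCED_USER
	default DEFAULT_3G

config DEFAULT_3G
	bool "3G/1G user/kernel split"

config DEFAULT_2G
	bool "2G/2G user/kernel split"

endchoice

Putting the condition on the prompt rather than using "depends on"
means that the choice still exists when ADVANCED_USER=n; the question
is just not asked and the 3G/1G default is used.
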
> Bernd
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
>>>>> On Tue, 7 Feb 2006, Adrian Bunk wrote:
> What we could do is to add an additional ADVANCED_USER option that
> hides options like VMSPLIT or the NAPI options for net drivers.
> config ADVANCED_USER
> 	bool "ask questions that require a deeper knowledge of the kernel"
>
> config EXPERIMENTAL
> 	bool "Prompt for development and/or incomplete code/drivers"
> 	depends on ADVANCED_USER
Shouldn't this be the other way around, i.e. ADVANCED_USER depending
on EXPERIMENTAL?
If you implement it as above, people will set ADVANCED_USER to "n" in
oldconfig and then be surprised that all experimental drivers are
gone.
On Tue, Feb 07, 2006 at 03:05:52PM +0100, Ulrich Mueller wrote:
> >>>>> On Tue, 7 Feb 2006, Adrian Bunk wrote:
>
> > What we could do is to add an additional ADVANCED_USER option that
> > hides options like VMSPLIT or the NAPI options for net drivers.
>
> > config ADVANCED_USER
> > 	bool "ask questions that require a deeper knowledge of the kernel"
>
> > config EXPERIMENTAL
> > 	bool "Prompt for development and/or incomplete code/drivers"
> > 	depends on ADVANCED_USER
>
> Shouldn't this be the other way around, i.e. ADVANCED_USER depending
> on EXPERIMENTAL?
No, if there's a dependency between the two, it should go in this direction.
> If you implement it as above, people will set ADVANCED_USER to "n" in
> oldconfig and then be surprised that all experimental drivers are
> gone.
What about no dependency between ADVANCED_USER and EXPERIMENTAL?
cu
Adrian
--
>> > config ADVANCED_USER
>> > 	bool "ask questions that require a deeper knowledge of the kernel"
>>
>> > config EXPERIMENTAL
>> > 	bool "Prompt for development and/or incomplete code/drivers"
>> > 	depends on ADVANCED_USER
>>
>> Shouldn't this be the other way around, i.e. ADVANCED_USER depending
>> on EXPERIMENTAL?
>
>No, if there's a dependency between the two, it should go in this direction.
ACK. Advanced code is not necessarily "incomplete code/drivers".
>> If you implement it as above, people will set ADVANCED_USER to "n" in
>> oldconfig and then be surprised that all experimental drivers are
>> gone.
>
>What about no dependency between ADVANCED_USER and EXPERIMENTAL?
>
Sounds good.
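
For reference, the variant without that dependency would presumably
look like this (a sketch based on the proposal quoted above; keeping
EMBEDDED dependent on ADVANCED_USER is an assumption carried over from
the original proposal):

config ADVANCED_USER
	bool "ask questions that require a deeper knowledge of the kernel"

config EXPERIMENTAL
	# no "depends on ADVANCED_USER" here anymore
	bool "Prompt for development and/or incomplete code/drivers"

menuconfig EMBEDDED
	bool "Configure standard kernel features (for small systems)"
	depends on ADVANCED_USER

With this, answering "n" to ADVANCED_USER in oldconfig leaves
EXPERIMENTAL and the drivers depending on it untouched.
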
Jan Engelhardt
--