2002-01-12 05:42:13

by Randy Hron

Subject: [PATCH] 1-2-3 GB


Patch to have 1-3 GB of virtual memory and not show up as highmem:

Tested on uniprocessor Athlon with 1024 MB RAM and 1027 MB swap.
Caused no LTP (ltp-20020108) regressions.
Did a test like http://marc.theaimsgroup.com/?l=linux-kernel&m=101064072924424&w=2
This time the test completed in 51 minutes (11% faster) and I had setiathome running
the whole time and listened to 12 mp3s sampled at 128k.

dmesg|grep Mem
Memory: 1029848k/1048512k available (1054k kernel code, 18276k reserved, 260k data, 240k init, 0k highmem)

egrep '^CONFIG_HIGH|GB' /usr/src/linux/.config
CONFIG_HIGHMEM4G=y
CONFIG_HIGHMEM=y
# CONFIG_1GB is not set
CONFIG_2GB=y
# CONFIG_3GB is not set
# CONFIG_05GB is not set

uname -a
Linux rushmore 2.4.18pre2aa2-2g #2 Fri Jan 11 22:25:55 EST 2002 i686 unknown

Derived from:
http://kernelnewbies.org/kernels/rh72/SOURCES/linux-2.4.2-vm-1-2-3-gbyte.patch
Some parts of the patch above are already in the mainline trees.

Patch below applies to 2.4.18pre2aa2:

diff -nur linux.aa2/Rules.make linux/Rules.make
--- linux.aa2/Rules.make Tue Mar 6 22:31:01 2001
+++ linux/Rules.make Fri Jan 11 22:00:57 2002
@@ -212,6 +212,7 @@
#
# Added the SMP separator to stop module accidents between uniprocessor
# and SMP Intel boxes - AC - from bits by Michael Chastain
+# Added separator for different PAGE_OFFSET memory models - Ingo.
#

ifdef CONFIG_SMP
@@ -220,6 +221,22 @@
genksyms_smp_prefix :=
endif

+ifdef CONFIG_2GB
+ifdef CONFIG_SMP
+ genksyms_smp_prefix := -p smp_2gig_
+else
+ genksyms_smp_prefix := -p 2gig_
+endif
+endif
+
+ifdef CONFIG_3GB
+ifdef CONFIG_SMP
+ genksyms_smp_prefix := -p smp_3gig_
+else
+ genksyms_smp_prefix := -p 3gig_
+endif
+endif
+
$(MODINCL)/%.ver: %.c
@if [ ! -r $(MODINCL)/$*.stamp -o $(MODINCL)/$*.stamp -ot $< ]; then \
echo '$(CC) $(CFLAGS) $(EXTRA_CFLAGS) -E -D__GENKSYMS__ $<'; \
diff -nur linux.aa2/arch/i386/config.in linux/arch/i386/config.in
--- linux.aa2/arch/i386/config.in Fri Jan 11 20:57:58 2002
+++ linux/arch/i386/config.in Fri Jan 11 22:20:32 2002
@@ -169,7 +169,11 @@
if [ "$CONFIG_HIGHMEM64G" = "y" ]; then
define_bool CONFIG_X86_PAE y
else
- bool '3.5GB user address space' CONFIG_05GB
+ choice 'Maximum Virtual Memory' \
+ "3GB CONFIG_1GB \
+ 2GB CONFIG_2GB \
+ 1GB CONFIG_3GB \
+ 05GB CONFIG_05GB" 3GB
fi
if [ "$CONFIG_NOHIGHMEM" = "y" ]; then
define_bool CONFIG_NO_PAGE_VIRTUAL y
@@ -179,6 +183,7 @@
bool 'HIGHMEM I/O support (EXPERIMENTAL)' CONFIG_HIGHIO
fi

+
bool 'Math emulation' CONFIG_MATH_EMULATION
bool 'MTRR (Memory Type Range Register) support' CONFIG_MTRR
bool 'Symmetric multi-processing support' CONFIG_SMP
diff -nur linux.aa2/include/asm-i386/page_offset.h linux/include/asm-i386/page_offset.h
--- linux.aa2/include/asm-i386/page_offset.h Fri Jan 11 20:57:58 2002
+++ linux/include/asm-i386/page_offset.h Fri Jan 11 21:20:48 2002
@@ -1,6 +1,10 @@
#include <linux/config.h>
-#ifndef CONFIG_05GB
-#define PAGE_OFFSET_RAW 0xC0000000
-#else
+#ifdef CONFIG_05GB
#define PAGE_OFFSET_RAW 0xE0000000
+#elif defined(CONFIG_1GB)
+#define PAGE_OFFSET_RAW 0xC0000000
+#elif defined(CONFIG_2GB)
+#define PAGE_OFFSET_RAW 0x80000000
+#elif defined(CONFIG_3GB)
+#define PAGE_OFFSET_RAW 0x40000000
#endif
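
As a rough illustration (not part of the patch), the split implied by each PAGE_OFFSET_RAW value can be printed from a few lines of user-space C. On i386 the kernel's direct mapping starts at PAGE_OFFSET (__pa(v) is v - PAGE_OFFSET), and vmalloc/PCI space eat into the kernel window, so the figures below are upper bounds:

/*
 * Sketch only, not kernel code: print the user/kernel split implied by
 * each PAGE_OFFSET_RAW above.  The real direct mapping is somewhat
 * smaller because the vmalloc area and PCI space live in the kernel
 * window as well.
 */
#include <stdio.h>

int main(void)
{
	static const struct {
		const char *option;
		unsigned long long page_offset;
	} models[] = {
		{ "CONFIG_05GB", 0xE0000000ULL },
		{ "CONFIG_1GB",  0xC0000000ULL },
		{ "CONFIG_2GB",  0x80000000ULL },
		{ "CONFIG_3GB",  0x40000000ULL },
	};
	const unsigned long long four_gb = 1ULL << 32;
	unsigned int i;

	for (i = 0; i < sizeof(models) / sizeof(models[0]); i++) {
		unsigned long long user = models[i].page_offset;
		unsigned long long kernel = four_gb - user;

		printf("%-12s user 0x0-%#010llx (%llu MB), kernel window %llu MB\n",
		       models[i].option, user, user >> 20, kernel >> 20);
	}
	return 0;
}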

--
Randy Hron


2002-01-12 07:33:26

by H. Peter Anvin

Subject: Re: [PATCH] 1-2-3 GB

Followup to: <[email protected]>
By author: [email protected]
In newsgroup: linux.dev.kernel
> --- linux.aa2/arch/i386/config.in Fri Jan 11 20:57:58 2002
> +++ linux/arch/i386/config.in Fri Jan 11 22:20:32 2002
> @@ -169,7 +169,11 @@
> if [ "$CONFIG_HIGHMEM64G" = "y" ]; then
> define_bool CONFIG_X86_PAE y
> else
> - bool '3.5GB user address space' CONFIG_05GB
> + choice 'Maximum Virtual Memory' \
> + "3GB CONFIG_1GB \
> + 2GB CONFIG_2GB \
> + 1GB CONFIG_3GB \
> + 05GB CONFIG_05GB" 3GB
> fi

Calling this "Maximum Virtual Memory" is misleading at best. This is
best described as "kernel:user split" (3:1, 2:2, 1:3, 3.5:0.5);
"maximum virtual memory" sounds to me a lot like the opposite of what
your parameter is.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-01-12 11:57:26

by Andrea Arcangeli

Subject: Re: [PATCH] 1-2-3 GB

On Sat, Jan 12, 2002 at 12:45:28AM -0500, [email protected] wrote:
>
> Patch to have 1-3 GB of virtual memory and not show up as highmem:
>
> Tested on uniprocessor Athlon with 1024 MB RAM and 1027 MB swap.
> Caused no LTP (ltp-20020108) regressions.
> Did a test like http://marc.theaimsgroup.com/?l=linux-kernel&m=101064072924424&w=2
> This time the test completed in 51 minutes (11% faster) and I had setiathome running
> the whole time and listened to 12 mp3s sampled at 128k.
>
> dmesg|grep Mem
> Memory: 1029848k/1048512k available (1054k kernel code, 18276k reserved, 260k data, 240k init, 0k highmem)
>
> egrep '^CONFIG_HIGH|GB' /usr/src/linux/.config
> CONFIG_HIGHMEM4G=y
> CONFIG_HIGHMEM=y
> # CONFIG_1GB is not set
> CONFIG_2GB=y
> # CONFIG_3GB is not set
> # CONFIG_05GB is not set
>
> uname -a
> Linux rushmore 2.4.18pre2aa2-2g #2 Fri Jan 11 22:25:55 EST 2002 i686 unknown
>
> Derived from:
> http://kernelnewbies.org/kernels/rh72/SOURCES/linux-2.4.2-vm-1-2-3-gbyte.patch
> Some parts of the patch above are already in the mainline trees.
>
> Patch below applies to 2.4.18pre2aa2:

for a fileserver (even more so if it is in-kernel, like Tux) it certainly
makes sense to have as much direct-mapped memory as possible; it is not
the recommended setup for a general-purpose kernel, though. So I applied
the patch (except the prefix thing, which is distribution specific). Thanks,

Andrea

2002-01-12 13:18:51

by Andrea Arcangeli

Subject: Re: [PATCH] 1-2-3 GB

On Fri, Jan 11, 2002 at 11:32:37PM -0800, H. Peter Anvin wrote:
> Followup to: <[email protected]>
> By author: [email protected]
> In newsgroup: linux.dev.kernel
> > --- linux.aa2/arch/i386/config.in Fri Jan 11 20:57:58 2002
> > +++ linux/arch/i386/config.in Fri Jan 11 22:20:32 2002
> > @@ -169,7 +169,11 @@
> > if [ "$CONFIG_HIGHMEM64G" = "y" ]; then
> > define_bool CONFIG_X86_PAE y
> > else
> > - bool '3.5GB user address space' CONFIG_05GB
> > + choice 'Maximum Virtual Memory' \
> > + "3GB CONFIG_1GB \
> > + 2GB CONFIG_2GB \
> > + 1GB CONFIG_3GB \
> > + 05GB CONFIG_05GB" 3GB
> > fi
>
> Calling this "Maximum Virtual Memory" is misleading at best. This is
> best described as "kernel:user split" (3:1, 2:2, 1:3, 3.5:0.5);
> "maximum virtual memory" sounds to me a lot like the opposite of what
> your parameter is.

Actually, it really is the maximum virtual memory... but from the user's
point of view, the user is supposed to care about the virtual memory he
can manage, not about what the kernel will do with the rest. So if the
user wants 3GB of virtual memory available to each task, he will select
3GB. I really don't mind if you want to change it to the kernel point of
view, but given that it's the user who is supposed to compile it, the
current patch also looks good enough to me.

Andrea

2002-01-12 15:47:02

by Randy Hron

Subject: Re: [PATCH] 1-2-3 GB

> > Derived from:
> > http://kernelnewbies.org/kernels/rh72/SOURCES/linux-2.4.2-vm-1-2-3-gbyte.patch
> > Some parts of the patch above are already in the mainline trees.
> >
> > Patch below applies to 2.4.18pre2aa2:
>
> for a fileserver (even more so if it is in-kernel, like Tux) it certainly
> makes sense to have as much direct-mapped memory as possible; it is not
> the recommended setup for a general-purpose kernel, though. So I applied
> the patch (except the prefix thing, which is distribution specific). Thanks,
>
> Andrea

Thanks so much!

--
Randy Hron

2002-01-12 17:27:14

by Albert D. Cahalan

Subject: Re: [PATCH] 1-2-3 GB

Andrea Arcangeli writes:
> On Fri, Jan 11, 2002 at 11:32:37PM -0800, H. Peter Anvin wrote:
>> By author: [email protected]

>>> --- linux.aa2/arch/i386/config.in Fri Jan 11 20:57:58 2002
>>> +++ linux/arch/i386/config.in Fri Jan 11 22:20:32 2002
>>> @@ -169,7 +169,11 @@
>>> if [ "$CONFIG_HIGHMEM64G" = "y" ]; then
>>> define_bool CONFIG_X86_PAE y
>>> else
>>> - bool '3.5GB user address space' CONFIG_05GB
>>> + choice 'Maximum Virtual Memory' \
>>> + "3GB CONFIG_1GB \
>>> + 2GB CONFIG_2GB \
>>> + 1GB CONFIG_3GB \
>>> + 05GB CONFIG_05GB" 3GB
>>> fi
>>
>> Calling this "Maximum Virtual Memory" is misleading at best. This is
>> best described as "kernel:user split" (3:1, 2:2, 1:3, 3.5:0.5);
>> "maximum virtual memory" sounds to me a lot like the opposite of what
>> your parameter is.
>
> Actually, it really is the maximum virtual memory... but from the user's
> point of view, the user is supposed to care about the virtual memory he
> can manage, not about what the kernel will do with the rest. So if the
> user wants 3GB of virtual memory available to each task, he will select
> 3GB. I really don't mind if you want to change it to the kernel point of
> view, but given that it's the user who is supposed to compile it, the
> current patch also looks good enough to me.

The numbers are wrong anyway, because of vmalloc() and PCI space.
The PCI space is motherboard-dependent AFAIK, but you could at
least account for the 128 MB vmalloc() area:

user virtual space / non-kmap physical memory

3584/384
3072/896
2048/1920
1024/2944 (sure this works, even for syscalls w/ bad pointers?)
512/3456 (sure this works, even for syscalls w/ bad pointers?)
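
The table boils down to one line of arithmetic: non-kmap physical memory is roughly 4096 MB minus the user virtual space minus the 128 MB vmalloc() area. A throwaway sketch of that calculation (PCI holes ignored; the 128 MB figure is the usual i386 vmalloc reservation, not a guarantee):

/*
 * Sketch of the arithmetic behind the table above: direct-mapped
 * ("non-kmap") physical memory ~= 4096 MB - user space - 128 MB vmalloc.
 * PCI/ISA holes would shrink these numbers further.
 */
#include <stdio.h>

int main(void)
{
	static const unsigned int user_mb[] = { 3584, 3072, 2048, 1024, 512 };
	unsigned int i;

	for (i = 0; i < sizeof(user_mb) / sizeof(user_mb[0]); i++)
		printf("%4u MB user virtual -> ~%4u MB non-kmap physical\n",
		       user_mb[i], 4096 - user_mb[i] - 128);
	return 0;
}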

2002-01-12 17:43:25

by Andrea Arcangeli

Subject: Re: [PATCH] 1-2-3 GB

On Sat, Jan 12, 2002 at 12:26:35PM -0500, Albert D. Cahalan wrote:
> Andrea Arcangeli writes:
> > On Fri, Jan 11, 2002 at 11:32:37PM -0800, H. Peter Anvin wrote:
> >> By author: [email protected]
>
> >>> --- linux.aa2/arch/i386/config.in Fri Jan 11 20:57:58 2002
> >>> +++ linux/arch/i386/config.in Fri Jan 11 22:20:32 2002
> >>> @@ -169,7 +169,11 @@
> >>> if [ "$CONFIG_HIGHMEM64G" = "y" ]; then
> >>> define_bool CONFIG_X86_PAE y
> >>> else
> >>> - bool '3.5GB user address space' CONFIG_05GB
> >>> + choice 'Maximum Virtual Memory' \
> >>> + "3GB CONFIG_1GB \
> >>> + 2GB CONFIG_2GB \
> >>> + 1GB CONFIG_3GB \
> >>> + 05GB CONFIG_05GB" 3GB
^^ this should be 3.5GB btw
> >>> fi
> >>
> >> Calling this "Maximum Virtual Memory" is misleading at best. This is
> >> best described as "kernel:user split" (3:1, 2:2, 1:3, 3.5:0.5);
> >> "maximum virtual memory" sounds to me a lot like the opposite of what
> >> your parameter is.
> >
> > Actually, it really is the maximum virtual memory... but from the user's
> > point of view, the user is supposed to care about the virtual memory he
> > can manage, not about what the kernel will do with the rest. So if the
> > user wants 3GB of virtual memory available to each task, he will select
> > 3GB. I really don't mind if you want to change it to the kernel point of
> > view, but given that it's the user who is supposed to compile it, the
> > current patch also looks good enough to me.
>
> The numbers are wrong anyway, because of vmalloc() and PCI space.
> The PCI space is motherboard-dependent AFAIK, but you could at
> least account for the 128 MB vmalloc() area:

That looks dirty; the size of the kernel direct mapping is mainly a
function of #defines that can be changed freely. It is not constant with
respect to CONFIG_1G etc., and it also changes with the SMP/UP/4G/64G
options. The 3GB/2GB/1GB/3.5GB figures visible in menuconfig are exact,
instead. So I wouldn't mention imprecise stuff that can change under
us (and the exact size of the kernel direct mapping doesn't matter to
the user anyway, I think [and if it matters, I think it means he's
skilled enough to know about vmalloc space ;) ]).

>
> user virtual space / non-kmap physical memory
>
> 3584/384
> 3072/896
> 2048/1920
> 1024/2944 (sure this works, even for syscalls w/ bad pointers?)
> 512/3456 (sure this works, even for syscalls w/ bad pointers?)

Andrea

2002-01-12 18:29:11

by Albert D. Cahalan

Subject: Re: [PATCH] 1-2-3 GB

Andrea Arcangeli writes:
> On Sat, Jan 12, 2002 at 12:26:35PM -0500, Albert D. Cahalan wrote:

>> The numbers are wrong anyway, because of vmalloc() and PCI space.
>> The PCI space is motherboard-dependent AFAIK, but you could at
>> least account for the 128 MB vmalloc() area:
>
> That looks dirty; the size of the kernel direct mapping is mainly a
> function of #defines that can be changed freely. It is not constant with
> respect to CONFIG_1G etc., and it also changes with the SMP/UP/4G/64G
> options. The 3GB/2GB/1GB/3.5GB figures visible in menuconfig are exact,
> instead. So I wouldn't mention imprecise stuff that can change under
> us (and the exact size of the kernel direct mapping doesn't matter to
> the user anyway, I think [and if it matters, I think it means he's
> skilled enough to know about vmalloc space ;) ]).

The problem is that the "1GB" option doesn't cover 1 GB.
It is common for people to buy 1 GB of memory (power of 2),
and then complain that Linux only sees 896 MB of memory.

So how will you make the choices clear to such people?
There are 3 options, not counting the slram block device:

a. give up 128 MB (12.5 %) of memory
b. suffer the kmap overhead
c. give up some user virtual address space

None of this is obvious. The user sees "1GB" and will
innocently believe that this is good for a 1 GB system!

BTW, do we no longer require that the kernel side of things
(whole thing, including vmalloc space) be a power of two?
There used to be a bitwise operation in the user access stuff.

2002-01-12 19:20:39

by Hugh Dickins

Subject: Re: [PATCH] 1-2-3 GB

On Sat, 12 Jan 2002, Andrea Arcangeli wrote:
>
> for a fileserver (even more so if it is in-kernel, like Tux) it certainly
> makes sense to have as much direct-mapped memory as possible; it is not
> the recommended setup for a general-purpose kernel, though. So I applied
> the patch (except the prefix thing, which is distribution specific). Thanks,

Please add in the patch below as well. It needs some explanation!
A few weeks ago we noticed a compiler bug: in both egcs-2.91.66 and
gcc-2.95.3; not in RH 2.96-85; forget if I tried gcc-3.0, expect okay.

If CONFIG_HIGHMEM64G (PAE: 3 levels of page table, 64-bit pte),
the free_one_pgd code inlined in clear_page_tables is miscompiled:
the loop is terminated by a "jle" signed comparison of addresses
instead of a "jb" unsigned comparison.

Usually not a problem: but if you configure for 1GB of user virtual
and 3GB of kernel virtual, and you have more than 1GB of physical
memory (as you normally would if you chose HIGHMEM64G), then there's
a page at physical address 0x3ffff000, directly mapped to virtual
address 0x7ffff000. And if that page happens to get used for the
pmd of a process, then on exit the free_one_pgd loop wraps over
to carry on freeing "entries" at 0x80000000, 0x80000008, ...
A lot of pmd_ERROR messages, but eventually an entry scrapes
through the pmd_bad test and is wrongly freed, not so good.
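
To make the failure mode concrete, here is a stand-alone sketch using plain 32-bit integers in place of the pointers; it illustrates the comparison only, not the kernel loop itself:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t pmd  = 0x7ffff000u;	/* pmd page directly below 2GB        */
	uint32_t last = pmd + 511 * 8;	/* last 8-byte PAE entry: 0x7ffffff8  */
	uint32_t next = last + 8;	/* first address past the page        */

	/* unsigned comparison ("jb"/"jbe"): the loop stops after "last" */
	printf("unsigned: next <= last is %d\n", next <= last);

	/*
	 * signed comparison ("jle"): 0x80000000 looks negative, so a
	 * miscompiled loop believes it is still inside the pmd page and
	 * carries on "freeing" entries at 0x80000000, 0x80000008, ...
	 */
	printf("signed:   (int32_t)next <= (int32_t)last is %d\n",
	       (int32_t)next <= (int32_t)last);
	return 0;
}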

The patch below seems to be enough to convince egcs-2.91.66 and
gcc-2.95.3 to use a "jb" comparison there. I'm working on PIII,
prefetchw() just a stub, if that makes any difference.

This patch is not actually what we've used. Paranoia (what other
such bugs might there be?) drove me to set physical pages 0x3ffff
and 0x40000 as Reserved in arch/i386/setup.c. I don't think it's
appropriate to force that level of paranoia on others; but anyone
configuring 3GBK should remember that it's a less-travelled path.

Hugh

--- 2.4.18pre2aa2/mm/memory.c Sat Jan 12 18:01:36 2002
+++ linux/mm/memory.c Sat Jan 12 18:09:27 2002
@@ -106,8 +106,7 @@

static inline void free_one_pgd(pgd_t * dir)
{
- int j;
- pmd_t * pmd;
+ pmd_t * pmd, * md, * emd;

if (pgd_none(*dir))
return;
@@ -118,9 +117,9 @@
}
pmd = pmd_offset(dir, 0);
pgd_clear(dir);
- for (j = 0; j < PTRS_PER_PMD ; j++) {
- prefetchw(pmd+j+(PREFETCH_STRIDE/16));
- free_one_pmd(pmd+j);
+ for (md = pmd, emd = pmd + PTRS_PER_PMD; md < emd; md++) {
+ prefetchw(md+(PREFETCH_STRIDE/16));
+ free_one_pmd(md);
}
pmd_free(pmd);
}

2002-01-12 19:25:39

by Andre Hedrick

Subject: BIO Usage Error or Conflicting Designs


Jens,

Below is a single-sector read using ACB.
If I do not use the code inside "#ifdef USEBIO" and run UP/SMP but no
highmem, it runs and works like a charm. It is also 100% unchanged code
from what is in the 2.4 patches. The attached oops is generated under
SMP without highmem and running the USEBIO code.

CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set

Regards,

Andre Hedrick
Linux Disk Certification Project Linux ATA Development


/*
 * Handler for command with PIO data-in phase
 */
ide_startstop_t task_in_intr (ide_drive_t *drive)
{
	byte stat = GET_STAT();
	byte io_32bit = drive->io_32bit;
	struct request *rq = HWGROUP(drive)->rq;
	char *pBuf = NULL;

	if (!OK_STAT(stat,DATA_READY,BAD_R_STAT)) {
		if (stat & (ERR_STAT|DRQ_STAT)) {
			return ide_error(drive, "task_in_intr", stat);
		}
		if (!(stat & BUSY_STAT)) {
			DTF("task_in_intr to Soon wait for next interrupt\n");
			ide_set_handler(drive, &task_in_intr, WAIT_CMD, NULL);
			return ide_started;
		}
	}
	drive->io_32bit = 0;
	DTF("stat: %02x\n", stat);
#ifdef USEBIO
	if (rq->flags & REQ_CMD) {
		pBuf = ide_map_buffer(rq, &flags);
	} else {
		pBuf = rq->buffer + ((rq->nr_sectors - rq->current_nr_sectors) * SECTOR_SIZE);
	}
#else
	pBuf = rq->buffer + ((rq->nr_sectors - rq->current_nr_sectors) * SECTOR_SIZE);
#endif
	DTF("Read: %p, rq->current_nr_sectors: %d\n", pBuf, (int) rq->current_nr_sectors);
	taskfile_input_data(drive, pBuf, SECTOR_WORDS);
#ifdef USEBIO
	if (rq->flags & REQ_CMD)
		ide_unmap_buffer(pBuf, &flags);
	rq->sector++;
	rq->errors = 0;
#endif
	drive->io_32bit = io_32bit;
	if (--rq->current_nr_sectors <= 0) {
		/* (hs): swapped next 2 lines */
		DTF("Request Ended stat: %02x\n", GET_STAT());
		ide_end_request(1, HWGROUP(drive));
	} else {
		ide_set_handler(drive, &task_in_intr, WAIT_CMD, NULL);
		return ide_started;
	}
	return ide_stopped;
}


Attachments:
bio.oops.file (2.32 kB)

2002-01-12 20:06:33

by Jens Axboe

Subject: Re: BIO Usage Error or Conflicting Designs

On Sat, Jan 12 2002, Andre Hedrick wrote:
>
> Jens,
>
> Below is a single-sector read using ACB.
> If I do not use the code inside "#ifdef USEBIO" and run UP/SMP but no
> highmem, it runs and works like a charm. It is also 100% unchanged code
> from what is in the 2.4 patches. The attached oops is generated under
> SMP without highmem and running the USEBIO code.
>
> CONFIG_NOHIGHMEM=y

Is this with the highmem debug stuff enabled? That's the only way I can
see this BUG triggering, otherwise q->bounce_pfn _cannot_ be smaller
than the max_pfn.

--
Jens Axboe

2002-01-12 20:59:52

by H. Peter Anvin

Subject: Re: [PATCH] 1-2-3 GB

Andrea Arcangeli wrote:

>
> Actually, it really is the maximum virtual memory... but from the user's
> point of view, the user is supposed to care about the virtual memory he
> can manage, not about what the kernel will do with the rest. So if the
> user wants 3GB of virtual memory available to each task, he will select
> 3GB. I really don't mind if you want to change it to the kernel point of
> view, but given that it's the user who is supposed to compile it, the
> current patch also looks good enough to me.
>


Oh, right... if so he simply has the 05G option mislabelled... it should
be 3.5G

-hpa


2002-01-12 21:09:02

by Andrew Morton

Subject: Re: [PATCH] 1-2-3 GB

Hugh Dickins wrote:
> ...
> The patch below seems to be enough to convince egcs-2.91.66 and
> gcc-2.95.3 to use a "jb" comparison there.
> ...
> - for (j = 0; j < PTRS_PER_PMD ; j++) {
> - prefetchw(pmd+j+(PREFETCH_STRIDE/16));
> - free_one_pmd(pmd+j);
> + for (md = pmd, emd = pmd + PTRS_PER_PMD; md < emd; md++) {
> + prefetchw(md+(PREFETCH_STRIDE/16));
> + free_one_pmd(md);

You need to add a big fat comment here, describing the compiler
problem, and the risks which attend any change to this code.

-

2002-01-12 21:39:06

by Randy Hron

Subject: Re: [PATCH] 1-2-3 GB

Based on some of the comments in the thread, here
is what I came up with for Configure.help.

Also, in the patch, I had 3GB as the default config
option. It may be safer to have 1GB as the default
configure option to match the mainline.

--- linux.aa2/Documentation/Configure.help Fri Jan 11 20:57:58 2002
+++ linux/Documentation/Configure.help Sat Jan 12 16:29:21 2002
@@ -376,6 +376,59 @@
Select this if you have a 32-bit processor and more than 4
gigabytes of physical RAM.

+# Choice: maxvm
+Maximum Virtual Memory
+CONFIG_1GB
+ If you have 4 Gigabytes of physical memory or less, you can change
+ where the kernel maps high memory. If you have less
+ than 1 gigabyte of physical memory, you should disable
+ CONFIG_HIGHMEM4G because you don't need the choices below.
+
+ If you have a large amount of physical memory, all of it may not
+ be "permanently mapped" by the kernel. The physical memory that
+ is not permanently mapped is called "high memory".
+
+ The numbers in the configuration options are not precise because
+ of the kernel's vmalloc() area, and the PCI space on motherboards
+ may vary as well. Typically there will be 128 megabytes less
+ "user memory" mapped than the number in the configuration option.
+ Saying that another way, "high memory" will usually start 128
+ megabytes lower than the configuration option.
+
+ Selecting "05GB" results in a "3.5GB/0.5GB" kernel/user split:
+ 3.5 gigabytes are kernel mapped so each process sees a 3.5
+ gigabyte virtual memory space and the remaining part of the 4
+ gigabyte virtual memory space is used by the kernel to permanently
+ map as much physical memory as possible. On a system with 1 gigabyte
+ of physical memory, you may get 384 megabytes of "user memory" and
+ 640 megabytes of "high memory" with this selection.
+
+ Selecting "1GB" results in a "3GB/1GB" kernel/user split:
+ 3 gigabytes are mapped so each process sees a 3 gigabyte virtual
+ memory space and the remaining part of the 4 gigabyte virtual memory
+ space is used by the kernel to permanently map as much physical
+ memory as possible. On a system with 1 gigabyte of memory, you may
+ get 896 MB of "user memory" and 128 megabytes of "high memory"
+
+ Selecting "2GB" results in a "2GB/2GB" kernel/user split:
+ 2 gigabytes are mapped so each process sees a 2 gigabyte virtual
+ memory space and the remaining part of the 4 gigabyte virtual memory
+ space is used by the kernel to permanently map as much physical
+ memory as possible. On a system with 1 to 1.75 gigabytes of
+ physical memory, this option will make it so no memory is
+ mapped as "high memory".
+
+ Selecting "3GB" results in a "1GB/3GB" kernel/user split:
+ 1 gigabyte is mapped so each process sees a 1 gigabyte virtual
+ memory space and the remaining part of the 4 gigabytes of virtual
+ memory space is used by the kernel to permanently map as much
+ physical memory as possible.
+
+ Options "2GB" and "3GB" may expose bugs that were dormant in
+ certain hardware and possibly even the kernel.
+
+ If unsure, say "1GB".
+
HIGHMEM I/O support
CONFIG_HIGHIO
If you want to be able to do I/O to high memory pages, say Y.

--
Randy Hron

2002-01-12 22:31:37

by Randy Hron

Subject: Re: [PATCH] 1-2-3 GB

This configure help may be clearer than the previous.

--- linux.aa2/Documentation/Configure.help Fri Jan 11 20:57:58 2002
+++ linux/Documentation/Configure.help Sat Jan 12 17:27:01 2002
@@ -376,6 +376,50 @@
Select this if you have a 32-bit processor and more than 4
gigabytes of physical RAM.

+# Choice: maxvm
+Maximum Virtual Memory
+CONFIG_1GB
+ If you have 4 Gigabytes of physical memory or less, you can change
+ where the kernel maps high memory. If you have less than 1 gigabyte
+ of physical memory, you should disable CONFIG_HIGHMEM4G because you
+ don't need the choices below.
+
+ If you have a large amount of physical memory, all of it may not
+ be "permanently mapped" by the kernel. The physical memory that
+ is not permanently mapped is called "high memory".
+
+ The numbers in the configuration options are not precise because
+ of the kernel's vmalloc() area, and the PCI space on motherboards
+ may vary as well. Typically there will be 128 megabytes less
+ "user memory" mapped than the number in the configuration option.
+ Saying that another way, "high memory" will usually start 128
+ megabytes lower than the configuration option.
+
+ Selecting "05GB" results in a "3.5GB/0.5GB" kernel/user split:
+ 3.5 gigabytes are kernel mapped so each process sees a 3.5
+ gigabyte virtual memory space and the remaining part of the 4
+ gigabyte virtual memory space is used by the kernel to permanently
+ map as much physical memory as possible. On a system with 1 gigabyte
+ of physical memory, you may get 384 megabytes of "user memory" and
+ 640 megabytes of "high memory" with this selection.
+
+ Selecting "1GB" results in a "3GB/1GB" kernel/user split:
+ On a system with 1 gigabyte of memory, you may get 896 megabytes
+ of "user memory" and 128 megabytes of "high memory"
+
+ Selecting "2GB" results in a "2GB/2GB" kernel/user split:
+ On a system with 1 to 1.75 gigabytes of physical memory, this
+ option will make it so no memory is mapped as "high memory".
+
+ Selecting "3GB" results in a "1GB/3GB" kernel/user split:
+ On a system with 1 to 2.75 gigabytes of physical memory, this
+ option will make it so no memory is mapped as "high memory".
+
+ Options "2GB" and "3GB" may expose bugs that were dormant in
+ certain hardware and possibly even the kernel.
+
+ If unsure, say "1GB".
+
HIGHMEM I/O support
CONFIG_HIGHIO
If you want to be able to do I/O to high memory pages, say Y.

--
Randy Hron

2002-01-13 01:33:39

by Andre Hedrick

Subject: Re: BIO Usage Error or Conflicting Designs


Jens,

Here is back at you sir.

Andre Hedrick
Linux Disk Certification Project Linux ATA Development


Attachments:
oops3.file (3.74 kB)

2002-01-13 12:59:57

by Jens Axboe

Subject: Re: BIO Usage Error or Conflicting Designs

On Sat, Jan 12 2002, Andre Hedrick wrote:
>
> Jens,
>
> Here is back at you sir.

Without highmem debug enabled?? I already knew this was the bug
triggered, nothing new here.

Please print the two pfn values triggering the BUG_ON, I'll take a look
at this tomorrow.

--
Jens Axboe

2002-01-13 20:17:36

by Andre Hedrick

Subject: Re: BIO Usage Error or Conflicting Designs

On Sun, 13 Jan 2002, Jens Axboe wrote:

> On Sat, Jan 12 2002, Andre Hedrick wrote:
> >
> > Jens,
> >
> > Here is back at you sir.
>
> Without highmem debug enabled?? I already knew this was the bug
> triggered, nothing new here.
>
> Please print the two pfn values triggering the BUG_ON, I'll take a look
> at this tomorrow.

That is with highmem debug on, the stuff at the end of the config file.
Nothing more is generated; if there are more flags to set, please tell me
where.

Regards,

Andre Hedrick
Linux Disk Certification Project Linux ATA Development

2002-01-13 20:24:46

by H. Peter Anvin

Subject: Re: [PATCH] 1-2-3 GB

Followup to: <[email protected]>
By author: Hugh Dickins <[email protected]>
In newsgroup: linux.dev.kernel
>
> Usually not a problem: but if you configure for 1GB of user virtual
> and 3GB of kernel virtual, and you have more than 1GB of physical
> memory (as you normally would if you chose HIGHMEM64G), then there's
> a page at physical address 0x3ffff000, directly mapped to virtual
> address 0x7ffff000. And if that page happens to get used for the
> pmd of a process, then on exit the free_one_pgd loop wraps over
> to carry on freeing "entries" at 0x80000000, 0x80000008, ...
> A lot of pmd_ERROR messages, but eventually an entry scrapes
> through the pmd_bad test and is wrongly freed, not so good.
>

By the way, expect user programs to fail due to lack of address space
if you only give them 1 GB of userspace. At 1 GB of userspace there
is *no* address space which is compatible with the normal address
space map available to the user process.

I would personally vote against including that particular option.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-01-13 21:09:37

by Manfred Spraul

Subject: Re: BIO Usage Error or Conflicting Designs


>
> Is this with the highmem debug stuff enabled? That's the only way I can
> see this BUG triggering, otherwise q->bounce_pfn _cannot_ be smaller
> than the max_pfn.
>
Have you tested that?

Unless I misread arch/i386/kernel/setup.c, lines 740 to 760, max_pfn is
the upper end of the highmem area, if highmem is configured.
For non-highmem setup, it's set to min(system_memory, 4 GB).
It was a local variable within setup_arch, and someone made it a global
variable.

I.e. max_pfn is 1 GB with Andre's setup.

His patch doesn't touch the bounce limit, the default limit from
blk_queue_make_request() is used: BLK_BOUNCE_HIGH, which is max_low_pfn.

max_low_pfn is 896 MB.

--> BUG in create_bounce(), because a request comes in with a bounce
limit less than the total system memory, and no highmem configured.
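
A compressed model of that mismatch (names follow the thread; the surrounding block-layer code is paraphrased rather than quoted):

/*
 * Sketch of the limits described above for a 1 GB box without
 * CONFIG_HIGHMEM: blk_max_pfn covers all of RAM, but the queue's
 * default bounce limit (BLK_BOUNCE_HIGH, i.e. max_low_pfn per the
 * analysis above) stops at 896 MB, so the early-return test fails
 * and the code falls through to the #ifndef CONFIG_HIGHMEM BUG().
 */
#include <stdio.h>

#define MB(x)		((unsigned long)(x) << 20)
#define PFN(addr)	((addr) >> 12)			/* 4 KB pages */

int main(void)
{
	unsigned long blk_max_pfn = PFN(MB(1024));	/* max_pfn: all of RAM     */
	unsigned long max_low_pfn = PFN(MB(896));	/* end of direct mapping   */
	unsigned long bounce_pfn  = max_low_pfn;	/* BLK_BOUNCE_HIGH default */

	if (bounce_pfn >= blk_max_pfn)
		printf("bouncing never needed, early return\n");
	else
		printf("bounce_pfn %lu < blk_max_pfn %lu: falls through to BUG()\n",
		       bounce_pfn, blk_max_pfn);
	return 0;
}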

--
Manfred

2002-01-13 23:07:20

by Marvin Justice

Subject: Re: [PATCH] 1-2-3 GB

On Sunday 13 January 2002 02:24 pm, H. Peter Anvin wrote:
> By the way, expect user programs to fail due to lack of address space
> if you only give them 1 GB of userspace. At 1 GB of userspace there
> is *no* address space which is compatible with the normal address
> space map available to the user process.


Actually, I think it will work for apps < ~600MB since the mmap area is
automatically adjusted to begin at PAGE_OFFSET/3.

cat /proc/1/maps

CONFIG_1GB
08048000-0804e000 r-xp 00000000 03:41 58716 /sbin/init
0804e000-08050000 rw-p 00005000 03:41 58716 /sbin/init
08050000-08054000 rwxp 00000000 00:00 0
40000000-40016000 r-xp 00000000 03:41 73822 /lib/ld-2.2.4.so
40016000-40017000 rw-p 00015000 03:41 73822 /lib/ld-2.2.4.so
40017000-40018000 rw-p 00000000 00:00 0
4002c000-4015e000 r-xp 00000000 03:41 73816 /lib/i686/libc-2.2.4.so
4015e000-40164000 rw-p 00131000 03:41 73816 /lib/i686/libc-2.2.4.so
40164000-40168000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0

CONFIG_3GB
08048000-0804e000 r-xp 00000000 03:41 58716 /sbin/init
0804e000-08050000 rw-p 00005000 03:41 58716 /sbin/init
08050000-08054000 rwxp 00000000 00:00 0
15556000-1556c000 r-xp 00000000 03:41 73822 /lib/ld-2.2.4.so
1556c000-1556d000 rw-p 00015000 03:41 73822 /lib/ld-2.2.4.so
1556d000-1556e000 rw-p 00000000 00:00 0
15582000-156b4000 r-xp 00000000 03:41 73816 /lib/i686/libc-2.2.4.so
156b4000-156ba000 rw-p 00131000 03:41 73816 /lib/i686/libc-2.2.4.so
156ba000-156be000 rw-p 00000000 00:00 0
3fffe000-40000000 rwxp fffff000 00:00 0
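
Those load addresses match 2.4's TASK_UNMAPPED_BASE, which is PAGE_ALIGN(TASK_SIZE / 3) with TASK_SIZE equal to PAGE_OFFSET, so a quick sketch of the arithmetic reproduces both listings above:

/*
 * Sketch: where the mmap area starts for the two configs above,
 * assuming TASK_UNMAPPED_BASE = PAGE_ALIGN(TASK_SIZE / 3) and
 * TASK_SIZE = PAGE_OFFSET, as in 2.4's include/asm-i386/processor.h.
 */
#include <stdio.h>

int main(void)
{
	static const struct {
		const char *config;
		unsigned long page_offset;
	} cfg[] = {
		{ "CONFIG_1GB", 0xC0000000UL },
		{ "CONFIG_3GB", 0x40000000UL },
	};
	unsigned int i;

	for (i = 0; i < 2; i++) {
		unsigned long base = cfg[i].page_offset / 3;
		unsigned long aligned = (base + 4095UL) & ~4095UL;

		printf("%s: TASK_SIZE/3 = %#lx, page-aligned = %#lx\n",
		       cfg[i].config, base, aligned);
	}
	return 0;
}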

Then again, I agree the 3GB option doesn't make much sense for 99.99% of
cases.

-Marvin

2002-01-14 00:04:36

by H. Peter Anvin

Subject: Re: [PATCH] 1-2-3 GB

Marvin Justice wrote:

> On Sunday 13 January 2002 02:24 pm, H. Peter Anvin wrote:
>
>>By the way, expect user programs to fail due to lack of address space
>>if you only give them 1 GB of userspace. At 1 GB of userspace there
>>is *no* address space which is compatible with the normal address
>>space map available to the user process.
>>
> Actually, I think it will work for apps < `600MB since the mmap area is
> automatically adjusted to begin at PAGE_OFFSET/3.
>


As I said: At 1 GB of userspace there is *no* address space which is
compatible with the normal address space map available to the user process.

There is mmap() space, available, sure, but you can't get the same
address, even by request. Applications that care about the layout of
the address space will fail.
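
One way to see it: a program that insists on an address from the standard layout (say 0x40000000, where ld.so normally sits) cannot get it back once TASK_SIZE is only 1 GB, because that address is no longer user space at all. A minimal, purely illustrative sketch:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* 0x40000000 is where ld.so lands in the usual 3 GB-user layout */
	void *want = (void *) 0x40000000UL;
	void *got = mmap(want, 4096, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (got == MAP_FAILED)
		perror("mmap at 0x40000000");	/* expected on a 1 GB-user kernel */
	else
		printf("mapped at %p\n", got);
	return 0;
}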

-hpa

2002-01-14 00:36:41

by Alan

Subject: Re: [PATCH] 1-2-3 GB

> As I said: At 1 GB of userspace there is *no* address space which is
> compatible with the normal address space map available to the user process.
>
> There is mmap() space, available, sure, but you can't get the same
> address, even by request. Applications that care about the layout of
> the address space will fail.

That sounds like a good reason to run this mode a bit for debugging (be sure to
use Hugh's gcc 2.95/egcs-1.1.2 bug fix when trying this though!)

2002-01-14 02:22:25

by Rik van Riel

Subject: Re: [PATCH] 1-2-3 GB

On 13 Jan 2002, H. Peter Anvin wrote:

> By the way, expect user programs to fail due to lack of address space
> if you only give them 1 GB of userspace. At 1 GB of userspace there
> is *no* address space which is compatible with the normal address
> space map available to the user process.
>
> I would personally vote against including that particular option.

It could be useful for machines where most activity happens
inside the kernel, though. Think of TUX web or ftp servers
or dedicated NFS servers.

regards,

Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document

http://www.surriel.com/ http://distro.conectiva.com/

2002-01-14 06:43:10

by Jens Axboe

Subject: Re: BIO Usage Error or Conflicting Designs

On Sun, Jan 13 2002, Andre Hedrick wrote:
> On Sun, 13 Jan 2002, Jens Axboe wrote:
>
> > On Sat, Jan 12 2002, Andre Hedrick wrote:
> > >
> > > Jens,
> > >
> > > Here is back at you sir.
> >
> > Without highmem debug enabled?? I already knew this was the bug
> > triggered, nothing new here.
> >
> > Please print the two pfn values triggering the BUG_ON, I'll take a look
> > at this tomorrow.
>
> That is with highmem debug on, the stuff at the end of the config file.
> Nothing more is generated; if there are more flags to set, please tell me
> where.

Sorry if I wasn't clear; I meant the emulate-highmem debug patch I
forwarded to you. I'll look into Manfred's post right now; you can
simply remove the

#ifndef CONFIG_HIGHMEM
BUG();
#endif

test for now, for testing.

--
Jens Axboe

2002-01-14 07:24:07

by Jens Axboe

Subject: Re: BIO Usage Error or Conflicting Designs

On Sun, Jan 13 2002, Manfred Spraul wrote:
> >
> > Is this with the highmem debug stuff enabled? That's the only way I can
> > see this BUG triggering, otherwise q->bounce_pfn _cannot_ be smaller
> > than the max_pfn.
> >
> Have you tested that?
>
> Unless I misread arch/i386/kernel/setup.c, lines 740 to 760, max_pfn is
> the upper end of the highmem area, if highmem is configured.
> For non-highmem setup, it's set to min(system_memory, 4 GB).
> It was a local variable within setup_arch, and someone made it a global
> variable.
>
> I.e. max_pfn is 1 GB with Andre's setup.
>
> His patch doesn't touch the bounce limit, the default limit from
> blk_queue_make_request() is used: BLK_BOUNCE_HIGH, which is max_low_pfn.
>
> max_low_pfn is 896 MB.
>
> --> BUG in create_bounce(), because a request comes in with a bounce
> limit less than the total system memory, and no highmem configured.

Indeed, I misread the max_pfn stuff when I added that.

--- /opt/kernel/linux-2.5.2-pre11/drivers/block/ll_rw_blk.c Thu Jan 10 09:56:52 2002
+++ drivers/block/ll_rw_blk.c Mon Jan 14 02:21:50 2002
@@ -1711,7 +1705,11 @@
printk("block: %d slots per queue, batch=%d\n", queue_nr_requests, batch_requests);

blk_max_low_pfn = max_low_pfn;
+#ifdef CONFIG_HIGHMEM
blk_max_pfn = max_pfn;
+#else
+ blk_max_pfn = max_low_pfn;
+#endif

#if defined(CONFIG_IDE) && defined(CONFIG_BLK_DEV_IDE)
ide_init(); /* this MUST precede hd_init */
--- /opt/kernel/linux-2.5.2-pre11/mm/highmem.c Thu Jan 10 09:56:53 2002
+++ mm/highmem.c Mon Jan 14 02:20:53 2002
@@ -367,12 +367,6 @@
if (pfn >= blk_max_pfn)
return;

-#ifndef CONFIG_HIGHMEM
- /*
- * should not hit for non-highmem case
- */
- BUG();
-#endif
bio_gfp = GFP_NOHIGHIO;
pool = page_pool;
} else {

--
Jens Axboe

2002-01-15 14:04:40

by Randy Hron

Subject: Re: [PATCH] 1-2-3 GB

On Sat, Jan 12, 2002 at 07:22:23PM +0000, Hugh Dickins wrote:
> This patch is not actually what we've used. Paranoia (what other
> such bugs might there be?) drove me to set physical pages 0x3ffff
> and 0x40000 as Reserved in arch/i386/setup.c. I don't think it's
> appropriate to force that level of paranoia on others; but anyone
> configuring 3GBK should remember that it's a less-travelled path.
>
> Hugh

Thanks for the patch! I'm running it on a 1024MB machine with
the 2GB option, and it passes LTP runalltests.sh.

The 3 patches in this thread combined into one, with a default
config option of 2GB, and help saying, if unsure, say "1GB":


diff -nur linux-2.4.18pre2aa2/Documentation/Configure.help linux/Documentation/Configure.help
--- linux-2.4.18pre2aa2/Documentation/Configure.help Tue Jan 15 00:01:38 2002
+++ linux/Documentation/Configure.help Mon Jan 14 23:59:35 2002
@@ -376,6 +376,59 @@
Select this if you have a 32-bit processor and more than 4
gigabytes of physical RAM.

+# Choice: maxvm
+Maximum Virtual Memory
+CONFIG_1GB
+ If you have 4 Gigabytes of physical memory or less, you can change
+ where the kernel maps high memory. If you have less
+ than 1 gigabyte of physical memory, you should disable
+ CONFIG_HIGHMEM4G because you don't need the choices below.
+
+ If you have a large amount of physical memory, all of it may not
+ be "permanently mapped" by the kernel. The physical memory that
+ is not permanently mapped is called "high memory".
+
+ The numbers in the configuration options are not precise because
+ of the kernel's vmalloc() area, and the PCI space on motherboards
+ may vary as well. Typically there will be 128 megabytes less
+ "user memory" mapped than the number in the configuration option.
+ Saying that another way, "high memory" will usually start 128
+ megabytes lower than the configuration option.
+
+ Selecting "05GB" results in a "3.5GB/0.5GB" kernel/user split:
+ 3.5 gigabytes are kernel mapped so each process sees a 3.5
+ gigabyte virtual memory space and the remaining part of the 4
+ gigabyte virtual memory space is used by the kernel to permanently
+ map as much physical memory as possible. On a system with 1 gigabyte
+ of physical memory, you may get 384 megabytes of "user memory" and
+ 640 megabytes of "high memory" with this selection.
+
+ Selecting "1GB" results in a "3GB/1GB" kernel/user split:
+ 3 gigabytes are mapped so each process sees a 3 gigabyte virtual
+ memory space and the remaining part of the 4 gigabyte virtual memory
+ space is used by the kernel to permanently map as much physical
+ memory as possible. On a system with 1 gigabyte of memory, you may
+ get 896 MB of "user memory" and 128 megabytes of "high memory"
+
+ Selecting "2GB" results in a "2GB/2GB" kernel/user split:
+ 2 gigabytes are mapped so each process sees a 2 gigabyte virtual
+ memory space and the remaining part of the 4 gigabyte virtual memory
+ space is used by the kernel to permanently map as much physical
+ memory as possible. On a system with 1 to 1.75 gigabytes of
+ physical memory, this option will make it so no memory is
+ mapped as "high memory".
+
+ Selecting "3GB" results in a "1GB/3GB" kernel/user split:
+ 1 gigabyte is mapped so each process sees a 1 gigabyte virtual
+ memory space and the remaining part of the 4 gigabytes of virtual
+ memory space is used by the kernel to permanently map as much
+ physical memory as possible.
+
+ Options "2GB" and "3GB" may expose bugs that were dormant in
+ certain hardware, compilers, and possibly even the kernel.
+
+ If unsure, say "1GB".
+
HIGHMEM I/O support
CONFIG_HIGHIO
If you want to be able to do I/O to high memory pages, say Y.
diff -nur linux-2.4.18pre2aa2/Rules.make linux/Rules.make
--- linux-2.4.18pre2aa2/Rules.make Tue Mar 6 22:31:01 2001
+++ linux/Rules.make Mon Jan 14 23:58:55 2002
@@ -212,6 +212,7 @@
#
# Added the SMP separator to stop module accidents between uniprocessor
# and SMP Intel boxes - AC - from bits by Michael Chastain
+# Added separator for different PAGE_OFFSET memory models - Ingo.
#

ifdef CONFIG_SMP
@@ -220,6 +221,22 @@
genksyms_smp_prefix :=
endif

+ifdef CONFIG_2GB
+ifdef CONFIG_SMP
+ genksyms_smp_prefix := -p smp_2gig_
+else
+ genksyms_smp_prefix := -p 2gig_
+endif
+endif
+
+ifdef CONFIG_3GB
+ifdef CONFIG_SMP
+ genksyms_smp_prefix := -p smp_3gig_
+else
+ genksyms_smp_prefix := -p 3gig_
+endif
+endif
+
$(MODINCL)/%.ver: %.c
@if [ ! -r $(MODINCL)/$*.stamp -o $(MODINCL)/$*.stamp -ot $< ]; then \
echo '$(CC) $(CFLAGS) $(EXTRA_CFLAGS) -E -D__GENKSYMS__ $<'; \
diff -nur linux-2.4.18pre2aa2/arch/i386/config.in linux/arch/i386/config.in
--- linux-2.4.18pre2aa2/arch/i386/config.in Tue Jan 15 00:01:38 2002
+++ linux/arch/i386/config.in Mon Jan 14 23:58:55 2002
@@ -169,7 +169,11 @@
if [ "$CONFIG_HIGHMEM64G" = "y" ]; then
define_bool CONFIG_X86_PAE y
else
- bool '3.5GB user address space' CONFIG_05GB
+ choice 'Maximum Virtual Memory' \
+ "3GB CONFIG_1GB \
+ 2GB CONFIG_2GB \
+ 1GB CONFIG_3GB \
+ 05GB CONFIG_05GB" 2GB
fi
if [ "$CONFIG_NOHIGHMEM" = "y" ]; then
define_bool CONFIG_NO_PAGE_VIRTUAL y
@@ -179,6 +183,7 @@
bool 'HIGHMEM I/O support (EXPERIMENTAL)' CONFIG_HIGHIO
fi

+
bool 'Math emulation' CONFIG_MATH_EMULATION
bool 'MTRR (Memory Type Range Register) support' CONFIG_MTRR
bool 'Symmetric multi-processing support' CONFIG_SMP
diff -nur linux-2.4.18pre2aa2/include/asm-i386/page_offset.h linux/include/asm-i386/page_offset.h
--- linux-2.4.18pre2aa2/include/asm-i386/page_offset.h Tue Jan 15 00:01:38 2002
+++ linux/include/asm-i386/page_offset.h Mon Jan 14 23:58:55 2002
@@ -1,6 +1,10 @@
#include <linux/config.h>
-#ifndef CONFIG_05GB
-#define PAGE_OFFSET_RAW 0xC0000000
-#else
+#ifdef CONFIG_05GB
#define PAGE_OFFSET_RAW 0xE0000000
+#elif defined(CONFIG_1GB)
+#define PAGE_OFFSET_RAW 0xC0000000
+#elif defined(CONFIG_2GB)
+#define PAGE_OFFSET_RAW 0x80000000
+#elif defined(CONFIG_3GB)
+#define PAGE_OFFSET_RAW 0x40000000
#endif
diff -nur linux-2.4.18pre2aa2/mm/memory.c linux/mm/memory.c
--- linux-2.4.18pre2aa2/mm/memory.c Tue Jan 15 00:01:38 2002
+++ linux/mm/memory.c Mon Jan 14 23:59:18 2002
@@ -106,8 +106,7 @@

static inline void free_one_pgd(pgd_t * dir)
{
- int j;
- pmd_t * pmd;
+ pmd_t * pmd, * md, * emd;

if (pgd_none(*dir))
return;
@@ -118,9 +117,23 @@
}
pmd = pmd_offset(dir, 0);
pgd_clear(dir);
- for (j = 0; j < PTRS_PER_PMD ; j++) {
- prefetchw(pmd+j+(PREFETCH_STRIDE/16));
- free_one_pmd(pmd+j);
+
+ /*
+ * Beware if changing the loop below. It once used int j,
+ * for (j = 0; j < PTRS_PER_PMD; j++)
+ * free_one_pmd(pmd+j);
+ * but some older i386 compilers (e.g. egcs-2.91.66, gcc-2.95.3)
+ * terminated the loop with a _signed_ address comparison
+ * using "jle", when configured for HIGHMEM64GB (X86_PAE).
+ * If also configured for 3GB of kernel virtual address space,
+ * if page at physical 0x3ffff000 virtual 0x7ffff000 is used as
+ * a pmd, when that mm exits the loop goes on to free "entries"
+ * found at 0x80000000 onwards. The loop below compiles instead
+ * to be terminated by unsigned address comparison using "jb".
+ */
+ for (md = pmd, emd = pmd + PTRS_PER_PMD; md < emd; md++) {
+ prefetchw(md+(PREFETCH_STRIDE/16));
+ free_one_pmd(md);
}
pmd_free(pmd);
}
--
Randy Hron

2002-01-15 17:50:27

by Dave Jones

Subject: Re: [PATCH] 1-2-3 GB

On Tue, Jan 15, 2002 at 09:07:46AM -0500, [email protected] wrote:

> The 3 patches in this thread combined into one, with a default
> config option of 2GB, and help saying, if unsure, say "1GB":

This may be confusing for some, bringing up the question
"I'm unsure, but why is the default at 2GB?"

Default option should match default advice.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-01-16 02:12:58

by H. Peter Anvin

Subject: Re: [PATCH] 1-2-3 GB

Followup to: <[email protected]>
By author: Dave Jones <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Tue, Jan 15, 2002 at 09:07:46AM -0500, [email protected] wrote:
>
> > The 3 patches in this thread combined into one, with a default
> > config option of 2GB, and help saying, if unsure, say "1GB":
>
> This may be confusing for some, bringing up the question
> "I'm unsure, but why is the default at 2GB?"
>
> Default option should match default advice.
>

The default should definitely be the one that produces the standard
memory map, i.e. 3 GB userspace.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-01-18 21:33:01

by Pavel Machek

Subject: Re: [PATCH] 1-2-3 GB

Hi!

> The patch below seems to be enough to convince egcs-2.91.66 and
> gcc-2.95.3 to use a "jb" comparison there. I'm working on PIII,
> prefetchw() just a stub, if that makes any difference.

If this is really a gcc bug, would simply making j volatile fix it?

Pavel
> --- 2.4.18pre2aa2/mm/memory.c Sat Jan 12 18:01:36 2002
> +++ linux/mm/memory.c Sat Jan 12 18:09:27 2002
> @@ -106,8 +106,7 @@
>
> static inline void free_one_pgd(pgd_t * dir)
> {
> - int j;
> - pmd_t * pmd;
> + pmd_t * pmd, * md, * emd;
>
> if (pgd_none(*dir))
> return;
> @@ -118,9 +117,9 @@
> }
> pmd = pmd_offset(dir, 0);
> pgd_clear(dir);
> - for (j = 0; j < PTRS_PER_PMD ; j++) {
> - prefetchw(pmd+j+(PREFETCH_STRIDE/16));
> - free_one_pmd(pmd+j);
> + for (md = pmd, emd = pmd + PTRS_PER_PMD; md < emd; md++) {
> + prefetchw(md+(PREFETCH_STRIDE/16));
> + free_one_pmd(md);
> }
> pmd_free(pmd);
> }

--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

2002-01-19 00:23:39

by Hugh Dickins

Subject: Re: [PATCH] 1-2-3 GB

On Fri, 18 Jan 2002, Pavel Machek wrote:
>
> > The patch below seems to be enough to convince egcs-2.91.66 and
> > gcc-2.95.3 to use a "jb" comparison there. I'm working on PIII,
> > prefetchw() just a stub, if that makes any difference.
>
> If this is really a gcc bug, would simply making j volatile fix it?

You rogue! The panacea, eh? Well, yes, it does look like that's
enough with egcs-2.91.66 (I don't have 2.95 here to try at the moment,
expect it would behave the same) - the comparison uses "jle" as before,
but now it's correctly on free_one_pmd's index j instead of an address.
Neat - but an even bigger fatter comment needed?

Hugh

2002-01-23 03:50:28

by Randy Hron

Subject: Re: [PATCH] 1-2-3 GB

> > The 3 patches in this thread combined into one, with a default
> > config option of 2GB, and help saying, if unsure, say "1GB":
>
> This may be confusing for some, bringing up the question
> "I'm unsure, but why is the default at 2GB?"
>
> Default option should match default advice.
>
> --
> | Dave Jones. http://www.codemonkey.org.uk

Good point. This Configure.help for 2.4.18pre4aa1 may be better:


--- linux-2.4.18pre4aa1/Documentation/Configure.help Tue Jan 22 21:25:55 2002
+++ linux/Documentation/Configure.help Tue Jan 22 22:51:11 2002
@@ -376,6 +376,34 @@
Select this if you have a 32-bit processor and more than 4
gigabytes of physical RAM.

+User address space size
+CONFIG_1GB
+ If you have 4 Gigabytes of physical memory or less, you can change
+ where the kernel maps high memory.
+
+ Typically there will be 128 megabytes less "user memory" mapped
+ than the number in the configuration option. Saying that
+ another way, "high memory" will usually start 128 megabytes
+ lower than the configuration option.
+
+ Selecting "05GB" results in a "3.5GB/0.5GB" kernel/user split:
+ On a system with 1 gigabyte of physical memory, you may get 384
+ megabytes of "user memory" and 640 megabytes of "high memory"
+ with this selection.
+
+ Selecting "1GB" results in a "3GB/1GB" kernel/user split:
+ On a system with 1 gigabyte of memory, you may get 896 MB of
+ "user memory" and 128 megabytes of "high memory" with this
+ selection. This is the usual setting.
+
+ Selecting "2GB" results in a "2GB/2GB" kernel/user split:
+ On a system with less than 1.75 gigabytes of physical memory,
+ this option will make it so no memory is mapped as "high".
+
+ Selecting "3GB" results in a "1GB/3GB" kernel/user split:
+
+ If unsure, say "1GB".
+
HIGHMEM I/O support
CONFIG_HIGHIO
If you want to be able to do I/O to high memory pages, say Y.

--
Randy Hron