2003-05-09 07:23:09

by Ulrich Drepper

Subject: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

To allocate stacks for the threads in nptl we currently use MAP_32BIT to
make sure we get <4GB addresses for faster context switching time. But
once that address space is exhausted we have to fall back to not using the
flag. This means we have to make 2 mmap() calls, one with MAP_32BIT and
if it fails another one without.
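
A minimal sketch of the two-call pattern described above, not the actual nptl code. The helper name alloc_stack_prefer_low is hypothetical, and MAP_32BIT is Linux/x86-64 specific, so the sketch defines it to 0 elsewhere (an assumption made only so the example compiles on other targets).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_32BIT
#define MAP_32BIT 0 /* assumption: on other targets the first call is a plain mmap() */
#endif

/*
 * Hypothetical helper: ask for a low address first and, only if that
 * fails, repeat the request without the restriction.
 * Returns MAP_FAILED if both attempts fail.
 */
void *alloc_stack_prefer_low(size_t size)
{
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    /* First attempt: restricted to the low part of the address space. */
    void *p = mmap(NULL, size, prot, flags | MAP_32BIT, -1, 0);
    if (p != MAP_FAILED)
        return p;

    /* Second attempt: the failed call above is the cost complained about. */
    return mmap(NULL, size, prot, flags, -1, 0);
}
```

The second call is reached only once the low region is full, which is exactly when the first call has already done its most expensive search.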

It would be much better if there would also be a MAP_32PREFER flag with
the appropriate semantics. The failing mmap() calls seem to be quite
expensive so programs with many threads are really punished a lot.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+u1pF2ijCOnn/RHQRAk2IAKDAzXZUOsxMPAKkK9ivOz8o6zAaHQCeMC24
ysih3QB/I1w5MNXEIxNs284=
=2cet
-----END PGP SIGNATURE-----


2003-05-09 09:08:47

by Andi Kleen

Subject: Re: hammer: MAP_32BIT

On Fri, May 09, 2003 at 09:35:32AM +0200, Ulrich Drepper wrote:
> It would be much better if there would also be a MAP_32PREFER flag with
> the appropriate semantics. The failing mmap() calls seem to be quite
> expensive so programs with many threads are really punished a lot.

That's just an inadequate data structure. It does a linear search of the
VMAs and you probably have a lot of them. Before you add kludges like this
better fix the data structure for fast free space lookup.

MAP_32BIT currently limits to the first 2GB only. That's needed because
most programs use it to allocate modules for the small code model and that
only supports 2GB (poster child for that is the X server). But for your
application 4GB would be better. But adding another MAP_32BIT_4GB or so
would be quite ugly. I considered making the address where mmap starts searching
(TASK_UNMAPPED_BASE) settable using a prctl.

In some vendor kernels it's already in /proc/pid/mapped_base, but that is
quite costly to change. That would probably give you the best of both: just
set it to a low value for the thread stacks and then reset it to the default.

I guess that would be the better solution for your stacks.

-Andi

2003-05-09 11:15:45

by Mikael Pettersson

Subject: Re: hammer: MAP_32BIT

Andi Kleen writes:
> On Fri, May 09, 2003 at 09:35:32AM +0200, Ulrich Drepper wrote:
> > It would be much better if there would also be a MAP_32PREFER flag with
> > the appropriate semantics. The failing mmap() calls seem to be quite
> > expensive so programs with many threads are really punished a lot.
>
> That's just an inadequate data structure. It does a linear search of the
> VMAs and you probably have a lot of them. Before you add kludges like this
> better fix the data structure for fast free space lookup.
>
> MAP_32BIT currently limits to the first 2GB only. That's needed because
> most programs use it to allocate modules for the small code model and that
> only supports 2GB (poster child for that is the X server) But for your
> application 4GB would be better. But adding another MAP_32BIT_4GB or so
> would be quite ugly. I considered making the address where mmap starts searching
> (TASK_UNMAPPED_BASE) settable using a prctl.

I have a potential use for mmap()ing in the low 4GB on x86_64.
Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
What about a more generic way of indicating which parts of the address
space one wants? The simplest that would work for me is a single byte
'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
This could be specified on a per-mmap() basis or as a settable process attribute.
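
The proposed 'nrbits' range can be written down as a small check. This is only a sketch of the arithmetic: range_fits_nrbits is a hypothetical user-space helper, not an existing kernel or libc interface, and it treats a zero-length range as not fitting.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does the byte range [addr, addr + len) lie
 * entirely inside [0 .. 2^nrbits - 1]?
 */
bool range_fits_nrbits(uint64_t addr, uint64_t len, unsigned nrbits)
{
    /* Highest permitted address; a shift by 64 would be undefined. */
    uint64_t limit = nrbits >= 64 ? UINT64_MAX
                                  : (UINT64_C(1) << nrbits) - 1;

    if (len == 0 || addr > limit)
        return false;

    /* Compare the last byte against the limit without overflowing. */
    return len - 1 <= limit - addr;
}
```

With nrbits = 31 this expresses what MAP_32BIT is described as doing in this thread, and with nrbits = 32 it expresses the full low 4GB.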

/Mikael

2003-05-09 11:26:34

by Andi Kleen

Subject: Re: hammer: MAP_32BIT


On Fri, May 09, 2003 at 01:28:11PM +0200, Mikael Pettersson wrote:
> I have a potential use for mmap()ing in the low 4GB on x86_64.

Just use MAP_32BIT

> Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
> What about a more generic way of indicating which parts of the address
> space one wants? The simplest that would work for me is a single byte
> 'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
> This could be specified on a per-mmap() basis or as a settable process attribute.

On x86-64 an mmap extension for that would be fine, but on i386 you get
problems because mmap64() already maxes out the argument limit and you
cannot add more.

You could only implement it with a structure in memory pointed to by an
argument, which would be ugly.

prctl is probably better. You really want [start; end] right ?

Pity that task_struct is already so bloated, so every new entry hurts.

-Andi

2003-05-09 11:39:49

by Mikael Pettersson

Subject: Re: hammer: MAP_32BIT

Andi Kleen writes:
>
> On Fri, May 09, 2003 at 01:28:11PM +0200, Mikael Pettersson wrote:
> > I have a potential use for mmap()ing in the low 4GB on x86_64.
>
> Just use MAP_32BIT

Will that be corrected to use the full 4GB space? 2GB is too small.

> > Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
> > What about a more generic way of indicating which parts of the address
> > space one wants? The simplest that would work for me is a single byte
> > 'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
> > This could be specified on a per-mmap() basis or as a settable process attribute.
>
> On x86-64 an mmap extension for that would be fine, but on i386 you get
> problems because mmap64() already maxes out the argument limit and you
> cannot add more.

This would only be used on x86_64. i386 compat is a non-issue.
(This is for runtime systems stuff, not applications.)

> prctl is probably better. You really want [start; end] right ?

I just want mmap() to return addresses that fit in 32 bits.

MAP_32BIT would do nicely, if it wasn't limited to 2GB.

/Mikael

2003-05-09 12:04:43

by Andi Kleen

Subject: Re: hammer: MAP_32BIT

On Fri, May 09, 2003 at 01:52:17PM +0200, Mikael Pettersson wrote:
> Andi Kleen writes:
> >
> > On Fri, May 09, 2003 at 01:28:11PM +0200, Mikael Pettersson wrote:
> > > I have a potential use for mmap()ing in the low 4GB on x86_64.
> >
> > Just use MAP_32BIT
>
> Will that be corrected to use the full 4GB space? 2GB is too small.

That would break the X server.

But what you can do is to use mmap(0x1000, ....) and free the memory
again if the result is above 4GB. If you pass a non-zero value
as first argument but not MAP_FIXED it'll use the address argument
as starting point for the search.
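
A sketch of that approach, under stated assumptions: mmap_below_4g is a hypothetical helper name, and the "search upward from the hint" behaviour is what is described here for kernels of that time. A kernel is free to ignore or adjust the hint, so the result has to be checked.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define LIMIT_4GB (UINT64_C(1) << 32)

/*
 * Hypothetical helper: pass a low hint address (no MAP_FIXED) and
 * reject the mapping if any part of it ends up above 4GB.
 * Returns MAP_FAILED if no suitable mapping was obtained.
 */
void *mmap_below_4g(size_t size)
{
    void *p = mmap((void *)0x1000, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* The hint is only a hint: verify where the mapping really landed. */
    if ((uint64_t)(uintptr_t)p + size > LIMIT_4GB) {
        munmap(p, size);
        return MAP_FAILED;
    }
    return p;
}
```

A caller that merely prefers a low address would keep the mapping instead of unmapping it when the check fails.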

-Andi

2003-05-09 17:24:09

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

Followup to: <20030509092026.GA11012@averell>
By author: Andi Kleen
In newsgroup: linux.dev.kernel
>
> MAP_32BIT currently limits to the first 2GB only. That's needed because
> most programs use it to allocate modules for the small code model and that
> only supports 2GB (poster child for that is the X server) But for your
> application 4GB would be better. But adding another MAP_32BIT_4GB or so
> would be quite ugly. I considered making the address where mmap starts searching
> (TASK_UNMAPPED_BASE) settable using a prctl.
>

MAP_31BIT would have been a better name...

-hpa
--
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

2003-05-09 17:27:27

by Ulrich Drepper

Subject: Re: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andi Kleen wrote:

> That's just an inadequate data structure. It does a linear search of the
> VMAs and you probably have a lot of them. Before you add kludges like this
> better fix the data structure for fast free space lookup.

If you mean the code in arch_get_unmapped_area(), yes, this needs
fixing. In fact, Ingo already has a patch which brings back the
performance of thread creation to what we had back in September/October.


> In some vendor kernels it's already in /proc/pid/mapped_base, but that is
> quite costly to change. That would probably give you the best of both, Just
> set it to a low value for the thread stacks and then reset it to the default.
>
> I guess that would be the better solution for your stacks.

Are you sure this is the best solution? It means the mmap regions for
restricted 31/32 bit addresses and those for the normal, unrestricted
mappings are contiguous. This removes a lot of freedom in deciding where
the unrestricted mappings are best located and it would make programs
using threads have a very different memory layout. Not that it should
make any difference; but I can hear /them/ already scream that this
breaks applications.

My kernel-uninformed opinion would be to keep the settings separate.

Oh, and please rename MAP_32BIT to MAP_31BIT. This will save nerves on
all sides.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+u+fi2ijCOnn/RHQRAqeBAKC3ZlSCNcw3f7SXahvxRc0WMupYgwCgyBGy
fMqzCxWcx90e002CNUQqwgM=
=LDJf
-----END PGP SIGNATURE-----

2003-05-09 17:59:16

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

Followup to: <20030509113845.GA4586@averell>
By author: Andi Kleen
In newsgroup: linux.dev.kernel
>
>
> On Fri, May 09, 2003 at 01:28:11PM +0200, Mikael Pettersson wrote:
> > I have a potential use for mmap()ing in the low 4GB on x86_64.
>
> Just use MAP_32BIT
>
> > Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
> > What about a more generic way of indicating which parts of the address
> > space one wants? The simplest that would work for me is a single byte
> > 'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
> > This could be specified on a per-mmap() basis or as a settable process attribute.
>
> On x86-64 an mmap extension for that would be fine, but on i386 you get
> problems because mmap64() already maxes out the argument limit and you
> cannot add more.
>

How about this: since the address argument is basically unused anyway
unless MAP_FIXED is set, how about a MAP_MAXADDR which interprets the
address argument as the highest permissible address (or lowest
nonpermissible address)?

-hpa

2003-05-09 19:12:34

by Ulrich Drepper

Subject: Re: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

H. Peter Anvin wrote:

> How about this: since the address argument is basically unused anyway
> unless MAP_FIXED is set, how about a MAP_MAXADDR which interprets the
> address argument as the highest permissible address (or lowest
> nonpermissible address)?

You miss the point of my initial mail: I need a way to say "preferably
32bit address, otherwise give me what you have". MAP_32BIT already
provides a way to require 32 bit addresses.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vACE2ijCOnn/RHQRAl3rAKCYgj3LqvIDJ8Ny3pnii8bBvsbwrQCdGkg4
pnFnBmubkRnnsVfBSjDBBWQ=
=P8SV
-----END PGP SIGNATURE-----

2003-05-09 20:43:25

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> H. Peter Anvin wrote:
>
>
>>How about this: since the address argument is basically unused anyway
>>unless MAP_FIXED is set, how about a MAP_MAXADDR which interprets the
>>address argument as the highest permissible address (or lowest
>>nonpermissible address)?
>
>
> You miss the point of my initial mail: I need a way to say "preferably
> 32bit address, otherwise give me what you have". MAP_32BIT already
> provides a way to require 32 bit addresses.
>

No, it requires 31-bit addresses, and there was a discussion about how
some things need 31-bit and some 32-bit addresses. There might also be
a need for 39-bit addresses, to be compatible with Linux 2.4.

MAP_MAXADDR_ADVISORY?

-hpa


2003-05-09 21:32:41

by Ulrich Drepper

Subject: Re: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

H. Peter Anvin wrote:

> No, it requires 31-bit addresses, and there was a discussion about how
> some things need 31-bit and some 32-bit addresses.

That's completely irrelevant to my point. Whether MAP_32BIT actually
has a 31 bit limit or not doesn't matter, it's limited as well in the
possible mmap blocks it can return.

The only thing I care about is to have a hint and not a fixed
requirement for mmap(). All your proposals completely ignored this.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vCFk2ijCOnn/RHQRAnw1AKChzyuZ3g9iXAX5wH088rhko/s8YgCgku12
CayuZsLJGzPO//WCJVWyLxk=
=rkBk
-----END PGP SIGNATURE-----

2003-05-09 21:56:05

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> H. Peter Anvin wrote:
>
>
>>No, it requires 31-bit addresses, and there was a discussion about how
>>some things need 31-bit and some 32-bit addresses.
>
>
> That's completely irrelevant to my point. Whether MAP_32BIT actually
> has a 31 bit limit or not doesn't matter, it's limited as well in the
> possible mmap blocks it can return.
>
> The only thing I care about is to have a hint and not a fixed
> requirement for mmap(). All your proposals completely ignored this.
>

Yes, but this is irrelevant to *MY* point... this discussion spawned a
side discussion, and somehow you're upset that it's not addressing your
concern but a different one... seems a bit ridiculous!

Anyway, I already posted that if we're adding MAP_MAXADDR we could also
add MAP_MAXADDR_ADVISORY or something similar to that. On the other
hand, how big of a performance issue is it really to call mmap() again
in the failure scenario *only*?

-hpa


2003-05-09 22:04:02

by Timothy Miller

Subject: Re: hammer: MAP_32BIT



Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> H. Peter Anvin wrote:
>
>
>>No, it requires 31-bit addresses, and there was a discussion about how
>>some things need 31-bit and some 32-bit addresses.
>
>
> That's completely irrelevant to my point. Whether MAP_32BIT actually
> has a 31 bit limit or not doesn't matter, it's limited as well in the
> possible mmap blocks it can return.
>
> The only thing I care about is to have a hint and not a fixed
> requirement for mmap(). All your proposals completely ignored this.
>

If your program is capable of handling an address with more than 32
bits, what point is there giving a hint? Either your program can handle
64-bit pointers or it cannot. Any program flexible enough to handle
either size dynamically would expend enough overhead checking that it
would be worse than if it just made a hard choice.

2003-05-09 22:08:03

by Ulrich Drepper

Subject: Re: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

H. Peter Anvin wrote:
> On the other
> hand, how big of a performance issue is it really to call mmap() again
> in the failure scenario *only*?

Just look at the code, it's very expensive. At the moment the mmap code
has to sequentially look at the VMAs in question. If it fails it means
it walked the entire data structure without success. Ingo's patch does
not address this, it just makes successful allocation usually fast again.
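
The cost argument can be illustrated with a toy first-fit search. This is a sketch, not the kernel's arch_get_unmapped_area(): struct vma, find_free_linear, NO_SPACE and the visited counter are names invented here, and the VMA array is assumed to be sorted by address and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_SPACE UINT64_MAX

/* Toy stand-in for a mapped region: the half-open range [start, end). */
struct vma {
    uint64_t start;
    uint64_t end;
};

/*
 * Linear first-fit search: find the lowest addr >= base such that
 * [addr, addr + len) is free and ends at or below limit.
 * *visited reports how many VMAs were stepped over.
 */
uint64_t find_free_linear(const struct vma *vmas, size_t n,
                          uint64_t base, uint64_t limit, uint64_t len,
                          size_t *visited)
{
    uint64_t addr = base;
    size_t i = 0;

    *visited = 0;
    for (;;) {
        /* Candidate no longer fits below the limit: the search fails. */
        if (limit < len || addr > limit - len)
            return NO_SPACE;

        /* No VMA left, or the gap in front of the next VMA is big enough. */
        if (i == n || (vmas[i].start >= addr && vmas[i].start - addr >= len))
            return addr;

        /* Otherwise step over this VMA and try right behind it. */
        if (vmas[i].end > addr)
            addr = vmas[i].end;
        i++;
        (*visited)++;
    }
}
```

When the region below the limit is completely packed, every VMA in it is visited before the function can report failure, which is the situation a process with many low thread stacks ends up in.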

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vCmt2ijCOnn/RHQRAsUeAJ9gGIwIK+QKpSz15YDEaB5aISBwowCgjReV
WSvgiDRcLX5bpla/Agikmj0=
=NSIn
-----END PGP SIGNATURE-----

2003-05-09 22:09:26

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> H. Peter Anvin wrote:
>
>>On the other
>>hand, how big of a performance issue is it really to call mmap() again
>>in the failure scenario *only*?
>
>
> Just look at the code, it's very expensive. At the moment the mmap code
> has to sequentially look at the VMAs in question. If it fails it means
> it walked the entire data structure without success. Ingo's patch does
> not address this, it just makes successful allocation usually fast again.
>

OK, maybe we should fix that instead :-/

-hpa


2003-05-09 22:07:54

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

Timothy Miller wrote:
>
> If your program is capable of handling an address with more than 32
> bits, what point is there giving a hint? Either your program can handle
> 64-bit pointers or it cannot. Any program flexible enough to handle
> either size dynamically would expend enough overhead checking that it
> would be worse than if it just made a hard choice.
>

The purpose is that there is a slight task-switching speed advantage if
the address is in the bottom 4 GB. Since this affects every process,
and most processes use very little TLS, this is worthwhile.

This is fundamentally due to a K8 design flaw.

-hpa

2003-05-09 22:10:33

by Ulrich Drepper

Subject: Re: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Timothy Miller wrote:

> If your program is capable of handling an address with more than 32
> bits, what point is there giving a hint? Either your program can handle
> 64-bit pointers or it cannot. Any program flexible enough to handle
> either size dynamically would expend enough overhead checking that it
> would be worse than if it just made a hard choice.

Look at the x86-64 context switching code. If memory addressed by the
GDT entries has a 32-bit address it uses a different method than for
cases where the virtual address has more than 32 bits. This way of
handling GDT entries is faster according to ak. So, it's not a
correctness thing, it's a performance thing.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vCo82ijCOnn/RHQRAlGzAJ9Ti80kJMeecyxGikowWcfCAq0stwCfRVcQ
Clui3Z6yKNSy3mu+phrY2FQ=
=GFwi
-----END PGP SIGNATURE-----

2003-05-09 22:29:57

by Timothy Miller

Subject: Re: hammer: MAP_32BIT



H. Peter Anvin wrote:
> Timothy Miller wrote:
>
>>If your program is capable of handling an address with more than 32
>>bits, what point is there giving a hint? Either your program can handle
>>64-bit pointers or it cannot. Any program flexible enough to handle
>>either size dynamically would expend enough overhead checking that it
>>would be worse than if it just made a hard choice.
>>
>
>
> The purpose is that there is a slight task-switching speed advantage if
> the address is in the bottom 4 GB. Since this affects every process,
> and most processes use very little TLS, this is worthwhile.
>
> This is fundamentally due to a K8 design flaw.

Is there an explicit check somewhere for this? Are the page tables laid
out differently?

2003-05-09 22:36:27

by Timothy Miller

Subject: Re: hammer: MAP_32BIT



Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Timothy Miller wrote:
>
>
>>If your program is capable of handling an address with more than 32
>>bits, what point is there giving a hint? Either your program can handle
>>64-bit pointers or it cannot. Any program flexible enough to handle
>>either size dynamically would expend enough overhead checking that it
>>would be worse than if it just made a hard choice.
>
>
> Look at the x86-64 context switching code. If memory addressed by the
> GDT entries has a 32-bit address it uses a different method than for
> cases where the virtual address has more than 32 bits. This way of
> handling GDT entries is faster according to ak. So, it's not a
> correctness thing, it's a performance thing.
>

Alright. Sounds great. So my next question is this:

Why does there ever need to be an explicit HINT that you would prefer a
<32 bit address, when it's known a priori that <32 is better? Why
doesn't the mapping code ALWAYS try to use 32-bit addresses before
resorting to 64-bit?

2003-05-09 23:12:23

by Ulrich Drepper

Subject: Re: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Timothy Miller wrote:

> Why does there ever need to be an explicit HINT that you would prefer a
> <32 bit address, when it's known a priori that <32 is better? Why
> doesn't the mapping code ALWAYS try to use 32-bit addresses before
> resorting to 64-bit?

Because not all memory is addressed via GDT entries. In fact, almost
none is, only thread stacks and similar gimmicks. If all mmap memory
would by default be served from the low memory pool you soon run out of
it and without any good reason.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vDjB2ijCOnn/RHQRAnHmAJ9V3BwxGTAUs7hw1YXowv0K0cEFFACePj6t
vLI+B5BlYG4ox5WcyFrwg8A=
=IGO2
-----END PGP SIGNATURE-----

2003-05-09 23:12:26

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

Timothy Miller wrote:
>>
>> The purpose is that there is a slight task-switching speed advantage if
>> the address is in the bottom 4 GB. Since this affects every process,
>> and most processes use very little TLS, this is worthwhile.
>>
>> This is fundamentally due to a K8 design flaw.
>
> Is there an explicit check somewhere for this? Are the page tables laid
> out differently?
>

No, there are two ways to load the FS base register: use a descriptor,
which is limited to 4 GB but is faster, or WRMSR, which is slower, but
unlimited.
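
The choice between the two methods comes down to one comparison. The sketch below only models that decision; the enum and function names are hypothetical, and the kernel's actual context-switch code is not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical labels for the two ways of loading the FS base. */
enum fs_base_method {
    FS_BASE_VIA_DESCRIPTOR, /* faster, but the base must fit in 32 bits */
    FS_BASE_VIA_WRMSR       /* slower, but works for any base address */
};

/* Pick the descriptor path whenever the base lies in the bottom 4 GB. */
enum fs_base_method choose_fs_base_method(uint64_t base)
{
    return base <= UINT32_MAX ? FS_BASE_VIA_DESCRIPTOR
                              : FS_BASE_VIA_WRMSR;
}
```

This is why a low address is a preference rather than a requirement: a base above 4 GB still works, it just takes the slower path on every switch.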

-hpa

2003-05-09 23:49:48

by Edgar Toernig

Subject: Re: hammer: MAP_32BIT

Ulrich Drepper wrote:
> > Why does there ever need to be an explicit HINT that you would prefer a
> > <32 bit address, when it's known a priori that <32 is better? Why
> > doesn't the mapping code ALWAYS try to use 32-bit addresses before
> > resorting to 64-bit?
>
> Because not all memory is addressed via GDT entries. In fact, almost
> none is, only thread stacks and similar gimmicks. If all mmap memory
> would by default be served from the low memory pool you soon run out of
> it and without any good reason.

As if there are so many apps that would suffer from that...

Anyway, what's so bad about the idea someone (Linus?) suggested?
Without MAP_FIXED the address given to mmap is already taken as a
hint where to start looking for free memory. So use mmap(4GB,...)
for regular memory and mmap(4kB, ...) for stacks. What's wrong
with that? And if you are really frightened to run out of "low"
memory make the above-4GB allocation the default for addr==0.

Ciao, ET.

2003-05-10 00:45:41

by Ulrich Drepper

Subject: Re: hammer: MAP_32BIT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Edgar Toernig wrote:

> Anyway, what's so bad about the idea someone (Linus?) suggested?
> Without MAP_FIXED the address given to mmap is already taken as a
> hint where to start looking for free memory.

The kernel fortunately already defines some semantics to using a
non-NULL first parameter without MAP_FIXED. It means: I prefer
*exactly* this address. If it's not available, give me anything else.
This is used and needed, for instance, when loading prelinked DSOs.

Now you want to give this another semantics. It would need at least one
more MAP_* flag.

Anyway, I don't care what the solution looks like. Changing existing
semantics should be out, that's the only requirement. Since I don't
plan on doing the work I have nothing to decide.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vE6a2ijCOnn/RHQRAnxgAJ9ptrA6XRvLveB+xZyXZVTz4W8KjgCgkyUp
BwOWiMQys/z8b6HZpneawJs=
=Ra9K
-----END PGP SIGNATURE-----

2003-05-10 01:35:37

by Andi Kleen

Subject: Re: hammer: MAP_32BIT

On Fri, May 09, 2003 at 07:39:46PM +0200, Ulrich Drepper wrote:
>
> > In some vendor kernels it's already in /proc/pid/mapped_base, but that is
> > quite costly to change. That would probably give you the best of both, Just
> > set it to a low value for the thread stacks and then reset it to the default.
> >
> > I guess that would be the better solution for your stacks.
>
> Are you sure this is the best solution? It means the mmap regions for

No, I'm not sure.

On further thinking the mapped_base would not be useful for you currently,
because at least in the SuSE/AMD64 kernel it only applies to 32bit processes.

The real solution is probably to pass in the search start hint in mmap's
address argument and not use MAP_32BIT.

e.g. use something like

	/*
	 * Current gcc still needs PROT_EXEC because it doesn't call
	 * __enable_execute_stack for trampolines yet.
	 */
	stack = mmap((void *)0x1000, stack_size, PROT_READ|PROT_WRITE|PROT_EXEC,
		     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

This will give you memory at the beginning of the address space and
beyond 4GB if needed.

This may still be slow, but fixing the search algorithm is a different
problem that can be tackled separately.

> Oh, and please rename MAP_32BIT to MAP_31BIT. This will save nerves on
> all sides.

I bet changing it will cost more nerves in supporting all these people
whose software doesn't compile anymore. And it's not really a lie. 2GB
is 32bit too.

-Andi

2003-05-10 02:40:13

by Edgar Toernig

Subject: Re: hammer: MAP_32BIT

> > Anyway, what's so bad about the idea someone (Linus?) suggested?
[it was Andi]
> > Without MAP_FIXED the address given to mmap is already taken as a
> > hint where to start looking for free memory.
>
> The kernel fortunately already defines some semantics to using a
> non-NULL first parameter without MAP_FIXED. It means: I prefer
> *exactly* this address.

Yeah, ok.

> If it's not available, give me anything else.

And at least on older kernels (don't know about 2.5) it gives you
not "anything" but the next free memory region above that address.

POSIX-draft6 about that topic:

"A non-zero value of addr is taken to be a suggestion of a
process address near which the mapping should be placed."


> Now you want to give this another semantics. It would need at least one
> more MAP_* flag.

No new flag. No new semantic. Everything's already there...

Ciao, ET.

2003-05-10 19:57:55

by David Woodhouse

Subject: Re: hammer: MAP_32BIT

On Sat, 2003-05-10 at 02:48, Andi Kleen wrote:
> > Oh, and please rename MAP_32BIT to MAP_31BIT. This will save nerves on
> > all sides.
>
> I bet changing it will cost more nerves in supporting all these people
> whose software doesn't compile anymore. And it's not really a lie. 2GB
> is 32bit too.

If that's _really_ an issue, then also provide MAP_32BIT which does what
its name implies.

Anyone who was using MAP_32BIT in the knowledge that it really limits to
31 bits gets the breakage they deserve for not reporting and fixing the
problem at the time.

--
dwmw2

2003-05-13 14:08:05

by Timothy Miller

Subject: Re: hammer: MAP_32BIT



H. Peter Anvin wrote:
> Timothy Miller wrote:
>
>>>The purpose is that there is a slight task-switching speed advantage if
>>>the address is in the bottom 4 GB. Since this affects every process,
>>>and most processes use very little TLS, this is worthwhile.
>>>
>>>This is fundamentally due to a K8 design flaw.
>>
>>Is there an explicit check somewhere for this? Are the page tables laid
>>out differently?
>>
>
>
> No, there are two ways to load the FS base register: use a descriptor,
> which is limited to 4 GB but is faster, or WRMSR, which is slower, but
> unlimited.
>


Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Timothy Miller wrote:
>
>
>>Why does there ever need to be an explicit HINT that you would prefer a
>><32 bit address, when it's known a priori that <32 is better? Why
>>doesn't the mapping code ALWAYS try to use 32-bit addresses before
>>resorting to 64-bit?
>
>
> Because not all memory is addressed via GDT entries. In fact, almost
> none is, only thread stacks and similar gimmicks. If all mmap memory
> would by default be served from the low memory pool you soon run out of
> it and without any good reason.


All I have to say is... I appreciate your patience with my ignorant
questions. :)


2003-05-13 18:42:25

by H. Peter Anvin

Subject: Re: hammer: MAP_32BIT

By author: David Woodhouse
In newsgroup: linux.dev.kernel
>
> On Sat, 2003-05-10 at 02:48, Andi Kleen wrote:
> > > Oh, and please rename MAP_32BIT to MAP_31BIT. This will save nerves on
> > > all sides.
> >
> > I bet changing it will cost more nerves in supporting all these people
> > whose software doesn't compile anymore. And it's not really a lie. 2GB
> > is 32bit too.
>
> If that's _really_ an issue, then also provide MAP_32BIT which does what
> its name implies.
>
> Anyone who was using MAP_32BIT in the knowledge that it really limits to
> 31 bits gets the breakage they deserve for not reporting and fixing the
> problem at the time.
>

Agreed.

That being said, I think a more flexible scheme is called for; I still
would like to suggest the MAP_MAXADDR and MAP_MAXADDR_ADVISORY flags
that I mentioned earlier.

If people really want to retain the (rarely used) suggestion address,
I'd suggest making the address argument a pointer to a structure:

	struct map_maxaddr {
		void *search;	/* Suggestion address */
		void *min;	/* Lowest acceptable address */
		void *max;	/* Maximum acceptable address */
	};

... however, it seems like overkill to me.

-hpa