LinuxLists.cc - A way to shrink process impact on kernel memory usage?

2003-05-09 16:53:34

Subject: A way to shrink process impact on kernel memory usage?

One of the things that's been worked on to reduce kernel memory usage
for processes is to shrink the kernel stack from 8k to 4k. I mean, it's
not like you could shrink it to 6k, right? Well, why not? Why not
allocate an 8k space and put various process-related data structures at
the beginning of it? Sure, a stack overflow could corrupt that data,
but a stack overflow would be disastrous anyhow.
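A minimal user-space sketch of that layout follows. It assumes an 8 KiB block (THREAD_SIZE) with a hypothetical per-task record at the low end and the stack pointer starting at the high end. The struct and function names are illustrative only, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define THREAD_SIZE 8192 /* assumed: one 8 KiB block per task */

/* Hypothetical per-task record kept at the low end of the block. */
struct task_info_demo {
    int pid;
    unsigned long flags;
    char comm[16];
};

/* One allocation holds both the per-task record and the stack. */
struct task_region {
    unsigned char *base;         /* start of the THREAD_SIZE block */
    struct task_info_demo *info; /* record sits at base */
    unsigned char *initial_sp;   /* stack starts at the top, grows down */
};

static int task_region_init(struct task_region *r)
{
    r->base = malloc(THREAD_SIZE);
    if (r->base == NULL)
        return -1;
    r->info = (struct task_info_demo *)r->base;
    r->initial_sp = r->base + THREAD_SIZE;
    return 0;
}

/* Bytes the stack can use before it reaches the record. */
static size_t task_region_stack_room(const struct task_region *r)
{
    assert(sizeof(struct task_info_demo) < THREAD_SIZE);
    return (size_t)(r->initial_sp - r->base) - sizeof(struct task_info_demo);
}
```

With the record at the bottom, an overflow runs straight into it, which is the trade-off accepted above.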

I'm sure that, in addition to the memory allocated by kmalloc, some data
structures are also allocated to track it so that you can know what to
free when you use kfree, right? Well, combining a few things this way
would save a few bytes there too.

Also, if you're really worried about overflow, or you want a guard page
or whatever, then put the data structures at the end and set the initial
stack pointer appropriately.
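A rough sketch of "data at the end, initial stack pointer just below it", again assuming an 8 KiB block and a hypothetical record type. The initial stack pointer is the record's own address; on x86 a push decrements the stack pointer before storing, so the stack never writes over the record and an overflow runs off the bottom of the block instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define THREAD_SIZE 8192 /* assumed region size */

/* Hypothetical per-task record. */
struct task_info_demo {
    int pid;
    unsigned long flags;
};

/* Place the record at the high end of the block and report where the
 * stack should start (immediately below the record). */
static struct task_info_demo *place_info_at_top(unsigned char *base,
                                                unsigned char **initial_sp)
{
    size_t off = THREAD_SIZE - sizeof(struct task_info_demo);

    assert(sizeof(struct task_info_demo) < THREAD_SIZE);
    /* Round down so the record is suitably aligned. */
    off &= ~(size_t)(_Alignof(struct task_info_demo) - 1);
    *initial_sp = base + off;
    return (struct task_info_demo *)(base + off);
}
```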

Someone complained about a process structure already being too bloated.
Unless it's several K in size already, you can bloat it up all you
please this way.

Another advantage is that you could make the data structures growable.
The stack grows down, and the data grows up. As long as they don't
meet, all is well.
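The "as long as they don't meet" rule can be modelled as two offsets in one block. This is a hypothetical bookkeeping sketch, not kernel code. Data occupies [0, data_end) and grows up; the stack occupies [stack_low, THREAD_SIZE) and grows down. Both operations refuse to cross the gap between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define THREAD_SIZE 8192 /* assumed region size */

/* Hypothetical model of a block shared by growable data and a stack. */
struct shared_region {
    size_t data_end;  /* first free byte above the data */
    size_t stack_low; /* lowest byte currently used by the stack */
};

/* Free bytes between the top of the data and the bottom of the stack. */
static size_t gap(const struct shared_region *r)
{
    assert(r->data_end <= r->stack_low);
    return r->stack_low - r->data_end;
}

/* Grow the data area upward by n bytes unless it would hit the stack. */
static bool grow_data(struct shared_region *r, size_t n)
{
    if (n > gap(r))
        return false;
    r->data_end += n;
    return true;
}

/* Push n bytes onto the stack unless it would hit the data. */
static bool push_stack(struct shared_region *r, size_t n)
{
    if (n > gap(r))
        return false; /* this is the overflow case */
    r->stack_low -= n;
    return true;
}
```

In a real kernel the stack has no such check on every push, so this only shows the invariant, not how it would be enforced.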

2003-05-10 01:13:02

by Perez-Gonzalez, Inaky

Subject: RE: A way to shrink process impact on kernel memory usage?

> -----Original Message-----
> From: Timothy Miller
>
> One of the things that's been worked on to reduce kernel memory usage
> for processes is to shrink the kernel stack from 8k to 4k. I mean, it's
> not like you could shrink it to 6k, right? Well, why not? Why not
> allocate an 8k space and put various process-related data structures at
> the beginning of it? Sure, a stack overflow could corrupt that data,
> but a stack overflow would be disastrous anyhow.

It is being done already. At least, on i386, alloc_thread_info()
allocates two pages; at the beginning you have the thread info
structure [context and friends].

The call chain is copy_process() -> dup_task_struct() -> alloc_thread_info().
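Because the two-page block is aligned to its own size, the i386 kernel's current_thread_info() finds the structure by rounding the stack pointer down to the 8 KiB boundary. The user-space sketch below mimics that trick with C11 aligned_alloc(); the struct and helper names are hypothetical stand-ins, and a pointer inside the block plays the role of %esp.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192 /* two 4 KiB pages, as on i386 here */

/* Simplified stand-in for the kernel's struct thread_info. */
struct thread_info_demo {
    int cpu;
    unsigned long flags;
};

/* Allocate a THREAD_SIZE block aligned to THREAD_SIZE; the record sits
 * at its start and the stack would occupy the rest. */
static struct thread_info_demo *alloc_thread_info_demo(void)
{
    return aligned_alloc(THREAD_SIZE, THREAD_SIZE);
}

/* Given any address inside the block (e.g. the current stack pointer),
 * recover the record by clearing the low bits. */
static struct thread_info_demo *thread_info_from_sp(uintptr_t sp)
{
    return (struct thread_info_demo *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}
```

No pointer to the record has to be kept anywhere: any code running on that stack can find it with one AND.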

What you say makes sense, but it'd be kind of difficult to
calculate how much is enough ... maybe, who knows. But the only
thing you can put in there is stuff that is specific to each thread
(scheduling information, parent/s, children, siblings, pid maps,
timers? used_math, comm, fsinfo, ipc, etc., etc.).

Thus, it'd be interesting to collapse all the common stuff in
the task_struct corresponding to a same thread group into a single
one, and move whatever is thread specific out to a thread-specific
structure [similar to thread_info, although I guess you want to keep
thread_info really small for cache performance].

That should save a lot of task_structs when threading and move all
the info to that place you say. It is going to be a lot of work,
though: very much 2.7 material.
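A minimal sketch of the proposed split, assuming hypothetical types: state common to a thread group lives once in a refcounted struct, and each thread keeps only its own fields plus a pointer to the group. A plain int refcount is used for brevity; kernel code would need an atomic counter or a lock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical state shared by every thread in a group. */
struct group_shared {
    int refcount; /* threads pointing here */
    int tgid;     /* thread group id */
    char comm[16];
};

/* Hypothetical per-thread part: only what differs per thread. */
struct thread_part {
    int pid;
    int prio;
    struct group_shared *group;
};

static struct group_shared *group_new(int tgid, const char *comm)
{
    struct group_shared *g = calloc(1, sizeof *g);
    if (g == NULL)
        return NULL;
    g->tgid = tgid;
    strncpy(g->comm, comm, sizeof g->comm - 1); /* calloc keeps it NUL-terminated */
    return g;
}

/* A new thread points at the group instead of copying its data. */
static void thread_attach(struct thread_part *t, int pid, struct group_shared *g)
{
    t->pid = pid;
    t->prio = 0;
    t->group = g;
    g->refcount++;
}

/* The shared part is freed together with the last thread. */
static void thread_detach(struct thread_part *t)
{
    struct group_shared *g = t->group;
    assert(g != NULL && g->refcount > 0);
    t->group = NULL;
    if (--g->refcount == 0)
        free(g);
}
```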

> Someone complained about a process structure already being too bloated.
> Unless it's several K in size already, you can bloat it up all you
> please this way.

Not really - the more bloated, the more cache misses you will have.
There are a lot of fields that don't use all the bits and a lot
of Booleans; it'd make sense to collapse all those into a single
word if possible.
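A small sketch of collapsing Booleans into one word. The flag names are loosely modelled on fields such as used_math mentioned above, but the values and helpers are hypothetical. Note that these helpers do a plain read-modify-write, so concurrent updaters would need the atomic set_bit()/test_bit() style instead; that may be one of the performance concerns behind leaving the fields separate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; not real task_struct definitions. */
enum {
    TF_USED_MATH = 1u << 0,
    TF_SWAPPABLE = 1u << 1,
    TF_DID_EXEC  = 1u << 2,
    TF_KEEP_CAPS = 1u << 3,
};

/* Before: one int per Boolean. */
struct task_bools_before {
    int used_math;
    int swappable;
    int did_exec;
    int keep_caps;
};

/* After: all Booleans share one word. */
struct task_bools_after {
    unsigned int flags;
};

static inline void tf_set(struct task_bools_after *t, unsigned int f, bool on)
{
    if (on)
        t->flags |= f;
    else
        t->flags &= ~f;
}

static inline bool tf_test(const struct task_bools_after *t, unsigned int f)
{
    return (t->flags & f) != 0;
}
```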

> Another advantage is that you could make the data structures growable.
> The stack grows down, and the data grows up. As long as they don't
> meet, all is well.

To solve that, you put the structures on the top of the area instead
of at the beginning. That way you are sure the stack cannot overflow
over your (very delicate) data structures, and it makes it easier to add
an overflow guard page (as the stack end is at the beginning of a
page).
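A user-space analogue of the guard-page idea, using POSIX mmap() and mprotect(); it is not how the kernel allocates its stacks, and the struct and function names are hypothetical. The lowest page is made inaccessible, the record sits at the very top, and a downward-growing stack that overflows would fault on the guard page instead of corrupting anything.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-task record kept at the top of the area. */
struct task_info_demo {
    int pid;
    unsigned long flags;
};

struct guarded_stack {
    unsigned char *map;          /* whole mapping, guard page first */
    size_t map_len;
    unsigned char *stack_low;    /* lowest usable stack byte */
    struct task_info_demo *info; /* record at the very top */
};

/* Map one guard page plus `pages` usable pages. */
static int guarded_stack_init(struct guarded_stack *g, size_t pages)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);

    g->map_len = (pages + 1) * pg;
    g->map = mmap(NULL, g->map_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g->map == MAP_FAILED)
        return -1;
    /* Touching the lowest page now faults: that is the overflow guard. */
    if (mprotect(g->map, pg, PROT_NONE) != 0) {
        munmap(g->map, g->map_len);
        return -1;
    }
    g->stack_low = g->map + pg;
    /* The stack would start just below the record and grow toward the guard. */
    g->info = (struct task_info_demo *)(g->map + g->map_len - sizeof *g->info);
    return 0;
}

static void guarded_stack_free(struct guarded_stack *g)
{
    munmap(g->map, g->map_len);
}
```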

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

2003-05-10 20:30:36

by David Woodhouse

Subject: Re: A way to shrink process impact on kernel memory usage?

On Fri, 2003-05-09 at 18:10, Timothy Miller wrote:
> Why not allocate an 8k space and put various process-related data
> structures at the beginning of it? Sure, a stack overflow could
> corrupt that data, but a stack overflow would be disastrous anyhow.

No reason why not at all. That's why we've been doing it this way for
years ;)

--
dwmw2

2003-05-13 14:27:08

by Timothy Miller

Subject: Re: A way to shrink process impact on kernel memory usage?

Perez-Gonzalez, Inaky wrote:
>>-----Original Message-----
>>From: Timothy Miller
>>
>>One of the things that's been worked on to reduce kernel memory usage
>>for processes is to shrink the kernel stack from 8k to 4k. I mean, it's
>>not like you could shrink it to 6k, right? Well, why not? Why not
>>allocate an 8k space and put various process-related data structures at
>>the beginning of it? Sure, a stack overflow could corrupt that data,
>>but a stack overflow would be disastrous anyhow.
>
>
> It is being done already. At least, on i386, alloc_thread_info()
> allocates two pages; at the beginning you have the thread info
> structure [context and friends].
>
> The call chain is copy_process() -> dup_task_struct() -> alloc_thread_info().
>
> What you say makes sense, but it'd be kind of difficult to
> calculate how much is enough ... maybe, who knows. But the only
> thing you can put in there is stuff that is specific to each thread
> (scheduling information, parent/s, children, siblings, pid maps,
> timers? used_math, comm, fsinfo, ipc, etc., etc.).

If you have some data which is common to a group of threads/processes,
it could be stored in one (or more--redundantly) of the process stacks.
If the refcount is not zero and the process stack holding the data is
to die, the data can be moved to another stack or otherwise stored
somewhere else.
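One way to picture the "move it when the holder dies" scheme is the hypothetical bookkeeping below. The group records which task's area currently hosts the shared data and how many tasks still use it. When the host exits while others remain, the data is copied into a surviving task's spare slot. All names and the fixed group size are assumptions for the sketch, and there is no locking.

```c
#include <assert.h>
#include <string.h>

#define MAX_TASKS 4 /* assumed small fixed group for the sketch */

/* Hypothetical data shared by every task in a group. */
struct shared_data {
    char comm[16];
};

/* Hypothetical task: `slot` stands for spare room in its own stack area. */
struct task_demo {
    int alive;
    struct shared_data slot;
};

/* Who hosts the shared copy and how many tasks still use it. */
struct group_demo {
    int refcount;
    struct task_demo *host;
    struct task_demo *tasks[MAX_TASKS];
    int ntasks;
};

/* Always reach the data through the current host. */
static struct shared_data *group_data(struct group_demo *g)
{
    return g->host ? &g->host->slot : NULL;
}

static void task_exit(struct group_demo *g, struct task_demo *t)
{
    assert(g->refcount > 0 && t->alive);
    t->alive = 0;
    if (--g->refcount == 0) {
        g->host = NULL; /* last user gone: nothing to keep */
        return;
    }
    if (g->host != t)
        return; /* data lives elsewhere; nothing to move */
    for (int i = 0; i < g->ntasks; i++) {
        struct task_demo *other = g->tasks[i];
        if (other->alive) {
            other->slot = t->slot; /* migrate the data to a survivor */
            g->host = other;
            return;
        }
    }
    assert(!"refcount > 0 but no live task to host the data");
}
```

The group record itself still has to live somewhere, which is part of why a single common structure ends up simpler.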

>
> Thus, it'd be interesting to collapse all the common stuff in
> the task_struct corresponding to a same thread group into a single
> one, and move whatever is thread specific out to a thread-specific
> structure [similar to thread_info, although I guess you want to keep
> thread_info really small for cache performance].
>
> That should save a lot of task_structs when threading and move all
> the info to that place you say. It is going to be a lot of work,
> though: very much 2.7 material.
>

It might, nevertheless, be a good and equitable solution to the problem.
Another way to skin a cat, as it were.

>
>>Someone complained about a process structure already being too bloated.
>>Unless it's several K in size already, you can bloat it up all you
>>please this way.
>
>
> Not really - the more bloated, the more cache misses you will have.
> There are a lot of fields that don't use all the bits and a lot
> of Booleans; it'd make sense to collapse all those into a single
> word if possible.

Most assuredly. Why are they not already? :)

>
>>Another advantage is that you could make the data structures growable.
>>The stack grows down, and the data grows up. As long as they don't
>>meet, all is well.
>
>
> To solve that, you put the structures on the top of the area instead
> of at the beginning. That way you are sure the stack cannot overflow
> over your (very delicate) data structures, and it makes it easier to add
> an overflow guard page (as the stack end is at the beginning of a
> page).

I believe I mentioned that idea. Either the stack and data grow in
opposite directions, with obvious advantages and risks, or the data is
at the top of the area but therefore not growable.

2003-05-13 19:43:51

by Perez-Gonzalez, Inaky

Subject: RE: A way to shrink process impact on kernel memory usage?

From: Timothy Miller
> Perez-Gonzalez, Inaky wrote:
>
> If you have some data which is common to a group of threads/processes,
> it could be stored in one (or more--redundantly) of the process stacks.
> If the refcount is not zero and the process stack holding the data is
> to die, the data can be moved to another stack or otherwise stored
> somewhere else.

I don't think you need that redundancy - at the end of the day, it
is much simpler to just have the common task struct (could we say,
process?) with the shared stuff - replication is nice, but not in
this area.

> > Not really - the more bloated, the more cache misses you will have.
> > There are a lot of fields that don't use all the bits and a lot
> > of Booleans; it'd make sense to collapse all those into a single
> > word if possible.
>
> Most assuredly. Why are they not already? :)

Beats me ... maybe there are performance concerns I am not aware
of, or it simply has not been tackled yet. This is something I have
on my list of "would be interesting to work on".

> > To solve that, you put the structures on the top of the area instead
> > of at the beginning. That way you are sure the stack cannot overflow
> > over your (very delicate) data structures, and it makes it easier to add
> > an overflow guard page (as the stack end is at the beginning of a
> > page).
>
> I believe I mentioned that idea. Either the stack and data grow in
> opposite directions, with obvious advantages and risks, or the data is
> at the top of the area but therefore not growable.

Kill me ... my apologies; sometimes it seems I don't read as
carefully as I thought :]

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)