LinuxLists.cc - pre6 VM issues

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, 9 Oct 2001, Marcelo Tosatti wrote:

>
> Hi,
>
> I've been testing pre6 (actually its pre5 a patch which Linus sent me
> named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found

I haven't woke up correctly yet, I guess.

I mean its pre6 with a patch named "p5p6" which Linus sent me.

2001-10-09 14:17:06

[permalink] [raw]

Subject: Re: pre6 VM issues

Most of the traditional unices maintained a pool for each subsystem
(this is really useful when u have the memory to spare), so not matter
what they use memory only from their pool (and if needed peek outside),
but nobody else used the memory from the pool.

I have seen cases where, I have run out of physical memory on my system,
so I try to log in using the serial console, but since the serial driver
does get_free_page (this most likely fails) and the driver complains back.
So, I had suggested a while back that important subsystems should maintain
their own pool (it will take a new thread to discuss the right size of
each pool).

Why can't Linux follow the same approach? especially on systems with a lot
of memory.

Balbir

Marcelo Tosatti wrote:

>Hi,
>
>I've been testing pre6 (actually its pre5 a patch which Linus sent me
>named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found
>out some problems. First of all, we need to throttle normal allocators
>more often and/or update the low memory limits for normal allocators to a
>saner value. I already said I think allowing everybody to eat up to
>"freepages.min" is too low for a default.
>
>I've got atomic memory failures with _22GB_ of swap free (32GB total):
>
> eth0: can't fill rx buffer (force 0)!
>
>Another issue is the damn fork() special case. Its failing in practice:
>
>bash: fork: Cannot allocate memory
>
>Also with _LOTS_ of swap free. (gigs of them)
>
>Linus, we can introduce a "__GFP_FAIL" flag to be used by _everyone_ which
>wants to do higher order allocations as an optimization (eg allocate big
>scatter-gather tables or whatever). Or do you prefer to make the fork()
>allocation a separate case ?
>
>I'll take a closer look at the code now and make the throttling/limits to
>what I think is saner for a default.
>
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>

Attachments:

2001-10-09 14:23:26

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, 9 Oct 2001, BALBIR SINGH wrote:

> Most of the traditional unices maintained a pool for each subsystem
> (this is really useful when u have the memory to spare), so not matter
> what they use memory only from their pool (and if needed peek outside),
> but nobody else used the memory from the pool.
>
> I have seen cases where, I have run out of physical memory on my system,
> so I try to log in using the serial console, but since the serial driver
> does get_free_page (this most likely fails) and the driver complains back.
> So, I had suggested a while back that important subsystems should maintain
> their own pool (it will take a new thread to discuss the right size of
> each pool).
>
> Why can't Linux follow the same approach? especially on systems with a lot
> of memory.

There is nothing which avoids us from doing that (there is one reserved
pool I remeber right now: the highmem bounce buffering pool, but that one
is a special case due to the way Linux does IO in high memory and its only
needed on _real_ emergencies --- it will be removed in 2.5, I hope).

In general, its a better approach to share the memory and have a unified
pool. If a given subsystem is not using its own "reversed" memory, another
subsystems can use it.

The problem we are seeing now can be fixed even without the reserved
pools.

2001-10-09 14:32:06

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, Oct 09, 2001 at 10:44:37AM -0200, Marcelo Tosatti wrote:
>
> Hi,
>
> I've been testing pre6 (actually its pre5 a patch which Linus sent me
> named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found
> out some problems. First of all, we need to throttle normal allocators
> more often and/or update the low memory limits for normal allocators to a
> saner value. I already said I think allowing everybody to eat up to
> "freepages.min" is too low for a default.
>
> I've got atomic memory failures with _22GB_ of swap free (32GB total):
>
> eth0: can't fill rx buffer (force 0)!
>
> Another issue is the damn fork() special case. Its failing in practice:
>
> bash: fork: Cannot allocate memory
>
> Also with _LOTS_ of swap free. (gigs of them)
>
> Linus, we can introduce a "__GFP_FAIL" flag to be used by _everyone_ which
> wants to do higher order allocations as an optimization (eg allocate big
> scatter-gather tables or whatever). Or do you prefer to make the fork()
> allocation a separate case ?
>
> I'll take a closer look at the code now and make the throttling/limits to
> what I think is saner for a default.

I've also finished last night to fix all highmem troubles that I could
reproduce on 128mbyte with highmem emulation, I'm confidetn it will work
fine on real highmem too now, I hope to get access soon to some highmem
machine too to test it.

I guess you're not interested to test my patches since they're not in
the mainline direction though.

Andrea

2001-10-09 14:35:26

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> I guess you're not interested to test my patches since they're not in
> the mainline direction though.

Why they are not in the mainline direction ?

Are they hackish ?

2001-10-09 14:36:56

[permalink] [raw]

Subject: Re: pre6 VM issues

Marcelo Tosatti wrote:

>
>On Tue, 9 Oct 2001, BALBIR SINGH wrote:
>
>>Most of the traditional unices maintained a pool for each subsystem
>>(this is really useful when u have the memory to spare), so not matter
>>what they use memory only from their pool (and if needed peek outside),
>>but nobody else used the memory from the pool.
>>
>>I have seen cases where, I have run out of physical memory on my system,
>>so I try to log in using the serial console, but since the serial driver
>>does get_free_page (this most likely fails) and the driver complains back.
>>So, I had suggested a while back that important subsystems should maintain
>>their own pool (it will take a new thread to discuss the right size of
>>each pool).
>>
>>Why can't Linux follow the same approach? especially on systems with a lot
>>of memory.
>>
>
>There is nothing which avoids us from doing that (there is one reserved
>pool I remeber right now: the highmem bounce buffering pool, but that one
>is a special case due to the way Linux does IO in high memory and its only
>needed on _real_ emergencies --- it will be removed in 2.5, I hope).
>
>In general, its a better approach to share the memory and have a unified
>pool. If a given subsystem is not using its own "reversed" memory, another
>subsystems can use it.
>
>The problem we are seeing now can be fixed even without the reserved
>pools.
>
I agree that is the fair and nice thing to do, but I was talking about reserving
memory for device vs sharing it with a user process, user processes can wait,
their pages can even be swapped out if needed. But for a device that is not willing
to wait (GFP_ATOMIC) say in an interrupt context, this might be a issue.

Anyway, how do you plan to solve this ?
Balbir

>
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>

Attachments:

2001-10-09 14:43:26

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, Oct 09, 2001 at 11:13:07AM -0200, Marcelo Tosatti wrote:
>
>
> On Tue, 9 Oct 2001, Andrea Arcangeli wrote:
>
> > I guess you're not interested to test my patches since they're not in
> > the mainline direction though.
>
> Why they are not in the mainline direction ?
>
> Are they hackish ?

IMHO the other way around, first of all I'm not using the infinite loop,
and I dropped a few bits ready for doing a few different things in the
next days like selecting the process to kill in function of the
allocation rate and by collecting away exclusive pages in get_swap_page
etc...

I'll release the stuff soon as usual in separate patches easily readable
and mergeable, because as said I cannot find anything wrong anymore in
the allocator with my testing resources. Of course I'd really like if
you could test it on the 16G box, but as said it won't test the approch
to the allocator faliure fixes that is been implemented in mainline
which I understood you're working on at the moment.

Andrea

2001-10-09 14:43:16

[permalink] [raw]

Subject: Re: pre6 VM issues

BALBIR SINGH wrote:

>>
>> There is nothing which avoids us from doing that (there is one reserved
>> pool I remeber right now: the highmem bounce buffering pool, but that
>> one
>> is a special case due to the way Linux does IO in high memory and its
>> only
>> needed on _real_ emergencies --- it will be removed in 2.5, I hope).
>>
>> In general, its a better approach to share the memory and have a unified
>> pool. If a given subsystem is not using its own "reversed" memory,
>> another
>> subsystems can use it.
>>
>> The problem we are seeing now can be fixed even without the reserved
>> pools.
>>
> I agree that is the fair and nice thing to do, but I was talking about
> reserving
> memory for device vs sharing it with a user process, user processes
> can wait,
> their pages can even be swapped out if needed. But for a device that
> is not willing
> to wait (GFP_ATOMIC) say in an interrupt context, this might be a issue.

>
>
> Anyway, how do you plan to solve this ?
> Balbir

I did not realize that highmem was causing this problem you were facing,
anyway my argument
about the pools still holds.

Balbir

>
>>
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>
>
>
>
>------------------------------------------------------------------------
>
>----------------------------------------------------------------------------------------------------------------------
>Information transmitted by this E-MAIL is proprietary to Wipro and/or its Customers and
>is intended for use only by the individual or entity to which it is
>addressed, and may contain information that is privileged, confidential or
>exempt from disclosure under applicable law. If you are not the intended
>recipient or it appears that this mail has been forwarded to you without
>proper authority, you are notified that any use or dissemination of this
>information in any manner is strictly prohibited. In such cases, please
>notify us immediately at mailto:[email protected] and delete this mail
>from your records.
>----------------------------------------------------------------------------------------------------------------------
>

Attachments:

2001-10-09 14:45:16

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, Oct 09, 2001 at 08:07:19PM +0530, BALBIR SINGH wrote:
> their pages can even be swapped out if needed. But for a device that is not willing
> to wait (GFP_ATOMIC) say in an interrupt context, this might be a issue.

There's just a reserved pool for atomic allocations. See the __GFP_WAIT
check in __alloc_pages.

Andrea

2001-10-09 14:45:16

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> On Tue, Oct 09, 2001 at 10:44:37AM -0200, Marcelo Tosatti wrote:
> >
> > Hi,
> >
> > I've been testing pre6 (actually its pre5 a patch which Linus sent me
> > named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found
> > out some problems. First of all, we need to throttle normal allocators
> > more often and/or update the low memory limits for normal allocators to a
> > saner value. I already said I think allowing everybody to eat up to
> > "freepages.min" is too low for a default.
> >
> > I've got atomic memory failures with _22GB_ of swap free (32GB total):
> >
> > eth0: can't fill rx buffer (force 0)!
> >
> > Another issue is the damn fork() special case. Its failing in practice:
> >
> > bash: fork: Cannot allocate memory
> >
> > Also with _LOTS_ of swap free. (gigs of them)
> >
> > Linus, we can introduce a "__GFP_FAIL" flag to be used by _everyone_ which
> > wants to do higher order allocations as an optimization (eg allocate big
> > scatter-gather tables or whatever). Or do you prefer to make the fork()
> > allocation a separate case ?
> >
> > I'll take a closer look at the code now and make the throttling/limits to
> > what I think is saner for a default.
>
> I've also finished last night to fix all highmem troubles that I could
> reproduce on 128mbyte with highmem emulation, I'm confidetn it will work
> fine on real highmem too now, I hope to get access soon to some highmem
> machine too to test it.
>
> I guess you're not interested to test my patches since they're not in
> the mainline direction though.

Ah, I forgot something: Even if I'm not interested in the patches the 16GB
machine is available to the community. If you (or any other VM people who
need the machine) want access, just tell me.

2001-10-09 14:45:16

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, 9 Oct 2001, BALBIR SINGH wrote:

> Marcelo Tosatti wrote:
>
> >
> >On Tue, 9 Oct 2001, BALBIR SINGH wrote:
> >
> >>Most of the traditional unices maintained a pool for each subsystem
> >>(this is really useful when u have the memory to spare), so not matter
> >>what they use memory only from their pool (and if needed peek outside),
> >>but nobody else used the memory from the pool.
> >>
> >>I have seen cases where, I have run out of physical memory on my system,
> >>so I try to log in using the serial console, but since the serial driver
> >>does get_free_page (this most likely fails) and the driver complains back.
> >>So, I had suggested a while back that important subsystems should maintain
> >>their own pool (it will take a new thread to discuss the right size of
> >>each pool).
> >>
> >>Why can't Linux follow the same approach? especially on systems with a lot
> >>of memory.
> >>
> >
> >There is nothing which avoids us from doing that (there is one reserved
> >pool I remeber right now: the highmem bounce buffering pool, but that one
> >is a special case due to the way Linux does IO in high memory and its only
> >needed on _real_ emergencies --- it will be removed in 2.5, I hope).
> >
> >In general, its a better approach to share the memory and have a unified
> >pool. If a given subsystem is not using its own "reversed" memory, another
> >subsystems can use it.
> >
> >The problem we are seeing now can be fixed even without the reserved
> >pools.
> >
> I agree that is the fair and nice thing to do, but I was talking about reserving
> memory for device vs sharing it with a user process, user processes can wait,
> their pages can even be swapped out if needed. But for a device that is not willing
> to wait (GFP_ATOMIC) say in an interrupt context, this might be a issue.
>
>
> Anyway, how do you plan to solve this ?

I plan to have saner limits for atomic allocations for 2.4. For the corner
cases, we can make then those limits tunable.

For 2.5, I guess we'll need some scheme for those corner cases, since they
will probably become more common (think about gigabit ethernet, etc).

I'm not sure yet which one will be used. Ben ([email protected]) has a nice
scheme ready for reservation. But thats 2.5 only anyway.

2001-10-09 14:50:26

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, Oct 09, 2001 at 10:44:37AM -0200, Marcelo Tosatti wrote:
>
> Hi,
>
> I've been testing pre6 (actually its pre5 a patch which Linus sent me
> named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found
> out some problems. First of all, we need to throttle normal allocators
> more often and/or update the low memory limits for normal allocators to a
> saner value. I already said I think allowing everybody to eat up to
> "freepages.min" is too low for a default.
>
> I've got atomic memory failures with _22GB_ of swap free (32GB total):
>
> eth0: can't fill rx buffer (force 0)!
>
> Another issue is the damn fork() special case. Its failing in practice:
>
> bash: fork: Cannot allocate memory
>
> Also with _LOTS_ of swap free. (gigs of them)

It could be just fragmentation but the fact it doesn't happen in
non-highmem pretty much shows that shows the memory balancing isn't
doing the right thing, you hide the problem with the infinite loop for
non atomic order 0 allocations and that's just broken, as best it will
be slower in collecting the right pages away.

My approch shouldn't fail so easily in fork despite I'm not looping in
fork either, because I'm trying to do better decisions since the first
place in the memory balancing, I don't wait the infinite loop to
eventually collect away the right pages.

Andrea

2001-10-09 14:53:56

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, Oct 09, 2001 at 11:23:24AM -0200, Marcelo Tosatti wrote:
> machine is available to the community. If you (or any other VM people who
> need the machine) want access, just tell me.

I'd like to get a login. I think my project is been approved and we'll
get soon an additional machine to test (that doesn't hurt), but in the
meantime I'd be just interested to run some test on real highmem of
course.

Andrea

2001-10-09 14:56:07

[permalink] [raw]

Subject: Re: pre6 VM issues

Andrea Arcangeli wrote:

>On Tue, Oct 09, 2001 at 08:07:19PM +0530, BALBIR SINGH wrote:
>
>>their pages can even be swapped out if needed. But for a device that is not willing
>>to wait (GFP_ATOMIC) say in an interrupt context, this might be a issue.
>>
>
>There's just a reserved pool for atomic allocations. See the __GFP_WAIT
>check in __alloc_pages.
>
I apologize for my ignorance on this
Balbir

>
>Andrea
>

Attachments:

2001-10-09 14:57:16

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> On Tue, Oct 09, 2001 at 10:44:37AM -0200, Marcelo Tosatti wrote:
> >
> > Hi,
> >
> > I've been testing pre6 (actually its pre5 a patch which Linus sent me
> > named "prewith 16GB of RAM (thanks to OSDLabs for that), and I've found
> > out some problems. First of all, we need to throttle normal allocators
> > more often and/or update the low memory limits for normal allocators to a
> > saner value. I already said I think allowing everybody to eat up to
> > "freepages.min" is too low for a default.
> >
> > I've got atomic memory failures with _22GB_ of swap free (32GB total):
> >
> > eth0: can't fill rx buffer (force 0)!
> >
> > Another issue is the damn fork() special case. Its failing in practice:
> >
> > bash: fork: Cannot allocate memory
> >
> > Also with _LOTS_ of swap free. (gigs of them)
>
> It could be just fragmentation but the fact it doesn't happen in
> non-highmem pretty much shows that shows the memory balancing isn't
> doing the right thing, you hide the problem with the infinite loop for
> non atomic order 0 allocations and that's just broken, as best it will
> be slower in collecting the right pages away.
>
> My approch shouldn't fail so easily in fork despite I'm not looping in
> fork either, because I'm trying to do better decisions since the first
> place in the memory balancing, I don't wait the infinite loop to
> eventually collect away the right pages.

The problem may well be in the memory balancing Andrea, but I'm not trying
to hide it with the infinite loop.

The infinite loop is just a guarantee that we'll have a reliable way of
throttling the allocators which can block. Not doing the infinite loop is
just way too fragile IMO and it is _prone_ to fail in intensive
loads.

If the problem is the highmem balancing, I'll love to get your fixes and
integrate with the infinite loop logic, which is a separated (related,
yes, but separate) thing.

2001-10-09 15:40:09

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, Oct 09, 2001 at 11:34:47AM -0200, Marcelo Tosatti wrote:
> The problem may well be in the memory balancing Andrea, but I'm not trying
> to hide it with the infinite loop.

I assumed fixing the oom faliures with highmem was the main reason of
the infinite loop.

> The infinite loop is just a guarantee that we'll have a reliable way of
> throttling the allocators which can block. Not doing the infinite loop is

Throttling have nothing to do with the infinite loop.

> just way too fragile IMO and it is _prone_ to fail in intensive
> loads.

It is too fragile if the vm is doing the wrong actions and so we must
loop over and over again before it finally does the right thing.

If allocation fails that's a nice feedback that tell us "the memory
balancing is at least inefficient in doing the right thing, looping
would only waste more cache and more time for the allocation".

Think a list where pages can be only freeable or unfreeable. Now scan
_all_ the pages and free all the freeable ones. Finished. If it failed
and it couldn't free anything it means there was nothing to free so
we're oom. How can that be "fragile"?

In real life it isn't as simple as that, there's some "race" effect
caming from the schedules in between, there are multiple lists, there's
swapout etc... so it's a little more complex than just "freeable" and
"unfreeable" and a single list, but it can be done, 2.2 does that too,
if we loop over and over again and we do no progress in the right
direction I prefer to know about that via an allocation faliure rather
than by just getting sucking performance. Also an allocation faliure is
a minor problem compared to a deadlock that the infinite loop cannot
prevent.

> If the problem is the highmem balancing, I'll love to get your fixes and
> integrate with the infinite loop logic, which is a separated (related,
> yes, but separate) thing.

The infinite loop shouldn't do anything except introducing the deadlock
after that (otherwise it means I failed :), but you're free to go in
your direction if you think it's the right one of course (like I'm free
to go in my direction since I think it's the right one).

Andrea

2001-10-09 16:30:12

[permalink] [raw]

Subject: Re: pre6 VM issues

On Tue, 9 Oct 2001, Andrea Arcangeli wrote:

> On Tue, Oct 09, 2001 at 11:34:47AM -0200, Marcelo Tosatti wrote:
> > The problem may well be in the memory balancing Andrea, but I'm not trying
> > to hide it with the infinite loop.
>
> I assumed fixing the oom faliures with highmem was the main reason of
> the infinite loop.
>
> > The infinite loop is just a guarantee that we'll have a reliable way of
> > throttling the allocators which can block. Not doing the infinite loop is
>
> Throttling have nothing to do with the infinite loop.

Sorry but the infinite loop does throttles page reclamation until there
is enough memory for the process allocating memory to go on.

> > just way too fragile IMO and it is _prone_ to fail in intensive
> > loads.
>
> It is too fragile if the vm is doing the wrong actions and so we must
> loop over and over again before it finally does the right thing.
>
> If allocation fails that's a nice feedback that tell us "the memory
> balancing is at least inefficient in doing the right thing, looping
> would only waste more cache and more time for the allocation".
>
> Think a list where pages can be only freeable or unfreeable. Now scan
> _all_ the pages and free all the freeable ones. Finished. If it failed
> and it couldn't free anything it means there was nothing to free so
> we're oom. How can that be "fragile"?

That is fragile IMHO, Andrea.

The infinite loop is simple, reliable logic which shows to works (as long
as the OOM killer is working correctly).

> In real life it isn't as simple as that, there's some "race" effect
> caming from the schedules in between, there are multiple lists, there's
> swapout etc... so it's a little more complex than just "freeable" and
> "unfreeable" and a single list, but it can be done, 2.2 does that too,
> if we loop over and over again and we do no progress in the right
> direction I prefer to know about that via an allocation faliure rather
> than by just getting sucking performance. Also an allocation faliure is
> a minor problem compared to a deadlock that the infinite loop cannot
> prevent.

If the OOM killer is doing its job correctly, a deadlock will not happen.

> > If the problem is the highmem balancing, I'll love to get your fixes and
> > integrate with the infinite loop logic, which is a separated (related,
> > yes, but separate) thing.
>
> The infinite loop shouldn't do anything except introducing the deadlock
> after that (otherwise it means I failed :), but you're free to go in
> your direction if you think it's the right one of course (like I'm free
> to go in my direction since I think it's the right one).

Sure. :)

2001-10-09 16:50:04