2001-03-22 22:06:59

by Richard Jerrell

Subject: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

2.4.1 has a memory leak (temporary) where anonymous memory pages that have
been moved into the swap cache will stick around after their vma has been
unmapped by the owning process. These pages are not free'd in free_pte()
because they are still referenced by the page cache. In addition, if the
pages are dirty, they will be written out to the swap device before they
are reclaimed even though the owning process no longer will be using the
pages.

free_pte in mm/memory.c has been modified to check whether the page is
referenced only by the swap cache (and possibly buffers). If so,
the buffers (if any) are freed and the page and its swap cache
entry are removed immediately.

Essentially, this is the same patch as before, but there was one condition
in which we would leak an extra reference to the targeted page if
the counts would not allow us to remove the swap cache entry. The leak in
2.4.1 also applies to 2.4.2 and 2.4.3-pre5.
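
The extra-reference leak described above is a general pattern: a lookup that hands back a page with its count raised has to be balanced on every exit path, including the one where the counts do not allow removal. A minimal userspace sketch, assuming a made-up `struct toy_page` and `toy_` helpers rather than the real 2.4 structures or the patch itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
    int  count;          /* reference count */
    bool in_swap_cache;  /* still present in the swap cache */
};

/* Hypothetical lookup: returns the page with one extra reference held. */
static struct toy_page *toy_lookup(struct toy_page *p)
{
    if (p == NULL || !p->in_swap_cache)
        return NULL;
    p->count++;
    return p;
}

static void toy_put(struct toy_page *p)
{
    assert(p->count > 0);
    p->count--;
}

/*
 * Try to drop the swap cache entry. In this toy model removal is allowed
 * only when the holders are the swap cache and our lookup reference (2).
 * Returns true if the entry was removed.
 */
static bool toy_try_free(struct toy_page *p)
{
    struct toy_page *page = toy_lookup(p);
    bool removed = false;

    if (page == NULL)
        return false;
    if (page->count == 2) {
        page->in_swap_cache = false;
        page->count--;           /* the swap cache's reference */
        removed = true;
    }
    toy_put(page);               /* always drop the lookup reference */
    return removed;
}
```

The point is the unconditional `toy_put()`: returning early when the counts do not match would leave the lookup reference behind.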

Rich Jerrell
[email protected]


Attachments:
2.4.1-paging-fix-22.03.01.patch (1.46 kB)

2001-03-22 23:20:01

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Thu, 22 Mar 2001, Richard Jerrell wrote:

> 2.4.1 has a memory leak (temporary) where anonymous memory pages
> that have been moved into the swap cache will stick around after
> their vma has been unmapped by the owning process.

> free_pte in mm/memory.c has been modified to check to see if the
> page is only being referenced by the swap cache

Your idea is nice, but the patch lacks a few things:

- SMP locking, what if some other process faults in this page
between the atomic_read of the page count and the test later?
- testing if our process is the _only_ user of this swap page,
for eg. apache you'll have lots of COW-shared pages .. it would
be good to keep the page in memory for our siblings
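
The first concern is a check-then-act race: the count is read, another CPU takes a reference, and the later test acts on a stale value. A minimal userspace sketch of closing that window by doing the check and the removal inside one critical section. The pthread mutex and the `toy_` names are illustrative stand-ins, not the 2.4 locking scheme:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_page {
    pthread_mutex_t lock;  /* stand-in for whatever guards the count */
    int  count;
    bool cached;
};

/* A second user faulting the page in takes its reference under the lock. */
static bool toy_fault_in(struct toy_page *p)
{
    bool ok;

    pthread_mutex_lock(&p->lock);
    ok = p->cached;
    if (ok)
        p->count++;
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/* Test and removal share one critical section, so no reference can
 * appear between reading the count and acting on it. */
static bool toy_remove_if_unused(struct toy_page *p)
{
    bool removed = false;

    pthread_mutex_lock(&p->lock);
    assert(p->count >= 0);
    if (p->cached && p->count == 1) {
        p->cached = false;
        p->count = 0;
        removed = true;
    }
    pthread_mutex_unlock(&p->lock);
    return removed;
}
```

Whether the real code path already holds a lock that covers other tasks sharing the page is exactly what the thread goes on to argue about; the sketch only shows the shape of a safe test.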

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/

2001-03-23 16:24:09

by Richard Jerrell

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

> Your idea is nice, but the patch lacks a few things:
>
> - SMP locking, what if some other process faults in this page
> between the atomic_read of the page count and the test later?

It can't happen. free_pte is called with the page_table_lock held in
addition to having the mmap_sem downed.

> - testing if our process is the _only_ user of this swap page,
> for eg. apache you'll have lots of COW-shared pages .. it would
> be good to keep the page in memory for our siblings

This is already done in free_page_and_swap_cache.

There is a problem with a possible kernel panic in that
try_to_free_buffers is called with a wait of 1 (thanks to Andrew Morton
for pointing that out) and we might reschedule while we wait on io. So,
to fix it, here is an even newer (and simpler) patch. Everything is
handled in free_page_and_swap_cache, so we just call that if we can
successfully look up the entry.

Rich


Attachments:
2.4.1-paging-fix-23.03.01.patch (711.00 B)

2001-03-23 23:59:37

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Fri, 23 Mar 2001, Richard Jerrell wrote:
> > Your idea is nice, but the patch lacks a few things:
> >
> > - SMP locking, what if some other process faults in this page
> > between the atomic_read of the page count and the test later?
>
> It can't happen. free_pte is called with the page_table_lock held in
> addition to having the mmap_sem downed.

The page_table_lock and the mmap_sem only protect the *current*
task. Think about something like an apache with 500 children who
COW share the same page...

> > - testing if our process is the _only_ user of this swap page,
> > for eg. apache you'll have lots of COW-shared pages .. it would
> > be good to keep the page in memory for our siblings
>
> This is already done in free_page_and_swap_cache.

Ok ...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-27 16:23:33

by Stephen C. Tweedie

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

Hi,

On Thu, Mar 22, 2001 at 05:21:46PM -0500, Richard Jerrell wrote:
> 2.4.1 has a memory leak (temporary) where anonymous memory pages that have
> been moved into the swap cache will stick around after their vma has been
> unmapped by the owning process. These pages are not free'd in free_pte()
> because they are still referenced by the page cache. In addition, if the
> pages are dirty, they will be written out to the swap device before they
> are reclaimed even though the owning process no longer will be using the
> pages.
>
> free_pte in mm/memory.c has been modified to check to see if the page is
> only being referenced by the swap cache (and possibly buffers).

But is it worth it?

fork and exit are very hot paths in the kernel, and this patch can force
a page cache lookup on a large number of pte which wouldn't be looked
up before.

The classic case is sendmail or apache, where you can have a parent
process rapidly forking a large number of children. If part of the
parent gets swapped out due to lack of use, then the children all
inherit swapped ptes and each such page will result in an extra page
cache lookup in zap_page_range on exit with this change.

Given that the leak is, as you say, temporary, and that the leak will
be recovered as soon as we start swapping again, do we really want to
pollute the fast path for the sake of a bit more speed during
swapping?

Cheers,
Stephen

2001-03-27 21:19:37

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Tue, 27 Mar 2001, Richard Jerrell wrote:

> Instead of removing the swap cache page at process exit and possibly
> expending time doing disk IO as you have pointed out, we check during
> refill_inactive_scan and page_launder for a page that is

Three comments:

1. we take an extra reference on the page, how does that
affect the test for if the page is shared or not ?
2. we call delete_from_swap_cache with the pagemap_lru_lock
held, since this tries to grab the pagecache_lock we can
easily deadlock with the rest of the kernel (where the
locking order is opposite)
3. there are no comments in the code explaining what this
suspicious-looking piece of code does ;)
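
Point 2 is a lock-ordering (ABBA) problem. A minimal sketch of the usual discipline, in which every lock has a rank and may only be taken while no lock of equal or higher rank is held. The ranks follow the order implied above (pagecache_lock before pagemap_lru_lock); the checker itself is hypothetical and is not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Ranks are an assumption for illustration: lower rank is taken first. */
enum toy_lock {
    TOY_PAGECACHE_LOCK   = 0,
    TOY_PAGEMAP_LRU_LOCK = 1,
    TOY_NR_LOCKS
};

struct toy_lock_state {
    bool held[TOY_NR_LOCKS];
};

/* Refuse to take a lock while one of equal or higher rank is held. */
static bool toy_acquire(struct toy_lock_state *s, enum toy_lock l)
{
    for (int i = l; i < TOY_NR_LOCKS; i++)
        if (s->held[i])
            return false;  /* would invert the order or self-deadlock */
    s->held[l] = true;
    return true;
}

static void toy_release(struct toy_lock_state *s, enum toy_lock l)
{
    assert(s->held[l]);
    s->held[l] = false;
}
```

Under this model, taking the page cache lock while the LRU lock is held is rejected; the caller has to release the LRU lock first and then take both in rank order.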

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/


2001-03-27 21:11:24

by Richard Jerrell

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

> fork and exit are very hot paths in the kernel, and this patch can force
> a page cache lookup on a large number of pte which wouldn't be looked
> up before.

True, but I don't know how large of a performance hit the system takes.

> Given that the leak is, as you say, temporary, and that the leak will
> be recovered as soon as we start swapping again, do we really want to
> pollute the fast path for the sake of a bit more speed during
> swapping?

It isn't speed of swapping that is the biggest problem. The problem is
that if you run a memory intensive task, exit after being placed on an
lru, and run it again, there won't be enough memory to execute because all
the memory you used previously is now sitting in the swap cache. That
isn't to say that without being patched the speed isn't poor. After all,
we'd be paging out a dead process's pages.

But you are right, this fix is slow and that can be improved. So,
hopefully this patch is satisfactory with respect to speed and to fixing the
leak. It will also remove the panic that is possible with the other
patches (can't do a lookup_swap_cache with a spinlock held).

Instead of removing the swap cache page at process exit and possibly
expending time doing disk IO as you have pointed out, we check during
refill_inactive_scan and page_launder for a page that is

1) in the swap cache
2) is not locked
3) is only being referenced by the swap cache, us, and possibly by
buffers
4) has no one else referencing the swap cell

If that is true, we can safely remove that page without writing it to
disk. In addition, the number of swap cache pages is included in the
amount returned from vm_enough_memory to get rid of the temporary leak.
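
The four conditions can be read as a single predicate. A minimal sketch, assuming a toy structure with one field per condition; the field names and the exact counts (two references for the swap cache plus the scanner, one swap-entry user for the cache itself) are assumptions for illustration, not taken from the attached patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the fields the four conditions look at; not struct page. */
struct toy_page {
    bool in_swap_cache;
    bool locked;
    bool has_buffers;
    int  count;       /* page reference count */
    int  swap_users;  /* references to the swap entry ("swap cell") */
};

/* True if the page can be dropped without writing it to swap. */
static bool toy_can_drop_without_io(const struct toy_page *p)
{
    assert(p != NULL);

    /* 3) holders: swap cache + the scanner ("us") + optionally buffers */
    int expected = 2 + (p->has_buffers ? 1 : 0);

    return p->in_swap_cache          /* 1) in the swap cache */
        && !p->locked                /* 2) not locked */
        && p->count == expected      /* 3) no other page references */
        && p->swap_users == 1;       /* 4) no one else uses the swap cell */
}
```

A page still shared through its swap entry fails condition 4 and stays on the lazy reclaim path, which matches the stated goal of leaving multiply referenced pages alone.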

So, the exit path remains unchanged, reclaiming a page is faster for when
the page is no longer being mapped, and the lazy reclaiming for multiply
referenced pages remains intact.

Rich


Attachments:
2.4.1-paging-fix-27.03.01.patch (2.50 kB)

2001-03-27 21:53:07

by Linus Torvalds

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Tue, 27 Mar 2001, Richard Jerrell wrote:
>
> Instead of removing the swap cache page at process exit and possibly
> expending time doing disk IO as you have pointed out, we check during
> refill_inactive_scan and page_launder for a page that is

I think this patch looks pretty good. However, I don't think you can
safely do a "is_shared()" query without holding the page lock.

I'd be happy to be shown otherwise, of course. I'm just generally very
wary of "is_shared()", and that function makes me nervous. I'd almost
prefer to get rid of it, and test for the stuff it tests for directly
(most places that test this are likely to not need all the tests
anyway).

I also have this suspicion that most of the advantage of this patch
could easily be gotten by just testing for the exclusive "no longer
used" case in the swap-cache "writepage()" function. That would have
the advantage of localizing the test more, and minimizing special-case
swap-cache tests in the general VM codepaths.

Comments?

Linus


2001-03-27 22:52:57

by Richard Jerrell

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

> 1. we take an extra reference on the page, how does that
> affect the test for if the page is shared or not ?

is_page_shared expects us to have our own reference to the page.

> 2. we call delete_from_swap_cache with the pagemap_lru_lock
> held, since this tries to grab the pagecache_lock we can
> easily deadlock with the rest of the kernel (where the
> locking order is opposite)

You're right. Oversight on my part. Here is another version of the
patch.

> 3. there are no comments in the code explaining what this
> suspicious-looking piece of code does ;)

Oops... I sent out the wrong version of the patch the first time. This
one has comments, promise. And it has one less bug. :)

Rich


Attachments:
2.4.1-paging-fix-27.03.01.patch (3.40 kB)

2001-03-27 23:05:07

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Tue, 27 Mar 2001, Richard Jerrell wrote:

> Oops... I sent out the wrong version of the patch the first time.
> This one has comments, promise. And it has one less bug. :)

Looks good to me (at first glance). Any volunteer to
stress-test this on an SMP machine ?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-28 23:49:47

by Tim Haynes

Subject: Re: Ideas for the oom problem

On Wed, Mar 28, 2001 at 06:33:04PM -0500, Hacksaw wrote:

> > Anyone working as root is (sorry) an idiot! root's processes are normally
> > quite system-relevant and so they should never be killed, if we can avoid
> > it.
>
> The real world intrudes. Root sometimes needs to look at documentation,
> which, these days is often available as html. Sometimes it's only as html.
> And people in a panic who aren't trained sys-admins aren't going to remember
> to log in as someone else.

Why are they logged in as root in the first place? Is there something they
can't do over sudo?
I definitely remember seeing a document saying `if you find yourself needing to
`man foo', do it in another terminal as your non-root self'; it might or might
not've been the SAG.

In any case, what happened to `if you use this rope you will hang yourself'?
There has to be a point where you abandon catering for all kinds of fool and
get on with writing something useful, I think.

> I completely agree that doing general work as root is a bad idea. I do most
> root things via sudo. It sure would be nice if all the big dists supplied it
> (Hey, RedHat! You listening?) as part of their normal set.

RH have been listening since v7.0.

~Tim

2001-03-29 00:13:37

by Hacksaw

Subject: Re: Ideas for the oom problem

> On Wed, Mar 28, 2001 at 06:33:04PM -0500, Hacksaw wrote:

> Why are they logged in as root in the first place? Is there something they
> can't do over sudo?

I have the "Gnome workstation" version of rawhide (7.0.xxx) on my new laptop.
I don't see sudo. Of course, it's rawhide, but you'd think, if it were in 7.0,
it'd make it. Or maybe they decided that the gnome workstation didn't need
it... Hmmm.

> I definitely remember seeing a document saying `if you find yourself needing to
> `man foo', do it in another terminal as your non-root self'; it might or might
> not've been the SAG.

Sucks if you are trying to figure out a VT problem.

> In any case, what happened to `if you use this rope you will hang yourself'?
> There has to be a point where you abandon catering for all kinds of fool and
> get on with writing something useful, I think.

Let's accept one thing: Root should, in fact, be allowed to do anything a
regular user can. The fact that hanging is a possibility might ought to be
pointed out. I have my shell set up to tell me I'm root. But the fact is, the
typical sys-admin is essentially always logged in as root somewhere, and
changing terminals to look at man pages is sometimes not an option.

For that matter, I have often figured out that something had funny permission
problems by discovering that the problem goes away if I run a program as root.

Assuming everything root is doing must be sacrosanct is a pipe dream.
Assuming everything a regular user is doing is expendable is BOFH think.

I do agree that you have to draw a line. I'm just saying that's the wrong one.

> > I completely agree that doing general work as root is a bad idea. I do most
> > root things via sudo. It sure would be nice if all the big dists supplied it
> > (Hey, RedHat! You listening?) as part of their normal set.
>
> RH have been listening since v7.0.

Good. I hope it comes out well in 7.1, considering my experience with rawhide.


2001-03-28 23:34:27

by Hacksaw

Subject: Re: Ideas for the oom problem

> --On Wednesday, March 28, 2001 09:38:04 -0500 Hacksaw <[email protected]>
> wrote:
> >
> > Deciding what not to kill based on who started it seems like a bad idea.
> > Root can start netscape just as easily as any user, but if the choice of
> > processes to kill is root's netscape or a user's experimental database,
> > I'd want the netscape to go away.
>
> root does not use netscape -FULLSTOP-

Making assumptions about what users will do is foolish.

> Anyone working as root is (sorry) an idiot! root's processes are normally
> quite system-relevant and so they should never be killed, if we can avoid
> it.

The real world intrudes. Root sometimes needs to look at documentation, which,
these days is often available as html. Sometimes it's only as html. And people
in a panic who aren't trained sys-admins aren't going to remember to log in as
someone else.

I completely agree that doing general work as root is a bad idea. I do most
root things via sudo. It sure would be nice if all the big dists supplied it
(Hey, RedHat! You listening?) as part of their normal set.

> There can however be processes owned by other users which shouldn't be
> killed in OOM-Situation, but generally root's processes are more important
> than a normal user's processes.

I'd suggest that this is going to change. Not to regular users, though, so
it's still a good point. But we should be figuring out how to compartmentalize
all our servers. Rarely do most servers need to run as root. Just login ones,
and those should be limited.

So which should die, the user's experiment, or identd?

> What about doing something really critical to avoid the upcoming OOM-situ
> and get your shell killed because you were too slow?

Right. I agree that root's shell should be exempt. It may be that all shells
should be exempt, or maybe all recent shells.

Better, though, would be to establish the idea of "linchpins".

A linchpin is a process marked with a don't kill for OOM flag (a capability?).
Only those in root group should be able to start one. And darn few things
should be marked as such. Some very small shell, vi, ed, maybe a small emacs.
Just enough so that our heroic admin can gracefully ease the OOM situ by
changing a few bits of /etc or killing off a few well chosen processes.

On the other hand, a flag that says "kill me first" might be even better.

In any case, I'd certainly expect the OOM killer to sort by memory usage, and
kill off the hogs first. I assume it does that.
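
The selection policy sketched in this message (exempt "linchpins", prefer "kill me first" volunteers, otherwise take the biggest hog) fits in a few lines. This is a toy model only: the flag names, the task structure and the function are hypothetical, and it makes no claim about how the actual OOM killer scores processes:

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_LINCHPIN = 1, TOY_KILL_FIRST = 2 };  /* hypothetical flags */

struct toy_task {
    int           pid;
    unsigned long rss_pages;  /* memory footprint, in pages */
    unsigned      flags;
};

/* Pick a victim: never a linchpin; prefer kill-me-first; else biggest. */
static const struct toy_task *toy_pick_victim(const struct toy_task *t, size_t n)
{
    const struct toy_task *best = NULL;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].flags & TOY_LINCHPIN)
            continue;
        if (best == NULL) {
            best = &t[i];
            continue;
        }
        int cand_first = (t[i].flags & TOY_KILL_FIRST) != 0;
        int best_first = (best->flags & TOY_KILL_FIRST) != 0;
        if (cand_first > best_first ||
            (cand_first == best_first && t[i].rss_pages > best->rss_pages))
            best = &t[i];
    }
    return best;  /* NULL if every task is a linchpin */
}
```

A NULL result is the uncomfortable case: if everything is marked as a linchpin, the policy has nothing left to kill, which is one reason to keep that set very small.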



2001-03-29 08:04:19

by Helge Hafting

Subject: Re: Disturbing news..

Ben Ford wrote:
>

> There are two problems I see here. First, there are several known ways
> to elevate privileges.
Fixable, except for guessing the root password, which is hard.

> If a virus can elevate privileges, then it owns
> you. Second, this is a multi-OS virus. If you dual-boot into Windows,
> any ELF files accessible can be infected. With this one, that isn't a
> prob, but when somebody codes in an ext2 driver to their virus, then
> we've got issues.

And the only cure then is to not make your linux fs accessible from
windows, i.e. not on a disk for which windows has a driver
installed. Preferably not on the same computer.

Or simply "don't run untrusted executables under windows". Do
so in linux only, where protection applies. Does anybody ever
_need_ to run a program they got in the mail?

Helge Hafting

2001-03-29 12:10:55

by Walter Hofmann

Subject: Re: Disturbing news..


On Wed, 28 Mar 2001, Jesse Pollard wrote:

> By itself it doesn't - but if you also don't have user/group/world rw and
> don't own the file, you can't do anything to it.

This is all completely useless. Why not remove world rw permissions in the
first place? If the admin isn't even able to write a cron job that does
this, all help is lost.

> It's only there to reduce accidents. If you want to go full out,
> remove the symbols from the file.

Just as useless.

> Now, if ELF were to be modified, I'd just add a segment checksum
> for each segment, then put the checksum in the ELF header as well as
> in the/a segment header just to make things harder. At exec time a checksum
> verify could (expensive) be done on each segment. A reduced level could be
> done only on the data segment or text segment. This would at least force
> the virus to completely read the file to regenerate the checksum.

So? The virus will just redo the checksum. Sooner or later there will be a
routine to do this in libbfd and this all reduces to a single additional
line of code.
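
The objection is easy to demonstrate: an unkeyed checksum stored next to the data can be recomputed by whoever modifies the data. A minimal sketch with a toy rolling checksum; the function names and the checksum itself are made up for illustration and have nothing to do with the real ELF format or libbfd:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy rolling checksum over a segment; a real scheme would differ. */
static uint32_t toy_checksum(const unsigned char *seg, size_t len)
{
    uint32_t sum = 0;

    assert(seg != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + seg[i];
    return sum;
}

/* What a loader-side check at exec time would amount to. */
static int toy_verify(const unsigned char *seg, size_t len, uint32_t stored)
{
    return toy_checksum(seg, len) == stored;
}
```

Modifying the segment makes `toy_verify()` fail against the old value, but one further call to `toy_checksum()` produces a new value that passes again. Only a value the attacker cannot rewrite, such as a signature kept elsewhere, changes that.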

> That change would even allow for signature checks of the checksum if the
> signature was stored somewhere else (system binaries/setuid binaries...).
> But only in a high risk environment. This could even be used for a scanner
> to detect ANY change to binaries (and fast too - signature check of checksums
> wouldn't require reading the entire file).

One sane way to do this is to store the sig on a ro medium and make the
kernel check the sig of every binary before it is run.

HOWEVER, this means no compilers will work, and you have to delete all
script languages like perl or python (or make all of them check the
signature).

Useless again, IMO.

> In any case, the problem is limited to one user, even if nothing is done.

Your best bet is to educate your users.

Walter

2001-03-28 00:30:04

by james

Subject: Ideas for the oom problem


Hi Kernel Gurus

Here are my ideas on how to deal with the oom situation. Most of these
should be thought of as stuff to do in 2.5.x kernels, because it touches a lot
of kernel pathways, with possible backporting
once it is tested.

I propose a three prong approach to this problem

Prong 1: WHAT TO KILL

a. don't kill any task with a uid < 100

b. if uid is between 100 and 500 or a CAP-SYS equivalent is enabled,
set it to a lower priority, so if it is at fault it will happen slower,
giving more time before the system collapses

c. if a task is nice'd then immediately put the task to sleep, and schedule
all code / data to be swapped out, or thrown away as appropriate. do not
reschedule the task to continue until memory is available

d. kill any normal user interactive task that is started during a memory
crisis.

Prong 2: WHAT TO DO ABOUT STABILIZING THE SYSTEM

allocate a pool of memory at system start up that is to be released to the
memory pool when the system is in a memory crisis. This will reduce system
swapping, and allow the system to stabilize slightly

report any task asking for a large pool of memory while the system is in
oom crisis. if uid > 500 and it was started from an interactive shell it should
be killed.

when the crisis has ended, re-acquire the memory pool for later usage.

Prong 3: providing information about oom crisis to user land

create /proc/vm/oom_crisis; this would be a read-only file owned by root. It would
report if the system is in crisis and the uid of any process that is asking
for large amounts of ram while the system is in crisis.

create a SIGDANGER signal that is sent out to all tasks that have
registered a handler when the kernel enters oom_kill; give these tasks
high-priority access to system resources.

this would enable user land programs to deal with the situation without
continuously polling free ram/swap. They could email/page the sysadmin and user
about the crisis, add additional swap resources, and kill any known
non-essential tasks. They could also probe the system for possible broken tasks,
such as netscape-common tasks not connected to a netscape client; at least I
have been known to find these when netscape crashes.



Okay, that is my idea. I am putting on my flameproof suit and getting ready
for the flames that are sure to come my way.....



James
kernelnewbie in training

2001-03-28 00:54:14

by Rik van Riel

Subject: Re: Ideas for the oom problem

On Tue, 27 Mar 2001, james wrote:

> Here are my ideas on how to deal with the oom situation,

> I propose a three prong approach to this problem

Isn't that a bit much for an emergency situation that never
even occurs on most systems ?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-28 01:09:26

by Doug Ledford

Subject: Re: Ideas for the oom problem

Rik van Riel wrote:
>
> On Tue, 27 Mar 2001, james wrote:
>
> > Here are my ideas on how to deal with the oom situation,
>
> > I propose a three prong approach to this problem
>
> Isn't that a bit much for an emergency situation that never
> even occurs on most systems ?

I've been using our internal tree for my testing, and I'm reluctant to let my
experiences there cause me to draw conclusions about other trees. So, will
you please tell me which version of the kernel you think has a vm that only
triggers the oom killer in emergency situations so I can test it here to see
if you are right?

--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-03-28 01:15:56

by james

Subject: Re: Ideas for the oom problem

On Tuesday 27 March 2001 18:52, Rik van Riel wrote:
> On Tue, 27 Mar 2001, james wrote:
> > Here are my ideas on how to deal with the oom situation,
> >
> > I propose a three prong approach to this problem
>
> Isn't that a bit much for an emergency situation that never
> even occurs on most systems ?
>
> Rik
> --
> Virtual memory is like a game you can't win;
> However, without VM there's truly nothing to lose...
>
> http://www.surriel.com/
> http://www.conectiva.com/ http://distro.conectiva.com.br/


Given the amount of traffic this topic has created on this mailing list and
in other places, most of what I propose is not new; it was proposed by
others on this list. Prong 1 is pretty much what oom_kill does, with some
slight changes and the addition of putting nice'd tasks to sleep. Prong 2 is a
variation of providing resources to the root user, along with some resource
accounting information that can be used both in the kernel and in userland. If
we don't get the right task, the problem continues to progress until the
right task is found or the system is brought to its knees. Prong 3
provides a way to communicate with userland, providing what AIX does, and
provides some level of being proactive instead of just being reactive, where we
have up to now been doing the wrong thing according to other readers of this
list.


james

2001-03-28 03:22:15

by Rik van Riel

Subject: Re: Ideas for the oom problem

On Tue, 27 Mar 2001, Doug Ledford wrote:

> I've been using our internal tree for my testing, and I'm reluctant to
> let my experiences there cause me to draw conclusions about other
> trees. So, will you please tell me which version of the kernel you
> think has a vm that only triggers the oom killer in emergency
> situations so I can test it here to see if you are right?

Detecting WHEN we're OOM is quite unrelated to choosing WHAT
to do when we're OOM.

There is currently no kernel that I'm aware of which does the
OOM kill at the "exact right" moment.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-28 03:36:56

by Doug Ledford

Subject: Re: Ideas for the oom problem

Rik van Riel wrote:
>
> On Tue, 27 Mar 2001, Doug Ledford wrote:
>
> > I've been using our internal tree for my testing, and I'm reluctant to
> > let my experiences there cause me to draw conclusions about other
> > trees. So, will you please tell me which version of the kernel you
> > think has a vm that only triggers the oom killer in emergency
> > situations so I can test it here to see if you are right?
>
> Detecting WHEN we're OOM is quite unrelated to choosing WHAT
> to do when we're OOM.
>
> There is currently no kernel that I'm aware of which does the
> OOM kill at the "exact right" moment.

I'm not looking for "exact right". I'm looking for "in the ballpark". Hell,
I'm not even that picky. "In the right country" will do for me. But right
now, what I'm seeing, is a vm that will trigger the oom_killer with 900Mb of a
1GB machine used for nothing but disk cache.
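
The complaint here is about the trigger condition: memory held by the disk cache is reclaimable, so it should not count as "used" when deciding that the machine is out of memory. A deliberately simplified sketch of a check that takes this into account; the structure, field names and threshold are assumptions for illustration and do not reflect the counters any 2.4 kernel actually uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All values in pages. Field names are illustrative, not kernel counters. */
struct toy_mem {
    unsigned long free;
    unsigned long page_cache;  /* assumed reclaimable, e.g. disk cache */
    unsigned long swap_free;
};

/* Declare OOM only when free plus reclaimable memory drops below a floor. */
static bool toy_out_of_memory(const struct toy_mem *m, unsigned long min_pages)
{
    assert(m != NULL);
    return m->free + m->page_cache + m->swap_free < min_pages;
}
```

With 4 KB pages, 900 MB of cache is 230400 pages, so a machine in the state described would be nowhere near the floor under this test, however little memory is strictly free.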

Now, I wouldn't bring this up as a big issue except I keep seeing people say
things like "why so complex a solution for something that is only used in
emergency situations". My point is that it *IS NOT* being used only in
emergency situations and that is what needs fixed. Now, I'm willing to allow
that our internal kernel may trigger an oom at different times than the kernel
you use. That's why I asked what kernel you want me to test in order to
establish whether or not I'm right about how far off the oom_killer trigger
really is.

--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-03-28 03:54:57

by Rik van Riel

Subject: Re: Ideas for the oom problem

On Tue, 27 Mar 2001, Doug Ledford wrote:

> Now, I wouldn't bring this up as a big issue except I keep seeing
> people say things like "why so complex a solution for something that
> is only used in emergency situations". My point is that it *IS NOT*
> being used only in emergency situations and that is what needs fixed.

Exactly.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-28 05:53:36

by Jonathan Morton

Subject: Re: Ideas for the oom problem

I'm going to be gentle here and try to point out where your suggestions are
flawed...

>a. don't kill any task with a uid < 100

Suppose your system daemon springs a leak? It will have to be killed
eventually, however system daemons can sensibly be given a little "grace".
Also, the UIDs used by a system daemon vary from system to system.

>b. if uid between 100 to 500 or CAP-SYS equivalent enabled
> set it to a lower priority, so if it is at fault it will happen slower
> giving more time before the system collapses

Not slowly enough. When your system is thrashing, the CPU is the resource
under least pressure, so "nice" values and priorities have virtually zero
effect. In any case, under OOM conditions the system has *already*
collapsed and we *have* to kill something for the system to keep running.

>c. if a task is nice'd then immediately put the task to sleep, and schedule
>all code / data to be swapped out, or thrown away as appropriate. do not
>reschedule the task to continue until memory is available

In OOM conditions there is no swap space left to do what you suggest. This
is a sensible solution for when thrashing is the only problem...

>d. kill any normal user interactive tasks that is started during a memory
>crisis.

Define "memory crisis". However, this is a relatively sensible solution.

>allocate a pool of memory at system start up that is to be released to the
>memory pool when the system is in a memory crisis. This will reduce system
>swapping, and allow the system to stabilize slightly

One of my patches already tries to do this, in a way. It doesn't yet
provide a hard barrier, but it does prevent applications from hogging the
entire memory on the system (at least, without expending some effort into
it).

>report any task asking for large pool of memory while the system is in
>oom crisis. if uid > 500 and was started from an interactive shell it should
>be killed.

See above. malloc() fails, which tells the application there is no more
memory in the system. A well-written application will respond to this and
use more memory-conservative techniques. A poorly-written application will
segfault. End Of Problem. Now to make memory accounting work properly so
these tests are reliable...
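
What "responding to a failed malloc()" can look like in practice: retry with a smaller request instead of dereferencing NULL. A minimal sketch, assuming an injectable allocator so that the failure path can be exercised; `toy_alloc_shrinking` and `toy_alloc_limited` are made-up names, and on a system without strict memory accounting malloc() may not fail at all, which is the caveat raised above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator hook so the failure path can be exercised in a test. */
typedef void *(*toy_alloc_fn)(size_t);

/* Stand-in for memory pressure: refuses requests above 4096 bytes. */
static void *toy_alloc_limited(size_t n)
{
    return n > 4096 ? NULL : malloc(n);
}

/*
 * Try to get `want` bytes; on failure keep halving down to `min`.
 * Returns the buffer (caller frees) and stores its size in *got,
 * or returns NULL with *got == 0 if even `min` cannot be had.
 */
static void *toy_alloc_shrinking(toy_alloc_fn alloc, size_t want,
                                 size_t min, size_t *got)
{
    assert(alloc != NULL && got != NULL && min > 0 && want >= min);
    for (size_t n = want; n >= min; n /= 2) {
        void *p = alloc(n);
        if (p != NULL) {
            *got = n;
            return p;
        }
    }
    *got = 0;
    return NULL;
}
```

A caller would pass `malloc` itself as the hook in production and size its working set from `*got`, for example by processing input in smaller chunks.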

>when the crisis is ended, re-acquire the memory pool for later usage.

It is never given up, except when it is needed by the kernel itself (eg. to
swap in pages or (in the absence of true memory accounting) to provide COW
space).

>Prong 3 providing information about oom crisis to user land
>
>create /proc/vm/oom_crisis this would be readonly file owned by root it would
>report if the system is in crisis and the uid of any process that is asking
>for large amounts of ram while the system
>is in crisis.

This kind of information is already available using /proc - applications
just have to look in the right places.

>create a SIGDANGER handler that is sent out to all tasks that have
>registered a handler when the kernel enters oom_kill, give these tasks a high
>priority access to system resources.

This is a fairly good idea, why does it look so familiar? :) SIGDANGER
would be sent to all processes when memory availability goes below a
threshold, ie. when there is still enough memory left to handle the
situation. The default handler would be a no-op, preserving compatibility.
However, the notion of "high priority access to resources" is not currently
feasible (or necessary).
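
A minimal sketch of the userspace side, assuming such a signal existed.
Linux defines no SIGDANGER, so the example falls back to SIGUSR1 as a
stand-in where the macro is missing; the function names are hypothetical.
The handler only sets a flag, and the main loop is expected to poll it and
shed memory (drop caches, shrink buffers) at a safe point.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

#ifndef SIGDANGER
#define SIGDANGER SIGUSR1   /* assumption: stand-in, Linux has no SIGDANGER */
#endif

/* Set from the handler, read from the main loop. */
static volatile sig_atomic_t low_memory = 0;

static void on_danger(int signo)
{
    (void)signo;
    low_memory = 1;         /* only async-signal-safe work in here */
}

/* Register the handler; returns 0 on success, -1 on failure. */
int install_danger_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_danger;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGDANGER, &sa, NULL);
}

/* Polled by the application; non-zero once the warning has arrived. */
int memory_is_low(void)
{
    return low_memory != 0;
}
```

A process that never installs a handler is unaffected only if the default
action is to ignore the signal, which is the no-op default described
above (SIGUSR1's real default is to terminate, so the stand-in differs on
that point).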

>this would enable user land programs to deal with the situation without
>continuously polling free ram/swap. They could email/page sysadmin and user
>about the crisis and add additional swap resources and kill any known non
>essential tasks. and probe system for possible broken tasks, such as
>netscape-common tasks not connected to netscape client, at least i have been
>known to find these when netscape crashes.

Interesting applications for this signal. However, this is entirely a
userspace issue as to what to do with the signal - the kernel's job is to
provide it (if we decide to, that is).

--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: [email protected] (not for attachments)
big-mail: [email protected]
uni-mail: [email protected]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----


2001-03-28 06:16:47

by Shawn Starr

[permalink] [raw]
Subject: Disturbing news..


http://news.cnet.com/news/0-1003-200-5329436.html?tag=lh

Isn't it time to change the ELF format to stop this crap?

Shawn.


2001-03-28 06:33:58

by Shawn Starr

[permalink] [raw]
Subject: Re: Disturbing news.. Idea


Why not make a new file permission, to deny an ELF binary the ability to
modify the ELF entry point?

Something like +p: if the file had +p (by default), the kernel would deny
the ELF binary the ability to modify files.

Shawn.

On Wed, 28 Mar 2001, Shawn Starr wrote:

>
> http://news.cnet.com/news/0-1003-200-5329436.html?tag=lh
>
> Isn't it time to change the ELF format to stop this crap?
>
> Shawn.
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2001-03-28 07:20:27

by Matti Aarnio

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 01:16:02AM -0500, Shawn Starr wrote:
> Date: Wed, 28 Mar 2001 01:16:02 -0500 (EST)
> From: Shawn Starr <[email protected]>
> To: <[email protected]>
> Subject: Disturbing news..
>
> http://news.cnet.com/news/0-1003-200-5329436.html?tag=lh
> Isn't it time to change the ELF format to stop this crap?
> Shawn.

Why ? "Double-click on attachment to run it" is typical
M$ client stupidity -- and the reason why there
are so many things that can mail themselves around.

Changing ELF-format would be comparable to what M$ did when
they met the first Word macro viruses -- they changed the
script language inside the Word... What good did that do ?
Did it harm people ? You bet...


You are downloading binaries off the net, and not compiling
from the sources ? (Yes, we all do that. This is why folks
these days carry PGP signatures at the RPM packages.)


So, the program modifies ELF format executables by rewriting
some instructions in the beginning (probably to map-in the virus
code proper with X-bit on), and tags itself (PIC presumably) at
the end of the file.



Another issue is "safe conduct" practice of installing binaries
with minimum privileges (ok, granted that for e.g. RPMs that
usually means root), and *never* running them with undue levels
of privileges -- not even as the owner of said executables.



Ok, granted that we have dangers of getting arbitrary BAD programs
into our systems, how can we combat that ? Virus-scanners
(as much good as they could do..) don't really work in UNIX
environments where "small things" like intercepting every
exec() and open() via a privileged program (scanner) are not
an available feature. (I think it could be done by passing an AF_UNIX
message with fd + flags to a registered server and expecting an answer
for the open() -- this would happen *after* the file open is
done with user privileges, but before the call returns.)
(Trapping open() so that shared-libraries could be scanned.)

There could be, I think, a method for doing such intercepts,
which could be used by security scanners to implement some
sense of security in Linux-like systems.

Is it good enough, e.g. when some file is multiply-mapped to
shared programs, and application rewrites parts of the file ?
Can it detect that kind of multi-mapped writing-sharing ?

Can such system be made fast ? (Scanner becomes performance
bottle-neck.)


How about PROPER Orange Book B-level security ?
E.g. NSA trusted-linux ?


/Matti Aarnio

2001-03-28 07:28:27

by Shawn Starr

[permalink] [raw]
Subject: Re: Disturbing news..


Well, why can't the ELF loader module/kernel detect or have some sort of
restriction on modifying other/ELF binaries including itself from changing
the Entry point?

There has to be a way stop this. WHY would anyone want to modify the entry
point anyway? (there may be some reasons but I really dont know what).
Even if it's user level, this cant affect files with root permissions
(unless root is running them or suid).

Any idea?

2001-03-28 10:02:00

by Helge Hafting

[permalink] [raw]
Subject: Re: Disturbing news..

Shawn Starr wrote:
>
> http://news.cnet.com/news/0-1003-200-5329436.html?tag=lh
>
> Isn't it time to change the ELF format to stop this crap?
>
Nothing to worry about.
A sane distribution has all executables installed read-only
and owned by root or some non-user.

Email applications, file browsers etc. are run as normal
users. So, even if the user stupidly runs this mysterious
program he got in the mail - what happens?

It searches for all ELF executables in the system and finds
it can open none for writing! They are not writable, and the
user doesn't own them, so the bad program cannot change
permissions in order to modify the executables either.

About the only "danger" here is messing with a developer's
program being developed, but he can recompile it
and lose the virus that way. And a developer wouldn't
trust a program he got in the mail in the first place.
Those dumb enough don't have any writable executables.

Helge Hafting

2001-03-28 12:10:52

by Jesse Pollard

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, 28 Mar 2001, Shawn Starr wrote:
>Well, why can't the ELF loader module/kernel detect or have some sort of
>restriction on modifying other/ELF binaries including itself from changing
>the Entry point?
>
>There has to be a way stop this. WHY would anyone want to modify the entry
>point anyway? (there may be some reasons but I really dont know what).
>Even if it's user level, this cant affect files with root permissions
>(unless root is running them or suid).
>
>Any idea?

Sure - very simple. If the execute bit is set on a file, don't allow
ANY write to the file. This does modify the permission bits slightly
but I don't think it is an unreasonable thing to have.

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-28 12:51:21

by Walter Hofmann

[permalink] [raw]
Subject: Re: Disturbing news..



On Wed, 28 Mar 2001, Jesse Pollard wrote:

> >Any idea?
>
> Sure - very simple. If the execute bit is set on a file, don't allow
> ANY write to the file. This does modify the permission bits slightly
> but I don't think it is an unreasonable thing to have.

And how exactly does this help?

fchmod (fd, 0666);
write (fd, ...);
fchmod (fd, 0777);
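
Spelled out as a minimal sketch (the helper name append_despite_xbit is
invented for illustration, and the rule being defeated is the hypothetical
"no writes while the execute bit is set" from the proposal above): the
owner of a file clears the execute bits, writes, and puts the old mode
back.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical illustration: append `len` bytes to a file we own, even if
 * writes were refused while its execute bits are set. Returns 0 on
 * success, -1 on any failure.
 */
int append_despite_xbit(const char *path, const void *buf, size_t len)
{
    struct stat st;
    ssize_t n;
    int fd;

    assert(path != NULL && buf != NULL);

    if (stat(path, &st) < 0)
        return -1;

    /* Step 1: the owner may always change the mode, so drop the x bits. */
    if (chmod(path, 0666) < 0)
        return -1;

    fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0) {
        chmod(path, st.st_mode & 07777);    /* best-effort restore */
        return -1;
    }

    /* Step 2: modify the file. */
    n = write(fd, buf, len);

    /* Step 3: restore the original mode, execute bits included. */
    if (fchmod(fd, st.st_mode & 07777) < 0)
        n = -1;
    close(fd);

    return (n == (ssize_t)len) ? 0 : -1;
}
```

None of the three steps needs more privilege than owning the file.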

Walter

2001-03-28 13:02:42

by Russell King

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 06:08:15AM -0600, Jesse Pollard wrote:
> Sure - very simple. If the execute bit is set on a file, don't allow
> ANY write to the file. This does modify the permission bits slightly
> but I don't think it is an unreasonable thing to have.

Even easier method - remove the write permission bits from all executable
files, and don't do the unsafe thing of running email/web browsers/other
user-type stuff as user root.

If it still worries you that root can write to files without the 'w' bit
set, modify the capabilities of the system to prevent it (there is a bit
that can be set which will remove this ability for all new processes).

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-03-28 12:55:01

by Keith Owens

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, 28 Mar 2001 06:08:15 -0600,
Jesse Pollard <[email protected]> wrote:
>Sure - very simple. If the execute bit is set on a file, don't allow
>ANY write to the file. This does modify the permission bits slightly
>but I don't think it is an unreasonable thing to have.

man strip
man objcopy
man ld

2001-03-28 13:26:53

by Alexander Viro

[permalink] [raw]
Subject: Re: Disturbing news..



On Wed, 28 Mar 2001, Shawn Starr wrote:

>
> http://news.cnet.com/news/0-1003-200-5329436.html?tag=lh
>
> Isn't it time to change the ELF format to stop this crap?

<shrug> If you run untrusted binaries - you are screwed. If you run
them as root - all users on your system are screwed. If your MUA
(or browser, etc.) can run untrusted code - see above. If you have
a dual-boot system and one of the OSes is compromised - all of them
are. Nothing to do about that. What's new here? Don't be an idiot
and don't use crapware...

2001-03-28 14:06:37

by Simon Williams

[permalink] [raw]
Subject: Re: Disturbing news..

In message <Pine.GSO.3.96.1010328144551.7198A-100000@laertes>, Walter
Hofmann <[email protected]> writes
>
>
>On Wed, 28 Mar 2001, Jesse Pollard wrote:
>
>> >Any idea?
>>
>> Sure - very simple. If the execute bit is set on a file, don't allow
>> ANY write to the file. This does modify the permission bits slightly
>> but I don't think it is an unreasonable thing to have.
>
>And how exactly does this help?
>
>fchmod (fd, 0666);
>write (fd, ...);
>fchmod (fd, 0777);
>

I think their point was that a program could only change permissions
of a file that was owned by the same owner. If a file is owned by a
different user & has no write permissions for any user, the program
can't modify the file or its permissions.

Sounds like a good plan to me.


--
Simon Williams

2001-03-28 13:53:56

by Ben Ford

[permalink] [raw]
Subject: Re: Disturbing news..

Jesse Pollard wrote:

> On Wed, 28 Mar 2001, Shawn Starr wrote:
>
>> Well, why can't the ELF loader module/kernel detect or have some sort of
>> restriction on modifying other/ELF binaries including itself from changing
>> the Entry point?
>>
>> There has to be a way stop this. WHY would anyone want to modify the entry
>> point anyway? (there may be some reasons but I really dont know what).
>> Even if it's user level, this cant affect files with root permissions
>> (unless root is running them or suid).
>>
>> Any idea?
>
>
> Sure - very simple. If the execute bit is set on a file, don't allow
> ANY write to the file. This does modify the permission bits slightly
> but I don't think it is an unreasonable thing to have.
>
What a pain in the ass when you are writing / updating a shell script...

2001-03-28 14:17:09

by Jesse Pollard

[permalink] [raw]
Subject: Re: Disturbing news..

Keith Owens <[email protected]>
>
> On Wed, 28 Mar 2001 06:08:15 -0600,
> Jesse Pollard <[email protected]> wrote:
> >Sure - very simple. If the execute bit is set on a file, don't allow
> >ANY write to the file. This does modify the permission bits slightly
> >but I don't think it is an unreasonable thing to have.
>
> man strip
> man objcopy
> man ld

Thought of these already (well, at least ld...)

strip - not used that much (most executables still have their symbol table),
        but could be handled by removing the execute bit, stripping, then
        putting it back. Or just use the ld option -s.
objcopy - copies object files. Object files are not marked executable...
ld - on other UNIX systems (Cray/IRIX), I think the output file
     (-o) specified is first deleted. Whenever I can cause a link
     error, the output is not marked executable. If the GNU ld doesn't
     delete it first, then it most likely should.

I was expecting shell scripts to be the complaint first... :-)

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-28 14:07:58

by Sean Hunter

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 06:08:15AM -0600, Jesse Pollard wrote:
> Sure - very simple. If the execute bit is set on a file, don't allow
> ANY write to the file. This does modify the permission bits slightly
> but I don't think it is an unreasonable thing to have.
>

Are we not then in the somewhat zen-like state of having an "rm" which can't
"rm" itself without needing to be made non-executable so that it can't execute?

Sean

2001-03-28 14:33:52

by Romano Giannetti

[permalink] [raw]
Subject: Re: Disturbing news..

Notice: this is my first post to l-k since some bug report as old as 0.99...
so please be kind, don't beat me too hard.

On Wed, Mar 28, 2001 at 08:25:46AM -0500, Alexander Viro wrote:

> <shrug> If you run untrusted binaries - you are screwed. If you run
> them as root - all users on your system are screwed. If your MUA
> (or browser, etc.) can run untrusted code - see above.

Too true.

But with the new VFS semantics, wouldn't it be possible for a MUA to do
something like the following:

Spawn a process with a private namespace. Here a minimum subset of the
"real" tree (maybe all of / except /dev) is mounted read-only. The private /tmp
and /home/user are substituted by read-write directories that are in the
"real" tree at /home/user/mua/fakehome and /home/user/mua/faketmp. In this
private namespace, run the "untrusted" binary.

Now the binary can do much less harm than before, or am I missing something?
It has no access to real user data, but can use the system libraries and
services without changing anything in the system.

Having the read-only flag per vfs-mount is the only kernel-related thing
here, I think; all the rest is simply user-space spice :-).

Have a nice day,
Romano


--
Romano Giannetti - Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416 fax +34 915 411 132

2001-03-28 14:39:12

by Hacksaw

[permalink] [raw]
Subject: Re: Ideas for the oom problem

> a. don't kill any task with a uid < 100
>
> b. if uid between 100 to 500 or CAP-SYS equivalent enabled
> set it to a lower priority, so if it is at fault it will happen slower
> giving more time before the system collapses

Deciding what not to kill based on who started it seems like a bad idea. Root
can start netscape just as easily as any user, but if the choice of processes
to kill is root's netscape or a user's experimental database, I'd want the
netscape to go away.



2001-03-28 14:44:12

by Jesse Pollard

[permalink] [raw]
Subject: Re: Disturbing news..

>
> On Wed, Mar 28, 2001 at 06:08:15AM -0600, Jesse Pollard wrote:
> > Sure - very simple. If the execute bit is set on a file, don't allow
> > ANY write to the file. This does modify the permission bits slightly
> > but I don't think it is an unreasonable thing to have.
>
> Even easier method - remove the write permission bits from all executable
> files, and don't do the unsafe thing of running email/web browsers/other
> user-type stuff as user root.
>
> If it still worries you that root can write to files without the 'w' bit
> set, modify the capabilities of the system to prevent it (there is a bit
> that can be set which will remove this ability for all new processes).

How about just adding MLS ... :-)

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-28 14:43:22

by Jesse Pollard

[permalink] [raw]
Subject: Re: Disturbing news..

>
>
>
> On Wed, 28 Mar 2001, Jesse Pollard wrote:
>
> > >Any idea?
> >
> > Sure - very simple. If the execute bit is set on a file, don't allow
> > ANY write to the file. This does modify the permission bits slightly
> > but I don't think it is an unreasonable thing to have.
>
> And how exactly does this help?
>
> fchmod (fd, 0666);
> write (fd, ...);
> fchmod (fd, 0777);

By itself it doesn't - but if you also don't have user/group/world rw and
don't own the file, you can't do anything to it.

It's only there to reduce accidents. If you want to go full out,
remove the symbols from the file.

Now, if ELF were to be modified, I'd just add a segment checksum
for each segment, then put the checksum in the ELF header as well as
in the/a segment header just to make things harder. At exec time a checksum
verify could be done on each segment (expensive). A reduced level could be
done only on the data segment or text segment. This would at least force
the virus to completely read the file to regenerate the checksum.
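
To make the cost concrete, here is a rough sketch of the verification half:
walk the program headers of an ELF64 file and checksum the file-backed
bytes of each PT_LOAD segment. No algorithm is named above, so Adler-32 is
used purely as a placeholder; where the sums would be stored in the ELF
header is left open, and the function names are made up. It assumes a
64-bit ELF file in the host's byte order and the Linux <elf.h> definitions.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Adler-32 update step (placeholder algorithm); start with sum = 1. */
static uint32_t checksum_update(uint32_t sum, const unsigned char *p, size_t n)
{
    uint32_t a = sum & 0xffff, b = sum >> 16;

    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/*
 * Hypothetical helper: store one checksum per PT_LOAD segment in `sums`
 * (at most `max`). Returns the number of segments summed, or -1 on I/O
 * error or if the file is not ELF64.
 */
int elf_segment_checksums(const char *path, uint32_t *sums, int max)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    unsigned char buf[4096];
    int count = 0;
    FILE *f;

    assert(path != NULL && sums != NULL && max > 0);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof ph)
        goto fail;

    for (unsigned i = 0; i < eh.e_phnum && count < max; i++) {
        /* Read program header i. */
        if (fseek(f, (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph), SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1)
            goto fail;
        if (ph.p_type != PT_LOAD)
            continue;

        /* Sum every byte the segment occupies in the file. */
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0)
            goto fail;
        uint32_t sum = 1;
        for (uint64_t left = ph.p_filesz; left > 0; ) {
            size_t want = left < sizeof buf ? (size_t)left : sizeof buf;
            if (fread(buf, 1, want, f) != want)
                goto fail;
            sum = checksum_update(sum, buf, want);
            left -= want;
        }
        sums[count++] = sum;
    }
    fclose(f);
    return count;

fail:
    fclose(f);
    return -1;
}
```

Every loadable byte has to be read to verify the sums, which is the
"expensive" part; anything that can rewrite the file can run the same loop
to regenerate them.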

That change would even allow for signature checks of the checksum if the
signature was stored somewhere else (system binaries/setuid binaries...).
But only in a high risk environment. This could even be used for a scanner
to detect ANY change to binaries (and fast too - signature check of checksums
wouldn't require reading the entire file).

In any case, the problem is limited to one user, even if nothing is done.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-28 14:55:02

by Russell King

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 08:15:57AM -0600, Jesse Pollard wrote:
> objcopy - copies object files. Object files are not marked executable...

objcopy copies executable files as well - check the kernel makefiles
for examples.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-03-28 15:01:43

by Alexander Viro

[permalink] [raw]
Subject: Re: Disturbing news..



On Wed, 28 Mar 2001, Romano Giannetti wrote:

> Now the binary can do much less harm than before, or am I missing something?
> It has no access to real user data, but can use the system libraries and
> services without changing anything in the system.

You mean, like mailbombing the living hell out of somebody? Or playing
interesting games with sending signals all over the place...

2001-03-28 14:58:02

by Andreas Rogge

[permalink] [raw]
Subject: Re: Ideas for the oom problem

--On Wednesday, March 28, 2001 09:38:04 -0500 Hacksaw <[email protected]>
wrote:
>
> Deciding what not to kill based on who started it seems like a bad idea.
> Root can start netscape just as easily as any user, but if the choice of
> processes to kill is root's netscape or a user's experimental database,
> I'd want the netscape to go away.

root does not use netscape -FULLSTOP-

Anyone working as root is (sorry) an idiot! root's processes are normally
quite system-relevant and so they should never be killed, if we can avoid
it.
There can however be processes owned by other users which shouldn't be
killed in OOM-Situation, but generally root's processes are more important
than a normal user's processes.
What about doing something really critical to avoid the upcoming OOM situation
and getting your shell killed because you were too slow?

--
Andreas Rogge <[email protected]>
Available on IRCnet:#linux.de as Dyson

2001-03-28 14:58:22

by Bill Rugolsky Jr.

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 04:32:44PM +0200, Romano Giannetti wrote:
> But with the new VFS semantics, wouldn't it be possible for a MUA to do
> something like the following:
>
> Spawn a process with a private namespace. Here a minimum subset of the
> "real" tree (maybe all of / except /dev) is mounted read-only. The private /tmp
> and /home/user are substituted by read-write directories that are in the
> "real" tree at /home/user/mua/fakehome and /home/user/mua/faketmp. In this
> private namespace, run the "untrusted" binary.

Possible and desirable. You have to turn off access to all the other
dangerous namespaces though, like socket() and shmat(), and make sure
that nosuid and devices are handled properly. Done right, the only thing
that untrusted code can do is consume a little memory, CPU, and disk,
but that's why there are limits and a scheduler. :-)

One might even want to add back limited access to those other namespaces
by implementing a filesystem interface, ala Plan-9/Inferno.

Regards,

Bill Rugolsky

2001-03-28 15:05:52

by Olivier Galibert

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 03:04:46PM +0100, Simon Williams wrote:
> I think their point was that a program could only change permissions
> of a file that was owned by the same owner. If a file is owned by a
> different user & has no write permissions for any user, the program
> can't modify the file or its permissions.

You mean, you usually have write permissions for other than the owner
on executable files?

Let me reformulate that. You usually have write permissions for other
than the owner, and not only on some special, untrusted log files (I'm
talking files, here, not device nodes)? What's your umask, 0?


> Sounds like a good plan to me.

PEBCAK. Unix security is not designed with dumb "administrators" in
mind, nor should be. User friendly is good. Luser friendly isn't,
it's either dumbing down or unnecessarily restrictive.

OG, who waits for the first insmod-ing "virus"

2001-03-28 15:22:52

by Russell King

[permalink] [raw]
Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 08:40:42AM -0600, Jesse Pollard wrote:
> Now, if ELF were to be modified, I'd just add a segment checksum
> for each segment, then put the checksum in the ELF header as well as
> in the/a segment header just to make things harder. At exec time a checksum
> verify could (expensive) be done on each segment. A reduced level could be
> done only on the data segment or text segment. This would at least force
> the virus to completely read the file to regenerate the checksum.

Checksums don't help that much - virus writers would treat it as "part
of the set of alterations that need to be made" and then the checksum
becomes zero protection.

Your system binaries are safe from the virus (from my understanding of the
poor writeup) if you are sensible about your system - ie, don't run stuff
as the root user, don't login as root, ensure that your binaries are owned
by root etc.

If users have their own binaries, then they should take adequate care
over them (find ~ -type f -perm +111 | xargs chmod a-w) to ensure that
they are not writable (this applies to your argument as well).

Once you're in this situation:

1. Users can't write to their executables without first chmod'ing them.
(won't take long for a virus writer to get the idea that they should
chmod +w them first).

2. If a user binary becomes infected, only people able to run that binary
also become infected. Certainly root should under no circumstances
run any program which a user has compiled - the user may have some
nice code in there which creates another root user in /etc/passwd
with no password entry.

3. Since you're only running system programs as root (and by that I mean
stuff for administration, not stuff like mail clients, news readers
etc), your system binaries should not become infected.

Therefore, if you follow good easy system administration techniques, then
you end up minimising the risk of getting:

1. viruses
2. trojans
3. malicious users

cracking your system. If you don't follow these techniques, then you're
asking for lots of trouble, and no amount of checksumming/signing/etc
will ever save you.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-03-28 15:32:22

by john slee

[permalink] [raw]
Subject: Re: Disturbing news..

[cc list trimmed]

On Wed, Mar 28, 2001 at 03:10:08PM +0100, Sean Hunter wrote:
> On Wed, Mar 28, 2001 at 06:08:15AM -0600, Jesse Pollard wrote:
> > Sure - very simple. If the execute bit is set on a file, don't allow
> > ANY write to the file. This does modify the permission bits slightly
> > but I don't think it is an unreasonable thing to have.
> >
>
> Are we not then in the somewhat zen-like state of having an "rm" which can't
> "rm" itself without needing to be made non-executable so that it can't execute?

aiiiieee, my head hurts now, thanks :(

j.

--
"Bobby, jiggle Grandpa's rat so it looks alive, please" -- gary larson

2001-03-28 15:32:12

by Jesse Pollard

[permalink] [raw]
Subject: Re: Disturbing news..

Sean Hunter <[email protected]>:
> On Wed, Mar 28, 2001 at 06:08:15AM -0600, Jesse Pollard wrote:
> > Sure - very simple. If the execute bit is set on a file, don't allow
> > ANY write to the file. This does modify the permission bits slightly
> > but I don't think it is an unreasonable thing to have.
> >
>
> Are we not then in the somewhat zen-like state of having an "rm" which can't
> "rm" itself without needing to be made non-executable so that it can't execute?

We've been in that state for a long time... (careful updating that libc.so
file... can't overwrite/delete without having some REAL problems show up.)

It just calls for some careful activity. If rm is being replaced, first
rename it; then put the new one in place; chmod the old one; delete the old
one. It is directly comparable to the libc.so update procedure.
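
The rename-first sequence can be sketched like this. It is an illustration
under assumptions, not an existing tool: replace_binary and the ".old"
suffix are invented, and both paths must live on the same filesystem so
that rename() works.

```c
#include <assert.h>
#include <stdio.h>      /* rename, snprintf */
#include <sys/stat.h>   /* chmod */
#include <unistd.h>     /* unlink */

/*
 * Hypothetical helper: swap `replacement` in for `target` without ever
 * writing to the file that may currently be executing.
 * Returns 0 on success, -1 on failure.
 */
int replace_binary(const char *target, const char *replacement)
{
    char old[4096];
    int n;

    assert(target != NULL && replacement != NULL);

    n = snprintf(old, sizeof old, "%s.old", target);
    if (n < 0 || (size_t)n >= sizeof old)
        return -1;

    if (rename(target, old) < 0)            /* 1. move the old copy aside */
        return -1;
    if (rename(replacement, target) < 0) {  /* 2. put the new one in place */
        rename(old, target);                /*    roll back on failure */
        return -1;
    }
    if (chmod(old, 0644) < 0)               /* 3. clear the execute bits */
        return -1;
    return unlink(old);                     /* 4. delete the old copy */
}
```

A process still running the old binary keeps its inode until it exits; only
the directory entries change.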

I should have left off the "very simple" remark.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-28 15:44:32

by Jesse Pollard

[permalink] [raw]
Subject: Re: Disturbing news..

Russell King <[email protected]>:
> On Wed, Mar 28, 2001 at 08:15:57AM -0600, Jesse Pollard wrote:
> > objcopy - copies object files. Object files are not marked executable...
>
> objcopy copies executable files as well - check the kernel makefiles
> for examples.

At the time it's copying, the input doesn't need to be executable. That
appears to be a byproduct of a library link.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-28 15:52:43

by Simon Williams

Subject: Re: Disturbing news..

In message <[email protected]>, Olivier Galibert
<[email protected]> writes
>On Wed, Mar 28, 2001 at 03:04:46PM +0100, Simon Williams wrote:
>> I think their point was that a program could only change permissions
>> of a file that was owned by the same owner. If a file is owned by a
>> different user & has no write permissions for any user, the program
>> can't modify the file or its permissions.
>
>You mean, you usually have write permissions for other than the owner
>on executable files?
>
>Let me reformulate that. You usually have write permissions for other
>than the owner, and not only on some special, untrusted log files (I'm
>talking files, here, not device nodes)? What's your umask, 0?
>

Firstly, I'm relatively new to Linux (only about 3 yrs experience) &
don't claim to be an expert. Secondly, I don't think I stated my point
very clearly.

No, I don't have write permissions set on an executable for any user
other than the owner.

What I meant was that if a file is owned by root with permissions of,
say, 555 (r-xr-xr-x), not setuid or setgid, then another executable
run as a non-root user cannot modify it or change the permissions to
7 (rwx).

>
>> Sounds like a good plan to me.
>
>PEBCAK. Unix security is not designed with dumb "administrators" in
>mind, nor should be. User friendly is good. Luser friendly isn't,
>it's either dumbing down or unnecessarily restrictive.
>

I completely agree (even with the PEBCAK part :)). UNIX security on
corporate networks or public-facing systems should be left to experts.
I, on the other hand, am a home-user trying to learn how Linux works &
how to secure it, I don't pretend to be an expert.

My policy is to give necessary permissions & no more. I would set the
aforementioned permissions on the main system binaries which would allow
other users to get on with what they need to do without being able to
affect the workspaces of other users, only their own.

I'm open to constructive criticism on this.


--
Simon Williams

2001-03-28 15:52:42

by Jesse Pollard

Subject: Re: Disturbing news..

Russell King <[email protected]>
>
> On Wed, Mar 28, 2001 at 08:40:42AM -0600, Jesse Pollard wrote:
> > Now, if ELF were to be modified, I'd just add a segment checksum
> > for each segment, then put the checksum in the ELF header as well as
> > in the/a segment header just to make things harder. At exec time a checksum
> > verify could (expensive) be done on each segment. A reduced level could be
> > done only on the data segment or text segment. This would at least force
> > the virus to completely read the file to regenerate the checksum.
>
> Checksums don't help that much - virus writers would treat it as "part
> of the set of alterations that need to be made" and then the checksum
> becomes zero protection.
>
[ snip of good stuff ]
> Therefore, if you follow good easy system administration techniques, then
> you end up minimising the risk of getting:
>
> 1. viruses
> 2. trojans
> 3. malicious users
>
> cracking your system. If you don't follow these techniques, then you're
> asking for lots of trouble, and no amount of checksumming/signing/etc
> will ever save you.

Absolutely true. The only help the checksumming etc stuff is good for is
detecting the fact afterward by external comparison.
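
A rough sketch of that kind of after-the-fact external comparison, assuming SHA-256 as the checksum and a baseline that would in practice be kept off the machine or on read-only media. The helper names (`sha256_of`, `snapshot`, `changed_files`) and the scratch files are made up for the example.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Hash a file in chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Record a baseline: path -> checksum."""
    return {p: sha256_of(p) for p in paths}


def changed_files(baseline, paths):
    """Return the paths whose current checksum differs from the baseline."""
    current = snapshot(paths)
    return sorted(p for p in paths if current[p] != baseline.get(p))


with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "ls")
    b = os.path.join(d, "cat")
    for p in (a, b):
        with open(p, "wb") as f:
            f.write(b"pristine " + os.path.basename(p).encode())
    baseline = snapshot([a, b])          # taken while the system is known good
    with open(b, "ab") as f:
        f.write(b" + payload")           # simulate an infection of one file
    tampered = [os.path.basename(p) for p in changed_files(baseline, [a, b])]
```

This only detects a change; it does nothing to prevent one, and it is worthless if the attacker can rewrite the stored baseline as well.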

I like MLS for the ability to catch ATTEMPTS to make unauthorized
modification.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-28 16:08:22

by Russell King

Subject: Re: Disturbing news..

Jesse Pollard writes:
> Absolutely true. The only help the checksumming etc stuff is good for is
> detecting the fact afterward by external comparison.

Don't we already have that to some extent? rpm -ya or rpm -y <package name>
on a RedHat system? I'm sure that there is a Debian equivalent.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-03-28 16:20:32

by Jonathan Lundell

Subject: Re: Disturbing news..

john slee <[email protected]> says:

>On Wed, Mar 28, 2001 at 03:10:08PM +0100, Sean Hunter wrote:
>> On Wed, Mar 28, 2001 at 06:08:15AM -0600, Jesse Pollard wrote:
>> > Sure - very simple. If the execute bit is set on a file, don't allow
>> > ANY write to the file. This does modify the permission bits slightly
>> > but I don't think it is an unreasonable thing to have.
>> >
>>
>> Are we not then in the somewhat zen-like state of having an "rm" which can't
>> "rm" itself without needing to be made non-executable so that it can't execute?
>
>aiiiieee, my head hurts now, thanks :(

It shouldn't. rm is not prevented from removing an unwriteable file (though it complains by default). Directory permissions control operations on links.
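
A small demonstration of that point, assuming a Unix-like system: the file has no write bit for anyone, yet it can still be unlinked because the containing directory is writable. The scratch directory and file name are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "rm")
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, 0o555)     # r-xr-xr-x: nobody has write permission on the file
    os.remove(path)           # unlink() needs write+search on the directory only
    still_there = os.path.exists(path)
```

Removing the name is an operation on the directory, not on the file's contents, which is why the file's own mode does not matter here.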

--
/Jonathan Lundell.

2001-03-28 16:15:32

by Romano Giannetti

Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 09:57:47AM -0500, Alexander Viro wrote:
>
> On Wed, 28 Mar 2001, Romano Giannetti wrote:
>
> > Now the binary can do much less harm than before, or am I missing something?
> > It has no access to real user data, but can use the system library and
> > services without changing anything in the system.
>
> You mean, like mailbombing the living hell out of somebody? Or playing
> interesting games with sending signals all over the place...

Yes, I was sure there were doors left, but --- it has no access to the
bookmark list of user, and can kill just user processes... that was what I
meant with "less harm" (never say never, I know...).

Romano

--
Romano Giannetti - Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416 fax +34 915 411 132

2001-03-28 17:30:55

by Horst H. von Brand

Subject: Re: Disturbing news..

Shawn Starr <[email protected]> said:
> Well, why can't the ELF loader module/kernel detect or have some sort of
> restriction on modifying other/ELF binaries including itself from changing
> the Entry point?

Because there are quite valid reasons for "normal" programs (e.g., ld(1)
and other binary-futzing tools) to do so. No, I don't want a paranoid
system where I (regular user) can't do this to my own files using a random
binary editor. An executable is just a normal file in Unix, can't get
around this without seriously breaking lots of stuff.
--
Dr. Horst H. von Brand mailto:[email protected]
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2001-03-28 17:52:46

by Olivier Galibert

Subject: Re: Disturbing news..

On Wed, Mar 28, 2001 at 04:49:26PM +0100, Simon Williams wrote:
> What I meant was that if a file is owned by root with permissions of,
> say, 555 (r-xr-xr-x), not setuid or setgid, then another executable
> run as a non-root user cannot modify it or change the permissions to
> 7 (rwx).

It's already the case that a file owned by user A cannot have its
rights changed by user B. Also, if the write permission is not set,
you can't modify the file. That's the basic unix security model. Of
course, if you're root all bets are off, root is god. For those who
don't like that, there are things like capabilities and MAC. But they
are _really_ hard to set up correctly.
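
The basic model described above can be written down as a pair of pure functions. This is a simplified sketch, not the kernel's code: it ignores capabilities, ACLs, MAC and read-only mounts, and the names `may_write` and `may_chmod` are made up for the example.

```python
import stat


def may_write(mode, file_uid, file_gid, uid, gids):
    """Classic Unix write check on the permission bits alone."""
    if uid == 0:
        return True                        # root bypasses the permission bits
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)   # owner class only, no fall-through
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)   # group class
    return bool(mode & stat.S_IWOTH)       # everybody else


def may_chmod(file_uid, uid):
    """Only the file's owner or root may change its permission bits."""
    return uid == 0 or uid == file_uid
```

For a root-owned 555 file and an ordinary user with uid 1000, both `may_write(0o555, 0, 0, 1000, {1000})` and `may_chmod(0, 1000)` are false, which is the situation Simon describes.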

What they are talking about is to have the x bit cancel the w bit,
i.e. make the rwx files unwritable. Fixing the symptoms, you know...


> My policy is to give necessary permissions & no more.

This is not a bad policy. Removing read permissions can make fixing
problems a pain, though (what, no gdb/strace of system executables?).


> I would set the
> aforementioned permissions on the main system binaries which would allow
> other users to get on with what they need to do without being able to
> affect the workspaces of other users, only their own.

Well, the main system binaries are already that way (r-xr-xr-x or
rwxr-xr-x, which when root-owned are equivalent). I don't see your
point.

OG.

2001-03-28 20:00:52

by Ben Ford

Subject: Re: Disturbing news..

Simon Williams wrote:

> In message <[email protected]>, Olivier Galibert
> <[email protected]> writes
>
>> On Wed, Mar 28, 2001 at 03:04:46PM +0100, Simon Williams wrote:
>>
>>> I think their point was that a program could only change permissions
>>> of a file that was owned by the same owner. If a file is owned by a
>>> different user & has no write permissions for any user, the program
>>> can't modify the file or its permissions.
>>
>> You mean, you usually have write permissions for other than the owner
>> on executable files?
>>
>> Let me reformulate that. You usually have write permissions for other
>> than the owner, and not only on some special, untrusted log files (I'm
>> talking files, here, not device nodes)? What's your umask, 0?
>>
>
> Firstly, I'm relatively new to Linux (only about 3 yrs experience) &
> don't claim to be an expert. Secondly, I don't think I stated my point
> very clearly.
>
> No, I don't have write permissions set on an executable for any user
> other than the owner.
>
> What I meant was that if a file is owned by root with permissions of,
> say, 555 (r-xr-xr-x), not setuid or setgid, then another executable
> run as a non-root user cannot modify it or change the permissions to
> 7 (rwx).

There are two problems I see here. First, there are several known ways
to elevate privileges. If a virus can elevate privileges, then it owns
you. Second, this is a multi-OS virus. If you dual-boot into Windows,
any ELF files accessible can be infected. With this one, that isn't a
prob, but when somebody codes in an ext2 driver to their virus, then
we've got issues.

-b

2001-03-28 21:20:24

by Gerhard Mack

Subject: Re: Disturbing news..

On Wed, 28 Mar 2001 [email protected] wrote:

> Jesse Pollard writes:
> > Absolutely true. The only help the checksumming etc stuff is good for is
> > detecting the fact afterward by external comparison.
>
> Don't we already have that to some extent? rpm -ya or rpm -y <package name>
> on a RedHat system? I'm sure that there is a Debian equivalent.

http://www.tripwire.com does exactly this afaik.

Gerhard

--
Gerhard Mack

[email protected]

<>< As a computer I find your faith in technology amusing.

2001-04-02 23:11:51

by Dr. Kelsey Hudson

Subject: Re: Disturbing news..

On Wed, 28 Mar 2001, Jesse Pollard wrote:
> Sure - very simple. If the execute bit is set on a file, don't allow
> ANY write to the file. This does modify the permission bits slightly
> but I don't think it is an unreasonable thing to have.

Oh, honestly! Think about what you are saying here:

What if you are developing something in an interpreted language such as
perl or a shell script, where you *directly modify the executable file*?

No, this won't work... Not without being annoying as hell.
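
To make the objection concrete, here is a sketch of the proposed "execute bit cancels write" rule next to today's behaviour. The rule is hypothetical (it is only a proposal in this thread), and `owner_may_write` and its `x_cancels_w` switch are made up for the example.

```python
import stat

ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def owner_may_write(mode, x_cancels_w=False):
    """Can the owner write the file, with or without the proposed rule?"""
    if x_cancels_w and mode & ANY_EXEC:
        return False                   # proposed: any x bit forbids every write
    return bool(mode & stat.S_IWUSR)   # today: only the owner's w bit matters


script_mode = 0o755                    # a typical rwxr-xr-x perl or shell script
today = owner_may_write(script_mode)
proposed = owner_may_write(script_mode, x_cancels_w=True)
```

Under the proposed rule every edit of a script would need the execute bit cleared first and restored afterwards.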

2001-04-21 00:43:22

by Shawn Starr

Subject: Serious Latency problems : 2.4.4-pre5


Note: I'm not on this mailing list (for now, domain IP is changing).
Please email directly

1) I've noticed very high CPU load (3.00) running ./configure alone

2) some gnome applications (Gnome Mailcheck broke with 2.4.4-pre5)

3) Resolving local domains takes an awfully long time (through Netscape) but
not with 2.4.3.

Odd bugs... reverted for now.

Shawn.