Anybody want to venture an opinion why overwriting executable files that are
currently in use gives you a "text file busy" error, but overwriting shared
libraries that are in use apparently works just fine (modulo a core dump if
you aren't subtle about your run-time patching)?
Permissions are still enforced, but it seems to me somebody who cracks root
on a system could potentially modify the behavior of important system daemons
without changing their process ID numbers.
Did I miss something somewhere?
Rob
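For anyone who wants to see the "text file busy" behaviour first-hand, here is a minimal sketch in C. It assumes Linux with /proc mounted, and the function name try_write_open_self is made up for illustration. On the kernels discussed in this thread the expected result is ETXTBSY, but treat that as something to verify on your own system rather than a guarantee for every kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open the running program's own binary for writing.
 * Returns 0 if the open succeeded, otherwise the errno value.
 * /proc/self/exe is Linux-specific (an assumption of this sketch).
 * Nothing is written: the file is only opened and closed again. */
int try_write_open_self(void)
{
    int fd = open("/proc/self/exe", O_WRONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Comparing the return value against ETXTBSY from a small main() is enough to check whether the running kernel refuses the write-open.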
On Tue, 2 Oct 2001, Rob Landley wrote:
> Anybody want to venture an opinion why overwriting executable files that are
> currently in use gives you a "text file busy" error, but overwriting shared
> libraries that are in use apparently works just fine (modulo a core dump if
> you aren't subtle about your run-time patching)?
>
> Permissions are still enforced, but it seems to me somebody who cracks root
> on a system could potentially modify the behavior of important system daemons
> without changing their process ID numbers.
>
> Did I miss something somewhere?
Somebody who cracks root can attach gdb to a daemon, modify the contents of
its text segment and detach. No need to change any files...
Alexander Viro <[email protected]>:
> On Tue, 2 Oct 2001, Rob Landley wrote:
>
> > Anybody want to venture an opinion why overwriting executable files that are
> > currently in use gives you a "text file busy" error, but overwriting shared
> > libraries that are in use apparently works just fine (modulo a core dump if
> > you aren't subtle about your run-time patching)?
> >
> > Permissions are still enforced, but it seems to me somebody who cracks root
> > on a system could potentially modify the behavior of important system daemons
> > without changing their process ID numbers.
> >
> > Did I miss something somewhere?
>
> Somebody who cracks root can attach gdb to a daemon, modify the contents of
> its text segment and detach. No need to change any files...
True, but the original problem still appears to be a bug.
Even the owner of the file should not be able to write to a busy executable,
whether it is a shared library, or an executable image. Remove it, yes.
Create a new one (in a different inode) - yes.
But not modify a busy executable.
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]
Any opinions expressed are solely my own.
Jesse Pollard <[email protected]> writes:
> Alexander Viro <[email protected]>:
> > On Tue, 2 Oct 2001, Rob Landley wrote:
> >
> > > Anybody want to venture an opinion why overwriting executable files that are
> > > currently in use gives you a "text file busy" error, but overwriting shared
> > > libraries that are in use apparently works just fine (modulo a core dump if
> > > you aren't subtle about your run-time patching)?
> > >
> > > Permissions are still enforced, but it seems to me somebody who cracks root
> > > on a system could potentially modify the behavior of important system daemons
> > > without changing their process ID numbers.
> > >
> > > Did I miss something somewhere?
> >
> > Somebody who cracks root can attach gdb to a daemon, modify the contents of
> > its text segment and detach. No need to change any files...
>
> True, but the original problem still appears to be a bug.
>
> Even the owner of the file should not be able to write to a busy executable,
> whether it is a shared library, or an executable image. Remove it, yes.
> Create a new one (in a different inode) - yes.
>
> But not modify a busy executable.
Have ld-linux.so set the MAP_DENYWRITE bit when it is mapping
the library.
Eric
On Wednesday 03 October 2001 14:06, Eric W. Biederman wrote:
> > But not modify a busy executable.
>
> Have ld-linux.so set the MAP_DENYWRITE bit when it is mapping
> the library.
And of course since the FSF wrote it, it's not quite that simple...
>/* The right way to map in the shared library files is MAP_COPY, which
> makes a virtual copy of the data at the time of the mmap call; this
> guarantees the mapped pages will be consistent even if the file is
> overwritten. Some losing VM systems like Linux's lack MAP_COPY. All we
> get is MAP_PRIVATE, which copies each page when it is modified; this
> means if the file is overwritten, we may at some point get some pages
> from the new version after starting with pages from the old version. */
I.E. it seems like they go out of their way to ALLOW writing to the libraries.
(I assume they KNOW the difference between MAP_DENYWRITE, MAP_COPY, and
MAP_PRIVATE...?)
This look right to anybody else? Or am I about to wander into weird
side-effect land? (Is there a reason they DON'T want a read-only mapping?
Are they writing data into those pages, perhaps doing the linking fixup
stuff? What?)
--- elf/dl-load.bak Wed Oct 3 18:53:37 2001
+++ elf/dl-load.c Wed Oct 3 18:55:57 2001
@@ -48,7 +48,7 @@
means if the file is overwritten, we may at some point get some pages
from the new version after starting with pages from the old version. */
#ifndef MAP_COPY
-# define MAP_COPY MAP_PRIVATE
+# define MAP_COPY MAP_DENYWRITE
#endif
/* Some systems link their relocatable objects for another base address
I should just try this and see what it does. On a machine I don't mind
reinstalling from scratch. Which means I need to dig up a spare keyboard for
my junk machine... (And figure out how to get glibc's ./configure script to
realise that linuxthreads is, in fact, there in the source directory. It's
right there. Use it. Don't yell at me it's not there. I didn't make this
SRPM, I changed one line... Sigh...)
In the morning...
Rob
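The MAP_PRIVATE behaviour described in the glibc comment quoted above can be sketched in a few lines of C. This is an illustration under stated assumptions, not glibc's code: the function names and the two-page test file are invented here. The sketch maps a file MAP_PRIVATE, dirties the second page through the mapping, then overwrites the file in place. The privately written page keeps its own copy. The untouched first page is not protected, and on Linux it would normally be expected to show the new file contents, which is exactly the "pages from the new version" mix the comment complains about.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite the first `len` bytes of `path` in place (same inode).
 * `len` is expected to be a multiple of 512. */
static int fill_file(const char *path, char byte, size_t len)
{
    char chunk[512];
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    memset(chunk, byte, sizeof chunk);
    for (size_t done = 0; done < len; done += sizeof chunk) {
        if (write(fd, chunk, sizeof chunk) != (ssize_t)sizeof chunk) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}

/* Fill a two-page file with 'A', map it MAP_PRIVATE, write 'P' into the
 * second page through the mapping, then overwrite the file with 'B'.
 * out[0] and out[1] receive the first byte of each page as seen through
 * the mapping afterwards. Returns 0 on success, -1 on error. */
int private_map_after_overwrite(const char *path, char out[2])
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(path != NULL && out != NULL);
    if (pagesz <= 0)
        return -1;
    size_t page = (size_t)pagesz, len = 2 * page;

    if (fill_file(path, 'A', len) != 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    map[page] = 'P';                     /* copy-on-write: page 1 is now private */
    int rc = fill_file(path, 'B', len);  /* somebody overwrites the file in place */
    out[0] = map[0];                     /* untouched page: may show the new data */
    out[1] = map[page];                  /* private page: still our 'P' */
    munmap(map, len);
    return rc;
}
```

After a successful call, out[1] is 'P', while out[0] is whatever the system chose to show for a page the mapping never modified.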
Rob Landley <[email protected]> writes:
> On Wednesday 03 October 2001 14:06, Eric W. Biederman wrote:
>
> > > But not modify a busy executable.
> >
> > Have ld-linux.so set the MAP_DENYWRITE bit when it is mapping
> > the library.
>
> And of course since the FSF wrote it, it's not quite that simple...
>
> >/* The right way to map in the shared library files is MAP_COPY, which
> > makes a virtual copy of the data at the time of the mmap call; this
> > guarantees the mapped pages will be consistent even if the file is
> > overwritten. Some losing VM systems like Linux's lack MAP_COPY. All we
> > get is MAP_PRIVATE, which copies each page when it is modified; this
> > means if the file is overwritten, we may at some point get some pages
> > from the new version after starting with pages from the old version. */
>
> I.E. it seems like they go out of their way to ALLOW writing to the libraries.
> (I assume they KNOW the difference between MAP_DENYWRITE, MAP_COPY, and
> MAP_PRIVATE...?)
>
> This look right to anybody else? Or am I about to wander into weird
> side-effect land? (Is there a reason they DON'T want a read-only mapping?
> Are they writing data into those pages, perhaps doing the linking fixup
> stuff? What?)
You definitely need to do some writing to do the fixups.
The deny-write flag solves the problem of someone potentially writing to the
file at a later date.
Probably what is needed is:
#ifndef MAP_COPY
# ifdef MAP_DENYWRITE
# define MAP_COPY (MAP_PRIVATE | MAP_DENYWRITE)
# else
# define MAP_COPY MAP_PRIVATE
# endif
#endif
>
> --- elf/dl-load.bak Wed Oct 3 18:53:37 2001
> +++ elf/dl-load.c Wed Oct 3 18:55:57 2001
> @@ -48,7 +48,7 @@
> means if the file is overwritten, we may at some point get some pages
> from the new version after starting with pages from the old version. */
> #ifndef MAP_COPY
> -# define MAP_COPY MAP_PRIVATE
> +# define MAP_COPY MAP_DENYWRITE
> #endif
>
> /* Some systems link their relocatable objects for another base address
>
> I should just try this and see what it does. On a machine I don't mind
> reinstalling from scratch. Which means I need to dig up a spare keyboard for
> my junk machine... (And figure out how to get glibc's ./configure script to
> realise that linuxthreads is, in fact, there in the source directory. It's
> right there. Use it. Don't yell at me it's not there. I didn't make this
> SRPM, I changed one line... Sigh...)
>
> In the morning...
For testing you can do ./ld-linux.so program to run a program under it and
see if it actually works.
Eric
On 3 Oct 2001, Eric W. Biederman quoted:
> > >/* The right way to map in the shared library files is MAP_COPY, which
> > > makes a virtual copy of the data at the time of the mmap call; this
> > > guarantees the mapped pages will be consistent even if the file is
> > > overwritten. Some losing VM systems like Linux's lack MAP_COPY. All we
> > > get is MAP_PRIVATE, which copies each page when it is modified; this
> > > means if the file is overwritten, we may at some point get some pages
> > > from the new version after starting with pages from the old version. */
IMO it needs a slight correction.
+ /* Unfortunately, that is not an option, since losing bloatware like GNU's
+ relies heavily on equally bloated shared libraries and use of MAP_COPY
+ would eat memory with no mercy. OTOH, implementing it might be a good
+ idea, since results would force people to switch to something less obese */
In article <[email protected]>,
Rob Landley <[email protected]> wrote:
>
>I.E. it seems like they go out of their way to ALLOW writing to the libraries.
> (I assume they KNOW the difference between MAP_DENYWRITE, MAP_COPY, and
>MAP_PRIVATE...?)
Note that the kernel will refuse to honour MAP_DENYWRITE from user
space, so I'm afraid that changing ld.so won't do a thing.
The reason the kernel refuses to honour it, is that MAP_DENYWRITE is an
excellent DoS-vehicle - you just mmap("/etc/passwd") with MAP_DENYWRITE,
and even root cannot write to it.. Very nasty.
Which is why the kernel only allows it when the binary loader itself
sets the flag, because security-conscious application writers are
already aware of the "oh, a running binary may not be writable" issues.
So sorry..
Linus
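The point that the kernel ignores MAP_DENYWRITE from user space is easy to check with a small C sketch. The function name denywrite_is_ignored is hypothetical, and the fallback definition of MAP_DENYWRITE as 0 is an assumption made so that the code still compiles where the headers do not expose the flag.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_DENYWRITE
#define MAP_DENYWRITE 0   /* not exposed by these headers; the flag becomes a no-op */
#endif

/* Create `path`, map it read-only with MAP_DENYWRITE requested, then try
 * to open it for writing while the mapping is still live.
 * Returns 1 if the write-open succeeds (flag ignored), 0 if it is
 * refused, and -1 if the setup itself failed. */
int denywrite_is_ignored(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    void *map = mmap(NULL, 1, PROT_READ, MAP_PRIVATE | MAP_DENYWRITE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    int wfd = open(path, O_WRONLY);      /* would fail if the flag were honoured */
    int ignored = (wfd >= 0);
    if (wfd >= 0)
        close(wfd);
    munmap(map, 1);
    return ignored;
}
```

A return value of 1 means the mapping did nothing to keep writers away, which is the behaviour described above.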
On Thu, 4 Oct 2001, Linus Torvalds wrote:
> The reason the kernel refuses to honour it, is that MAP_DENYWRITE is an
> excellent DoS-vehicle - you just mmap("/etc/passwd") with MAP_DENYWRITE,
> and even root cannot write to it.. Very nasty.
<nit>
I _really_ doubt that something does write() on /etc/passwd. Create a
file and rename it over the thing - sure, but that's it.
</nit>
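The "create a file and rename it over the thing" pattern can be sketched in C as follows. The helper names and the "old"/"new" contents are invented for illustration, and the temporary file is assumed to live on the same filesystem as the target so that rename() can swap the name in one step. The second function shows the property that matters for this thread: a descriptor opened before the rename keeps reading the old inode.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to `path`, flushing it to disk before returning. */
static int write_file(const char *path, const char *text)
{
    size_t len = strlen(text);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, text, len) == (ssize_t)len && fsync(fd) == 0;
    return (close(fd) == 0 && ok) ? 0 : -1;
}

/* Replace `path` by writing a new file at `tmp` and renaming it over. */
int replace_by_rename(const char *path, const char *tmp, const char *text)
{
    assert(path != NULL && tmp != NULL && text != NULL);
    if (write_file(tmp, text) != 0)
        return -1;
    return rename(tmp, path);
}

/* Read up to cap-1 bytes from fd into buf and NUL-terminate. */
static int read_fd(int fd, char *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Returns 1 if a reader that opened `path` before the replacement still
 * sees "old" while a fresh open sees "new"; 0 if not; -1 on error. */
int old_reader_keeps_old_data(const char *path, const char *tmp)
{
    char before[16], after[16];
    if (write_file(path, "old") != 0)
        return -1;
    int old_fd = open(path, O_RDONLY);   /* stands in for a running user */
    if (old_fd < 0)
        return -1;
    if (replace_by_rename(path, tmp, "new") != 0) {
        close(old_fd);
        return -1;
    }
    int new_fd = open(path, O_RDONLY);
    if (new_fd < 0) {
        close(old_fd);
        return -1;
    }
    int rc = read_fd(old_fd, before, sizeof before) == 0 &&
             read_fd(new_fd, after, sizeof after) == 0 &&
             strcmp(before, "old") == 0 && strcmp(after, "new") == 0;
    close(old_fd);
    close(new_fd);
    return rc;
}
```

Because the old inode is never written to, nothing here depends on whether the kernel refuses writes to busy files.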
On Thu, 4 Oct 2001, Alexander Viro wrote:
> <nit>
> I _really_ doubt that something does write() on /etc/passwd. Create a
> file and rename it over the thing - sure, but that's it.
> </nit>
Well, yeah, bad choice. Can you believe /var/run/utmp or similar?
And yes, we could add checks for the thing being executable before we
accept MAP_DENYWRITE instead of just ignoring the flag from user space.
Nobody has cared enough to make the effort.
Until now?
Linus
Alexander Viro writes:
> <nit>
> I _really_ doubt that something does write() on /etc/passwd. Create a
> file and rename it over the thing - sure, but that's it.
> </nit>
# vi /etc/passwd
Regards,
Richard....
Permanent: [email protected]
Current: [email protected]
Alexander Viro <[email protected]> writes:
> On 3 Oct 2001, Eric W. Biederman quoted:
>
> > > >/* The right way to map in the shared library files is MAP_COPY, which
> > > > makes a virtual copy of the data at the time of the mmap call; this
> > > > guarantees the mapped pages will be consistent even if the file is
> > > > overwritten. Some losing VM systems like Linux's lack MAP_COPY. All we
> > > > get is MAP_PRIVATE, which copies each page when it is modified; this
> > > > means if the file is overwritten, we may at some point get some pages
> > > > from the new version after starting with pages from the old version. */
>
> IMO it needs a slight correction.
>
> + /* Unfortunately, that is not an option, since losing bloatware like GNU's
> + relies heavily on equally bloated shared libraries and use of MAP_COPY
> + would eat memory with no mercy. OTOH, implementing it might be a good
> + idea, since results would force people to switch to something less obese */
Hmm. Perhaps. But if we went there we would need to add something like:
/* But finding a less obese platform to run these less obese libraries is a
challenge. Unix clones like UZI have been shown to run a complete system,
including user space binaries, in just 64KB of RAM, on systems
originally designed to run CP/M. But today you can't find a general
purpose kernel whose binary, much less its footprint, fits in 256KB.
It seems bloatware is everywhere.
*/
I have days when I'm frustrated by the size of both glibc and the
Linux kernel. Stripped, the Linux kernel and glibc are comparable
in size, though I think the 400KB of compressed glibc-2.1.2 is
actually smaller than the kernel for the most part. I have to strip
off practically everything to get a usable bzImage under 400KB.
So any good ideas on how to get the size of linux down?
Eric
On Thu, 4 Oct 2001, Linus Torvalds wrote:
>Which is why the kernel only allows it when the binary loader itself
>sets the flag, because security-conscious application writers are
>already aware of the "oh, a running binary may not be writable" issues.
One of the methods I tried to use to stop a fork()-bomb was to zero the
executable in question to force it to crash. No such luck, reboot it was.
Not that I can think of any other useful application of said behavior.
--
George Greer, [email protected]
http://www.m-l.org/~greerga/
Thursday, October 04, 2001, 8:15:01 AM,
[email protected] (Eric W. Biederman) wrote:
EWB> I have days when I'm frustrated by the size of both glibc and the
EWB> Linux kernel. Stripped, the Linux kernel and glibc are comparable
EWB> in size, though I think the 400KB of compressed glibc-2.1.2 is
EWB> actually smaller than the kernel for the most part. I have to strip
EWB> off practically everything to get a usable bzImage under 400KB.
EWB> So any good ideas on how to get the size of linux down?
I think code quality priorities are:
1. Features
   If a program is missing some very useful (or even vital)
   feature, there's not much sense in using it.
2. Stability
   Nobody likes it when a program (or server) hangs or crashes.
3. Speed
   Developers usually have fast machines with big disks,
   so they really like to see programs run fast.
4. Size
As you see, size isn't a top priority. It's sad, but we can't
meet all of these objectives with the same level of success.
However, Linux isn't that bad compared to the Win2K nightmare.
And please let's not start a longish discussion on this. Please.
A contest to cut the most kbytes without loss of features/speed
from kernel/glibc/X/... is much more productive. :-)
If you can't resist, may I suggest private mail, not lkml.
--
Best regards, VDA
mailto:[email protected]
On Thu, Oct 04, 2001 at 12:15:01AM -0600, Eric W. Biederman wrote:
> I have days when I'm frustrated by the size of both glibc and the
> Linux kernel. Stripped, the Linux kernel and glibc are comparable
> in size, though I think the 400KB of compressed glibc-2.1.2 is
> actually smaller than the kernel for the most part. I have to strip
> off practically everything to get a usable bzImage under 400KB.
>
> So any good ideas on how to get the size of linux down?
Mind if I ask why you need a bzimage under 400kb? Just curious as I've
never had the need. (And I can see needing it less than 1.4meg - are you
trying to get a kernel AND a ramdisk on the one floppy?)
--
CaT "As you can expect it's really affecting my sex life. I can't help
it. Each time my wife initiates sex, these ejaculating hippos keep
floating through my mind."
- Mohd. Binatang bin Goncang, Singapore Zoological Gardens
On Thu, Oct 04, 2001 at 12:15:01AM -0600, you [Eric W. Biederman] claimed:
>
> <snip size of glibc>
Where size is an issue, diet libc might be an alternative:
http://www.fefe.de/dietlibc/
(287kB statically linked zsh is not too shabby, I reckon.)
That and things like busybox:
http://busybox.lineo.com/
I have no suggestion wrt the kernel, though.
-- v --
[email protected]
[ cc list trimmed ]
On Thu, Oct 04, 2001 at 06:21:27PM +1000, CaT wrote:
> On Thu, Oct 04, 2001 at 12:15:01AM -0600, Eric W. Biederman wrote:
> > I have days when I'm frustrated by the size of both glibc and the
> > Linux kernel. Stripped, the Linux kernel and glibc are comparable
> > in size, though I think the 400KB of compressed glibc-2.1.2 is
> > actually smaller than the kernel for the most part. I have to strip
> > off practically everything to get a usable bzImage under 400KB.
> >
> > So any good ideas on how to get the size of linux down?
>
> Mind if I ask why you need a bzimage under 400kb? Just curious as I've
> never had the need. (And I can see needing it less than 1.4meg - are you
> trying to get a kernel AND a ramdisk on the one floppy?)
plenty of reasons. i'm building a compactflash-based linux router which
will only have 16mb of flash for the entire system... saving 100kb means
you can fit a few extra userspace tools in there...
-rwxr-xr-x 1 indigoid indigoid 54444 Oct 4 18:30 boa*
j.
--
R N G G "Well, there it goes again... And we just sit
I G G G here without opposable thumbs." -- gary larson
On Thu, Oct 04, 2001 at 06:35:07PM +1000, john slee wrote:
> > Mind if I ask why you need a bzimage under 400kb? Just curious as I've
> > never had the need. (And I can see needing it less than 1.4meg - are you
> > trying to get a kernel AND a ramdisk on the one floppy?)
>
> plenty of reasons. i'm building a compactflash-based linux router which
> will only have 16mb of flash for the entire system... saving 100kb means
> you can fit a few extra userspace tools in there...
>
> -rwxr-xr-x 1 indigoid indigoid 54444 Oct 4 18:30 boa*
Well, duh. :)
Thanks. :)
--
CaT "As you can expect it's really affecting my sex life. I can't help
it. Each time my wife initiates sex, these ejaculating hippos keep
floating through my mind."
- Mohd. Binatang bin Goncang, Singapore Zoological Gardens
[email protected] (Eric W. Biederman) writes:
|> So any good ideas on how to get the size of linux down?
How about linux-0.01?
SCNR.
Andreas.
--
Andreas Schwab "And now for something
[email protected] completely different."
SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
On Thu Oct 04, 2001 at 11:30:19AM +0300, Ville Herva wrote:
> On Thu, Oct 04, 2001 at 12:15:01AM -0600, you [Eric W. Biederman] claimed:
> >
> > <snip size of glibc>
>
> Where size is an issue, diet libc might be an alternative:
>
> http://www.fefe.de/dietlibc/
>
> (287kB statically linked zsh is not too shabby, I reckon.)
uClibc is also a nice alternative. Works just great and uses glibc
header files. I only fully support shared libs on x86 and arm
at the moment.
http://cvs.uclinux.org/uClibc.html
(I need to update the webpage sometime)
> That and things like busybox:
>
> http://busybox.lineo.com/
Why thanks. I've sure worked hard to make it be nice and small...
-Erik
--
Erik B. Andersen email: [email protected], formerly of Lineo
--This message was written using 73% post-consumer electrons--
On Thu, Oct 04, 2001 at 05:38:12AM +0000, Linus Torvalds wrote:
> The reason the kernel refuses to honour it, is that MAP_DENYWRITE is an
> excellent DoS-vehicle - you just mmap("/etc/passwd") with MAP_DENYWRITE,
> and even root cannot write to it.. Very nasty.
Why is MAP_EXECUTABLE dealt with in the same way, then?
john
--
" It is quite humbling to realize that the storage occupied by the longest line
from a typical Usenet posting is sufficient to provide a state space so vast
that all the computation power in the world can not conquer it."
- Dave Wallace
CaT <[email protected]> writes:
> On Thu, Oct 04, 2001 at 12:15:01AM -0600, Eric W. Biederman wrote:
> > I have days when I'm frustrated by the size of both glibc and the
> > Linux kernel. Stripped, the Linux kernel and glibc are comparable
> > in size, though I think the 400KB of compressed glibc-2.1.2 is
> > actually smaller than the kernel for the most part. I have to strip
> > off practically everything to get a usable bzImage under 400KB.
> >
> > So any good ideas on how to get the size of linux down?
>
> Mind if I ask why you need a bzimage under 400kb? Just curious as I've
> never had the need. (And I can see needing it less than 1.4meg - are you
> trying to get a kernel AND a ramdisk on the one floppy?)
floppies have lots of room.
I'd like to get a kernel, ramdisk, and some hw initialization code all
on a 256KB ROM. I have my ramdisk down to about 14KB compressed. I
have my hw initialization code down to 32KB uncompressed (and I might
be able to reduce that further). So I want something like a 192KB
(compressed) linux kernel.
If I had that, some of the hard problems with LinuxBIOS would just
drop away.
Eric
Andreas Schwab <[email protected]> writes:
> [email protected] (Eric W. Biederman) writes:
>
> |> So any good ideas on how to get the size of linux down?
>
> How about linux-0.01?
There might be some fodder there, but I doubt it. I have
played with linux-lite-v1.00 (which is something like linux-1.09)
and couldn't get any really compelling results. Plus, for it to be
useful I would still need to backport all of the driver APIs from
2.4.x.
Just for a note, UZI has been ported to x86 as UZIX, so a 32KB kernel
(without a network stack) is achievable on x86. If I can get a core
kernel size with no drivers down to 64KB or less I would be happy.
So far I haven't been able to come up with anything satisfying.
Eric
On 4 Oct 2001, Eric W. Biederman wrote:
> CaT <[email protected]> writes:
>
> > On Thu, Oct 04, 2001 at 12:15:01AM -0600, Eric W. Biederman wrote:
[SNIPPED...]
>
> I'd like to get a kernel, ramdisk, and some hw initialization code all
> on a 256KB ROM. I have my ramdisk down to about 14KB compressed. I
> have my hw initialization code down to 32KB uncompressed (and I might
> be able to reduce that further). So I want something like a 192KB
> (compressed) linux kernel.
>
> If I had that, some of the hard problems with LinuxBIOS would just
> drop away.
>
Major size differences seem to depend upon the C compiler being
used.
Here are two different systems with exactly the same kernel
with exactly the same ".config" file.
The kernel on one is compiled with whatever comes with RedHat.
The other is compiled with egcs-2.91.66.
We are looking at the compressed size! The actual expanded size
difference is about 2:1 !
Script started on Thu Oct 4 10:10:31 2001
[root@blackhole /boot]# ls -la vmlinuz-2.4.1
-rw-r--r-- 1 root root 648638 Mar 12 2001 vmlinuz-2.4.1
[root@blackhole /boot]# gcc --version
2.96
[root@blackhole /boot]# exit
Script done on Thu Oct 4 10:11:11 2001
Script started on Thu Oct 4 10:11:26 2001
# gcc --version
egcs-2.91.66
# ls -la vmlinuz-2.4.1
-rw-r--r-- 1 root root 584959 Oct 1 15:26 vmlinuz-2.4.1
# exit
exit
Script done on Thu Oct 4 10:12:01 2001
It seems that, amongst other things, 2.96 aligns every function and
every memory variable on 16-byte boundaries, i.e., the lowest nibble
of the address is always 0. There doesn't seem to be any way to turn it
off.
So, if size counts, use egcs-2.91.66. It works okay with 2.4.x kernels.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).
I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.
Linus Torvalds <[email protected]> writes:
> On Thu, 4 Oct 2001, Alexander Viro wrote:
> > <nit>
> > I _really_ doubt that something does write() on /etc/passwd. Create a
> > file and rename it over the thing - sure, but that's it.
> > </nit>
>
> Well, yeah, bad choice. Can you believe /var/run/utmp or similar?
>
> And yes, we could add checks for the thing being executable before we
> accept MAP_DENYWRITE instead of just ignoring the flag from user space.
> Nobody has cared enough to make the effort.
>
> Until now?
Hmm. Before I volunteer I need to think this thing out. I originally
missed the clearing of MAP_DENYWRITE in the arch specific code.
First what user space really wants is the MAP_COPY. Which is
MAP_PRIVATE with the guarantee that they don't see anyone else's changes.
Just skimming /lib, most libraries are only rw by root, so the case we
are protecting ourselves against is fumble-fingered administrators.
The two fumbles in particular are fumbling the permissions, and
accidentally writing to a shared library.
Given that MAP_DENYWRITE does remove unlink permission for most uses it
can be worked around.
We already allow user space applications to make arbitrary files
MAP_DENYWRITE simply by executing them, and the only restriction is
that MAP_DENYWRITE only persists while user space has the file open.
So I guess allowing it in mmap is not actually a problem, as we can
already do that.
At the same time there are cases where it is unacceptable to stop
people from writing to a file just because you have read access to it,
and you open the file. Even having write access to a file isn't
enough. So you really need to have execute and read permissions on a
file for this to be reasonable.
The one downside of requiring libraries to be executable is that
preventing tricks like /lib/ld-linux.so.2 /mnt/noexec/bin/true is
a little harder.
A remaining question is whether, for newer kernels, MAP_DENYWRITE should
fail if you don't have execute permission, or whether it should just be
a strong hint.
Having MAP_DENYWRITE fail on filesystems that are mounted noexec, and
having a dynamic loader that tests for that failure, looks like it would
make it easier to enforce a noexec policy for untrusted mounts.
So the code will need to look something like:
if (flags & MAP_DENYWRITE) {
        struct inode *inode = file->f_dentry->d_inode;
        /* Only honour the flag for regular files the caller may
           execute, on mounts that are not noexec. */
        if (IS_NOEXEC(inode) || !S_ISREG(inode->i_mode) ||
            (permission(inode, MAY_EXEC) != 0)) {
                return -EACCES;
        }
}
Eric
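The policy in that kernel fragment can be approximated from user space, which may help when experimenting with the idea. The sketch below is an assumption-laden analogue, not kernel code: the name denywrite_would_be_allowed is made up, access() checks against the real rather than the effective user ID, and it relies on the Linux behaviour that an X_OK check on a regular file fails on a noexec mount.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* User-space approximation of the proposed check: allow the request
 * only for a regular file that the caller is permitted to execute.
 * Returns 1 if allowed, 0 otherwise (including when stat() fails). */
int denywrite_would_be_allowed(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return 0;                 /* cannot inspect it: refuse */
    if (!S_ISREG(st.st_mode))
        return 0;                 /* directories, devices, FIFOs: refuse */
    return access(path, X_OK) == 0;
}
```

A dynamic loader could call something like this before mapping a library, which is roughly the noexec-policy hook discussed above.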
On 4 Oct 2001, Eric W. Biederman wrote:
>
> First what user space really wants is the MAP_COPY. Which is
> MAP_PRIVATE with the guarantee that they don't see anyone else's changes.
Which is a completely idiotic idea, and which is only just another example
of how absolutely and stunningly _stupid_ Hurd is.
The thing with MAP_COPY is that how do you efficiently _detect_ somebody
else's changes on a page that you haven't even read in yet?
So you have a few choices, all bad:
- immediately reading in everything, basically turning the mmap() into a
read. Obviously a bad idea.
- mark the inode as a "copy" inode, and whenever somebody writes to it,
you not only make sure that you do copy-on-write on the page cache page
(which, btw, is pretty much impossible - how did you intend to find all
the other _non_COPY_ users that _want_ coherency).
You also have to make sure that if somebody changes the page, you have
to read in the old contents first (not normally needed for most
changes that write over at least a full block), but you also have to
save the old page somewhere so that the mapping can use it if it faults
it in later. And how the hell do you do THAT? Especially as you can
have multiple generations of inodes with different sets of "MAP_COPY"
on different contents..
In short, now you need filesystem versioning at a per-page level etc.
Trust me. The people who came up with MAP_COPY were stupid. Really. It's
an idiotic concept, and it's not worth implementing.
And this all for what is an administration bug in the first place.
In short: just say NO TO DRUGS, and maybe you won't end up like the Hurd
people.
Linus
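The first of the choices listed above, reading everything in up front, is simple to sketch in user space, and the sketch also makes the cost visible: every caller pays for a full private copy of the file. The function name snapshot_file is hypothetical and nothing here is taken from glibc or the kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the whole of `path` into freshly allocated memory, so that later
 * writes to the file cannot change what the caller sees.
 * On success returns the buffer (caller frees) and stores its length in
 * *len_out; returns NULL on any error. */
char *snapshot_file(const char *path, size_t *len_out)
{
    struct stat st;

    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size < 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size, got = 0;
    char *buf = malloc(len ? len : 1);
    while (buf != NULL && got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0) {             /* error or unexpected end of file */
            free(buf);
            buf = NULL;
        } else {
            got += (size_t)n;
        }
    }
    close(fd);
    if (buf != NULL)
        *len_out = len;
    return buf;
}
```

The copy is consistent only as of the moment it was read, and it shares no pages with other processes, which is why this option is dismissed above as turning the mmap() into a read.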
Linus Torvalds writes:
>
> On 4 Oct 2001, Eric W. Biederman wrote:
> >
> > First what user space really wants is the MAP_COPY. Which is
> > MAP_PRIVATE with the guarantee that they don't see anyone else's changes.
>
> Which is a completely idiotic idea, and which is only just another
> example of how absolutely and stunningly _stupid_ Hurd is.
Indeed. If you're updating a shared library, why not *create a new
file* and then rename it?!? That lets running programmes work fine,
and new programmes will get the new library. Also, the following
construct makes a lot of sense:
ld -shared -o libfred.so *.o || mv libfred.so /usr/local/lib
Why? Because if ld(1) fails for some reason, and ends up writing a
short file, *you don't want to install the bloody thing*!!! Any new
user would be stuffed (no way around that, even with MAP_COPY).
I don't want to install/upgrade to a half-working library. What's the
point in that?
Regards,
Richard....
Permanent: [email protected]
Current: [email protected]
On Thu, 4 Oct 2001, Linus Torvalds wrote:
> In short, now you need filesystem versioning at a per-page level etc.
*ding* *ding* *ding* we have a near winner. Remember, folks, Hurd had been
started by people who not only don't understand UNIX, but detest it.
ITS/TWENEX refugees. And the semantics in question come from there -
they had "open and make sure that anyone who tries to modify will get
a new version, leaving the one we'd opened unchanged".
> Trust me. The people who came up with MAP_COPY were stupid. Really. It's
> an idiotic concept, and it's not worth implementing.
Well, actually that's a concept that made sense on the system we got mmap from[1].
They just want infection to be complete.
[1] cue Tom Lehrer singing "I got it from Agnes, she got it from Jim"
Richard Gooch <[email protected]> writes:
|> Linus Torvalds writes:
|> >
|> > On 4 Oct 2001, Eric W. Biederman wrote:
|> > >
|> > > First what user space really wants is the MAP_COPY. Which is
|> > > MAP_PRIVATE with the guarantee that they don't see anyone else's changes.
|> >
|> > Which is a completely idiotic idea, and which is only just another
|> > example of how absolutely and stunningly _stupid_ Hurd is.
|>
|> Indeed. If you're updating a shared library, why not *create a new
|> file* and then rename it?!? That lets running programmes work fine,
|> and new programmes will get the new library. Also, the following
|> construct makes a lot of sense:
|> ld -shared -o libfred.so *.o || mv libfred.so /usr/local/lib
                                ^^
That || should be &&, otherwise you are doing exactly the opposite of what
you want.
Andreas.
--
Andreas Schwab "And now for something
[email protected] completely different."
SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
Andreas Schwab writes:
> Richard Gooch <[email protected]> writes:
>
> |> Linus Torvalds writes:
> |> >
> |> > On 4 Oct 2001, Eric W. Biederman wrote:
> |> > >
> |> > > First what user space really wants is the MAP_COPY. Which is
> |> > > MAP_PRIVATE with the guarantee that they don't see anyone else's changes.
> |> >
> |> > Which is a completely idiotic idea, and which is only just another
> |> > example of how absolutely and stunningly _stupid_ Hurd is.
> |>
> |> Indeed. If you're updating a shared library, why not *create a new
> |> file* and then rename it?!? That lets running programmes work fine,
> |> and new programmes will get the new library. Also, the following
> |> construct makes a lot of sense:
> |> ld -shared -o libfred.so *.o || mv libfred.so /usr/local/lib
>
> That || should be &&, otherwise you are doing exactly the opposite
> of what you want.
Yeah. Of course. Brain fart. Fingers faster than brain syndrome...
Regards,
Richard....
Permanent: [email protected]
Current: [email protected]
Wow what a wild spin off of my stream of consciousness.
Linus Torvalds <[email protected]> writes:
> On 4 Oct 2001, Eric W. Biederman wrote:
> >
> > First what user space really wants is the MAP_COPY. Which is
> > MAP_PRIVATE with the guarantee that they don't see anyone else's changes.
>
> Which is a completely idiotic idea, and which is only just another example
> of how absolutely and stunningly _stupid_ Hurd is.
Well, in this case it is Mach, not Hurd, and I wouldn't be surprised if
you could find MAP_COPY in selected BSDs. The semantics wanted are
very reasonable. You only want to see your changes to a given file.
In practice there is no reason that anyone needs to actually change
the file, so MAP_PRIVATE | MAP_DENYWRITE is much more sensible (at
least implementation-wise).
> The thing with MAP_COPY is that how do you efficiently _detect_ somebody
> elses changes on a page that you haven't even read in yet?
Definitely.
> Trust me. The people who came up with MAP_COPY were stupid. Really. It's
> an idiotic concept, and it's not worth implementing.
I quite agree that MAP_COPY is not worth implementing. And I never
said otherwise. I only mentioned it so I could think about what the
alternatives were and what its benefits really were.
> And this all for what is a administration bug in the first place.
Probably. I can't think of any other cases where this could trigger.
However I think it is sensible to export MAP_DENYWRITE to user space.
It cheaply closes the administration bug, it is partially exported
already, and can even allow dynamic loaders to follow the kernel
noexec policy.
The latter looks like an actual advantage over MAP_COPY. Though
somehow MAP_EXECUTABLE sounds like a better name...
Eric
> On Thu, 4 Oct 2001, Linus Torvalds wrote:
>
> > The reason the kernel refuses to honour it, is that MAP_DENYWRITE is an
> > excellent DoS-vehicle - you just mmap("/etc/passwd") with MAP_DENYWRITE,
> > and even root cannot write to it.. Very nasty.
>
> <nit>
> I _really_ doubt that something does write() on /etc/passwd. Create a
> file and rename it over the thing - sure, but that's it.
> </nit>
The MAP_DENYWRITE rule was added a long time ago because people found actual
workable DoS attacks
[email protected] (Erik Andersen) wrote on 04.10.01 in <[email protected]>:
> On Thu Oct 04, 2001 at 11:30:19AM +0300, Ville Herva wrote:
> > On Thu, Oct 04, 2001 at 12:15:01AM -0600, you [Eric W. Biederman] claimed:
> > >
> > > <snip size of glibc>
> >
> > Where size is an issue, diet libc might be an alternative:
> >
> > http://www.fefe.de/dietlibc/
> >
> > (287kB statically linked zsh is not too shabby, I reckon.)
>
> uClibc is also a nice alternative. Works just great and uses glibc
> header files. I only fully support shared libs on x86 and arm
> at the moment.
>
> http://cvs.uclinux.org/uClibc.html
>
> (I need to update the webpage sometime)
>
> > That and things like busybox:
> >
> > http://busybox.lineo.com/
>
> Why thanks. I've sure worked hard to make it be nice and small...
And some people *still* start threads titled "Busybox still too bloated".
(Screaming their heads off for a reduction of around 50 KB, IIRC.)
MfG Kai
[email protected] (Alexander Viro) wrote on 04.10.01 in <[email protected]>:
>
> On Thu, 4 Oct 2001, Linus Torvalds wrote:
>
> > In short, now you need filesystem versioning at a per-page level etc.
>
> *ding* *ding* *ding* we have a near winner. Remember, folks, Hurd had been
> started by people who not only don't understand UNIX, but detest it.
> ITS/TWENEX refugees. And semantics in question comes from there -
> they had "open and make sure that anyone who tries to modify will get
> a new version, leaving one we'd opened unchanged".
Sounds to me like it could be done ... *if* you had per-process filesystem
snapshot capability.
Of course, that's using ICBMs to swat mosquitos. I don't recommend it just
for implementing a mmap() flag.
MfG Kai
Alan Cox <[email protected]> writes:
> > On Thu, 4 Oct 2001, Linus Torvalds wrote:
> >
> > > The reason the kernel refuses to honour it, is that MAP_DENYWRITE is an
> > > excellent DoS-vehicle - you just mmap("/etc/passwd") with MAP_DENYWRITE,
> > > and even root cannot write to it.. Very nasty.
> >
> > <nit>
> > I _really_ doubt that something does write() on /etc/passwd. Create a
> > file and rename it over the thing - sure, but that's it.
> > </nit>
>
> The MAP_DENYWRITE rule was added a long time ago because people found actual
> workable DoS attacks
Do you have any details. I would like to figure out what it takes to
export MAP_DENYWRITE safely to userspace.
Currently checking to see if the file is executable looks good
enough. I don't see any case where this would be a problem, unless
someone has set their permissions wrong.
The fix for bad permission (during a DOS attack) is either:
chmod correct_permissions foo
lsof foo | xargs kill
or:
chmod correct_permissions foo
mv foo bar
cp -a bar foo
rm bar
Which looks fairly straight forward.
Eric
On 5 Oct 2001, Eric W. Biederman wrote:
> > The MAP_DENYWRITE rule was added a long time ago because people found actual
> > workable DoS attacks
>
> Do you have any details. I would like to figure out what it takes to
> export MAP_DENYWRITE safely to userspace.
I think it literally was /var/run/[uw]tmp, and using MAP_DENYWRITE to
disable all logins.
But it pretty much covers _any_ logfiles that are readable (and thus
openable) by users.
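
The refusal discussed here can be observed from user space. The following is a minimal sketch, assuming a Linux system where mmap() silently drops MAP_DENYWRITE (as the `flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE)` lines in the patch further down do); denywrite_is_ignored() is a hypothetical helper name, not an existing API.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_DENYWRITE
#define MAP_DENYWRITE 0   /* assumption: libc does not expose the flag */
#endif

/*
 * Map one byte of `path` with MAP_DENYWRITE, then try to open the same
 * file for writing while the mapping is still in place.
 * Returns 1 if the write-open succeeds (the flag was ignored),
 * 0 if it was refused, -1 if the setup itself failed.
 */
int denywrite_is_ignored(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, 1, PROT_READ, MAP_PRIVATE | MAP_DENYWRITE, fd, 0);
    close(fd);                       /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;

    int wfd = open(path, O_WRONLY);  /* would fail if the flag were honoured */
    int ignored = (wfd >= 0);
    if (wfd >= 0)
        close(wfd);
    munmap(p, 1);
    return ignored;
}
```

If an ordinary user could make that second open() fail on a world-readable log file, the login DoS described above would follow directly.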
> Currently checking to see if the file is executable looks good
> enough.
[ executable by the user in question, not just anybody ]
Yes, I suspect it is.
> The fix for bad permission (during a DOS attack) is either:
> chmod correct_permissions foo
> lsof foo | xargs kill
Well, if you cannot log in as root, it doesn't much matter what the "fix"
is, so it's better to be safe than sorry.
Linus
Linus Torvalds wrote:
> Trust me. The people who came up with MAP_COPY were stupid. Really. It's
> an idiotic concept, and it's not worth implementing.
I can think of an efficiency-related use for MAP_COPY, and it has
nothing to do with shared libraries:
- An editor using mmap() to read a file.
The existing semantics require that you either call read() and waste
(potentially shared) memory to do this, or use MAP_PRIVATE and then
deliberately page in and dirty all of the file's pages.
Neither of these seem to be the most efficient way to launch an editor.
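
The second workaround (MAP_PRIVATE plus deliberately dirtying every page) might look like the sketch below. It assumes Linux behaviour, where a MAP_PRIVATE page becomes a private copy once it has been written to; snapshot_map() is a hypothetical name chosen for illustration.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the whole file MAP_PRIVATE and write to one byte in every page so
 * that each page is copied into private memory.
 * Returns the mapping (NULL on failure) and stores its length in *len_out.
 */
char *snapshot_map(int fd, size_t *len_out)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return NULL;

    size_t len = (size_t)st.st_size;
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED)
        return NULL;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p = base;         /* volatile: keep the self-assignment */
    for (size_t off = 0; off < len; off += page)
        p[off] = p[off];             /* write fault: kernel copies the page */

    *len_out = len;
    return base;
}
```

This is exactly the cost being complained about: every page is faulted in and duplicated up front, so nothing is shared with the page cache any more.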
cheers,
-- Jamie
On Sat, 13 Oct 2001, Jamie Lokier wrote:
>
> I can think of an efficiency-related use for MAP_COPY, and it has
> nothing to do with shared libraries:
>
> - An editor using mmap() to read a file.
No, you're thinking the wrong way.
Trust me, MAP_COPY really _is_ stupid, and the Hurd is a piece of crap.
People who think MAP_COPY is a good idea are people who cannot think about
the implications of it, and cannot think about the alternatives.
In particular, you claim that you could use "mmap()" for "read()", and
speed up the application that way. Ok, fair enough.
Now, somebody who _isn't_ stupid (and that, of course, is me), immediately
goes "well, _duh_, why don't you speed up read() instead?".
The fact is, all the problems that "MAP_COPY" has just go away if you
instead of thinking about a mmap(), you think about doing a "read()" and
just marking the pages PAGE_COPY if they are exclusive.
In short: MAP_COPY is braindamaged, because it doesn't have enough
information at the right level to do a reasonable job of it. What people
want to use it for is really to emulate "read()" efficiently using mmap,
and _nothing_ else. That is the only reason for it ever existing, and the
fact is, that clearly shows just how _stupid_ the whole thing is.
You might as well just do a read() in the first place.
Your arguments are
- read() implies a memcpy()
- read() dirties pages and causes more memory pressure
but you don't actually _question_ those arguments.
I will tell you that doing a read() that _acts_ like the MAP_COPY you so
want is a LOT easier than doing MAP_COPY in the first place.
Why?
- a read() call doesn't have any "history" - it doesn't leave (bogus)
VM data around like MAP_COPY does. MAP_COPY says "I want these pages to
have the contents they did _when_I_did_the_mapping_", which is a
temporal shift that just doesn't make sense in any sane VM model, and
which inherently implies versioning.
- a read() can fairly easily just do the optimization
(a) if we're reading a large area
(b) if the offset and the destination are page-aligned
(c) if the page is exclusive (ie no existing other owners)
then
just do the page move instead of the copy, and mark the page as
PAGE_COPY
Every other use of the page that can change it (ie a shared writable
mapping, or a "write()" call) will now check the PAGE_COPY bit on the
_page_, and just say "ok, I'll allocate a new page, and atomically
switch the ones, and leave the old page untouched and remove it from
the page cache"
(And the swap-out logic has to turn a PAGE_COPY page into a swap-cache
page - this is the real downside, because it implies that we will have
to write it out to swap if we're low on memory, unlike a real mmap)
Notice? Same as MAP_COPY, but without any global state.
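
The page-level bookkeeping described above can be modelled in a few lines of ordinary C. This is only a user-space toy, not kernel code: struct toy_page, struct toy_cache and both functions are invented for illustration, and alignment, locking, refcounts and the swap-cache step are all left out.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_page {
    char data[TOY_PAGE_SIZE];
    int  page_copy;            /* stands in for the PAGE_COPY bit */
};

struct toy_cache {
    struct toy_page *page;     /* the cache's current page for one offset */
};

/* "read()" by page move: hand the cache page to the reader and mark it. */
struct toy_page *toy_read_steal(struct toy_cache *c)
{
    c->page->page_copy = 1;
    return c->page;
}

/* "write()" path: leave a PAGE_COPY page alone and switch to a new one. */
int toy_write(struct toy_cache *c, size_t off, const char *src, size_t n)
{
    if (off > TOY_PAGE_SIZE || n > TOY_PAGE_SIZE - off)
        return -1;

    if (c->page->page_copy) {
        struct toy_page *fresh = malloc(sizeof *fresh);
        if (!fresh)
            return -1;
        memcpy(fresh->data, c->page->data, TOY_PAGE_SIZE);
        fresh->page_copy = 0;
        c->page = fresh;       /* old page now belongs to the reader only */
    }
    memcpy(c->page->data + off, src, n);
    return 0;
}
```

The flag lives on the page itself, so the writer needs no record of who mapped what and when; that is the "no global state" property claimed above.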
And notice how this is actually conceptually much closer to what you
actually _want_ to use MAP_COPY for.
Could we implement MAP_COPY as such a read()? Yes, sure. But that's just
confusing the issue - why call it a mmap() at all, when it isn't. The day
when Hurd is so common that we want to emulate its braindamages is not
going to be in my life-time, I suspect.
Linus
On Sat, 13 Oct 2001, Linus Torvalds wrote:
> Trust me, MAP_COPY really _is_ stupid, and the Hurd is a piece of crap.
Isn't it a Mach thing ?
> People who think MAP_COPY is a good idea are people who cannot think about
> the implications of it, and cannot think about the alternatives.
I guess thinking about the implications will come when
the Hurd people seriously start porting their beast to
other microkernels, say L4 ;)
This should be a spectacle worth watching (from a safe
distance).
cheers,
Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)
http://www.surriel.com/ http://distro.conectiva.com/
Whatever solution is chosen, it has to allow overwriting all
executable and library files (given sufficient permissions).
Because:
- If I overwrite a shared library and a running program then crashes, it
will be my fault (as system administrator) or my mistake.. ;-)
- A library file is probably updated as part of a larger update, and I
will probably restart the daemon or program later. So if the program
crashes, I'll fix the problem eventually.
- The previous version of a library that I am replacing can depend on
another file that the installer of the new version simply erases.
For example:
a.so depends on b.so
but
a_new_version.so does not depend on b.so.
When I or an installer install the new program version, b.so gets
erased because the new version doesn't use it.
So what does it matter whether a program can still access the old
version of a.so if b.so was erased?
And in the end, if I decide to update a library, I should be able to do
it (I suspect the same applies to executable files). It doesn't matter
if the change causes a fault in a running program.
It may be that this lets a hacker attack the system... or I could hang
a program when that is not my intent. Maybe a flag in /proc/somewhere
would be a useful thing:
- if it's 1, I can overwrite all library and executable files (if I
have permission, etc.);
- if it's 0, I cannot overwrite anything that is in use.
I only want everybody to respect my right to do the wrong or stupid
thing. This is a system administrator's right :-)
Pablo
Linus Torvalds wrote:
> On Sat, 13 Oct 2001, Jamie Lokier wrote:
> > I can think of an efficiency-related use for MAP_COPY, and it has
> > nothing to do with shared libraries:
> >
> > - An editor using mmap() to read a file.
>
> No, you're thinking the wrong way.
...
> People who think MAP_COPY is a good idea are people who cannot think about
> the implications of it, and cannot think about the alternatives.
:-)
> In particular, you claim that you could use "mmap()" for "read()", and
> speed up the application that way. Ok, fair enough.
>
> Now, somebody who _isn't_ stupid (and that, of course, is me), immediately
> goes "well, _duh_, why don't you speed up read() instead?".
Thanks Linus. You are right, speeding up read() is the right thing to do.
In fact it was proposed here on this list years ago, and I think you
argued against it (TLB flush costs). The costs and kernel
infrastructure have changed and maybe the idea could be revisited now.
Sometimes one has to be an idiot and explore an application of MAP_COPY
to get someone looking at sensible old ideas again :-)
have a nice day,
-- Jamie
Pablo Alcaraz wrote:
> Whatever solution is chosen, it has to allow overwriting all
> executable and library files (given sufficient permissions).
Pablo, there's no need for this. Upgrades to libraries are done by
removing the old file's name from its parent directory and placing the
new file at that name, perhaps using an atomic rename. The old _file_
continues to exist even though its name has been deleted, until the last
program using the library finishes using it, even though the old file
does not have a name any more.
New programs that open the library will find the new file. Both files
exist until the old file finally disappears. At no point is any file's
contents overwritten.
This is why you can upgrade a running linux system including critical
system libraries, and nothing crashes. Usually ;-)
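
The upgrade procedure described above can be sketched in C as follows. This is a minimal illustration: replace_file() and the ".new" temporary suffix are assumptions made for the example, and real installers add ownership, mode and directory-sync handling.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` without ever overwriting its contents: write the new
 * data under a temporary name in the same directory, then rename it
 * into place. Returns 0 on success, -1 on error.
 */
int replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);

    /* rename() switches the name atomically; the old inode stays alive
       for every process that still has it open or mapped. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A process that opened or mapped the old library before the rename keeps reading the old contents, while any later open() of the same name gets the new file.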
-- Jamie
On Sat, 13 Oct 2001, Jamie Lokier wrote:
>
> In fact it was proposed here on this list years ago, and I think you
> argued against it (TLB flush costs). The costs and kernel
> infrastructure have changed and maybe the idea could be revisited now.
It's still not entirely unlikely that doing VM mappings is simply more
expensive than just doing a memcpy. The TLB invalidate is only part of the
issue - you also have the page table walk, the VM lock, and the fact that
PAGE_COPY itself ends up being overhead.
Which is why the PAGE_COPY kind of read() optimization is _probably_ only
worth it if the user asks for it directly (or automatically only for large
reads together with single-threaded applications).
The explicit flag is probably a good idea also because of usage patterns
(PAGE_COPY is a slowdown _if_ the file is actually written to or even
mapped shared).
Linus
Linus Torvalds wrote:
> > In fact it was proposed here on this list years ago, and I think you
> > argued against it (TLB flush costs). The costs and kernel
> > infrastructure have changed and maybe the idea could be revisited now.
>
> It's still not entirely unlikely that doing VM mappings is simply more
> expensive than just doing a memcpy. The TLB invalidate is only part of the
> issue - you also have the page table walk, the VM lock, and the fact that
> PAGE_COPY itself ends up being overhead.
There are applications (GCC comes to mind) which are using mmap() to
read files now because it is measurably faster than read(), for
sufficiently large source files.
I don't know where the optimal costs lie.
-- Jamie
On Thu, Oct 04, 2001 at 10:24:48AM -0400, Richard B. Johnson wrote:
> Major size differences seem to depend upon the C compiler being
> used.
The -Os option could help a bit too.
On Sat, Oct 13, 2001 at 09:46:03PM +0200, Jamie Lokier wrote:
> There are applications (GCC comes to mind) which are using mmap() to
> read files now because it is measurably faster than read(), for
> sufficiently large source files.
But it does have the advantage of allowing the sharing of memory, does
it not?
In article <[email protected]>,
Linus Torvalds <[email protected]> wrote:
>
>The explicit flag is probably a good idea also because of usage patterns
>(PAGE_COPY is a slowdown _if_ the file is actually written to or even
>mapped shared).
Actually, I missed the obvious case: quite often when you do a "read()",
the reader itself will end up writing to the area read into. In which
case doing the PAGE_COPY would also slow down measurably, due to the
extra overhead of the copy-on-write fault (which not just does the copy
that we tried to avoid, but will take a fault and more VM locks).
So if we want to do this optimization, we _definitely_ want it to be
explicitly controlled by a flag, like O_DIRECT is. There are just too
many cases where it's a pessimization, and while the user can often tell
before-hand, the kernel simply cannot.
Linus
Aaron Lehmann <[email protected]> writes:
> On Sat, Oct 13, 2001 at 09:46:03PM +0200, Jamie Lokier wrote:
> > There are applications (GCC comes to mind) which are using mmap() to
> > read files now because it is measurably faster than read(), for
> > sufficiently large source files.
>
> But it does have the advantage of allowing the sharing of memory, does
> it not?
Only if you are going to write to the data.
Eric
Jamie Lokier writes:
> There are applications (GCC comes to mind) which are using mmap() to
> read files now because it is measurably faster than read(), for
> sufficiently large source files.
So? MAP_PRIVATE is just fine for these. The simple solution if you
care about an edit in the middle of a compile is to have your editor
write a new file and do an atomic rename. No half-and-half data
problems, and the VM logic is kept simple (well, relative to what we
have now;-).
Regards,
Richard....
Permanent: [email protected]
Current: [email protected]
On Sat, Oct 13, 2001 at 04:27:48PM -0600, Eric W. Biederman wrote:
> > But it does have the advantage of allowing the sharing of memory, does
> > it not?
>
> Only if you are going to write to the data.
Why? If gcc and another application read the source file with an
mmap() with the right parameters (read-only), it would only be shared
between them, as I understand it. If they both read() the file into
private buffers those can not be shared.
On Sat, 13 Oct 2001, Jamie Lokier wrote:
>
> There are applications (GCC comes to mind) which are using mmap() to
> read files now because it is measurably faster than read(), for
> sufficiently large source files.
>
> I don't know where the optimal costs lie.
The gcc people tested it, and their cut-off point is at 30kB or so.
Anything smaller than that is faster to just "read()".
Now, that's a traditional mmap(), though, which has more overhead than a
"read-with-PAGE_COPY" would have. The pure mmap() approach has the actual
page fault overhead too, along with having to do "fstat()" and "munmap()".
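
A size-based choice between the two paths could look like the sketch below. The 30 kB threshold is taken from the figure quoted above and is not tuned; load_file(), free_file() and struct file_buf are names invented for this example and are not GCC's actual code.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MMAP_CUTOFF (30 * 1024)   /* assumption: the ~30kB cut-off above */

struct file_buf {
    char  *data;
    size_t len;
    int    mapped;   /* 1 if data came from mmap(), 0 if from malloc() */
};

/* Load a file with read() when it is small, with mmap() when it is large. */
int load_file(const char *path, struct file_buf *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    out->len = (size_t)st.st_size;
    out->mapped = 0;
    out->data = NULL;

    if (out->len >= MMAP_CUTOFF) {
        void *p = mmap(NULL, out->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            out->data = p;
            out->mapped = 1;
        }
    }
    if (!out->mapped) {           /* small file, or mmap() failed */
        out->data = malloc(out->len + 1);
        size_t got = 0;
        while (out->data && got < out->len) {
            ssize_t n = read(fd, out->data + got, out->len - got);
            if (n <= 0) {
                free(out->data);
                out->data = NULL;
                break;
            }
            got += (size_t)n;
        }
    }
    close(fd);
    return out->data ? 0 : -1;
}

void free_file(struct file_buf *b)
{
    if (b->mapped)
        munmap(b->data, b->len);
    else
        free(b->data);
}
```

The mmap() branch shows the extra system calls mentioned above: fstat() to learn the length and munmap() to release it, on top of the page faults taken on first access.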
Linus
Linus Torvalds <[email protected]> writes:
> On Sat, 13 Oct 2001, Jamie Lokier wrote:
> >
> > There are applications (GCC comes to mind) which are using mmap() to
> > read files now because it is measurably faster than read(), for
> > sufficiently large source files.
> >
> > I don't know where the optimal costs lie.
>
> The gcc people tested it, and their cut-off point is at 30kB or so.
> Anything smaller than that is faster to just "read()".
>
> Now, that's a traditional mmap(), though, which has more overhead than a
> "read-with-PAGE_COPY" would have. The pure mmap() approach has the actual
> page fault overhead too, along with having to do "fstat()" and "munmap()".
Hmm. read-with-PAGE_COPY may not be any faster than read as you still
read all of the data into memory, so you have almost the same latency.
mmap might work better because of better overlapping of I/O and cpu
processing.
Also read-with-PAGE_COPY has some really interesting implications for the
page out routines. Because anytime you start the page out you have to
copy the page. Not exactly when you want to increase the memory pressure.
And not at all suitable for shared libraries.
Eric
On Sun 14-10-2001 at 08:49, Eric W. Biederman wrote:
> Also read-with-PAGE_COPY has some really interesting implications for the
> page out routines. Because anytime you start the page out you have to
> copy the page. Not exactly when you want to increase the memory pressure.
> And not at all suitable for shared libraries.
I don't quite understand why you would want to swap out a page
marked PAGE_COPY. Doesn't it make sense to special-case it and just
leave it "in the file", as long as it's untouched?
Xav
> My big question is how to correctly define O_EXEC for every
> architecture. But I would like to know if there are objectionable
> parts as well.
It looks totally unworkable. Open() has side effects on a large number of
platforms, and being able to open an exec only file might trigger them
as well as all sorts of other potential problems where files are
marked rwx by accident as is very common.
You narrow the DoS vulnerability and add a whole new set of open based
ones.
This isn't a problem worth solving. Shared libraries are managed by the
superuser. The shared library tools already do the right thing. The
superuser can equally reboot the machine or reformat the disk by accident
anyway.
> This isn't a problem worth solving. Shared libraries are managed by the
> superuser. The shared library tools already do the right thing. The
> superuser can equally reboot the machine or reformat the disk by accident
> anyway.
Sounds _very_ true for me...
cheers, Samium Gromoff
On 14 Oct 2001, Eric W. Biederman wrote:
>
> Hmm. read-with-PAGE_COPY may not be any faster than read as you still
> read all of the data into memory, so you have almost the same latency.
> mmap might work better because of better overlapping of I/O and cpu
> processing.
Most of the time, you either have the IO overhead (and whether you use
read or mmap won't matter all that much, because you're IO limited), or
the thing is cached.
For gcc, it's cached 99% of the time, because most of the IO ends up being
header files (this is, of course, assuming that you're compiling a big
project, but if you're not, the big overhead is in loading _gcc_, not in
the pages it reads).
> Also read-with-PAGE_COPY has some really interesting implications for the
> page out routines. Because anytime you start the page out you have to
> copy the page. Not exactly when you want to increase the memory pressure.
No no. Read my thing again. On swap-out, you just move the thing to the
swap cache.
Sure, that removes it from the regular cache, and that's possibly a
performance problem. But
> And not at all suitable for shared libraries.
No. Why would you "read" shared libraries? read is read, mmap is mmap. If
you want mmap, use mmap. Don't mess it up with MAP_COPY, which is not mmap
at all.
Linus
[email protected] (Linus Torvalds) wrote on 13.10.01 in <[email protected]>:
> Now, somebody who _isn't_ stupid (and that, of course, is me), immediately
> goes "well, _duh_, why don't you speed up read() instead?".
Probably because people think that's hard ... so they invent another thing
that's even harder.
> The fact is, all the problems that "MAP_COPY" has just go away if you
> instead of thinking about a mmap(), you think about doing a "read()" and
> just marking the pages PAGE_COPY if they are exclusive.
That's part of the problem. The other is the idea that mmap only needs to
read those pages actually needed.
Hmm.
Would it be possible - and cheap enough - to do this optimization:
When read()ing a file, *if* nobody else has that inode open (which is
probably impossible to determine with networked filesystems, so one would
probably have to exclude those), create mmap-like mappings where possible
without actually reading the pages; at the moment someone else opens the
file, actually read them in and mark them PAGE_COPY.
Or maybe just do it exclusive of writers, not readers.
(I don't think waiting for actual writes would be sensible.)
MfG Kai
Linus Torvalds <[email protected]> writes:
> On 14 Oct 2001, Eric W. Biederman wrote:
> >
> > Hmm. read-with-PAGE_COPY may not be any faster than read as you still
> > read all of the data into memory, so you have almost the same latency.
> > mmap might work better because of better overlapping of I/O and cpu
> > processing.
>
> Most of the time, you either have the IO overhead (and whether you use
> read or mmap won't matter all that much, because you're IO limited), or
> the thing is cached.
Thanks that makes sense for where you are targeting the performance
improvement.
> For gcc, it's cached 99% of the time, because most of the IO ends up being
> header files (this is, of course, assuming that you're compiling a big
> project, but if you're not, the big overhead is in loading _gcc_, not in
> the pages it reads).
O.k. So the case that matters is when you are repeatedly reading from
the cache.
> > Also read-with-PAGE_COPY has some really interesting implications for the
> > page out routines. Because anytime you start the page out you have to
> > copy the page. Not exactly when you want to increase the memory pressure.
>
> No no. Read my thing again. On swap-out, you just move the thing to the
> swap cache.
>
> Sure, that removes it from the regular cache, and that's possibly a
> performance problem. But
On swap-out the optimization to steal the page from the page cache
makes it much less of a problem. But you still have to be prepared to
do the copy. As there might be multiple users of the page, in which
case you can't steal the page cache copy.
> > And not at all suitable for shared libraries.
>
> No. Why would you "read" shared libraries? read is read, mmap is mmap. If
> you want mmap, use mmap. Don't mess it up with MAP_COPY, which is not mmap
> at all.
Linus I'm sure you realized that. I'm not certain the whole rest of the world
did. And the shared library topic is what got this discussion going.
Hmm. So what you are looking at is something similar to O_DIRECT,
but that, instead of doing the I/O and bypassing the page cache,
will borrow the page cache's copies of the pages.
Eric
Alan Cox <[email protected]> writes:
> > My big question is how to correctly define O_EXEC for every
> > architecture. But I would like to know if there are objectionable
> > parts as well.
>
> It looks totally unworkable. Open() has side effects on a large number of
> platforms, and being able to open an exec only file might trigger them
> as well as all sorts of other potential problems where files are
> marked rwx by accident as is very common.
We already can open an exec only file just open("file", 0).
In fact it looks like you can open a file with no permissions at all.
You just can't do anything with it.
All O_EXEC does is stipulate that you must have the exec permission
to the file, and it does cause a side effect. Possibly it should
be broken into open, and then side effect. fcntl(fd,F_DENYWRITE).
My primary observation is that we don't need to manage the DENYWRITE
at the mmap level. The file descriptor level gets the job done with
less code, fewer surprises, fewer races.
> You narrow the DoS vulnerability and add a whole new set of open based
> ones.
You may be right. With the cleanup of the implementation by moving
everything into open (where we implement this for exec), it hadn't
occurred to me that I might be opening a different kettle of fish.
> This isn't a problem worth solving. Shared libraries are managed by the
> superuser. The shared library tools already do the right thing. The
> superuser can equally reboot the machine or reformat the disk by accident
> anyway.
Yes the superuser can shoot himself in the foot, and by that argument
I should delete the entire implementation of MAP_DENYWRITE from the
kernel.
It is by no means true that the existing user space tools get it
right. I have multiple shared libraries where the owner has write
permission to them. And I do believe gcc -o foo.so does not do a
unlink/open(O_CREAT) pair. Nor does cp.
As for the superuser being the only one who touches shared libraries:
that is as true as it is that the superuser is the only one who
touches binaries, or scripts.
It is also quite unobvious that you shouldn't write to shared
libraries. If you have looked at how shared libraries are mapped,
you know that they are mapped into memory with mmap(MAP_PRIVATE), and
you understand how mmap works, it is quite obvious why you shouldn't
touch them. There are a lot of users that haven't done that however.
Accidental rwx permissions settings may indeed be a valid argument,
though I think that is more a bug in chmod, than anything else.
Eric
You could also look at TOPS-20 for things they did well or unwell. All
disk I/O in TOPS-20 is done by the VM code; the funky SIN%/SOUT% etc.
simply adjust the mapping window and copy stuff, letting the VM subsystem
schedule I/O as needed.
--
Mark H. Wood, Lead System Programmer [email protected]
Make a good day.
> We already can open an exec only file just open("file", 0).
Wrong.
> In fact it looks like you can open a file with no permissions at all.
> You just can't do anything with it.
This isn't true. Read the code.
Alan
On Mon, 15 Oct 2001, Alan Cox wrote:
>
> > We already can open an exec only file just open("file", 0).
>
> Wrong.
For exact details: there's a magic value, but it's 3.
open("file", 3)
will open the filename with neither read nor write permissions, but it
will actually _require_ both read- and write- permissions, and it was
historically a way to open a device just for ioctl's.
I don't think anybody actually uses it any more.
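
The behaviour of the magic value can be probed from user space. The sketch below assumes Linux, where access mode 3 yields a descriptor usable for neither reading nor writing; probe_mode3() is a hypothetical helper written for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` with access mode 3 and attempt a one-byte read().
 * Returns the errno of the failed read(), 0 if the read unexpectedly
 * succeeded, or -1 if open() itself failed (for example because the
 * caller lacks read or write permission on the file).
 */
int probe_mode3(const char *path)
{
    int fd = open(path, 3);   /* none of O_RDONLY, O_WRONLY, O_RDWR */
    if (fd < 0)
        return -1;

    char c;
    int result = 0;
    if (read(fd, &c, 1) < 0)
        result = errno;       /* expected: EBADF, fd not open for reading */
    close(fd);
    return result;
}
```

On a file the caller may both read and write, the open() succeeds and the read() is then rejected with EBADF, leaving a descriptor that is only good for calls such as ioctl().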
Linus
Linus Torvalds <[email protected]> writes:
> On Mon, 15 Oct 2001, Alan Cox wrote:
> >
> > > We already can open an exec only file just open("file", 0).
> >
> > Wrong.
Grr. All of that munging about with the permission bits confused me.
We do allow a no permission open in open_namei (which is what
I was staring at), we just can't generate it with filp_open. So my
open_exec needs to call open_namei directly oh well. I never intended
to allow a weird case like that, or imply that I would allow it.
Since I now have the permission checks separate between user space
and kernel space, I have added an additional check that O_EXEC
fails if anyone has write permission to the file. So the permissions
must be r-xr-xr-x on the file.
Just to reiterate I see this as a solution to two problems
1) It adds an additional safety check that shared libraries won't
mutate under you.
2) It allows user space access to the security policy information
regarding executables, allowing ld-linux.so to refuse to
execute binaries and shared libraries on a filesystem mounted
noexec.
And even if this doesn't get exported to user space I see this as
a small code cleanup that simplifies the MAP_DENYWRITE implementation.
My biggest unresolved issue is which numbers to choose for O_EXEC on
every platform. As the DENYWRITE code is cleaner in open than in mmap.
Eric
diff -uNrX linux-ignore-files linux-2.4.12/Documentation/Changes linux-2.4.12.eb3/Documentation/Changes
--- linux-2.4.12/Documentation/Changes Sat Oct 13 16:19:40 2001
+++ linux-2.4.12.eb3/Documentation/Changes Sun Oct 14 01:39:38 2001
@@ -126,6 +126,13 @@
32-bit UID support is now in place. Have fun!
+A new flag O_EXEC has been added to the open call. While a file
+is opened O_EXEC you get ETXTBSY errors if you attempt to write it.
+This allows shared libraries to have the same guarantee against
+changes as normal executables do. The permissions requirements
+to open a file O_EXEC are (a) no one has it open read/write and
+(b) that the open has execute permissions on the file.
+
Linux documentation for functions is transitioning to inline
documentation via specially-formatted comments near their
definitions in the source. These comments can be combined with the
diff -uNrX linux-ignore-files linux-2.4.12/arch/alpha/kernel/osf_sys.c linux-2.4.12.eb3/arch/alpha/kernel/osf_sys.c
--- linux-2.4.12/arch/alpha/kernel/osf_sys.c Sat Oct 13 16:18:56 2001
+++ linux-2.4.12.eb3/arch/alpha/kernel/osf_sys.c Sat Oct 13 17:15:04 2001
@@ -240,7 +240,7 @@
if (!file)
goto out;
}
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
ret = do_mmap(file, addr, len, prot, flags, off);
up_write(&current->mm->mmap_sem);
diff -uNrX linux-ignore-files linux-2.4.12/arch/arm/kernel/sys_arm.c linux-2.4.12.eb3/arch/arm/kernel/sys_arm.c
--- linux-2.4.12/arch/arm/kernel/sys_arm.c Wed Jul 25 03:08:24 2001
+++ linux-2.4.12.eb3/arch/arm/kernel/sys_arm.c Sat Oct 13 17:07:57 2001
@@ -58,7 +58,7 @@
int error = -EINVAL;
struct file * file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
/*
* If we are doing a fixed mapping, and address < PAGE_SIZE,
diff -uNrX linux-ignore-files linux-2.4.12/arch/cris/kernel/sys_cris.c linux-2.4.12.eb3/arch/cris/kernel/sys_cris.c
--- linux-2.4.12/arch/cris/kernel/sys_cris.c Sat Aug 11 09:50:04 2001
+++ linux-2.4.12.eb3/arch/cris/kernel/sys_cris.c Sat Oct 13 17:14:06 2001
@@ -52,7 +52,7 @@
int error = -EBADF;
struct file * file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/i386/kernel/sys_i386.c linux-2.4.12.eb3/arch/i386/kernel/sys_i386.c
--- linux-2.4.12/arch/i386/kernel/sys_i386.c Sat Apr 14 13:36:44 2001
+++ linux-2.4.12.eb3/arch/i386/kernel/sys_i386.c Sat Oct 13 17:08:07 2001
@@ -48,7 +48,7 @@
int error = -EBADF;
struct file * file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/ia64/ia32/sys_ia32.c linux-2.4.12.eb3/arch/ia64/ia32/sys_ia32.c
--- linux-2.4.12/arch/ia64/ia32/sys_ia32.c Sat Oct 13 16:18:57 2001
+++ linux-2.4.12.eb3/arch/ia64/ia32/sys_ia32.c Sat Oct 13 17:15:34 2001
@@ -282,7 +282,7 @@
long error = -EFAULT;
unsigned int poff;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
prot |= PROT_EXEC;
if ((flags & MAP_FIXED) && ((addr & ~PAGE_MASK) || (offset & ~PAGE_MASK)))
diff -uNrX linux-ignore-files linux-2.4.12/arch/ia64/kernel/sys_ia64.c linux-2.4.12.eb3/arch/ia64/kernel/sys_ia64.c
--- linux-2.4.12/arch/ia64/kernel/sys_ia64.c Sat Aug 11 09:50:05 2001
+++ linux-2.4.12.eb3/arch/ia64/kernel/sys_ia64.c Sat Oct 13 17:10:39 2001
@@ -178,7 +178,7 @@
unsigned long roff;
struct file *file = 0;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/m68k/kernel/sys_m68k.c linux-2.4.12.eb3/arch/m68k/kernel/sys_m68k.c
--- linux-2.4.12/arch/m68k/kernel/sys_m68k.c Wed Jul 25 03:07:49 2001
+++ linux-2.4.12.eb3/arch/m68k/kernel/sys_m68k.c Sat Oct 13 17:11:13 2001
@@ -52,7 +52,7 @@
int error = -EBADF;
struct file * file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
@@ -104,7 +104,7 @@
if (a.offset & ~PAGE_MASK)
goto out;
- a.flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ a.flags &= ~(MAP_EXECUTABLE);
error = do_mmap2(a.addr, a.len, a.prot, a.flags, a.fd, a.offset >> PAGE_SHIFT);
out:
@@ -144,7 +144,7 @@
if (!file)
goto out;
}
- a.flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ a.flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
error = do_mmap_pgoff(file, a.addr, a.len, a.prot, a.flags, pgoff);
diff -uNrX linux-ignore-files linux-2.4.12/arch/mips/kernel/irixelf.c linux-2.4.12.eb3/arch/mips/kernel/irixelf.c
--- linux-2.4.12/arch/mips/kernel/irixelf.c Sat Apr 14 13:36:44 2001
+++ linux-2.4.12.eb3/arch/mips/kernel/irixelf.c Sun Oct 14 00:35:14 2001
@@ -298,7 +298,7 @@
eppnt = elf_phdata;
for(i=0; i<interp_elf_ex->e_phnum; i++, eppnt++) {
if(eppnt->p_type == PT_LOAD) {
- int elf_type = MAP_PRIVATE | MAP_DENYWRITE;
+ int elf_type = MAP_PRIVATE;
int elf_prot = 0;
unsigned long vaddr = 0;
if (eppnt->p_flags & PF_R) elf_prot = PROT_READ;
@@ -479,7 +479,7 @@
return 0;
}
-#define EXEC_MAP_FLAGS (MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE)
+#define EXEC_MAP_FLAGS (MAP_FIXED | MAP_PRIVATE | MAP_EXECUTABLE)
static inline void map_executable(struct file *fp, struct elf_phdr *epp, int pnum,
unsigned int *estack, unsigned int *laddr,
@@ -776,7 +776,6 @@
return retval;
out_free_dentry:
- allow_write_access(interpreter);
fput(interpreter);
out_free_interp:
if (elf_interpreter)
@@ -842,7 +841,7 @@
elf_phdata->p_vaddr & 0xfffff000,
elf_phdata->p_filesz + (elf_phdata->p_vaddr & 0xfff),
PROT_READ | PROT_WRITE | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+ MAP_FIXED | MAP_PRIVATE,
elf_phdata->p_offset & 0xfffff000);
up_write(&current->mm->mmap_sem);
@@ -919,7 +918,7 @@
down_write(&current->mm->mmap_sem);
retval = do_mmap(filp, (hp->p_vaddr & 0xfffff000),
(hp->p_filesz + (hp->p_vaddr & 0xfff)),
- prot, (MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE),
+ prot, (MAP_FIXED | MAP_PRIVATE),
(hp->p_offset & 0xfffff000));
up_write(&current->mm->mmap_sem);
diff -uNrX linux-ignore-files linux-2.4.12/arch/mips/kernel/syscall.c linux-2.4.12.eb3/arch/mips/kernel/syscall.c
--- linux-2.4.12/arch/mips/kernel/syscall.c Wed Jul 25 03:07:50 2001
+++ linux-2.4.12.eb3/arch/mips/kernel/syscall.c Sat Oct 13 17:11:32 2001
@@ -62,7 +62,7 @@
int error = -EBADF;
struct file * file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/mips/kernel/sysirix.c linux-2.4.12.eb3/arch/mips/kernel/sysirix.c
--- linux-2.4.12/arch/mips/kernel/sysirix.c Sat Oct 13 16:18:57 2001
+++ linux-2.4.12.eb3/arch/mips/kernel/sysirix.c Sat Oct 13 17:16:16 2001
@@ -1080,7 +1080,7 @@
}
}
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
retval = do_mmap(file, addr, len, prot, flags, offset);
@@ -1640,7 +1640,7 @@
}
}
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
error = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
diff -uNrX linux-ignore-files linux-2.4.12/arch/mips64/kernel/syscall.c linux-2.4.12.eb3/arch/mips64/kernel/syscall.c
--- linux-2.4.12/arch/mips64/kernel/syscall.c Sat Oct 13 16:19:43 2001
+++ linux-2.4.12.eb3/arch/mips64/kernel/syscall.c Sat Oct 13 17:11:45 2001
@@ -63,7 +63,7 @@
if (!file)
goto out;
}
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
error = do_mmap(file, addr, len, prot, flags, offset);
diff -uNrX linux-ignore-files linux-2.4.12/arch/parisc/kernel/sys_parisc.c linux-2.4.12.eb3/arch/parisc/kernel/sys_parisc.c
--- linux-2.4.12/arch/parisc/kernel/sys_parisc.c Sat Apr 14 13:36:45 2001
+++ linux-2.4.12.eb3/arch/parisc/kernel/sys_parisc.c Sat Oct 13 17:13:55 2001
@@ -59,7 +59,7 @@
if (!file)
goto out;
}
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
error = do_mmap(file, addr, len, prot, flags, offset);
if (file != NULL)
fput(file);
diff -uNrX linux-ignore-files linux-2.4.12/arch/ppc/kernel/syscalls.c linux-2.4.12.eb3/arch/ppc/kernel/syscalls.c
--- linux-2.4.12/arch/ppc/kernel/syscalls.c Wed Jul 25 03:07:25 2001
+++ linux-2.4.12.eb3/arch/ppc/kernel/syscalls.c Sat Oct 13 17:12:01 2001
@@ -196,7 +196,7 @@
struct file * file = NULL;
int ret = -EBADF;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
if (!(file = fget(fd)))
goto out;
diff -uNrX linux-ignore-files linux-2.4.12/arch/s390/kernel/sys_s390.c linux-2.4.12.eb3/arch/s390/kernel/sys_s390.c
--- linux-2.4.12/arch/s390/kernel/sys_s390.c Sat Apr 14 13:36:45 2001
+++ linux-2.4.12.eb3/arch/s390/kernel/sys_s390.c Sat Oct 13 17:13:33 2001
@@ -54,7 +54,7 @@
int error = -EBADF;
struct file * file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/s390x/kernel/linux32.c linux-2.4.12.eb3/arch/s390x/kernel/linux32.c
--- linux-2.4.12/arch/s390x/kernel/linux32.c Sat Oct 13 16:18:57 2001
+++ linux-2.4.12.eb3/arch/s390x/kernel/linux32.c Sat Oct 13 20:13:59 2001
@@ -2923,12 +2923,10 @@
bprm.loader = 0;
bprm.exec = 0;
if ((bprm.argc = count32(argv)) < 0) {
- allow_write_access(file);
fput(file);
return bprm.argc;
}
if ((bprm.envc = count32(envp)) < 0) {
- allow_write_access(file);
fput(file);
return bprm.envc;
}
@@ -2957,7 +2955,6 @@
out:
/* Something went wrong, return the inode and free the argument pages*/
- allow_write_access(bprm.file);
if (bprm.file)
fput(bprm.file);
@@ -4179,7 +4176,7 @@
struct file * file = NULL;
unsigned long error = -EBADF;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/s390x/kernel/sys_s390.c linux-2.4.12.eb3/arch/s390x/kernel/sys_s390.c
--- linux-2.4.12/arch/s390x/kernel/sys_s390.c Thu May 3 01:46:43 2001
+++ linux-2.4.12.eb3/arch/s390x/kernel/sys_s390.c Sat Oct 13 17:14:23 2001
@@ -54,7 +54,7 @@
long error = -EBADF;
struct file * file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/sh/kernel/sys_sh.c linux-2.4.12.eb3/arch/sh/kernel/sys_sh.c
--- linux-2.4.12/arch/sh/kernel/sys_sh.c Sat Oct 13 16:19:45 2001
+++ linux-2.4.12.eb3/arch/sh/kernel/sys_sh.c Sat Oct 13 17:12:17 2001
@@ -89,7 +89,7 @@
int error = -EBADF;
struct file *file = NULL;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
if (!(flags & MAP_ANONYMOUS)) {
file = fget(fd);
if (!file)
diff -uNrX linux-ignore-files linux-2.4.12/arch/sparc/kernel/sys_sparc.c linux-2.4.12.eb3/arch/sparc/kernel/sys_sparc.c
--- linux-2.4.12/arch/sparc/kernel/sys_sparc.c Thu May 3 01:46:43 2001
+++ linux-2.4.12.eb3/arch/sparc/kernel/sys_sparc.c Sat Oct 13 17:12:40 2001
@@ -240,7 +240,7 @@
if (len > TASK_SIZE - PAGE_SIZE || addr + len > TASK_SIZE - PAGE_SIZE)
goto out_putf;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
retval = do_mmap_pgoff(file, addr, len, prot, flags, pgoff);
diff -uNrX linux-ignore-files linux-2.4.12/arch/sparc/kernel/sys_sunos.c linux-2.4.12.eb3/arch/sparc/kernel/sys_sunos.c
--- linux-2.4.12/arch/sparc/kernel/sys_sunos.c Sat Oct 13 16:18:57 2001
+++ linux-2.4.12.eb3/arch/sparc/kernel/sys_sunos.c Sun Oct 14 00:35:47 2001
@@ -115,7 +115,7 @@
goto out_putf;
}
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
retval = do_mmap(file, addr, len, prot, flags, off);
up_write(&current->mm->mmap_sem);
diff -uNrX linux-ignore-files linux-2.4.12/arch/sparc64/kernel/binfmt_aout32.c linux-2.4.12.eb3/arch/sparc64/kernel/binfmt_aout32.c
--- linux-2.4.12/arch/sparc64/kernel/binfmt_aout32.c Wed Jul 25 03:08:26 2001
+++ linux-2.4.12.eb3/arch/sparc64/kernel/binfmt_aout32.c Sun Oct 14 00:36:22 2001
@@ -280,7 +280,7 @@
down_write(&current->mm->mmap_sem);
error = do_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
PROT_READ | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
+ MAP_FIXED | MAP_PRIVATE | MAP_EXECUTABLE,
fd_offset);
up_write(&current->mm->mmap_sem);
@@ -292,7 +292,7 @@
down_write(&current->mm->mmap_sem);
error = do_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
PROT_READ | PROT_WRITE | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
+ MAP_FIXED | MAP_PRIVATE | MAP_EXECUTABLE,
fd_offset + ex.a_text);
up_write(&current->mm->mmap_sem);
if (error != N_DATADDR(ex)) {
@@ -372,7 +372,7 @@
down_write(&current->mm->mmap_sem);
error = do_mmap(file, start_addr, ex.a_text + ex.a_data,
PROT_READ | PROT_WRITE | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+ MAP_FIXED | MAP_PRIVATE,
N_TXTOFF(ex));
up_write(&current->mm->mmap_sem);
retval = error;
diff -uNrX linux-ignore-files linux-2.4.12/arch/sparc64/kernel/sys_sparc.c linux-2.4.12.eb3/arch/sparc64/kernel/sys_sparc.c
--- linux-2.4.12/arch/sparc64/kernel/sys_sparc.c Thu May 3 01:46:44 2001
+++ linux-2.4.12.eb3/arch/sparc64/kernel/sys_sparc.c Sat Oct 13 17:13:19 2001
@@ -277,7 +277,7 @@
if (!file)
goto out;
}
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
len = PAGE_ALIGN(len);
retval = -EINVAL;
diff -uNrX linux-ignore-files linux-2.4.12/arch/sparc64/kernel/sys_sparc32.c linux-2.4.12.eb3/arch/sparc64/kernel/sys_sparc32.c
--- linux-2.4.12/arch/sparc64/kernel/sys_sparc32.c Sat Oct 13 16:21:40 2001
+++ linux-2.4.12.eb3/arch/sparc64/kernel/sys_sparc32.c Sat Oct 13 20:28:49 2001
@@ -2998,12 +2998,10 @@
bprm.loader = 0;
bprm.exec = 0;
if ((bprm.argc = count32(argv, bprm.p / sizeof(u32))) < 0) {
- allow_write_access(file);
fput(file);
return bprm.argc;
}
if ((bprm.envc = count32(envp, bprm.p / sizeof(u32))) < 0) {
- allow_write_access(file);
fput(file);
return bprm.envc;
}
@@ -3032,7 +3030,6 @@
out:
/* Something went wrong, return the inode and free the argument pages*/
- allow_write_access(bprm.file);
if (bprm.file)
fput(bprm.file);
diff -uNrX linux-ignore-files linux-2.4.12/arch/sparc64/kernel/sys_sunos32.c linux-2.4.12.eb3/arch/sparc64/kernel/sys_sunos32.c
--- linux-2.4.12/arch/sparc64/kernel/sys_sunos32.c Sat Oct 13 16:18:58 2001
+++ linux-2.4.12.eb3/arch/sparc64/kernel/sys_sunos32.c Sat Oct 13 17:16:57 2001
@@ -99,7 +99,7 @@
ret_type = flags & _MAP_NEW;
flags &= ~_MAP_NEW;
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
down_write(&current->mm->mmap_sem);
retval = do_mmap(file,
(unsigned long) addr, (unsigned long) len,
diff -uNrX linux-ignore-files linux-2.4.12/arch/sparc64/solaris/misc.c linux-2.4.12.eb3/arch/sparc64/solaris/misc.c
--- linux-2.4.12/arch/sparc64/solaris/misc.c Sat Oct 13 16:19:45 2001
+++ linux-2.4.12.eb3/arch/sparc64/solaris/misc.c Sat Oct 13 17:17:17 2001
@@ -93,7 +93,7 @@
flags &= ~_MAP_NEW;
down_write(&current->mm->mmap_sem);
- flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
+ flags &= ~(MAP_EXECUTABLE);
retval = do_mmap(file,
(unsigned long) addr, (unsigned long) len,
(unsigned long) prot, (unsigned long) flags, off);
Binary files linux-2.4.12/drivers/net/hamradio/soundmodem/gentbl and linux-2.4.12.eb3/drivers/net/hamradio/soundmodem/gentbl differ
diff -uNrX linux-ignore-files linux-2.4.12/fs/binfmt_aout.c linux-2.4.12.eb3/fs/binfmt_aout.c
--- linux-2.4.12/fs/binfmt_aout.c Sat Oct 13 16:21:50 2001
+++ linux-2.4.12.eb3/fs/binfmt_aout.c Sun Oct 14 00:59:39 2001
@@ -380,7 +380,7 @@
down_write(&current->mm->mmap_sem);
error = do_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
PROT_READ | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
+ MAP_FIXED | MAP_PRIVATE | MAP_EXECUTABLE,
fd_offset);
up_write(&current->mm->mmap_sem);
@@ -392,7 +392,7 @@
down_write(&current->mm->mmap_sem);
error = do_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
PROT_READ | PROT_WRITE | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
+ MAP_FIXED | MAP_PRIVATE | MAP_EXECUTABLE,
fd_offset + ex.a_text);
up_write(&current->mm->mmap_sem);
if (error != N_DATADDR(ex)) {
@@ -479,7 +479,7 @@
down_write(&current->mm->mmap_sem);
error = do_mmap(file, start_addr, ex.a_text + ex.a_data,
PROT_READ | PROT_WRITE | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+ MAP_FIXED | MAP_PRIVATE,
N_TXTOFF(ex));
up_write(&current->mm->mmap_sem);
retval = error;
diff -uNrX linux-ignore-files linux-2.4.12/fs/binfmt_elf.c linux-2.4.12.eb3/fs/binfmt_elf.c
--- linux-2.4.12/fs/binfmt_elf.c Sat Oct 13 16:21:50 2001
+++ linux-2.4.12.eb3/fs/binfmt_elf.c Sun Oct 14 00:38:37 2001
@@ -288,7 +288,7 @@
eppnt = elf_phdata;
for (i=0; i<interp_elf_ex->e_phnum; i++, eppnt++) {
if (eppnt->p_type == PT_LOAD) {
- int elf_type = MAP_PRIVATE | MAP_DENYWRITE;
+ int elf_type = MAP_PRIVATE;
int elf_prot = 0;
unsigned long vaddr = 0;
unsigned long k, map_addr;
@@ -642,7 +642,7 @@
if (elf_ppnt->p_flags & PF_W) elf_prot |= PROT_WRITE;
if (elf_ppnt->p_flags & PF_X) elf_prot |= PROT_EXEC;
- elf_flags = MAP_PRIVATE|MAP_DENYWRITE|MAP_EXECUTABLE;
+ elf_flags = MAP_PRIVATE|MAP_EXECUTABLE;
vaddr = elf_ppnt->p_vaddr;
if (elf_ex.e_type == ET_EXEC || load_addr_set) {
@@ -701,7 +701,6 @@
interpreter,
&interp_load_addr);
- allow_write_access(interpreter);
fput(interpreter);
kfree(elf_interpreter);
@@ -789,7 +788,6 @@
/* error cleanup */
out_free_dentry:
- allow_write_access(interpreter);
fput(interpreter);
out_free_interp:
if (elf_interpreter)
@@ -853,7 +851,7 @@
(elf_phdata->p_filesz +
ELF_PAGEOFFSET(elf_phdata->p_vaddr)),
PROT_READ | PROT_WRITE | PROT_EXEC,
- MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+ MAP_FIXED | MAP_PRIVATE,
(elf_phdata->p_offset -
ELF_PAGEOFFSET(elf_phdata->p_vaddr)));
up_write(&current->mm->mmap_sem);
diff -uNrX linux-ignore-files linux-2.4.12/fs/binfmt_em86.c linux-2.4.12.eb3/fs/binfmt_em86.c
--- linux-2.4.12/fs/binfmt_em86.c Sat Mar 17 16:28:29 2001
+++ linux-2.4.12.eb3/fs/binfmt_em86.c Sat Oct 13 20:12:38 2001
@@ -44,7 +44,6 @@
}
bprm->sh_bang++; /* Well, the bang-shell is implicit... */
- allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff -uNrX linux-ignore-files linux-2.4.12/fs/binfmt_misc.c linux-2.4.12.eb3/fs/binfmt_misc.c
--- linux-2.4.12/fs/binfmt_misc.c Sat Mar 17 16:28:29 2001
+++ linux-2.4.12.eb3/fs/binfmt_misc.c Sat Oct 13 20:12:46 2001
@@ -201,7 +201,6 @@
if (!fmt)
goto _ret;
- allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff -uNrX linux-ignore-files linux-2.4.12/fs/binfmt_script.c linux-2.4.12.eb3/fs/binfmt_script.c
--- linux-2.4.12/fs/binfmt_script.c Sat Mar 17 16:28:29 2001
+++ linux-2.4.12.eb3/fs/binfmt_script.c Sat Oct 13 20:12:30 2001
@@ -29,7 +29,6 @@
*/
bprm->sh_bang++;
- allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff -uNrX linux-ignore-files linux-2.4.12/fs/exec.c linux-2.4.12.eb3/fs/exec.c
--- linux-2.4.12/fs/exec.c Sat Oct 13 16:20:02 2001
+++ linux-2.4.12.eb3/fs/exec.c Mon Oct 15 03:46:21 2001
@@ -337,39 +337,17 @@
struct file *open_exec(const char *name)
{
+ int error;
struct nameidata nd;
- struct inode *inode;
- struct file *file;
- int err = 0;
- if (path_init(name, LOOKUP_FOLLOW|LOOKUP_POSITIVE, &nd))
- err = path_walk(name, &nd);
- file = ERR_PTR(err);
- if (!err) {
- inode = nd.dentry->d_inode;
- file = ERR_PTR(-EACCES);
- if (!(nd.mnt->mnt_flags & MNT_NOEXEC) &&
- S_ISREG(inode->i_mode)) {
- int err = permission(inode, MAY_EXEC);
- if (!err && !(inode->i_mode & 0111))
- err = -EACCES;
- file = ERR_PTR(err);
- if (!err) {
- file = dentry_open(nd.dentry, nd.mnt, O_RDONLY);
- if (!IS_ERR(file)) {
- err = deny_write_access(file);
- if (err) {
- fput(file);
- file = ERR_PTR(err);
- }
- }
-out:
- return file;
- }
- }
- path_release(&nd);
- }
- goto out;
+ /* For a real exec we cheat. We don't do permission checks
+ * for read, but we open the file for reading anyway...
+ */
+ error = open_namei(name, O_EXEC, 0, &nd);
+ if (!error)
+ return dentry_open(nd.dentry, nd.mnt, O_EXEC | O_RDONLY);
+
+ return ERR_PTR(error);
}
int kernel_read(struct file *file, unsigned long offset,
@@ -774,7 +752,6 @@
struct file * file;
unsigned long loader;
- allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
@@ -809,7 +786,6 @@
retval = fn(bprm, regs);
if (retval >= 0) {
put_binfmt(fmt);
- allow_write_access(bprm->file);
if (bprm->file)
fput(bprm->file);
bprm->file = NULL;
@@ -871,13 +847,11 @@
bprm.loader = 0;
bprm.exec = 0;
if ((bprm.argc = count(argv, bprm.p / sizeof(void *))) < 0) {
- allow_write_access(file);
fput(file);
return bprm.argc;
}
if ((bprm.envc = count(envp, bprm.p / sizeof(void *))) < 0) {
- allow_write_access(file);
fput(file);
return bprm.envc;
}
@@ -906,7 +880,6 @@
out:
/* Something went wrong, return the inode and free the argument pages*/
- allow_write_access(bprm.file);
if (bprm.file)
fput(bprm.file);
diff -uNrX linux-ignore-files linux-2.4.12/fs/file_table.c linux-2.4.12.eb3/fs/file_table.c
--- linux-2.4.12/fs/file_table.c Sat Oct 13 16:20:02 2001
+++ linux-2.4.12.eb3/fs/file_table.c Sat Oct 13 20:00:23 2001
@@ -114,6 +114,8 @@
fops_put(file->f_op);
if (file->f_mode & FMODE_WRITE)
put_write_access(inode);
+ if (file->f_mode & FMODE_EXEC)
+ allow_write_access(inode);
file_list_lock();
file->f_dentry = NULL;
file->f_vfsmnt = NULL;
diff -uNrX linux-ignore-files linux-2.4.12/fs/namei.c linux-2.4.12.eb3/fs/namei.c
--- linux-2.4.12/fs/namei.c Sat Oct 13 16:22:01 2001
+++ linux-2.4.12.eb3/fs/namei.c Mon Oct 15 03:52:33 2001
@@ -212,10 +212,10 @@
* put_write_access() releases this write permission.
* This is used for regular files.
* We cannot support write (and maybe mmap read-write shared) accesses and
- * MAP_DENYWRITE mmappings simultaneously. The i_writecount field of an inode
- * can have the following values:
- * 0: no writers, no VM_DENYWRITE mappings
- * < 0: (-i_writecount) vm_area_structs with VM_DENYWRITE set exist
+ * O_EXEC mmappings simultaneously. The i_writecount field of an inode can have
+ * the following values:
+ * 0: no writers, no executers.
+ * < 0: (-i_writecount) users are executing the file.
* > 0: (i_writecount) users are writing to the file.
*
* Normally we operate on that counter with atomic_{inc,dec} and it's safe
@@ -974,6 +974,8 @@
int count = 0;
acc_mode = ACC_MODE(flag);
+ if (flag & O_EXEC)
+ acc_mode |= MAY_EXEC;
/*
* The simplest case - just a plain lookup.
@@ -1069,6 +1071,22 @@
error = -EISDIR;
if (S_ISDIR(inode->i_mode) && (flag & FMODE_WRITE))
goto exit;
+
+ error = -EACCES;
+ if (flag & O_EXEC) {
+ if (flag & FMODE_WRITE)
+ goto exit;
+ if (nd->mnt->mnt_flags & MNT_NOEXEC)
+ goto exit;
+ if (!S_ISREG(inode->i_mode))
+ goto exit;
+ /* The following check for an executable bit was taken
+ * from the old open_exec. I don't think it is either
+ * needed or makes any sense but just in case...
+ */
+ if (!(inode->i_mode & S_IXUGO))
+ goto exit;
+ }
error = permission(inode,acc_mode);
if (error)
diff -uNrX linux-ignore-files linux-2.4.12/fs/open.c linux-2.4.12.eb3/fs/open.c
--- linux-2.4.12/fs/open.c Sat Oct 13 16:21:50 2001
+++ linux-2.4.12.eb3/fs/open.c Mon Oct 15 03:45:49 2001
@@ -612,6 +612,8 @@
* 11 - read-write
* for the internal routines (ie open_namei()/follow_link() etc). 00 is
* used by symlinks.
+ *
+ * The mapping is: 00 -> 01 , 01 -> 10, 10 -> 11, 11 -> 11
*/
struct file *filp_open(const char * filename, int flags, int mode)
{
@@ -625,6 +627,11 @@
namei_flags |= 2;
error = open_namei(filename, namei_flags, mode, &nd);
+ /* For now don't allow O_EXEC from userspace when anyone can
+ * write to the file.
+ */
+ if (!error && (flags & O_EXEC) && (nd.dentry->d_inode->i_mode & S_IWUGO))
+ error = -EACCES;
if (!error)
return dentry_open(nd.dentry, nd.mnt, flags);
@@ -644,6 +651,8 @@
goto cleanup_dentry;
f->f_flags = flags;
f->f_mode = (flags+1) & O_ACCMODE;
+ if (flags & O_EXEC)
+ f->f_mode |= FMODE_EXEC;
inode = dentry->d_inode;
if (f->f_mode & FMODE_WRITE) {
error = get_write_access(inode);
@@ -673,6 +682,12 @@
goto cleanup_all;
}
f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
+
+ if (f->f_mode & FMODE_EXEC) {
+ error = deny_write_access(f);
+ if (error)
+ goto cleanup_all;
+ }
return f;
diff -uNrX linux-ignore-files linux-2.4.12/include/asm-i386/fcntl.h linux-2.4.12.eb3/include/asm-i386/fcntl.h
--- linux-2.4.12/include/asm-i386/fcntl.h Sat Oct 13 16:20:04 2001
+++ linux-2.4.12.eb3/include/asm-i386/fcntl.h Sun Oct 14 00:08:01 2001
@@ -7,6 +7,7 @@
#define O_RDONLY 00
#define O_WRONLY 01
#define O_RDWR 02
+#define O_EXEC 04 /* generate ETXTBSY on writes */
#define O_CREAT 0100 /* not fcntl */
#define O_EXCL 0200 /* not fcntl */
#define O_NOCTTY 0400 /* not fcntl */
diff -uNrX linux-ignore-files linux-2.4.12/include/linux/fs.h linux-2.4.12.eb3/include/linux/fs.h
--- linux-2.4.12/include/linux/fs.h Sat Oct 13 17:26:25 2001
+++ linux-2.4.12.eb3/include/linux/fs.h Sun Oct 14 00:42:30 2001
@@ -73,6 +73,7 @@
#define FMODE_READ 1
#define FMODE_WRITE 2
+#define FMODE_EXEC 4
#define READ 0
#define WRITE 1
diff -uNrX linux-ignore-files linux-2.4.12/kernel/fork.c linux-2.4.12.eb3/kernel/fork.c
--- linux-2.4.12/kernel/fork.c Sat Oct 13 16:20:09 2001
+++ linux-2.4.12.eb3/kernel/fork.c Sat Oct 13 20:34:56 2001
@@ -168,8 +168,6 @@
if (file) {
struct inode *inode = file->f_dentry->d_inode;
get_file(file);
- if (tmp->vm_flags & VM_DENYWRITE)
- atomic_dec(&inode->i_writecount);
/* insert tmp into the share list, just after mpnt */
spin_lock(&inode->i_mapping->i_shared_lock);
diff -uNrX linux-ignore-files linux-2.4.12/mm/mmap.c linux-2.4.12.eb3/mm/mmap.c
--- linux-2.4.12/mm/mmap.c Sat Oct 13 16:21:52 2001
+++ linux-2.4.12.eb3/mm/mmap.c Sun Oct 14 00:50:32 2001
@@ -98,9 +98,6 @@
struct file * file = vma->vm_file;
if (file) {
- struct inode *inode = file->f_dentry->d_inode;
- if (vma->vm_flags & VM_DENYWRITE)
- atomic_inc(&inode->i_writecount);
if(vma->vm_next_share)
vma->vm_next_share->vm_pprev_share = vma->vm_pprev_share;
*vma->vm_pprev_share = vma->vm_next_share;
@@ -205,7 +202,6 @@
_trans(prot, PROT_EXEC, VM_EXEC);
flag_bits =
_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN) |
- _trans(flags, MAP_DENYWRITE, VM_DENYWRITE) |
_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE);
return prot_bits | flag_bits;
#undef _trans
@@ -310,9 +306,6 @@
struct address_space *mapping = inode->i_mapping;
struct vm_area_struct **head;
- if (vma->vm_flags & VM_DENYWRITE)
- atomic_dec(&inode->i_writecount);
-
head = &mapping->i_mmap;
if (vma->vm_flags & VM_SHARED)
head = &mapping->i_mmap_shared;
@@ -395,7 +388,6 @@
struct mm_struct * mm = current->mm;
struct vm_area_struct * vma, * prev;
unsigned int vm_flags;
- int correct_wcount = 0;
int error;
rb_node_t ** rb_link, * rb_parent;
@@ -526,12 +518,6 @@
error = -EINVAL;
if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
goto free_vma;
- if (vm_flags & VM_DENYWRITE) {
- error = deny_write_access(file);
- if (error)
- goto free_vma;
- correct_wcount = 1;
- }
vma->vm_file = file;
get_file(file);
error = file->f_op->mmap(file, vma);
@@ -551,8 +537,6 @@
addr = vma->vm_start;
vma_link(mm, vma, prev, rb_link, rb_parent);
- if (correct_wcount)
- atomic_inc(&file->f_dentry->d_inode->i_writecount);
out:
mm->total_vm += len >> PAGE_SHIFT;
@@ -563,8 +547,6 @@
return addr;
unmap_and_free_vma:
- if (correct_wcount)
- atomic_inc(&file->f_dentry->d_inode->i_writecount);
vma->vm_file = NULL;
fput(file);
@@ -946,11 +928,9 @@
* so release them, and unmap the page range..
* If the one of the segments is only being partially unmapped,
* it will put new vm_area_struct(s) into the address space.
- * In that case we have to be careful with VM_DENYWRITE.
*/
while ((mpnt = free) != NULL) {
unsigned long st, end, size;
- struct file *file = NULL;
free = free->vm_next;
@@ -959,11 +939,6 @@
end = end > mpnt->vm_end ? mpnt->vm_end : end;
size = end - st;
- if (mpnt->vm_flags & VM_DENYWRITE &&
- (st != mpnt->vm_start || end != mpnt->vm_end) &&
- (file = mpnt->vm_file) != NULL) {
- atomic_dec(&file->f_dentry->d_inode->i_writecount);
- }
remove_shared_vm_struct(mpnt);
mm->map_count--;
@@ -973,8 +948,6 @@
* Fix the mapping, and free the old area if it wasn't reused.
*/
extra = unmap_fixup(mm, mpnt, st, size, extra);
- if (file)
- atomic_inc(&file->f_dentry->d_inode->i_writecount);
}
validate_mm(mm);
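For readers following the fs/namei.c comment in the patch above, the i_writecount convention can be modelled in a few lines of userspace C. This is only a sketch: struct model_inode and the model_* functions are hypothetical stand-ins named after the kernel helpers, and the real code uses atomic operations under a lock rather than a plain int.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the single inode field that matters here. */
struct model_inode {
    int i_writecount;   /* > 0: writers, < 0: executers, 0: neither */
};

/* A writer arrives (open for write). Refused while anyone executes the file. */
static int model_get_write_access(struct model_inode *inode)
{
    assert(inode != NULL);
    if (inode->i_writecount < 0)
        return -ETXTBSY;
    inode->i_writecount++;
    return 0;
}

/* A writer leaves (last reference to a file opened for write is dropped). */
static void model_put_write_access(struct model_inode *inode)
{
    assert(inode != NULL && inode->i_writecount > 0);
    inode->i_writecount--;
}

/* An executer arrives (exec, or an O_EXEC open in the patch).
 * Refused while anyone has the file open for write. */
static int model_deny_write_access(struct model_inode *inode)
{
    assert(inode != NULL);
    if (inode->i_writecount > 0)
        return -ETXTBSY;
    inode->i_writecount--;
    return 0;
}

/* An executer leaves (the patch does this when the O_EXEC file is released). */
static void model_allow_write_access(struct model_inode *inode)
{
    assert(inode != NULL && inode->i_writecount < 0);
    inode->i_writecount++;
}
```

In this model an executer followed by a writer yields -ETXTBSY for the writer, and the reverse order yields -ETXTBSY for the executer. The patch keeps that symmetry but ties the executer side to the lifetime of an open file rather than to individual mappings.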
Aaron Lehmann wrote:
> > > But it does have the advantage of allowing the sharing of memory, does
> > > it not?
> >
> > Only if you are going to write to the data.
>
> Why? If gcc and another application read the source file with an
> mmap() with the right parameters (read-only), it would only be shared
> between them, as I understand it. If they both read() the file into
> private buffers those can not be shared.
And furthermore, the private buffers cannot be shared with the
filesystem's page cache.
-- Jamie
Richard Gooch wrote:
> > There are applications (GCC comes to mind) which are using mmap() to
> > read files now because it is measurably faster than read(), for
> > sufficiently large source files.
>
> So? MAP_PRIVATE is just fine for these. The simple solution if you
> care about an edit in the middle of a compile is to have your editor
> write a new file and do an atomic rename. No half-and-half data
> problems, and the VM logic is kept simple (well, relative to what we
> have now;-).
This does not work. Example:
1. JamieEmacs loads file using MAP_PRIVATE.
2. Something else writes to the file.
3. Scroll to the bottom of the file in JamieEmacs. It displays some
of the newly written data, though not all of it.
--> Wrong editor semantics.
Note that the something else which modifies the file in step 2 is not an
editor written especially to cooperate with JamieEmacs. So it does not
do renaming -- why should it? You might have just loaded
/var/log/messages into JamieEmacs, for example, and syslog is the
program in step 2.
What you need is read() or an equivalent. I don't know of a
memory-efficient equivalent to read. MAP_PRIVATE doesn't do it because
you have to dirty every page before you can be sure that file
modifications won't change your view of the data, and the dirtying
creates just as many page duplicates as read() does.
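A minimal sketch of the difference, assuming a POSIX system; snapshot_file and map_private_view are hypothetical helper names. The read() copy is fixed once the calls return, whereas pages of the MAP_PRIVATE view that have not been touched may or may not reflect later writes to the file, so the code makes no claim about what the mapping shows afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the whole file into a private heap buffer with read().
 * Returns NULL on error; the caller frees the buffer.
 * Later writes to the file cannot change the returned bytes. */
static char *snapshot_file(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    size_t cap = (size_t)st.st_size;
    char *buf = malloc(cap + 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }
    size_t got = 0;
    while (got < cap) {
        ssize_t n = read(fd, buf + got, cap - got);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted, retry */
        if (n < 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;              /* file shrank while we were reading */
        got += (size_t)n;
    }
    close(fd);
    buf[got] = '\0';
    *len_out = got;
    return buf;
}

/* Map the file read-only with MAP_PRIVATE. No data is copied up front,
 * and it is unspecified whether later writes to the file show through
 * pages this process has not yet touched. Returns NULL on error or for
 * an empty file; the caller munmap()s the result. */
static const char *map_private_view(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

An editor that wants the "snapshot from read time" behaviour would call snapshot_file() once at load time. map_private_view() is cheaper up front but gives no such promise.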
cheers,
-- Jamie
Linus Torvalds wrote:
> Now, that's a traditional mmap(), though, which has more overhead than a
> "read-with-PAGE_COPY" would have. The pure mmap() approach has the actual
> page fault overhead too, along with having to do "fstat()" and "munmap()".
read() into a freshly allocated buffer (as you do for any large file)
has page fault overhead too -- to allocate zero pages. It may be a
greater overhead, because the pages are unnecessarily zeroed.
read-with-PAGE_COPY may eliminate both of these overheads.
But then, even without PAGE_COPY, a read() which looks at the receiving
process' page tables may be able to eliminate the page faults, by simply
allocating (without zeroing) pages in kernel context prior to copying
the data there.
-- Jamie
> Just to reiterate I see this as a solution to two problems
> 1) It adds an additional safety check that shared libraries won't
> mutate under you.
Which prevents a user with rights doing so deliberately.
> 2) It allows user space access to the security policy information
> regarding executables, allowing ld-linux.so to refuse to
> execute binaries and shared libraries on a filesystem mounted
> noexec.
Which is mostly useless anyway, since anyone can write an ld-linux that
doesn't check, provided the binary is readable. noexec is basically a weird
ancient unixism that is useless.
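For what it's worth, a cooperating loader could ask the filesystem about noexec without any new open flag. A rough sketch, assuming glibc on Linux: on_noexec_mount is a hypothetical helper, ST_NOEXEC is a GNU extension (the fallback value 8 is an assumption matching the Linux mount flag), and this uses statvfs(), not the O_EXEC mechanism from the patch. The check is purely advisory, since nothing forces a home-made loader to call it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes ST_NOEXEC in <sys/statvfs.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

#ifndef ST_NOEXEC
#define ST_NOEXEC 8             /* assumption: Linux value, used only if the header hides it */
#endif

/* Returns 1 if the filesystem holding path is mounted noexec,
 * 0 if it is not, and -1 if statvfs() fails (errno is set). */
static int on_noexec_mount(const char *path)
{
    assert(path != NULL);
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return -1;
    return (vfs.f_flag & ST_NOEXEC) ? 1 : 0;
}
```

A loader would call on_noexec_mount() on the library path before mapping it and refuse when the result is 1.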
> My biggest unresolved issue is which numbers to choose for O_EXEC on
> every platform. As the DENYWRITE code is cleaner in open than in mmap.
And the fact that open has side effects.
Alan
On Mon, 15 Oct 2001, Jamie Lokier wrote:
> This does not work. Example:
>
> 1. JamieEmacs loads file using MAP_PRIVATE.
> 2. Something else writes to the file.
> 3. Scroll to the bottom of the file in JamieEmacs. It displays some
> of the newly written data, though not all of it.
>
> --> Wrong editor semantics.
--> Wrong permissions or hopelessly crappy source control system.
At point 2 you are _already_ screwed. Depending on who hits (hell,
what's the equivalent of :x in Emacsese?) first, one of you is
going to lose results of editing. Doctor, it hurts when I do it...
If you want versioning - use source control system. Or go play
with DEC cra^WOSes. In RSX that "feature" sucked (and so did
editor semantics, but that's a separate story).
Without versioning - see above.
On Mon, 15 Oct 2001, Alan Cox wrote:
> Which is mostly useless anyway, since anyone can write an ld-linux that
> doesn't check, provided the binary is readable. noexec is basically a weird
> ancient unixism that is useless.
Anyone can write it, but what the hell will he do without write access to
any place that wouldn't be mounted noexec? Environment can be restricted
even if you give them shell...
> Anyone can write it, but what the hell will he do without write access to
> any place that wouldn't be mounted noexec? Environment can be restricted
> even if you give them shell...
He will type "perl" and interactively issue any damn syscall he likes
subject to the normal permissions rules. Noexec is only useful for a user
given virtually nothing.
Alan
On Mon, 15 Oct 2001, Alan Cox wrote:
> > Anyone can write it, but what the hell will he do without write access to
> > any place that wouldn't be mounted noexec? Environment can be restricted
> > even if you give them shell...
>
> He will type "perl" and interactively issue any damn syscall he likes
> subject to the normal permissions rules. Noexec is only useful for a user
> given virtually nothing.
... and will hit "permission denied" on attempt to exec /usr/bin/perl.
Blanket noexec on /usr instance mounted in his chroot with selective turning
the thing off on some binaries. And yes, I realize that one can always
hunt for buffer overruns in sh(1)...
Alexander Viro wrote:
> > This does not work. Example:
> >
> > 1. JamieEmacs loads file using MAP_PRIVATE.
> > 2. Something else writes to the file.
> > 3. Scroll to the bottom of the file in JamieEmacs. It displays some
> > of the newly written data, though not all of it.
> >
> > --> Wrong editor semantics.
>
> --> Wrong permissions or hopelessly crappy source control system.
>
> At point 2 you are _already_ screwed. Depending on who hits (hell,
> what's the equivalent of :x in Emacsese?) first, one of you is
> going to lose results of editing. Doctor, it hurts when I do it...
I am _not_ saving anything. Viewing
/home/web/automatically_generated_every_hour.html from a particular
moment is a perfectly reasonable thing to do in Emacs, and it's a
perfectly reasonable thing to do in Less and Midnight Commander and
Mozilla for that matter.
_If_ I hit :x (in Vi-mode in Emacs ;-) then I expect the editor to warn
me that the file was updated by some other program. Some editors will
warn before that. Some will reload the file automatically if I haven't
made changes within the editor.
However, at all times I expect a consistent display of the file either
from read time, or from the current time. _Never_ some unparsable,
invalid, mixed up combination of pages.
> If you want versioning - use source control system. Or go play
> with DEC cra^WOSes. In RSX that "feature" sucked (and so did
> editor semantics, but that's a separate story).
I do _not_ want versioning. I want to load a file into an editor and
look at _that_ snapshot, at my leisure. (Almost) every editor ever
written works this way, and I am quite happy with it.
read() gives the correct semantics.
There is potential to make read() more efficient, both in execution time
and in memory consumption.
Enjoy :-)
-- Jamie