Alan,
This enables the ips driver to build:
--- linux/drivers/scsi/ips.c.orig Sun Apr 7 11:01:00 2002
+++ linux/drivers/scsi/ips.c Sun Apr 7 11:35:31 2002
@@ -543,7 +543,8 @@
}
return (1);
-
+}
+
__setup("ips=", ips_setup);
#else
@@ -579,10 +580,10 @@
}
}
}
+}
#endif
-}
/****************************************************************************/
/* */
And, unless this is reversed the OpenAFS kernel module won't load (it
needs sys_call_table.):
diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.19p5/kernel/ksyms.c linux.19pre5-ac3/kernel/ksyms.c
--- linux.19p5/kernel/ksyms.c Thu Apr 4 13:21:17 2002
+++ linux.19pre5-ac3/kernel/ksyms.c Fri Apr 5 14:02:06 2002
@@ -469,9 +475,6 @@
EXPORT_SYMBOL(simple_strtoull);
EXPORT_SYMBOL(system_utsname); /* UTS data */
EXPORT_SYMBOL(uts_sem); /* UTS semaphore */
-#ifndef __mips__
-EXPORT_SYMBOL(sys_call_table);
-#endif
EXPORT_SYMBOL(machine_restart);
EXPORT_SYMBOL(machine_halt);
EXPORT_SYMBOL(machine_power_off);
Regards,
Steve
On Sun, Apr 07, 2002 at 12:43:57PM -0400, Steven N. Hirsch wrote:
> And, unless this is reversed the OpenAFS kernel module won't load (it
> needs sys_call_table.):
sys_call_table was unexported for a reason - OpenAFS is broken by design
if it messes with the syscall table.
Christoph
In article <[email protected]> you wrote:
> On Sun, Apr 07, 2002 at 12:43:57PM -0400, Steven N. Hirsch wrote:
>> And, unless this is reversed the OpenAFS kernel module won't load (it
>> needs sys_call_table.):
>
> sys_call_table was unexported for a reason - OpenAFS is broken by design
> if it messes with the syscall table.
it replaces/overrides existing syscalls from a module. I'd call that
broken by design yes.
> And, unless this is reversed the OpenAFS kernel module won't load (it
> needs sys_call_table.):
Correct. There was agreement a very long time ago that code should not patch
the syscall table (for one, it's not safe). AFS probably needs fixing so the
AFS syscall hook is exported portably and nicely in the syscall code.
This wants fixing in 2.5 too - basically
static int (*afs_syscall)(...);
int sys_afs_syscall(...)
{
	if (afs_syscall)
		return afs_syscall(...);
	return -ENOSYS;
}
EXPORT_SYMBOL(afs_syscall);
Alan
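The hook pattern sketched above can be tried out in user space. The following is a minimal sketch, not kernel code: the handler signature (two long arguments), the afs_register()/afs_unregister() helpers and demo_afs_handler() are assumptions made for illustration, since the original elides the argument list and says nothing about registration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler signature; the original sketch elides the arguments. */
typedef long (*afs_syscall_fn)(long op, long arg);

/* NULL until a module registers a handler. In the kernel this pointer is
 * what EXPORT_SYMBOL() would expose to the module. */
static afs_syscall_fn afs_syscall = NULL;

/* Fixed entry point that would occupy the syscall table slot. */
long sys_afs_syscall(long op, long arg)
{
    afs_syscall_fn fn = afs_syscall;   /* read the pointer once */

    if (fn)
        return fn(op, arg);
    return -ENOSYS;                    /* no module loaded */
}

/* Stand-ins (assumed names) for what a module's init and exit would do. */
void afs_register(afs_syscall_fn fn) { afs_syscall = fn; }
void afs_unregister(void)            { afs_syscall = NULL; }

/* Illustrative handler only; it just adds its two arguments. */
long demo_afs_handler(long op, long arg) { return op + arg; }
```

The table slot never changes; only the pointer behind it does, so callers see -ENOSYS whenever no handler is present. This sketch does nothing about a handler being removed while a call is still running.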
In article <[email protected]> you wrote:
> This wants fixing in 2.5 too - basically
>
> static int (*afs_syscall)(...);
> sys_afs_syscall(...)
> {
> if(afs_syscall)
> return afs_syscall(....)
> return -ENOSYS;
> }
I think it wants adding a lock around it vs module unload....
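One way to picture that lock is a use count taken under a mutex, so the handler cannot be removed while a call is inside it. This is a user-space sketch under stated assumptions: it uses a pthread mutex where kernel code would use its own locking primitives, and every name here (hook_call, hook_register, hook_unregister, demo_self_unload) is hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

typedef long (*hook_fn)(long arg);

static pthread_mutex_t hook_lock = PTHREAD_MUTEX_INITIALIZER;
static hook_fn hook = NULL;    /* registered handler, if any */
static int hook_users = 0;     /* calls currently inside the handler */

long hook_call(long arg)
{
    hook_fn fn;
    long ret;

    pthread_mutex_lock(&hook_lock);
    fn = hook;
    if (fn)
        hook_users++;          /* pin the handler before dropping the lock */
    pthread_mutex_unlock(&hook_lock);

    if (!fn)
        return -ENOSYS;

    ret = fn(arg);             /* runs without the lock held */

    pthread_mutex_lock(&hook_lock);
    hook_users--;
    pthread_mutex_unlock(&hook_lock);
    return ret;
}

int hook_register(hook_fn fn)
{
    int ret = 0;

    pthread_mutex_lock(&hook_lock);
    if (hook)
        ret = -EBUSY;          /* only one handler at a time */
    else
        hook = fn;
    pthread_mutex_unlock(&hook_lock);
    return ret;
}

int hook_unregister(void)
{
    int ret = 0;

    pthread_mutex_lock(&hook_lock);
    if (hook_users > 0)
        ret = -EBUSY;          /* a call is still inside the handler */
    else
        hook = NULL;
    pthread_mutex_unlock(&hook_lock);
    return ret;
}

/* Demo handler: tries to unregister itself mid-call and reports the result. */
long demo_self_unload(long arg)
{
    (void)arg;
    return hook_unregister();
}
```

Because the pointer read and the count increment happen under the same lock, an unregister either sees the in-flight call and refuses, or completes before the caller can pick up the pointer.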
On Sun, Apr 07, 2002 at 06:42:05PM +0100, Alan Cox wrote:
> > And, unless this is reversed the OpenAFS kernel module won't load (it
> > needs sys_call_table.):
>
> Correct. There was agreement a very long time ago that code should not patch
> the syscall table (for one its not safe). AFS probably needs fixing so the
> AFS syscall hook is exported portably and nicely in the syscall code.
Sure, AFS can do that ...
But what about the recent discussion on the removal of sys_call_table ?
(I believe it was along the lines of "it's ugly, prevent it ...", "ah,
but it has real uses, so why can't it stay as deprecated/unadvised ?"
"*no response*").
I'm a bit disappointed this has just gone in without any real discussion
on the usefulness of this for certain circumstances :(
regards
john
--
"I never understood what's so hard about picking a unique
first and last name - and not going beyond the 6 character limit."
- Toon Moene
On Sun, 7 Apr 2002 [email protected] wrote:
> In article <[email protected]> you wrote:
> > This wants fixing in 2.5 too - basically
> >
> > static int (*afs_syscall)(...);
> > sys_afs_syscall(...)
> > {
> > if(afs_syscall)
> > return afs_syscall(....)
> > return -ENOSYS;
> > }
>
> I think it wants addin a lock around it vs module unload....
I suspect that equivalent of sys_nfsservctl() would be a better way,
actually...
> I'm a bit disappointed this has just gone in without any real discussion
> on the usefulness of this for certain circumstances :(
How about "there are no correct users". It's basically impossible to patch
the syscall table safely anyway. As Arjan pointed out, there are races
against module unload that on SMP are basically incurable. Doing the
right hooks makes the AFS code portable, which is a big win.
On Sun, Apr 07, 2002 at 08:18:16PM +0100, Alan Cox wrote:
> How about "there are no correct users". Its basically impossible to patch
> the syscall table safely anyway. As Arjan pointed out there are races
> against module unload that on SMP are basically incurable.
can_unload == FALSE
> Doing the right hooks makes the AFS code portable which is a big win.
Definitely. I'm not for a minute arguing that the nfsservctl-style thing
is not how it should be done for those cases.
I'll genuinely take on board advice on how I can profile all the system
via x86 perf counters efficiently without having to patch the kernel.
The old way just uses sys_call_table. So what do I do now ?
I've actually *tried* doing the /proc-reading thing. It's very low
resolution and/or too slow.
regards
john
> I'll genuinely take on board advice on how I can profile all the system
> via x86 perf counters efficiently without having to patch the kernel.
> The old way just uses sys_call_table. So what do I do now ?
The obvious thing is to represent it as a device. I'm not familiar enough
with the existing perfctr work to know how well that works out.
> But what about the recent discussion on the removal of sys_call_table ?
>
> (I believe it was along the lines of "it's ugly, prevent it ...", "ah,
> but it has real uses, so why can't it stay as deprecated/unadvised ?"
> "*no response*").
>
> I'm a bit disappointed this has just gone in without any real discussion
> on the usefulness of this for certain circumstances :(
Removing it in the -ac tree is a good way to stimulate discussion and
fixing the code that relies on it (except for the 99% of code relying on it
which is cracker authored trojans)
On Sun, Apr 07, 2002 at 08:42:48PM +0100, Alan Cox wrote:
> > I'll genuinely take on board advice on how I can profile all the system
> > via x86 perf counters efficiently without having to patch the kernel.
> > The old way just uses sys_call_table. So what do I do now ?
>
> The obvious thing is to represent it as a device. I'm not familiar enough
> with the existing perfctr work to know how well that works out.
The system call tracking is only used to associate a particular EIP with
a particular offset in some binary image. There's no other efficient
method to capture the mmap() calls for these images, for everything
running. ptrace() is only really useful for a small number of processes,
and is slow. Offline post-analysis isn't possible. There is no
API for getting access to this information.
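To make the EIP-to-offset association concrete: once each mmap() of an image has been recorded, a sampled address can be resolved by a range lookup. The sketch below is a user-space illustration only; struct mapping, its field names and resolve_address() are assumptions and are not taken from oprofile.

```c
#include <stddef.h>
#include <stdint.h>

/* One recorded mmap() of a binary image. All names are illustrative. */
struct mapping {
    uintptr_t start;        /* first mapped address */
    size_t length;          /* size of the mapping in bytes */
    uint64_t file_offset;   /* offset of the mapping within the image */
    const char *image;      /* path of the mapped file */
};

/* Find the mapping that contains addr. On success, store the offset of
 * addr within the image in *offset and return the mapping; otherwise
 * return NULL and leave *offset untouched. */
const struct mapping *resolve_address(const struct mapping *maps, size_t count,
                                      uintptr_t addr, uint64_t *offset)
{
    for (size_t i = 0; i < count; i++) {
        const struct mapping *m = &maps[i];

        if (addr >= m->start && addr - m->start < m->length) {
            *offset = m->file_offset + (addr - m->start);
            return m;
        }
    }
    return NULL;
}
```

The lookup is only as good as the table behind it, which is why every mmap() of every running process has to be seen as it happens.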
Removing sys_call_table from exports won't have any positive effect.
Using it has always been "well, you're on your own" - if there is a
really good reason it needs to be changed, fine; but just changing it
because it's not supposed to be used isn't a good enough reason when
there is actually a couple of niche cases where it's the only option.
imho,
john
On Sun, 7 Apr 2002, Alan Cox wrote:
> > But what about the recent discussion on the removal of sys_call_table ?
> >
> > (I believe it was along the lines of "it's ugly, prevent it ...", "ah,
> > but it has real uses, so why can't it stay as deprecated/unadvised ?"
> > "*no response*").
> >
> > I'm a bit disappointed this has just gone in without any real discussion
> > on the usefulness of this for certain circumstances :(
>
> Removing it in the -ac tree is a good way to stimulate discussion and
> fixing the code that relies on it (except for the 99% of code relying on it
> which is cracker authored trojans)
I agree. I've forwarded the information to the openafs folks. Given
Derek and Derrick's respective levels of energy, it's likely already fixed
in the CVS tree.
On Sun, Apr 07, 2002 at 08:18:16PM +0100, Alan Cox wrote:
> > I'm a bit disappointed this has just gone in without any real discussion
> > on the usefulness of this for certain circumstances :(
>
> How about "there are no correct users". Its basically impossible to patch
> the syscall table safely anyway. As Arjan pointed out there are races
> against module unload that on SMP are basically incurable. Doing the
> right hooks makes the AFS code portable which is a big win.
What they do in the syscall is still questionable in my opinion. AFS
wants to have a process authentication group (PAG) associated with
processes. The syscall rewrites some fields in the task structure,
basically adds 'hidden' entries to the groups array. They should
probably use something like the task-ornaments patch (was that by Dave
Howells?), which is what I plan to use for Coda whenever they get merged
into the mainline.
Both Coda and AFS have semantically quite similar requirements for the
PAG identifier. A generic solution is probably better than having
random modules hacking their stuff into the syscall table and task
structures, which is why I do not consider their solution the right one.
Either add getpag/newpag natively (good for yearly flamefests in
linux-kernel), or the more generic task-ornaments so I can make a
trivial module that adds /dev/pag, semantics could be as simple as
reading returns the current pag, and writing adds a new pag as a
task-ornament.
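The read/write semantics described here can be modelled in a few lines of user-space C. This is a hypothetical sketch: it does not reproduce the task-ornaments patch, and struct ornament, struct task, pag_read(), pag_write() and task_fork() are invented names. It also assumes a forked child starts out with its parent's PAG.

```c
#include <stddef.h>
#include <stdlib.h>

#define ORNAMENT_PAG 1

/* Hypothetical ornament: a typed piece of data hung off a task. */
struct ornament {
    int type;
    unsigned long value;
    struct ornament *next;
};

struct task {
    struct ornament *ornaments;
};

static unsigned long next_pag = 1;   /* 0 is reserved for "no PAG" */

static struct ornament *find_ornament(const struct task *t, int type)
{
    for (struct ornament *o = t->ornaments; o; o = o->next)
        if (o->type == type)
            return o;
    return NULL;
}

/* "Reading returns the current pag"; 0 means the task has none. */
unsigned long pag_read(const struct task *t)
{
    struct ornament *o = find_ornament(t, ORNAMENT_PAG);

    return o ? o->value : 0;
}

/* "Writing adds a new pag"; returns the new id, or 0 if allocation fails. */
unsigned long pag_write(struct task *t)
{
    struct ornament *o = find_ornament(t, ORNAMENT_PAG);

    if (!o) {
        o = malloc(sizeof(*o));
        if (!o)
            return 0;
        o->type = ORNAMENT_PAG;
        o->next = t->ornaments;
        t->ornaments = o;
    }
    o->value = next_pag++;
    return o->value;
}

/* Fork: the child receives its own copies of the parent's ornaments. */
int task_fork(const struct task *parent, struct task *child)
{
    child->ornaments = NULL;
    for (struct ornament *o = parent->ornaments; o; o = o->next) {
        struct ornament *c = malloc(sizeof(*c));

        if (!c)
            return -1;
        *c = *o;
        c->next = child->ornaments;
        child->ornaments = c;
    }
    return 0;
}
```

Nothing here touches the syscall table or adds a field to the task structure itself; the PAG lives only in the ornament list.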
Then both Coda and AFS can use the common mechanism and we'll get things
like PAM support and PAG aware daemons more quickly and consistently.
Anything that currently relies on setuid/setgid would f.i. benefit from
this on Coda and AFS type filesystems, as we can tell the difference
between administrator, random hacker, mail delivery process, nameserver,
cron daemon even when they all have the uid 'root'.
Jan
On Sun, Apr 07, 2002 at 08:49:17PM +0100, Alan Cox wrote:
> Removing it in the -ac tree is a good way to stimulate discussion
OK
> fixing the code that relies on it (except for the 99% of code relying on it
> which is cracker authored trojans)
No doubt, but it's not much harder to look at nm vmlinux or System.map,
so I don't see the security angle...
I'd be happy to bear the brunt of users moaning at me because they now
have to apply a kernel patch (and I have to maintain it ...), iff there
was some strongly technical reason the code has to change.
regards
john
> The system call tracking is only used to associate a particular EIP with
> a particular offset in some binary image. There's no other efficient
> method to capture the mmap() calls for these images, for everything
> running. ptrace() is only really useful for a small number of processes,
> and is slow. Offline post-analysis isn't possible. There is no
> API for getting access to this information.
Ok, so you have a real reason for dealing with it
> Removing sys_call_table from exports won't have any positive effect.
> Using it has always been "well, you're on your own" - if there is a
> really good reason it needs to be changed, fine; but just changing it
> because it's not supposed to be used isn't a good enough reason when
> there is actually a couple of niche cases where it's the only option.
Let's see if we can sort out AFS and the like, then come back to that one. I
think you may have a valid point. If 2.5 has EXPORT_SYMBOL_INTERNAL it
gets a lot easier.
> Either add getpag/newpag natively (good for yearly flamefests in
> linux-kernel), or the more generic task-ornaments so I can make a
> trivial module that adds /dev/pag, semantics could be as simple as
> reading returns the current pag, and writing adds a new pag as a
> task-ornament.
Oh look, more dirt is falling out now we shook the tree a little. I have
zero problems with the PAG. We also need an luid and some other related things
in the future to do strict resource management on big systems (think of it
in this case as an accounting charge code)
> > fixing the code that relies on it (except for the 99% of code relying on it
> > which is cracker authored trojans)
>
> No doubt, but it's not much harder to look at nm vmlinux or System.map,
> so I don't see the security angle...
That's not why it was pulled out. It's a deliberate attempt to find out who
is patching the syscall table and sort the results out.
And btw - I've not submitted this to Marcelo, it's not something I expect to
see sprung on people in 2.4.19!
On Sun, Apr 07, 2002 at 08:41:14PM +0100, John Levon wrote:
> On Sun, Apr 07, 2002 at 08:49:17PM +0100, Alan Cox wrote:
>
> > Removing it in the -ac tree is a good way to stimulate discussion
>
> OK
>
> > fixing the code that relies on it (except for the 99% of code relying on it
> > which is cracker authored trojans)
>
> No doubt, but it's not much harder to look at nm vmlinux or System.map,
> so I don't see the security angle...
>
> I'd be happy to bear the brunt of users moaning at me because they now
> have to apply a kernel patch (and I have to maintain it ...), iff there
> was some strongly technical reason the code has to change.
I'd like to second that. syscalltrack (http://syscalltrack.sf.net)
hijacks syscall entries in the sys_call_table as well, because we
want it to work as a module and not require patching the kernel. Our
solution to the module unload race on syscall de-hijacking is simple,
splitting the system call hijacking code into a single small module
which once loaded cannot be unloaded.
So please keep the sys_call_table exported and marked as "ugh, not
portable and racy, please dont hijack system calls unless you really
have to" unless there's a strongly technical reason otherwise. Our
users (all 7 of them) will appreciate it ;)
--
The ill-formed Orange
Fails to satisfy the eye: http://vipe.technion.ac.il/~mulix/
Segmentation fault. http://syscalltrack.sf.net/
> hijacks syscall entries in the sys_call_table as well, because we
> want it to work as a module and not require patching the kernel. Our
> solution to the module unload race on syscall de-hijacking is simple,
> splitting the system call hijacking code into a single small module
> which once loaded cannot be unloaded.
So your small module can export a function called
patch_syscall(NR_foo, function);
Now you can put the arch specific syscall patching code into your small
common module and its cleaner anyway ?
Alan
On Sun, Apr 07, 2002 at 09:01:47PM +0100, Alan Cox wrote:
> > Either add getpag/newpag natively (good for yearly flamefests in
> > linux-kernel), or the more generic task-ornaments so I can make a
> > trivial module that adds /dev/pag, semantics could be as simple as
> > reading returns the current pag, and writing adds a new pag as a
> > task-ornament.
>
> Oh look, more dirt is falling out now we shook the tree a little. I have
> zero problems with the PAG. We also need an luid and some other related things
> in the future to do strict resource management on big systems (think of it
> in this case as an accounting charge code)
Correct, there is already a whole bunch of IDs associated with a
task structure, and each has its own little niche.
process id
parent process id
process group id
session id
thread group id
session group leader
user id (4 flavours)
group id (also 4 flavours)
groups (array)
a 'user' struct
various 'thread group tracking' identifiers
journalling filesystem info (whose? ext3? reiserfs? XFS? what if I
use multiple journalling filesystems?)
(and there is probably more)
And there is a continuing request to add even more things like,
process authentication group id
login user id
etc. etc.
And all of these have different requirements and lifetimes. I believe
task-ornaments are a pretty clean way of allowing most of this kind of data
to be added dynamically if it is needed or used. If the core kernel
doesn't need some 'task identifying field', it probably should not have
ended up in the task struct. But until recently there simply was no
other solution if one needed process related information that is shared
or inherited across a fork.
Jan
John Levon <[email protected]> writes:
> On Sun, Apr 07, 2002 at 08:49:17PM +0100, Alan Cox wrote:
>
> > Removing it in the -ac tree is a good way to stimulate discussion
>
> OK
>
> > fixing the code that relies on it (except for the 99% of code relying on it
> > which is cracker authored trojans)
>
> No doubt, but it's not much harder to look at nm vmlinux or System.map,
> so I don't see the security angle...
>
> I'd be happy to bear the brunt of users moaning at me because they now
> have to apply a kernel patch (and I have to maintain it ...), iff there
> was some strongly technical reason the code has to change.
Deep technical reason: there are architectures where patching the
system call table does not work.
Eric
On Sun, Apr 07, 2002 at 09:29:01PM +0100, Alan Cox wrote:
> > hijacks syscall entries in the sys_call_table as well, because we
> > want it to work as a module and not require patching the kernel. Our
> > solution to the module unload race on syscall de-hijacking is simple,
> > splitting the system call hijacking code into a single small module
> > which once loaded cannot be unloaded.
>
> So your small module can export a function called
> patch_syscall(NR_foo, function);
>
> Now you can put the arch specific syscall patching code into your small
> common module and its cleaner anyway ?
Right, this module (syscall_hijack.o) currently has the interface:
int hijack_syscall_before(int syscall_id, func_ptr func);
int hijack_syscall_after(int syscall_id, func_ptr func);
int release_syscall_before(int syscall_id);
int release_syscall_after(int syscall_id);
where 'before' and 'after' correspond to a hook which should run
before the original system call is invoked (allowing it to specify
that the original system call should not be executed) or after the
original system call is invoked (allowing it access to its return
value).
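A user-space model of such a before/after interface might look like the sketch below. It is not syscalltrack's code: the demo_ names, the hook signatures, the choice of -EPERM when a 'before' hook vetoes the call, and the decision to skip the 'after' hook in that case are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define NR_DEMO_CALLS 4

typedef long (*syscall_fn)(long arg);
typedef int  (*before_fn)(long arg);            /* nonzero: skip the original */
typedef long (*after_fn)(long arg, long ret);   /* may replace the result */

/* Stands in for the real table; the hook arrays sit alongside it. */
static syscall_fn demo_table[NR_DEMO_CALLS];
static before_fn  before_hooks[NR_DEMO_CALLS];
static after_fn   after_hooks[NR_DEMO_CALLS];

static int valid_id(int id) { return id >= 0 && id < NR_DEMO_CALLS; }

int demo_hijack_before(int id, before_fn fn)
{
    if (!valid_id(id) || !fn) return -EINVAL;
    if (before_hooks[id]) return -EBUSY;        /* no stacking of hooks */
    before_hooks[id] = fn;
    return 0;
}

int demo_hijack_after(int id, after_fn fn)
{
    if (!valid_id(id) || !fn) return -EINVAL;
    if (after_hooks[id]) return -EBUSY;
    after_hooks[id] = fn;
    return 0;
}

int demo_release_before(int id)
{
    if (!valid_id(id)) return -EINVAL;
    before_hooks[id] = NULL;
    return 0;
}

int demo_release_after(int id)
{
    if (!valid_id(id)) return -EINVAL;
    after_hooks[id] = NULL;
    return 0;
}

/* Dispatch path: before hook, then the original, then the after hook. */
long demo_dispatch(int id, long arg)
{
    long ret;

    if (!valid_id(id) || !demo_table[id])
        return -ENOSYS;
    if (before_hooks[id] && before_hooks[id](arg))
        return -EPERM;                          /* vetoed (assumed result) */
    ret = demo_table[id](arg);
    if (after_hooks[id])
        ret = after_hooks[id](arg, ret);
    return ret;
}

/* Illustrative call and hooks. */
long demo_double(long arg)            { return arg * 2; }
int  demo_block_negative(long arg)    { return arg < 0; }
long demo_add_one(long arg, long ret) { (void)arg; return ret + 1; }
```

For example, with demo_double installed in slot 1 and demo_block_negative hijacked before it, demo_dispatch(1, -5) is refused while demo_dispatch(1, 5) still reaches the original.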
We only support i386 (and uml) at the moment, because of the problems
with hijacking system calls on other architectures and because none of
us have access to other architectures to test the code on.
I recall reading in the archives that linus objected to modules
hijacking system calls on the grounds that it allows binary only
modules to completely subvert the way the kernel behaves towards
userspace. What if such a mechanism existed, however, and was only
exported to modules with a suitable license (modules which don't taint
the kernel). Would that be feasible?
> userspace. What if such a mechanism existed, however, and was only
> exported to modules with a suitable license (modules which don't taint
> the kernel). Would that be feasible?
I think I'd rather discourage it. There are times it's appropriate, but very
few. Better that people figure out the right approach.
Alan
--
"Who killed more Americans - Al Queda or the cigarette industry ?"
-- Anon
Alan Cox <[email protected]> writes:
> > The system call tracking is only used to associate a particular EIP with
> > a particular offset in some binary image. There's no other efficient
> > method to capture the mmap() calls for these images, for everything
> > running. ptrace() is only really useful for a small number of processes,
> > and is slow. Offline post-analysis isn't possible. There is no
> > API for getting access to this information.
>
> Ok, so you have a real reason for dealing with it
pice (another kernel debugger) needs it also, it's not only oprofile.
I think it is a bad idea to remove it. It just means that these programs
will access it via System.map instead of an exported symbol. It doesn't change
anything, just makes life harder for some people.
> Lets see if we can sort out AFS and the like then come back to that one. I
> think you may have a valid point. If 2.5 has EXPORT_SYMBOL_INTERNAL it
> gets a lot easier.
That doesn't help the oprofile users who want to use it in 2.4 now.
-Andi
At 21:23 07/04/02, Muli Ben-Yehuda wrote:
>On Sun, Apr 07, 2002 at 09:29:01PM +0100, Alan Cox wrote:
> > > hijacks syscall entries in the sys_call_table as well, because we
> > > want it to work as a module and not require patching the kernel. Our
> > > solution to the module unload race on syscall de-hijacking is simple,
> > > splitting the system call hijacking code into a single small module
> > > which once loaded cannot be unloaded.
> >
> > So your small module can export a function called
> > patch_syscall(NR_foo, function);
> >
> > Now you can put the arch specific syscall patching code into your small
> > common module and its cleaner anyway ?
>
>Right, this module (syscall_hijack.o) currently has the interface:
>
>int hijack_syscall_before(int syscall_id, func_ptr func);
>int hijack_syscall_after(int syscall_id, func_ptr func);
>
>int release_syscall_before(int syscall_id);
>int release_syscall_after(int syscall_id);
>
>where 'before' and 'after' correspond to a hook which should run
>before the original system call is invoked (allowing it to specify
>that the original system call should not be executed) or after the
>original system call is invoked (allowing it access to its return
>value).
[snip]
So are you coping with someone hijacking YOU as well between calls to
hijack_syscall_* and release_syscall_*? Or would that trash the caller chain?
From your interface I would assume yes, but just making sure...
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
IRC: #ntfs on irc.openprojects.net / ICQ: 8561279
WWW: http://www-stu.christs.cam.ac.uk/~aia21/
On Sun, Apr 07, 2002 at 02:10:42PM -0600, Eric W. Biederman wrote:
> Deep technical reason there are architectures where patching the
> system call table does not work.
Everyone knows that. This is just one of the reasons it will never be
*encouraged*, not a reason for disabling it (when it doesn't hurt to
leave it as is).
regards
john
On Sun, Apr 07, 2002 at 06:42:05PM +0100, Alan Cox wrote:
> > And, unless this is reversed the OpenAFS kernel module won't load (it
> > needs sys_call_table.):
>
> Correct. There was agreement a very long time ago that code should not patch
> the syscall table (for one its not safe). AFS probably needs fixing so the
> AFS syscall hook is exported portably and nicely in the syscall code.
I am really not an expert on kernel programming, but I remember that
there was a security hole in the ptrace code with which a local user
could gain root access. And there was a little kernel module with a
wrapper function for the ptrace syscall that made traces possible only
if the user calling the syscall was root. So if I understand
correctly, if we don't export the syscall table it is impossible to write
such syscall wrapper functions, and fixing such a security hole requires
recompiling the kernel and rebooting the machine.
So wouldn't it be better to export the syscall table and just write in
the documentation that it is not a good idea to manipulate syscalls, or
write a compiler macro that gives a warning when such a module is
being compiled?
On Mon, Apr 08, 2002 at 12:03:06AM +0100, Anton Altaparmakov wrote:
> At 21:23 07/04/02, Muli Ben-Yehuda wrote:
> >Right, this module (syscall_hijack.o) currently has the interface:
> >
> >int hijack_syscall_before(int syscall_id, func_ptr func);
> >int hijack_syscall_after(int syscall_id, func_ptr func);
> >
> >int release_syscall_before(int syscall_id);
> >int release_syscall_after(int syscall_id);
> >
> >where 'before' and 'after' correspond to a hook which should run
> >before the original system call is invoked (allowing it to specify
> >that the original system call should not be executed) or after the
> >original system call is invoked (allowing it access to its return
> >value).
> [snip]
>
> So are you coping with someone hijacking YOU as well between calls to
> hijack_syscall_* and release_syscall_*? Or would that trash the
> caller chain?
That should work fine, since we never explicitly refer to the entry in
the sys_call_table in our call chain. Our call chain goes:
hijacked_function
  -> hook_before
  if call original syscall
    -> original syscall (the entry that was in the sys_call_table when we
       hijacked it, not the current entry!)
  -> hook_after
Note that we don't support stacking of hooks right now - we never had
need to.
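The point about calling the saved entry rather than the current one can be shown with a single simulated table slot. This is an illustrative user-space sketch with made-up names (slot, first_wrapper, second_wrapper and so on), not syscalltrack code.

```c
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

static syscall_fn slot;              /* stands in for one sys_call_table entry */
static syscall_fn saved_by_first;    /* entry seen when the first hijacker loaded */
static syscall_fn saved_by_second;   /* entry seen when the second hijacker loaded */

long original_call(long arg) { return arg; }

long first_wrapper(long arg)
{
    /* Calls the entry saved at hijack time, never the current slot;
     * calling slot() here would recurse into a wrapper. */
    return saved_by_first(arg) + 10;
}

long second_wrapper(long arg)
{
    return saved_by_second(arg) + 100;
}

void install_original(void) { slot = original_call; }
void first_hijack(void)     { saved_by_first = slot;  slot = first_wrapper; }
void second_hijack(void)    { saved_by_second = slot; slot = second_wrapper; }

/* What a caller entering through the table would execute. */
long invoke(long arg) { return slot(arg); }
```

After both hijacks, a call runs second_wrapper, then first_wrapper, then original_call, so a later hijacker does not break the earlier one's chain. Releasing out of order is a different matter: if the first hijacker restored its saved entry, it would silently drop the second wrapper.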
On Sun, 7 Apr 2002, John Levon wrote:
> But what about the recent discussion on the removal of sys_call_table ?
>
> (I believe it was along the lines of "it's ugly, prevent it ...", "ah,
> but it has real uses, so why can't it stay as deprecated/unadvised ?"
> "*no response*").
>
> I'm a bit disappointed this has just gone in without any real discussion
> on the usefulness of this for certain circumstances :(
Sure, removing that would break a lot of cracker software. Oh wait,
maybe that's a good thing...
For legitimate use, if any, a compile-time optional system call could be
added requiring a capability to use, and programs which are currently
doing that (AFS?) can be converted to use another f/s interface. I have
seen a few mentions of software which DO use that capability, I'm not sure
I've seen one which can be done no other way.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
From: "Bill Davidsen" <[email protected]>
Sent: Monday, April 08, 2002 4:48 PM
> On Sun, 7 Apr 2002, John Levon wrote:
>
> > But what about the recent discussion on the removal of sys_call_table ?
> >
> > (I believe it was along the lines of "it's ugly, prevent it ...", "ah,
> > but it has real uses, so why can't it stay as deprecated/unadvised ?"
> > "*no response*").
> >
> > I'm a bit disappointed this has just gone in without any real discussion
> > on the usefulness of this for certain circumstances :(
>
> Sure, removing that would break a lot of cracker software. Oh wait,
> maybe that's a good thing...
It's really easy for a cracker to patch the syscall table even if the table is not
exported. Not exporting the syscall table is just to encourage good
programming techniques, not a protection against Machiavellian things.
> For legitimate use, if any, a compile-time optional system call could be
> added requiring a capability to use, and programs which are currently
> doing that (AFS?) can be converted to use another f/s interface. I have
> seen a few mentions of software which DO use that capability, I'm not sure
> I've seen one which can be done no other way.
As stated, oprofile needs it: there is no other efficient way to track exec,
mmap and the other system calls a profiler needs. I hope a consensus can
be reached: explain that unloading modules which patch the syscall table
is unsafe on SMP, discourage patching the syscall table, but do
not forbid it.
--
Philippe Elie
"Philippe Elie" <[email protected]> writes:
> From: "Bill Davidsen" <[email protected]>
> Sent: Monday, April 08, 2002 4:48 PM
>
> > For legitimate use, if any, a compile-time optional system call could be
> > added requiring a capability to use, and programs which are currently
> > doing that (AFS?) can be converted to use another f/s interface. I have
> > seen a few mentions of software which DO use that capability, I'm not sure
> > I've seen one which can be done no other way.
>
> As stated oprofile needs it, there is no other efficient way to track exec,
> mmap and other sys call needed for profiler. I hope a consensus can
> be reach : explain than unloading module wich patch the sys call table
> are unsafe on SMP, discourage the use of sys call table patch, but do
> not forbid that.
In times past when people were working on the vm86 system call you needed
a modified version of insmod, that could read System.map.
If you are going to be doing strange things I don't see why that shouldn't
still be required.
Though I am wondering if the sane approach for a profiler might not be to
have a kernel conditional compilation directive that simply patches
the syscall path. The overhead is probably less as well.
Eric
On Mon, Apr 08, 2002 at 11:53:38AM -0600, Eric W. Biederman wrote:
> If you are going to be doing strange things I don't see why that shouldn't
> still be required.
Practically speaking System.map is often wrongly installed. Additionally
Keith Owens has (sensibly I think) refused to support this sort of
thing.
> Though I am wondering if the sane approach for a profiler might not to be
> have a kernel conditional compilation directive that simply patches
> the syscall path. The overhead is probably less as well.
This would be OK for us, iff it was on by default. We want to avoid
forcing users to install kernel source and compile up a new kernel,
because it's not just useful for kernel profiling.
Currently it's just a matter of install the module and go, which is
undeniably very useful.
regards
john