2004-01-02 12:59:18

by Libor Vanek

Subject: Syscall table AKA hijacking syscalls

Hi,
I'm writing a project that needs to hijack some syscalls in the VFS
layer. AFAIK this is an unwanted solution in 2.6 (even though there are
some very nasty ways of doing it - see
http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )

I've also found that Linus stated that intercepting syscalls is a "bad
thing" (load module a, load module b, unload module b => crash), but I
think there are some very good reasons (and ways) to do it (see
http://syscalltrack.sourceforge.net ). My main reason is that I
want my GPLed module to be able to modify some VFS syscalls without
patching and recompiling the whole kernel and rebooting the machine.

So what is the proper (Linus-recommended) way to do such things? Create
patches for specific syscalls like "if this_module_installed then
call_this_function;", or try to get things like syscalltrack into the
vanilla kernel some time? From what I've found, there are more
projects that suffer from this restriction.


--

Libor Vanek







2004-01-02 13:08:57

by Matti Aarnio

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, Jan 02, 2004 at 01:59:08PM +0100, Libor Vanek wrote:
> Hi,
> I'm writing some project which needs to hijack some syscalls in VFS
> layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
...
> So what is proper (Linus recommanded) way to do such a things? Create
> patches for specific syscalls like "if this_module_installed then
> call_this_function;" or try to force things like syscalltrack to go into
> vanilla kernel some time? Because what I've found out there are more
> projects which suffer from this restriction.


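Maybe:

/* Hook pointer; stays NULL until a module installs its handler. */
int (*funcvec_v1)(args...);
EXPORT_SYMBOL(funcvec_v1);

...

/* Call the module's handler if present, else the built-in one. */
retval = funcvec_v1 ? funcvec_v1(args...) : func_f1(args...);

or something of that kind.

There is, of course, a whole slew of politically coloured
issues with this chainability.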
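
The exported-pointer idea above can be tried out in user space. What follows is only a minimal sketch, not kernel code: `func_f1` stands in for the built-in implementation, and `do_call`, `install_hook`, `remove_hook` and `module_f1` are hypothetical names invented for the illustration. It allows a single hook at a time, which is exactly where the chainability question starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the built-in implementation. */
int func_f1(int arg)
{
    return arg + 1;
}

/* Hook pointer: NULL means no module has installed a handler. */
int (*funcvec_v1)(int arg) = NULL;

/* Call site: use the hook if present, otherwise fall back to func_f1. */
int do_call(int arg)
{
    int (*hook)(int) = funcvec_v1;   /* read the pointer once */

    return hook ? hook(arg) : func_f1(arg);
}

/* Hypothetical handler a module might provide. */
int module_f1(int arg)
{
    return arg * 10;
}

/* Install a handler; refuse if one is already present (no chaining). */
int install_hook(int (*fn)(int))
{
    assert(fn != NULL);
    if (funcvec_v1 != NULL)
        return -1;
    funcvec_v1 = fn;
    return 0;
}

/* Remove the handler so calls fall back to func_f1 again. */
void remove_hook(void)
{
    funcvec_v1 = NULL;
}
```

A real kernel version would also have to make sure no CPU is still executing the handler when the module goes away; the sketch ignores that entirely.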


> --
> Libor Vanek

/Matti Aarnio

2004-01-02 13:27:05

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls

Matti Aarnio wrote:

>On Fri, Jan 02, 2004 at 01:59:08PM +0100, Libor Vanek wrote:
>
>
>>Hi,
>>I'm writing some project which needs to hijack some syscalls in VFS
>>layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
>>
>>
>...
>
>
>>So what is proper (Linus recommanded) way to do such a things? Create
>>patches for specific syscalls like "if this_module_installed then
>>call_this_function;" or try to force things like syscalltrack to go into
>>vanilla kernel some time? Because what I've found out there are more
>>projects which suffer from this restriction.
>>
>>
>
>
>There is, of course, whole slew of politically coloured
>issues with this chainability.
>
>

I think chainability issues ALWAYS arise whenever you try to do
this kind of hijacking in general.


--

Libor Vanek




2004-01-02 13:57:52

by Ragnar Kjørstad

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, Jan 02, 2004 at 01:59:08PM +0100, Libor Vanek wrote:
> Hi,
> I'm writing some project which needs to hijack some syscalls in VFS
> layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
> some very nasty ways of doing it - see
> http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )
>
> Also I've found out that Linus stated that intercepting syscalls is "bad
> thing" (load module a, load module b, unload module b => crash) but I
> think that there are some very good reasons (and ways) to do it (see
> http://syscalltrack.sourceforge.net ). My main reason to do it is that I
> want my GPLed module to be able to modify some VFS syscalls without
> patching and recompiling whole kernel and rebooting the machine.

As part of the openxdsm-project we wrote a syscall-intercept module
that "solves" the (load module a, load module b, unload module b =>
crash) part by providing a common infrastructure for intercepting
syscalls.

It's available at:
http://cvs.sourceforge.net/viewcvs.py/openxdsm/openxdsm/eventmodule/module/events.c?rev=1.1.1.1&view=auto
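
To illustrate why a common infrastructure helps: when every module saves and restores the previous handler itself, unloading in the "wrong" order leaves a stale pointer behind. With one shared registry, handlers can come and go in any order. The following is a user-space sketch under that assumption only; it is not the openxdsm code, and `real_open`, `register_hook`, `unregister_hook`, `intercepted_open`, `hook_a` and `hook_b` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HOOKS 8

typedef int (*open_hook_t)(const char *path);

static open_hook_t hooks[MAX_HOOKS];   /* registered handlers, in order */
static size_t nr_hooks;

/* Hypothetical stand-in for the original syscall. */
int real_open(const char *path)
{
    return path[0] == '/' ? 0 : -1;
}

int register_hook(open_hook_t fn)
{
    assert(fn != NULL);
    if (nr_hooks == MAX_HOOKS)
        return -1;
    hooks[nr_hooks++] = fn;
    return 0;
}

/* Remove a handler wherever it sits; nobody else holds a pointer to it. */
int unregister_hook(open_hook_t fn)
{
    for (size_t i = 0; i < nr_hooks; i++) {
        if (hooks[i] != fn)
            continue;
        for (size_t j = i + 1; j < nr_hooks; j++)
            hooks[j - 1] = hooks[j];
        hooks[--nr_hooks] = NULL;
        return 0;
    }
    return -1;
}

/* Single interception point: run every handler, then the real call. */
int intercepted_open(const char *path)
{
    for (size_t i = 0; i < nr_hooks; i++) {
        int rc = hooks[i](path);

        if (rc != 0)
            return rc;                 /* a handler vetoed the call */
    }
    return real_open(path);
}

/* Two example handlers standing in for "module a" and "module b". */
int calls_seen_by_a;

int hook_a(const char *path)
{
    (void)path;
    calls_seen_by_a++;
    return 0;
}

int hook_b(const char *path)
{
    return strncmp(path, "/secret", 7) == 0 ? -13 : 0;
}
```

Registering `hook_a` then `hook_b` and removing `hook_a` first leaves `hook_b` fully functional, because only the registry ever touches the interception point.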


--
Ragnar Kjørstad

2004-01-02 15:12:15

by Muli Ben-Yehuda

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, Jan 02, 2004 at 01:59:08PM +0100, Libor Vanek wrote:
> Hi,
> I'm writing some project which needs to hijack some syscalls in VFS
> layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
> some very nasty ways of doing it - see
> http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )

Why do you need to hijack system calls from a module? 99% of the
time, it's the wrong technical solution.

> So what is proper (Linus recommanded) way to do such a things? Create
> patches for specific syscalls like "if this_module_installed then
> call_this_function;" or try to force things like syscalltrack to go into
> vanilla kernel some time? Because what I've found out there are more
> projects which suffer from this restriction.

There is no such Linus recommended way. For 2.6, syscalltrack's hijack
module moved into the kernel and will provide such generic
functionality one day. But I don't anticipate it ever going into the
vanilla kernel, due to Linus's well-known objections to syscall
hijacking in general and making it convenient in particular.

Cheers,
Muli
--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/

"the nucleus of linux oscillates my world" - gccbot@#offtopic



2004-01-02 15:38:36

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls


>>I'm writing some project which needs to hijack some syscalls in VFS
>>layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
>>some very nasty ways of doing it - see
>>http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )
>>
>>
>
>Why do you need to hijack system calls from a module? 99% of the
>times, it's the wrong technical solution.
>
>
I'm working on my diploma thesis, which adds snapshot capability
to the Linux VFS (so you can do directory-based snapshots, not just
whole-device ones as in LVM). It'll consist of two separate modules:
Snapshot module:
- will hijack (one way or another) calls to the open/move/unlink/mkdir/etc.
syscalls
- when it detects a change to the selected directory (the one I want to
snapshot), it'll copy/move the old file/directory to some temporary
location (selected when creating the snapshot) - in effect, copy-on-write
behaviour

UnionFS module:
- will place the "temporary" directory with the saved files/dirs "over" the
actual one, and the result will be a read-only snapshot - this can probably
be done without hijacking syscalls
- something like an overlay fs, but a bit different
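
The decision the snapshot module has to make before each modifying call can be sketched roughly as below. This is a plain user-space illustration, not the planned module: `struct snapshot`, `path_in_snapshot`, `must_preserve`, the fixed-size table and the example path `/srv/data` are all assumptions made for the sketch, and the actual copying is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SAVED 16      /* hypothetical limit, keeps the sketch allocation-free */
#define MAX_PATH_LEN 256

struct snapshot {
    const char *root;                     /* watched directory, no trailing '/' */
    char saved[MAX_SAVED][MAX_PATH_LEN];  /* paths already copied aside */
    size_t nr_saved;
};

/* True if path is the watched directory itself or lies below it. */
bool path_in_snapshot(const struct snapshot *s, const char *path)
{
    size_t n = strlen(s->root);

    if (strncmp(path, s->root, n) != 0)
        return false;
    /* Require a component boundary so "/srv/database" does not match "/srv/data". */
    return path[n] == '\0' || path[n] == '/';
}

/*
 * Called BEFORE a modifying operation. Returns true when the caller must
 * first copy the original aside; each path is reported only once.
 */
bool must_preserve(struct snapshot *s, const char *path)
{
    assert(s != NULL && path != NULL);

    if (!path_in_snapshot(s, path))
        return false;
    for (size_t i = 0; i < s->nr_saved; i++)
        if (strcmp(s->saved[i], path) == 0)
            return false;
    /* If the path cannot be recorded, stay conservative and ask for a copy. */
    if (s->nr_saved < MAX_SAVED && strlen(path) < MAX_PATH_LEN)
        strcpy(s->saved[s->nr_saved++], path);
    return true;
}
```

Only the first change to a given file triggers a copy, which is what makes the scheme copy-on-write rather than copy-on-every-write.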

--

Libor Vanek



2004-01-02 15:39:13

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls



>>I'm writing some project which needs to hijack some syscalls in VFS
>>layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
>>some very nasty ways of doing it - see
>>http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )
>>
>>Also I've found out that Linus stated that intercepting syscalls is "bad
>>thing" (load module a, load module b, unload module b => crash) but I
>>think that there are some very good reasons (and ways) to do it (see
>>http://syscalltrack.sourceforge.net ). My main reason to do it is that I
>>want my GPLed module to be able to modify some VFS syscalls without
>>patching and recompiling whole kernel and rebooting the machine.
>>
>>
>
>As part of the openxdsm-project we wrote an syscall-intercept module
>that "solves" the (load module a, load module b, unload module b =>
>crash) part by providing a common infrastructure for intercepting
>syscalls.
>
>
The code looks very nice'n'simple, but it won't run on 2.6 because of
the hidden sys_call_table mentioned above. But I can imagine that, with
some small tweaks, this could be integrated into 2.6 to provide a general
infrastructure for syscall hijacking when really needed.


--

Libor Vanek


2004-01-02 16:00:26

by Christoph Hellwig

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, Jan 02, 2004 at 04:38:27PM +0100, Libor Vanek wrote:
> I'm working on my diploma thesis which is adding snapshot capability
> into Linux VFS (so you can do directory based snapshots - not complete
> device, like in LVM). It'll consist of two separete modules:
> Snapshot module:
> - will hijack (one or another way) calls to open/move/unlink/mkdir/etc.
> syscall
> - when will detect change to selected directory (which I want to
> snapshot), it'll copy/move old file/directory to some temporary
> (selected when creating snapshot) - in fact - copy on write behaviour

This should be implemented as a stackable filesystem..

2004-01-02 16:36:38

by Jörn Engel

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, 2 January 2004 16:00:20 +0000, Christoph Hellwig wrote:
> On Fri, Jan 02, 2004 at 04:38:27PM +0100, Libor Vanek wrote:
> > I'm working on my diploma thesis which is adding snapshot capability
> > into Linux VFS (so you can do directory based snapshots - not complete
> > device, like in LVM). It'll consist of two separete modules:
> > Snapshot module:
> > - will hijack (one or another way) calls to open/move/unlink/mkdir/etc.
> > syscall
> > - when will detect change to selected directory (which I want to
> > snapshot), it'll copy/move old file/directory to some temporary
> > (selected when creating snapshot) - in fact - copy on write behaviour
>
> This should be implemented as a stackable filesystem..

Does this filesystem stack work with multiple mount points?

My guess is that the filesystem change notification would be a better
solution, either in userspace or in kernelspace, doesn't matter. But
that is far from finished or even generally accepted.


For the diploma thesis, feel free to use any hack you like, including
hijacking syscalls. But remember that it is a hack and nothing else,
only helping you to remain on schedule and focus more on the real
subject. And don't plan on kernel acceptance either, as you will fail
either that or the thesis and I'd choose the thesis.

Jörn

--
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens

2004-01-02 16:42:14

by Jörn Engel

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, 2 January 2004 16:39:04 +0100, Libor Vanek wrote:
> >
> The code looks very nice'n'simple but it won't run on 2.6 because
> mentioned hidden sys_call_table. But I can imagine that this with some
> small tweaks can be integrated into 2.6 to provide generall
> infrastructure for syscall hijacking when really needed.

Repeat: This is a hack, nothing else.

*You* need it for your thesis, so go ahead. It is the right solution for
your problem, which is research rather than engineering. But don't
think this mess should be integrated into mainline, much less tell
other people it should.

*No one* needs it for real code that is supposed to do something
meaningful. Some people feel they need it, but they just haven't
found the right solution yet and believe this hack to be right. It
isn't.

Clear enough?

Jörn

--
The competent programmer is fully aware of the strictly limited size of
his own skull; therefore he approaches the programming task in full
humility, and among other things he avoids clever tricks like the plague.
-- Edsger W. Dijkstra

2004-01-02 17:00:22

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls


>>>I'm working on my diploma thesis which is adding snapshot capability
>>>into Linux VFS (so you can do directory based snapshots - not complete
>>>device, like in LVM). It'll consist of two separete modules:
>>>Snapshot module:
>>>- will hijack (one or another way) calls to open/move/unlink/mkdir/etc.
>>>syscall
>>>- when will detect change to selected directory (which I want to
>>>snapshot), it'll copy/move old file/directory to some temporary
>>>(selected when creating snapshot) - in fact - copy on write behaviour
>>
>>This should be implemented as a stackable filesystem..
> Does this filesystem stack work with multiple mount points?

I think a stackable filesystem is useless for this. Consider:
- I have some BIG (>> 1 TB) filesystem (XFS/ext3/JFS, whatever!!!)
mounted on /BIG
- I call something like snapshot --what=/BIG/data --where=/OTHER_FS/snap1
- from now on, whenever I delete/create/modify a file/dir in /BIG/data,
the original file/dir should be copied/moved to /OTHER_FS/snap1 BEFORE
the action

There is nowhere I can use a stackable filesystem, because I'd need
to use it from the very beginning, when mounting /BIG. So instead of:
mount /dev/md0 /BIG -t xfs
I'd need to do something like:
mount /dev/md0 /BIG -t snapfs -o fstype=xfs
which is something I'm trying to avoid as much as possible, for many reasons.


> My guess is that the filesystem change notification would be a better
> solution, either in userspace or in kernelspace, doesn't matter. But
> that is far from finished or even generally accepted.

This is also something (but just a bit) different - I don't need "change
notification" but "pre-change notification" ;)

> For the diploma thesis, feel free to use any hack you like, including
> hijacking syscalls. But remember that it is a hack and nothing else,
> only helping you to remain on schedule and focus more on the real
> subject. And don't plan on kernel acceptance either, as you will fail
> either that or the thesis and I'd choose the thesis.

You're absolutely right, but if I'm going to spend several weeks on
something like this, I'd like to do something useful - not something
that will be trashed after the exam. So I'm trying to find some
"politically correct" way.

Right now I'll code it with the very nasty "find sys_call_table" hack,
just so I can test it without rebooting my machine (attempts to get 2.6.0
running under UML or VMware failed :-((( ), and once I release some
0.0.1 version and get some good responses I'll code it as a VFS patch
& module.


--

Libor Vanek

2004-01-02 18:05:17

by Jörn Engel

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, 2 January 2004 17:59:22 +0100, Libor Vanek wrote:
>
> >My guess is that the filesystem change notification would be a better
> >solution, either in userspace or in kernelspace, doesn't matter. But
> >that is far from finished or even generally accepted.
>
> This is also something (but just a bit) different - I don't need "change
> notification" but "pre-change notification" ;)

"Vor dem Spiel ist nach dem Spiel" -- Sepp Herberger

Except for exactly two cases, pre-change and post-change are the same,
just off-by-one. So you would need a bootup/mount/whenever special
case now, is that a big problem?

> >For the diploma thesis, feel free to use any hack you like, including
> >hijacking syscalls. But remember that it is a hack and nothing else,
> >only helping you to remain on schedule and focus more on the real
> >subject. And don't plan on kernel acceptance either, as you will fail
> >either that or the thesis and I'd choose the thesis.
>
> You're absolutely right but when I'm going to spent several weeks on
> something like this I'd like to do something usefull - not something
> which will be trashed after exam. So I'm trying to find out some
> "politically correct" way.

Then separate the two problems. One is to figure out what has
changed, and two is to act accordingly. Two should be pretty
independent of this thread's subject. If that part is really useful,
people will help you with problem one. Postpone. :)

Something I learned over time is that the first implementation is
almost always crappy, often even outright wrong. It has to be, because
no one really knows all the problems yet and thus no one can design the
Proper Solution (tm) yet. Look at the current devfs vs. udev discussion
for one example.

Many research people know this and won't give you any source code
beyond the official paper, simply because it is horrible and they don't
want to wear brown paper bags. There is no shame in a horrible first
try; no one could have done that much better than Richard Gooch back
then, simply because no one could learn from his mistakes yet.

Ok, there is shame in a horrible first try, but there shouldn't be,
really. The "standing on the shoulders of giants" thing applies, even
when standing on the shoulders of dwarves, people should be more
polite. :)

And even though he won't read this, thank you Richard! He took the
unrewarding role and grew bitter, but he did a good thing.


Ok, back to your problem. Separation is the way to go. Problem one
is a hard one and it takes a lot of time to do right. But hacking it
up is quite simple, so you can save time with the hack and do it right
only if your solution to problem two proved good enough.

Jörn

--
He who knows others is wise.
He who knows himself is enlightened.
-- Lao Tsu

2004-01-02 19:00:15

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, Jan 02, 2004 at 07:04:31PM +0100, Jörn Engel wrote:
> On Fri, 2 January 2004 17:59:22 +0100, Libor Vanek wrote:
> > >My guess is that the filesystem change notification would be a better
> > >solution, either in userspace or in kernelspace, doesn't matter. But
> > >that is far from finished or even generally accepted.
> >
> > This is also something (but just a bit) different - I don't need "change
> > notification" but "pre-change notification" ;)
>
> "Vor dem Spiel ist nach dem Spiel" -- Sepp Herberger
>
> Except for exactly two cases, pre-change and post-change and the same,
> just off-by-one. So you would need a bootup/mount/whenever special
> case now, is that a big problem?

My English is probably bad, but I don't understand what you are trying to say (except the German part ;-))
A bit more about pre/post-change (if that is what you are trying to say) - I always need pre-change, because after a file is changed I can no longer get the original (pre-change) version of the file, which I need for the snapshot.

> > >For the diploma thesis, feel free to use any hack you like, including
> > >hijacking syscalls. But remember that it is a hack and nothing else,
> > >only helping you to remain on schedule and focus more on the real
> > >subject. And don't plan on kernel acceptance either, as you will fail
> > >either that or the thesis and I'd choose the thesis.
> >
> > You're absolutely right but when I'm going to spent several weeks on
> > something like this I'd like to do something usefull - not something
> > which will be trashed after exam. So I'm trying to find out some
> > "politically correct" way.
>
> Then seperate the two problems. One is to figure out, what has
> changed and two is to act accordingly. Two should be pretty
> independent on this threads subject. If that part is really useful,
> people will help you on problem one. Postpone. :)
> ...
> Ok, back to your problem. Seperation is the way to go. Problem one
> is a hard one and it takes a lot of time to do right. But hacking it
> up is quite simple, so you can save time with the hack and do it right
> only if your solution to problem two proved good enough.

Yes - that's what I'm doing now. I'm just about to reboot my machine (<grr>) and see whether EXPORT_SYMBOL(sys_open) works as I think it should ;)

Anyway - thanks for hints.

--
Libor Vanek



2004-01-02 19:16:12

by Arjan van de Ven

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, 2004-01-02 at 19:58, Libor Vanek wrote:
> On Fri, Jan 02, 2004 at 07:04:31PM +0100, Jörn Engel wrote:
> > On Fri, 2 January 2004 17:59:22 +0100, Libor Vanek wrote:
> > > >My guess is that the filesystem change notification would be a better
> > > >solution, either in userspace or in kernelspace, doesn't matter. But
> > > >that is far from finished or even generally accepted.
> > >
> > > This is also something (but just a bit) different - I don't need "change
> > > notification" but "pre-change notification" ;)
> >
> > "Vor dem Spiel ist nach dem Spiel" -- Sepp Herberger
> >
> > Except for exactly two cases, pre-change and post-change and the same,
> > just off-by-one. So you would need a bootup/mount/whenever special
> > case now, is that a big problem?
>
> Probably my english is bad but I don't understand what are you trying to say (except the german part ;-))
> A bit more about pre/post-change (if this is what are you trying to say) - I need allways pre-change because after file is changed I can no longer get original (pre-change) version of file which I need for snapshot.

Then you are on the wrong track anyway, since file data can change
without any system call (think of an mmapped file, where dirtying the
pages doesn't involve a syscall).
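
The mmap point can be demonstrated from user space. In this sketch (POSIX only; `dirty_via_mmap` and `mmap_demo` are names made up for the illustration, and `/tmp` is assumed to be writable) the file's contents change through an ordinary memory store, so a hook on write()-style syscalls would never see the modification itself. The `msync()` call is only there to make the read-back deterministic; it is not what changes the data.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Change the first byte of the file behind fd through a shared mapping.
 * Returns 0 on success, -1 on error.
 */
int dirty_via_mmap(int fd, size_t len, char new_first_byte)
{
    assert(len > 0);

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    map[0] = new_first_byte;          /* plain store: no syscall involved */

    int rc = msync(map, len, MS_SYNC);
    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}

/*
 * Create a scratch file containing "old", modify it via mmap only, and
 * return the first byte as read back with pread(), or -1 on error.
 */
int mmap_demo(void)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);
    int result = -1;
    char c;

    if (fd < 0)
        return -1;
    unlink(path);                     /* file disappears when fd is closed */

    if (write(fd, "old", 3) == 3 &&
        dirty_via_mmap(fd, 3, 'N') == 0 &&
        pread(fd, &c, 1, 0) == 1)
        result = (unsigned char)c;

    close(fd);
    return result;
}
```

The only syscalls a per-syscall hook could observe here are `open`/`mmap` themselves, which is why the reply below falls back to copying at mmap time.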



2004-01-02 19:18:33

by Jörn Engel

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, 2 January 2004 19:58:48 +0100, Libor Vanek wrote:
> On Fri, Jan 02, 2004 at 07:04:31PM +0100, Jörn Engel wrote:
> >On Fri, 2 January 2004 17:59:22 +0100, Libor Vanek wrote:
> >>
> >> This is also something (but just a bit) different - I don't need "change
> >> notification" but "pre-change notification" ;)
> >
> >"Vor dem Spiel ist nach dem Spiel" -- Sepp Herberger
> >
> >Except for exactly two cases, pre-change and post-change and the same,
> >just off-by-one. So you would need a bootup/mount/whenever special
> >case now, is that a big problem?
>
> Probably my english is bad but I don't understand what are you trying to
> say (except the german part ;-))
> A bit more about pre/post-change (if this is what are you trying to say) -
> I need allways pre-change because after file is changed I can no longer get
> original (pre-change) version of file which I need for snapshot.

If you take a snapshot on every change within your scope, it doesn't
really matter whether you do it before or after the change. Before
change n is just after change n-1. All you have to do is take another
snapshot before the first change, that is the special case.
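
The off-by-one argument can be written down as a tiny model. This is only a sketch: file contents are reduced to a single `int`, and `struct history`, `history_init`, `record_after_change` and `state_before_change` are hypothetical names. Version 0 is the extra snapshot taken before the first change (the special case); every later version is recorded after a change.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VERSIONS 16

struct history {
    /* versions[0]: initial snapshot; versions[n]: state after change n */
    int versions[MAX_VERSIONS];
    size_t count;
};

/* The special case: one snapshot taken before anything changes. */
void history_init(struct history *h, int initial_state)
{
    h->versions[0] = initial_state;
    h->count = 1;
}

/* Post-change notification: record the state after the latest change. */
int record_after_change(struct history *h, int new_state)
{
    if (h->count == MAX_VERSIONS)
        return -1;
    h->versions[h->count++] = new_state;
    return 0;
}

/* "Before change n" is simply "after change n-1". */
int state_before_change(const struct history *h, size_t n)
{
    assert(n >= 1 && n <= h->count);
    return h->versions[n - 1];
}
```

The cost hidden in `history_init` is the point raised in the follow-up: the initial snapshot has to cover all the data.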

If you take snapshots just once in a blue moon, this obviously doesn't
work. But I wonder if that approach would make sense anyway, as
incremental snapshots are just so much nicer. Actually, I've once written
a Pascal program to do just that, for some stupid university credit.
It doesn't have the proper kernel hooks and is the wrong language, but
it does the incremental snapshot. Documentation is in German, but if
you are interested, I will try to find an electronic copy.

Actually, with userspace notification in place, you could even get
this with just cvs. Whenever a file is changed, commit. cvs add on
creation, etc. Yes, it sucks, but implementation simplicity has its
own beauty and it would only take a few minutes. :)

Jörn

--
Geld macht nicht glücklich.
Glück macht nicht satt.

2004-01-02 19:23:55

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls

Arjan van de Ven wrote:
> On Fri, 2004-01-02 at 19:58, Libor Vanek wrote:
>
>>On Fri, Jan 02, 2004 at 07:04:31PM +0100, Jörn Engel wrote:
>>
>>>On Fri, 2 January 2004 17:59:22 +0100, Libor Vanek wrote:
>>>
>>>>>My guess is that the filesystem change notification would be a better
>>>>>solution, either in userspace or in kernelspace, doesn't matter. But
>>>>>that is far from finished or even generally accepted.
>>>>
>>>>This is also something (but just a bit) different - I don't need "change
>>>>notification" but "pre-change notification" ;)
>>>
>>>"Vor dem Spiel ist nach dem Spiel" -- Sepp Herberger
>>>
>>>Except for exactly two cases, pre-change and post-change and the same,
>>>just off-by-one. So you would need a bootup/mount/whenever special
>>>case now, is that a big problem?
>>
>>Probably my english is bad but I don't understand what are you trying to say (except the german part ;-))
>>A bit more about pre/post-change (if this is what are you trying to say) - I need allways pre-change because after file is changed I can no longer get original (pre-change) version of file which I need for snapshot.
> then you are off on the wrong track anyway since filedata can change
> without system call anyway (think mmaped file where the dirtying doesnt'
> involve a syscall

I know about this - the only (simple and fast enough) solution is to copy (back up) the file whenever it's open for writing and mmap is called.


--

Libor Vanek



2004-01-02 19:37:39

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls

Jörn Engel wrote:
> On Fri, 2 January 2004 19:58:48 +0100, Libor Vanek wrote:
>
>>On Fri, Jan 02, 2004 at 07:04:31PM +0100, Jörn Engel wrote:
>>
>>>On Fri, 2 January 2004 17:59:22 +0100, Libor Vanek wrote:
>>>
>>>>This is also something (but just a bit) different - I don't need "change
>>>>notification" but "pre-change notification" ;)
>>>
>>>"Vor dem Spiel ist nach dem Spiel" -- Sepp Herberger
>>>
>>>Except for exactly two cases, pre-change and post-change and the same,
>>>just off-by-one. So you would need a bootup/mount/whenever special
>>>case now, is that a big problem?
>>
>>Probably my english is bad but I don't understand what are you trying to
>>say (except the german part ;-))
>>A bit more about pre/post-change (if this is what are you trying to say) -
>>I need allways pre-change because after file is changed I can no longer get
>>original (pre-change) version of file which I need for snapshot.
> If you take a snapshot on every change within your scope, it doesn't
> really matter whether you do it before or after the change. Before
> change n is just after change n-1. All you have to do is take another
> snapshot before the first change, that is the special case.

But this special case in fact means copying all the data, if I want it to work 100% ;-) And I suspect that won't get through my exam ;)

> Actually, with userspace notification in place, you could even get
> this with just cvs. Whenever a file is changed, commit. cvs add on
> creation, etc. Yes, it sucks, but implementation simplicity has it's
> own beauty and it would only take a few minutes. :)

I've heard about some fs from Microsoft which is supposed to have cvs-like behaviour all the time ("I want this file version from yesterday") - but I don't have any details (and I suppose the performance hit must be big)

--

Libor Vanek




2004-01-02 19:56:29

by Jörn Engel

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, 2 January 2004 20:37:37 +0100, Libor Vanek wrote:
>
> >If you take a snapshot on every change within your scope, it doesn't
> >really matter whether you do it before or after the change. Before
> >change n is just after change n-1. All you have to do is take another
> >snapshot before the first change, that is the special case.
>
> But this special case in fact means to copy all the data, if wanted to do
> it 100% working ;-) And I suggest that it wont' go through my exam ;)

Yes, it does. Cannot comment on your exam, though.

> >Actually, with userspace notification in place, you could even get
> >this with just cvs. Whenever a file is changed, commit. cvs add on
> >creation, etc. Yes, it sucks, but implementation simplicity has it's
> >own beauty and it would only take a few minutes. :)
>
> I've heard about some fs from Microsoft which should have cvs-like
> behaviour for all the time ("I want this file version from yesterday") -
> but I haven't had any details (and I suppose performance hit must be big)

Not too big, actually. The obvious implementation has to write data
twice (plus metadata, but that's minimal), so performance is ~50% of
normal. With hard drives and a smarter implementation, you don't have
to worry about seek latency too much for the log, so performance is
usually better than 50%. Basically, it hurts where you are good
enough anyway (large sequential writes) and goes unnoticed where
performance is bad (many small scattered writes).

The real problem is storage size. Each write will *permanently*
reduce the filesystem size. Maxtor and friends should happily sponsor
such development. :)

Seriously, such a thing is the perfect cure for "rm -rf foo /" and
similar famous mistakes, and you can always just remove the oldest
backup data (preferably after writing it to a tape).

Jörn

--
A defeated army first battles and then seeks victory.
-- Sun Tzu

2004-01-02 23:40:31

by Anton Blanchard

Subject: Re: Syscall table AKA hijacking syscalls


> I'm writing some project which needs to hijack some syscalls in VFS
> layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
> some very nasty ways of doing it - see
> http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )

And it will fail miserably on many non-x86 architectures for
various reasons:

1. ppc64 and ia64 use function descriptors
2. sparc64 uses a 32-bit call out table

In short, it's not only an awful hack, it's horribly non-portable :)

Anton

2004-01-02 23:47:06

by Libor Vanek

Subject: Re: Syscall table AKA hijacking syscalls

>>I'm writing some project which needs to hijack some syscalls in VFS
>>layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
>>some very nasty ways of doing it - see
>>http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )
>
>
> And it will fail miserably on many non x86 architectures for
> various reasons:
>
> 1. ppc64 and ia64 use function descriptors
> 2. sparc64 uses a 32bit call out table
>
> In short its not only an awful hack, its horribly non portable :)

But in short you always get some syscall from userspace and have some table with function vectors assigned to each syscall, don't you?

So you can have something like "append_this_function_before_syscall_sys_open" and "append_this_function_after_syscall_sys_open" which would be platform independent but will have platform dependent implementation.
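
A before/after hook pair of the kind described above might look like the sketch below. It is a user-space model only: `arch_sys_open` stands in for whatever platform-dependent entry point really dispatches the call, and `struct syscall_hooks`, `hooked_sys_open`, `count_before` and `record_after` are hypothetical names. Only the wrapper would be platform independent.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*pre_hook_t)(const char *path);            /* nonzero aborts the call */
typedef void (*post_hook_t)(const char *path, int rc);  /* sees the result */

struct syscall_hooks {
    pre_hook_t before;
    post_hook_t after;
};

/* One hook slot per syscall; only "open" is modelled here. */
struct syscall_hooks open_hooks;

/* Hypothetical stand-in for the platform-specific implementation. */
int arch_sys_open(const char *path)
{
    return path[0] != '\0' ? 3 : -2;
}

/* Platform-independent wrapper: before hook, real call, after hook. */
int hooked_sys_open(const char *path)
{
    assert(path != NULL);

    if (open_hooks.before) {
        int veto = open_hooks.before(path);

        if (veto != 0)
            return veto;
    }

    int rc = arch_sys_open(path);

    if (open_hooks.after)
        open_hooks.after(path, rc);
    return rc;
}

/* Example hooks: count calls on the way in, remember the result on the way out. */
int pre_calls;
int last_rc;

int count_before(const char *path)
{
    (void)path;
    pre_calls++;
    return 0;
}

void record_after(const char *path, int rc)
{
    (void)path;
    last_rc = rc;
}
```

For the snapshot use case only the "before" slot matters, since the original data must be saved before the call proceeds.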


--

Libor Vanek

2004-01-03 15:30:22

by Helge Hafting

Subject: Re: Syscall table AKA hijacking syscalls

On Sat, Jan 03, 2004 at 12:46:17AM +0100, Libor Vanek wrote:
> >>I'm writing some project which needs to hijack some syscalls in VFS
> >>layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
> >>some very nasty ways of doing it - see
> >>http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )
> >
> >
> >And it will fail miserably on many non x86 architectures for
> >various reasons:
> >
> >1. ppc64 and ia64 use function descriptors
> >2. sparc64 uses a 32bit call out table
> >
> >In short its not only an awful hack, its horribly non portable :)
>
> But in short you always get some syscall from userspace and have some table
> with function vectors assigned to each syscall, don't you?
>
> So you can have something like
> "append_this_function_before_syscall_sys_open" and
> "append_this_function_after_syscall_sys_open" which would be platform
> independent but will have platform dependent implementation.

Why bother overriding syscalls?
If you want a different sys_open, just modify/rewrite it. Then you get a kernel
that works your way without touching the syscall table (or other implementation of it)
at all.

Of course this sort of rewrite cannot be activated/deactivated by loading/unloading
a module. But that isn't necessary; use a writable flag in /proc instead.

i.e.:
sys_open(...)
{
	/* activated_in_proc is toggled by writing to a /proc entry */
	if (activated_in_proc)
		return my_sys_open(...);
	return standard_sys_open(...);
}

Helge Hafting

2004-01-07 09:36:02

by stefan.eletzhofer

Subject: Re: Syscall table AKA hijacking syscalls

On Fri, Jan 02, 2004 at 04:38:27PM +0100, Libor Vanek wrote:
>
> >>I'm writing some project which needs to hijack some syscalls in VFS
> >>layer. AFAIK in 2.6 is this "not-wanted" solution (even that there are
> >>some very nasty ways of doing it - see
> >>http://mail.nl.linux.org/kernelnewbies/2002-12/msg00266.html )
> >>
> >>
> >
> >Why do you need to hijack system calls from a module? 99% of the
> >times, it's the wrong technical solution.
> >
> >
> I'm working on my diploma thesis which is adding snapshot capability
> into Linux VFS (so you can do directory based snapshots - not complete
> device, like in LVM). It'll consist of two separete modules:
> Snapshot module:
> - will hijack (one or another way) calls to open/move/unlink/mkdir/etc.
> syscall
> - when will detect change to selected directory (which I want to
> snapshot), it'll copy/move old file/directory to some temporary
> (selected when creating snapshot) - in fact - copy on write behaviour

Do it in userspace. Hack an NFS server.

>
> UnionFS module:
> - will place "temporary" directory with saved files/dirs "over" actual
> one and result will be read-only snapshot - this can be done without
> hijacking syscalls probably
> - something like overlay fs but a bit different
>
> --
>
> Libor Vanek

--
Eletztrick Computing - Customized Linux Development
Stefan Eletzhofer, Marktstrasse 43, DE-88214 Ravensburg
http://www.eletztrick.de