2003-11-22 00:05:23

by Michael Welles

[permalink] [raw]
Subject: Using get_cwd inside a module.

Apologies in advance for a lengthy and probably stupid question.

I'm one of the authors of changedfiles (http://changedfiles.org), a
GPL'd application that monitors file operations and allows user
configurable actions to take place in userspace when any defined rules
are matched. Currently we work w/ 2.2.x and 2.4.x kernel series, but
this question mostly concerns 2.4.x.

When doing the initial development of the kernel module component of the
system, I needed getcwd() in kernel space. Being lazy, and frankly
terrified to be working in kernel space, I ended up cutting and pasting
getcwd() from dcache.c and using the locally defined version in the module.

This worked great for a couple of years, but recently when my debian
testing box starting using gcc 3.3 (I think this is the cause -- it's
just about the only thing that changed), I started getting NULL pointer
drefs inside the copied code.

I figured it was high time I stoppied using cut n' paste code. Instead
I thought I'd use sys_call_table[__NR_getcwd] instead -- since I could
run pwd in a shell without any kernel panics, I figured it must be
working OK.

My powers of grep were sorely lacking, though, and I couldn't find
anywhere in the source where the function was assigned. In a bit of
desperation, I guessed that somewhere, somehow, the getcwd() in dcache
was being assigned, so I used that function prototype:

int (*getcwd)(char *buf, unsigned long size);
getcwd = (int (*)(char * , unsigned long
))(sys_call_table[__NR_getcwd]);

and use it as I used to:

len = getcwd(fullnewname, MAX_PATH);


Everything built just fine, but whenever I load the module and the above
statement runs, the function returns -14. This is true on my debian
testing box, and also on my YDL 3.0 machine, where the old version (with
the cut n' paste code) still runs just fine.

I'm not sure what to try next. What am I doing wrong?


Thanks,

Michael Welles



2003-11-22 00:33:14

by Maciej Żenczykowski

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

> ))(sys_call_table[__NR_getcwd]);
>
> and use it as I used to:
>
> len = getcwd(fullnewname, MAX_PATH);

sys_getcwd is the actual function name.

> Everything built just fine, but whenever I load the module and the above
> statement runs, the function returns -14. This is true on my debian
> testing box, and also on my YDL 3.0 machine, where the old version (with
> the cut n' paste code) still runs just fine.

Well -14 is EFAULT - i.e. memory protection failure. You are passing a
kernel area pointer (the buf) to a userspace oriented function - it checks
the access and fails it as the user process doesn't have access to the kernel
memory of your module (Ie it doesn't realise it is being called from the
kernel and checks whether the user process currently running would have
access to your modules memory (likely stack) - obviously it wouldn't and
thus it fails with -EFAULT.

Don't think there is anyway around this - so you'll probably need to
cut'n'paste. On the other hand if such functionality is required it is
probably best to split the fs/dcache.c sys_getcwd implementation into two
pieces - the userspace verify wrapper and the function which performs the
actual work.

Cheers,
MaZe.

PS. you'd basically need to stick everything except the kernel memory
allocation/freeing in a seperate helper function (probably named
do_getcwd). sys_getcwd would alloc page, call helper, free page.
Your code would just directly call the helper. If you need further help
just holler.

2003-11-22 08:30:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

The basic problem is that you shouldn't call syscalls from kernelspace.
Have you looked at dnotify to look for changed files instead?

2003-11-22 09:32:59

by Juergen Hasch

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

Am Samstag, 22. November 2003 09:30 schrieb Christoph Hellwig:
> The basic problem is that you shouldn't call syscalls from kernelspace.
> Have you looked at dnotify to look for changed files instead?

Dnotify doesn't return the file names that changed, changedfiles does.
I've looked into this, because Samba would benefit from such a functionality.

So maybe it would be possible to teach dnotify to return file names
(e.g. using netlink) ?

...Juergen

>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2003-11-22 10:16:02

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

On Sat, Nov 22, 2003 at 10:33:34AM +0100, Juergen Hasch wrote:
> Am Samstag, 22. November 2003 09:30 schrieb Christoph Hellwig:
> > The basic problem is that you shouldn't call syscalls from kernelspace.
> > Have you looked at dnotify to look for changed files instead?
>
> Dnotify doesn't return the file names that changed, changedfiles does.
> I've looked into this, because Samba would benefit from such a functionality.
>
> So maybe it would be possible to teach dnotify to return file names
> (e.g. using netlink) ?

Well, you can't return filenames. There's no unique path to a give
file.

What are the exact requirements of changedfiles or samba?

2003-11-22 10:44:42

by Juergen Hasch

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

Am Samstag, 22. November 2003 11:15 schrieb Christoph Hellwig:
> On Sat, Nov 22, 2003 at 10:33:34AM +0100, Juergen Hasch wrote:
> > Am Samstag, 22. November 2003 09:30 schrieb Christoph Hellwig:
> > > The basic problem is that you shouldn't call syscalls from kernelspace.
> > > Have you looked at dnotify to look for changed files instead?
> >
> > Dnotify doesn't return the file names that changed, changedfiles does.
> > I've looked into this, because Samba would benefit from such a
> > functionality.
> >
> > So maybe it would be possible to teach dnotify to return file names
> > (e.g. using netlink) ?
>
> Well, you can't return filenames. There's no unique path to a give
> file.
>
> What are the exact requirements of changedfiles or samba?

Samba needs to be able to notify a client machine, when a file in a
directory changes (i.e. is added/removed/modified/renamed). The directory
to be watched is given by the client and can include subdirectories.

Right now Samba on Linux uses dnotify to watch the given directory
and just returns an error (too many files changed). The client has
to find out for itself which files changed.
This breaks certain user applications (e.g. IIS), because they rely
on getting the file names back from Samba.
I have a patch which makes a copy of all directory information on
each dnotify event and looks for the changed files. This works but
obviously doesn't scale too well for many files in a directory.

...Juergen

2003-11-22 11:05:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

On Sat, Nov 22, 2003 at 11:45:39AM +0100, Juergen Hasch wrote:
> > What are the exact requirements of changedfiles or samba?
>
> Samba needs to be able to notify a client machine, when a file in a
> directory changes (i.e. is added/removed/modified/renamed). The directory
> to be watched is given by the client and can include subdirectories.

Well, reporting a single path component relative to the parent directory
is doable, there's just no way to have a canonical absolute or
multi-component pathname.

2003-11-22 11:52:56

by Andreas Dilger

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

On Nov 22, 2003 10:15 +0000, Christoph Hellwig wrote:
> On Sat, Nov 22, 2003 at 10:33:34AM +0100, Juergen Hasch wrote:
> > Dnotify doesn't return the file names that changed, changedfiles does.
> > I've looked into this, because Samba would benefit from such a functionality.
> >
> > So maybe it would be possible to teach dnotify to return file names
> > (e.g. using netlink) ?
>
> Well, you can't return filenames. There's no unique path to a give
> file.

Since the caller is already watching a specific directory, it doesn't
need to know the full pathname, just the inode number that changed.
Then Samba et. al. could do an inode->name(s) lookup on the directory.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2003-11-22 12:30:14

by Szymon Acedański

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

On Saturday 22 November 2003 01:32, Maciej Zenczykowski wrote:
> > ))(sys_call_table[__NR_getcwd]);
> >
> > and use it as I used to:
> >
> > len = getcwd(fullnewname, MAX_PATH);
>
> sys_getcwd is the actual function name.
>
> > Everything built just fine, but whenever I load the module and the above
> > statement runs, the function returns -14. This is true on my debian
> > testing box, and also on my YDL 3.0 machine, where the old version (with
> > the cut n' paste code) still runs just fine.
>
> Well -14 is EFAULT - i.e. memory protection failure. You are passing a
> kernel area pointer (the buf) to a userspace oriented function - it checks
> the access and fails it as the user process doesn't have access to the
> kernel memory of your module (Ie it doesn't realise it is being called from
> the kernel and checks whether the user process currently running would have
> access to your modules memory (likely stack) - obviously it wouldn't and
> thus it fails with -EFAULT.
>
> Don't think there is anyway around this - so you'll probably need to
> cut'n'paste. On the other hand if such functionality is required it is
> probably best to split the fs/dcache.c sys_getcwd implementation into two
> pieces - the userspace verify wrapper and the function which performs the
> actual work.

You can consider doing similar hack as kHTTPd does. It calls temporarily
set_fs(KERNEL_DS), disabling memlimit checking, then calls userspace-aware
function, and then restores everything back. Try looking at
linux/net/khttpd/rfc.c:SendHTTPHeader (it's in 2.4 kernels, but not in 2.6).

Cheers
accek

2003-11-22 17:17:06

by Michael Welles

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.


--- Christoph Hellwig <[email protected]> wrote:

> Well, you can't return filenames. There's no unique
> path to a give
> file.
>
> What are the exact requirements of changedfiles or
> samba?

For changedfiles, essentially a device special that
reports file operations, e.g.

OPENW /home/foo/filename
MKDIR /home/foo/bar
RENAME /home/foo/bar -> /home/foo/goo

The userspace deamon reads the operations and files,
and compares then to regexp rules in the config file
i.e, for an automatic image converter:

RULE ^/home/dropbox/*.gif
SHELL /usr/bin/convert __FILE__ `basename __FILE__
.jpg`

It also does in process FTP, so you can do poor man's
replication to a remote machine without the overhead
of polling.




__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

2003-11-22 17:22:58

by Michael Welles

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

For changedfiles, using cut n' paste sys_getcwd, we've
managed to get "good enough" functionality. When a
user runs "touch foo" in /tmp/watch, when we intercept
the openw, we manage to resolve "foo" to
"/tmp/watch/foo". We still break on things like
"touch ../../foo", since we resolve that to
"/tmp/watch/../../foo" -- but that's pretty fixable
with a little effort.

NFS mounted filesystems still get us, since we rely on
intercepting the system calls and putting a wrapper
around them. I've looked into doing so, but I didn't
see a way of doing so in a module -- and we haven't
wanted to demand that users patch and rebuild a
kernel.


--- Christoph Hellwig <[email protected]> wrote:

>
> Well, reporting a single path component relative to
> the parent directory
> is doable, there's just no way to have a canonical
> absolute or
> multi-component pathname.
>


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

2003-11-23 19:42:21

by Juergen Hasch

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

Am Samstag, 22. November 2003 12:04 schrieb Christoph Hellwig:
> On Sat, Nov 22, 2003 at 11:45:39AM +0100, Juergen Hasch wrote:
> > > What are the exact requirements of changedfiles or samba?
> >
> > Samba needs to be able to notify a client machine, when a file in a
> > directory changes (i.e. is added/removed/modified/renamed). The directory
> > to be watched is given by the client and can include subdirectories.
>
> Well, reporting a single path component relative to the parent directory
> is doable, there's just no way to have a canonical absolute or
> multi-component pathname.

If dnotify is able to return name and event type for each dnotify event,
it is quite simple to make use of it in Samba. I tested this some time
ago, but it was too much of a hack to consider sending this to the list.

What are the chances of such an extension being accepted in the
kernel at all ?
>From the LKML archives I haven't seen much love for dnotify and I
think only few Samba users would consider using a custom kernel.

...Juergen

2003-11-23 19:43:43

by Juergen Hasch

[permalink] [raw]
Subject: Re: Using get_cwd inside a module.

Am Samstag, 22. November 2003 12:48 schrieb Andreas Dilger:
> On Nov 22, 2003 10:15 +0000, Christoph Hellwig wrote:
> > On Sat, Nov 22, 2003 at 10:33:34AM +0100, Juergen Hasch wrote:
> > > Dnotify doesn't return the file names that changed, changedfiles does.
> > > I've looked into this, because Samba would benefit from such a
> > > functionality.
> > >
> > > So maybe it would be possible to teach dnotify to return file names
> > > (e.g. using netlink) ?
> >
> > Well, you can't return filenames. There's no unique path to a give
> > file.
>
> Since the caller is already watching a specific directory, it doesn't
> need to know the full pathname, just the inode number that changed.
> Then Samba et. al. could do an inode->name(s) lookup on the directory.

As the kernel would need to communicate with userspace anyway, I believe
getting the file name directly from the kernel shouldn't be too much to
ask for. The kernel already knows it and we save the lookup operation in
userspace that is needed each time we receive an event.

However getting inode and event type would still be a huge improvement,
because right now we have to stat all files in a directory and compare
it with a stored version, to find out what happend.

...Juergen