Hi,
I'm glad to see you guys are working on a common acl api for ext2/3 and xfs.
I was just wondering if this api provided what would be needed for linux
to support NTFS's acls.
Now bare in mind I know little about how NTFS's alc's are implimented or
if they follow POSIX at
all. But I just thought it might be worth asking the ntfs maintainer if
the proposed api would be
adaquit to support ntfs's acls on linux should they ever want to
impliment this. Might save them
headaches someday.
Also will it supply the interface needed for other filesystems that have
been ported that linux that
support acls? (i.e. will it work for them, could they use it in the
future if/when they decide to
impliment that feature) I think JFS might support acls too.
Sorry to be a bother,
Tim
hi Tim,
On Sat, Nov 10, 2001 at 04:08:50AM -0500, Tim R. wrote:
> I'm glad to see you guys are working on a common acl api for ext2/3 and xfs.
> I was just wondering if this api provided what would be needed for linux
> to support NTFS's acls.
> Now bare in mind I know little about how NTFS's alc's are implimented or
> if they follow POSIX at
> all. But I just thought it might be worth asking the ntfs maintainer if
> the proposed api would be
> adaquit to support ntfs's acls on linux should they ever want to
> impliment this. Might save them
> headaches someday.
The API doesn't favour any one form of ACL - it allows for any
implementation to be layered above it, provided the semantics
of those ACLs can be expressed using extended attributes, of
course.
> Also will it supply the interface needed for other filesystems that have
> been ported that linux that
> support acls? (i.e. will it work for them, could they use it in the
> future if/when they decide to
> impliment that feature) I think JFS might support acls too.
Yes, I believe so. I see EA and ACL support is on the JFS todo
list - I was contacted by some folk at IBM who let me know this,
so there is certainly some interest there.
cheers.
--
Nathan
At 09:08 10/11/2001, Tim R. wrote:
>I'm glad to see you guys are working on a common acl api for ext2/3 and xfs.
>I was just wondering if this api provided what would be needed for linux
>to support NTFS's acls.
>Now bare in mind I know little about how NTFS's alc's are implimented or
>if they follow POSIX at
>all. But I just thought it might be worth asking the ntfs maintainer if
>the proposed api would be
>adaquit to support ntfs's acls on linux should they ever want to impliment
>this. Might save them
>headaches someday.
Comments/problems for NTFS with proposed EA/ACL API:
I think the API is good for extended attributes, no doubt. If we ever get
round to implementing EAs in NTFS then I would be happy to use the API. It
fully satisfies the needs of the NTFS EAs. The only addition I would put
into the API is that the names of the extended attributes have to be able
to have different name spaces themselves. For example I am fairly sure that
the name of an EA in NTFS cannot contain just any character and it
certainly cannot have a name of any length... This is something that needs
to be considered. At least there must be a defined error return values of
"EILSEQ" (bad name namespace) and "ENAMETOOLONG" (self evident).
But for ACLs I am not so positive:
I guess the real problem is that NTFS security doesn't map very well onto
Unix/Linux type of security model because the NTFS model has way more features.
If you are asking the question whether NTFS can work with the proposed API
then yes, it can support all its features, but not the other way round...
Particular problems:
- The proposed API puts ACLs inside extended attributes (EAs). On NTFS ACLs
have nothing to do with extended attributes. They are two entirely
different things. I suppose they could be merged into one API and the NTFS
driver would have to parse and decide whether it is supposed to be
operating on ACLs or EAs. But that will be a pain, especially as there may
be ways of abusing the system, depending on how exactly it is implemented.
- The ACLs in NTFS are _way_ more complex than the suggested ones. So
mapping from one to the other is possible only when creating new files.
When reading/writing existing ACLs a lot of information would be lost.
Further each inode has a "user" owner and a group "owner" plus two types of
ACLs: system one (SACL) and discretionary "normal" one (DACL).
These four thigns are stored within a self relative security descriptor.
And some of them are optional or can be inherited from parent inode or can
be defaulted. - This actually breaks the current API which says that files
cannot inherit/default file ACLs. In NTFS they can.
The actual permissions in NTFS are not just RWX but they are a lot more
granular (a 32 bitfield, see below URL for a list of all defined values)
and some of them even determine the access rights to extended attributes,
which needless to say causes a problem if ACLs are treated as EAs...
- NTFS doesn't store uids but Security Identifiers (SID ones not
Security_ID ones, both are separate things on NTFS. Are you confused yet? I
am...) so mapping would need to exist between NTFS SID and Linux UIDs.
Samba needs to do this (and does it already AFAIK), too, but that is more a
problem of NTFS and not a Linux ACL API.
All NTFS security stuff can be seen at the following URL - just search for
IDENTIFIER_AUTHORITY and read from there on... all security related
structures are defined there and there are quite a few comments.
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/linux-ntfs/ntfs-driver-tng/linux/include/linux/ntfs_layout.h?rev=1.11&content-type=text/vnd.viewcvs-markup
You can also read the NTFS documentation on SF but note that this is not as
complete as the header file above but it might be easier to understand. The
url with the description of the security descriptor is:
http://linux-ntfs.sourceforge.net/ntfs/attributes/security_descriptor.html
Best regards,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
hi Anton,
On Mon, Nov 12, 2001 at 01:57:28AM +0000, Anton Altaparmakov wrote:
> At 09:08 10/11/2001, Tim R. wrote:
> >I'm glad to see you guys are working on a common acl api for ext2/3 and xfs.
> >I was just wondering if this api provided what would be needed for linux
> >to support NTFS's acls.
>
> Comments/problems for NTFS with proposed EA/ACL API:
>
> I think the API is good for extended attributes, no doubt. If we ever get
> round to implementing EAs in NTFS then I would be happy to use the API. It
> fully satisfies the needs of the NTFS EAs.
That's great to hear! Thanks.
> The only addition I would put
> into the API is that the names of the extended attributes have to be able
> to have different name spaces themselves. For example I am fairly sure that
> the name of an EA in NTFS cannot contain just any character and it
> certainly cannot have a name of any length... This is something that needs
> to be considered.
Agreed. I think the determination of what is a valid attribute
name and what are the max name lengths, etc. will have to be
done within the filesystem, as we have already come across these
sorts of differences when reconciling the XFS EAs (from IRIX)
with Andreas' ext2/3 ones.
I'll put out an initial attempt at some VFS code to sit behind
this system call soon too.
> I guess the real problem is that NTFS security doesn't map very well onto
> Unix/Linux type of security model because the NTFS model has way more features.
>
> If you are asking the question whether NTFS can work with the proposed API
> then yes, it can support all its features, but not the other way round...
>
> Particular problems:
>
> - The proposed API puts ACLs inside extended attributes (EAs).
By "proposed API" I assume you are talking about Andreas' POSIX ACL
implementation? (i.e. not this API for extended attributes, which
we have described here, as this is not specific to any form of ACLs).
I don't think anyone has ever proposed that the NTFS ACLs need to
use the same userspace tools as Andreas' POSIX ACL implementation -
I'm not sure that makes any sense at all.
> On NTFS ACLs have nothing to do with extended attributes.
Yup and right there this extended attribute system call API becomes
inappropriate for NTFS ACLs, right? It is not that its a limitation
of this API, it is that NTFS does not use EAs to store ACLs. Using
a crowbar to try to force that association is almost certainly not
a good idea.
> They are two entirely different things.
*nod*
> I suppose they could be merged into one API and the NTFS
> driver would have to parse and decide whether it is supposed to be
> operating on ACLs or EAs. But that will be a pain, especially as there may
> be ways of abusing the system, depending on how exactly it is implemented.
If you do implement NT ACLs in NTFS, then it sounds like you would
be better off using a different API to this extended attribute one.
Andreas has several thoughts on this - he knows plenty more about
ACLs than I, having researched this area alot and having completely
implemented POSIX ACLs for ext2/3 from scratch. I have read mail
from him saying that given the different semantics of the numerous
forms of ACL, that a single ACL API is inappropriate/impossible.
It is convenient for us in XFS and ext2/3 to be able to use extended
attributes to implement POSIX ACLs - because these filesystems all
really do implement POSIX ACLs using extended attributes. Of course,
this doesn't necessarily mean that it will be convenient for other
filesystems that don't implement their particular flavour of ACLs as
extended attributes.
Thanks for the feedback, Anton - much appreciated.
cheers.
--
Nathan
On Mon, Nov 12, 2001 at 02:20:29PM +1100, Nathan Scott wrote:
> On Mon, Nov 12, 2001 at 01:57:28AM +0000, Anton Altaparmakov wrote:
> > At 09:08 10/11/2001, Tim R. wrote:
> > >I'm glad to see you guys are working on a common acl api for ext2/3 and xfs.
> > >I was just wondering if this api provided what would be needed for linux
> > >to support NTFS's acls.
> >
> > Comments/problems for NTFS with proposed EA/ACL API:
> >
> > I think the API is good for extended attributes, no doubt. If we ever get
> > round to implementing EAs in NTFS then I would be happy to use the API. It
> > fully satisfies the needs of the NTFS EAs.
>
> That's great to hear! Thanks.
> ...
> I'll put out an initial attempt at some VFS code to sit behind
> this system call soon too.
>
Al, folks,
Andreas and I have been looking at several different VFS mechanisms
for extended attributes, I've included the code for one below, and
we're keen to get a bit of feedback here as well.
We started off with the simplest mechanism, just passing everything
straight down into the filesystem. I then played around with some
ways of separating out the different operations and then passing off
to the filesystem that way (see patch) to give the interface a more
rigid definition. Andreas' original mechanism was alot like this,
except used NULLs in some field values instead of explicit flags to
distinguish similar operations - that's another approach.
Yet another way would be to have an ea_operations vector separate to
the inode_operations with an ea_operations pointer in struct inode,
enumerating each EA operation and doing away with the flags (in the
patch below) altogether.
Any suggestions/improvements? The patch below is very much a work
in progress - it may even compile.
many thanks.
--
Nathan
diff -Naur 2.4.14-pristine/arch/i386/kernel/entry.S 2.4.14-explicit/arch/i386/kernel/entry.S
--- 2.4.14-pristine/arch/i386/kernel/entry.S Sat Nov 3 12:18:49 2001
+++ 2.4.14-explicit/arch/i386/kernel/entry.S Fri Nov 9 15:34:29 2001
@@ -622,6 +622,9 @@
.long SYMBOL_NAME(sys_ni_syscall) /* Reserved for Security */
.long SYMBOL_NAME(sys_gettid)
.long SYMBOL_NAME(sys_readahead) /* 225 */
+ .long SYMBOL_NAME(sys_extattr)
+ .long SYMBOL_NAME(sys_lextattr)
+ .long SYMBOL_NAME(sys_fextattr)
.rept NR_syscalls-(.-sys_call_table)/4
.long SYMBOL_NAME(sys_ni_syscall)
diff -Naur 2.4.14-pristine/fs/Makefile 2.4.14-explicit/fs/Makefile
--- 2.4.14-pristine/fs/Makefile Tue Nov 6 08:40:59 2001
+++ 2.4.14-explicit/fs/Makefile Fri Nov 9 15:27:42 2001
@@ -14,7 +14,7 @@
super.o block_dev.o char_dev.o stat.o exec.o pipe.o namei.o \
fcntl.o ioctl.o readdir.o select.o fifo.o locks.o \
dcache.o inode.o attr.o bad_inode.o file.o iobuf.o dnotify.o \
- filesystems.o namespace.o
+ filesystems.o namespace.o extattr.o
ifeq ($(CONFIG_QUOTA),y)
obj-y += dquot.o
diff -Naur 2.4.14-pristine/fs/extattr.c 2.4.14-explicit/fs/extattr.c
--- 2.4.14-pristine/fs/extattr.c Thu Jan 1 10:00:00 1970
+++ 2.4.14-explicit/fs/extattr.c Mon Nov 12 11:24:11 2001
@@ -0,0 +1,101 @@
+/*
+ File: fs/extattr.c
+
+ Extended attribute handling.
+
+ Copyright (C) 2001 by Andreas Gruenbacher <[email protected]>
+ Copyright (C) 2001 SGI - Silicon Graphics, Inc <[email protected]>
+ */
+
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/smp_lock.h>
+#include <linux/extattr.h>
+
+static long
+extattr_inode(struct inode *i, int cmd, char *name, void *value, size_t size)
+{
+ int error = -EOPNOTSUPP, flags = EA_FLAG_USER;
+
+ lock_kernel();
+ switch (cmd) {
+ case EA_SET:
+ case EA_CREATE:
+ case EA_REPLACE:
+ case EA_REMOVE:
+ if (!i->i_op->setxattr)
+ break;
+ if (cmd == EA_CREATE)
+ flags |= EA_FLAG_CREATE;
+ else if (cmd == EA_REPLACE)
+ flags |= EA_FLAG_REPLACE;
+ else if (cmd == EA_REMOVE)
+ flags |= EA_FLAG_REMOVE;
+ error = i->i_op->setxattr(i, name, value, size, flags);
+ break;
+
+ case EA_GETSIZE:
+ flags |= EA_FLAG_SZONLY;
+ case EA_GET:
+ if (!i->i_op->getxattr)
+ break;
+ error = i->i_op->getxattr(i, name, value, size, flags);
+ break;
+
+ case EA_LISTSIZE:
+ flags |= EA_FLAG_SZONLY;
+ case EA_LIST:
+ if (!i->i_op->listxattr)
+ break;
+ error = i->i_op->listxattr(i, name, value, size, flags);
+ break;
+
+ default:
+ error = -EINVAL;
+ }
+ unlock_kernel();
+ return error;
+}
+
+asmlinkage long
+sys_extattr(char *path, int cmd, char *name, void *value, size_t size)
+{
+ struct nameidata nd;
+ int error;
+
+ error = user_path_walk(path, &nd);
+ if (error)
+ return error;
+ error = extattr_inode(nd.dentry->d_inode, cmd, name, value, size);
+ path_release(&nd);
+ return error;
+}
+
+asmlinkage long
+sys_lextattr(char *path, int cmd, char *name, void *value, size_t size)
+{
+ struct nameidata nd;
+ int error;
+
+ error = user_path_walk_link(path, &nd);
+ if (error)
+ return error;
+ error = extattr_inode(nd.dentry->d_inode, cmd, name, value, size);
+ path_release(&nd);
+ return error;
+}
+
+asmlinkage long
+sys_fextattr(int fd, int cmd, char *name, void *value, size_t size)
+{
+ struct file *f;
+ int error = -EBADF;
+
+ f = fget(fd);
+ if (!f)
+ return error;
+ error = extattr_inode(f->f_dentry->d_inode, cmd, name, value, size);
+ fput(f);
+ return error;
+}
diff -Naur 2.4.14-pristine/include/asm-i386/unistd.h 2.4.14-explicit/include/asm-i386/unistd.h
--- 2.4.14-pristine/include/asm-i386/unistd.h Thu Oct 18 03:03:03 2001
+++ 2.4.14-explicit/include/asm-i386/unistd.h Fri Nov 9 15:37:08 2001
@@ -230,6 +230,9 @@
#define __NR_security 223 /* syscall for security modules */
#define __NR_gettid 224
#define __NR_readahead 225
+#define __NR_extattr 226
+#define __NR_lextattr 227
+#define __NR_fextattr 228
/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
diff -Naur 2.4.14-pristine/include/linux/extattr.h 2.4.14-explicit/include/linux/extattr.h
--- 2.4.14-pristine/include/linux/extattr.h Thu Jan 1 10:00:00 1970
+++ 2.4.14-explicit/include/linux/extattr.h Mon Nov 12 11:24:01 2001
@@ -0,0 +1,30 @@
+/*
+ File: linux/extattr.h
+
+ Extended attributes handling.
+
+ Copyright (C) 2001 by Andreas Gruenbacher <[email protected]>
+ Copyright (C) 2001 SGI - Silicon Graphics, Inc <[email protected]>
+*/
+#ifndef _LINUX_EXTATTR_H
+#define _LINUX_EXTATTR_H
+
+/* Operations */
+#define EA_SET 1 /* set the value, create attr where necessary */
+#define EA_CREATE 2 /* set the value, fail if attr already exists */
+#define EA_REPLACE 3 /* set the value, fail if attr does not exist */
+#define EA_REMOVE 4 /* remove the named attribute entirely */
+#define EA_GET 5 /* get the value for named attribute */
+#define EA_GETSIZE 6 /* size of value for named attribute */
+#define EA_LIST 7 /* get the list of attribute names */
+#define EA_LISTSIZE 8 /* size of list of attribute names */
+
+#ifdef __KERNEL__
+#define EA_FLAG_USER 0x0001
+#define EA_FLAG_SZONLY 0x0002
+#define EA_FLAG_CREATE 0x0004
+#define EA_FLAG_REPLACE 0x0008
+#define EA_FLAG_REMOVE 0x0010
+#endif
+
+#endif /* _LINUX_EXTATTR_H */
diff -Naur 2.4.14-pristine/include/linux/fs.h 2.4.14-explicit/include/linux/fs.h
--- 2.4.14-pristine/include/linux/fs.h Tue Nov 6 07:42:14 2001
+++ 2.4.14-explicit/include/linux/fs.h Sat Nov 10 14:10:39 2001
@@ -840,6 +840,9 @@
int (*revalidate) (struct dentry *);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct dentry *, struct iattr *);
+ int (*setxattr) (struct inode *, char *, void *, size_t, int);
+ int (*getxattr) (struct inode *, char *, void *, size_t, int);
+ int (*listxattr) (struct inode *, char *, void *, size_t, int);
};
/*
[Cc'd to Linus since API changes on that level definitely require his
approval]
On Mon, 12 Nov 2001, Nathan Scott wrote:
> +static long
> +extattr_inode(struct inode *i, int cmd, char *name, void *value, size_t size)
Broken.
a) passing inode is an obvious mistake. dentry or vfsmount/dentry.
b) for crying out loud, what's that with SGI and ioctl-like abortions?
Rule of the tumb: if your function got a "cmd" argument - it's broken.
ioctl(2). fcntl(2). prctl(2). quotactl(2). sysfs(2). Missed'em'V IPC
syscalls. Enough, already.
Folks, it's not a rocket science. Let a function do _one_ thing,
don't turn it into a multiplexed monstrosity. Yes, you've used only 3
syscalls. But actually you've managed to hide ~20 of them in that code
and the fact that you've spent only 3 syscall table entries doesn't make
the things better.
Please, come up with a decent API.
Hello Al,
On Mon, 12 Nov 2001, Alexander Viro wrote:
>
> [Cc'd to Linus since API changes on that level definitely require his
> approval]
>
> On Mon, 12 Nov 2001, Nathan Scott wrote:
>
> > +static long
> > +extattr_inode(struct inode *i, int cmd, char *name, void *value, size_t size)
>
> Broken.
> a) passing inode is an obvious mistake. dentry or vfsmount/dentry.
There are curently two paths by which the extended attribute inode
operations can be invoked: (a) from a system call, (b) from the
permission() inode operation, when checking the access ACL of a file. We
could trivially use a dentry in (a), but unfortunately we don't have a
choice in (b), as permission() itself is not passed a dentry.
It's planned that all inode operations use dentries somewhen in 2.5.
This would be the proper time to move to dentries in the EA code as well.
> b) for crying out loud, what's that with SGI and ioctl-like abortions?
>
> Rule of the tumb: if your function got a "cmd" argument - it's broken.
> ioctl(2). fcntl(2). prctl(2). quotactl(2). sysfs(2). Missed'em'V IPC
> syscalls. Enough, already.
There is one difference between the interfaces you are complaining about
above and the proposed EA interface for EA's: In those interfaces you have
wildcard parameters that are used for who-knows-what, depending on a
command-like parameter, including use as a value, use as a pointer to a
value/struct, etc.
In the EA interface we have clear semantics of what the parameters' types
and sizes are, so many of the problems there are with ioctl() and friends
don't occur here. You could as well call the `cmd' parameter a `flags'
parameter here, then you're pretty close to the open() syscall.
It would be possible to split the EA syscalls in a set for retrieving and
aonther set for setting EA's, and perhaps still a third set for listing
the EA's that are present. Those syscalls would only differ in their
names. I would consider it much more useful to provide functions in a
library for dealing with EA's in user space, which in turn would use the
syscalls, though.
Cheers,
Andreas.
On Mon, 12 Nov 2001, Andreas Gruenbacher wrote:
> There are curently two paths by which the extended attribute inode
> operations can be invoked: (a) from a system call, (b) from the
> permission() inode operation, when checking the access ACL of a file. We
> could trivially use a dentry in (a), but unfortunately we don't have a
> choice in (b), as permission() itself is not passed a dentry.
Which means that converting permission() to vfsmount/dentry should be
done first. And that's not hard to do.
> > Rule of the tumb: if your function got a "cmd" argument - it's broken.
> > ioctl(2). fcntl(2). prctl(2). quotactl(2). sysfs(2). Missed'em'V IPC
> > syscalls. Enough, already.
>
> There is one difference between the interfaces you are complaining about
> above and the proposed EA interface for EA's: In those interfaces you have
> wildcard parameters that are used for who-knows-what, depending on a
> command-like parameter, including use as a value, use as a pointer to a
> value/struct, etc.
Yes, and? You've got more than enough material for the same kind of
abuse. What's more, you _already_ have it - in some of the subfunctions
*data is read from, in some - written to, in some - ignored. Worse
yet, in some subfunctions we put structured data in there, in some -
just a chunk of something.
With all that, who had said that a year down the road we won't get a
dozen of new syscalls hiding behind that one?
Sorry, folks, but idea of private extendable syscall table (per-filesystem,
no less) doesn't look like a good thing. That's _the_ reason why ioctl()
is bad.
At 00:32 13/11/01, Alexander Viro wrote:
>On Mon, 12 Nov 2001, Andreas Gruenbacher wrote:
> > There is one difference between the interfaces you are complaining about
> > above and the proposed EA interface for EA's: In those interfaces you have
> > wildcard parameters that are used for who-knows-what, depending on a
> > command-like parameter, including use as a value, use as a pointer to a
> > value/struct, etc.
>
>Yes, and? You've got more than enough material for the same kind of
>abuse. What's more, you _already_ have it - in some of the subfunctions
>*data is read from, in some - written to, in some - ignored. Worse
>yet, in some subfunctions we put structured data in there, in some -
>just a chunk of something.
>
>With all that, who had said that a year down the road we won't get a
>dozen of new syscalls hiding behind that one?
>
>Sorry, folks, but idea of private extendable syscall table (per-filesystem,
>no less) doesn't look like a good thing. That's _the_ reason why ioctl()
>is bad.
Al,
Out of interest, which access interface(s) would you like to see used?
Giving a few suggestions you would be happy with would be a lot easier on
anyone trying to develop a filesystem API than for them having to come up
with one after the other until one is found which you approve of... (-;
Best regards,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
On Mon, Nov 12, 2001 at 07:32:18PM -0500, Alexander Viro wrote:
> Which means that converting permission() to vfsmount/dentry should be
> done first. And that's not hard to do.
It's just messy as it will require changes in all file systems.
> Sorry, folks, but idea of private extendable syscall table (per-filesystem,
> no less) doesn't look like a good thing. That's _the_ reason why ioctl()
> is bad.
Unless I'm badly misreading the patch the op switch() is fixed in VFS mapping
to clearly defined inode operations. It is not extensible per filesystem.
Arguably they could be split into individual syscalls, but it looks not more
like cosmetics at this point.
-Andi
On Tue, Nov 13, 2001 at 06:27:11AM +0100, Andi Kleen wrote:
> On Mon, Nov 12, 2001 at 07:32:18PM -0500, Alexander Viro wrote:
> > Which means that converting permission() to vfsmount/dentry should be
> > done first. And that's not hard to do.
>
> It's just messy as it will require changes in all file systems.
>
> > Sorry, folks, but idea of private extendable syscall table (per-filesystem,
> > no less) doesn't look like a good thing. That's _the_ reason why ioctl()
> > is bad.
>
> Unless I'm badly misreading the patch the op switch() is fixed in VFS mapping
> to clearly defined inode operations. It is not extensible per filesystem.
> Arguably they could be split into individual syscalls, but it looks not more
> like cosmetics at this point.
hi Al,
Below is an initial attempt at an interface which doesn't have a
command parameter at all, instead using separate syscalls as Andi
has suggested - is this closer to what you had in mind?
To prevent an exponential increase in the number of syscalls used,
it uses a flag parameter to distinguish similar operations and also
to combine the follow-symlink-or-not cases - it now uses six system
calls instead of the original three.
It also moves the individual VFS extended attribute operations out
into a separate ops vector for a little more interface clarity, not
sure if thats better/worse. The patch ignores the problem of using
dentry vs. inode for now.
Thanks for the suggestions - I'd be interested in any thoughts you
have on this one as well.
cheers.
--
Nathan
diff -Naur 2.4.14-pristine/arch/i386/kernel/entry.S 2.4.14-al/arch/i386/kernel/entry.S
--- 2.4.14-pristine/arch/i386/kernel/entry.S Sat Nov 3 12:18:49 2001
+++ 2.4.14-al/arch/i386/kernel/entry.S Tue Nov 13 15:54:55 2001
@@ -622,6 +622,12 @@
.long SYMBOL_NAME(sys_ni_syscall) /* Reserved for Security */
.long SYMBOL_NAME(sys_gettid)
.long SYMBOL_NAME(sys_readahead) /* 225 */
+ .long SYMBOL_NAME(sys_setxattr)
+ .long SYMBOL_NAME(sys_fsetxattr)
+ .long SYMBOL_NAME(sys_getxattr)
+ .long SYMBOL_NAME(sys_fgetxattr)
+ .long SYMBOL_NAME(sys_listxattr) /* 230 */
+ .long SYMBOL_NAME(sys_flistxattr)
.rept NR_syscalls-(.-sys_call_table)/4
.long SYMBOL_NAME(sys_ni_syscall)
diff -Naur 2.4.14-pristine/fs/Makefile 2.4.14-al/fs/Makefile
--- 2.4.14-pristine/fs/Makefile Tue Nov 6 08:40:59 2001
+++ 2.4.14-al/fs/Makefile Fri Nov 9 15:27:42 2001
@@ -14,7 +14,7 @@
super.o block_dev.o char_dev.o stat.o exec.o pipe.o namei.o \
fcntl.o ioctl.o readdir.o select.o fifo.o locks.o \
dcache.o inode.o attr.o bad_inode.o file.o iobuf.o dnotify.o \
- filesystems.o namespace.o
+ filesystems.o namespace.o extattr.o
ifeq ($(CONFIG_QUOTA),y)
obj-y += dquot.o
diff -Naur 2.4.14-pristine/fs/extattr.c 2.4.14-al/fs/extattr.c
--- 2.4.14-pristine/fs/extattr.c Thu Jan 1 10:00:00 1970
+++ 2.4.14-al/fs/extattr.c Tue Nov 13 15:52:51 2001
@@ -0,0 +1,184 @@
+/*
+ File: fs/extattr.c
+
+ Extended attribute handling.
+
+ Copyright (C) 2001 by Andreas Gruenbacher <[email protected]>
+ Copyright (C) 2001 SGI - Silicon Graphics, Inc <[email protected]>
+ */
+
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/smp_lock.h>
+#include <linux/extattr.h>
+
+
+/*
+ * Extended attribute SET operations
+ */
+static long
+setxattr(struct inode *i, char *name, void *value, size_t size, int flags)
+{
+ struct xattr_operations *ops;
+ int error = -EOPNOTSUPP;
+
+ lock_kernel();
+ ops = i->i_xop;
+ if (ops) {
+ if (flags & EA_CREATE) {
+ if (ops->create)
+ error = ops->create(i, name, value, size);
+ }
+ else if (flags & EA_REPLACE) {
+ if (ops->replace)
+ error = ops->replace(i, name, value, size);
+ }
+ else if (flags & EA_REMOVE) {
+ if (ops->remove)
+ error = ops->remove(i, name);
+ }
+ else if (ops->set)
+ error = ops->set(i, name, value, size);
+ }
+ unlock_kernel();
+ return error;
+}
+
+asmlinkage long
+sys_setxattr(char *path, char *name, void *value, size_t size, int flags)
+{
+ struct nameidata nd;
+ int error;
+
+ error = (flags & EA_NOFOLLOW)?
+ user_path_walk_link(path, &nd):
+ user_path_walk(path, &nd);
+ if (error)
+ return error;
+ error = setxattr(nd.dentry->d_inode, name, value, size, flags);
+ path_release(&nd);
+ return error;
+}
+
+asmlinkage long
+sys_fsetxattr(int fd, char *name, void *value, size_t size, int flags)
+{
+ struct file *f;
+ int error = -EBADF;
+
+ f = fget(fd);
+ if (!f)
+ return error;
+ error = setxattr(f->f_dentry->d_inode, name, value, size, flags);
+ fput(f);
+ return error;
+}
+
+
+/*
+ * Extended attribute GET operations
+ */
+static long
+getxattr(struct inode *i, char *name, void *value, size_t size, int flags)
+{
+ struct xattr_operations *ops;
+ int error = -EOPNOTSUPP;
+
+ lock_kernel();
+ ops = i->i_xop;
+ if (ops) {
+ if (flags & EA_SIZEONLY) {
+ if (ops->getsize)
+ error = ops->getsize(i, name);
+ }
+ else if (ops->get)
+ error = ops->get(i, name, value, size);
+ }
+ unlock_kernel();
+ return error;
+}
+
+asmlinkage long
+sys_getxattr(char *path, char *name, void *value, size_t size, int flags)
+{
+ struct nameidata nd;
+ int error;
+
+ error = (flags & EA_NOFOLLOW)?
+ user_path_walk_link(path, &nd):
+ user_path_walk(path, &nd);
+ if (error)
+ return error;
+ error = getxattr(nd.dentry->d_inode, name, value, size, flags);
+ path_release(&nd);
+ return error;
+}
+
+asmlinkage long
+sys_fgetxattr(int fd, char *name, void *value, size_t size, int flags)
+{
+ struct file *f;
+ int error = -EBADF;
+
+ f = fget(fd);
+ if (!f)
+ return error;
+ error = getxattr(f->f_dentry->d_inode, name, value, size, flags);
+ fput(f);
+ return error;
+}
+
+
+/*
+ * Extended attribute LIST operations
+ */
+static long
+listxattr(struct inode *i, char *name, void *value, size_t size, int flags)
+{
+ struct xattr_operations *ops;
+ int error = -EOPNOTSUPP;
+
+ lock_kernel();
+ ops = i->i_xop;
+ if (ops) {
+ if (flags & EA_SIZEONLY) {
+ if (ops->listsize)
+ error = ops->listsize(i, name);
+ }
+ else if (ops->list)
+ error = ops->list(i, name, value, size);
+ }
+ unlock_kernel();
+ return error;
+}
+
+asmlinkage long
+sys_listxattr(char *path, char *name, void *value, size_t size, int flags)
+{
+ struct nameidata nd;
+ int error;
+
+ error = (flags & EA_NOFOLLOW)?
+ user_path_walk_link(path, &nd):
+ user_path_walk(path, &nd);
+ if (error)
+ return error;
+ error = listxattr(nd.dentry->d_inode, name, value, size, flags);
+ path_release(&nd);
+ return error;
+}
+
+asmlinkage long
+sys_flistxattr(int fd, char *name, void *value, size_t size, int flags)
+{
+ struct file *f;
+ int error = -EBADF;
+
+ f = fget(fd);
+ if (!f)
+ return error;
+ error = listxattr(f->f_dentry->d_inode, name, value, size, flags);
+ fput(f);
+ return error;
+}
diff -Naur 2.4.14-pristine/fs/inode.c 2.4.14-al/fs/inode.c
--- 2.4.14-pristine/fs/inode.c Sat Sep 29 11:03:48 2001
+++ 2.4.14-al/fs/inode.c Tue Nov 13 16:11:15 2001
@@ -768,10 +768,12 @@
{
static struct address_space_operations empty_aops;
static struct inode_operations empty_iops;
+ static struct xattr_operations empty_xops;
static struct file_operations empty_fops;
memset(&inode->u, 0, sizeof(inode->u));
inode->i_sock = 0;
inode->i_op = &empty_iops;
+ inode->i_xop = &empty_xops;
inode->i_fop = &empty_fops;
inode->i_nlink = 1;
atomic_set(&inode->i_writecount, 0);
diff -Naur 2.4.14-pristine/include/asm-i386/unistd.h 2.4.14-al/include/asm-i386/unistd.h
--- 2.4.14-pristine/include/asm-i386/unistd.h Thu Oct 18 03:03:03 2001
+++ 2.4.14-al/include/asm-i386/unistd.h Tue Nov 13 15:53:21 2001
@@ -230,6 +230,12 @@
#define __NR_security 223 /* syscall for security modules */
#define __NR_gettid 224
#define __NR_readahead 225
+#define __NR_setxattr 226
+#define __NR_fsetxattr 227
+#define __NR_getxattr 228
+#define __NR_fgetxattr 229
+#define __NR_listxattr 230
+#define __NR_flistxattr 231
/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
diff -Naur 2.4.14-pristine/include/linux/extattr.h 2.4.14-al/include/linux/extattr.h
--- 2.4.14-pristine/include/linux/extattr.h Thu Jan 1 10:00:00 1970
+++ 2.4.14-al/include/linux/extattr.h Tue Nov 13 15:43:08 2001
@@ -0,0 +1,18 @@
+/*
+ File: linux/extattr.h
+
+ Extended attributes handling.
+
+ Copyright (C) 2001 by Andreas Gruenbacher <[email protected]>
+ Copyright (C) 2001 SGI - Silicon Graphics, Inc <[email protected]>
+*/
+#ifndef _LINUX_EXTATTR_H
+#define _LINUX_EXTATTR_H
+
+#define EA_CREATE 0x0001 /* Set the value: fail if attr already exists */
+#define EA_REPLACE 0x0002 /* Set the value: fail if attr does not exist */
+#define EA_REMOVE 0x0004 /* Remove the named attribute entirely */
+#define EA_SIZEONLY 0x0008 /* Retrieve a buffer size don't write into it */
+#define EA_NOFOLLOW 0x0010 /* Don't follow symlinks when traversing path */
+
+#endif /* _LINUX_EXTATTR_H */
diff -Naur 2.4.14-pristine/include/linux/fs.h 2.4.14-al/include/linux/fs.h
--- 2.4.14-pristine/include/linux/fs.h Tue Nov 6 07:42:14 2001
+++ 2.4.14-al/include/linux/fs.h Tue Nov 13 15:30:39 2001
@@ -444,6 +444,7 @@
struct semaphore i_zombie;
struct inode_operations *i_op;
struct file_operations *i_fop; /* former ->i_op->default_file_ops */
+ struct xattr_operations *i_xop;
struct super_block *i_sb;
wait_queue_head_t i_wait;
struct file_lock *i_flock;
@@ -840,6 +841,17 @@
int (*revalidate) (struct dentry *);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct dentry *, struct iattr *);
+};
+
+struct xattr_operations {
+ int (*create) (struct inode *, char *, void *, size_t);
+ int (*replace) (struct inode *, char *, void *, size_t);
+ int (*remove) (struct inode *, char *);
+ int (*set) (struct inode *, char *, void *, size_t);
+ int (*get) (struct inode *, char *, void *, size_t);
+ int (*getsize) (struct inode *, char *);
+ int (*list) (struct inode *, char *, void *, size_t);
+ int (*listsize) (struct inode *, char *);
};
/*
On Nov 15, 2001 16:08 +1100, Nathan Scott wrote:
> + if (ops) {
> + if (flags & EA_CREATE) {
> + if (ops->create)
> + error = ops->create(i, name, value, size);
> + }
> + else if (flags & EA_REPLACE) {
> + if (ops->replace)
> + error = ops->replace(i, name, value, size);
> + }
> + else if (flags & EA_REMOVE) {
> + if (ops->remove)
> + error = ops->remove(i, name);
> + }
> + else if (ops->set)
> + error = ops->set(i, name, value, size);
> + }
> + int (*create) (struct inode *, char *, void *, size_t);
> + int (*replace) (struct inode *, char *, void *, size_t);
> + int (*set) (struct inode *, char *, void *, size_t);
What is the distinction between "set" and "replace" or "set" and "create"?
Is it analogous to open(,O_CREAT|O_EXCL)? If so, why are there not just
flags to distinguish the two, but also separate VFS operations?
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
hi Andreas,
On Wed, Nov 14, 2001 at 11:01:34PM -0700, Andreas Dilger wrote:
> On Nov 15, 2001 16:08 +1100, Nathan Scott wrote:
>
> > + int (*create) (struct inode *, char *, void *, size_t);
> > + int (*replace) (struct inode *, char *, void *, size_t);
> > + int (*set) (struct inode *, char *, void *, size_t);
>
> What is the distinction between "set" and "replace" or "set" and "create"?
>
+#define EA_CREATE 0x0001 /* Set the value: fail if attr already exists */
+#define EA_REPLACE 0x0002 /* Set the value: fail if attr does not exist */
Whereas "set" is simply set the named attribute value, creating the
attribute if need be, replacing the value if the attribute exists,
and then return success.
The man pages on AndreasG's site go into more detail, but decribe
the original interface which Al didn't like. I haven't created an
updated man page yet, because I'm still not sure if this interface
is quite what Al was looking for either... (Al?)
> ... why are there not just
> flags to distinguish the two, but also separate VFS operations?
A VFS flags parameter could allow an individual filesystem to extend
the semantics of the set operation in new ways by adding new flags -
I think Al wanted to avoid the possibility of that ever happening,
which seems fair enough.
cheers.
--
Nathan
Hi, sorry for jumping into this a little late, but...
On November 16, 2001 12:18 am, Nathan Scott wrote:
> > What is the distinction between "set" and "replace" or "set" and "create"?
>
> +#define EA_CREATE 0x0001 /* Set the value: fail if attr already exists */
> +#define EA_REPLACE 0x0002 /* Set the value: fail if attr does not exist */
>
> Whereas "set" is simply set the named attribute value, creating the
> attribute if need be, replacing the value if the attribute exists,
> and then return success.
What is the purpose of these distinctions? Does anyone rely on them? Do such
distinctions exist in an existing implementation?
Thanks.
Daniel
hi Daniel,
On Mon, Dec 03, 2001 at 01:07:13AM +0100, Daniel Phillips wrote:
> Hi, sorry for jumping into this a little late, but...
>
No problem. BTW, we have reworked the interfaces once more and
will send out the latest revision in the next couple of days -
it does away with commands and flags completely, except for this
one instance of flags in the set operation...
> On November 16, 2001 12:18 am, Nathan Scott wrote:
> > > What is the distinction between "set" and "replace" or "set" and "create"?
> >
> > +#define EA_CREATE 0x0001 /* Set the value: fail if attr already exists */
> > +#define EA_REPLACE 0x0002 /* Set the value: fail if attr does not exist */
> >
> > Whereas "set" is simply set the named attribute value, creating the
> > attribute if need be, replacing the value if the attribute exists,
> > and then return success.
>
> What is the purpose of these distinctions? Does anyone rely on them? Do such
> distinctions exist in an existing implementation?
>
The purpose is to provide user tools with more control over the
creation or updating of an attribute and its value. I don't think
the replace flag is very widely used, but I have seen the create
flag used in places - eg. the XFS fsr tool uses that flag.
The IRIX extended attribute interfaces provide these flags - Andreas
has also examined the implementations, man pages, etc, of several other
operating systems, so he'll be able to tell us if any others provide
this sort of thing too.
cheers.
--
Nathan
hi Daniel,
On Mon, Dec 03, 2001 at 03:52:58PM +0100, Daniel Phillips wrote:
> On December 3, 2001 01:54 am, Nathan Scott wrote:
> > ...BTW, we have reworked the interfaces once more and
> > will send out the latest revision in the next couple of days -
> > it does away with commands and flags completely, except for this
> > one instance of flags in the set operation...
>
> OK, well I can see some patterns emerging already:
>
> long sys_getxattr(char *path, char *name, void *value, size_t size, int flags)
> long sys_setxattr(char *path, char *name, void *value, size_t size, int flags)
> long sys_listxattr(char *path, char *name, void *value, size_t size, int flags)
Not sure where you got those from (one of the early patches?), but
it looks more like this now:
long sys_setxattr(char *path, char *name, void *value, size_t size, int flags);
long sys_getxattr(char *path, char *name, void *value, size_t size);
long sys_listxattr(char *path, char *list, size_t size);
> Why don't I see 'delxattr'?
>
Hmm.. good question. I had been achieving this through a flag to `set',
but was never real happy about that - I guess a separate remove operation
is the cleaner interface.
> Why is there a need for separate 'path' and 'fd' variants?
Applications - the ready example is the POSIX ACL API, which specifies
path and file descriptor variants of its functions.
> <nit>Is there any other kind of 'attr' in the syscall interface?
> Why not spell it 'attr' instead of 'xaddr'? How about geta, seta,
> dela, lista?</nit>
We have "attr" inode_operations in the VFS, so the idea is to add...
int (*revalidate) (struct dentry *);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct dentry *, struct iattr *);
+ int (*setxattr) (struct dentry *, char *, void *, size_t, int);
+ int (*getxattr) (struct dentry *, char *, void *, size_t);
+ int (*listxattr) (struct dentry *, char *, size_t);
};
And I think I'll now separate out the remove operation too -
something like:
+ int (*removexattr) (struct dentry *, char *);
The system call names have been kept consistent with this VFS
naming convention.
> The idea of attribute class (namespace) isn't explicitly accomodated.
> I presume the intention is to encode the class as part of the
> attribute name and have the filesystem or vfs parse it out.
That's correct.
> Is that such a good idea?
It simplifies things. It has worked for Andreas for awhile (its an
idea directly from his current implementation), and I don't have any
problems with it - it has proven quite simple to implement in XFS.
> Why not pass the
> class explicitly and worry about the namespace parsing in user space?
>
> As far as listing attributes goes, is there ever a reason to list system
> and user attributes at the same time?
For this area of the design, we stuck with Andreas' implementation.
I suspect one of the main reasons Andreas did it this way is to be
able to get all attribute names at once - for efficiency in backup
programs, simplicity in file manager type programs, etc.
> IOW, the listxattr call needs a class
> parameter too. It doesn't name a 'name', at least if you accept my
> argument that the class should not be parsed inside the kernel.
Not sure I do - we had an initial version which separated name
and namespace, but we eventually scrapped it. In particular, we
ended up wanting this all-at-once list operation, which becomes
much simpler when using a "fully-qualified" name approach.
> There's no particular
> reason to force all the parameter lists to be the same is there?
No, I think you're looking at an early, draft version of the API.
See the XFS CVS tree - cmd/attr2/man/* in particular. I'll post
new patches here soon, just waiting to hear back from Andreas on
a couple of issues (he's been away from mail for a few days).
cheers.
--
Nathan
On December 3, 2001 01:54 am, Nathan Scott wrote:
> ...BTW, we have reworked the interfaces once more and
> will send out the latest revision in the next couple of days -
> it does away with commands and flags completely, except for this
> one instance of flags in the set operation...
OK, well I can see some patterns emerging already:
long sys_getxattr(char *path, char *name, void *value, size_t size, int flags)
long sys_setxattr(char *path, char *name, void *value, size_t size, int flags)
long sys_listxattr(char *path, char *name, void *value, size_t size, int flags)
long sys_fgetxattr(int fd, char *name, void *value, size_t size, int flags)
long sys_fsetxattr(int fd, char *name, void *value, size_t size, int flags)
long sys_flistxattr(int fd, char *name, void *value, size_t size, int flags)
Why don't I see 'delxattr'?
Why is there a need for separate 'path' and 'fd' variants?
<nit>Is there any other kind of 'attr' in the syscall interface? Why not spell
it 'attr' instead of 'xaddr'? How about geta, seta, dela, lista?</nit>
The idea of attribute class (namespace) isn't explicitly accomodated. I presume
the intention is to encode the class as part of the attribute name and have the
filesystem or vfs parse it out. Is that such a good idea? Why not pass the
class explicitly and worry about the namespace parsing in user space?
As far as listing attributes goes, is there ever a reason to list system and
user attributes at the same time? IOW, the listxattr call needs a class
parameter too. It doesn't name a 'name', at least if you accept my argument
that the class should not be parsed inside the kernel. There's no particular
reason to force all the parameter lists to be the same is there?
--
Daniel