Linus,
I would like to reserve a block of 32 ioctl's for the JFS filesystem.
Thank you.
Dave Kleikamp
--- linux/Documentation/ioctl-number.txt-orig Tue Feb 13 16:13:42 2001
+++ linux/Documentation/ioctl-number.txt Thu Mar 22 14:53:40 2001
@@ -86,6 +86,7 @@
'F' all linux/fb.h
'I' all linux/isdn.h
'J' 00-1F drivers/scsi/gdth_ioctl.h
+'J' 20-3F linux/jfs_fs.h
'K' all linux/kd.h
'L' 00-1F linux/loop.h
'L' E0-FF linux/ppdd.h encrypted disk device driver
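For reference, each row in that table reserves a magic character plus a range of sequence numbers, and 20-3F is exactly 32 of them. Below is a minimal sketch of how such a range is typically consumed with the standard _IO/_IOW encoding macros. The argument struct and the JFS_IOC_* names are made up for illustration; nothing in this thread says what the JFS ioctls would actually be called.

```c
#include <linux/ioctl.h>   /* _IO, _IOW, _IOC_TYPE, _IOC_NR */

/* Hypothetical argument block, for illustration only. */
struct jfs_extend_args {
    unsigned long long new_blocks;
};

#define JFS_IOC_MAGIC  'J'
#define JFS_IOC_FIRST  0x20    /* start of the range requested in the patch */
#define JFS_IOC_LAST   0x3F    /* end of the range: 32 numbers in total */

/* Hypothetical commands inside the reserved range. */
#define JFS_IOC_EXTEND _IOW(JFS_IOC_MAGIC, 0x20, struct jfs_extend_args)
#define JFS_IOC_DEFRAG _IO(JFS_IOC_MAGIC, 0x21)

/* Nonzero if cmd carries the 'J' magic and a number in 20-3F. */
int jfs_ioc_in_range(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == JFS_IOC_MAGIC &&
           _IOC_NR(cmd) >= JFS_IOC_FIRST &&
           _IOC_NR(cmd) <= JFS_IOC_LAST;
}
```

A check like this also shows why the split with gdth_ioctl.h (00-1F) matters: both drivers share the 'J' magic, so only the sequence number tells them apart.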
On Thu, 22 Mar 2001, Dave Kleikamp wrote:
> Linus,
> I would like to reserve a block of 32 ioctl's for the JFS filesystem.
Details, please? More specifically, what kind of objects are these ioctls
applied to?
Alexander Viro wrote:
>
> On Thu, 22 Mar 2001, Dave Kleikamp wrote:
>
> > Linus,
> > I would like to reserve a block of 32 ioctl's for the JFS filesystem.
>
> Details, please? More specifically, what kind of objects are these ioctls
> applied to?
I don't have all the details worked out yet, but the utilities to extend
and defragment the filesystem will operate on a live volume, so the
utilities will need to talk to the filesystem to move blocks, extend the
block map, etc.
The utilities will probably open the root directory and apply the ioctls
to it, unless there is a better way to do it.
--
David Kleikamp
IBM Linux Technology Center
[cc'd to fsdevel, since trick described below may be of interest for other
folks]
On Thu, 22 Mar 2001, Dave Kleikamp wrote:
> Alexander Viro wrote:
> >
> > On Thu, 22 Mar 2001, Dave Kleikamp wrote:
> >
> > > Linus,
> > > I would like to reserve a block of 32 ioctl's for the JFS filesystem.
> >
> > Details, please? More specifically, what kind of objects are these ioctls
> > applied to?
>
> I don't have all the details worked out yet, but the utilities to extend
> and defragment the filesystem will operate on a live volume, so the
> utilities will need to talk to the filesystem to move blocks, extend the
> block map, etc.
>
> The utilities will probably open the root directory and apply the ioctls
> to it, unless there is a better way to do it.
There is - and it may simplify your life, actually. Here's what can be
done:
a) Use _two_ DECLARE_FSTYPE in your filesystem: say, one for jfs
(normal) and one for jfsmeta. Make the latter an FS_LITTER one.
b) Let jfsmeta_read_super() take a pathname as an option (say,
require "-o jfsroot=<some_path>").
c) let it do lookup on that pathname and verify that it's on JFS
	error = 0;
	if (path_init(jfsroot, LOOKUP_FOLLOW|LOOKUP_POSITIVE, &nd))
		error = path_walk(jfsroot, &nd);
	if (error)
		/* fail */
	if (nd.mnt->mnt_sb->s_type != &jfs_fs_type)
		/* fail */
d) store the reference to nd.mnt in superblock.
e) Allocate root dentry (as usual) and whatever files you want
(just pull the example from fs/binfmt_misc.c in -ac).
f) Make read/write on these files whatever you want them to do -
you can access the superblock of "master" JFS (see below)
via the saved value of nd.mnt (d).
g) in jfsmeta_put_super() do mntput() on pointer saved in (d).
How can it be used? Well, say you've mounted JFS on /usr/local:
% mount -t jfsmeta none /mnt -o jfsroot=/usr/local
% ls /mnt
stats control bootcode whatever_I_bloody_want
% cat /mnt/stats
master is on /usr/local
fragmentation = 5%
696942 reads, yodda, yodda
% echo "defrag 69 whatever 42 13" > /mnt/control
% umount /mnt
That may look like overkill; however:
* You can get rid of any need to register ioctls, etc.
* You can add debugging/whatever at any moment with no need to
update any utilities - everything is available from plain shell
* You can conveniently view whatever metadata you want - no need to
shove everything into ioctls on one object.
* You can use normal permissions control - just set appropriate
permission bits for objects on jfsmeta
IOW, you can get normal filesystem view (meaning that you have all usual
UNIX toolkit available) for per-fs control stuff. And keep the ability to
do proper locking - it's the same driver that handles the main fs and you
have access to superblock. No need to change the API - everything is already
there...
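As a rough illustration of what the write side of a file like /mnt/control would have to do with a line such as "defrag 69 whatever 42 13", here is a userspace sketch of a tokenizer. It is not taken from any posted patch; struct ctl_cmd, ctl_parse() and the size limits are all hypothetical names and values picked for the example.

```c
#include <stddef.h>
#include <string.h>

#define CTL_MAX_ARGS 8

struct ctl_cmd {
    char buf[128];                   /* private copy of the line, split in place */
    const char *verb;                /* first word, e.g. "defrag" */
    const char *args[CTL_MAX_ARGS];  /* remaining words, still as text */
    int nargs;
};

/*
 * Split one command line into a verb and whitespace-separated arguments.
 * Returns 0 on success, -1 if the line is empty, too long, or has more
 * than CTL_MAX_ARGS arguments.
 */
int ctl_parse(const char *line, struct ctl_cmd *cmd)
{
    static const char sep[] = " \t\n";
    size_t len = strlen(line);
    char *p;

    if (len >= sizeof(cmd->buf))
        return -1;
    memcpy(cmd->buf, line, len + 1);
    cmd->verb = NULL;
    cmd->nargs = 0;

    p = cmd->buf;
    for (;;) {
        char *tok;

        p += strspn(p, sep);         /* skip leading separators */
        if (*p == '\0')
            break;
        tok = p;
        p += strcspn(p, sep);        /* find the end of this word */
        if (*p != '\0')
            *p++ = '\0';

        if (cmd->verb == NULL)
            cmd->verb = tok;
        else if (cmd->nargs < CTL_MAX_ARGS)
            cmd->args[cmd->nargs++] = tok;
        else
            return -1;
    }
    return cmd->verb ? 0 : -1;
}
```

Keeping the arguments as text leaves it to each verb to decide which of them are numbers, which is what lets new commands appear without touching the utilities.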
I'll post an example patch for ext2 (safe access to superblock,
group descriptors, inode table and bitmaps on a live fs) after this weekend
(== when misc shit will somewhat settle down).
Cheers,
Al
PS: Folks[1], I hope it explains why I'm very sceptical about "let's add new
A{B,P}I" sort of ideas - approach above can be used for almost all stuff
I've seen proposed. You can have multiple views of the same object. And
have all of them available via normal API.
[1] Hans, in particular ;-) Basically, that's the idea you keep mentioning -
_everything can be represented as fs_. Take it one step further and you'll
get _and the beauty of that is in the fact that you don't need new tools
to use the thing - generic ones are fine_
Al, you write:
> * You can get rid of any need to register ioctls, etc.
> * You can add debugging/whatever at any moment with no need to
> update any utilities - everything is available from plain shell
> * You can conveniently view whatever metadata you want - no need to
> shove everything into ioctls on one object.
> * You can use normal permissions control - just set appropriate
> permission bits for objects on jfsmeta
>
> IOW, you can get normal filesystem view (meaning that you have all usual
> UNIX toolkit available) for per-fs control stuff. And keep the ability to
> do proper locking - it's the same driver that handles the main fs and you
> have access to superblock. No need to change the API - everything is already
> there...
> I'll post an example patch for ext2 (safe access to superblock,
> group descriptors, inode table and bitmaps on a live fs) after this weekend
> (== when misc shit will somewhat settle down).
I look forward to seeing the ext2 code. I was just in the process of
adding ioctls to ext3 to do online resizing within transactions. Maybe
I'll rather use this interface if it looks good. Will it work on 2.2,
or does it depend too much on new VFS?
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
On Thu, 22 Mar 2001, Andreas Dilger wrote:
> I look forward to seeing the ext2 code. I was just in the process of
> adding ioctls to ext3 to do online resizing within transactions. Maybe
> I'll rather use this interface if it looks good. Will it work on 2.2,
> or does it depend too much on new VFS?
Should be OK, except that with 2.2 you'll need to get hold of the dentry
from the original fs instead of the vfsmount (unfortunately, that can't be
done the same way for both - 2.2 doesn't have a vfsmount tree, and in 2.4
holding dentry without holding vfsmount is a one-way ticket to hell -
umount() will be unhappy). However, version-dependent part should be
very small. Something like
	struct super_block *grab_super(char *name, void **p)
	{
		struct nameidata nd;
		int err = 0;

		*p = NULL;
		if (path_init(name, 0, &nd))
			err = path_walk(name, &nd);
		if (err)
			return ERR_PTR(err);
		*p = mntget(nd.mnt);	/* pin the vfsmount; released with mntput() */
		path_release(&nd);
		return (*(struct vfsmount **)p)->mnt_sb;
	}
for 2.4 and
	struct super_block *grab_super(char *name, void **p)
	{
		struct dentry *dentry;

		*p = NULL;
		dentry = lookup_dentry(name, NULL, 0);
		if (IS_ERR(dentry))
			return (struct super_block *)dentry;	/* pass the error through */
		*p = dentry;	/* pin the dentry; released with dput() */
		return dentry->d_sb;
	}
for 2.2 should do the trick -
	master_sb = grab_super(ext2root, &sb->u.ext2meta_sb.holder);
	if (IS_ERR(master_sb))
		/* fail */
	sb->u.ext2meta_sb.master = master_sb;
	...
should be OK (to release the master sb you'll need to do dput() or mntput()
of sb->....holder in 2.2 and 2.4 resp., indeed). I wouldn't try to port
that to 2.0, though...
Cheers,
Al
Alexander Viro <[email protected]> writes:
> IOW, you can get normal filesystem view (meaning that you have all usual
> UNIX toolkit available) for per-fs control stuff. And keep the ability to
> do proper locking - it's the same driver that handles the main fs and you
> have access to superblock. No need to change the API - everything is already
> there...
> I'll post an example patch for ext2 (safe access to superblock,
> group descriptors, inode table and bitmaps on a live fs) after this weekend
> (== when misc shit will somewhat settle down).
> Cheers,
> Al
>
> PS: Folks[1], I hope it explains why I'm very sceptical about "let's add new
> A{B,P}I" sort of ideas - approach above can be used for almost all stuff
> I've seen proposed. You can have multiple views of the same object. And
> have all of them available via normal API.
This is a cool idea. But I see a couple of places where this might fall down.
1) If a filesystem has multiple name spaces and we use different mounts
to handle them, will this break anything? FAT32 with its short and long
names, and the Novell filesystem are the examples I can think of.
2) An API is still being developed; it just uses the existing infrastructure,
which is good, but we still need to standardize what is exported. It's
a cleaner way to build a new API, but a new API is being built.
3) What is a safe way to do this so a non-root user can call mount?
4) What is an appropriate way to handle stat data and extended attributes
using open, read, write, close and mount? The stat data is the big one because it is used
so frequently. Possibly a mount&open&read/write&close&umount syscall is needed.
I keep thinking open("/path/to/file/stat_data") but that feels excessively heavy
at the API level. But if we involve mount (at least semantically)
that could work for directories as well.
The goal here is to push your ideas to the limits so we can see where
using ioctl or a new syscall is appropriate, if indeed there is such
a case.
Eric
Al,
I didn't know that creating file system ioctl's was such a hot topic.
Since the functions I want to implement are for a very specific purpose
(I don't expect anything except the JFS utilities to invoke them), I
would expect an ioctl to be an appropriate vehicle.
If not ioctl's, why not procfs? Your proposal is that each filesystem
implements its own mini-procfs. What's the advantage of that?
My intentions so far are to use ioctl's for specific operations required
by the JFS utilities, and procfs for debugging output, tuning, and
anything else that makes sense.
Alexander Viro wrote:
> That may look like an overkill, however
> * You can get rid of any need to register ioctls, etc.
This is a one-time thing.
> * You can add debugging/whatever at any moment with no need to
> update any utilities - everything is available from plain shell
We can do this with procfs right now.
> * You can conveniently view whatever metadata you want - no need to
> shove everything into ioctls on one object.
Again, why re-invent procfs? We could put this under
/proc/fs/jfs/metadata.
> * You can use normal permissions control - just set appropriate
> permission bits for objects on jfsmeta
>
> IOW, you can get normal filesystem view (meaning that you have all usual
> UNIX toolkit available) for per-fs control stuff. And keep the ability to
> do proper locking - it's the same driver that handles the main fs and you
> have access to superblock. No need to change the API - everything is already
> there...
I'm not sure how a utility would make atomic changes to several pieces
of metadata. The underlying fs code would protect the integrity of
every metadata "file", but changes to more than one of these "files"
would not be done as a group without some additional locking that would
have to be coordinated between the utility and the fs. This kind of
thing could be handled by writing some special command to a
"command-processor" type file, but why is this better than an ioctl?
--
David Kleikamp
IBM Linux Technology Center
>How it can be used? Well, say it you've mounted JFS on /usr/local
>% mount -t jfsmeta none /mnt -o jfsroot=/usr/local
>% ls /mnt
>stats control bootcode whatever_I_bloody_want
>% cat /mnt/stats
>master is on /usr/local
>fragmentation = 5%
>696942 reads, yodda, yodda
>% echo "defrag 69 whatever 42 13" > /mnt/control
>% umount /mnt
There's a lot of cool simplicity in this, both in implementation and
application, but it leaves something to be desired in functionality. This
is partly because the price you pay for being able to use existing,
well-worn Unix interfaces is the ancient limitations of those interfaces
-- like the inability to return adequate error information.
Specifically, transactional stuff looks really hard in this method.
If I want the user to know why his "defrag" command failed, how would I
pass that information back to him? What if I want to warn him of a
filesystem inconsistency I found along the way? Or inform him of how
effective the defrag was? And bear in mind that multiple processes may be
issuing commands to /mnt/control simultaneously.
With ioctl, I can easily match a response of any kind to a request. I can
even return an English text message if I want to be friendly.
On Fri, Mar 23, 2001 at 09:56:47AM -0700, Bryan Henderson wrote:
> There's a lot of cool simplicity in this, both in implementation and
> application, but it leaves something to be desired in functionality. This
> is partly because the price you pay for being able to use existing,
> well-worn Unix interfaces is the ancient limitations of those interfaces
> -- like the inability to return adequate error information.
hmm... open("defrag-error") first, then read from it if it fails?
> effective the defrag was? And bear in mind that multiple processes may be
> issuing commands to /mnt/control simultaneously.
you should probably serialise them. you probably have to do this anyway.
> With ioctl, I can easily match a response of any kind to a request. I can
> even return an English text message if I want to be friendly.
yes, one of the nice plan9 changes was the change to returning strings
instead of numerics.
--
Revolutions do not require corporate support.
On Fri, 23 Mar 2001, Bryan Henderson wrote:
> How it can be used? Well, say it you've mounted JFS on /usr/local
> >% mount -t jfsmeta none /mnt -o jfsroot=/usr/local
> >% ls /mnt
> >stats control bootcode whatever_I_bloody_want
> >% cat /mnt/stats
> >master is on /usr/local
> >fragmentation = 5%
> >696942 reads, yodda, yodda
> >% echo "defrag 69 whatever 42 13" > /mnt/control
> >% umount /mnt
>
> There's a lot of cool simplicity in this, both in implementation and
> application, but it leaves something to be desired in functionality. This
> is partly because the price you pay for being able to use existing,
> well-worn Unix interfaces is the ancient limitations of those interfaces
> -- like the inability to return adequate error information.
I can imagine a solution to this using the _same_ method - extend
/proc/*/ with a new entry (say, trace) for dumping errors. Put data
in there from every failing function in your code. Normally, this
will not introduce overheads (not unless you use error conditions to
pass on useful information), however, in case of errors, you can
get the backtrace (together with any info you want to put in there)
immediately.
> Specifically, transactional stuff looks really hard in this method.
> If I want the user to know why his "defrag" command failed, how would I
> pass that information back to him? What if I want to warn him of of a
> filesystem inconsistency I found along the way? Or inform him of how
> effective the defrag was? And bear in mind that multiple processes may be
> issuing commands to /mnt/control simultaneously.
That's all up to you. Informational messages can go to /proc.
Transactions/serialization can be done in your filesystem's
implementation. Maybe glibc guys would even want to extend
strerror() to handle these cases?
>
> With ioctl, I can easily match a response of any kind to a request. I can
> even return an English text message if I want to be friendly.
Cheers,
Pjotr Kourzanov
On Fri, 23 Mar 2001, Matthew Wilcox wrote:
> On Fri, Mar 23, 2001 at 09:56:47AM -0700, Bryan Henderson wrote:
> > There's a lot of cool simplicity in this, both in implementation and
> > application, but it leaves something to be desired in functionality. This
> > is partly because the price you pay for being able to use existing,
> > well-worn Unix interfaces is the ancient limitations of those interfaces
> > -- like the inability to return adequate error information.
>
> hmm... open("defrag-error") first, then read from it if it fails?
Or just do
(echo $cmd; read reply) <>file >&0
and make write() queue a reply. Yup, on the struct file used for write().
You _will_ need serialization for operations themselves, but for getting
replies... Not really.
> > With ioctl, I can easily match a response of any kind to a request. I can
> > even return an English text message if I want to be friendly.
So you can with read(). You know, the function that is intended to be used for
reading stuff into user-supplied buffer...
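To make the "queue a reply on the struct file used for write()" idea a bit more concrete, here is a small userspace model of it. This is only a sketch of the bookkeeping, not kernel code: struct ctl_open stands in for whatever the filesystem would hang off file->private_data, and the ctl_* names and the "ok:"/"error:" reply strings are invented for the example.

```c
#include <stdio.h>
#include <string.h>

/* One of these per open() of the control file (hypothetical layout). */
struct ctl_open {
    char reply[128];
    size_t reply_len;    /* bytes of reply not yet read */
};

void ctl_open_init(struct ctl_open *f)
{
    f->reply_len = 0;
}

/* write(): run the command and queue the reply on this open only. */
size_t ctl_write(struct ctl_open *f, const char *cmd)
{
    int n;

    if (strncmp(cmd, "defrag", 6) == 0)
        n = snprintf(f->reply, sizeof(f->reply), "ok: %s", cmd);
    else
        n = snprintf(f->reply, sizeof(f->reply),
                     "error: unknown command: %s", cmd);

    if (n < 0)
        n = 0;
    else if ((size_t)n >= sizeof(f->reply))
        n = (int)sizeof(f->reply) - 1;    /* reply was truncated */
    f->reply_len = (size_t)n;
    return strlen(cmd);
}

/* read(): hand back (and consume) whatever this open has queued. */
size_t ctl_read(struct ctl_open *f, char *buf, size_t len)
{
    size_t n = f->reply_len < len ? f->reply_len : len;

    memcpy(buf, f->reply, n);
    memmove(f->reply, f->reply + n, f->reply_len - n);
    f->reply_len -= n;
    return n;
}
```

Two processes that each open the file get separate ctl_open instances, so neither can pick up the other's reply; the commands themselves would still need serializing inside the filesystem.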
In article <OF791BBBC5.E3FCBEEE-ON87256A18.005BA3B7@LocalDomain> you write:
>With ioctl, I can easily match a response of any kind to a request. I can
>even return an English text message if I want to be friendly.
But ioctl requires allocation of numbers. Ugly and hard to scale.
Alex Viro's idea is cleaner, but still requires a fair amount of
coding even for simple interfaces.
Why not have a kernel thread and use standard RPC techniques like
sockets? Then you'd not have to invent anything unimportant like
Yet Another IPC Technique.
--
Chip Salzenberg - a.k.a. - <[email protected]>
"We have no fuel on board, plus or minus 8 kilograms." -- NEAR tech
On Sun, 01 Apr 2001 01:01:59 -0800,
[email protected] (Chip Salzenberg) wrote:
>In article <OF791BBBC5.E3FCBEEE-ON87256A18.005BA3B7@LocalDomain> you write:
>Why not have a kernel thread and use standard RPC techniques like
>sockets? Then you'd not have to invent anything unimportant like
>Yet Another IPC Technique.
kerneld (kmod's late unlamented predecessor) used to use Unix sockets
to communicate from the kernel to the daemon. It forced everybody to
link Unix sockets into the kernel but there are some people out there
who want to use it as a module. Also the kernel code for communicating
with kerneld was "unpleasant", see ipc/msg.c in a 2.0 kernel.
[email protected] (Chip Salzenberg) wrote on 01.04.01 in <E14jdkF-0007Ps-00@tytlal>:
> Why not have a kernel thread and use standard RPC techniques like
> sockets? Then you'd not have to invent anything unimportant like
> Yet Another IPC Technique.
You can, of course, transfer the exact same RPC messages over a file
descriptor on your metadata fs. It doesn't *have* to be ASCII, especially
not for purely internal-use interfaces.
And for ioctl() fans, you could transfer the exact same data via read()/
write() again. That's not significantly harder. Especially if you write a
wrapper around the calls. If you want to be perverse, you can probably
even transmit user space pointers.
But I suspect there are really only two generally useful interfaces:
1. A text-based interface for generally-useful stuff you might want to
manipulate from the shell, or random user programs. (From the shell _is_
random user programs.)
2. An RPC-based interface for tightly-coupled fs utilities. (I don't know
off the top of my head what the kernel already has - ISTR networking has
_something_.)
Don't forget a version marker of some kind. Sooner or later, you'll be
glad you have it.
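A sketch of what such a version marker could look like for the binary variant, i.e. fixed-layout messages pushed through read()/write() on the metadata fs. The header layout, the magic value and the fsctl_* names are all hypothetical, and host byte order is assumed since both ends live on the same machine.

```c
#include <stdint.h>
#include <string.h>

#define FSCTL_MAGIC   0x4A46534DU   /* arbitrary tag chosen for this example */
#define FSCTL_VERSION 1U

struct fsctl_hdr {
    uint32_t magic;
    uint32_t version;   /* bumped whenever the message layout changes */
    uint32_t opcode;
    uint32_t length;    /* payload bytes following the header */
};

/* Build header + payload in buf; returns total size, or 0 if buf is too small. */
size_t fsctl_pack(void *buf, size_t buflen, uint32_t opcode,
                  const void *payload, uint32_t length)
{
    struct fsctl_hdr hdr = { FSCTL_MAGIC, FSCTL_VERSION, opcode, length };

    if (buflen < sizeof(hdr) || buflen - sizeof(hdr) < length)
        return 0;
    memcpy(buf, &hdr, sizeof(hdr));
    if (length)
        memcpy((char *)buf + sizeof(hdr), payload, length);
    return sizeof(hdr) + length;
}

/* Returns 0 if acceptable, -1 if malformed, -2 on a version mismatch. */
int fsctl_check(const void *buf, size_t buflen)
{
    struct fsctl_hdr hdr;

    if (buflen < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.magic != FSCTL_MAGIC)
        return -1;
    if (hdr.version != FSCTL_VERSION)
        return -2;
    if (buflen - sizeof(hdr) < hdr.length)
        return -1;
    return 0;
}
```

Giving the version mismatch its own return value lets an old utility report "kernel speaks a newer protocol" instead of a generic failure.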
MfG Kai
According to [email protected]:
>[email protected] (Chip Salzenberg) wrote:
>>Why not have a kernel thread and use standard RPC techniques like
>>sockets? Then you'd not have to invent anything unimportant like
>>Yet Another IPC Technique.
>
>kerneld (kmod's late unlamented predecessor) used to use Unix sockets
>to communicate from the kernel to the daemon. It forced everybody to
>link Unix sockets into the kernel but there are some people out there
>who want to use it as a module. Also the kernel code for communicating
>with kerneld was "unpleasant", see ipc/msg.c in a 2.0 kernel.
I see.
On the other hand, file-style (e.g. /proc-style) access works in Plan9
at least in part because each client makes his own connection to the
server. Thus, the question of how clients know which response is for
them is trivially solved. ('Server' would in this case be the JFS
kernel thread.)
Sockets are apparently not the right way to go about getting
transaction support for kernel threads.
AFAIK, Alex Viro's idea of bindable namespaces provides effective
transaction support *ONLY* if there are per-process bindings. With
per-process bindings, each client that opens a connection does so
through a distinct binding; when that client's responses go back
through the same binding, only that client can see them.
I hope that Alex's namespaces patch, implementing per-process
bindings, goes into the official kernel Real Soon Now.
--
Chip Salzenberg - a.k.a. - <[email protected]>
"We have no fuel on board, plus or minus 8 kilograms." -- NEAR tech
[email protected] (Chip Salzenberg) writes:
> AFAIK, Alex Viro's idea of bindable namespaces provides effective
> transaction support *ONLY* if there are per-process bindings. With
> per-process bindings, each client that opens a connection does so
> through a distinct binding; when that client's responses go back
> through the same binding, only that client can see them.
Not really. We can both open /proc/partitions, read one char at a
time, and the kernel won't confuse our read positions. Different
file opens create different instances of state. See struct file,
void *private_data for how to store arbitrary data.
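The same point can be checked from userspace with an ordinary file. The sketch below opens one temporary file twice and reads from both descriptors; the helper name and the temp-file path template are just choices made for this example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns 1 if two separate opens of the same file keep independent
 * read positions, 0 if they do not, -1 if the test file could not be set up.
 */
int opens_are_independent(void)
{
    char path[] = "/tmp/openstateXXXXXX";
    char a = 0, b = 0, c = 0;
    int fd0, fd1, fd2, ok;

    fd0 = mkstemp(path);
    if (fd0 < 0)
        return -1;
    if (write(fd0, "xyz", 3) != 3) {
        close(fd0);
        unlink(path);
        return -1;
    }
    close(fd0);

    fd1 = open(path, O_RDONLY);
    fd2 = open(path, O_RDONLY);
    if (fd1 < 0 || fd2 < 0) {
        if (fd1 >= 0)
            close(fd1);
        if (fd2 >= 0)
            close(fd2);
        unlink(path);
        return -1;
    }

    /* Advance fd1 by two bytes; fd2 must still start at the beginning. */
    ok = read(fd1, &a, 1) == 1 && read(fd1, &b, 1) == 1 &&
         read(fd2, &c, 1) == 1 &&
         a == 'x' && b == 'y' && c == 'x';

    close(fd1);
    close(fd2);
    unlink(path);
    return ok ? 1 : 0;
}
```

Each open() gets its own file description with its own offset, which is the userspace-visible side of the per-struct-file state described above.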
--
tv@{{hq.yok.utu,havoc,gaeshido}.fi,{debian,wanderer}.org,stonesoft.com}
unix, linux, debian, networks, security, | First snow, then silence.
kernel, TCP/IP, C, perl, free software, | This thousand dollar screen dies
mail, www, sw devel, unix admin, hacks. | so beautifully.