This is the third patch of a series that will eventually remove the
filesystem-specific includes, the struct inode union and the struct
super_block union from linux/fs.h. See part 1 of this series for a
description of the approach.
This patch changes the ext2 filesystem declaration, giving ext2 its own
private inode cache. The <ext2_fs_i> include is removed from linus/fs.h and
moved to ext2_fs.h. The ext2_inode_info member of the fs.h inode union is
removed. With the union out of the picture, ext2 inodes are somewhat
smaller,
so the practical effect of this is a small saving of cache memory.
If this approach meets with general approval, I will procede with the removal
of the ext2 member of the super_block union.
To apply this patch:
cd /your/source/tree
cat this/patch | patch -p0
I strongly recommended that this patch not be tested on a system containing
valuable data - user mode linux is the way to go.
--
Daniel
--- ../2.4.16.uml.clean/fs/ext2/super.c Sun Dec 30 10:15:56 2001
+++ ./fs/ext2/super.c Mon Dec 31 02:37:17 2001
@@ -798,7 +798,9 @@
return 0;
}
-static DECLARE_FSTYPE_DEV(ext2_fs_type, "ext2", ext2_read_super);
+static FILESYSTEM(ext2_fs_type, "ext2", ext2_read_super,
+ sizeof (struct ext2_sb_info),
+ sizeof (struct ext2_inode_info));
static int __init init_ext2_fs(void)
{
--- ../2.4.16.uml.clean/include/linux/ext2_fs.h Mon Dec 31 02:51:14 2001
+++ ./include/linux/ext2_fs.h Mon Dec 31 02:38:25 2001
@@ -17,6 +17,7 @@
#define _LINUX_EXT2_FS_H
#include <linux/types.h>
+#include <linux/ext2_fs_i.h>
/*
* The second extended filesystem constants/structures
--- ../2.4.16.uml.clean/include/linux/fs.h Mon Dec 31 02:51:14 2001
+++ ./include/linux/fs.h Mon Dec 31 02:39:51 2001
@@ -288,7 +288,6 @@
#include <linux/pipe_fs_i.h>
#include <linux/minix_fs_i.h>
-#include <linux/ext2_fs_i.h>
#include <linux/ext3_fs_i.h>
#include <linux/hpfs_fs_i.h>
#include <linux/ntfs_fs_i.h>
@@ -479,7 +478,6 @@
__u32 i_generation;
union {
struct minix_inode_info minix_i;
- struct ext2_inode_info ext2_inode_info;
struct ext3_inode_info ext3_i;
struct hpfs_inode_info hpfs_i;
struct ntfs_inode_info ntfs_i;
On Mon, 31 Dec 2001, Daniel Phillips wrote:
>
> This is the third patch of a series that will eventually remove the
> filesystem-specific includes, the struct inode union and the struct
> super_block union from linux/fs.h. See part 1 of this series for a
> description of the approach.
I really don't like this:
> -static DECLARE_FSTYPE_DEV(ext2_fs_type, "ext2", ext2_read_super);
> +static FILESYSTEM(ext2_fs_type, "ext2", ext2_read_super,
> + sizeof (struct ext2_sb_info),
> + sizeof (struct ext2_inode_info));
for two reasons:
- horrible layout
- bad type information
The first would be easy to fix (just put everything on a line of it's
own), but the second is harder: I don't like having two integers that can
easily be mixed up, and where order matters.
How about using a descriptor structure instead of the macro, and making
the filesystem declaration syntax look more like
static struct file_system_type ext2_descriptor = {
owner: THIS_MODULE,
fs_flags: FS_REQUIRES_DEV,
name: "ext2",
read_super: ext2_read_super,
sb_size: sizeof(ext2_sb_info),
inode_size: sizeof(struct ext2_inode_info)
};
which is more readable, and inherently documents _what_ those things are.
In short, I think the whole DECLARE_FSTYPE thing was a mistake, just to
avoid having people do the owner/flags thing. Doing the initialization
explicitly ends up being more readable, I think.
(If you _want_ to use a macro, you can do something like
#define DECLARE_FS(var,type, a...) \
struct file_system_type var = { \
owner: THIS_MODULE, \
fs_flags: FS_REQUIRES_DEV \
name: type,
,##a };
and then you do
DECLARE_FS(ext2_descriptor, "ext2",
read_super: ext2_read_super,
sb_size: sizeof(ext2_sb_info),
inode_size: sizeof(struct ext2_inode_info)
);
which is kind of half-way between the two formats.
What do you think?
Linus
Em Mon, Dec 31, 2001 at 10:16:14AM -0800, Linus Torvalds escreveu:
> How about using a descriptor structure instead of the macro, and making
> the filesystem declaration syntax look more like
>
> static struct file_system_type ext2_descriptor = {
> owner: THIS_MODULE,
> fs_flags: FS_REQUIRES_DEV,
> name: "ext2",
> read_super: ext2_read_super,
> sb_size: sizeof(ext2_sb_info),
> inode_size: sizeof(struct ext2_inode_info)
> };
>
> which is more readable, and inherently documents _what_ those things are.
Agreed, this is how the other structs of similar purpose (proto_opts,
net_proto_family, etc) are initialized.
- Arnaldo
On December 31, 2001 07:16 pm, Linus Torvalds wrote:
> How about using a descriptor structure instead of the macro, and making
> the filesystem declaration syntax look more like
>
> static struct file_system_type ext2_descriptor = {
> owner: THIS_MODULE,
> fs_flags: FS_REQUIRES_DEV,
> name: "ext2",
> read_super: ext2_read_super,
> super_size: sizeof(ext2_sb_info),
^^^^^---> changed sb to super, lines up better
> inode_size: sizeof(struct ext2_inode_info)
> };
>
> which is more readable, and inherently documents _what_ those things are.
>
> [snip halfway macro format]
>
> What do you think?
It's much nicer, here's (1 of 3) again, with the Linus-style ext2_fs
declaration.
--
Daniel
--- ../2.4.16.uml.clean/arch/um/fs/hostfs/hostfs_kern.c Mon Dec 31 03:15:49 2001
+++ ./arch/um/fs/hostfs/hostfs_kern.c Mon Dec 31 18:28:57 2001
@@ -378,7 +378,7 @@
char *name;
int type, err = 0, rdev;
- inode = get_empty_inode();
+ inode = get_empty_inode(sb);
if(inode == NULL) return(NULL);
inode->u.hostfs_i.host_filename = NULL;
inode->u.hostfs_i.fd = -1;
@@ -395,7 +395,6 @@
kfree(name);
}
else type = HOSTFS_DIR;
- inode->i_sb = sb;
if(type == HOSTFS_SYMLINK)
inode->i_op = &page_symlink_inode_operations;
--- ../2.4.16.uml.clean/drivers/block/rd.c Mon Dec 31 02:55:36 2001
+++ ./drivers/block/rd.c Mon Dec 31 18:28:57 2001
@@ -673,7 +673,7 @@
#endif
ram_device = MKDEV(MAJOR_NR, unit);
- if ((inode = get_empty_inode()) == NULL)
+ if ((inode = get_empty_inode(NULL)) == NULL)
return;
memset(&infile, 0, sizeof(infile));
memset(&in_dentry, 0, sizeof(in_dentry));
@@ -683,7 +683,7 @@
infile.f_op = &def_blk_fops;
init_special_inode(inode, S_IFBLK | S_IRUSR, kdev_t_to_nr(device));
- if ((out_inode = get_empty_inode()) == NULL)
+ if ((out_inode = get_empty_inode(NULL)) == NULL)
goto free_inode;
memset(&outfile, 0, sizeof(outfile));
memset(&out_dentry, 0, sizeof(out_dentry));
--- ../2.4.16.uml.clean/fs/ext2/super.c Sun Dec 30 10:15:56 2001
+++ ./fs/ext2/super.c Mon Dec 31 19:19:01 2001
@@ -798,16 +798,23 @@
return 0;
}
-static DECLARE_FSTYPE_DEV(ext2_fs_type, "ext2", ext2_read_super);
-
+static struct file_system_type ext2_fs = {
+ owner: THIS_MODULE,
+ fs_flags: FS_REQUIRES_DEV,
+ name: "ext2",
+ read_super: ext2_read_super,
+ super_size: sizeof(struct ext2_sb_info),
+ inode_size: sizeof(struct ext2_inode_info)
+};
+
static int __init init_ext2_fs(void)
{
- return register_filesystem(&ext2_fs_type);
+ return register_filesystem(&ext2_fs);
}
static void __exit exit_ext2_fs(void)
{
- unregister_filesystem(&ext2_fs_type);
+ unregister_filesystem(&ext2_fs);
}
EXPORT_NO_SYMBOLS;
--- ../2.4.16.uml.clean/fs/inode.c Mon Dec 31 02:55:36 2001
+++ ./fs/inode.c Mon Dec 31 18:53:23 2001
@@ -75,16 +75,23 @@
static kmem_cache_t * inode_cachep;
-#define alloc_inode() \
- ((struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL))
-static void destroy_inode(struct inode *inode)
+int foo_count;
+
+static inline struct inode *alloc_inode (struct super_block *sb)
{
+ kmem_cache_t *cache = sb? sb->s_type->inode_cache: NULL;
+ return (struct inode *) kmem_cache_alloc (cache? cache: inode_cachep, SLAB_KERNEL);
+}
+
+static void destroy_inode (struct inode *inode)
+{
+ struct super_block *sb = inode->i_sb;
+ kmem_cache_t *cache = sb? sb->s_type->inode_cache: NULL;
if (inode_has_buffers(inode))
BUG();
- kmem_cache_free(inode_cachep, (inode));
+ kmem_cache_free (cache? cache: inode_cachep, inode);
}
-
/*
* These are initializations that only need to be done
* once, because the fields are idempotent across use
@@ -661,7 +668,7 @@
!inode_has_buffers(inode))
#define INODE(entry) (list_entry(entry, struct inode, i_list))
-void prune_icache(int goal)
+void prune_icache (int goal)
{
LIST_HEAD(list);
struct list_head *entry, *freeable = &list;
@@ -710,6 +717,7 @@
int shrink_icache_memory(int priority, int gfp_mask)
{
+ struct file_system_type *fs;
int count = 0;
/*
@@ -725,7 +733,15 @@
count = inodes_stat.nr_unused / priority;
prune_icache(count);
+#if 0
+ /* Manfred thinks this isn't necessary */
kmem_cache_shrink(inode_cachep);
+ write_lock(&file_systems_lock);
+ for (fs = file_systems; fs; fs = fs->next)
+ kmem_cache_shrink (fs->inode_cache);
+ write_unlock(&file_systems_lock);
+ kmem_cache_shrink(inode_cachep);
+#endif
return 0;
}
@@ -765,12 +781,14 @@
* i_sb, i_ino, i_count, i_state and the lists have
* been initialized elsewhere..
*/
-static void clean_inode(struct inode *inode)
+static void clean_inode (struct super_block *sb, struct inode *inode)
{
static struct address_space_operations empty_aops;
static struct inode_operations empty_iops;
static struct file_operations empty_fops;
- memset(&inode->u, 0, sizeof(inode->u));
+ unsigned given = sb? sb->s_type->inode_size: 0; // only rd.c has null sb
+ memset(&inode->u, 0, given? given: sizeof(inode->u));
+ inode->i_sb = sb;
inode->i_sock = 0;
inode->i_op = &empty_iops;
inode->i_fop = &empty_fops;
@@ -802,20 +820,19 @@
* lists.
*/
-struct inode * get_empty_inode(void)
+struct inode *get_empty_inode (struct super_block *sb)
{
static unsigned long last_ino;
- struct inode * inode;
+ struct inode *inode;
spin_lock_prefetch(&inode_lock);
-
- inode = alloc_inode();
+ inode = alloc_inode (sb);
if (inode)
{
spin_lock(&inode_lock);
inodes_stat.nr_inodes++;
list_add(&inode->i_list, &inode_in_use);
- inode->i_sb = NULL;
+ inode->i_sb = NULL; // need this?
inode->i_dev = 0;
inode->i_blkbits = 0;
inode->i_ino = ++last_ino;
@@ -823,7 +840,7 @@
atomic_set(&inode->i_count, 1);
inode->i_state = 0;
spin_unlock(&inode_lock);
- clean_inode(inode);
+ clean_inode(sb, inode);
}
return inode;
}
@@ -838,7 +855,7 @@
{
struct inode * inode;
- inode = alloc_inode();
+ inode = alloc_inode (sb);
if (inode) {
struct inode * old;
@@ -849,7 +866,6 @@
inodes_stat.nr_inodes++;
list_add(&inode->i_list, &inode_in_use);
list_add(&inode->i_hash, head);
- inode->i_sb = sb;
inode->i_dev = sb->s_dev;
inode->i_blkbits = sb->s_blocksize_bits;
inode->i_ino = ino;
@@ -857,8 +873,7 @@
atomic_set(&inode->i_count, 1);
inode->i_state = I_LOCK;
spin_unlock(&inode_lock);
-
- clean_inode(inode);
+ clean_inode(sb, inode);
/* reiserfs specific hack right here. We don't
** want this to last, and are looking for VFS changes
@@ -897,6 +912,22 @@
wait_on_inode(inode);
}
return inode;
+}
+
+int create_inode_cache (struct file_system_type *fs)
+{
+ if (fs->inode_size)
+ if (!(fs->inode_cache = kmem_cache_create (fs->name,
+ fs->inode_size + sizeof(struct inode) - sizeof(get_empty_inode(0)->u),
+ 0, SLAB_HWCACHE_ALIGN, init_once, NULL)))
+ return -ENOSPC;
+ return 0;
+}
+
+int destroy_inode_cache (struct file_system_type *fs)
+{
+ printk("Destroying inode cache for %s\n", fs->name);
+ return kmem_cache_destroy (fs->inode_cache)? -EBUSY: 0;
}
static inline unsigned long hash(struct super_block *sb, unsigned long i_ino)
--- ../2.4.16.uml.clean/fs/super.c Mon Dec 31 02:55:36 2001
+++ ./fs/super.c Mon Dec 31 18:28:57 2001
@@ -67,8 +67,8 @@
* Once the reference is obtained we can drop the spinlock.
*/
-static struct file_system_type *file_systems;
-static rwlock_t file_systems_lock = RW_LOCK_UNLOCKED;
+struct file_system_type *file_systems;
+rwlock_t file_systems_lock = RW_LOCK_UNLOCKED;
/* WARNING: This can be used only if we _already_ own a reference */
static void get_filesystem(struct file_system_type *fs)
@@ -105,10 +105,10 @@
* unregistered.
*/
-int register_filesystem(struct file_system_type * fs)
+int register_filesystem (struct file_system_type *fs)
{
- int res = 0;
- struct file_system_type ** p;
+ struct file_system_type **p;
+ int err = 0;
if (!fs)
return -EINVAL;
@@ -118,11 +118,12 @@
write_lock(&file_systems_lock);
p = find_filesystem(fs->name);
if (*p)
- res = -EBUSY;
+ err = -EBUSY;
else
- *p = fs;
+ if (!(err = create_inode_cache (fs)))
+ *p = fs;
write_unlock(&file_systems_lock);
- return res;
+ return err;
}
/**
@@ -137,23 +138,25 @@
* may be freed or reused.
*/
-int unregister_filesystem(struct file_system_type * fs)
+int unregister_filesystem (struct file_system_type *fs)
{
- struct file_system_type ** tmp;
-
+ struct file_system_type **p;
+ int err = -EINVAL;
write_lock(&file_systems_lock);
- tmp = &file_systems;
- while (*tmp) {
- if (fs == *tmp) {
- *tmp = fs->next;
+ p = &file_systems;
+ while (*p) {
+ if (*p == fs) {
+ if (fs->inode_cache && (err = destroy_inode_cache (fs)))
+ break;
+ *p = fs->next;
fs->next = NULL;
- write_unlock(&file_systems_lock);
- return 0;
+ err = 0;
+ break;
}
- tmp = &(*tmp)->next;
+ p = &(*p)->next;
}
write_unlock(&file_systems_lock);
- return -EINVAL;
+ return err;
}
static int fs_index(const char * __name)
--- ../2.4.16.uml.clean/include/linux/fs.h Mon Dec 31 02:55:36 2001
+++ ./include/linux/fs.h Mon Dec 31 18:50:45 2001
@@ -693,6 +693,9 @@
#include <linux/cramfs_fs_sb.h>
#include <linux/jffs2_fs_sb.h>
+extern struct file_system_type *file_systems;
+extern rwlock_t file_systems_lock;
+
extern struct list_head super_blocks;
extern spinlock_t sb_lock;
@@ -950,10 +953,14 @@
int fs_flags;
struct super_block *(*read_super) (struct super_block *, void *, int);
struct module *owner;
- struct file_system_type * next;
+ struct file_system_type *next;
struct list_head fs_supers;
+ unsigned super_size, inode_size;
+ struct kmem_cache_s *inode_cache;
};
+/* Backward compatible declarations, remove when all updated */
+
#define DECLARE_FSTYPE(var,type,read,flags) \
struct file_system_type var = { \
name: type, \
@@ -1327,18 +1334,21 @@
}
extern void clear_inode(struct inode *);
-extern struct inode * get_empty_inode(void);
+extern struct inode *get_empty_inode (struct super_block *sb);
-static inline struct inode * new_inode(struct super_block *sb)
+static inline struct inode *new_inode (struct super_block *sb)
{
- struct inode *inode = get_empty_inode();
+ struct inode *inode = get_empty_inode(sb);
if (inode) {
- inode->i_sb = sb;
inode->i_dev = sb->s_dev;
inode->i_blkbits = sb->s_blocksize_bits;
}
return inode;
}
+
+int create_inode_cache (struct file_system_type *fs);
+int destroy_inode_cache (struct file_system_type *fs);
+
extern void remove_suid(struct inode *inode);
extern void insert_inode_hash(struct inode *);
--- ../2.4.16.uml.clean/net/socket.c Mon Dec 31 02:55:36 2001
+++ ./net/socket.c Mon Dec 31 18:28:58 2001
@@ -440,11 +440,10 @@
struct inode * inode;
struct socket * sock;
- inode = get_empty_inode();
+ inode = get_empty_inode(sock_mnt->mnt_sb);
if (!inode)
return NULL;
- inode->i_sb = sock_mnt->mnt_sb;
sock = socki_lookup(inode);
inode->i_mode = S_IFSOCK|S_IRWXUGO;
Just some comments and questions about the new fs.h .. its great to hear
improvements in the new year!
Are you people going to put this on the 2.5? It seems you are keeping
the generic_ip pointer for compatibility. And I guess this is what other
file systems (those not included in the standardkernel sources) are
using, at least I am the one.
Daniel Phillips wrote:
>On December 31, 2001 07:16 pm, Linus Torvalds wrote:
>
>>How about using a descriptor structure instead of the macro, and making
>>the filesystem declaration syntax look more like
>>
>> static struct file_system_type ext2_descriptor = {
>> owner: THIS_MODULE,
>> fs_flags: FS_REQUIRES_DEV,
>> name: "ext2",
>> read_super: ext2_read_super,
>> super_size: sizeof(ext2_sb_info),
>>
> ^^^^^---> changed sb to super, lines up better
>
>> inode_size: sizeof(struct ext2_inode_info)
>> };
>>
I am using the generic_ip to hold my fs inode data pointer. During the
implementation, I tested both approaches... but it seems not sigficant
in detection of inode cache allocation overheads in read_inode(). Its
worth to talk about memory consumptions because it is silly to allocate
the largest possible inode cache sizes. I strongly agree with the new fs
declaration interface becuase of higher efficiency and less overhead,
but it seems more and more fs are supported in Linux, the effort to
patch all fs'es is a lot of work.
I am working on a compression file systmes, many thing varies such that
I have to use dynamically link allocated structures in the inode cache.
>>
>>which is more readable, and inherently documents _what_ those things are.
>>
>>[snip halfway macro format]
>>
>>What do you think?
>>
>
>It's much nicer, here's (1 of 3) again, with the Linus-style ext2_fs
>declaration.
>
>--
>Daniel
>
For the remount issue, I don't think its just affect the root
filesystem, for me I also reload other settings during a remount. For
feature oriented filesystems, remount may be use more frequent then
umount and mount. Other admins may also remount their /usr fs and other
protected fs to read-only for safe after altering their fs. Why not
changing the remount behaviour in VFS not to call module_exit()? I also
can't see why VFS has to call module_exit() in remount if we have
remount() in super_operations ... I think remount operations are fs
specific and shuold pass this job to super_operations->remount . I think
it is a better way to solve the root fs problem. But it would require
even more work to get all fs to obey the new standard. If you want nice
and clean efficient solutions, work is must anyway... good luck.
David Chow
On January 1, 2002 04:55 am, David Chow wrote:
> Just some comments and questions about the new fs.h .. its great to hear
> improvements in the new year!
>
> Are you people going to put this on the 2.5?
It's entirely up to Linus. There's no chance it will go in before it a)
works (which it doesn't at the moment) b) has been tested (a lot) c) meets
with general approval and d) survives the viro test.
> It seems you are keeping
> the generic_ip pointer for compatibility. And I guess this is what other
> file systems (those not included in the standardkernel sources) are
> using, at least I am the one.
Yes, IMHO, once people are able to use the fs-specific inode mechanism they
will prefer it over the generic_ip method. However, the generic_ip will be
the last item of the union to go, and even then, it will just be moved into
private area of whatever filesystem uses it, with something like:
struct myfs_inode { struct inode; struct myfs_inode_info *private; }
static inline struct myfs_inode_info *myfs_i(struct inode *inode)
{
return ((struct myfs_inode *) inode)->private;
}
> I am using the generic_ip to hold my fs inode data pointer. During the
> implementation, I tested both approaches... but it seems not sigficant
> in detection of inode cache allocation overheads in read_inode(). Its
> worth to talk about memory consumptions because it is silly to allocate
> the largest possible inode cache sizes. I strongly agree with the new fs
> declaration interface becuase of higher efficiency and less overhead,
> but it seems more and more fs are supported in Linux, the effort to
> patch all fs'es is a lot of work.
That's why it's good to do it sooner rather than later, the number of
filesystems will only increase. Changing a filesystem to use the
xxx_i(inode)->field idea is an easy edit.
> I am working on a compression file systmes, many thing varies such that
> I have to use dynamically link allocated structures in the inode cache.
Yes, so saving one dereference would make very little difference to you, and
the same with saving one slab alloc.
> For the remount issue, I don't think its just affect the root
> filesystem, for me I also reload other settings during a remount. For
> feature oriented filesystems, remount may be use more frequent then
> umount and mount. Other admins may also remount their /usr fs and other
> protected fs to read-only for safe after altering their fs. Why not
> changing the remount behaviour in VFS not to call module_exit()? I also
> can't see why VFS has to call module_exit() in remount if we have
> remount() in super_operations ... I think remount operations are fs
> specific and shuold pass this job to super_operations->remount . I think
> it is a better way to solve the root fs problem. But it would require
> even more work to get all fs to obey the new standard. If you want nice
> and clean efficient solutions, work is must anyway... good luck.
I think you're on to something with the ->remount idea, however it's outside
the scope of this work. For now, the goal is just get the fs.h includes
straightened out.
--
Daniel