2002-11-21 15:33:01

by Rusty Lynch

[permalink] [raw]
Subject: [BUG] sysfs on 2.5.48 unable to remove files while in use

I have noticed some strange behavior on my 2.5.48 build
where sysfs is not able to handle file/directory removal
correctly when the file/directory is in use.

I can see this with a little kprobes example code that I have
been playing with that will create entries like:

/sysfsroot/noisy/0xc0107ae0/sys_fork

when someone uses that driver to insert a kernel probe
at 0xc0107ae0 that will printk "sys_fork".

What I have noticed, is that if I create a new probe
(which will create the sysfs entry), open a new shell and
cd to /sysfsroot/noisy/0xc0107ae0, and then use my
driver to remove the probe (which will remove the
sysfs entry), then /sysfsroot/noisy/0xc0107ae0 doesn't
go away after I cd out of the shell.

>From then on any attempts to create new sysfs entries
do not show up in /sysfsroot/ until I unload/load my driver
again.

It seems like this could be an issue with some real code
(not just this stupid play code of mine), like maybe hotswap
code that updates sysfs entries.

My patch to 2.5.48 (with kprobes already applied) can be
grabbed from:
http://www.stinkycat.com/patches/patch-kprobes_sample_with_sysfs-2.5.48.diff

- or here is an inline version that $%^&^ outlook will surely screw up:

diff -urN linux-2.5.48-kprobes/arch/i386/kernel/i386_ksyms.c
linux-2.5.48-kprobes-patched/arch/i386/kernel/i386_ksyms.c
--- linux-2.5.48-kprobes/arch/i386/kernel/i386_ksyms.c 2002-11-17
20:29:57.000000000 -0800
+++ linux-2.5.48-kprobes-patched/arch/i386/kernel/i386_ksyms.c 2002-11-18
15:14:42.000000000 -0800
@@ -59,6 +59,8 @@
extern unsigned long cpu_khz;
extern unsigned long get_cmos_time(void);

+extern int valid_kernel_address(unsigned long addr);
+
/* platform dependent support */
EXPORT_SYMBOL(boot_cpu_data);
#ifdef CONFIG_EISA
@@ -91,6 +93,7 @@
EXPORT_SYMBOL(get_cmos_time);
EXPORT_SYMBOL(cpu_khz);
EXPORT_SYMBOL(apm_info);
+EXPORT_SYMBOL(valid_kernel_address);

#ifdef CONFIG_DEBUG_IOVIRT
EXPORT_SYMBOL(__io_virt_debug);
diff -urN linux-2.5.48-kprobes/arch/i386/kernel/traps.c
linux-2.5.48-kprobes-patched/arch/i386/kernel/traps.c
--- linux-2.5.48-kprobes/arch/i386/kernel/traps.c 2002-11-18
15:14:25.000000000 -0800
+++ linux-2.5.48-kprobes-patched/arch/i386/kernel/traps.c 2002-11-18
15:14:42.000000000 -0800
@@ -131,6 +131,11 @@

#endif

+int valid_kernel_address(unsigned long addr)
+{
+ return kernel_text_address(addr);
+}
+
void show_trace(unsigned long * stack)
{
int i;
diff -urN linux-2.5.48-kprobes/drivers/char/Kconfig
linux-2.5.48-kprobes-patched/drivers/char/Kconfig
--- linux-2.5.48-kprobes/drivers/char/Kconfig 2002-11-17
20:29:21.000000000 -0800
+++ linux-2.5.48-kprobes-patched/drivers/char/Kconfig 2002-11-18
15:14:59.000000000 -0800
@@ -1270,5 +1270,20 @@
Once bound, I/O against /dev/raw/rawN uses efficient zero-copy I/O.
See the raw(8) manpage for more details.

+config NOISY
+ tristate "Noisy Interface Support"
+ ---help---
+ If you say Y here and create a character special file /dev/noisy with
+ major number 10 and minor number 221 using mknod ("man mknod"), you
+ will get access to an interface for inserting arbitrary printk's
+ into executing kernel code.
+
+ This driver is also available as a module ( = code which can be
+ inserted in and removed from the running kernel whenever you want).
+ The module is called noisy.o. If you want to compile it as a module,
+ say M here and read <file:Documentation/modules.txt>.
+
+ If unsure, say N.
+
endmenu

diff -urN linux-2.5.48-kprobes/drivers/char/Makefile
linux-2.5.48-kprobes-patched/drivers/char/Makefile
--- linux-2.5.48-kprobes/drivers/char/Makefile 2002-11-17
20:29:56.000000000 -0800
+++ linux-2.5.48-kprobes-patched/drivers/char/Makefile 2002-11-18
15:14:59.000000000 -0800
@@ -102,6 +102,7 @@
obj-$(CONFIG_AGP) += agp/
obj-$(CONFIG_DRM) += drm/
obj-$(CONFIG_PCMCIA) += pcmcia/
+obj-$(CONFIG_NOISY) += noisy.o

# Files generated that shall be removed upon make clean
clean-files := consolemap_deftbl.c defkeymap.c qtronixmap.c
diff -urN linux-2.5.48-kprobes/drivers/char/noisy.c
linux-2.5.48-kprobes-patched/drivers/char/noisy.c
--- linux-2.5.48-kprobes/drivers/char/noisy.c 1969-12-31
16:00:00.000000000 -0800
+++ linux-2.5.48-kprobes-patched/drivers/char/noisy.c 2002-11-20
18:28:41.000000000 -0800
@@ -0,0 +1,265 @@
+/*
+ * Noisy Interface for Linux
+ *
+ * This driver allows arbitrary printk's to be inserted into
+ * executing kernel code by using the new kprobes interface.
+ *
+ * Copyright (C) 2002 Rusty Lynch <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <asm/uaccess.h>
+
+/* exported by arch/YOURARCH/kernel/traps.c */
+extern int valid_kernel_address(unsigned long);
+
+#define MAX_MSG_SIZE 128
+
+/*
+ * Data structures for managing list of probe points
+ */
+static struct list_head probe_list;
+struct nprobe {
+ struct list_head list;
+ struct kprobe probe;
+ char message[MAX_MSG_SIZE + 1];
+ struct attribute attr;
+ struct kobject kobj;
+};
+
+/*
+ * sysfs stuff
+ */
+
+struct subsystem noisy_subsys = {
+ .kobj = { .name = "noisy" },
+};
+
+/*
+ * Probe handler called before the execution of all probe points
+ */
+static void noisy_pre_handler(struct kprobe *p, struct pt_regs *r)
+{
+ struct nprobe *c = container_of(p, struct nprobe, probe);
+ printk(KERN_CRIT "%s: %s\n", __FUNCTION__, c->message);
+}
+
+/*
+ * Probe handler called just after the probed address is single stepped
+ */
+static void noisy_post_handler(struct kprobe *p, struct pt_regs *r,
+ unsigned long flags)
+{
+}
+
+/*
+ * Fault handler that covers the pre_handler, single stepping, and
+ * post_handler executiion.
+ */
+static int noisy_fault_handler(struct kprobe *p, struct pt_regs *r, int
trapnr)
+{
+ /* Let the kernel handle this fault */
+ return 0;
+}
+
+/*
+ * Supported file operations
+ */
+static ssize_t noisy_read(struct file *, char *, size_t, loff_t *);
+static ssize_t noisy_write(struct file *, const char *, size_t, loff_t *);
+static int noisy_open(struct inode *, struct file *);
+static int noisy_release(struct inode *, struct file *);
+
+static struct file_operations noisy_fops = {
+ .owner = THIS_MODULE,
+ .read = noisy_read,
+ .write = noisy_write,
+ .open = noisy_open,
+ .release = noisy_release,
+};
+
+/*
+ * To conserve major numbers, this device uses
+ * the miscdevice subsystem.
+ */
+static struct miscdevice noisy_dev =
+{
+ .minor = NOISY_MINOR,
+ .name = "noisy",
+ .fops = &noisy_fops
+};
+
+static ssize_t noisy_read(struct file *file, char *buf,
+ size_t count, loff_t *ppos)
+{
+ struct nprobe *p;
+ list_for_each_entry(p, &probe_list, list) {
+ printk(KERN_CRIT "%p: %s\n", (p->probe).addr, p->message);
+ }
+
+ return 0;
+}
+
+/*
+ * A kprobe is created for each write operation where write is expecting
+ * a string of the form:
+ * 0xADDRESS MESSAGE
+ *
+ * The kprobe pre_handler will just do a printk using the MESSAGE passed
in.
+ *
+ * All kprobes are unregistered when the node is closed, so I use this
module
+ * like:
+ * $ mknod /dev/noisy c 10 221
+ * $ cat > /dev/noisy
+ * 0xc0107d50 sys_fork
+ *
+ * ... and then go do something that will trigger a sys_fork,
+ * and then control-c to stop the cat process (which
+ * will close the node and therefore stop syslog from further
+ * DOS attacks from this driver)
+ */
+static ssize_t noisy_write(struct file *file, const char *buf, size_t
count,
+ loff_t *ppos)
+{
+ struct nprobe *n = 0;
+ size_t ret = -ENOMEM;
+ char *tmp = 0;
+
+ if (count > MAX_MSG_SIZE) {
+ printk(KERN_CRIT
+ "noisy: Input buffer (%i bytes) is too big!\n",
+ count);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ tmp = kmalloc(count + 1, GFP_KERNEL);
+ if (!tmp) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ n = kmalloc(sizeof(struct nprobe), GFP_KERNEL);
+ if (!n) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ memset(n, '\0', sizeof(struct nprobe));
+
+ if (copy_from_user((void *)tmp, (void *)buf, count)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ tmp[count] = '\0';
+
+ if (2 != sscanf(tmp, "0x%x %s", &(n->probe).addr, n->message)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ (n->probe).pre_handler = noisy_pre_handler;
+ (n->probe).post_handler = noisy_post_handler;
+ (n->probe).fault_handler = noisy_fault_handler;
+
+
+ if (!valid_kernel_address((unsigned long)(n->probe).addr)) {
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ kobject_init(&(n->kobj));
+ (n->kobj).subsys = &noisy_subsys;
+ snprintf((n->kobj).name, KOBJ_NAME_LEN, "0x%02x",
+ (unsigned int)(n->probe).addr);
+
+ (n->attr).name = n->message;
+ (n->attr).mode = S_IRUGO;
+
+ if (register_kprobe(&(n->probe))) {
+ printk(KERN_CRIT "Unable to register probe at %p\n",
+ (n->probe).addr);
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (kobject_register(&(n->kobj))) {
+ printk(KERN_CRIT "Unable to add probe kobject!\n");
+ unregister_kprobe(&(n->probe));
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (sysfs_create_file(&(n->kobj), &(n->attr))) {
+ printk(KERN_CRIT "Unable to add probe attr file!\n");
+ unregister_kprobe(&(n->probe));
+ kobject_unregister(&(n->kobj));
+ kfree(n);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ list_add(&(n->list), &probe_list);
+ ret = count;
+
+ out:
+ kfree(tmp);
+ return ret;
+}
+
+static int noisy_open(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static int noisy_release(struct inode *inode, struct file *file)
+{
+ while(!list_empty(&probe_list)) {
+ struct list_head *n = probe_list.next;
+ struct nprobe *p = list_entry(n, struct nprobe, list);
+
+ printk("Releasing probe %p: %s\n",
+ (p->probe).addr, p->message);
+ sysfs_remove_file(&(p->kobj), &(p->attr));
+ kobject_unregister(&(p->kobj));
+ unregister_kprobe(&(p->probe));
+
+ list_del(n);
+ kfree(p);
+ }
+ return 0;
+}
+
+static int __init noisy_init(void)
+{
+ if (misc_register(&noisy_dev))
+ {
+ return -ENODEV;
+ }
+ INIT_LIST_HEAD(&probe_list);
+ subsystem_register(&noisy_subsys);
+ return 0;
+}
+
+static void __exit noisy_exit (void)
+{
+ misc_deregister(&noisy_dev);
+ subsystem_unregister(&noisy_subsys);
+}
+
+module_init(noisy_init);
+module_exit(noisy_exit);
+
+MODULE_AUTHOR("Rusty Lynch");
+MODULE_LICENSE("GPL");
diff -urN linux-2.5.48-kprobes/include/linux/miscdevice.h
linux-2.5.48-kprobes-patched/include/linux/miscdevice.h
--- linux-2.5.48-kprobes/include/linux/miscdevice.h 2002-11-17
20:29:47.000000000 -0800
+++ linux-2.5.48-kprobes-patched/include/linux/miscdevice.h 2002-11-18
15:14:59.000000000 -0800
@@ -23,6 +23,7 @@
#define MICROCODE_MINOR 184
#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
#define MPT_MINOR 220
+#define NOISY_MINOR 221
#define MISC_DYNAMIC_MINOR 255

#define SGI_GRAPHICS_MINOR 146


2002-11-21 16:59:57

by Patrick Mochel

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use


Hi there.

> My patch to 2.5.48 (with kprobes already applied) can be
> grabbed from:
> http://www.stinkycat.com/patches/patch-kprobes_sample_with_sysfs-2.5.48.diff

Where is the kprobes patch?

Thanks,

-pat

2002-11-21 17:13:31

by Rusty Lynch

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

I used the one submitted by Vamsi in
http://marc.theaimsgroup.com/?l=linux-kernel&m=103761676628192&w=2

I think IBM has a kprobes webpage with a link to the patch, but I also have
copy of the one I used at:
http://www.stinkycat.com/patches/patch-kprobes-2.5.48.diff

-rustyl

----- Original Message -----
From: "Patrick Mochel" <[email protected]>
To: "Rusty Lynch" <[email protected]>
Cc: "Linux Kernel Mailing List" <[email protected]>
Sent: Thursday, November 21, 2002 9:04 AM
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use


>
> Hi there.
>
> > My patch to 2.5.48 (with kprobes already applied) can be
> > grabbed from:
> >
http://www.stinkycat.com/patches/patch-kprobes_sample_with_sysfs-2.5.48.diff
>
> Where is the kprobes patch?
>
> Thanks,
>
> -pat
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2002-11-21 20:42:34

by Patrick Mochel

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use


Ok, I've had a chance to play with it a bit..

The kprobes patch didn't apply to current -bk. Attached is an updated
patch. I also have some comments on your 'noisy' patch, but first..

> I can see this with a little kprobes example code that I have
> been playing with that will create entries like:
>
> /sysfsroot/noisy/0xc0107ae0/sys_fork
>
> when someone uses that driver to insert a kernel probe
> at 0xc0107ae0 that will printk "sys_fork".

It seems that having a pure sysfs implementation would be superior,
instead of having to use a character device to write to. After looking
into this, I realize that a couple of pieces of infrastructure are needed,
so I'm working on that, and will post a modified version of your module
once I'm done..

> What I have noticed, is that if I create a new probe
> (which will create the sysfs entry), open a new shell and
> cd to /sysfsroot/noisy/0xc0107ae0, and then use my
> driver to remove the probe (which will remove the
> sysfs entry), then /sysfsroot/noisy/0xc0107ae0 doesn't
> go away after I cd out of the shell.
>
> >From then on any attempts to create new sysfs entries
> do not show up in /sysfsroot/ until I unload/load my driver
> again.
>
> It seems like this could be an issue with some real code
> (not just this stupid play code of mine), like maybe hotswap
> code that updates sysfs entries.

Yes, it was a real bug. I've mucked with the method to do file and
directory deletion, and it turns out that the way I was doing it was
wrong. I've gone back to mimmicking vfs_unlink() and vfs_rmdir(), which I
shouldn't have diverged from in the first place. (doing d_delete() after
low-level unlinking, instead of d_invalidate() before it).

Appended to this email is my current working patch to sysfs, including
fixes discussed and posted yesterday. I've pushed these to Linus, though
he appears to be away. I'll push again, with this fix in the next day or
so.

Please try this patch and let me know if you still experience the problem
you're seeing.

Concerning your patch:

> +config NOISY

'Noisy' seems like such a generic name, and the description doesn't seem
to imply its dependence on kprobes. Maybe KPROBES_NOISY? And, you should
put it under KPROBES, under DEBUG_KERNEL.

> diff -urN linux-2.5.48-kprobes/drivers/char/noisy.c
> linux-2.5.48-kprobes-patched/drivers/char/noisy.c

> +static struct list_head probe_list;
> +struct nprobe {
> + struct list_head list;
> + struct kprobe probe;
> + char message[MAX_MSG_SIZE + 1];
> + struct attribute attr;
> + struct kobject kobj;
> +};

struct subsystem has a list, and kobject and entry, so you don't have to
do it yourself..

> +static ssize_t noisy_write(struct file *file, const char *buf, size_t
> count,
> + loff_t *ppos)
> +{
> + struct nprobe *n = 0;
> + size_t ret = -ENOMEM;
> + char *tmp = 0;
> +
> + if (count > MAX_MSG_SIZE) {
> + printk(KERN_CRIT
> + "noisy: Input buffer (%i bytes) is too big!\n",
> + count);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + tmp = kmalloc(count + 1, GFP_KERNEL);
> + if (!tmp) {
> + ret = -ENOMEM;
> + goto out;
> + }

This should be memset to 0.

> + n = kmalloc(sizeof(struct nprobe), GFP_KERNEL);
> + if (!n) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + memset(n, '\0', sizeof(struct nprobe));
> +
> + if (copy_from_user((void *)tmp, (void *)buf, count)) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + tmp[count] = '\0';
> +
> + if (2 != sscanf(tmp, "0x%x %s", &(n->probe).addr, n->message)) {
> + ret = -EINVAL;
> + goto out;
> + }

You don't free n if you get an error from copy_from_user() or sscanf().

> + (n->attr).name = n->message;
> + (n->attr).mode = S_IRUGO;

Instead of doing this, you should just declare a static attribute called
'message' and have the contents be the message. This will save you from
initializing it each time, and save you 4 bytes in struct nprobe.
Something like this should work:


struct noisy_attribute {
struct attribute attr;
ssize_t (*read)(struct nprobe *,char *,size_t,loff_t);
};

static ssize_t noisy_attr_show(struct kobject * kobj, struct attribute *
attr,
char * page, size_t count, loff_t off)
{
struct nprobe * n = container_of(kobj,struct nprobe,kobj);
struct noisy_attribute * noisy_attr = container_of(attr,struct
noisy_attribute,attr);
ssize_t ret = 0;
return noisy_attr->show ? noisy_attr->show(n,page,count,off);
}

static struct sysfs_ops noisy_sysfs_ops = {
.show = noisy_attr_show,
};


/* Default Attribute */
static ssize_t noisy_message_read(struct nprobe * n, char * page, size_t
count, loff_t off)
{
return off ? snprintf(page,MAX_MSG_SIZE,"%s",n->message);
}

static struct noisy_attribute attr_message = {
.attr = { .name = "message", .mode = S_IRUGO },
};

static struct attribute * default_attrs[] = {
&attr_message.attr,
NULL,
};

/*
* sysfs stuff
*/

struct subsystem noisy_subsys = {
.kobj = { .name = "noisy" },
.default_attrs = default_attrs,
.sysfs_ops = noisy_sysfs_ops,
};


Note that this will also save you from manually having to create and
teardown the file when the kobject is registered and unregistered.


-pat

--- linux-2.5-virgin/fs/sysfs/inode.c Wed Nov 20 12:13:06 2002
+++ linux-2.5-core/fs/sysfs/inode.c Thu Nov 21 13:51:32 2002
@@ -23,6 +23,8 @@
* Please see Documentation/filesystems/sysfs.txt for more information.
*/

+#undef DEBUG
+
#include <linux/list.h>
#include <linux/init.h>
#include <linux/pagemap.h>
@@ -94,9 +96,10 @@

if (!dentry->d_inode) {
inode = sysfs_get_inode(dir->i_sb, mode, dev);
- if (inode)
+ if (inode) {
d_instantiate(dentry, inode);
- else
+ dget(dentry);
+ } else
error = -ENOSPC;
} else
error = -EEXIST;
@@ -142,17 +145,6 @@
return error;
}

-static int sysfs_unlink(struct inode *dir, struct dentry *dentry)
-{
- struct inode *inode = dentry->d_inode;
- down(&inode->i_sem);
- dentry->d_inode->i_nlink--;
- up(&inode->i_sem);
- d_invalidate(dentry);
- dput(dentry);
- return 0;
-}
-
/**
* sysfs_read_file - read an attribute.
* @file: file pointer.
@@ -173,17 +165,11 @@
sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos)
{
struct attribute * attr = file->f_dentry->d_fsdata;
- struct sysfs_ops * ops = NULL;
- struct kobject * kobj;
+ struct sysfs_ops * ops = file->private_data;
+ struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
unsigned char *page;
ssize_t retval = 0;

- kobj = file->f_dentry->d_parent->d_fsdata;
- if (kobj && kobj->subsys)
- ops = kobj->subsys->sysfs_ops;
- if (!ops || !ops->show)
- return 0;
-
if (count > PAGE_SIZE)
count = PAGE_SIZE;

@@ -234,16 +220,11 @@
sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t *ppos)
{
struct attribute * attr = file->f_dentry->d_fsdata;
- struct sysfs_ops * ops = NULL;
- struct kobject * kobj;
+ struct sysfs_ops * ops = file->private_data;
+ struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
ssize_t retval = 0;
char * page;

- kobj = file->f_dentry->d_parent->d_fsdata;
- if (kobj && kobj->subsys)
- ops = kobj->subsys->sysfs_ops;
- if (!ops || !ops->store)
- return -EINVAL;

page = (char *)__get_free_page(GFP_KERNEL);
if (!page)
@@ -275,21 +256,72 @@
return retval;
}

-static int sysfs_open_file(struct inode * inode, struct file * filp)
+static int check_perm(struct inode * inode, struct file * file)
{
- struct kobject * kobj;
+ struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct attribute * attr = file->f_dentry->d_fsdata;
+ struct sysfs_ops * ops = NULL;
int error = 0;

- kobj = filp->f_dentry->d_parent->d_fsdata;
- if ((kobj = kobject_get(kobj))) {
- struct attribute * attr = filp->f_dentry->d_fsdata;
- if (!attr)
- error = -EINVAL;
- } else
- error = -EINVAL;
+ if (!kobj || !attr)
+ goto Einval;
+
+ if (kobj->subsys)
+ ops = kobj->subsys->sysfs_ops;
+
+ /* No sysfs operations, either from having no subsystem,
+ * or the subsystem have no operations.
+ */
+ if (!ops)
+ goto Eaccess;
+
+ /* File needs write support.
+ * The inode's perms must say it's ok,
+ * and we must have a store method.
+ */
+ if (file->f_mode & FMODE_WRITE) {
+
+ if (!(inode->i_mode & S_IWUGO))
+ goto Eperm;
+ if (!ops->store)
+ goto Eaccess;
+
+ }
+
+ /* File needs read support.
+ * The inode's perms must say it's ok, and we there
+ * must be a show method for it.
+ */
+ if (file->f_mode & FMODE_READ) {
+ if (!(inode->i_mode & S_IRUGO))
+ goto Eperm;
+ if (!ops->show)
+ goto Eaccess;
+ }
+
+ /* No error? Great, store the ops in file->private_data
+ * for easy access in the read/write functions.
+ */
+ file->private_data = ops;
+ goto Done;
+
+ Einval:
+ error = -EINVAL;
+ goto Done;
+ Eaccess:
+ error = -EACCES;
+ goto Done;
+ Eperm:
+ error = -EPERM;
+ Done:
return error;
}

+static int sysfs_open_file(struct inode * inode, struct file * filp)
+{
+ return check_perm(inode,filp);
+}
+
static int sysfs_release(struct inode * inode, struct file * filp)
{
struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata;
@@ -541,7 +573,8 @@
/* make sure dentry is really there */
if (victim->d_inode &&
(victim->d_parent->d_inode == dir->d_inode)) {
- sysfs_unlink(dir->d_inode,victim);
+ simple_unlink(dir->d_inode,victim);
+ d_delete(victim);
}
}
up(&dir->d_inode->i_sem);
@@ -599,19 +632,16 @@
list_for_each_safe(node,next,&dentry->d_subdirs) {
struct dentry * d = list_entry(node,struct dentry,d_child);
/* make sure dentry is still there */
- if (d->d_inode)
- sysfs_unlink(dentry->d_inode,d);
+ if (d->d_inode) {
+ simple_unlink(dentry->d_inode,d);
+ d_delete(dentry);
+ }
}

- d_invalidate(dentry);
- if (simple_empty(dentry)) {
- dentry->d_inode->i_nlink -= 2;
- dentry->d_inode->i_flags |= S_DEAD;
- parent->d_inode->i_nlink--;
- }
up(&dentry->d_inode->i_sem);
- dput(dentry);
-
+ d_invalidate(dentry);
+ simple_rmdir(parent->d_inode,dentry);
+ d_delete(dentry);
up(&parent->d_inode->i_sem);
dput(parent);
}
@@ -622,4 +652,3 @@
EXPORT_SYMBOL(sysfs_remove_link);
EXPORT_SYMBOL(sysfs_create_dir);
EXPORT_SYMBOL(sysfs_remove_dir);
-MODULE_LICENSE("GPL");


Attachments:
kprobes.diff (6.34 kB)

2002-11-21 22:02:38

by Rusty Lynch

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

>
> Ok, I've had a chance to play with it a bit..
>
> The kprobes patch didn't apply to current -bk. Attached is an updated
> patch. I also have some comments on your 'noisy' patch, but first..
>

Very cool. Bitkeeper is one of those things I never bothered
with yet (mainly because I feel some comfortable with CVS.)
It looks like it might be worth going through ramp-up time
on bk to keep up with changes to the kernel.

> > I can see this with a little kprobes example code that I have
> > been playing with that will create entries like:
> >
> > /sysfsroot/noisy/0xc0107ae0/sys_fork
> >
> > when someone uses that driver to insert a kernel probe
> > at 0xc0107ae0 that will printk "sys_fork".
>
> It seems that having a pure sysfs implementation would be superior,
> instead of having to use a character device to write to. After looking
> into this, I realize that a couple of pieces of infrastructure are needed,
> so I'm working on that, and will post a modified version of your module
> once I'm done..

I look forward to seeing it.

>
> > What I have noticed, is that if I create a new probe
> > (which will create the sysfs entry), open a new shell and
> > cd to /sysfsroot/noisy/0xc0107ae0, and then use my
> > driver to remove the probe (which will remove the
> > sysfs entry), then /sysfsroot/noisy/0xc0107ae0 doesn't
> > go away after I cd out of the shell.
> >
> > >From then on any attempts to create new sysfs entries
> > do not show up in /sysfsroot/ until I unload/load my driver
> > again.
> >
> > It seems like this could be an issue with some real code
> > (not just this stupid play code of mine), like maybe hotswap
> > code that updates sysfs entries.
>
> Yes, it was a real bug. I've mucked with the method to do file and
> directory deletion, and it turns out that the way I was doing it was
> wrong. I've gone back to mimmicking vfs_unlink() and vfs_rmdir(), which I
> shouldn't have diverged from in the first place. (doing d_delete() after
> low-level unlinking, instead of d_invalidate() before it).
>
> Appended to this email is my current working patch to sysfs, including
> fixes discussed and posted yesterday. I've pushed these to Linus, though
> he appears to be away. I'll push again, with this fix in the next day or
> so.
>
> Please try this patch and let me know if you still experience the problem
> you're seeing.
>
It looks like the patch is against the bk tree, and does not apply cleanly
to
strait 2.5.48. I don't know how much has changed to sysfs/inode.c, but
I can see where the last hunk is looking too far up, so I'll try it anyway.

In the meantime I'll be grabbing bk so I can see what everyone is looking
at.

> Concerning your patch:

Thanks for taking the time to look at my module. The reason I created
it was to get familiar with the 2.5 kernel, so this feedback is very
helpful.

I'll take a stab at the changes.

>
> > +config NOISY
>
> 'Noisy' seems like such a generic name, and the description doesn't seem
> to imply its dependence on kprobes. Maybe KPROBES_NOISY? And, you should
> put it under KPROBES, under DEBUG_KERNEL.
>

yea, I don't put much thought into a name, or where would show up
in the config. I'll change it.

> > diff -urN linux-2.5.48-kprobes/drivers/char/noisy.c
> > linux-2.5.48-kprobes-patched/drivers/char/noisy.c
>
> > +static struct list_head probe_list;
> > +struct nprobe {
> > + struct list_head list;
> > + struct kprobe probe;
> > + char message[MAX_MSG_SIZE + 1];
> > + struct attribute attr;
> > + struct kobject kobj;
> > +};
>
> struct subsystem has a list, and kobject and entry, so you don't have to
> do it yourself..
>
> > +static ssize_t noisy_write(struct file *file, const char *buf, size_t
> > count,
> > + loff_t *ppos)
> > +{
> > + struct nprobe *n = 0;
> > + size_t ret = -ENOMEM;
> > + char *tmp = 0;
> > +
> > + if (count > MAX_MSG_SIZE) {
> > + printk(KERN_CRIT
> > + "noisy: Input buffer (%i bytes) is too big!\n",
> > + count);
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + tmp = kmalloc(count + 1, GFP_KERNEL);
> > + if (!tmp) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
>
> This should be memset to 0.
>
> > + n = kmalloc(sizeof(struct nprobe), GFP_KERNEL);
> > + if (!n) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + memset(n, '\0', sizeof(struct nprobe));
> > +
> > + if (copy_from_user((void *)tmp, (void *)buf, count)) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + tmp[count] = '\0';
> > +
> > + if (2 != sscanf(tmp, "0x%x %s", &(n->probe).addr, n->message)) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
>
> You don't free n if you get an error from copy_from_user() or sscanf().
>
> > + (n->attr).name = n->message;
> > + (n->attr).mode = S_IRUGO;
>
> Instead of doing this, you should just declare a static attribute called
> 'message' and have the contents be the message. This will save you from
> initializing it each time, and save you 4 bytes in struct nprobe.
> Something like this should work:
>
>
> struct noisy_attribute {
> struct attribute attr;
> ssize_t (*read)(struct nprobe *,char *,size_t,loff_t);
> };
>
> static ssize_t noisy_attr_show(struct kobject * kobj, struct attribute *
> attr,
> char * page, size_t count, loff_t off)
> {
> struct nprobe * n = container_of(kobj,struct nprobe,kobj);
> struct noisy_attribute * noisy_attr = container_of(attr,struct
> noisy_attribute,attr);
> ssize_t ret = 0;
> return noisy_attr->show ? noisy_attr->show(n,page,count,off);
> }
>
> static struct sysfs_ops noisy_sysfs_ops = {
> .show = noisy_attr_show,
> };
>
>
> /* Default Attribute */
> static ssize_t noisy_message_read(struct nprobe * n, char * page, size_t
> count, loff_t off)
> {
> return off ? snprintf(page,MAX_MSG_SIZE,"%s",n->message);
> }
>
> static struct noisy_attribute attr_message = {
> .attr = { .name = "message", .mode = S_IRUGO },
> };
>
> static struct attribute * default_attrs[] = {
> &attr_message.attr,
> NULL,
> };
>
> /*
> * sysfs stuff
> */
>
> struct subsystem noisy_subsys = {
> .kobj = { .name = "noisy" },
> .default_attrs = default_attrs,
> .sysfs_ops = noisy_sysfs_ops,
> };
>
>
> Note that this will also save you from manually having to create and
> teardown the file when the kobject is registered and unregistered.
>
>
> -pat
>
> --- linux-2.5-virgin/fs/sysfs/inode.c Wed Nov 20 12:13:06 2002
> +++ linux-2.5-core/fs/sysfs/inode.c Thu Nov 21 13:51:32 2002
> @@ -23,6 +23,8 @@
> * Please see Documentation/filesystems/sysfs.txt for more information.
> */
>
> +#undef DEBUG
> +
> #include <linux/list.h>
> #include <linux/init.h>
> #include <linux/pagemap.h>
> @@ -94,9 +96,10 @@
>
> if (!dentry->d_inode) {
> inode = sysfs_get_inode(dir->i_sb, mode, dev);
> - if (inode)
> + if (inode) {
> d_instantiate(dentry, inode);
> - else
> + dget(dentry);
> + } else
> error = -ENOSPC;
> } else
> error = -EEXIST;
> @@ -142,17 +145,6 @@
> return error;
> }
>
> -static int sysfs_unlink(struct inode *dir, struct dentry *dentry)
> -{
> - struct inode *inode = dentry->d_inode;
> - down(&inode->i_sem);
> - dentry->d_inode->i_nlink--;
> - up(&inode->i_sem);
> - d_invalidate(dentry);
> - dput(dentry);
> - return 0;
> -}
> -
> /**
> * sysfs_read_file - read an attribute.
> * @file: file pointer.
> @@ -173,17 +165,11 @@
> sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos)
> {
> struct attribute * attr = file->f_dentry->d_fsdata;
> - struct sysfs_ops * ops = NULL;
> - struct kobject * kobj;
> + struct sysfs_ops * ops = file->private_data;
> + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
> unsigned char *page;
> ssize_t retval = 0;
>
> - kobj = file->f_dentry->d_parent->d_fsdata;
> - if (kobj && kobj->subsys)
> - ops = kobj->subsys->sysfs_ops;
> - if (!ops || !ops->show)
> - return 0;
> -
> if (count > PAGE_SIZE)
> count = PAGE_SIZE;
>
> @@ -234,16 +220,11 @@
> sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t
*ppos)
> {
> struct attribute * attr = file->f_dentry->d_fsdata;
> - struct sysfs_ops * ops = NULL;
> - struct kobject * kobj;
> + struct sysfs_ops * ops = file->private_data;
> + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
> ssize_t retval = 0;
> char * page;
>
> - kobj = file->f_dentry->d_parent->d_fsdata;
> - if (kobj && kobj->subsys)
> - ops = kobj->subsys->sysfs_ops;
> - if (!ops || !ops->store)
> - return -EINVAL;
>
> page = (char *)__get_free_page(GFP_KERNEL);
> if (!page)
> @@ -275,21 +256,72 @@
> return retval;
> }
>
> -static int sysfs_open_file(struct inode * inode, struct file * filp)
> +static int check_perm(struct inode * inode, struct file * file)
> {
> - struct kobject * kobj;
> + struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
> + struct attribute * attr = file->f_dentry->d_fsdata;
> + struct sysfs_ops * ops = NULL;
> int error = 0;
>
> - kobj = filp->f_dentry->d_parent->d_fsdata;
> - if ((kobj = kobject_get(kobj))) {
> - struct attribute * attr = filp->f_dentry->d_fsdata;
> - if (!attr)
> - error = -EINVAL;
> - } else
> - error = -EINVAL;
> + if (!kobj || !attr)
> + goto Einval;
> +
> + if (kobj->subsys)
> + ops = kobj->subsys->sysfs_ops;
> +
> + /* No sysfs operations, either from having no subsystem,
> + * or the subsystem have no operations.
> + */
> + if (!ops)
> + goto Eaccess;
> +
> + /* File needs write support.
> + * The inode's perms must say it's ok,
> + * and we must have a store method.
> + */
> + if (file->f_mode & FMODE_WRITE) {
> +
> + if (!(inode->i_mode & S_IWUGO))
> + goto Eperm;
> + if (!ops->store)
> + goto Eaccess;
> +
> + }
> +
> + /* File needs read support.
> + * The inode's perms must say it's ok, and we there
> + * must be a show method for it.
> + */
> + if (file->f_mode & FMODE_READ) {
> + if (!(inode->i_mode & S_IRUGO))
> + goto Eperm;
> + if (!ops->show)
> + goto Eaccess;
> + }
> +
> + /* No error? Great, store the ops in file->private_data
> + * for easy access in the read/write functions.
> + */
> + file->private_data = ops;
> + goto Done;
> +
> + Einval:
> + error = -EINVAL;
> + goto Done;
> + Eaccess:
> + error = -EACCES;
> + goto Done;
> + Eperm:
> + error = -EPERM;
> + Done:
> return error;
> }
>
> +static int sysfs_open_file(struct inode * inode, struct file * filp)
> +{
> + return check_perm(inode,filp);
> +}
> +
> static int sysfs_release(struct inode * inode, struct file * filp)
> {
> struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata;
> @@ -541,7 +573,8 @@
> /* make sure dentry is really there */
> if (victim->d_inode &&
> (victim->d_parent->d_inode == dir->d_inode)) {
> - sysfs_unlink(dir->d_inode,victim);
> + simple_unlink(dir->d_inode,victim);
> + d_delete(victim);
> }
> }
> up(&dir->d_inode->i_sem);
> @@ -599,19 +632,16 @@
> list_for_each_safe(node,next,&dentry->d_subdirs) {
> struct dentry * d = list_entry(node,struct dentry,d_child);
> /* make sure dentry is still there */
> - if (d->d_inode)
> - sysfs_unlink(dentry->d_inode,d);
> + if (d->d_inode) {
> + simple_unlink(dentry->d_inode,d);
> + d_delete(dentry);
> + }
> }
>
> - d_invalidate(dentry);
> - if (simple_empty(dentry)) {
> - dentry->d_inode->i_nlink -= 2;
> - dentry->d_inode->i_flags |= S_DEAD;
> - parent->d_inode->i_nlink--;
> - }
> up(&dentry->d_inode->i_sem);
> - dput(dentry);
> -
> + d_invalidate(dentry);
> + simple_rmdir(parent->d_inode,dentry);
> + d_delete(dentry);
> up(&parent->d_inode->i_sem);
> dput(parent);
> }
> @@ -622,4 +652,3 @@
> EXPORT_SYMBOL(sysfs_remove_link);
> EXPORT_SYMBOL(sysfs_create_dir);
> EXPORT_SYMBOL(sysfs_remove_dir);
> -MODULE_LICENSE("GPL");
>

2002-11-21 23:00:50

by Patrick Mochel

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use


> Very cool. Bitkeeper is one of those things I never bothered
> with yet (mainly because I feel some comfortable with CVS.)
> It looks like it might be worth going through ramp-up time
> on bk to keep up with changes to the kernel.

There are also nightly snapshots, though I don't recall the URL off the
top of my head..

> > It seems that having a pure sysfs implementation would be superior,
> > instead of having to use a character device to write to. After looking
> > into this, I realize that a couple of pieces of infrastructure are needed,
> > so I'm working on that, and will post a modified version of your module
> > once I'm done..
>
> I look forward to seeing it.

Ok, there are basically two parts to it: the required modifications to
sysfs and your updated module.

The appended patch adds the support to sysfs be defining struct
subsys_attribute for declaring attributes of subsystems themselves, as
well as the needed helpers for creation/teardown and read/write.

I've attached a replacement for noisy.c that creates a sysfs file named
'ctl' that handles addition and deletion of nprobes, similar to the char
device you had created. From the top of the file:

/*
* Noisy Interface for Linux
*
* This driver allows arbitrary printk's to be inserted into
* executing kernel code by using the new kprobes interface.
* A message is attached to an address, and when that address
* is reached, the message is printed.
*
* This uses a sysfs control file to manage a list of probes.
* The sysfs directory is at
*
* /sys/noisy/
*
* and the control is named 'ctl'.
*
* A Noisy Probe can be added by echoing into the file, like:
*
* $ echo "add <address> <message>" > /sys/noisy/ctl
*
* where <address> is the address to break on, and <message>
* is the message to print when the address is reached.
*
* Probes can be removed by doing:
*
* $ echo "del <address>" > /sys/noisy/ctl
*
* where <address> is the address of the probe.
*
* The probes themselves get a directory under /sys/noisy/, and
* the name of the directory is the address of the probe. Each
* probe directory contains one file ('message') that contains
* the message to be printed. (More may be added later).
*/

I've tried to comment the changes and the necessary steps in making this
work.

[ Note: While I'm generally happy with the way things work, I realize that
it still requires a decent amount of overhead in using the sysfs interface
(see the file). I'll be looking into shrinking this...]

Everything seems to work..

> It looks like the patch is against the bk tree, and does not apply cleanly
> to
> strait 2.5.48. I don't know how much has changed to sysfs/inode.c, but
> I can see where the last hunk is looking too far up, so I'll try it anyway.

Ah yes, I forgot there was a patch applied to sysfs since 2.5.48. I've
included everything since 2.5.48 in the one I've appended.


-pat

===== fs/sysfs/inode.c 1.60 vs 1.67 =====
--- 1.60/fs/sysfs/inode.c Sat Nov 16 15:01:34 2002
+++ 1.67/fs/sysfs/inode.c Thu Nov 21 17:01:53 2002
@@ -23,6 +23,8 @@
* Please see Documentation/filesystems/sysfs.txt for more information.
*/

+#undef DEBUG
+
#include <linux/list.h>
#include <linux/init.h>
#include <linux/pagemap.h>
@@ -87,16 +89,17 @@
return inode;
}

-static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int mode, int dev)
+static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
{
struct inode *inode;
int error = 0;

if (!dentry->d_inode) {
inode = sysfs_get_inode(dir->i_sb, mode, dev);
- if (inode)
+ if (inode) {
d_instantiate(dentry, inode);
- else
+ dget(dentry);
+ } else
error = -ENOSPC;
} else
error = -EEXIST;
@@ -142,17 +145,43 @@
return error;
}

-static int sysfs_unlink(struct inode *dir, struct dentry *dentry)
+#define to_subsys(k) container_of(k,struct subsystem,kobj)
+#define to_sattr(a) container_of(a,struct subsys_attribute,attr)
+
+/**
+ * Subsystem file operations.
+ * These operations allow subsystems to have files that can be
+ * read/written.
+ */
+ssize_t subsys_attr_show(struct kobject * kobj, struct attribute * attr,
+ char * page, size_t count, loff_t off)
{
- struct inode *inode = dentry->d_inode;
- down(&inode->i_sem);
- dentry->d_inode->i_nlink--;
- up(&inode->i_sem);
- d_invalidate(dentry);
- dput(dentry);
- return 0;
+ struct subsystem * s = to_subsys(kobj);
+ struct subsys_attribute * sattr = to_sattr(attr);
+ ssize_t ret = 0;
+
+ if (sattr->show)
+ ret = sattr->show(s,page,count,off);
+ return ret;
}

+ssize_t subsys_attr_store(struct kobject * kobj, struct attribute * attr,
+ const char * page, size_t count, loff_t off)
+{
+ struct subsystem * s = to_subsys(kobj);
+ struct subsys_attribute * sattr = to_sattr(attr);
+ ssize_t ret = 0;
+
+ if (sattr->store)
+ ret = sattr->store(s,page,count,off);
+ return ret;
+}
+
+static struct sysfs_ops subsys_sysfs_ops = {
+ .show = subsys_attr_show,
+ .store = subsys_attr_store,
+};
+
/**
* sysfs_read_file - read an attribute.
* @file: file pointer.
@@ -173,17 +202,11 @@
sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos)
{
struct attribute * attr = file->f_dentry->d_fsdata;
- struct sysfs_ops * ops = NULL;
- struct kobject * kobj;
+ struct sysfs_ops * ops = file->private_data;
+ struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
unsigned char *page;
ssize_t retval = 0;

- kobj = file->f_dentry->d_parent->d_fsdata;
- if (kobj && kobj->subsys)
- ops = kobj->subsys->sysfs_ops;
- if (!ops || !ops->show)
- return 0;
-
if (count > PAGE_SIZE)
count = PAGE_SIZE;

@@ -234,16 +257,11 @@
sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t *ppos)
{
struct attribute * attr = file->f_dentry->d_fsdata;
- struct sysfs_ops * ops = NULL;
- struct kobject * kobj;
+ struct sysfs_ops * ops = file->private_data;
+ struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
ssize_t retval = 0;
char * page;

- kobj = file->f_dentry->d_parent->d_fsdata;
- if (kobj && kobj->subsys)
- ops = kobj->subsys->sysfs_ops;
- if (!ops || !ops->store)
- return 0;

page = (char *)__get_free_page(GFP_KERNEL);
if (!page)
@@ -275,21 +293,77 @@
return retval;
}

-static int sysfs_open_file(struct inode * inode, struct file * filp)
+static int check_perm(struct inode * inode, struct file * file)
{
- struct kobject * kobj;
+ struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
+ struct attribute * attr = file->f_dentry->d_fsdata;
+ struct sysfs_ops * ops = NULL;
int error = 0;

- kobj = filp->f_dentry->d_parent->d_fsdata;
- if ((kobj = kobject_get(kobj))) {
- struct attribute * attr = filp->f_dentry->d_fsdata;
- if (!attr)
- error = -EINVAL;
- } else
- error = -EINVAL;
+ if (!kobj || !attr)
+ goto Einval;
+
+ /* if the kobject has no subsystem, then it is a subsystem itself,
+ * so give it the subsys_sysfs_ops.
+ */
+ if (kobj->subsys)
+ ops = kobj->subsys->sysfs_ops;
+ else
+ ops = &subsys_sysfs_ops;
+
+ /* No sysfs operations, either from having no subsystem,
+ * or the subsystem have no operations.
+ */
+ if (!ops)
+ goto Eaccess;
+
+ /* File needs write support.
+ * The inode's perms must say it's ok,
+ * and we must have a store method.
+ */
+ if (file->f_mode & FMODE_WRITE) {
+
+ if (!(inode->i_mode & S_IWUGO))
+ goto Eperm;
+ if (!ops->store)
+ goto Eaccess;
+
+ }
+
+ /* File needs read support.
+ * The inode's perms must say it's ok, and we there
+ * must be a show method for it.
+ */
+ if (file->f_mode & FMODE_READ) {
+ if (!(inode->i_mode & S_IRUGO))
+ goto Eperm;
+ if (!ops->show)
+ goto Eaccess;
+ }
+
+ /* No error? Great, store the ops in file->private_data
+ * for easy access in the read/write functions.
+ */
+ file->private_data = ops;
+ goto Done;
+
+ Einval:
+ error = -EINVAL;
+ goto Done;
+ Eaccess:
+ error = -EACCES;
+ goto Done;
+ Eperm:
+ error = -EPERM;
+ Done:
return error;
}

+static int sysfs_open_file(struct inode * inode, struct file * filp)
+{
+ return check_perm(inode,filp);
+}
+
static int sysfs_release(struct inode * inode, struct file * filp)
{
struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata;
@@ -541,7 +615,8 @@
/* make sure dentry is really there */
if (victim->d_inode &&
(victim->d_parent->d_inode == dir->d_inode)) {
- sysfs_unlink(dir->d_inode,victim);
+ simple_unlink(dir->d_inode,victim);
+ d_delete(victim);
}
}
up(&dir->d_inode->i_sem);
@@ -599,19 +674,16 @@
list_for_each_safe(node,next,&dentry->d_subdirs) {
struct dentry * d = list_entry(node,struct dentry,d_child);
/* make sure dentry is still there */
- if (d->d_inode)
- sysfs_unlink(dentry->d_inode,d);
+ if (d->d_inode) {
+ simple_unlink(dentry->d_inode,d);
+ d_delete(dentry);
+ }
}

- d_invalidate(dentry);
- if (simple_empty(dentry)) {
- dentry->d_inode->i_nlink -= 2;
- dentry->d_inode->i_flags |= S_DEAD;
- parent->d_inode->i_nlink--;
- }
up(&dentry->d_inode->i_sem);
- dput(dentry);
-
+ d_invalidate(dentry);
+ simple_rmdir(parent->d_inode,dentry);
+ d_delete(dentry);
up(&parent->d_inode->i_sem);
dput(parent);
}
@@ -622,4 +694,3 @@
EXPORT_SYMBOL(sysfs_remove_link);
EXPORT_SYMBOL(sysfs_create_dir);
EXPORT_SYMBOL(sysfs_remove_dir);
-MODULE_LICENSE("GPL");


Attachments:
noisy.c (7.23 kB)

2002-11-23 00:25:59

by Rusty Lynch

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

Patrick,

Your changes added various subsys_* calls that do not seem to
exist in the latest bk tree, or in the patch you sent earlier in this
thread.

Do you have a local implementation of:

subsys_create_file()
subsys_remove_file()

and defines struct subsys_attribute?

-rustyl

----- Original Message -----
From: "Patrick Mochel" <[email protected]>
To: "Rusty Lynch" <[email protected]>
Cc: "Linux Kernel Mailing List" <[email protected]>
Sent: Thursday, November 21, 2002 3:03 PM
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use


>
> > Very cool. Bitkeeper is one of those things I never bothered
> > with yet (mainly because I feel some comfortable with CVS.)
> > It looks like it might be worth going through ramp-up time
> > on bk to keep up with changes to the kernel.
>
> There are also nightly snapshots, though I don't recall the URL off the
> top of my head..
>
> > > It seems that having a pure sysfs implementation would be superior,
> > > instead of having to use a character device to write to. After looking
> > > into this, I realize that a couple of pieces of infrastructure are
needed,
> > > so I'm working on that, and will post a modified version of your
module
> > > once I'm done..
> >
> > I look forward to seeing it.
>
> Ok, there are basically two parts to it: the required modifications to
> sysfs and your updated module.
>
> The appended patch adds the support to sysfs be defining struct
> subsys_attribute for declaring attributes of subsystems themselves, as
> well as the needed helpers for creation/teardown and read/write.
>
> I've attached a replacement for noisy.c that creates a sysfs file named
> 'ctl' that handles addition and deletion of nprobes, similar to the char
> device you had created. From the top of the file:
>
> /*
> * Noisy Interface for Linux
> *
> * This driver allows arbitrary printk's to be inserted into
> * executing kernel code by using the new kprobes interface.
> * A message is attached to an address, and when that address
> * is reached, the message is printed.
> *
> * This uses a sysfs control file to manage a list of probes.
> * The sysfs directory is at
> *
> * /sys/noisy/
> *
> * and the control is named 'ctl'.
> *
> * A Noisy Probe can be added by echoing into the file, like:
> *
> * $ echo "add <address> <message>" > /sys/noisy/ctl
> *
> * where <address> is the address to break on, and <message>
> * is the message to print when the address is reached.
> *
> * Probes can be removed by doing:
> *
> * $ echo "del <address>" > /sys/noisy/ctl
> *
> * where <address> is the address of the probe.
> *
> * The probes themselves get a directory under /sys/noisy/, and
> * the name of the directory is the address of the probe. Each
> * probe directory contains one file ('message') that contains
> * the message to be printed. (More may be added later).
> */
>
> I've tried to comment the changes and the necessary steps in making this
> work.
>
> [ Note: While I'm generally happy with the way things work, I realize that
> it still requires a decent amount of overhead in using the sysfs interface
> (see the file). I'll be looking into shrinking this...]
>
> Everything seems to work..
>
> > It looks like the patch is against the bk tree, and does not apply
cleanly
> > to
> > strait 2.5.48. I don't know how much has changed to sysfs/inode.c, but
> > I can see where the last hunk is looking too far up, so I'll try it
anyway.
>
> Ah yes, I forgot there was a patch applied to sysfs since 2.5.48. I've
> included everything since 2.5.48 in the one I've appended.
>
>
> -pat
>
> ===== fs/sysfs/inode.c 1.60 vs 1.67 =====
> --- 1.60/fs/sysfs/inode.c Sat Nov 16 15:01:34 2002
> +++ 1.67/fs/sysfs/inode.c Thu Nov 21 17:01:53 2002
> @@ -23,6 +23,8 @@
> * Please see Documentation/filesystems/sysfs.txt for more information.
> */
>
> +#undef DEBUG
> +
> #include <linux/list.h>
> #include <linux/init.h>
> #include <linux/pagemap.h>
> @@ -87,16 +89,17 @@
> return inode;
> }
>
> -static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int
mode, int dev)
> +static int sysfs_mknod(struct inode *dir, struct dentry *dentry, int
mode, dev_t dev)
> {
> struct inode *inode;
> int error = 0;
>
> if (!dentry->d_inode) {
> inode = sysfs_get_inode(dir->i_sb, mode, dev);
> - if (inode)
> + if (inode) {
> d_instantiate(dentry, inode);
> - else
> + dget(dentry);
> + } else
> error = -ENOSPC;
> } else
> error = -EEXIST;
> @@ -142,17 +145,43 @@
> return error;
> }
>
> -static int sysfs_unlink(struct inode *dir, struct dentry *dentry)
> +#define to_subsys(k) container_of(k,struct subsystem,kobj)
> +#define to_sattr(a) container_of(a,struct subsys_attribute,attr)
> +
> +/**
> + * Subsystem file operations.
> + * These operations allow subsystems to have files that can be
> + * read/written.
> + */
> +ssize_t subsys_attr_show(struct kobject * kobj, struct attribute * attr,
> + char * page, size_t count, loff_t off)
> {
> - struct inode *inode = dentry->d_inode;
> - down(&inode->i_sem);
> - dentry->d_inode->i_nlink--;
> - up(&inode->i_sem);
> - d_invalidate(dentry);
> - dput(dentry);
> - return 0;
> + struct subsystem * s = to_subsys(kobj);
> + struct subsys_attribute * sattr = to_sattr(attr);
> + ssize_t ret = 0;
> +
> + if (sattr->show)
> + ret = sattr->show(s,page,count,off);
> + return ret;
> }
>
> +ssize_t subsys_attr_store(struct kobject * kobj, struct attribute * attr,
> + const char * page, size_t count, loff_t off)
> +{
> + struct subsystem * s = to_subsys(kobj);
> + struct subsys_attribute * sattr = to_sattr(attr);
> + ssize_t ret = 0;
> +
> + if (sattr->store)
> + ret = sattr->store(s,page,count,off);
> + return ret;
> +}
> +
> +static struct sysfs_ops subsys_sysfs_ops = {
> + .show = subsys_attr_show,
> + .store = subsys_attr_store,
> +};
> +
> /**
> * sysfs_read_file - read an attribute.
> * @file: file pointer.
> @@ -173,17 +202,11 @@
> sysfs_read_file(struct file *file, char *buf, size_t count, loff_t *ppos)
> {
> struct attribute * attr = file->f_dentry->d_fsdata;
> - struct sysfs_ops * ops = NULL;
> - struct kobject * kobj;
> + struct sysfs_ops * ops = file->private_data;
> + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
> unsigned char *page;
> ssize_t retval = 0;
>
> - kobj = file->f_dentry->d_parent->d_fsdata;
> - if (kobj && kobj->subsys)
> - ops = kobj->subsys->sysfs_ops;
> - if (!ops || !ops->show)
> - return 0;
> -
> if (count > PAGE_SIZE)
> count = PAGE_SIZE;
>
> @@ -234,16 +257,11 @@
> sysfs_write_file(struct file *file, const char *buf, size_t count, loff_t
*ppos)
> {
> struct attribute * attr = file->f_dentry->d_fsdata;
> - struct sysfs_ops * ops = NULL;
> - struct kobject * kobj;
> + struct sysfs_ops * ops = file->private_data;
> + struct kobject * kobj = file->f_dentry->d_parent->d_fsdata;
> ssize_t retval = 0;
> char * page;
>
> - kobj = file->f_dentry->d_parent->d_fsdata;
> - if (kobj && kobj->subsys)
> - ops = kobj->subsys->sysfs_ops;
> - if (!ops || !ops->store)
> - return 0;
>
> page = (char *)__get_free_page(GFP_KERNEL);
> if (!page)
> @@ -275,21 +293,77 @@
> return retval;
> }
>
> -static int sysfs_open_file(struct inode * inode, struct file * filp)
> +static int check_perm(struct inode * inode, struct file * file)
> {
> - struct kobject * kobj;
> + struct kobject * kobj = kobject_get(file->f_dentry->d_parent->d_fsdata);
> + struct attribute * attr = file->f_dentry->d_fsdata;
> + struct sysfs_ops * ops = NULL;
> int error = 0;
>
> - kobj = filp->f_dentry->d_parent->d_fsdata;
> - if ((kobj = kobject_get(kobj))) {
> - struct attribute * attr = filp->f_dentry->d_fsdata;
> - if (!attr)
> - error = -EINVAL;
> - } else
> - error = -EINVAL;
> + if (!kobj || !attr)
> + goto Einval;
> +
> + /* if the kobject has no subsystem, then it is a subsystem itself,
> + * so give it the subsys_sysfs_ops.
> + */
> + if (kobj->subsys)
> + ops = kobj->subsys->sysfs_ops;
> + else
> + ops = &subsys_sysfs_ops;
> +
> + /* No sysfs operations, either from having no subsystem,
> + * or the subsystem have no operations.
> + */
> + if (!ops)
> + goto Eaccess;
> +
> + /* File needs write support.
> + * The inode's perms must say it's ok,
> + * and we must have a store method.
> + */
> + if (file->f_mode & FMODE_WRITE) {
> +
> + if (!(inode->i_mode & S_IWUGO))
> + goto Eperm;
> + if (!ops->store)
> + goto Eaccess;
> +
> + }
> +
> + /* File needs read support.
> + * The inode's perms must say it's ok, and we there
> + * must be a show method for it.
> + */
> + if (file->f_mode & FMODE_READ) {
> + if (!(inode->i_mode & S_IRUGO))
> + goto Eperm;
> + if (!ops->show)
> + goto Eaccess;
> + }
> +
> + /* No error? Great, store the ops in file->private_data
> + * for easy access in the read/write functions.
> + */
> + file->private_data = ops;
> + goto Done;
> +
> + Einval:
> + error = -EINVAL;
> + goto Done;
> + Eaccess:
> + error = -EACCES;
> + goto Done;
> + Eperm:
> + error = -EPERM;
> + Done:
> return error;
> }
>
> +static int sysfs_open_file(struct inode * inode, struct file * filp)
> +{
> + return check_perm(inode,filp);
> +}
> +
> static int sysfs_release(struct inode * inode, struct file * filp)
> {
> struct kobject * kobj = filp->f_dentry->d_parent->d_fsdata;
> @@ -541,7 +615,8 @@
> /* make sure dentry is really there */
> if (victim->d_inode &&
> (victim->d_parent->d_inode == dir->d_inode)) {
> - sysfs_unlink(dir->d_inode,victim);
> + simple_unlink(dir->d_inode,victim);
> + d_delete(victim);
> }
> }
> up(&dir->d_inode->i_sem);
> @@ -599,19 +674,16 @@
> list_for_each_safe(node,next,&dentry->d_subdirs) {
> struct dentry * d = list_entry(node,struct dentry,d_child);
> /* make sure dentry is still there */
> - if (d->d_inode)
> - sysfs_unlink(dentry->d_inode,d);
> + if (d->d_inode) {
> + simple_unlink(dentry->d_inode,d);
> + d_delete(dentry);
> + }
> }
>
> - d_invalidate(dentry);
> - if (simple_empty(dentry)) {
> - dentry->d_inode->i_nlink -= 2;
> - dentry->d_inode->i_flags |= S_DEAD;
> - parent->d_inode->i_nlink--;
> - }
> up(&dentry->d_inode->i_sem);
> - dput(dentry);
> -
> + d_invalidate(dentry);
> + simple_rmdir(parent->d_inode,dentry);
> + d_delete(dentry);
> up(&parent->d_inode->i_sem);
> dput(parent);
> }
> @@ -622,4 +694,3 @@
> EXPORT_SYMBOL(sysfs_remove_link);
> EXPORT_SYMBOL(sysfs_create_dir);
> EXPORT_SYMBOL(sysfs_remove_dir);
> -MODULE_LICENSE("GPL");
>

2002-11-24 12:57:39

by Werner Almesberger

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

Patrick Mochel wrote:
> * This uses a sysfs control file to manage a list of probes.
> * The sysfs directory is at
> *
> * /sys/noisy/

I really like the idea of controlling kprobes through sysfs.
However, ...

> * A Noisy Probe can be added by echoing into the file, like:
> *
> * $ echo "add <address> <message>" > /sys/noisy/ctl

do you really need a "magic" file for this ? I don't know how
well sysfs supports mkdir/rmdir (if at all), but they would
seem to provide a much more natural interface. (VFS allows
rmdir to remove non-empty directories, so you wouldn't have
to rm -r.)

I don't think you need probe installation and message setting
to be atomic, so you could just assign a unique default
message, e.g. the probe address.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2002-11-24 13:31:14

by Alexander Viro

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use



On Sun, 24 Nov 2002, Werner Almesberger wrote:

> do you really need a "magic" file for this ? I don't know how
> well sysfs supports mkdir/rmdir (if at all), but they would
> seem to provide a much more natural interface. (VFS allows
> rmdir to remove non-empty directories, so you wouldn't have
> to rm -r.)

a) sysfs doesn't allow mkdir/rmdir and thus avoids an imperial buttload
of races - witness the crap in devfs.

b) rmdir of non-empty directory pretty much guarantees another buttload of
races.

c) mkdir creating non-empty directory or rmdir removing non-empty directory
is *ugly*. BTW, Roman's "filesystem" for modules in its current form is
vetoed, as far as I'm concerned - this sort of magic is just plain wrong.

2002-11-24 14:25:53

by Werner Almesberger

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

Alexander Viro wrote:
> a) sysfs doesn't allow mkdir/rmdir and thus avoids an imperial buttload
> of races - witness the crap in devfs.

But isn't one of the problems there that kernel and user space can
both initiate changes ? What I'm proposing is to let this be driven
by user space. You'd of course still have different policies in
different parts of the sysfs hierarchy, but would that really be a
problem ?

> b) rmdir of non-empty directory pretty much guarantees another buttload of
> races.

Hmm, if the sysfs user could veto file creation while the removal is
in progress, behaviour like rm -r would certainly be sufficient.
Even if it can't veto it, this may be good enough, since user space
is in charge of file creation (after mkdir) anyway.

The main reason why I think rmdir with rm -r functionality would be
useful in this case, is that, in order to implement a "remove probe"
functionality, the application wouldn't have to know what exactly
lives in that directory, and it also wouldn't have to implement a
real rm -r (or rm xxx/* && rmdir xxx), which I'd also consider a
more powerful operation than a simple rmdir.

I can see a potential issue with a proliferation of callbacks,
though.

> c) mkdir creating non-empty directory or rmdir removing non-empty directory
> is *ugly*.

Uglier than a "magic" file that then goes and creates/removes
directories and files in them ? Why don't we echo mkdir foo >.
then ? ;-)

Besides, there's some precedent: . and .. are also handled
implicitly. (And do we really have no file systems that store
some meta-data in "magic" files that are created/removed
implicitly ?)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2002-11-24 14:45:03

by Roman Zippel

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

Hi,

On Sun, 24 Nov 2002, Alexander Viro wrote:

> c) mkdir creating non-empty directory or rmdir removing non-empty directory
> is *ugly*. BTW, Roman's "filesystem" for modules in its current form is
> vetoed, as far as I'm concerned - this sort of magic is just plain wrong.

I'm open to alternative ideas.
If the contents of the dir again is controlled by the fs itself, I don't
really see a problem. You need some way to tell the kernel "please create
this object and give me an interface to control it".

bye, Roman

2002-11-25 17:27:53

by Patrick Mochel

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use


On Fri, 22 Nov 2002, Rusty Lynch wrote:

> Patrick,
>
> Your changes added various subsys_* calls that do not seem to
> exist in the latest bk tree, or in the patch you sent earlier in this
> thread.
>
> Do you have a local implementation of:
>
> subsys_create_file()
> subsys_remove_file()
>
> and defines struct subsys_attribute?

Woops, I forgot to include that part of the patch. Attached is what I have
against current BK, which should also be 2.5.49. This should be everything
you need to make it work. Sorry about that.

-pat

diff -Nru a/fs/sysfs/inode.c b/fs/sysfs/inode.c
--- a/fs/sysfs/inode.c Mon Nov 25 11:28:31 2002
+++ b/fs/sysfs/inode.c Mon Nov 25 11:28:31 2002
@@ -145,6 +145,43 @@
return error;
}

+#define to_subsys(k) container_of(k,struct subsystem,kobj)
+#define to_sattr(a) container_of(a,struct subsys_attribute,attr)
+
+/**
+ * Subsystem file operations.
+ * These operations allow subsystems to have files that can be
+ * read/written.
+ */
+ssize_t subsys_attr_show(struct kobject * kobj, struct attribute * attr,
+ char * page, size_t count, loff_t off)
+{
+ struct subsystem * s = to_subsys(kobj);
+ struct subsys_attribute * sattr = to_sattr(attr);
+ ssize_t ret = 0;
+
+ if (sattr->show)
+ ret = sattr->show(s,page,count,off);
+ return ret;
+}
+
+ssize_t subsys_attr_store(struct kobject * kobj, struct attribute * attr,
+ const char * page, size_t count, loff_t off)
+{
+ struct subsystem * s = to_subsys(kobj);
+ struct subsys_attribute * sattr = to_sattr(attr);
+ ssize_t ret = 0;
+
+ if (sattr->store)
+ ret = sattr->store(s,page,count,off);
+ return ret;
+}
+
+static struct sysfs_ops subsys_sysfs_ops = {
+ .show = subsys_attr_show,
+ .store = subsys_attr_store,
+};
+
/**
* sysfs_read_file - read an attribute.
* @file: file pointer.
@@ -265,9 +302,14 @@

if (!kobj || !attr)
goto Einval;
-
+
+ /* if the kobject has no subsystem, then it is a subsystem itself,
+ * so give it the subsys_sysfs_ops.
+ */
if (kobj->subsys)
ops = kobj->subsys->sysfs_ops;
+ else
+ ops = &subsys_sysfs_ops;

/* No sysfs operations, either from having no subsystem,
* or the subsystem have no operations.
diff -Nru a/include/linux/kobject.h b/include/linux/kobject.h
--- a/include/linux/kobject.h Mon Nov 25 11:28:31 2002
+++ b/include/linux/kobject.h Mon Nov 25 11:28:31 2002
@@ -60,4 +60,13 @@
kobject_put(&s->kobj);
}

+struct subsys_attribute {
+ struct attribute attr;
+ ssize_t (*show)(struct subsystem *, char *, size_t, loff_t);
+ ssize_t (*store)(struct subsystem *, const char *, size_t, loff_t);
+};
+
+extern int subsys_create_file(struct subsystem * , struct subsys_attribute *);
+extern void subsys_remove_file(struct subsystem * , struct subsys_attribute *);
+
#endif /* _KOBJECT_H_ */
diff -Nru a/lib/kobject.c b/lib/kobject.c
--- a/lib/kobject.c Mon Nov 25 11:28:31 2002
+++ b/lib/kobject.c Mon Nov 25 11:28:31 2002
@@ -229,6 +229,38 @@
}


+/**
+ * subsystem_create_file - export sysfs attribute file.
+ * @s: subsystem.
+ * @a: subsystem attribute descriptor.
+ */
+
+int subsys_create_file(struct subsystem * s, struct subsys_attribute * a)
+{
+ int error = 0;
+ if (subsys_get(s)) {
+ error = sysfs_create_file(&s->kobj,&a->attr);
+ subsys_put(s);
+ }
+ return error;
+}
+
+
+/**
+ * subsystem_remove_file - remove sysfs attribute file.
+ * @s: subsystem.
+ * @a: attribute desciptor.
+ */
+
+void subsys_remove_file(struct subsystem * s, struct subsys_attribute * a)
+{
+ if (subsys_get(s)) {
+ sysfs_remove_file(&s->kobj,&a->attr);
+ subsys_put(s);
+ }
+}
+
+
EXPORT_SYMBOL(kobject_init);
EXPORT_SYMBOL(kobject_register);
EXPORT_SYMBOL(kobject_unregister);
@@ -238,3 +270,5 @@
EXPORT_SYMBOL(subsystem_init);
EXPORT_SYMBOL(subsystem_register);
EXPORT_SYMBOL(subsystem_unregister);
+EXPORT_SYMBOL(subsys_create_file);
+EXPORT_SYMBOL(subsys_remove_file);

2002-11-25 18:33:30

by Patrick Mochel

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use


On Sun, 24 Nov 2002, Werner Almesberger wrote:

> Alexander Viro wrote:
> > a) sysfs doesn't allow mkdir/rmdir and thus avoids an imperial buttload
> > of races - witness the crap in devfs.
>
> But isn't one of the problems there that kernel and user space can
> both initiate changes ? What I'm proposing is to let this be driven
> by user space. You'd of course still have different policies in
> different parts of the sysfs hierarchy, but would that really be a
> problem ?

Yes. The fs hasn't allowed userspace to create and remove files and
directories because it's too complex to be worth it. There are a number of
races to be dealt with, plus you have to make the filesystem deal with the
policy of interpreting the user's request properly.

> > c) mkdir creating non-empty directory or rmdir removing non-empty directory
> > is *ugly*.
>
> Uglier than a "magic" file that then goes and creates/removes
> directories and files in them ? Why don't we echo mkdir foo >.
> then ? ;-)

On the surface, that's what's happening, and it's the same way procfs has
worked for ages. It's not great, but it works well for specific purposes.

The difference is that sysfs directories are tied to kobjects. By writing
to the file with the specific syntax, you are telling the module to create
an object with the parameters you give. Once the object is registered, a
directory is created for it, and it's only removed when the object is
unregistered. We don't just randomly create directories.

The object type is context dependent, as well as the parameters and the
syntax to write to the file. That's the flexibility we can afford.

[ From a purist standpoint, it's still a little weird. Al has been telling
me for ages that the only proper way to do it is to have each object get a
mountpoint instead of a directory. According to him, which I generally
take as gospel, it's the only way to do in-kernel filesystems in a
race-free way. It's hard, and IIRC, there are several infrastructural
changes that must take place in order for it to happen. (I think I still
have the IRC log somewhere..) But, it might happen someday.. ]


-pat

2002-11-25 19:06:47

by Werner Almesberger

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

Patrick Mochel wrote:
> The difference is that sysfs directories are tied to kobjects. By writing
> to the file with the specific syntax, you are telling the module to create
> an object with the parameters you give. Once the object is registered, a
> directory is created for it, and it's only removed when the object is
> unregistered. We don't just randomly create directories.

Yes, and I think that's perfect. All I'm suggesting is to use
mkdir/rmdir to make the creation/removal request, and then use
whatever is convenient for ensuring that things stay unique (i.e.
resolve it either at the VFS, kprobes, or "glue" level).

I fully agree that creating interrelated objects at two distinct
places in the kernel and then trying to "synchronize" them leads
to madness. (*)

(*) Actually, someone in academia who works with protocol
validations, and has a bit too much time on his or her hands,
should once try to find an elegant way of linking this type of
in-kernel dependencies to a validation tool like Spin.

Right now, the process for validating such a mess is still to
abstract the design into a model in a language like Promela
(very slick, but decidedly "alien"), SyncC++ (C-ish, but still
too far from real kernel code), etc., and to validate the
model.

I've done this on some occasions (and discovered interesting
bugs in the process), but the pain threshold required is a bit
too high to suggest this as a general procedure.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2002-11-25 19:33:20

by Roman Zippel

[permalink] [raw]
Subject: Re: [BUG] sysfs on 2.5.48 unable to remove files while in use

Hi,

On Mon, 25 Nov 2002, Patrick Mochel wrote:

> [ From a purist standpoint, it's still a little weird. Al has been telling
> me for ages that the only proper way to do it is to have each object get a
> mountpoint instead of a directory. According to him, which I generally
> take as gospel, it's the only way to do in-kernel filesystems in a
> race-free way. It's hard, and IIRC, there are several infrastructural
> changes that must take place in order for it to happen. (I think I still
> have the IRC log somewhere..) But, it might happen someday.. ]

What advantages do mountpoints have compared to normal directories?

bye, Roman

2002-11-25 23:58:12

by Rusty Lynch

[permalink] [raw]
Subject: Re:[BUG] sysfs on 2.5.48 unable to remove files while in use

Patrick,

Sysfs (with the new subsystem_* function calls
that your previous patch added) will crash the
kernel if a subsystem is unregistered with content
inside it.

I stumbled across this while messing with the
noisy driver that you fixed up for proper sysfs
usage. To make sure that I wasn't seeing some
side effects from kprobes, I distilled the
essence of the sysfs operations to a new module
that I have attached to this email.

With sysfs mounted to /sys/, loading this module
will cause /sys/test/ and /sys/test/ctl to be
created.

Test entries can be created by:
echo "add 1 test1" > /sys/test/ctl

this will cause /sys/1/message to be created
where message contains "test1".

This test entry can be removed by:
echo "del 1" > /sys/test/ctl

All is fine if I create a few entries, and
then remove them all before rmmod'ing the
driver, but if I create an entry and then
attempt to rmmod the modules then my system
freezes with a partial oops message (not
enough ksymoops to do anything with).

Here is the partial oops message ==>

Unable to handle kernel paging request at virtual address 5a5a5a5a
printing eip:
c018a9c1
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0060:[<c018a9c1>] Not tainted
EFLAGS: 00010206
EIP is at sysfs_remove_dir+0x21/0x121
eax: 5a5a5a5a ebx: da269f74 ecx: 00000000 edx: 00000002
esi: 00000800 edi: d9e72654 ebp: de4fdf14 esp: de4fdef8
ds: 0068 es: 0068 ss: 0068
Process rmmod (pid: 1173, threadinfo=de4fc000 task=dce30ce0)
Stack: e08aa540 da269f74 de4fdf14 5a5a5a5a da269f74 00000800 da269f74 de4fdf30
c01e9a2a da269f74 e08aa540 da269f74 00000800 e08a8000 de4fdf40 c01e9ad4
da269f74 da269f88 de4fdf54 e08aa65b da269f74 00000003 c0371b94 de4fdfbc
Call Trace:
[<e08aa540>] <1>Unable to handle kernel paging request at virtual address e08ad printing eip:
c0133e28
*pde = 015df067
*pte = 00000000

Here is the code for the test case described above:


/*
* This is a simple module that performs basic sysfs file
* operations. By installing this module, a new directory
* called "test" is added to the sysfs root, with a control
* file in that directory called "ctl"
*
* To add a new test entry:
* echo "add somename mytest" > $MYSYSFS/test/ctl
*
* As a result a new directory named "somename" will
* be created in the test directory, with a file inside
* that directory called "message". Doing a cat on that
* file will dump "mytest".
*
* To remove the entry :
* echo "del somename" > $MYSYSFS/test/ctl
*/

#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/kobject.h>

#define MAX_MSG_SIZE 128
#define to_mycont(entry) container_of(entry,struct mycont,kobj.entry);

static DECLARE_MUTEX(test_sem);
static struct subsystem test_subsys;

struct mycont {
int id;
char message[MAX_MSG_SIZE + 1];
struct kobject kobj;
};

/*
* struct test_attribute - used for defining test attributes, with a
* typesafe show method.
*/
struct test_attribute {
struct attribute attr;
ssize_t (*show)(struct mycont *,char *,size_t,loff_t);
};


/**
* test_attr_show - forward sysfs read call to proper handler.
* @kobj: kobject of test being acessed.
* @attr: generic attribute portion of struct test_attribute.
* @page: buffer to write into.
* @count: number of bytes requested.
* @off: offset into buffer.
*
* This is called from sysfs and is necessary to convert the generic
* kobject into the right type, and to convert the attribute into the
* right attribute type.
*/

static ssize_t test_attr_show(struct kobject * kobj, struct attribute * attr,
char * page, size_t count, loff_t off)
{
struct mycont * n = container_of(kobj,struct mycont,kobj);
struct test_attribute * test_attr = container_of(attr,struct test_attribute,attr);
return test_attr->show ? test_attr->show(n,page,count,off) : 0;
}

/*
* test_sysfs_ops - sysfs operations for struct myconts.
*/
static struct sysfs_ops test_sysfs_ops = {
.show = test_attr_show,
};


/* Default Attribute - the message to print out. */
static ssize_t test_message_read(struct mycont * n, char * page, size_t count, loff_t off)
{
return off ? 0: snprintf(page,MAX_MSG_SIZE,"%s\n",n->message);

}

static struct test_attribute attr_message = {
.attr = { .name = "message", .mode = S_IRUGO },
.show = test_message_read,
};

/* Declare array of default attributes to be added when an mycont is added */
static struct attribute * default_attrs[] = {
&attr_message.attr,
NULL,
};


/* Declare test_subsys for addition to sysfs */
static struct subsystem test_subsys = {
.kobj = { .name = "test" },
.default_attrs = default_attrs,
.sysfs_ops = &test_sysfs_ops,
};


/*
* test ctl attribute.
* This is declared as an attribute of the subsystem, and added in
* test_init().
*
* Reading this attribute dumps all the registered test messages.
* Writing to it either adds or deletes a test message, as described at
* the beginning of the file.
*/
static ssize_t ctl_show(struct subsystem * s, char * page, size_t count, loff_t off)
{
char * str = page;
int ret = 0;

down(&test_sem);
if (!off) {
struct list_head * entry, * next;
list_for_each_safe(entry,next,&test_subsys.list) {
struct mycont * n = to_mycont(entry);
if ((ret + MAX_MSG_SIZE) > PAGE_SIZE)
break;
str += snprintf(str,PAGE_SIZE - ret,"%i: %s\n",
n->id, n->message);
ret = str - page;
}
}
up(&test_sem);
return ret;
}

static int add(int id, char * message)
{
struct mycont * n;
int error = 0;

n = kmalloc(sizeof(struct mycont),GFP_KERNEL);
if (!n)
return -ENOMEM;
memset(n,0,sizeof(struct mycont));

strncpy(n->message,message,MAX_MSG_SIZE);
n->id = id;

/* Doing this manually will be unnecessary soon. */
kobject_init(&n->kobj);
n->kobj.subsys = &test_subsys;
snprintf(n->kobj.name, KOBJ_NAME_LEN, "%i", id);
if ((error = kobject_register(&n->kobj))) {
goto Error;
}
goto Done;
Error:
kfree(n);
Done:
return error;
}

static int del(int id)
{
struct list_head * entry;
struct mycont * n;

list_for_each(entry,&test_subsys.list) {
n = to_mycont(entry);
if ((int)(n->id) == id) {
kobject_unregister(&n->kobj);
return 0;
}
}
return -EFAULT;
}

static ssize_t ctl_store(struct subsystem * s, const char * page, size_t count, loff_t off)
{
char message[MAX_MSG_SIZE];
char ctl[16];
int id;
int num;
int error;
int ret = 0;

down(&test_sem);
if (off)
goto Done;
num = sscanf(page,"%15s %i %128s",ctl,&id,message);
if (!num) {
error = -EINVAL;
goto Done;
}
if (!strcmp(ctl,"add") && num == 3)
error = add(id,message);
else if (!strcmp(ctl,"del") && num == 2)
error = del(id);
else
error = -EINVAL;
ret = error ? error : count;
Done:
up(&test_sem);
return ret;
}

static struct subsys_attribute subsys_attr_ctl = {
.attr = { .name = "ctl", .mode = 0644 },
.show = ctl_show,
.store = ctl_store,
};


static int __init test_init(void)
{
subsystem_register(&test_subsys);
subsys_create_file(&test_subsys,&subsys_attr_ctl);
return 0;
}

static void __exit test_exit (void)
{
subsys_remove_file(&test_subsys,&subsys_attr_ctl);
subsystem_unregister(&test_subsys);
}

module_init(test_init);
module_exit(test_exit);

MODULE_AUTHOR("Rusty Lynch");
MODULE_LICENSE("GPL");

2002-11-26 01:10:22

by Patrick Mochel

[permalink] [raw]
Subject: Re:[BUG] sysfs on 2.5.48 unable to remove files while in use


On Mon, 25 Nov 2002, Rusty Lynch wrote:

> Patrick,
>
> Sysfs (with the new subsystem_* function calls
> that your previous patch added) will crash the
> kernel if a subsystem is unregistered with content
> inside it.
>
> I stumbled across this while messing with the
> noisy driver that you fixed up for proper sysfs
> usage. To make sure that I wasn't seeing some
> side effects from kprobes, I distilled the
> essence of the sysfs operations to a new module
> that I have attached to this email.
>
> With sysfs mounted to /sys/, loading this module
> will cause /sys/test/ and /sys/test/ctl to be
> created.
>
> Test entries can be created by:
> echo "add 1 test1" > /sys/test/ctl
>
> this will cause /sys/1/message to be created
> where message contains "test1".
>
> This test entry can be removed by:
> echo "del 1" > /sys/test/ctl
>
> All is fine if I create a few entries, and
> then remove them all before rmmod'ing the
> driver, but if I create an entry and then
> attempt to rmmod the modules then my system
> freezes with a partial oops message (not
> enough ksymoops to do anything with).

Ah yes. This is a known problem, though I do apologize for the rudeness in
the Oops. You're going to have to increment the module count whenever a
noisy probe is registered. There is no automatic correlation between the
subsystem refcount and the module refcount. Because that is true, there
are some races involved between adding a probe and the module being
unloaded.

I recently changed the device driver mechanism for handling this. Each
driver has a semaphore that is locked when the driver is registered, and
unlocked only when the refcount reaches 0. driver_unregister() decrements
the refcount, then waits to take the lock. If there are any users, rmmod
will block until they go away. You can try an approach like that, though
it may not be what you want. You might just want to remove all the probes
when someone does rmmod..

-pat