2004-11-16 13:55:36

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: [patch 0/4] Cleanup file_count usage

This patchset is an attempt to cleanup some bogus and some not so bogus
reads to struct file.f_count of the vfs from various subsystems in the
kernel. This patchset doesn't cleanup uses of f_count completely;
Geting rid of all reads to f_count was suggested by Viro during the discussion
on kref based lockfree fd management sometime back.

This cleans up:
1. Wrong usage of f_count in .open and .release routines of some file
systems
2. Error messages and warnings based on reads to f_count
3. Redundant check based on file_count in the vfs
4. Usage of file_count in hugetlb to report number of attaches

Patchset to follow.

What remains:
1. Hack to return error code to user space at last close through file_count
check at the driver's flush routine. This hack is used in scsi/st.c,
scsi/osst.c and coda/file.c to return error code through .flush()
(Although it is doubtful if applications check for error during close(2)).
Kai has a patch to cleanup scsi/st.c. I will make patches to move last
close code from .flush() to .release() in the coda filesystem if no one
objects to it. Not sure if you can do anything on errors at close...
Suggestions on alternatives welcome.
2. Read to f_count at drivers/net/ppp_generic.c:
The PPPIOCDETACH ioctl is failed if the device fd is duped and
being polled by another process -- which is determined by a read
to f_count. The comments in the code indicate that the PPPIOCDETACH
ioctl should be junked and userland can use a workaround
by closing the fd and reopening /dev/ppp. I can make a patch to junk
PPPIOCDETACH, but is it okay to break binary compatibility? Paul?
3. Usage of file_count in the AF_UNIX garbage collector (unix_gc)

Thanks,
Kiran


2004-11-16 14:02:43

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: [patch 1/4] Cleanup file_count usage: Bad usage at some .open, .release

Following patch cleans up some obviously wrong usage of struct file.f_count
in ->open() and ->release() routines of some file systems, the use case being
check if f_count is 1 in ->open() and f_count is 0 in ->release().
This patch cleans up the f_count usage for affs, hfs and hfsplus file systems.

Signed-off-by: Ravikiran Thirumalai <[email protected]>
---

affs/file.c | 4 ----
hfs/inode.c | 4 ----
hfsplus/inode.c | 4 ----
3 files changed, 12 deletions(-)


diff -ruN -X dontdiff linux-2.6.9/fs/affs/file.c f_count-2.6.9/fs/affs/file.c
--- linux-2.6.9/fs/affs/file.c 2004-10-19 03:24:08.000000000 +0530
+++ f_count-2.6.9/fs/affs/file.c 2004-11-09 17:45:19.000000000 +0530
@@ -62,8 +62,6 @@
static int
affs_file_open(struct inode *inode, struct file *filp)
{
- if (atomic_read(&filp->f_count) != 1)
- return 0;
pr_debug("AFFS: open(%d)\n", AFFS_I(inode)->i_opencnt);
AFFS_I(inode)->i_opencnt++;
return 0;
@@ -72,8 +70,6 @@
static int
affs_file_release(struct inode *inode, struct file *filp)
{
- if (atomic_read(&filp->f_count) != 0)
- return 0;
pr_debug("AFFS: release(%d)\n", AFFS_I(inode)->i_opencnt);
AFFS_I(inode)->i_opencnt--;
if (!AFFS_I(inode)->i_opencnt)
diff -ruN -X dontdiff linux-2.6.9/fs/hfs/inode.c f_count-2.6.9/fs/hfs/inode.c
--- linux-2.6.9/fs/hfs/inode.c 2004-10-19 03:25:29.000000000 +0530
+++ f_count-2.6.9/fs/hfs/inode.c 2004-11-09 17:45:19.000000000 +0530
@@ -524,8 +524,6 @@
{
if (HFS_IS_RSRC(inode))
inode = HFS_I(inode)->rsrc_inode;
- if (atomic_read(&file->f_count) != 1)
- return 0;
atomic_inc(&HFS_I(inode)->opencnt);
return 0;
}
@@ -536,8 +534,6 @@

if (HFS_IS_RSRC(inode))
inode = HFS_I(inode)->rsrc_inode;
- if (atomic_read(&file->f_count) != 0)
- return 0;
if (atomic_dec_and_test(&HFS_I(inode)->opencnt)) {
down(&inode->i_sem);
hfs_file_truncate(inode);
diff -ruN -X dontdiff linux-2.6.9/fs/hfsplus/inode.c f_count-2.6.9/fs/hfsplus/inode.c
--- linux-2.6.9/fs/hfsplus/inode.c 2004-10-19 03:23:44.000000000 +0530
+++ f_count-2.6.9/fs/hfsplus/inode.c 2004-11-09 17:45:19.000000000 +0530
@@ -268,8 +268,6 @@
{
if (HFSPLUS_IS_RSRC(inode))
inode = HFSPLUS_I(inode).rsrc_inode;
- if (atomic_read(&file->f_count) != 1)
- return 0;
atomic_inc(&HFSPLUS_I(inode).opencnt);
return 0;
}
@@ -280,8 +278,6 @@

if (HFSPLUS_IS_RSRC(inode))
inode = HFSPLUS_I(inode).rsrc_inode;
- if (atomic_read(&file->f_count) != 0)
- return 0;
if (atomic_dec_and_test(&HFSPLUS_I(inode).opencnt)) {
down(&inode->i_sem);
hfsplus_file_truncate(inode);

2004-11-16 14:08:08

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: [patch 2/4] Cleanup file_count usage: Error/debug messages on f_count reads

Viro doesn't like reads to f_count. This patch cleans up (removes) some
error messages and warnings based on f_count checks.

We also don't want to have atomic_reads to refcount lying around, since for
converting the f_count refcounter to struct kref and lockfree kref extensions,
krefs might not support read routines to struct kref.

Notes:
1. Not sure if the debug message in __aio_put_req is _really_ useful. This
patch removes this debug message for aio.
2. The file_count check to check if the file is already closed in filp_close
also doesn't seem to be useful, as an fput should have a corresponding fget,
and a close on a closed file will return -EBADF from sys_close itself,
without having to call filp_close
3. Not sure if the debug messages in atm scheduler are _really_ useful either.

Signed-off-by: Ravikiran Thirumalai <[email protected]>
---

fs/aio.c | 2 --
fs/open.c | 5 -----
net/sched/sch_atm.c | 3 ---
3 files changed, 10 deletions(-)

diff -ruN -X dontdiff2 linux-2.6.9/fs/aio.c f_count-2.6.9/fs/aio.c
--- linux-2.6.9/fs/aio.c 2004-10-19 03:24:07.000000000 +0530
+++ f_count-2.6.9/fs/aio.c 2004-11-15 14:07:29.000000000 +0530
@@ -493,8 +493,6 @@
*/
static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
{
- dprintk(KERN_DEBUG "aio_put(%p): f_count=%d\n",
- req, atomic_read(&req->ki_filp->f_count));

req->ki_users --;
if (unlikely(req->ki_users < 0))
diff -ruN -X dontdiff2 linux-2.6.9/fs/open.c f_count-2.6.9/fs/open.c
--- linux-2.6.9/fs/open.c 2004-10-19 03:23:08.000000000 +0530
+++ f_count-2.6.9/fs/open.c 2004-11-15 14:07:29.000000000 +0530
@@ -995,11 +995,6 @@
if (retval)
filp->f_error = 0;

- if (!file_count(filp)) {
- printk(KERN_ERR "VFS: Close: file count is 0\n");
- return retval;
- }
-
if (filp->f_op && filp->f_op->flush) {
int err = filp->f_op->flush(filp);
if (!retval)
diff -ruN -X dontdiff2 linux-2.6.9/net/sched/sch_atm.c f_count-2.6.9/net/sched/sch_atm.c
--- linux-2.6.9/net/sched/sch_atm.c 2004-10-19 03:24:07.000000000 +0530
+++ f_count-2.6.9/net/sched/sch_atm.c 2004-11-15 14:07:29.000000000 +0530
@@ -196,8 +196,6 @@
qdisc_destroy(flow->q);
destroy_filters(flow);
if (flow->sock) {
- DPRINTK("atm_tc_put: f_count %d\n",
- file_count(flow->sock->file));
flow->vcc->pop = flow->old_pop;
sockfd_put(flow->sock);
}
@@ -279,7 +277,6 @@
DPRINTK("atm_tc_change: type %d, payload %d, hdr_len %d\n",
opt->rta_type,RTA_PAYLOAD(opt),hdr_len);
if (!(sock = sockfd_lookup(fd,&error))) return error; /* f_count++ */
- DPRINTK("atm_tc_change: f_count %d\n",file_count(sock->file));
if (sock->ops->family != PF_ATMSVC && sock->ops->family != PF_ATMPVC) {
error = -EPROTOTYPE;
goto err_out;

2004-11-16 14:23:50

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: [patch 3/4] Cleanup file_count usage: Redundant check based on file_count

The file_count check below seems to be redundant. Getting rid of it.

Signed-off-by: Ravikiran Thirumalai <[email protected]>
---
diff -ruN -X dontdiff2 linux-2.6.9/fs/super.c f_count-2.6.9/fs/super.c
--- linux-2.6.9/fs/super.c 2004-10-19 03:24:07.000000000 +0530
+++ f_count-2.6.9/fs/super.c 2004-11-15 14:52:06.000000000 +0530
@@ -503,7 +503,7 @@

file_list_lock();
list_for_each_entry(f, &sb->s_files, f_list) {
- if (S_ISREG(f->f_dentry->d_inode->i_mode) && file_count(f))
+ if (S_ISREG(f->f_dentry->d_inode->i_mode))
f->f_mode &= ~FMODE_WRITE;
}
file_list_unlock();

2004-11-16 14:29:54

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch 4/4] Cleanup file_count usage: Avoid file_count usage for hugetlb nattach reporting

On Tue, Nov 16, 2004 at 07:41:25PM +0530, Ravikiran G Thirumalai wrote:
> The file_count macro usage to get the number of attaches to a hugetlb
> sysv style shared memory segment, seems to work although it looks bogus
> on the first look. But we want to get rid of file_count macros and
> reads to struct file.f_count. This patch cleans up the file_count usage
> for shm hugetlb attaches. The nattch counter is maintained on the same
> lines as in regular sysv shared memory segments. The file_operations for the
> struct file of the shared memory hugetlb segment now is different from the
> hugetlbfs specific f_ops.
> Patch has been tested with some userspace testcases
> Signed-off-by: Ravikiran Thirumalai <[email protected]>

I'm generally in favor of getting rid of the ->f_count twiddling from
hugetlb, though I've not reviewed this closely.


-- wli

2004-11-16 14:24:47

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: [patch 4/4] Cleanup file_count usage: Avoid file_count usage for hugetlb nattach reporting

The file_count macro usage to get the number of attaches to a hugetlb
sysv style shared memory segment, seems to work although it looks bogus
on the first look. But we want to get rid of file_count macros and
reads to struct file.f_count. This patch cleans up the file_count usage
for shm hugetlb attaches. The nattch counter is maintained on the same
lines as in regular sysv shared memory segments. The file_operations for the
struct file of the shared memory hugetlb segment now is different from the
hugetlbfs specific f_ops.

Patch has been tested with some userspace testcases

Signed-off-by: Ravikiran Thirumalai <[email protected]>
---

fs/hugetlbfs/inode.c | 21 ++++++++++++++++++++-
include/linux/hugetlb.h | 5 ++++-
include/linux/shm.h | 19 +++++++++++++++++++
ipc/shm.c | 19 ++++++-------------
mm/hugetlb.c | 7 +++++++
5 files changed, 56 insertions(+), 15 deletions(-)

diff -ruN -X dontdiff2 linux-2.6.9/fs/hugetlbfs/inode.c f_count-2.6.9/fs/hugetlbfs/inode.c
--- linux-2.6.9/fs/hugetlbfs/inode.c 2004-10-19 03:25:07.000000000 +0530
+++ f_count-2.6.9/fs/hugetlbfs/inode.c 2004-11-15 15:12:17.000000000 +0530
@@ -35,6 +35,7 @@
static struct super_operations hugetlbfs_ops;
static struct address_space_operations hugetlbfs_aops;
struct file_operations hugetlbfs_file_operations;
+struct file_operations hugetlbfs_shm_file_operations;
static struct inode_operations hugetlbfs_dir_inode_operations;
static struct inode_operations hugetlbfs_inode_operations;

@@ -91,6 +92,16 @@
return ret;
}

+static int
+hugetlbfs_shm_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ int ret;
+ ret = hugetlbfs_file_mmap(file, vma);
+ vma->vm_ops = &hugetlb_shm_vm_ops;
+ shm_inc(file->f_dentry->d_inode->i_ino);
+ return ret;
+}
+
/*
* Called under down_write(mmap_sem), page_table_lock is not held
*/
@@ -552,6 +563,11 @@
.get_unmapped_area = hugetlb_get_unmapped_area,
};

+struct file_operations hugetlbfs_shm_file_operations = {
+ .mmap = hugetlbfs_shm_file_mmap,
+ .get_unmapped_area = hugetlb_get_unmapped_area,
+};
+
static struct inode_operations hugetlbfs_dir_inode_operations = {
.create = hugetlbfs_create,
.lookup = simple_lookup,
@@ -737,6 +753,9 @@
can_do_mlock());
}

+/*
+ * Setup a hugetlb file on hugetlbfs for a new shm segment
+ */
struct file *hugetlb_zero_setup(size_t size)
{
int error = -ENOMEM;
@@ -781,7 +800,7 @@
file->f_vfsmnt = mntget(hugetlbfs_vfsmount);
file->f_dentry = dentry;
file->f_mapping = inode->i_mapping;
- file->f_op = &hugetlbfs_file_operations;
+ file->f_op = &hugetlbfs_shm_file_operations;
file->f_mode = FMODE_WRITE | FMODE_READ;
return file;

diff -ruN -X dontdiff2 linux-2.6.9/include/linux/hugetlb.h f_count-2.6.9/include/linux/hugetlb.h
--- linux-2.6.9/include/linux/hugetlb.h 2004-10-19 03:24:08.000000000 +0530
+++ f_count-2.6.9/include/linux/hugetlb.h 2004-11-15 15:12:17.000000000 +0530
@@ -117,14 +117,17 @@
}

extern struct file_operations hugetlbfs_file_operations;
+extern struct file_operations hugetlbfs_shm_file_operations;
extern struct vm_operations_struct hugetlb_vm_ops;
+extern struct vm_operations_struct hugetlb_shm_vm_ops;
struct file *hugetlb_zero_setup(size_t);
int hugetlb_get_quota(struct address_space *mapping);
void hugetlb_put_quota(struct address_space *mapping);

static inline int is_file_hugepages(struct file *file)
{
- return file->f_op == &hugetlbfs_file_operations;
+ return file->f_op == &hugetlbfs_shm_file_operations
+ || file->f_op == &hugetlbfs_file_operations;
}

static inline void set_file_hugepages(struct file *file)
diff -ruN -X dontdiff2 linux-2.6.9/include/linux/shm.h f_count-2.6.9/include/linux/shm.h
--- linux-2.6.9/include/linux/shm.h 2004-10-19 03:24:40.000000000 +0530
+++ f_count-2.6.9/include/linux/shm.h 2004-11-15 15:12:17.000000000 +0530
@@ -95,12 +95,31 @@

#ifdef CONFIG_SYSVIPC
long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr);
+void shm_open(struct vm_area_struct *shmd);
+void shm_close(struct vm_area_struct *shmd);
+void shm_inc(int id);
#else
static inline long do_shmat(int shmid, char __user *shmaddr,
int shmflg, unsigned long *addr)
{
return -ENOSYS;
}
+
+static inline void shm_open(struct vm_area_struct *shmd)
+{
+ BUG();
+}
+
+static inline void shm_close(struct vm_area_struct *shmd)
+{
+ BUG();
+}
+
+static inline void shm_inc(int id)
+{
+ BUG();
+}
+
#endif

#endif /* __KERNEL__ */
diff -ruN -X dontdiff2 linux-2.6.9/ipc/shm.c f_count-2.6.9/ipc/shm.c
--- linux-2.6.9/ipc/shm.c 2004-10-19 03:24:08.000000000 +0530
+++ f_count-2.6.9/ipc/shm.c 2004-11-15 15:12:17.000000000 +0530
@@ -44,8 +44,6 @@
ipc_buildid(&shm_ids, id, seq)

static int newseg (key_t key, int shmflg, size_t size);
-static void shm_open (struct vm_area_struct *shmd);
-static void shm_close (struct vm_area_struct *shmd);
#ifdef CONFIG_PROC_FS
static int sysvipc_shm_read_proc(char *buffer, char **start, off_t offset, int length, int *eof, void *data);
#endif
@@ -83,7 +81,7 @@



-static inline void shm_inc (int id) {
+void shm_inc(int id) {
struct shmid_kernel *shp;

if(!(shp = shm_lock(id)))
@@ -95,7 +93,7 @@
}

/* This is called by fork, once for every shm attach. */
-static void shm_open (struct vm_area_struct *shmd)
+void shm_open(struct vm_area_struct *shmd)
{
shm_inc (shmd->vm_file->f_dentry->d_inode->i_ino);
}
@@ -129,7 +127,7 @@
* The descriptor has already been removed from the current->mm->mmap list
* and will later be kfree()d.
*/
-static void shm_close (struct vm_area_struct *shmd)
+void shm_close(struct vm_area_struct *shmd)
{
struct file * file = shmd->vm_file;
int id = file->f_dentry->d_inode->i_ino;
@@ -228,9 +226,7 @@
shp->id = shm_buildid(id,shp->shm_perm.seq);
shp->shm_file = file;
file->f_dentry->d_inode->i_ino = shp->id;
- if (shmflg & SHM_HUGETLB)
- set_file_hugepages(file);
- else
+ if (!(shmflg & SHM_HUGETLB))
file->f_op = &shm_file_operations;
shm_tot += numpages;
shm_unlock(shp);
@@ -496,10 +492,7 @@
tbuf.shm_ctime = shp->shm_ctim;
tbuf.shm_cpid = shp->shm_cprid;
tbuf.shm_lpid = shp->shm_lprid;
- if (!is_file_hugepages(shp->shm_file))
- tbuf.shm_nattch = shp->shm_nattch;
- else
- tbuf.shm_nattch = file_count(shp->shm_file) - 1;
+ tbuf.shm_nattch = shp->shm_nattch;
shm_unlock(shp);
if(copy_shmid_to_user (buf, &tbuf, version))
err = -EFAULT;
@@ -875,7 +868,7 @@
shp->shm_segsz,
shp->shm_cprid,
shp->shm_lprid,
- is_file_hugepages(shp->shm_file) ? (file_count(shp->shm_file) - 1) : shp->shm_nattch,
+ shp->shm_nattch,
shp->shm_perm.uid,
shp->shm_perm.gid,
shp->shm_perm.cuid,
diff -ruN -X dontdiff2 linux-2.6.9/mm/hugetlb.c f_count-2.6.9/mm/hugetlb.c
--- linux-2.6.9/mm/hugetlb.c 2004-10-19 03:24:37.000000000 +0530
+++ f_count-2.6.9/mm/hugetlb.c 2004-11-15 15:12:17.000000000 +0530
@@ -10,6 +10,7 @@
#include <linux/hugetlb.h>
#include <linux/sysctl.h>
#include <linux/highmem.h>
+#include <linux/shm.h>

const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
static unsigned long nr_huge_pages, free_huge_pages;
@@ -248,6 +249,12 @@
.nopage = hugetlb_nopage,
};

+struct vm_operations_struct hugetlb_shm_vm_ops = {
+ .open = shm_open,
+ .close = shm_close,
+ .nopage = hugetlb_nopage,
+};
+
void zap_hugepage_range(struct vm_area_struct *vma,
unsigned long start, unsigned long length)
{

2004-11-16 22:31:48

by Paul Mackerras

[permalink] [raw]
Subject: Re: [patch 0/4] Cleanup file_count usage

Ravikiran G Thirumalai writes:

> 2. Read to f_count at drivers/net/ppp_generic.c:
> The PPPIOCDETACH ioctl is failed if the device fd is duped and
> being polled by another process -- which is determined by a read
> to f_count. The comments in the code indicate that the PPPIOCDETACH
> ioctl should be junked and userland can use a workaround
> by closing the fd and reopening /dev/ppp. I can make a patch to junk
> PPPIOCDETACH, but is it okay to break binary compatibility? Paul?

I would very much rather not break binary compatibility. People using
ppp-2.4.2 or ppp-2.4.3 won't be affected by this change, but ppp-2.4.2
was released in January this year and I wouldn't be surprised if
some people are still using ppp-2.4.1 (or even ppp-2.4.0), which would
be affected.

I suggest that as a temporary compromise, we keep PPPIOCDETACH but
make it zero file->private_data without freeing the ppp structure that
it points to. That will cause a memory leak but will avoid the
possibility of using that memory after it is freed (in the case where
the descriptor has been dup'd and another process is doing a poll on
it).

We should make the kernel whinge loudly when anyone uses PPPIOCDETACH,
too. If/when 2.7.0 comes along we can remove it.

Paul.

2004-11-16 23:01:40

by Jan Harkes

[permalink] [raw]
Subject: Re: [patch 0/4] Cleanup file_count usage

On Tue, Nov 16, 2004 at 07:22:00PM +0530, Ravikiran G Thirumalai wrote:
> What remains:
> 1. Hack to return error code to user space at last close through file_count
> check at the driver's flush routine. This hack is used in scsi/st.c,
> scsi/osst.c and coda/file.c to return error code through .flush()
> (Although it is doubtful if applications check for error during close(2)).
> Kai has a patch to cleanup scsi/st.c. I will make patches to move last
> close code from .flush() to .release() in the coda filesystem if no one
> objects to it. Not sure if you can do anything on errors at close...

That won't work, in fops_release it is far too late to pass error codes
back to the application that called close(2).

In fact Coda used to only have a CODA_CLOSE upcall in coda_release until
users noticed that they never got an error return when the final write
to the servers failed. The only solution was to split the store and
release functionality of CODA_CLOSE so that we can perform the writeback
to the servers during the fops_flush operation (CODA_SYNC) and release
the last reference to the object during fops_release (CODA_RELEASE). If
either of these upcalls fails we fall back on the old behaviour.

People who write application with Coda in mind actually do check for
errors at close, it is the only time that we can actually generate any
errors as individual writes are not visible to the cache manager.

Do you have a link to the original discussion, what problem are we
trying to solve here?

Jan

2004-11-16 23:24:18

by Jan Harkes

[permalink] [raw]
Subject: Re: [patch 0/4] Cleanup file_count usage

On Tue, Nov 16, 2004 at 05:56:58PM -0500, Jan Harkes wrote:
> On Tue, Nov 16, 2004 at 07:22:00PM +0530, Ravikiran G Thirumalai wrote:
> > What remains:
> > 1. Hack to return error code to user space at last close through file_count
> > check at the driver's flush routine. This hack is used in scsi/st.c,
> > scsi/osst.c and coda/file.c to return error code through .flush()
> > (Although it is doubtful if applications check for error during close(2)).
> > Kai has a patch to cleanup scsi/st.c. I will make patches to move last
> > close code from .flush() to .release() in the coda filesystem if no one
> > objects to it. Not sure if you can do anything on errors at close...

If this f_count/file_count cleanup really has to go in, instead of moving
the Coda code from flush to release, use the attached patch which simply
removes last_close semantics from the kernel. This way I'll have to deal
with an upcall for every close, but I can try to track the open count
in our userspace cachemanager and do the last writer thing up there.

I would rather take the performance hit than lose the possibility to
return meaningful errors on close.

Jan


--- linux/fs/coda/file.c.orig 2004-11-16 18:03:17.000000000 -0500
+++ linux/fs/coda/file.c 2004-11-16 18:04:03.000000000 -0500
@@ -157,11 +157,6 @@

coda_vfs_stat.flush++;

- /* last close semantics */
- fcnt = file_count(coda_file);
- if (fcnt > 1)
- goto out;
-
/* No need to make an upcall when we have not made any modifications
* to the file */
if ((coda_file->f_flags & O_ACCMODE) == O_RDONLY)

2004-11-17 00:58:21

by Willem Riede

[permalink] [raw]
Subject: Re: [patch 0/4] Cleanup file_count usage

On 11/16/2004 08:52:00 AM, Ravikiran G Thirumalai wrote:
> This patchset is an attempt to cleanup some bogus and some not so bogus
> reads to struct file.f_count of the vfs from various subsystems in the
> kernel. This patchset doesn't cleanup uses of f_count completely;
> Geting rid of all reads to f_count was suggested by Viro during the
> discussion on kref based lockfree fd management sometime back.
>
> What remains:
> 1. Hack to return error code to user space at last close through file_count
> check at the driver's flush routine. This hack is used in scsi/st.c,
> scsi/osst.c and coda/file.c to return error code through .flush()
> (Although it is doubtful if applications check for error during
> close(2)).
> Kai has a patch to cleanup scsi/st.c.

The equivalent change can easily be made to osst. I'll be happy to take
care of that if we collectively decide that's where we need to go.

Regards, Willem Riede.


2004-11-17 16:57:38

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [patch 0/4] Cleanup file_count usage

On Wed, Nov 17, 2004 at 09:31:54AM +1100, Paul Mackerras wrote:
> ...
> I suggest that as a temporary compromise, we keep PPPIOCDETACH but
> make it zero file->private_data without freeing the ppp structure that
> it points to. That will cause a memory leak but will avoid the
> possibility of using that memory after it is freed (in the case where
> the descriptor has been dup'd and another process is doing a poll on
> it).
>

How about this patch? This doesn't leak memory hopefully.

---
Patch to cleanup reads to f_count from ppp driver, deprecate PPPIOCDETACH
ioctl, and print warning messages if the ioctl is used.

A new 'detached' field is included in struct ppp_file, and the ppp
channel/interface is marked as detached when PPPIOCDETACH is invoked
instead of checking the f_count and failing the ioctl for 'if the fd was
dup'd' case. The ppp structure is freed on last close.

Patch tested with ppp-2.4.0 and ppp-2.4.2. I could actually dial out :)

Signed-off-by: Ravikiran Thirumalai <[email protected]>

---
diff -ruN -X dontdiff2 linux-2.6.9/drivers/net/ppp_generic.c f_count-2.6.9/drivers/net/ppp_generic.c
--- linux-2.6.9/drivers/net/ppp_generic.c 2004-10-19 03:25:24.000000000 +0530
+++ f_count-2.6.9/drivers/net/ppp_generic.c 2004-11-16 22:26:36.000000000 +0530
@@ -82,6 +82,7 @@
int hdrlen; /* space to leave for headers */
int index; /* interface unit / channel number */
int dead; /* unit/channel has been shut down */
+ int detached; /* device has been detached */
};

#define PF_TO_X(pf, X) ((X *)((char *)(pf) - offsetof(X, file)))
@@ -401,7 +402,7 @@

ret = count;

- if (pf == 0)
+ if (pf == 0 || (pf && pf->detached))
return -ENXIO;
add_wait_queue(&pf->rwait, &wait);
for (;;) {
@@ -447,7 +448,7 @@
struct sk_buff *skb;
ssize_t ret;

- if (pf == 0)
+ if (pf == 0 || (pf && pf->detached))
return -ENXIO;
ret = -ENOMEM;
skb = alloc_skb(count + pf->hdrlen, GFP_KERNEL);
@@ -483,7 +484,7 @@
struct ppp_file *pf = file->private_data;
unsigned int mask;

- if (pf == 0)
+ if (pf == 0 || (pf && pf->detached))
return 0;
poll_wait(file, &pf->rwait, wait);
mask = POLLOUT | POLLWRNORM;
@@ -549,31 +550,33 @@
if (pf == 0)
return ppp_unattached_ioctl(pf, file, cmd, arg);

+ if (pf && pf->detached)
+ return -EAGAIN;
+
if (cmd == PPPIOCDETACH) {
/*
+ * This ioctl is deprecated and should not be used
+ * ioctl is lying around for binary compatibility and
+ * will be junked at 2.7. Userland can get the same
+ * effect as PPPIOCDETACH by closing this fd and
+ * reopening /dev/ppp
+ *
* We have to be careful here... if the file descriptor
* has been dup'd, we could have another process in the
* middle of a poll using the same file *, so we had
* better not free the interface data structures -
- * instead we fail the ioctl. Even in this case, we
- * shut down the interface if we are the owner of it.
- * Actually, we should get rid of PPPIOCDETACH, userland
- * (i.e. pppd) could achieve the same effect by closing
- * this fd and reopening /dev/ppp.
+ * instead we mark the interface datastructure as detached
+ * and shut down the interface if we are the owner of it.
*/
- err = -EINVAL;
if (pf->kind == INTERFACE) {
ppp = PF_TO_PPP(pf);
if (file == ppp->owner)
ppp_shutdown_interface(ppp);
}
- if (atomic_read(&file->f_count) <= 2) {
- ppp_release(inode, file);
- err = 0;
- } else
- printk(KERN_DEBUG "PPPIOCDETACH file->f_count=%d\n",
- atomic_read(&file->f_count));
- return err;
+ pf->detached = 1;
+ printk(KERN_WARNING "PPPIOCDETACH ioctl is deprecated; "
+ "Upgrade to ppp-2.4.2 or above.\n");
+ return 0;
}

if (pf->kind == CHANNEL) {
@@ -790,7 +793,7 @@
down(&all_ppp_sem);
err = -ENXIO;
ppp = ppp_find_unit(unit);
- if (ppp != 0) {
+ if (ppp != 0 && !ppp->file.detached) {
atomic_inc(&ppp->file.refcnt);
file->private_data = &ppp->file;
err = 0;
@@ -804,7 +807,7 @@
spin_lock_bh(&all_channels_lock);
err = -ENXIO;
chan = ppp_find_channel(unit);
- if (chan != 0) {
+ if (chan != 0 && !chan->file.detached) {
atomic_inc(&chan->file.refcnt);
file->private_data = &chan->file;
err = 0;

2004-11-17 22:12:18

by Paul Mackerras

[permalink] [raw]
Subject: Re: [patch 0/4] Cleanup file_count usage

Ravikiran G Thirumalai writes:

> How about this patch? This doesn't leak memory hopefully.
>
> ---
> Patch to cleanup reads to f_count from ppp driver, deprecate PPPIOCDETACH
> ioctl, and print warning messages if the ioctl is used.
>
> A new 'detached' field is included in struct ppp_file, and the ppp
> channel/interface is marked as detached when PPPIOCDETACH is invoked
> instead of checking the f_count and failing the ioctl for 'if the fd was
> dup'd' case. The ppp structure is freed on last close.

The difficulty comes when pppd does a PPPIOCNEWUNIT, a PPPIOCDETACH,
and another PPPIOCNEWUNIT. To test that, you would need to use
ppp-2.4.0 or ppp-2.4.1 and use the persist option to pppd. Make one
connection and then terminate it (unplug the modem, or use the idle
option to pppd). Pppd should then try to reestablish the connection.
The question is whether the second connection attempt succeeds or not.
I think that with your patch it won't (from a quick look).

Paul.