Kernel 2.4.2-ac3.
FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
40 0 425 1 -1 -20 0 0 down DW< ? 0:00 (loop0)
>From a look at the source it seems that this may be normal behavior
(though I'm not sure). However, it's still cosmetically annoying,
because it throws off the load average (a D state process is counted
as "running" for the loadavg calculation).
My loopback-mounted fs seems to be working fine, nevertheless, which
is a nice change from previous kernels.
--
Nate Eldredge
[email protected]
Nate Eldredge writes:
> Kernel 2.4.2-ac3.
>
> FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
> 40 0 425 1 -1 -20 0 0 down DW< ? 0:00 (loop0)
It looks like this has been addressed in the thread "242-ac3 loop
bug". Jens Axboe posted a patch, but the list archive I'm reading
mangled it. Jens, could you make this patch available somewhere, or
at least email me a copy? (If it's going in an upcoming -ac patch,
then don't bother; I can wait until then.)
Thanks.
> From a look at the source it seems that this may be normal behavior
> (though I'm not sure). However, it's still cosmetically annoying,
> because it throws off the load average (a D state process is counted
> as "running" for the loadavg calculation).
>
> My loopback-mounted fs seems to be working fine, nevertheless, which
> is a nice change from previous kernels.
--
Nate Eldredge
[email protected]
--- drivers/block/loop.c~ Sat Feb 24 23:08:38 2001
+++ drivers/block/loop.c Sat Feb 24 23:11:13 2001
@@ -507,7 +507,7 @@
sprintf(current->comm, "loop%d", lo->lo_number);
spin_lock_irq(¤t->sigmask_lock);
- siginitsetinv(¤t->blocked, sigmask(SIGKILL));
+ sigfillset(¤t->blocked);
flush_signals(current);
spin_unlock_irq(¤t->sigmask_lock);
@@ -525,7 +525,7 @@
up(&lo->lo_sem);
for (;;) {
- down(&lo->lo_bh_mutex);
+ down_interruptible(&lo->lo_bh_mutex);
if (!atomic_read(&lo->lo_pending))
break;
On Sun, 25 Feb 2001, Jens Axboe wrote:
> On Sun, Feb 25 2001, Nate Eldredge wrote:
> > Nate Eldredge writes:
> > > Kernel 2.4.2-ac3.
> > >
> > > FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
> > > 40 0 425 1 -1 -20 0 0 down DW< ? 0:00 (loop0)
> >
> > It looks like this has been addressed in the thread "242-ac3 loop
> > bug". Jens Axboe posted a patch, but the list archive I'm reading
> > mangled it. Jens, could you make this patch available somewhere, or
> > at least email me a copy? (If it's going in an upcoming -ac patch,
> > then don't bother; I can wait until then.)
>
> Patch is here, I haven't checked whether Alan put it in ac4 yet (I
> did cc him, but noone knows for sure :-).
Jens, you have a race in lo_clr_fd() (loop-6). I've put the fixed
variant on ftp.math.psu.edu/pub/viro/loop-S2.gz. Diff and you'll
see - it's in the very beginning of the lo_clr_fd().
Cheers,
Al
diff -ur --exclude-from /home/axboe/cdrom/exclude /opt/kernel/linux-2.4.2-ac4/drivers/block/loop.c linux/drivers/block/loop.c
--- /opt/kernel/linux-2.4.2-ac4/drivers/block/loop.c Mon Feb 26 01:19:38 2001
+++ linux/drivers/block/loop.c Mon Feb 26 01:22:28 2001
@@ -79,7 +79,6 @@
static int *loop_sizes;
static int *loop_blksizes;
static devfs_handle_t devfs_handle; /* For the directory */
-static kmem_cache_t *loop_bhp;
/*
* Transfer functions
@@ -289,7 +288,7 @@
if (bh) {
kunmap(bh->b_page);
__free_page(bh->b_page);
- kmem_cache_free(loop_bhp, bh);
+ kmem_cache_free(bh_cachep, bh);
}
}
@@ -358,7 +357,7 @@
struct buffer_head *bh;
do {
- bh = kmem_cache_alloc(loop_bhp, SLAB_BUFFER);
+ bh = kmem_cache_alloc(bh_cachep, SLAB_BUFFER);
if (bh)
break;
@@ -508,7 +507,7 @@
sprintf(current->comm, "loop%d", lo->lo_number);
spin_lock_irq(¤t->sigmask_lock);
- siginitsetinv(¤t->blocked, sigmask(SIGKILL));
+ sigfillset(¤t->blocked);
flush_signals(current);
spin_unlock_irq(¤t->sigmask_lock);
@@ -526,7 +525,7 @@
up(&lo->lo_sem);
for (;;) {
- down(&lo->lo_bh_mutex);
+ down_interruptible(&lo->lo_bh_mutex);
if (!atomic_read(&lo->lo_pending))
break;
@@ -671,9 +670,12 @@
if (lo->lo_refcnt > 1) /* we needed one fd for the ioctl */
return -EBUSY;
+ spin_lock_irq(&lo->lo_lock);
lo->lo_state = Lo_rundown;
atomic_dec(&lo->lo_pending);
up(&lo->lo_bh_mutex);
+ spin_unlock_irq(&lo->lo_lock);
+
down(&lo->lo_sem);
lo->lo_backing_file = NULL;
@@ -927,13 +929,6 @@
return -EIO;
}
- loop_bhp = kmem_cache_create("loop_buffers", sizeof(struct buffer_head),
- 0, SLAB_HWCACHE_ALIGN, NULL, NULL);
- if (!loop_bhp) {
- printk(KERN_WARNING "loop: unable to create slab cache\n");
- return -ENOMEM;
- }
-
devfs_handle = devfs_mk_dir(NULL, "loop", NULL);
devfs_register_series(devfs_handle, "%u", max_loop, DEVFS_FL_DEFAULT,
MAJOR_NR, 0,
@@ -942,7 +937,7 @@
loop_dev = kmalloc(max_loop * sizeof(struct loop_device), GFP_KERNEL);
if (!loop_dev)
- goto out_dev;
+ return -ENOMEM;
loop_sizes = kmalloc(max_loop * sizeof(int), GFP_KERNEL);
if (!loop_sizes)
@@ -974,8 +969,6 @@
printk(KERN_INFO "loop: loaded (max %d devices)\n", max_loop);
return 0;
-out_dev:
- kmem_cache_destroy(loop_bhp);
out_sizes:
kfree(loop_dev);
out_blksizes:
@@ -990,7 +983,6 @@
if (devfs_unregister_blkdev(MAJOR_NR, "loop"))
printk(KERN_WARNING "loop: cannot unregister blkdev\n");
- kmem_cache_destroy(loop_bhp);
kfree(loop_dev);
kfree(loop_sizes);
kfree(loop_blksizes);
diff -ur --exclude-from /home/axboe/cdrom/exclude /opt/kernel/linux-2.4.2-ac4/fs/Makefile linux/fs/Makefile
--- /opt/kernel/linux-2.4.2-ac4/fs/Makefile Mon Feb 26 01:19:40 2001
+++ linux/fs/Makefile Mon Feb 26 01:14:44 2001
@@ -7,7 +7,7 @@
O_TARGET := fs.o
-export-objs := filesystems.o open.o
+export-objs := filesystems.o open.o dcache.o
mod-subdirs := nls
obj-y := open.o read_write.o devices.o file_table.o buffer.o \
diff -ur --exclude-from /home/axboe/cdrom/exclude /opt/kernel/linux-2.4.2-ac4/fs/dcache.c linux/fs/dcache.c
--- /opt/kernel/linux-2.4.2-ac4/fs/dcache.c Sat Feb 17 01:06:17 2001
+++ linux/fs/dcache.c Mon Feb 26 01:14:54 2001
@@ -22,6 +22,7 @@
#include <linux/init.h>
#include <linux/smp_lock.h>
#include <linux/cache.h>
+#include <linux/module.h>
#include <asm/uaccess.h>
@@ -1250,6 +1251,7 @@
/* SLAB cache for buffer_head structures */
kmem_cache_t *bh_cachep;
+EXPORT_SYMBOL(bh_cachep);
void __init vfs_caches_init(unsigned long mempages)
{
On Mon, 26 Feb 2001, Jens Axboe wrote:
> On Sun, Feb 25 2001, Alexander Viro wrote:
> > Jens, you have a race in lo_clr_fd() (loop-6). I've put the fixed
> > variant on ftp.math.psu.edu/pub/viro/loop-S2.gz. Diff and you'll
> > see - it's in the very beginning of the lo_clr_fd().
>
> Oops yeah you are right. Here's a diff of my current loop stuff
> against -ac4, Alan could you apply? Andrea suggested removing
> the loop private slab cache for buffer heads and just using the
> bh_cachep pool, and it seems like a good idea to me.
Erm... Jens, it really should be
if (atomic_dec_and_test(...))
up(...);
not just
atomic_dec(...);
up(...);
Otherwise you can end up with too early exit of loop_thread. Normally
it would not matter, but in pathological cases...
On Sun, Feb 25 2001, Alexander Viro wrote:
> Erm... Jens, it really should be
> if (atomic_dec_and_test(...))
> up(...);
> not just
> atomic_dec(...);
> up(...);
>
> Otherwise you can end up with too early exit of loop_thread. Normally
> it would not matter, but in pathological cases...
How so? We dec it and up the semaphore, loop_thread runs until it's
done and ups lo_sem.
--
Jens Axboe
On Mon, 26 Feb 2001, Jens Axboe wrote:
> On Sun, Feb 25 2001, Alexander Viro wrote:
> > Erm... Jens, it really should be
> > if (atomic_dec_and_test(...))
> > up(...);
> > not just
> > atomic_dec(...);
> > up(...);
> >
> > Otherwise you can end up with too early exit of loop_thread. Normally
> > it would not matter, but in pathological cases...
>
> How so? We dec it and up the semaphore, loop_thread runs until it's
> done and ups lo_sem.
You are risking an extra up() here. Think what happens if you already had a
pending request.
diff -ur --exclude-from /home/axboe/cdrom/exclude /opt/kernel/linux-2.4.2-ac4/drivers/block/loop.c linux/drivers/block/loop.c
--- /opt/kernel/linux-2.4.2-ac4/drivers/block/loop.c Mon Feb 26 01:19:38 2001
+++ linux/drivers/block/loop.c Mon Feb 26 01:38:36 2001
@@ -79,7 +79,6 @@
static int *loop_sizes;
static int *loop_blksizes;
static devfs_handle_t devfs_handle; /* For the directory */
-static kmem_cache_t *loop_bhp;
/*
* Transfer functions
@@ -289,7 +288,7 @@
if (bh) {
kunmap(bh->b_page);
__free_page(bh->b_page);
- kmem_cache_free(loop_bhp, bh);
+ kmem_cache_free(bh_cachep, bh);
}
}
@@ -358,7 +357,7 @@
struct buffer_head *bh;
do {
- bh = kmem_cache_alloc(loop_bhp, SLAB_BUFFER);
+ bh = kmem_cache_alloc(bh_cachep, SLAB_BUFFER);
if (bh)
break;
@@ -508,7 +507,7 @@
sprintf(current->comm, "loop%d", lo->lo_number);
spin_lock_irq(¤t->sigmask_lock);
- siginitsetinv(¤t->blocked, sigmask(SIGKILL));
+ sigfillset(¤t->blocked);
flush_signals(current);
spin_unlock_irq(¤t->sigmask_lock);
@@ -526,7 +525,7 @@
up(&lo->lo_sem);
for (;;) {
- down(&lo->lo_bh_mutex);
+ down_interruptible(&lo->lo_bh_mutex);
if (!atomic_read(&lo->lo_pending))
break;
@@ -671,9 +670,12 @@
if (lo->lo_refcnt > 1) /* we needed one fd for the ioctl */
return -EBUSY;
+ spin_lock_irq(&lo->lo_lock);
lo->lo_state = Lo_rundown;
- atomic_dec(&lo->lo_pending);
- up(&lo->lo_bh_mutex);
+ if (atomic_dec_and_test(&lo->lo_pending))
+ up(&lo->lo_bh_mutex);
+ spin_unlock_irq(&lo->lo_lock);
+
down(&lo->lo_sem);
lo->lo_backing_file = NULL;
@@ -927,13 +929,6 @@
return -EIO;
}
- loop_bhp = kmem_cache_create("loop_buffers", sizeof(struct buffer_head),
- 0, SLAB_HWCACHE_ALIGN, NULL, NULL);
- if (!loop_bhp) {
- printk(KERN_WARNING "loop: unable to create slab cache\n");
- return -ENOMEM;
- }
-
devfs_handle = devfs_mk_dir(NULL, "loop", NULL);
devfs_register_series(devfs_handle, "%u", max_loop, DEVFS_FL_DEFAULT,
MAJOR_NR, 0,
@@ -942,7 +937,7 @@
loop_dev = kmalloc(max_loop * sizeof(struct loop_device), GFP_KERNEL);
if (!loop_dev)
- goto out_dev;
+ return -ENOMEM;
loop_sizes = kmalloc(max_loop * sizeof(int), GFP_KERNEL);
if (!loop_sizes)
@@ -974,8 +969,6 @@
printk(KERN_INFO "loop: loaded (max %d devices)\n", max_loop);
return 0;
-out_dev:
- kmem_cache_destroy(loop_bhp);
out_sizes:
kfree(loop_dev);
out_blksizes:
@@ -990,7 +983,6 @@
if (devfs_unregister_blkdev(MAJOR_NR, "loop"))
printk(KERN_WARNING "loop: cannot unregister blkdev\n");
- kmem_cache_destroy(loop_bhp);
kfree(loop_dev);
kfree(loop_sizes);
kfree(loop_blksizes);
diff -ur --exclude-from /home/axboe/cdrom/exclude /opt/kernel/linux-2.4.2-ac4/fs/Makefile linux/fs/Makefile
--- /opt/kernel/linux-2.4.2-ac4/fs/Makefile Mon Feb 26 01:19:40 2001
+++ linux/fs/Makefile Mon Feb 26 01:14:44 2001
@@ -7,7 +7,7 @@
O_TARGET := fs.o
-export-objs := filesystems.o open.o
+export-objs := filesystems.o open.o dcache.o
mod-subdirs := nls
obj-y := open.o read_write.o devices.o file_table.o buffer.o \
diff -ur --exclude-from /home/axboe/cdrom/exclude /opt/kernel/linux-2.4.2-ac4/fs/dcache.c linux/fs/dcache.c
--- /opt/kernel/linux-2.4.2-ac4/fs/dcache.c Sat Feb 17 01:06:17 2001
+++ linux/fs/dcache.c Mon Feb 26 01:14:54 2001
@@ -22,6 +22,7 @@
#include <linux/init.h>
#include <linux/smp_lock.h>
#include <linux/cache.h>
+#include <linux/module.h>
#include <asm/uaccess.h>
@@ -1250,6 +1251,7 @@
/* SLAB cache for buffer_head structures */
kmem_cache_t *bh_cachep;
+EXPORT_SYMBOL(bh_cachep);
void __init vfs_caches_init(unsigned long mempages)
{
On Sun, 25 Feb 2001, Alexander Viro wrote:
>
>
> On Mon, 26 Feb 2001, Jens Axboe wrote:
>
> > On Sun, Feb 25 2001, Alexander Viro wrote:
> > > Erm... Jens, it really should be
> > > if (atomic_dec_and_test(...))
> > > up(...);
> > > not just
> > > atomic_dec(...);
> > > up(...);
> > >
> > > Otherwise you can end up with too early exit of loop_thread. Normally
> > > it would not matter, but in pathological cases...
> >
> > How so? We dec it and up the semaphore, loop_thread runs until it's
> > done and ups lo_sem.
>
> You are risking an extra up() here. Think what happens if you already had a
> pending request.
Let me elaborate: the race is very narrow and takes deliberate efforts to
hit. It _can_ be triggered, unfortunately. This extra up() will mess your
life later on.
On Sun, Feb 25 2001, Alexander Viro wrote:
> Let me elaborate: the race is very narrow and takes deliberate efforts to
> hit. It _can_ be triggered, unfortunately. This extra up() will mess your
> life later on.
What's the worst that can happen? We do an extra up, but loop_thread
will still quit once we hit zero lo_pending. And loop_clr_fd
is still protected by lo_ctl_mutex.
--
Jens Axboe
On Mon, 26 Feb 2001, Jens Axboe wrote:
> On Sun, Feb 25 2001, Alexander Viro wrote:
> > Let me elaborate: the race is very narrow and takes deliberate efforts to
> > hit. It _can_ be triggered, unfortunately. This extra up() will mess your
> > life later on.
>
> What's the worst that can happen? We do an extra up, but loop_thread
> will still quit once we hit zero lo_pending. And loop_clr_fd
> is still protected by lo_ctl_mutex.
Well, for one thing you'll get some surprises next time you losetup
the same device ;-) There are more subtle scenarios, but that one
is pretty unpleasant in itself.
Cheers,
Al
On Sun, Feb 25 2001, Alexander Viro wrote:
> > What's the worst that can happen? We do an extra up, but loop_thread
> > will still quit once we hit zero lo_pending. And loop_clr_fd
> > is still protected by lo_ctl_mutex.
>
> Well, for one thing you'll get some surprises next time you losetup
> the same device ;-) There are more subtle scenarios, but that one
> is pretty unpleasant in itself.
Ah ok, but that could be solved by just reiniting the sems on
each losetup (which probably would be a good idea anyway). But
ok, I'll shut up now :-)
--
Jens Axboe
On Mon, 26 Feb 2001, Jens Axboe wrote:
> Ah ok, I see what you mean. Updated patch attached.
Corresponding patch against 2.4.2 is on ftp.math.psu.edu/pub/viro/loop-S2.gz
Cheers,
Al
Jens Axboe writes:
> On Sun, Feb 25 2001, Nate Eldredge wrote:
> > Nate Eldredge writes:
> > > Kernel 2.4.2-ac3.
> > >
> > > FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
> > > 40 0 425 1 -1 -20 0 0 down DW< ? 0:00 (loop0)
> >
> > It looks like this has been addressed in the thread "242-ac3 loop
> > bug". Jens Axboe posted a patch, but the list archive I'm reading
> > mangled it. Jens, could you make this patch available somewhere, or
> > at least email me a copy? (If it's going in an upcoming -ac patch,
> > then don't bother; I can wait until then.)
>
> Patch is here, I haven't checked whether Alan put it in ac4 yet (I
> did cc him, but noone knows for sure :-).
It's in 2.4.2-ac5, which does indeed fix the problem. Thanks to all.
--
Nate Eldredge
[email protected]