url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/
+linus.patch
Linus's current diff.
-filemap-fixes.patch
Merged
+unbreak-writeback-mode.patch
ext3 in data=writeback mode was oopsing on writeback of MAP_SHARED
data.
+read-latency.patch
Fix the writer-starves-reader elevator problem. This is basically
the read_latency2 patch from -ac kernels.
On IDE it provides a 100x improvement in read throughput when there
is heavy writeback happening. 40x on SCSI. You need to disable
tagged command queueing on scsi - it appears to be quite stupidly
implemented.
linus.patch
cset-1.580.1.4-to-1.597.txt.gz
ide-high-1.patch
ide-block-fix-1.patch
scsi_hack.patch
Fix block-highmem for scsi
ext3-htree.patch
Indexed directories for ext3
spin-lock-check.patch
spinlock/rwlock checking infrastructure
rd-cleanup.patch
Cleanup and fix the ramdisk driver (doesn't work right yet)
might_sleep.patch
debug code to detect might-sleep-inside-spinlock bugs
unbreak-writeback-mode.patch
Fix ext3's data=writeback mode
queue-congestion.patch
Infrastructure for communicating request queue congestion to the VM
nonblocking-ext2-preread.patch
avoid ext2 inode prereads if the queue is congested
nonblocking-pdflush.patch
non-blocking writeback infrastructure, use it for pdflush
nonblocking-vm.patch
Non-blocking page reclaim
set_page_dirty-locking-fix.patch
don't call __mark_inode_dirty under spinlock
prepare_to_wait.patch
prepare_to_wait/finish_wait: new sleep/wakeup API
vm-wakeups.patch
Use the faster wakeups in the VM and block layers
sync-helper.patch
Speed up sys_sync() against multiple spindles
slabasap.patch
Early and smarter shrinking of slabs
write-deadlock.patch
Fix the generic_file_write-from-same-mmapped-page deadlock
buddyinfo.patch
Add /proc/buddyinfo - stats on the free pages pool
free_area.patch
Remove struct free_area_struct and free_area_t, use `struct free_area'
per-node-kswapd.patch
Per-node kswapd instance
topology-api.patch
Simple topology API
radix_tree_gang_lookup.patch
radix tree gang lookup
truncate_inode_pages.patch
truncate/invalidate_inode_pages rewrite
proc_vmstat.patch
Move the vm accounting out of /proc/stat
kswapd-reclaim-stats.patch
Add kswapd_steal to /proc/vmstat
iowait.patch
I/O wait statistics
sard.patch
SARD disk accounting
remove-gfp_nfs.patch
remove GFP_NFS
tcp-wakeups.patch
Use fast wakeups in TCP/IPV4
swapoff-deadlock.patch
Fix a tmpfs swapoff deadlock
dirty-and-uptodate.patch
page state cleanup
shmem_rename.patch
shmem_rename() directory link count fix
dirent-size.patch
tmpfs: show a non-zero size for directories
tmpfs-trivia.patch
tmpfs: small fixlets
per-zone-vm.patch
separate the kswapd and direct reclaim code paths
swsusp-feature.patch
add shrink_all_memory() for swsusp
adaptec-fix.patch
partial fix for aic7xxx error recovery
remove-page-virtual.patch
remove page->virtual for !WANT_PAGE_VIRTUAL
dirty-memory-clamp.patch
sterner dirty-memory clamping
mempool-wakeup-fix.patch
Fix for stuck tasks in mempool_alloc()
remove-write_mapping_buffers.patch
Remove write_mapping_buffers
buffer_boundary-scheduling.patch
IO scheduling for indirect blocks
ll_rw_block-cleanup.patch
cleanup ll_rw_block()
lseek-ext2_readdir.patch
remove lock_kernel() from ext2_readdir()
discontig-no-contig_page_data.patch
undefine contig_page_data for discontigmem
per-node-zone_normal.patch
ia32 NUMA: per-node ZONE_NORMAL
alloc_pages_node-cleanup.patch
alloc_pages_node cleanup
read_barrier_depends.patch
extended barrier primitives
rcu_ltimer.patch
RCU core
dcache_rcu.patch
Use RCU for dcache
read-latency.patch
Elevator fix for writes-starving-reads
On Sun, Sep 22 2002, Andrew Morton wrote:
> +read-latency.patch
>
> Fix the writer-starves-reader elevator problem. This is basically
> the read_latency2 patch from -ac kernels.
>
> On IDE it provides a 100x improvement in read throughput when there
> is heavy writeback happening. 40x on SCSI. You need to disable
Ah, interesting. I do still think that it is worth investigating _why_
neither elevator_linus nor deadline prevents the read starvation.
The read-latency patch is a hack, not a solution, imo.
> tagged command queueing on scsi - it appears to be quite stupidly
> implemented.
Ahem, I think you are being excessively harsh, or maybe passing judgement
on something you haven't even looked at. Did you consider that your
_drive_ may be the broken component? Excessive turn-around times for
requests when using deep tcq are not unusual, by far.
--
Jens Axboe
Jens Axboe wrote:
>
> On Sun, Sep 22 2002, Andrew Morton wrote:
> > +read-latency.patch
> >
> > Fix the writer-starves-reader elevator problem. This is basically
> > the read_latency2 patch from -ac kernels.
> >
> > On IDE it provides a 100x improvement in read throughput when there
> > is heavy writeback happening. 40x on SCSI. You need to disable
>
> Ah, interesting. I do still think that it is worth investigating _why_
> neither elevator_linus nor deadline prevents the read starvation.
I did. See below.
> The read-latency patch is a hack, not a solution, imo.
Well it clearly _is_ a solution. To a grave problem. But hopefully not
the best solution. Really, this is just me saying "ouch". This is
your stuff ;)
> > tagged command queueing on scsi - it appears to be quite stupidly
> > implemented.
>
> Ahem, I think you are being excessively harsh, or maybe passing judgement
> on something you haven't even looked at. Did you consider that your
> _drive_ may be the broken component? Excessive turn-around times for
> requests when using deep tcq are not unusual, by far.
It's a Fujitsu SCA-2 thing. Could be that other drive manufacturers
have a slight clue, but I doubt it. I bet they just went and designed
the queueing for optimum throughput, with the assumption that reads
and writes are muchly the same thing.
But they're not. They are vastly different things. Your fancy 2GHz
processor twiddles thumbs waiting for reads. But not for writes.
The "hack" _recognises_ this fact - that reads are very different
things from writes.
Let's run the numbers: a 128-slot write request queue, 512k writes,
30 mbyte/sec bandwidth. That's two seconds' worth of writes in the
request queue.
The reads have basically no chance of getting inserted between those
writes, so the first read has a two second latency, and that's before
adding in any of the passovers which additional writes will enjoy.
It works out that the latency per read is about three seconds. I
have all the traces of this.
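The arithmetic above can be sketched as a back-of-envelope model in plain
user-space C (queue_drain_seconds() is a hypothetical helper for this
illustration, not a kernel function):

```c
/* Back-of-envelope model of the numbers above: how many seconds of
 * queued writes sit ahead of the first read when the request queue is
 * full.  Not kernel code - just the arithmetic from the mail. */
double queue_drain_seconds(int slots, int request_kib, int mb_per_sec)
{
    double queued_mb = (double)slots * request_kib / 1024.0; /* 128 * 512k = 64 MB */
    return queued_mb / mb_per_sec;                           /* 64 MB / 30 MB/s ~= 2.1 s */
}
```

queue_drain_seconds(128, 512, 30) comes out at roughly 2.1 seconds,
matching the "two seconds' worth of writes" above; the passovers which
additional writes enjoy push the observed per-read latency to about three
seconds.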
Now think about what userspace wants to do. It reads a block from
the directory. Three seconds. Parse the directory, go read an
inode block. Three seconds. Go read the file. Three seconds
if it's less than 56k. Six seconds otherwise.
That's nine seconds since we read the directory block. I'm running
with mem=192m. So by now, the directory block has been reclaimed.
Move onto the next file.
So there is no bug or coding error present in the elevator. Everything
is working as it is designed to. But a streaming write slows read
performance by a factor of 4000.
On Mon, Sep 23, 2002 at 04:22:28AM +0000, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/
> read_barrier_depends.patch
> extended barrier primitives
>
> rcu_ltimer.patch
> RCU core
>
> dcache_rcu.patch
> Use RCU for dcache
>
Hi Andrew,
The following patch fixes a typo for preemptive kernels.
Later I will submit a full rcu_ltimer patch that contains
the call_rcu_preempt() interface which can be useful for
module unloading and the like. This doesn't affect
the non-preemption path.
Thanks
--
Dipankar Sarma <[email protected]> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
--- include/linux/rcupdate.h Mon Sep 23 11:47:26 2002
+++ /tmp/rcupdate.h Mon Sep 23 12:45:21 2002
@@ -116,7 +116,7 @@
return 0;
}
-#ifdef CONFIG_PREEMPTION
+#ifdef CONFIG_PREEMPT
#define rcu_read_lock() preempt_disable()
#define rcu_read_unlock() preempt_enable()
#else
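With the typo fixed, the read-side primitives on preemptive kernels compile
down to preempt control. A minimal user-space model of that mapping (the
counter variable and the macro bodies below are stand-ins for illustration,
not the kernel's actual implementation):

```c
/* Stand-in for the kernel's per-task preempt counter. */
int model_preempt_count = 0;

#define preempt_disable() (model_preempt_count++)
#define preempt_enable()  (model_preempt_count--)

/* On CONFIG_PREEMPT kernels the patched header maps the RCU
 * read-side critical section onto preempt disabling: */
#define rcu_read_lock()   preempt_disable()
#define rcu_read_unlock() preempt_enable()

/* A reader runs with preemption off for the whole critical section. */
int reader(void)
{
    rcu_read_lock();
    int inside = model_preempt_count;   /* nonzero: cannot be preempted here */
    rcu_read_unlock();
    return inside;
}
```

The point of the mapping is that a reader can never be preempted inside the
critical section, so once every CPU has scheduled, no pre-existing readers
can remain.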
On Mon, Sep 23, 2002 at 04:22:28AM +0000, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.38/2.5.38-mm2/
>
> read_barrier_depends.patch
> extended barrier primitives
>
> rcu_ltimer.patch
> RCU core
>
> dcache_rcu.patch
> Use RCU for dcache
>
Hi Andrew,
dcache_rcu orders writes using wmb() (in list_del_rcu) when deleting from
the hash list, and the d_lookup() hash list traversal requires a matching
read-side barrier on alpha. So we need to use the read_barrier_depends()
interface there. This isn't a problem on any other arch AFAIK.
Thanks
--
Dipankar Sarma <[email protected]> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
--- fs/dcache.c Mon Sep 23 11:47:26 2002
+++ /tmp/dcache.c Mon Sep 23 12:54:33 2002
@@ -870,7 +870,9 @@
rcu_read_lock();
tmp = head->next;
for (;;) {
- struct dentry * dentry = list_entry(tmp, struct dentry, d_hash);
+ struct dentry * dentry;
+ read_barrier_depends();
+ dentry = list_entry(tmp, struct dentry, d_hash);
if (tmp == head)
break;
tmp = tmp->next;
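The wmb()/read_barrier_depends() pairing the patch relies on looks like
this in the abstract. This is a user-space sketch with stand-in barrier
macros (on alpha read_barrier_depends() is a real memory barrier; here it
is modelled as a compiler barrier) and hypothetical publisher/subscriber
names, not the actual dcache code:

```c
#include <stddef.h>

/* Stand-ins for the kernel barriers, adequate for this illustration. */
#define wmb()                  __sync_synchronize()
#define read_barrier_depends() __asm__ __volatile__("" ::: "memory")

struct item { int data; };
struct item slot;
struct item *published;

void publisher(void)
{
    slot.data = 42;
    wmb();               /* order the initialisation before the pointer
                          * store, as the list_*_rcu() primitives do */
    published = &slot;
}

int subscriber(void)
{
    struct item *p = published;
    read_barrier_depends();  /* on alpha, without this the dereference
                              * below may observe pre-init contents */
    return p ? p->data : -1;
}
```

The subscriber side is exactly what d_lookup()'s hash traversal does: load
a pointer published by another CPU, then dereference it, so the
data-dependency barrier must sit between the load and the dereference.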
On Mon, 23 Sep 2002 15:15:59 +0530
Dipankar Sarma <[email protected]> wrote:
> Later I will submit a full rcu_ltimer patch that contains
> the call_rcu_preempt() interface which can be useful for
> module unloading and the likes. This doesn't affect
> the non-preemption path.
You don't need this: I've dropped the requirement for module
unload.
Cheers!
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy
On Tue, Sep 24, 2002 at 02:41:09PM +1000, Rusty Russell wrote:
> On Mon, 23 Sep 2002 15:15:59 +0530
> Dipankar Sarma <[email protected]> wrote:
> > Later I will submit a full rcu_ltimer patch that contains
> > the call_rcu_preempt() interface which can be useful for
> > module unloading and the likes. This doesn't affect
> > the non-preemption path.
>
> You don't need this: I've dropped the requirement for module
> unload.
Isn't wait_for_later() similar to synchronize_kernel(), or has the
entire module unloading design been changed since?
Thanks
--
Dipankar Sarma <[email protected]> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
In message <[email protected]> you write:
> On Tue, Sep 24, 2002 at 02:41:09PM +1000, Rusty Russell wrote:
> > On Mon, 23 Sep 2002 15:15:59 +0530
> > Dipankar Sarma <[email protected]> wrote:
> > > Later I will submit a full rcu_ltimer patch that contains
> > > the call_rcu_preempt() interface which can be useful for
> > > module unloading and the likes. This doesn't affect
> > > the non-preemption path.
> >
> > You don't need this: I've dropped the requirement for module
> > unload.
>
> Isn't wait_for_later() similar to synchronize_kernel(), or has the
> entire module unloading design been changed since?
Yes, that was *days* ago 8)
I now just use a synchronize_kernel() which schedules on every CPU,
and disable preempt in magic places.
Ingo growled at me...
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
On Mon, 23 Sep 2002, Jens Axboe wrote:
> Ah, interesting. I do still think that it is worth investigating _why_
> neither elevator_linus nor deadline prevents the read starvation.
> The read-latency patch is a hack, not a solution, imo.
>
> > tagged command queueing on scsi - it appears to be quite stupidly
> > implemented.
>
> Ahem, I think you are being excessively harsh, or maybe passing judgement
> on something you haven't even looked at. Did you consider that your
> _drive_ may be the broken component? Excessive turn-around times for
> requests when using deep tcq are not unusual, by far.
I do think that's what he meant! I think most drives are optimized this
way, and performance would be better if the kernel used the queueing more
sparingly, so the drive couldn't just run with the writes and let the
reads take the leftovers.
I think that's a better long run solution, although the fix addresses the
immediate problem.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.