Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > I agree that the current i/o scheduler has really bad interactive
> > performance -- at first sight your changes look mostly like add-on
> > hacks though.
>
> Good hacks, or bad ones?
As far as I can "see", not so good.
I've tried "dbench 32" while playing an MP3 with Noatun (KDE-2.2.2) and "saw"
the hiccup I have been reporting since 2.4.7-ac4, as always.
Noatun stops 9-10 seconds into the "dbench 32" run and then every few
seconds, again and again. The hiccups take place more often but for shorter
times than without your patch.
System was:
2.4.16 +
preempt +
lock-break-rml-2.4.16-1.patch +
all ReiserFS patches for 2.4.16
1 GHz Athlon II
MSI MS-6167 Rev 1.0B (AMD Irongate C4, without bypass)
640 MB PC100-2-2-2 SDRAM
U160 IBM 18 GB disk
AHA-2940 UW
> It keeps things localised. It works. It's tunable. It's the best
> IO scheduler presently available.
Throughput was a little lower ;-)
Don't forget to tune max-readahead.
I've used 127 and that gave me a 4 MB (at the end) to 6 MB (at the beginning of
the disk) higher transfer rate.
Write caching is off by default on all of my disks, and it didn't offer much
gain with dbench and bonnie++.
> > Arjan's priority based scheme is more promising.
>
> If the IO priority becomes an attribute of the calling process
> then an approach like that has value. For writes, the priority
> should be driven by VM pressure and it's probably simpler just
> to stick the priority into struct buffer_head -> struct request.
> For reads, the priority could just be scooped out of *current.
Yes, please. I think, too, that we need IO priority even for (weak) RT tasks
that consume only a "little" IO (MP3, DVD, etc.).
> If we're not going to push the IO priority all the way down from
> userspace then you may as well keep the logic inside the elevator
> and just say reads-go-here and writes-go-there.
>
> But this has potential to turn into a great designfest. Are
> we going to leave 2.4 as-is? Please say no.
I'll second that.
Thank you for your work, Andrew!
-Dieter
--
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
Department of Computer Science
@home: [email protected]
Dieter Nützel wrote:
>
> Andrew Morton wrote:
> > Jens Axboe wrote:
> > >
> > > I agree that the current i/o scheduler has really bad interactive
> > > performance -- at first sight your changes look mostly like add-on
> > > hacks though.
> >
> > Good hacks, or bad ones?
>
> As far as I can "see", not so good.
> I've tried "dbench 32" and playing an MP3 with Noatun (KDE-2.2.2) and "saw"
> my reported hiccup since 2.4.7-ac4, as always.
Ah. dbench. The change to balance_dirty_state() absolutely
cripples dbench throughput. And that really doesn't matter,
unless you want to run dbench for a living.
You can get the dbench throughput back by increasing the
async and sync dirty buffer writeback thresholds:
echo 70 64 64 256 30000 3000 80 0 0 > /proc/sys/vm/bdflush
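For orientation, here is a minimal sketch that picks the two thresholds out of that tuple. It assumes the 2.4 bdflush field layout in which the first field (nfract) is the async threshold and the seventh (nfract_sync) is the sync threshold; that mapping is an assumption, not something stated in this thread, and the helper name bdflush_thresholds is made up for illustration.

```shell
# Hypothetical helper: print the 1st and 7th fields of a bdflush tuple.
# Assumption: field 1 = async threshold (nfract), field 7 = sync threshold
# (nfract_sync). Check your kernel's fs/buffer.c before relying on this.
bdflush_thresholds() {
    # $1 is the nine-field string; word-split it into positional parameters.
    set -- $1
    echo "async=$1 sync=$7"
}

# The tuple suggested above.
bdflush_thresholds "70 64 64 256 30000 3000 80 0 0"

# Show the currently active values, but only where the file exists.
if [ -r /proc/sys/vm/bdflush ]; then
    bdflush_thresholds "$(cat /proc/sys/vm/bdflush)"
fi
```

Noting the active values before overwriting them makes it easy to go back to the defaults after a dbench run.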
> Noatun stops after 9-10 seconds of the "dbench 32" run and then every few
> seconds, again and again. The hiccups take place more often but for shorter
> times than without your patch.
Probably Noatun needs larger buffers if it is to survive concurrent
dbench. You may see improvement with
elvtune -b N /dev/hdaX
where 8 >= N >= 1.
> System was:
>
> 2.4.16 +
> preempt +
> lock-break-rml-2.4.16-1.patch +
> all ReiserFS patches for 2.4.16
>
> 1 GHz Athlon II
> MSI MS-6167 Rev 1.0B (AMD Irongate C4, without bypass)
> 640 MB PC100-2-2-2 SDRAM
> U160 IBM 18 GB disk
> AHA-2940 UW
>
> > It keeps things localised. It works. It's tunable. It's the best
> > IO scheduler presently available.
>
> Throughput was a little lower ;-)
dbench? Throughput seems to scale with the fourth power of the
amount of RAM you chuck at it :)
> Don't forget to tune max-readahead.
Yes. Readahead is fairly critical and there may be additional fixes
needed in this area.
Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
Beware of this. It only works for device drivers which do not
populate their own readahead table. For IDE, it *looks* like
it works, but it doesn't. For IDE, the only way to alter VM
readahead is via
echo file_readahead:N > /proc/ide/ide0/hda/settings
where N is in kilobytes in 2.4.16 kernels. In earlier kernels
it's kilopages (!).
On Tue, Nov 27, 2001 at 06:13:41PM -0800, Andrew Morton wrote:
> Dieter Nützel wrote:
> > Don't forget to tune max-readahead.
>
> Yes. Readahead is fairly critical and there may be additional fixes
> needed in this area.
>
> Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
> Beware of this. It only works for device drivers which do not
> populate their own readahead table. For IDE, it *looks* like
> it works, but it doesn't. For IDE, the only way to alter VM
> readahead is via
>
> echo file_readahead:N > /proc/ide/ide0/hda/settings
>
> where N is in kilobytes in 2.4.16 kernels.
Any idea which drivers it will/won't work on? ie, "almost all ide" or
"almost none of the ide drivers"?
> In earlier kernels
> it's kilopages (!).
Isn't this part of the max-readahead patch?
Does /proc/sys/vm/max_readahead affect scsi in any way?
What layer does /proc/sys/vm/max_readahead affect? Block? FS?
MF
Mike Fedyk wrote:
>
> > echo file_readahead:N > /proc/ide/ide0/hda/settings
> >
> > where N is in kilobytes in 2.4.16 kernels.
>
> Any idea which drivers it will/won't work on? ie, "almost all ide" or
> "almost none of the ide drivers"?
It appears that all IDE is controlled with /proc/ide/ide0/hda/settings
> > In earlier kernels
> > it's kilopages (!).
>
> Isn't this part of the max-readahead patch?
No, that fix went in separately. Roger Larsson created it, then
I hit the same problem and forwarded Roger's patch to the relevant
parties.
> Does /proc/sys/vm/max_readahead affect scsi in any way?
Well, `grep -r max_readahead drivers/scsi' comes up blank,
so it looks like the scsi drivers don't implement the
driver-specific readahead tunable, and so they will fall back
to the /proc/sys/vm/max_readahead global. I guess.
> What layer does /proc/sys/vm/max_readahead affect? Block? FS?
The generic filesystem library code. The bit which sits
on top of the block layer and gets its block mappings from the
filesystem and does generic_file_readahead(). Variously
referred to as VFS or VM. It's neither, and both, really.
On Wednesday, 28 November 2001 03:34, Mike Fedyk wrote:
> On Tue, Nov 27, 2001 at 06:13:41PM -0800, Andrew Morton wrote:
> > Dieter Nützel wrote:
> > > Don't forget to tune max-readahead.
> >
> > Yes. Readahead is fairly critical and there may be additional fixes
> > needed in this area.
> >
> > Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
-mt (Marcelo Tosatti) our _new_ 2.4.x maintainer did it.
> Isn't this part of the max-readahead patch?
>
> Does /proc/sys/vm/max_readahead affect scsi in any way?
Hello people, can you read?
I've reported U160 (SCSI) IBM DDYS (Ultrastar 36LZX) 18 GB 10k results...;-)
Kernel default:
SunWave1 src/linux# cat /proc/sys/vm/min-readahead
3
SunWave1 src/linux# cat /proc/sys/vm/max-readahead
31
SunWave1 src/linux# hdparm -tT /dev/sda1
/dev/sda1:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 2.28 seconds = 28.07 MB/sec
SunWave1 src/linux# cat /proc/sys/vm/max-readahead
127
SunWave1 src/linux# hdparm -tT /dev/sda1
/dev/sda1:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 1.87 seconds = 34.22 MB/sec
So it improved hdparm by 0.5 MB at the inner and 6 MB at the outer cylinders.
max-readahead=31    max-readahead=127
26-28 MB/s          26.5-34 MB/s
max-readahead=63    is nearly the same
max-readahead=255   a little slower
max-readahead=511   even a little slower
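The hdparm figures can be re-derived from the sizes and times it printed. This is only a sketch of the arithmetic; the helper names mb_per_sec and gain_percent are made up here, and awk is used for the floating-point division.

```shell
# Hypothetical helpers for re-checking the hdparm numbers quoted above.

# Throughput in MB/s from megabytes read and elapsed seconds.
mb_per_sec() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", mb / s }'
}

# Relative throughput gain in percent when the same amount of data
# takes new_s seconds instead of old_s seconds.
gain_percent() {
    awk -v old_s="$1" -v new_s="$2" 'BEGIN { printf "%.1f\n", (old_s / new_s - 1) * 100 }'
}

mb_per_sec 64 2.28      # max-readahead=31 run
mb_per_sec 64 1.87      # max-readahead=127 run
gain_percent 2.28 1.87  # speedup of the second run over the first
```

The two throughput values match the "buffered disk reads" lines, and the ratio of the two run times works out to a gain of roughly 22% for this single measurement.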
Here is a snippet of the IBM specs:
Performance
  Data buffer               4 MB?
  Rotational speed          10,000 RPM
  Latency (average)         2.99 ms
  Media transfer rate       280-452 Mbits/sec
  Interface transfer rate   160 MB/sec
  Sustained data rate       21.7-36.1 MB/sec
  Seek time
    Average                 4.9 ms
    Track to track          0.5 ms
    Full track              10.5 ms
To Robert Love:
I get the following in dmesg:
lock-break-rml-2.4.16-1.patch
date: busy buffer
lock_break: buffer.c:681: count was 2 not 551
invalidate: busy buffer
lock_break: buffer.c:681: count was 2 not 551
invalidate: busy buffer
[-]
lock-break-rml-2.4.16-2.patch
validate: busy buffer
invalidate: busy buffer
invalidate: busy buffer
invalidate: busy buffer
[-]
Now my dbench numbers.
First without Noatun playing Ogg-Vorbis:
dbench/dbench> time ./dbench 32
32 clients started
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+................+..............................................................................+++.+.+.......+.....++++....++.....+++++.++++.++.+++++++********************************
Throughput 43.8254 MB/sec (NB=54.7818 MB/sec 438.254 MBit/sec)
14.490u 53.230s 1:37.40 69.5% 0+0k 0+0io 937pf+0w
system load: 23.52
Second Noatun playing Ogg-Vorbis (with hiccup):
dbench/dbench> time ./dbench 32
32 clients started
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+....+........................+.......++++...+.+...........+.+.....+++++.+.++..++..+++.++.++++.++********************************
Throughput 42.1212 MB/sec (NB=52.6515 MB/sec 421.212 MBit/sec)
14.710u 53.940s 1:41.29 67.7% 0+0k 0+0io 937pf+0w
system load: 26.30
Not bad, I think.
Andrew, your patch follows tomorrow.
Regards,
Dieter
On Wednesday 28 November 2001 03:48, Andrew Morton wrote:
> Mike Fedyk wrote:
> > > echo file_readahead:N > /proc/ide/ide0/hda/settings
> > >
> > > where N is in kilobytes in 2.4.16 kernels.
> >
> > Any idea which drivers it will/won't work on? ie, "almost all ide" or
> > "almost none of the ide drivers"?
>
> It appears that all IDE is controlled with /proc/ide/ide0/hda/settings
>
> > > In earlier kernels
> > > it's kilopages (!).
> >
> > Isn't this part of the max-readahead patch?
>
> No, that fix went in separately. Roger Larsson created it, then
> I hit the same problem and forwarded Roger's patch to the relevant
> parties.
>
The reason I did not send it directly, but sent it to Andre for a proper
fix, is that the error is all over the place. This is from ide-cd.c:
static void ide_cdrom_add_settings(ide_drive_t *drive)
{
	int major = HWIF(drive)->major;
	int minor = drive->select.b.unit << PARTN_BITS;

	ide_add_setting(drive, "breada_readahead", SETTING_RW, BLKRAGET, BLKRASET,
			TYPE_INT, 0, 255, 1, 2, &read_ahead[major], NULL);
	ide_add_setting(drive, "file_readahead", SETTING_RW, BLKFRAGET, BLKFRASET,
			TYPE_INTA, 0, INT_MAX, 1, 1024, &max_readahead[major][minor], NULL);
	ide_add_setting(drive, "max_kb_per_request", SETTING_RW, BLKSECTGET,
			BLKSECTSET, TYPE_INTA, 1, 255, 1, 2, &max_sectors[major][minor], NULL);
	ide_add_setting(drive, "dsc_overlap", SETTING_RW, -1, -1, TYPE_BYTE, 0, 1,
			1, 1, &drive->dsc_overlap, NULL);
}
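To make the unit problem concrete, here is a rough sketch of the scaling. It assumes that the two numbers before the data pointer are mul_factor and div_factor, that the settings file shows stored * mul_factor / div_factor with integer division, and that max_readahead[][] holds pages. None of that is spelled out in this mail, and the helper name shown_value is made up.

```shell
# Hypothetical helper: value a settings file would show for a stored value.
# Assumption: shown = stored * mul_factor / div_factor, integer division.
shown_value() {
    echo $(( $1 * $2 / $3 ))
}

# file_readahead as registered above (mul_factor=1, div_factor=1024),
# with a stored value of 31 taken to be pages.
shown_value 31 1 1024

# Illustrative alternative, assuming 4096-byte pages: scale pages to
# bytes, then divide by 1024 to report kilobytes.
shown_value 31 4096 1024
```

Under these assumptions a plain 1/1024 scaling on a page count reports in units of 1024 pages, which would be the "kilopages" effect mentioned earlier in the thread.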
/RogerL
--
Roger Larsson
Skellefteå
Sweden