2001-11-28 01:31:41

by Dieter Nützel

Subject: RE: Unresponsiveness of 2.4.16

Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > I agree that the current i/o scheduler has really bad interactive
> > performance -- at first sight your changes looks mostly like add-on
> > hacks though.
>
> Good hacks, or bad ones?

As far as I can "see", not so good.
I've tried "dbench 32" while playing an MP3 with Noatun (KDE 2.2.2) and "saw"
the hiccup I've reported since 2.4.7-ac4, as always.

Noatun stops after 9-10 seconds of the "dbench 32" run and then every few
seconds, again and again. The hiccups take place more often, but for shorter
times, than without your patch.

System was:

2.4.16 +
preempt +
lock-break-rml-2.4.16-1.patch +
all ReiserFS patches for 2.4.16

1 GHz Athlon II
MSI MS-6167 Rev 1.0B (AMD Irongate C4, without bypass)
640 MB PC100-2-2-2 SDRAM
U160 IBM 18 GB disk
AHA-2940 UW

> It keeps things localised. It works. It's tunable. It's the best
> IO scheduler presently available.

Throughput was a little lower ;-)

Don't forget to tune max-readahead.
I've used 127, and that gave me 4 MB/s (at the end of the disk) to 6 MB/s (at
the beginning) more transfer rate.
Write caching is off by default on all of my disks, and it didn't offer much
gain with dbench or bonnie++.

> > Arjan's priority based scheme is more promising.
>
> If the IO priority becomes an attribute of the calling process
> then an approach like that has value. For writes, the priority
> should be driven by VM pressure and it's probably simpler just
> to stick the priority into struct buffer_head -> struct request.
> For reads, the priority could just be scooped out of *current.

Yes, please. I think, too, that we need IO priority, even for (soft) RT tasks
with "little" IO consumption (MP3, DVD, etc.).

> If we're not going to push the IO priority all the way down from
> userspace then you may as well keep the logic inside the elevator
> and just say reads-go-here and writes-go-there.
>
> But this has potential to turn into a great designfest. Are
> we going to leave 2.4 as-is? Please say no.

I'll second that.

Thank you for your work, Andrew!

-Dieter
--
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: [email protected]


2001-11-28 02:15:04

by Andrew Morton

Subject: Re: Unresponsiveness of 2.4.16

Dieter Nützel wrote:
>
> Andrew Morton wrote:
> > Jens Axboe wrote:
> > >
> > > I agree that the current i/o scheduler has really bad interactive
> > > performance -- at first sight your changes looks mostly like add-on
> > > hacks though.
> >
> > Good hacks, or bad ones?
>
> As far as I can "see", not so good.
> I've tried "dbench 32" while playing an MP3 with Noatun (KDE 2.2.2) and "saw"
> the hiccup I've reported since 2.4.7-ac4, as always.

Ah. dbench. The change to balance_dirty_state() absolutely
cripples dbench throughput. And that really doesn't matter,
unless you want to run dbench for a living.

You can get the dbench throughput back by increasing the
async and sync dirty buffer writeback thresholds:

echo 70 64 64 256 30000 3000 80 0 0 > /proc/sys/vm/bdflush
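For anyone wondering what those nine numbers mean: they land in the bdflush
parameter block in fs/buffer.c. A small sketch that just labels them (the
field names here are from memory for 2.4.x, so treat them as an assumption
and check your own tree):

```shell
# Label the nine /proc/sys/vm/bdflush fields with their (assumed)
# 2.4.x names from fs/buffer.c -- verify against your kernel source.
print_bdflush_fields() {
    set -- 70 64 64 256 30000 3000 80 0 0
    for name in nfract ndirty nrefill nref_dirt interval \
                age_buffer nfract_sync dummy2 dummy3; do
        echo "$name=$1"
        shift
    done
}
print_bdflush_fields
```

The first field (nfract) is the dirty-buffer percentage at which bdflush
starts asynchronous writeback; nfract_sync is the harder limit at which
writers are forced to flush synchronously. Raising both is what buys the
dbench throughput back.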

> Noatun stops after 9-10 seconds of the "dbench 32" run and then every few
> seconds, again and again. The hiccups take place more often, but for shorter
> times, than without your patch.

Probably Noatun needs larger buffers if it is to survive concurrent
dbench. You may see improvement with

elvtune -b N /dev/hdaX

where 8 >= N >= 1.
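To try the whole range in one go, here's a throwaway sweep. DRY_RUN=echo makes
it print the commands instead of running them; clear it and run as root against
your real device to apply (the device name below is only a placeholder):

```shell
# Sweep elvtune's -b batch value over the suggested 1..8 range.
# With DRY_RUN=echo the commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-echo}
elvtune_sweep() {
    for n in 1 2 4 8; do
        $DRY_RUN elvtune -b "$n" /dev/hda1
    done
}
elvtune_sweep
```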

> System was:
>
> 2.4.16 +
> preempt +
> lock-break-rml-2.4.16-1.patch +
> all ReiserFS patches for 2.4.16
>
> 1 GHz Athlon II
> MSI MS-6167 Rev 1.0B (AMD Irongate C4, without bypass)
> 640 MB PC100-2-2-2 SDRAM
> U160 IBM 18 GB disk
> AHA-2940 UW
>
> > It keeps things localised. It works. It's tunable. It's the best
> > IO scheduler presently available.
>
> Throughput was a little lower ;-)

dbench? Throughput seems to scale with the fourth power of the
amount of RAM you chuck at it :)

> Don't forget to tune max-readahead.

Yes. Readahead is fairly critical and there may be additional fixes
needed in this area.

Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
Beware of this. It only works for device drivers which do not
populate their own readahead table. For IDE, it *looks* like
it works, but it doesn't. For IDE, the only way to alter VM
readahead is via

echo file_readahead:N > /proc/ide/ide0/hda/settings

where N is in kilobytes in 2.4.16 kernels. In earlier kernels
it's kilopages (!).

-

2001-11-28 02:34:57

by Mike Fedyk

Subject: Re: Unresponsiveness of 2.4.16

On Tue, Nov 27, 2001 at 06:13:41PM -0800, Andrew Morton wrote:
> Dieter Nützel wrote:
> > Don't forget to tune max-readahead.
>
> Yes. Readahead is fairly critical and there may be additional fixes
> needed in this area.
>
> Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
> Beware of this. It only works for device drivers which do not
> populate their own readahead table. For IDE, it *looks* like
> it works, but it doesn't. For IDE, the only way to alter VM
> readahead is via
>
> echo file_readahead:N > /proc/ide/ide0/hda/settings
>
> where N is in kilobytes in 2.4.16 kernels.

Any idea which drivers it will/won't work on? I.e., "almost all IDE" or
"almost none of the IDE drivers"?

>In earlier kernels
> it's kilopages (!).

Isn't this part of the max-readahead patch?

Does /proc/sys/vm/max_readahead affect scsi in any way?

What layer does /proc/sys/vm/max_readahead affect? Block? FS?

MF

2001-11-28 02:49:25

by Andrew Morton

Subject: Re: Unresponsiveness of 2.4.16

Mike Fedyk wrote:
>
> > echo file_readahead:N > /proc/ide/ide0/hda/settings
> >
> > where N is in kilobytes in 2.4.16 kernels.
>
> Any idea which drivers it will/won't work on? I.e., "almost all IDE" or
> "almost none of the IDE drivers"?

It appears that all IDE is controlled with /proc/ide/ide0/hda/settings

> >In earlier kernels
> > it's kilopages (!).
>
> Isn't this part of the max-readahead patch?

No, that fix went in separately. Roger Larsson created it, then
I hit the same problem and forwarded Roger's patch to the relevant
parties.

> Does /proc/sys/vm/max_readahead affect scsi in any way?

Well, `grep -r max_readahead drivers/scsi' comes up blank,
so it looks like the scsi drivers don't implement the
driver-specific readahead tunable, and so they will fall back
to the /proc/sys/vm/max_readahead global. I guess.

> What layer does /proc/sys/vm/max_readahead affect? Block? FS?

The generic filesystem library code. The bit which sits
on top of the block layer and gets its block mappings from the
filesystem and does generic_file_readahead(). Variously
referred to as VFS or VM. It's neither, and both, really.

2001-11-28 03:53:30

by Dieter Nützel

Subject: Re: Unresponsiveness of 2.4.16

Am Mittwoch, 28. November 2001 03:34 schrieb Mike Fedyk:
> On Tue, Nov 27, 2001 at 06:13:41PM -0800, Andrew Morton wrote:
> > Dieter Nützel wrote:
> > > Don't forget to tune max-readahead.
> >
> > Yes. Readahead is fairly critical and there may be additional fixes
> > needed in this area.
> >
> > Someone recently added the /proc/sys/vm/max_readahead (?) tunable.

-mt (Marcelo Tosatti) our _new_ 2.4.x maintainer did it.

> Isn't this part of the max-readahead patch?
>
> Does /proc/sys/vm/max_readahead affect scsi in any way?

Hello people, can you read?
I've reported U160 (SCSI) IBM DDYS (Ultrastar 36LZX) 18 GB 10k results...;-)

Kernel default:
SunWave1 src/linux# cat /proc/sys/vm/min-readahead
3
SunWave1 src/linux# cat /proc/sys/vm/max-readahead
31
SunWave1 src/linux# hdparm -tT /dev/sda1
/dev/sda1:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 2.28 seconds = 28.07 MB/sec


SunWave1 src/linux# cat /proc/sys/vm/max-readahead
127
SunWave1 src/linux# hdparm -tT /dev/sda1
/dev/sda1:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 1.87 seconds = 34.22 MB/sec

So it improved hdparm throughput by 0.5 MB/s at the inner and 6 MB/s at the
outer cylinders.

max-readahead=31:  26-28 MB/s
max-readahead=127: 26.5-34 MB/s

max-readahead=63 is nearly the same.
max-readahead=255 is a little slower.
max-readahead=511 is slower still.
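For repeatability, a sweep like this can be scripted. A sketch (DRY_RUN=echo
only prints the commands; run as root with DRY_RUN set empty on a real disk --
/dev/sda1 is my device, substitute yours):

```shell
# Sweep /proc/sys/vm/max-readahead and measure with hdparm -t.
# DRY_RUN=echo prints what would run instead of executing it.
DRY_RUN=${DRY_RUN:-echo}
readahead_sweep() {
    for ra in 31 63 127 255 511; do
        $DRY_RUN sh -c "echo $ra > /proc/sys/vm/max-readahead"
        $DRY_RUN hdparm -t /dev/sda1
    done
}
readahead_sweep
```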

Here is a snippet of the IBM specs:

Performance
Data buffer 4 MB?
Rotational speed 10,000 RPM
Latency (average) 2.99 ms
Media transfer rate 280-452 Mbits/sec
Interface transfer rate 160 MB/sec
Sustained data rate 21.7-36.1 MB/sec

Seek time
Average 4.9 ms
Track to track 0.5 ms
Full track 10.5 ms

To Robert Love:
I get the following in dmesg:
lock-break-rml-2.4.16-1.patch

date: busy buffer
lock_break: buffer.c:681: count was 2 not 551
invalidate: busy buffer
lock_break: buffer.c:681: count was 2 not 551
invalidate: busy buffer
[-]

lock-break-rml-2.4.16-2.patch

validate: busy buffer
invalidate: busy buffer
invalidate: busy buffer
invalidate: busy buffer
[-]

Now my dbench numbers.
First without Noatun playing Ogg-Vorbis:
dbench/dbench> time ./dbench 32
32 clients started
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+................+..............................................................................+++.+.+.......+.....++++....++.....+++++.++++.++.+++++++********************************
Throughput 43.8254 MB/sec (NB=54.7818 MB/sec 438.254 MBit/sec)
14.490u 53.230s 1:37.40 69.5% 0+0k 0+0io 937pf+0w
system load: 23.52

Second Noatun playing Ogg-Vorbis (with hiccup):
dbench/dbench> time ./dbench 32
32 clients started
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+....+........................+.......++++...+.+...........+.+.....+++++.+.++..++..+++.++.++++.++********************************
Throughput 42.1212 MB/sec (NB=52.6515 MB/sec 421.212 MBit/sec)
14.710u 53.940s 1:41.29 67.7% 0+0k 0+0io 937pf+0w
system load: 26.30

Not bad, I think.
Andrew, your patch follows tomorrow.

Regards,
Dieter

2001-11-28 20:24:10

by Roger Larsson

Subject: Re: Unresponsiveness of 2.4.16

On Wednesday 28 November 2001 03:48, Andrew Morton wrote:
> Mike Fedyk wrote:
> > > echo file_readahead:N > /proc/ide/ide0/hda/settings
> > >
> > > where N is in kilobytes in 2.4.16 kernels.
> >
> > Any idea which drivers it will/won't work on? ie, "almost all ide" or
> > "almost none of the ide driers"?
>
> It appears that all IDE is controlled with /proc/ide/ide0/hda/settings
>
> > >In earlier kernels
> > > it's kilopages (!).
> >
> > Isn't this part of the max-readahead patch?
>
> No, that fix went in separately. Roger Larsson created it, then
> I hit the same problem and forwarded Roger's patch to the relevant
> parties.
>

The reason I did not send it directly, but sent it to Andre for a proper
fix, is that the error is all over the place. This is from ide-cd.c:

static void ide_cdrom_add_settings(ide_drive_t *drive)
{
	int major = HWIF(drive)->major;
	int minor = drive->select.b.unit << PARTN_BITS;

	ide_add_setting(drive, "breada_readahead", SETTING_RW, BLKRAGET, BLKRASET,
			TYPE_INT, 0, 255, 1, 2, &read_ahead[major], NULL);
	ide_add_setting(drive, "file_readahead", SETTING_RW, BLKFRAGET, BLKFRASET,
			TYPE_INTA, 0, INT_MAX, 1, 1024, &max_readahead[major][minor], NULL);
	ide_add_setting(drive, "max_kb_per_request", SETTING_RW, BLKSECTGET,
			BLKSECTSET, TYPE_INTA, 1, 255, 1, 2, &max_sectors[major][minor], NULL);
	ide_add_setting(drive, "dsc_overlap", SETTING_RW, -1, -1, TYPE_BYTE, 0, 1,
			1, 1, &drive->dsc_overlap, NULL);
}

/RogerL

--
Roger Larsson
Skellefteå
Sweden