2003-11-11 03:04:20

by Paul Venezia

Subject: I/O issues, iowait problems, 2.4 v 2.6

I've been running some benchmarks on 2.4 v 2.6 recently, and came across
another oddity.

Running smbtorture's NBENCH test against the 2.6 kernel shows a
significant performance disparity vs Redhat 2.4.20 or 2.4.22. The target
system is running RH AS 3.0, and is an IBM x335 dual P4 XEON with 1.5GB
RAM, Broadcom gigabit NIC linked at 1000/full and an MPT RAID
controller.

Running a 12-client NBENCH test against this server running 2.4.22
consistently produces a result of ~33MB/s. Running 2.6.0-test9 through
bk-11 however, produces a much lower result, usually ~14MB/s. The test
will start at ~80MB/s, sustained for 10-15 seconds, then throughput
drops precipitously, and the file transfers slow to a crawl. The target
system shows that it's 100% I/O bound, but I can't seem to locate the
constraint. iostat and sar don't show anything out of the ordinary, but
top shows the CPUs at 99% iowait. Eventually, the bottleneck disappears,
and the performance increases, but never substantially.

I can reproduce this at will on this system, and a dual Itanium2 system
with Samba 2.2.8a and 3.0. Removing Samba from the equation, copying a
550MB file via NFS takes 240s under 2.6.0-test9-bk11 (~2.5MB/s average),
exhibiting the same iowait problem, while under 2.4.22 the same transfer
takes ~13s (~44MB/s average) without any iowait issues. A raw IP
throughput test shows ~900Mb/s between the two boxes.
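
A raw TCP stream test along these lines would reproduce that number (a
sketch only; it assumes netperf and its netserver daemon are installed,
and <server-ip> is a placeholder):

# on the target box
netserver
# on the client: a 30-second TCP stream test
netperf -H <server-ip> -t TCP_STREAM -l 30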

This could be a driver issue, but I don't have any other test boxes at
the moment. I can provide any debug info requested.

-Paul


2003-11-11 03:23:07

by Linus Torvalds

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6


On 10 Nov 2003, Paul Venezia wrote:
>
> Running smbtorture's NBENCH test against the 2.6 kernel shows a
> significant performance disparity vs Redhat 2.4.20 or 2.4.22. The target
> system is running RH AS 3.0, and is an IBM x335 dual P4 XEON with 1.5GB
> RAM, Broadcom gigabit NIC linked at 1000/full and an MPT RAID
> controller.
>
> Running a 12-client NBENCH test against this server running 2.4.22
> consistently produces a result of ~33MB/s. Running 2.6.0-test9 through
> bk-11 however, produces a much lower result, usually ~14MB/s. The test
> will start at ~80MB/s, sustained for 10-15 seconds, then throughput
> drops precipitously, and the file transfers slow to a crawl.

Can you try to see where it is waiting? "Ctrl + ScrollLock" while outside X
should get you a call trace of all the processes in the system, and it
would be interesting to see what seems to trigger the iowait. So if you do
the ScrollLock thing a few times while the system is spending 99% in
iowait, the results should show where the processes ended up actually
waiting for IO.
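
The same task dump can also be triggered without the console keyboard (a
sketch, assuming magic SysRq support is compiled into the kernel):

echo 1 > /proc/sys/kernel/sysrq    # allow sysrq via /proc
echo t > /proc/sysrq-trigger       # dump a call trace of every task
dmesg | less                       # the traces land in the kernel log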

Now, it's entirely possible that the IO waits are there in 2.4.x too, but
that driver breakage or just IO scheduler breakage makes them much
_bigger_ in 2.6.x. In which case you won't see anything very interesting.

But we've also had a few cases where the IO gets throttled for totally
different reasons - implementing FDATASYNC incorrectly, for example, or
just having the memory allocator throttle on IO too aggressively (the
latter usually leads to much nicer interactive usage, but can hurt
throughput a lot).

Linus

2003-11-11 03:50:45

by Andrew Morton

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

Paul Venezia <[email protected]> wrote:
>
> I've been running some benchmarks on 2.4 v 2.6 recently, and came across
> another oddity.
>
> Running smbtorture's NBENCH test against the 2.6 kernel shows a
> significant performance disparity vs Redhat 2.4.20 or 2.4.22. The target
> system is running RH AS 3.0, and is an IBM x335 dual P4 XEON with 1.5GB
> RAM, Broadcom gigabit NIC linked at 1000/full and an MPT RAID
> controller.
>
> Running a 12-client NBENCH test against this server running 2.4.22
> consistently produces a result of ~33MB/s. Running 2.6.0-test9 through
> bk-11 however, produces a much lower result, usually ~14MB/s. The test
> will start at ~80MB/s, sustained for 10-15 seconds, then throughput
> drops precipitously, and the file transfers slow to a crawl. The target
> system shows that it's 100% I/O bound, but I can't seem to locate the
> constraint. iostat and sar don't show anything out of the ordinary, but
> top shows the CPUs at 99% iowait. Eventually, the bottleneck disappears,
> and the performance increases, but never substantially.

It's not clear here what direction the data is being transferred in. Is it
mostly client->server, or mostly server->client?

What filesystem is the server using?

> I can reproduce this at will on this system, and a dual Itanium2 system
> with Samba 2.2.8a and 3.0. Removing Samba from the equation, copying a
> 550MB file via NFS takes 240s under 2.6.0-test9-bk11 (~2.5MB/s average),
> exhibiting the same iowait problem, while under 2.4.22 the same transfer
> takes ~13s (~44MB/s average) without any iowait issues.

OK well that's a good step: break the problem down to the simplest possible
operation.

In which direction was the file transferred? client to server or server to
client? What kernel was running on each?


As next steps I'd suggest that you log into the server and do

time (dd if=/dev/zero of=x bs=1M count=2048 ; sync)

and

time (dd if=x of=/dev/null bs=1M count=2048 ; sync)

(this assumes that the machine has less than 2G of memory, to avoid caching
effects).

And then run the same two commands across NFS.
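
For the NFS runs, the same commands against a mount of the server's
export would look something like this (a sketch; server:/export and
/mnt/server are placeholders):

mount server:/export /mnt/server
time (dd if=/dev/zero of=/mnt/server/x bs=1M count=2048 ; sync)
time (dd if=/mnt/server/x of=/dev/null bs=1M count=2048 ; sync)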

So break it down to the simplest possible step which exhibits the problem.

2003-11-11 04:12:58

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

On Mon, 2003-11-10 at 22:54, Andrew Morton wrote:
> > Eventually, the bottleneck disappears,
> > and the performance increases, but never substantially.
>
> It's not clear here what direction the data is being transferred in. Is it
> mostly client->server, or mostly server->client?

Seems to be bidirectional.

> What filesystem is the server using?

ext3

> In which direction was the file transferred? client to server or server to
> client? What kernel was running on each?

The client is running AS2.1 with RH's 2.4.9-e12. The server is RH AS 3.0
with stock 2.4.22, 2.6.0-test9, and 2.6.0-test9-bk11. Transfers go in
both directions.
>
> As next steps I'd suggest that you log into the server and do
>
> time (dd if=/dev/zero of=x bs=1M count=2048 ; sync)
>
> and
>
> time (dd if=x of=/dev/null bs=1M count=2048 ; sync)
>
> (this assumes that the machine has less than 2G of memory, to avoid caching
> effects).

The raw file read/write is the ticket. The box tightens right up at 100% iowait.

I'd done bonnie++ i/o tests already, and except for an apparent NPTL issue on the per-char tests,
the block i/o numbers were fine; no abnormal results whatsoever. In fact, block r/w
numbers were improved compared to 2.4.22. Now that I'm looking for it, however, I
do note extremely elevated iowait numbers during a bonnie++ run. Something in the MPT
modules?

-Paul

2003-11-11 04:33:24

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6


>
> Well that's nice and simple. Could you please run `vmstat 1' during that
> big `dd'? Wait for everything to achieve steady state, then send us twenty
> lines of the vmstat trace?

I'd pulled this before; here's the output.

> >
> > I'd done bonnie++ i/o tests already, and except for an apparent NPTL issue on the per-char tests,
> > the block i/o numbers were fine; no abnormal results whatsoever. In fact, block r/w
> > numbers were improved compared to 2.4.22. Now that I'm looking for it, however, I
> > do note extremely elevated iowait numbers during a bonnie++ run. Something in the MPT
> > modules?
>
> Greater than 90% I/O wait is to be expected in these tests. What is of
> interest is the overall bandwidth.

Definitely. I should have noted that the 2.4.22 tests on that dd came
back in 23.607s, while the 2.6.0 tests still hadn't returned after more
than 4 minutes.

> 2.5 megabytes per second is very
> broken. I have a 53c1030 box here which uses the MPT fusion driver and it
> happily does 50MB/sec to a single disk, but I guess that's a different
> setup.

This is a 53C1030 with a RAID1 mirror. Now that I think about it, I did
the >2GB bonnie tests on a single disk, no mirror. I'll rerun the i/o
tests with this setup and then remove the mirror and see what happens.

-Paul

2003-11-11 04:23:06

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

>> Now that I'm looking for it, however, I
>> do note extremely elevated iowait numbers during a bonnie++ run.

Note, however, that I do not see them during the NFS and Samba tests
under 2.4, only under 2.6.0-testx.

-Paul

2003-11-11 04:24:25

by Andrew Morton

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

Paul Venezia <[email protected]> wrote:
>
> > As next steps I'd suggest that you log into the server and do
> >
> > time (dd if=/dev/zero of=x bs=1M count=2048 ; sync)
> >
> > and
> >
> > time (dd if=x of=/dev/null bs=1M count=2048 ; sync)
> >
> > (this assumes that the machine has less than 2G of memory, to avoid caching
> > effects).
>
> The raw file read/write is the ticket. The box tightens right up at 100% iowait.

Well that's nice and simple. Could you please run `vmstat 1' during that
big `dd'? Wait for everything to achieve steady state, then send us twenty
lines of the vmstat trace?
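
One convenient way to capture that from an interactive shell (a sketch):

vmstat 1 > /tmp/vmstat.out &
time (dd if=/dev/zero of=x bs=1M count=2048 ; sync)
kill %1    # stop the background vmstat once the dd returns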

>
> I'd done bonnie++ i/o tests already, and except for an apparent NPTL issue on the per-char tests,
> the block i/o numbers were fine; no abnormal results whatsoever. In fact, block r/w
> numbers were improved compared to 2.4.22. Now that I'm looking for it, however, I
> do note extremely elevated iowait numbers during a bonnie++ run. Something in the MPT
> modules?

Greater than 90% I/O wait is to be expected in these tests. What is of
interest is the overall bandwidth. 2.5 megabytes per second is very
broken. I have a 53c1030 box here which uses the MPT fusion driver and it
happily does 50MB/sec to a single disk, but I guess that's a different
setup.

2003-11-11 04:35:18

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

vmstat output


procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 1427316 8572 68472 0 0 0 0 1012 17 0 0 100 0
0 0 0 1427348 8580 68464 0 0 0 50 1014 20 0 0 99 1
0 0 0 1427412 8580 68464 0 0 0 0 1012 23 0 0 100 0
0 0 0 1427420 8580 68464 0 0 0 0 1012 15 0 0 100 0
0 0 0 1427484 8588 68456 0 0 0 8 1013 20 0 0 100 0
0 0 0 1427484 8588 68456 0 0 0 0 1012 22 0 0 100 0
0 0 0 1427484 8588 68456 0 0 0 0 1012 15 0 0 100 0
0 10 0 1318012 9776 169132 0 0 392 11202 34735 40869 4 33 32 32
2 6 0 1249532 10820 233980 0 0 0 1422 34765 42133 8 34 8 49
2 4 0 1192188 12048 288104 0 0 10 5276 33213 38553 11 39 15 36
4 4 0 1166588 12584 311844 0 0 4 0 31907 37992 14 41 18 26
0 4 0 1157372 13088 319976 0 0 4 1978 30261 35581 15 41 17 27
2 4 0 1163068 13460 313824 0 0 0 1886 29806 35017 16 41 15 28
4 4 0 1155700 13948 320544 0 0 0 202 30341 35867 15 42 16 27
3 4 0 1157044 14372 318692 0 0 0 696 30072 35217 16 43 17 24
4 4 0 1136820 15028 337688 0 0 0 1574 31872 36292 17 47 13 23
4 2 0 1141428 15404 332552 0 0 0 1980 30494 34756 17 46 17 20
4 2 0 1136052 15936 337256 0 0 2 2166 30653 35264 18 47 13 23
0 10 0 1136948 16284 336092 0 0 0 3456 26323 31729 10 30 25 35
0 14 0 1135668 16336 337264 0 0 0 2446 6548 7126 1 5 45 48
0 13 0 1135668 16336 337264 0 0 0 110 1038 59 0 0 54 46
0 13 0 1135676 16336 337264 0 0 0 0 1030 22 0 0 75 25
0 15 0 1133500 16404 336788 0 0 0 9304 6265 5409 2 6 37 55
0 13 0 1133308 16912 332132 0 0 0 11572 56145 61147 1 6 2 90
2 13 0 1131452 17088 333860 0 0 0 1316 18934 21857 2 11 18 69
0 13 0 1128444 17304 336772 0 0 0 1128 20925 23989 3 12 19 66
0 13 0 1129340 17548 335440 0 0 0 964 21529 25782 3 14 10 72
0 13 0 1128636 17776 336096 0 0 0 338 20924 23571 4 15 1 81
0 12 0 1135548 17980 313112 0 0 0 4056 21764 27560 3 15 15 66
0 12 0 1135868 18204 312616 0 0 0 4434 21152 25559 3 13 24 60
0 11 0 1146316 18352 287172 0 0 0 3442 18240 19442 2 10 40 47
2 11 0 1143052 18564 290428 0 0 0 1812 20483 22976 3 14 12 71
0 11 0 1142668 18720 290680 0 0 32 3298 15806 17710 3 10 17 70
0 10 0 1146444 18940 286856 0 0 0 2106 21450 25860 4 14 37 45
0 9 0 1140684 19260 290072 0 0 0 854 27847 35023 7 25 15 53
2 9 0 1145804 19580 284720 0 0 0 1284 28355 36997 7 24 8 61
0 9 0 1143884 19920 286420 0 0 0 1202 28032 36648 6 25 25 44
0 9 0 1149388 20248 267188 0 0 0 2484 26252 31802 6 21 22 51
0 9 0 1146068 20512 270256 0 0 0 2486 25377 30292 5 18 52 25
0 8 0 1149004 20764 248244 0 0 0 2634 23041 27485 4 15 44 37
0 8 0 1153868 20972 243140 0 0 0 4748 21615 23348 3 13 42 42
0 8 0 1149644 21200 247196 0 0 0 1668 21807 23328 3 13 24 61
0 9 0 1152468 21336 244204 0 0 0 3318 18631 19351 2 10 36 52
0 9 0 1154132 21376 223424 0 0 0 1130 3800 3082 0 1 49 49
0 9 0 1154012 21376 220636 0 0 0 2292 2360 1460 0 1 41 57
0 9 0 1154012 21376 220636 0 0 0 946 1036 70 0 0 25 75
0 9 0 1154012 21376 220636 0 0 0 948 1032 59 0 0 25 75
0 9 0 1154012 21376 220636 0 0 0 3208 1044 77 0 0 25 75
0 9 0 1154076 21376 220636 0 0 0 3714 1047 74 0 0 25 75
1 8 0 1150812 21456 223888 0 0 2 2424 7540 8083 1 4 25 70
1 8 0 1152028 21576 222612 0 0 0 1778 16131 16243 2 7 20 71
1 8 0 1149276 21724 225320 0 0 0 1092 15927 17131 1 7 19 72
0 7 0 1151388 21876 221360 0 0 0 746 17818 19023 2 9 22 66
0 9 0 1149980 21900 222900 0 0 2 706 3991 3179 0 1 44 54
0 9 0 1149980 21900 222900 0 0 0 5456 1069 53 0 0 50 50
0 9 0 1148380 21948 222240 0 0 0 1558 7654 7723 1 4 58 37
0 9 0 1148380 21948 222240 0 0 0 1834 1035 45 0 0 75 25
0 9 0 1148380 21948 222240 0 0 0 1704 1044 46 0 0 75 25
0 9 0 1148380 21948 222240 0 0 0 2378 1052 50 0 0 75 25
0 9 0 1148380 21948 222240 0 0 0 2134 1052 43 0 0 75 25
0 9 0 1148764 21948 222240 0 0 0 1428 1038 47 0 0 75 25
0 9 0 1148892 21948 222240 0 0 0 964 1072 58 0 0 75 25
0 9 0 1148836 22012 222244 0 0 0 2298 1060 68 0 0 51 49
0 9 0 1148836 22012 222244 0 0 0 2204 1070 81 0 0 50 50
0 8 0 1148836 22016 222240 0 0 0 804 1059 75 0 0 50 50
2 5 0 1169380 22188 202008 0 0 0 3322 15778 18802 2 11 43 44
0 2 0 1169964 22476 199068 0 0 0 0 27115 32740 7 22 29 42
1 6 0 1159468 22848 203660 0 0 2 2218 26945 32362 7 25 44 24
0 5 0 1159980 23072 200716 0 0 0 3180 23624 28980 5 17 10 67
2 4 0 1156652 23368 202120 0 0 0 3192 27497 35656 7 24 19 50
1 4 0 1157164 23664 201144 0 0 0 1500 24638 33430 6 20 30 44
1 2 0 1155500 23992 202244 0 0 0 0 28207 36579 7 23 26 44
0 5 0 1152236 24320 198924 0 0 0 2686 25886 31366 9 25 34 32
0 5 0 1148908 24516 202264 0 0 0 2702 21070 28282 4 15 23 59
2 6 0 1144620 24816 206248 0 0 2 2650 24046 29828 5 18 17 59
0 4 0 1171628 24968 178488 0 0 0 0 19351 21122 3 12 45 40
0 4 0 1185716 25220 162800 0 0 0 3788 24860 31120 5 19 42 34
1 3 0 1189940 25328 158272 0 0 0 142 16789 16411 2 9 22 67
2 1 0 1186428 25688 161380 0 0 0 2440 27808 35784 6 24 56 15
0 6 0 1187004 25752 159888 0 0 0 2646 8459 9713 2 6 13 79
0 4 0 1185404 25872 159972 0 0 0 930 15788 15807 2 7 18 73
0 5 0 1209148 26016 133512 0 0 0 1342 15087 16572 2 8 28 61
0 2 0 1205956 26228 136768 0 0 0 632 20933 24795 4 15 25 56
0 2 0 1203268 26336 134756 0 0 0 3916 11832 12939 2 7 7 84
0 3 0 1221572 26552 116656 0 0 0 234 22075 26224 3 15 31 51
0 2 0 1225860 26660 112944 0 0 0 1632 16227 14855 1 9 51 38
0 0 0 1225036 26812 113744 0 0 0 118 17556 19174 2 10 61 27
0 0 0 1269900 26948 69680 0 0 0 0 18351 17765 2 9 89 0
0 0 0 1269972 26960 69668 0 0 0 84 1015 32 0 0 99 1


2003-11-11 04:50:52

by Andrew Morton

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

Paul Venezia <[email protected]> wrote:
>
> vmstat output
>
>
> procs memory swap io system cpu
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 0 0 1427316 8572 68472 0 0 0 0 1012 17 0 0 100 0
> 0 0 0 1427348 8580 68464 0 0 0 50 1014 20 0 0 99 1
> 0 0 0 1427412 8580 68464 0 0 0 0 1012 23 0 0 100 0
> 0 0 0 1427420 8580 68464 0 0 0 0 1012 15 0 0 100 0
> 0 0 0 1427484 8588 68456 0 0 0 8 1013 20 0 0 100 0
> 0 0 0 1427484 8588 68456 0 0 0 0 1012 22 0 0 100 0
> 0 0 0 1427484 8588 68456 0 0 0 0 1012 15 0 0 100 0
> 0 10 0 1318012 9776 169132 0 0 392 11202 34735 40869 4 33 32 32
> 2 6 0 1249532 10820 233980 0 0 0 1422 34765 42133 8 34 8 49
> 2 4 0 1192188 12048 288104 0 0 10 5276 33213 38553 11 39 15 36
> 4 4 0 1166588 12584 311844 0 0 4 0 31907 37992 14 41 18 26
> 0 4 0 1157372 13088 319976 0 0 4 1978 30261 35581 15 41 17 27
> 2 4 0 1163068 13460 313824 0 0 0 1886 29806 35017 16 41 15 28
> 4 4 0 1155700 13948 320544 0 0 0 202 30341 35867 15 42 16 27
> 3 4 0 1157044 14372 318692 0 0 0 696 30072 35217 16 43 17 24
> 4 4 0 1136820 15028 337688 0 0 0 1574 31872 36292 17 47 13 23
> 4 2 0 1141428 15404 332552 0 0 0 1980 30494 34756 17 46 17 20
> 4 2 0 1136052 15936 337256 0 0 2 2166 30653 35264 18 47 13 23
> 0 10 0 1136948 16284 336092 0 0 0 3456 26323 31729 10 30 25 35
> 0 14 0 1135668 16336 337264 0 0 0 2446 6548 7126 1 5 45 48
> 0 13 0 1135668 16336 337264 0 0 0 110 1038 59 0 0 54 46
> 0 13 0 1135676 16336 337264 0 0 0 0 1030 22 0 0 75 25
> 0 15 0 1133500 16404 336788 0 0 0 9304 6265 5409 2 6 37 55
> 0 13 0 1133308 16912 332132 0 0 0 11572 56145 61147 1 6 2 90
> 2 13 0 1131452 17088 333860 0 0 0 1316 18934 21857 2 11 18 69
> 0 13 0 1128444 17304 336772 0 0 0 1128 20925 23989 3 12 19 66
> 0 13 0 1129340 17548 335440 0 0 0 964 21529 25782 3 14 10 72
> 0 13 0 1128636 17776 336096 0 0 0 338 20924 23571 4 15 1 81
> 0 12 0 1135548 17980 313112 0 0 0 4056 21764 27560 3 15 15 66
> 0 12 0 1135868 18204 312616 0 0 0 4434 21152 25559 3 13 24 60
> 0 11 0 1146316 18352 287172 0 0 0 3442 18240 19442 2 10 40 47
> 2 11 0 1143052 18564 290428 0 0 0 1812 20483 22976 3 14 12 71
> 0 11 0 1142668 18720 290680 0 0 32 3298 15806 17710 3 10 17 70
> 0 10 0 1146444 18940 286856 0 0 0 2106 21450 25860 4 14 37 45

OK, the IO rates are obviously very poor, and the context switch rate is
suspicious as well. Certainly, testing with the single disk would help.


But. If the workload here was a simple dd of /dev/zero onto a regular
file then why on earth is the pagecache size not rising? Could you please
do:

rm foo
cat /dev/zero > foo

and rerun the `vmstat 1' trace? Make sure that after the big initial jump,
the `cache' column is increasing at a rate equal to the I/O rate. Thanks.
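
To eyeball that without counting by hand, something like this prints the
per-second change in the `cache' column (a sketch; cache is field 6 of
vmstat's output, in kilobytes):

vmstat 1 | awk 'NR > 2 { if (prev) print $6 - prev, "kB/s"; prev = $6 }'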


2003-11-11 05:06:38

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

On Mon, 2003-11-10 at 23:54, Andrew Morton wrote:

> > 0 10 0 1146444 18940 286856 0 0 0 2106 21450 25860 4 14 37 45
>
> OK, the IO rates are obviously very poor, and the context switch rate is
> suspicious as well. Certainly, testing with the single disk would help.

I'll get to that as soon as I can.

>
> But. If the workload here was a simple dd of /dev/zero onto a regular
> file then why on earth is the pagecache size not rising?

This vmstat output was shot when I was first noticing this problem. The
nbench tests were running at the time. It seems to show the same
behavior as the trace below.

> Could you please
> do:
>
> rm foo
> cat /dev/zero > foo
>
> and rerun the `vmstat 1' trace? Make sure that after the big initial jump,
> the `cache' column is increasing at a rate equal to the I/O rate. Thanks.

When I first ran this test, I killed it after 45s or so, noting that the vmstat
output didn't look right. I then deleted the sample file. The file no longer existed,
but the rm didn't exit in a timely fashion; the CPUs were at 100% iowait, the load
was rising, and vmstat was showing a consistent pattern of 5056 blocks out every two
seconds.

I rebooted and shot these, starting 5 seconds before the cat:

0 0 0 1474524 7084 42420 0 0 0 0 1033 47 0 0 100 0
0 0 0 1474524 7084 42420 0 0 0 0 1031 38 0 0 100 0
0 0 0 1474524 7084 42420 0 0 0 0 1016 12 0 0 100 0
1 0 0 1373716 7184 140376 0 0 0 0 1020 14 0 10 90 0
1 2 0 1166548 7392 341652 0 0 8 18836 1028 56 0 21 43 36
1 2 0 994132 7556 509312 0 0 4 1696 1030 63 0 17 27 56
1 2 0 867732 7684 632264 0 0 4 2400 1033 65 0 12 27 60
0 3 0 817748 7732 680700 0 0 4 9632 1033 66 0 5 27 67
0 4 0 817748 7732 680700 0 0 0 0 1029 47 0 0 25 75
2 2 0 817748 7732 680700 0 0 0 5372 1032 48 0 0 25 75
0 4 0 810324 7740 688104 0 0 0 104 1032 49 0 1 25 74
0 4 0 810324 7740 688104 0 0 0 0 1029 48 0 0 25 75
0 4 0 810324 7740 688104 0 0 0 4892 1038 54 0 0 25 75
0 4 0 810324 7740 688104 0 0 0 0 1024 46 0 0 25 75
0 4 0 793492 7756 704544 0 0 0 9952 1033 52 0 2 25 73
0 4 0 793492 7756 704544 0 0 0 0 1032 48 0 0 25 75
0 4 0 793428 7756 704544 0 0 0 0 1031 48 0 0 25 75
0 4 0 793428 7756 704544 0 0 0 0 1028 52 0 0 25 75
0 4 0 768276 7780 729136 0 0 0 4996 1032 51 0 2 25 72
0 4 0 768276 7780 729136 0 0 0 0 1035 46 0 0 25 75
0 4 0 768276 7780 729136 0 0 0 4892 1026 50 0 0 25 75
0 4 0 768276 7780 729136 0 0 0 0 1037 46 0 0 25 75
0 4 0 763988 7784 733212 0 0 0 5060 1032 56 0 0 25 75
0 4 0 763988 7784 733212 0 0 0 0 1032 46 0 0 25 75
0 4 0 763988 7784 733212 0 0 0 4892 1033 48 0 0 25 75
0 4 0 763988 7784 733212 0 0 0 0 1029 50 0 0 25 75
0 4 0 751316 7796 745508 0 0 0 5060 1039 52 0 1 25 74
0 4 0 751316 7796 745508 0 0 0 0 1025 52 0 0 25 75

Very similar.

-Paul

2003-11-11 05:36:13

by Linus Torvalds

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6


On Mon, 10 Nov 2003, Andrew Morton wrote:
> > 0 10 0 1146444 18940 286856 0 0 0 2106 21450 25860 4 14 37 45
>
> OK, the IO rates are obviously very poor, and the context switch rate is
> suspicious as well. Certainly, testing with the single disk would help.
>
> But. If the workload here was a simple dd of /dev/zero onto a regular
> file then why on earth is the pagecache size not rising?

Interesting. That does indeed look really strange. Paul has more than a
gigabyte of free memory, and it isn't shrinking.

Those interrupt and context switch numbers are also _way_ out of line: Paul
has big stretches with 25k+ interrupts per second and 30k+ context
switches. While at the same time only feeding a few megabytes of data
through the system.

That would imply more than one interrupt per _sector_ (there really
shouldn't be anything else going on there, if it's a local "dd"). Which
is patently ridiculous. There's something seriously broken in there.

Any "normal" IO load should get one interrupt per request, and requests
should be in the "closer to hundred-kB" range for reasonably contiguous
IO. For a normal "dd to file" on a good single-disk system, something like
40MB/s with 1500+ interrupts/sec should be the normal baseline (1000
interrupts / sec come from the regular timer interrupt, the extra 500+
would be the IO completion interrupts).

And you should see context switches on maybe a request basis (ie you might
see 500 context switches a second). Again, seeing 30k ctx/sec for 3MB/s of
throughput implies one context switch per 100 _bytes_. Whee. That's just
_wrong_.

I see 1200 context switches a second when I move my mouse around and try
to upset X and the window manager as much as possible. That's "normal",
with a hundred mouse events a second that just _cascade_ through a system.
But that's for a device that is literally _designed_ to be "lots of small
events, and throughput isn't even on our radar".

Btw, is the RAID1 setup using hw raid or the sw raid code? I _assume_
you're just using the MPT hw support?

Linus


2003-11-11 05:28:09

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

On Mon, 2003-11-10 at 23:54, Andrew Morton wrote:

> But. If the workload here was a simple dd of /dev/zero onto a regular
> file then why on earth is the pagecache size not rising?

2.6.0-test8 on a box with an AIC7899 performs exactly as expected. I'll
pull the mirror from the MPT and see what I get.

The box with the aic7xxx is a 1P 1GHz. I'll compile test9-bk11 on that
one just to be sure, but it looks like it may be a driver issue. I can
replicate this on another box with a 53C1030 running a mirror, so I
don't think it's specific to this one machine's hardware.

-Paul

2003-11-11 05:46:53

by Andrew Morton

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

Paul Venezia <[email protected]> wrote:
>
> On Mon, 2003-11-10 at 23:54, Andrew Morton wrote:
>
> > OK, the IO rates are obviously very poor, and the context switch rate is
> > suspicious as well. Certainly, testing with the single disk would help.
>
> I pulled the secondary, reconfigured to single drives and rebooted. All
> is now well; performance is right where it should be.
>
>
> 0 0 0 1475924 7052 42384 0 0 0 0 1015 6
> 0 0 0 1475284 7076 42360 0 0 0 156 1041 311
> 0 0 0 1475284 7076 42360 0 0 0 0 1016 12
> 0 0 0 1475284 7076 42360 0 0 0 0 1026 30
> 2 0 0 1252628 7300 258852 0 0 8 37240 1157 119
> 0 3 0 1027284 7524 478064 0 0 8 66016 1441 317
> 1 3 0 818132 7728 682948 0 0 4 70752 1439 202
> 1 3 0 593236 7944 901760 0 0 4 64576 1452 92
> 0 4 0 531412 8008 961876 0 0 4 63680 1434 97

OK, so either we broke the driver or there is some tuning sensitivity.

Could you please do:

mkdir /sys
mount none /sys -t sysfs
cd /sys/block/sdXX/queue
echo 512 > nr_requests

and retry the RAID setup?
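
To double-check that the new value took effect (a sketch, reusing the
sdXX placeholder):

cat /sys/block/sdXX/queue/nr_requests    # should now print 512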

Beyond that, dunno. We'll need to hunt down the people who worked on that
driver.

2003-11-11 05:40:40

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

On Mon, 2003-11-10 at 23:54, Andrew Morton wrote:

> OK, the IO rates are obviously very poor, and the context switch rate is
> suspicious as well. Certainly, testing with the single disk would help.

I pulled the secondary, reconfigured to single drives and rebooted. All
is now well; performance is right where it should be.


0 0 0 1475924 7052 42384 0 0 0 0 1015 6
0 0 0 1475284 7076 42360 0 0 0 156 1041 311
0 0 0 1475284 7076 42360 0 0 0 0 1016 12
0 0 0 1475284 7076 42360 0 0 0 0 1026 30
2 0 0 1252628 7300 258852 0 0 8 37240 1157 119
0 3 0 1027284 7524 478064 0 0 8 66016 1441 317
1 3 0 818132 7728 682948 0 0 4 70752 1439 202
1 3 0 593236 7944 901760 0 0 4 64576 1452 92
0 4 0 531412 8008 961876 0 0 4 63680 1434 97
1 3 0 455604 8084 1035648 0 0 4 69312 1464 103
0 4 0 388148 8148 1101272 0 0 0 66328 1442 141
0 4 0 308172 8228 1179120 0 0 4 69964 1446 100
0 5 0 257548 8280 1228436 0 0 4 67548 1460 148
1 2 0 195340 8356 1288948 0 0 0 63444 1452 291
0 5 0 115532 8436 1367884 0 0 4 73896 1448 319


-Paul



2003-11-11 05:44:19

by Linus Torvalds

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6


On 10 Nov 2003, Paul Venezia wrote:
>
> I rebooted and shot these, starting 5 seconds before the cat:
>
> 0 0 0 1474524 7084 42420 0 0 0 0 1033 47 0 0 100 0
> 0 0 0 1474524 7084 42420 0 0 0 0 1031 38 0 0 100 0
> 0 0 0 1474524 7084 42420 0 0 0 0 1016 12 0 0 100 0
> 1 0 0 1373716 7184 140376 0 0 0 0 1020 14 0 10 90 0
> 1 2 0 1166548 7392 341652 0 0 8 18836 1028 56 0 21 43 36
> 1 2 0 994132 7556 509312 0 0 4 1696 1030 63 0 17 27 56
> 1 2 0 867732 7684 632264 0 0 4 2400 1033 65 0 12 27 60
> 0 3 0 817748 7732 680700 0 0 4 9632 1033 66 0 5 27 67

Ok, looks saner in the sense that now you seem to get no interrupts at all
from your card. At least that matches the fact that you basically get no
throughput either ;)

So the previous vmstat was taken when there was a lot of network activity,
with the associated server load going on too?

So this same setup worked fine for you with bonnie++ on a single disk?

Linus

2003-11-11 06:15:22

by Paul Venezia

Subject: Re: I/O issues, iowait problems, 2.4 v 2.6

On Tue, 2003-11-11 at 00:50, Andrew Morton wrote:
>
> Could you please do:
>
> mkdir /sys
> mount none /sys -t sysfs
> cd /sys/block/sdXX/queue
> echo 512 > nr_requests
>
> and retry the RAID setup?

No change.

> Beyond that, dunno. We'll need to hunt down the people who worked on that
> driver.

Okie, thanks for the help. I should be able to rerun these tests later
if the drivers get tweaked.

-Paul