I tried out 2.4.14-pre3 plus the ext3 patch from Andrew Morton and
encountered some strange I/O stalls. I was doing a 'cvs tag' of my local
kernel-cvs repository, which generates a lot of read/write traffic in a
single process. I see that pre4 fixed a get_request race (I think), but
that only applies on SMP, right?
All was going well until it seemed to stall for at least 20 seconds,
during which it dribbled some blocks out to disk but made little
overall progress.
System: UP Athlon 1.4GHz, 512MB RAM, SiS5513 IDE UDMA-100, Red Hat 7.2
plus the above kernel. Ext3 is mounted in ordered data mode.
Here's the vmstat 1 output during that period, and for a while
before and after:
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 1 0 16560 5572 26956 296160 0 0 2312 4 470 1123 64 13 23
0 1 0 16560 3832 27040 297788 0 0 1680 0 521 1110 36 12 52
2 1 0 16560 3356 27052 298588 32 0 1608 0 533 16591 25 33 43
0 1 0 16560 3484 27048 298656 0 0 1936 0 581 1320 15 9 76
0 1 1 16560 3624 27024 298536 0 0 828 5176 372 466 7 5 88
1 1 1 16560 3580 27028 298564 0 0 4 3200 184 129 0 0 100
0 1 1 16560 3616 27028 298516 0 0 16 3408 181 132 0 1 99
1 1 1 16560 3588 27028 298544 0 0 12 3720 208 173 0 1 99
0 1 1 16560 3588 27028 298544 0 0 0 3336 181 1206 0 2 98
1 0 0 16560 4076 27084 298296 0 0 1640 3576 373 540 6 2 92
1 0 0 16560 3852 26920 298876 0 0 5460 0 611 16907 16 38 47
0 1 0 16560 3544 26932 299048 0 0 1212 0 536 1116 13 9 78
0 1 0 16560 3384 26940 299072 0 0 1044 0 521 977 15 7 78
1 0 0 16560 3152 26980 299156 0 0 2120 0 583 1032 10 8 82
0 1 1 16560 3292 26968 298960 0 0 796 7792 343 412 6 4 90
2 1 1 16560 3288 26968 298964 0 0 4 3272 189 411 38 1 61
0 1 1 16560 3280 26968 298968 0 0 4 2388 175 536 57 3 40
2 1 1 16560 3264 26968 298984 0 0 16 3116 180 4688 1 7 92
0 1 1 16560 3224 26972 299020 0 0 28 2016 164 11582 4 12 84
0 1 1 16560 3140 26972 299104 0 0 56 2216 180 133 0 1 99
1 0 0 16560 4008 26964 298424 0 0 1180 2932 397 889 10 8 82
1 0 0 16560 3452 26964 299068 0 0 2648 0 632 1082 8 10 82
1 0 0 16560 3212 26972 299220 0 0 1132 0 518 1347 12 8 80
2 0 0 16560 4000 27040 298352 0 0 3160 0 567 2021 12 17 71
2 1 1 16560 3624 27088 298636 0 0 1304 4452 292 2556 58 10 32
0 1 1 16560 3620 27088 298636 0 0 0 4040 191 1074 26 1 73
1 1 1 16560 3600 27088 298656 0 0 20 2788 200 16140 4 19 77
0 1 1 16560 3600 27088 298656 0 0 0 4296 209 136 0 1 99
1 1 1 16560 3580 27088 298676 0 0 4 3984 208 136 0 1 99
0 1 0 16560 3516 27168 298876 0 0 2236 3292 382 543 4 7 89
1 0 0 16560 3328 27088 299140 0 0 2804 0 585 997 12 14 74
3 0 0 16560 4592 27064 297820 0 0 2312 60 500 834 32 12 56
1 0 0 16688 3736 27108 298976 0 0 3172 0 633 1259 80 14 6
1 0 0 16944 3548 27284 301108 0 0 3336 0 647 1047 6 20 74
0 1 2 16944 3164 27332 301484 0 0 2416 260 302 16470 10 23 67
0 1 2 16944 3160 27332 301484 0 0 0 1876 160 134 0 1 99
0 1 2 16944 3160 27332 301484 0 0 0 1356 183 130 0 0 100
0 1 2 16944 3160 27332 301484 0 0 0 2320 171 133 0 0 100
0 1 2 16944 3104 27336 301520 0 0 28 1408 164 224 0 0 100
1 1 2 16944 4108 27232 300640 0 0 12 1724 175 502 50 2 49
0 1 2 16944 4108 27232 300640 0 0 0 2920 211 463 0 0 100
0 1 2 16944 4092 27236 300652 0 0 4 2056 178 16135 1 21 78
0 1 2 16944 4100 27236 300620 0 0 16 2724 150 131 0 0 100
0 1 2 16944 4100 27236 300620 0 0 0 860 182 132 0 1 99
0 1 2 16944 4060 27240 300656 0 0 28 1756 175 134 0 0 100
0 1 2 16944 4000 27240 300716 0 0 32 1788 176 133 0 1 99
2 1 2 16944 3964 27244 300752 0 0 4 1808 173 182 8 0 92
1 1 2 16944 4024 27244 300684 0 0 4 780 175 763 44 0 56
0 1 2 16944 4008 27244 300700 0 0 16 3696 190 152 0 1 99
0 1 2 16944 3972 27244 300736 0 0 20 288 172 16141 2 20 78
0 1 2 16944 3944 27272 300736 0 0 0 1768 178 140 0 0 100
<stalls around here>
0 1 2 16944 3924 27272 300756 0 0 4 252 170 135 0 1 99
0 1 2 16944 3952 27272 300728 0 0 4 160 140 130 0 0 100
0 1 2 16944 3940 27280 300732 0 0 4 696 167 136 0 0 100
2 1 2 16944 3940 27284 300732 0 0 0 424 162 396 60 0 40
0 1 2 16944 3932 27284 300736 0 0 4 244 149 233 38 0 62
0 1 1 16944 3928 27284 300740 0 0 4 204 175 16141 4 18 78
0 1 1 16944 3912 27284 300756 0 0 16 356 160 136 0 0 100
0 1 2 16944 3912 27284 300756 0 0 0 396 176 130 0 0 100
0 1 2 16944 3868 27292 300792 0 0 28 384 160 139 0 0 100
0 1 2 16944 3780 27296 300876 0 0 56 200 164 134 0 0 100
1 1 1 16944 3780 27296 300876 0 0 0 444 191 250 0 0 100
0 1 1 16944 3620 27300 301032 0 0 100 412 183 1010 1 0 99
0 1 1 16944 3396 27300 301256 0 0 128 140 148 139 0 0 100
1 0 2 16944 3392 27304 301256 0 0 0 580 170 16137 4 18 78
0 1 2 16944 3240 27308 301404 0 0 20 536 184 136 0 0 100
0 1 2 16944 3212 27308 301432 0 0 4 276 152 131 0 0 100
0 1 1 16944 3548 27308 301096 0 0 8 372 171 144 0 0 100
1 1 1 16944 3536 27308 301108 0 0 4 456 208 145 0 1 99
2 1 2 16944 3536 27312 301108 0 0 0 540 187 544 59 0 41
0 1 2 16944 3528 27316 301108 0 0 0 188 149 409 39 0 61
0 1 2 16944 3512 27320 301120 0 0 20 556 185 16140 5 18 77
0 1 1 16944 3472 27320 301160 0 0 24 444 177 136 0 0 100
0 1 1 16944 3440 27324 301188 0 0 4 424 191 223 0 0 100
0 1 1 16944 3440 27324 301188 0 0 4 492 190 144 0 1 99
0 1 2 16944 3456 27336 301160 0 0 12 444 182 140 0 0 100
2 1 1 16944 3448 27336 301172 0 0 4 456 183 401 9 1 90
<picks up again here>
2 0 0 17236 6456 27340 301304 0 0 572 44 190 2412 58 8 34
1 0 0 17428 3712 27424 304596 0 0 5800 0 637 1159 9 19 72
1 1 0 17500 3424 27412 305396 0 0 4488 0 592 16935 23 24 53
0 1 1 17680 3868 27328 305880 0 0 2580 6656 376 520 7 11 82
0 1 1 17680 3868 27328 305880 0 0 0 4792 194 130 0 0 100
0 1 2 17680 3844 27328 305892 0 0 4 5184 203 134 0 2 98
1 0 0 17680 3648 27364 306036 0 0 136 1200 197 193 0 2 98
2 0 0 17728 4016 27320 305944 0 0 3812 0 562 5660 11 19 70
3 0 0 19316 2568 27236 309324 0 0 8056 0 506 9336 16 38 47
0 1 0 19444 3820 27164 308044 0 0 3000 0 485 8695 12 14 74
0 1 0 20212 2252 27248 309580 0 24 4280 24 470 835 8 15 77
0 1 1 20212 3828 26960 308284 0 0 124 6592 220 16189 3 20 77
1 1 1 20212 3784 26960 308328 0 0 28 5468 185 128 0 0 100
0 3 1 20212 3060 26968 308404 0 0 60 8812 211 153 1 2 97
0 3 2 20340 3368 26784 308288 0 4 228 5456 190 136 0 1 99
0 2 1 20340 3584 26532 308312 0 12 60 1928 199 149 0 1 99
2 1 0 20340 4060 26664 308320 0 0 764 2104 228 798 8 7 85
3 0 0 20724 3728 26112 309352 0 40 4836 40 521 2624 50 18 32
0 1 0 20724 3240 25864 310060 0 0 3188 0 483 2137 46 17 38
1 1 0 20724 4072 25820 309228 0 0 1512 0 461 703 6 9 85
1 0 0 20724 3812 25832 309444 0 0 1212 0 458 16654 13 24 63
0 1 1 20724 3688 25860 309532 0 0 124 6612 213 184 1 2 97
0 1 1 20724 3656 25860 309540 0 0 4 6996 208 135 0 2 98
0 1 1 20724 3652 25860 309544 0 0 4 2204 200 136 0 1 99
1 0 0 20724 4168 25596 309372 0 0 940 1164 315 483 0 6 94
--
/==============================\
| David Mansfield |
| [email protected] |
\==============================/
David Mansfield wrote:
>
> I tried out 2.4.14-pre3 plus the ext3 patch from Andrew Morton and
> encountered some strange I/O stalls. I was doing a 'cvs tag' of my local
> kernel-cvs repository, which generates a lot of read/write traffic in a
> single process.
hmm.. Thanks - I'll do some metadata-intensive testing.
ext3's problem is that it is unable to react to VM pressure
for metadata (buffercache) pages. Once upon a time it did
do this, but we backed it out because it involved mauling
core kernel code. So at present we only react to VM pressure
for data pages.
Now that metadata pages have a backing address_space, I think we
can again allow ext3 to react to VM pressure against metadata.
It'll take some more mauling, but it's good mauling ;)
Then again, maybe something got broken in the buffer writeout
code or something.
Is this repeatable?
while true
do
cvs tag foo
cvs tag -d foo
done
If so, can you run `top' in parallel, see if there's
anything suspicious happening?
> David Mansfield wrote:
> >
> > I tried out 2.4.14-pre3 plus the ext3 patch from Andrew Morton and
> > encountered some strange I/O stalls. I was doing a 'cvs tag' of my local
> > kernel-cvs repository, which generates a lot of read/write traffic in a
> > single process.
>
> hmm.. Thanks - I'll do some metadata-intensive testing.
>
> ext3's problem is that it is unable to react to VM pressure
> for metadata (buffercache) pages. Once upon a time it did
> do this, but we backed it out because it involved mauling
> core kernel code. So at present we only react to VM pressure
> for data pages.
>
> Now that metadata pages have a backing address_space, I think we
> can again allow ext3 to react to VM pressure against metadata.
> It'll take some more mauling, but it's good mauling ;)
>
> Then again, maybe something got broken in the buffer writeout
> code or something.
>
> Is this repeatable?
>
> while true
> do
> cvs tag foo
> cvs tag -d foo
> done
>
> If so, can you run `top' in parallel, see if there's
> anything suspicious happening?
>
Yes, it's very repeatable. In fact, running it as above demonstrates it even
better here, because all of the file data ends up in the cache, so there
are no reads to confuse things. Basically - to save space I won't repost
the vmstat output - it's the same picture, with the writes degenerating
from 5000 a second to a couple of hundred. During the stalls no process
is using any CPU time and only 'cvs' is in D state, according to ps.
Here's another detail: running 'sync' takes about 20 to 30 seconds during
these stalls, and I *think* cvs issues a 'sync' when it finishes the tag,
because it stalls in an even more pronounced way exactly at the end of the
tag (or un-tag).
David
--
/==============================\
| David Mansfield |
| [email protected] |
\==============================/
In article <[email protected]>,
Andrew Morton <[email protected]> wrote:
>
>ext3's problem is that it is unable to react to VM pressure
>for metadata (buffercache) pages. Once upon a time it did
>do this, but we backed it out because it involved mauling
>core kernel code. So at present we only react to VM pressure
>for data pages.
Note that the new VM has some support in place for the low-level
filesystem reacting to VM pressure. In particular, one thing the fs can
do is to look at the PG_launder bit (for pages) and the BH_launder bit (for
buffers), to figure out if the IO is due to memory pressure.
A "sync" will not have the launder bit set, while something started due
to VM pressure will have the bits set.
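Roughly, the kind of test the filesystem could make is sketched below
(illustrative only - the bit names follow the description above, and the
exact helpers/spellings may vary between kernel revisions):

#include <linux/mm.h>		/* struct page, PG_launder */
#include <linux/fs.h>		/* struct buffer_head, BH_Launder */
#include <asm/bitops.h>		/* test_bit() */

/*
 * Sketch only: report whether a writeout was started because of memory
 * pressure (launder bit set by the VM) or by an explicit sync/fsync
 * (launder bits left clear).
 */
static int writeout_is_vm_pressure(struct page *page, struct buffer_head *bh)
{
	if (page && test_bit(PG_launder, &page->flags))
		return 1;	/* page is being laundered by the VM */
	if (bh && test_bit(BH_Launder, &bh->b_state))
		return 1;	/* buffer writeout driven by the VM */
	return 0;		/* e.g. sync(2): no launder bits set */
}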
>Then again, maybe something got broken in the buffer writeout
>code or something.
There are two really silly request bugs in 2.4.14-pre3. I'd suggest
trying pre5, which cleans up other things too but, more notably,
should fix the request queue thinkos.
Linus
Linus Torvalds wrote:
>
> In article <[email protected]>,
> Andrew Morton <[email protected]> wrote:
> >
> >ext3's problem is that it is unable to react to VM pressure
> >for metadata (buffercache) pages. Once upon a time it did
> >do this, but we backed it out because it involved mauling
> >core kernel code. So at present we only react to VM pressure
> >for data pages.
>
> Note that the new VM has some support in place for the low-level
> filesystem reacting to VM pressure. In particular, one thing the fs can
> do is to look at the PG_launder bit (for pages) and the BH_launder bit (for
> buffers), to figure out if the IO is due to memory pressure.
We don't get that far, unfortunately. ext3's problem is that the data
journalling (and its consequent ordering requirements) means that a bare
try_to_free_buffers() will always fail, because we're holding elevated
refcounts against the buffers. This is why we added an a_op here: to
give the fs a chance to strip its stuff off the page before
try_to_free_buffers() has its try.
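In rough outline the hook is shaped like this (illustrative only - this
is not the actual ext3/JBD code, and journal_strip_page_state() is just
a stand-in for the real journal call):

#include <linux/mm.h>		/* struct page */
#include <linux/fs.h>		/* try_to_free_buffers() */

/*
 * Stand-in for the journal call that detaches journal state from the
 * page's buffers and drops the refcounts held for ordering; returns 0
 * if a buffer is still pinned by a running or committing transaction.
 */
int journal_strip_page_state(struct page *page);

/*
 * Illustrative releasepage-style a_op: give the fs first crack at the
 * page under memory pressure, so that the generic try_to_free_buffers()
 * is no longer defeated by the elevated buffer refcounts the journal
 * holds.
 */
static int example_releasepage(struct page *page, int gfp_mask)
{
	if (!journal_strip_page_state(page))
		return 0;	/* still pinned; the VM must look elsewhere */
	return try_to_free_buffers(page, gfp_mask);
}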
> There are two really silly request bugs in 2.4.14-pre3. I'd suggest
> trying pre5, which cleans up other things too but, more notably,
> should fix the request queue thinkos.
>
Hum. I did a quick test here. cvs checkout of a kernel
tree with source and dest both on the same platter. Using
ext2:
2.4.13: 1:34
2.4.14-pre3: 1:28
2.4.14-pre5: 1:37
We need more silly bugs.
I'll poke at it a bit more. One perennial problem we face is that
there isn't, IMO, a good set of tests for tracking changes in
throughput. All the tools which are readily available are good for
stress testing and silly corner cases, but they don't seem to model
real-world workloads well.
On Mon, 29 Oct 2001, Andrew Morton wrote:
>
> Hum. I did a quick test here. cvs checkout of a kernel
> tree with source and dest both on the same platter. Using
> ext2:
>
> 2.4.13: 1:34
> 2.4.14-pre3: 1:28
> 2.4.14-pre5: 1:37
>
> We need more silly bugs.
Well, considering that the silly bug could result in request queue
corruption, I really suspect you'll be happier without it ;)
The io_request_lock wasn't held in a critical place, which would
potentially improve performance, but ...
> I'll poke at it a bit more. One perennial problem we face is that
> there isn't, IMO, a good set of tests for tracking changes in
> throughput. All the tools which are readily available are good for
> stress testing and silly corner cases, but they don't seem to model
> real-world workloads well.
Agreed.
Linus