Subject: Strange vmstat output. 2.6.10 Scheduler?

Hello!

I have a weird problem with a server. I use deadline.
The output of vmstat is:
1 1 12312 6064 7580 815432 0 0 5332 456 1263 3119 22 5 0 73
0 1 12312 5280 7584 816240 0 0 6204 620 1307 1530 15 4 0 81
0 1 12312 6096 7528 815088 0 0 7604 8732 1256 1398 5 8 0 87
0 1 12312 6048 7528 815144 0 0 8 8 1243 1440 14 2 0 84
0 1 12312 6048 7528 815168 0 0 0 0 1261 1287 8 2 0 90
0 2 12312 6040 7584 815184 0 0 0 0 1258 1197 7 2 0 91
0 2 12312 6024 7584 815208 0 0 0 0 1265 1309 6 5 0 89
0 2 12312 6024 7584 815208 0 0 0 136 1253 365 2 2 0 96
0 2 12312 5984 7584 815208 0 0 0 0 1268 413 2 1 0 97
0 2 12312 6052 7584 815208 0 0 0 0 1270 375 2 1 0 97
1 0 12312 6164 7584 815156 0 0 6188 288 1270 4314 23 5 0 72
0 1 12312 5684 7536 815592 0 0 4128 8656 1270 1316 14 3 0 83
0 1 12312 5684 7536 815608 0 0 0 0 1258 1251 8 3 0 89
0 1 12312 5552 7536 815628 0 0 0 0 1258 1327 8 2 0 90
0 1 12312 5552 7536 815644 0 0 0 0 1264 1127 7 2 0 91
0 1 12312 5552 7536 815668 0 0 0 2 1258 1225 8 3 0 89
0 3 12312 5248 7588 816108 0 0 4340 585 1249 1316 7 3 0 90
0 2 12312 5612 7584 815680 0 0 3092 188 1254 411 3 1 0 96
0 2 12312 5564 7584 815676 0 0 0 0 1261 259 2 2 0 96
0 2 12312 5564 7584 815676 0 0 0 0 1254 348 2 1 0 97

It seems strange because iowait is at 90% but nothing is trasnfered
to/from disk. Why is that?

I run postgresql on this server and I'm not satistied by it's speed.

Single Pentium IV, IDE disk.

What can be the problem?

Thank you very much!
---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/


2005-02-01 10:26:48

by Jens Axboe

[permalink] [raw]
Subject: Re: Strange vmstat output. 2.6.10 Scheduler?

On Tue, Feb 01 2005, Catalin(ux aka Dino) BOIE wrote:
> Hello!
>
> I have a weird problem with a server. I use deadline.
> The output of vmstat is:
> 1 1 12312 6064 7580 815432 0 0 5332 456 1263 3119 22 5 0
> 73
> 0 1 12312 5280 7584 816240 0 0 6204 620 1307 1530 15 4 0
> 81
> 0 1 12312 6096 7528 815088 0 0 7604 8732 1256 1398 5 8 0
> 87
> 0 1 12312 6048 7528 815144 0 0 8 8 1243 1440 14 2 0
> 84
> 0 1 12312 6048 7528 815168 0 0 0 0 1261 1287 8 2 0
> 90
> 0 2 12312 6040 7584 815184 0 0 0 0 1258 1197 7 2 0
> 91
> 0 2 12312 6024 7584 815208 0 0 0 0 1265 1309 6 5 0
> 89
> 0 2 12312 6024 7584 815208 0 0 0 136 1253 365 2 2 0
> 96
> 0 2 12312 5984 7584 815208 0 0 0 0 1268 413 2 1 0
> 97
> 0 2 12312 6052 7584 815208 0 0 0 0 1270 375 2 1 0
> 97
> 1 0 12312 6164 7584 815156 0 0 6188 288 1270 4314 23 5 0
> 72
> 0 1 12312 5684 7536 815592 0 0 4128 8656 1270 1316 14 3 0
> 83
> 0 1 12312 5684 7536 815608 0 0 0 0 1258 1251 8 3 0
> 89
> 0 1 12312 5552 7536 815628 0 0 0 0 1258 1327 8 2 0
> 90
> 0 1 12312 5552 7536 815644 0 0 0 0 1264 1127 7 2 0
> 91
> 0 1 12312 5552 7536 815668 0 0 0 2 1258 1225 8 3 0
> 89
> 0 3 12312 5248 7588 816108 0 0 4340 585 1249 1316 7 3 0
> 90
> 0 2 12312 5612 7584 815680 0 0 3092 188 1254 411 3 1 0
> 96
> 0 2 12312 5564 7584 815676 0 0 0 0 1261 259 2 2 0
> 96
> 0 2 12312 5564 7584 815676 0 0 0 0 1254 348 2 1 0
> 97
>
> It seems strange because iowait is at 90% but nothing is trasnfered
> to/from disk. Why is that?

The bi/bo numbers are sometimes a little confusing, it doesn't mean that
the drive is actually doing 8656kb of io at that very second - only that
this much io was queued to the drive.

> I run postgresql on this server and I'm not satistied by it's speed.
>
> Single Pentium IV, IDE disk.
>
> What can be the problem?

Do you have write cache enabled on the drive?

--
Jens Axboe

2005-02-01 10:34:23

by Massimo Cetra

[permalink] [raw]
Subject: RE: Strange vmstat output. 2.6.10 Scheduler?


> It seems strange because iowait is at 90% but nothing is
> trasnfered to/from disk. Why is that?
>
> I run postgresql on this server and I'm not satistied by it's speed.
>
> Single Pentium IV, IDE disk.
>
> What can be the problem?


I have noticed it several months ago.
Maybe this thread could be useful for you.
http://www.ussg.iu.edu/hypermail/linux/kernel/0408.2/1758.html

According to my tests, 2.4 with -lck still performs better than 2.6, so I
use 2.4 on production DB servers with any DB.

Massimo

Subject: Re: Strange vmstat output. 2.6.10 Scheduler?


> On Tue, Feb 01 2005, Catalin(ux aka Dino) BOIE wrote:
>> Hello!
>>
>> I have a weird problem with a server. I use deadline.
>> The output of vmstat is:
>> 1 1 12312 6064 7580 815432 0 0 5332 456 1263 3119 22 5 0
>> 73
>> 0 1 12312 5280 7584 816240 0 0 6204 620 1307 1530 15 4 0
>> 81
>> 0 1 12312 6096 7528 815088 0 0 7604 8732 1256 1398 5 8 0
>> 87
>> 0 1 12312 6048 7528 815144 0 0 8 8 1243 1440 14 2 0
>> 84
>> 0 1 12312 6048 7528 815168 0 0 0 0 1261 1287 8 2 0
>> 90
>> 0 2 12312 6040 7584 815184 0 0 0 0 1258 1197 7 2 0
>> 91
>> 0 2 12312 6024 7584 815208 0 0 0 0 1265 1309 6 5 0
>> 89
>> 0 2 12312 6024 7584 815208 0 0 0 136 1253 365 2 2 0
>> 96
>> 0 2 12312 5984 7584 815208 0 0 0 0 1268 413 2 1 0
>> 97
>> 0 2 12312 6052 7584 815208 0 0 0 0 1270 375 2 1 0
>> 97
>> 1 0 12312 6164 7584 815156 0 0 6188 288 1270 4314 23 5 0
>> 72
>> 0 1 12312 5684 7536 815592 0 0 4128 8656 1270 1316 14 3 0
>> 83
>> 0 1 12312 5684 7536 815608 0 0 0 0 1258 1251 8 3 0
>> 89
>> 0 1 12312 5552 7536 815628 0 0 0 0 1258 1327 8 2 0
>> 90
>> 0 1 12312 5552 7536 815644 0 0 0 0 1264 1127 7 2 0
>> 91
>> 0 1 12312 5552 7536 815668 0 0 0 2 1258 1225 8 3 0
>> 89
>> 0 3 12312 5248 7588 816108 0 0 4340 585 1249 1316 7 3 0
>> 90
>> 0 2 12312 5612 7584 815680 0 0 3092 188 1254 411 3 1 0
>> 96
>> 0 2 12312 5564 7584 815676 0 0 0 0 1261 259 2 2 0
>> 96
>> 0 2 12312 5564 7584 815676 0 0 0 0 1254 348 2 1 0
>> 97
>>
>> It seems strange because iowait is at 90% but nothing is trasnfered
>> to/from disk. Why is that?
>
> The bi/bo numbers are sometimes a little confusing, it doesn't mean that
> the drive is actually doing 8656kb of io at that very second - only that
> this much io was queued to the drive.
>
>> I run postgresql on this server and I'm not satistied by it's speed.
>>
>> Single Pentium IV, IDE disk.
>>
>> What can be the problem?
>
> Do you have write cache enabled on the drive?

No. Intentionaly I disabled it:

ATA device, with non-removable media
Model Number: WDC WD800JB-00CRA1
Serial Number: WD-WMA8E8797395
Firmware Revision: 17.07W17
Standards:
Supported: 5 4 3 2
Likely used: 6
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 156301488
device size with M = 1024*1024: 76319 MBytes
device size with M = 1000*1000: 80026 MBytes (80 GB)
Capabilities:
LBA, IORDY(can be disabled)
bytes avail on r/w long: 40 Queue depth: 1
Standby timer values: spec'd by Standard, with device specific
minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* Look-ahead
Write cache
* Power Management feature set
Security Mode feature set
* SMART feature set
* Device Configuration Overlay feature set
Automatic Acoustic Management feature set
SET MAX security extension
* DOWNLOAD MICROCODE cmd
* SMART self-test
* SMART error logging
Security:
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
HW reset results:
CBLID- above Vih
Device num = 0 determined by CSEL
Checksum: correct

Thanks!
>
> --
> Jens Axboe
>

---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/

2005-02-01 10:42:52

by Jens Axboe

[permalink] [raw]
Subject: Re: Strange vmstat output. 2.6.10 Scheduler?

On Tue, Feb 01 2005, Catalin(ux aka Dino) BOIE wrote:
>
> >On Tue, Feb 01 2005, Catalin(ux aka Dino) BOIE wrote:
> >>Hello!
> >>
> >>I have a weird problem with a server. I use deadline.
> >>The output of vmstat is:
> >> 1 1 12312 6064 7580 815432 0 0 5332 456 1263 3119 22 5
> >> 0
> >> 73
> >> 0 1 12312 5280 7584 816240 0 0 6204 620 1307 1530 15 4
> >> 0
> >> 81
> >> 0 1 12312 6096 7528 815088 0 0 7604 8732 1256 1398 5 8
> >> 0
> >> 87
> >> 0 1 12312 6048 7528 815144 0 0 8 8 1243 1440 14 2
> >> 0
> >> 84
> >> 0 1 12312 6048 7528 815168 0 0 0 0 1261 1287 8 2
> >> 0
> >> 90
> >> 0 2 12312 6040 7584 815184 0 0 0 0 1258 1197 7 2
> >> 0
> >> 91
> >> 0 2 12312 6024 7584 815208 0 0 0 0 1265 1309 6 5
> >> 0
> >> 89
> >> 0 2 12312 6024 7584 815208 0 0 0 136 1253 365 2 2
> >> 0
> >> 96
> >> 0 2 12312 5984 7584 815208 0 0 0 0 1268 413 2 1
> >> 0
> >> 97
> >> 0 2 12312 6052 7584 815208 0 0 0 0 1270 375 2 1
> >> 0
> >> 97
> >> 1 0 12312 6164 7584 815156 0 0 6188 288 1270 4314 23 5
> >> 0
> >> 72
> >> 0 1 12312 5684 7536 815592 0 0 4128 8656 1270 1316 14 3
> >> 0
> >> 83
> >> 0 1 12312 5684 7536 815608 0 0 0 0 1258 1251 8 3
> >> 0
> >> 89
> >> 0 1 12312 5552 7536 815628 0 0 0 0 1258 1327 8 2
> >> 0
> >> 90
> >> 0 1 12312 5552 7536 815644 0 0 0 0 1264 1127 7 2
> >> 0
> >> 91
> >> 0 1 12312 5552 7536 815668 0 0 0 2 1258 1225 8 3
> >> 0
> >> 89
> >> 0 3 12312 5248 7588 816108 0 0 4340 585 1249 1316 7 3
> >> 0
> >> 90
> >> 0 2 12312 5612 7584 815680 0 0 3092 188 1254 411 3 1
> >> 0
> >> 96
> >> 0 2 12312 5564 7584 815676 0 0 0 0 1261 259 2 2
> >> 0
> >> 96
> >> 0 2 12312 5564 7584 815676 0 0 0 0 1254 348 2 1
> >> 0
> >> 97
> >>
> >>It seems strange because iowait is at 90% but nothing is trasnfered
> >>to/from disk. Why is that?
> >
> >The bi/bo numbers are sometimes a little confusing, it doesn't mean that
> >the drive is actually doing 8656kb of io at that very second - only that
> >this much io was queued to the drive.
> >
> >>I run postgresql on this server and I'm not satistied by it's speed.
> >>
> >>Single Pentium IV, IDE disk.
> >>
> >>What can be the problem?
> >
> >Do you have write cache enabled on the drive?
>
> No. Intentionaly I disabled it:
>
> ATA device, with non-removable media
> Model Number: WDC WD800JB-00CRA1
> Serial Number: WD-WMA8E8797395
> Firmware Revision: 17.07W17
> Standards:
> Supported: 5 4 3 2
> Likely used: 6
> Configuration:
> Logical max current
> cylinders 16383 16383
> heads 16 16
> sectors/track 63 63
> --
> CHS current addressable sectors: 16514064
> LBA user addressable sectors: 156301488
> device size with M = 1024*1024: 76319 MBytes
> device size with M = 1000*1000: 80026 MBytes (80 GB)
> Capabilities:
> LBA, IORDY(can be disabled)
> bytes avail on r/w long: 40 Queue depth: 1
> Standby timer values: spec'd by Standard, with device specific
> minimum
> R/W multiple sector transfer: Max = 16 Current = 16
> Recommended acoustic management value: 128, current value: 254
> DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
> Cycle time: min=120ns recommended=120ns
> PIO: pio0 pio1 pio2 pio3 pio4
> Cycle time: no flow control=120ns IORDY flow control=120ns
> Commands/features:
> Enabled Supported:
> * READ BUFFER cmd
> * WRITE BUFFER cmd
> * Host Protected Area feature set
> * Look-ahead
> Write cache
> * Power Management feature set
> Security Mode feature set
> * SMART feature set
> * Device Configuration Overlay feature set
> Automatic Acoustic Management feature set
> SET MAX security extension
> * DOWNLOAD MICROCODE cmd
> * SMART self-test
> * SMART error logging
> Security:
> supported
> not enabled
> not locked
> frozen
> not expired: security count
> not supported: enhanced erase
> HW reset results:
> CBLID- above Vih
> Device num = 0 determined by CSEL
> Checksum: correct
>
> Thanks!

Well then that is why, the writes are taking quite a while to reach the
platter. Nothing is wrong, the drive is just slow :-)

--
Jens Axboe

Subject: Re: Strange vmstat output. 2.6.10 Scheduler?

> Well then that is why, the writes are taking quite a while to reach the
> platter. Nothing is wrong, the drive is just slow :-)
>
> --
> Jens Axboe

I don't know what to say because sometimes works really good.
bi=20000.

I'm fighting with this problem for 1 month now and I realy tried to
resolve the problem.

Postgresql is 8.0.0.

iostat -x 1 shows:
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 1 98 190.5 37.1 1161.4 542.5 7.5 2.3 20.4 3.6 81
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 144 8.3 121.3 0.0 1081.5 8.3 0.0 0.0 7.8 100
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 134 0.0 122.0 0.0 1016.0 8.3 0.0 0.0 8.2 100
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 139 0.0 128.3 0.0 1074.7 8.4 0.0 0.0 7.9 100
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 140 0.0 128.0 0.0 1072.0 8.4 0.0 0.0 7.8 100
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 121 0.0 115.0 0.0 936.0 8.1 0.0 0.0 8.7 100
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 140 0.0 123.0 0.0 1072.0 8.7 0.0 0.0 8.1 100
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 142 0.0 129.0 0.0 1080.0 8.4 0.0 0.0 7.8 100
extended device statistics
device mgr/s mgw/s r/s w/s kr/s kw/s size queue wait svc_t %b
hda 0 141 0.0 129.3 0.0 1082.8 8.4 0.0 0.0 7.8 100

Thanks!

P.S. I tried also with cfq and see no improvement.

---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/

Subject: RE: Strange vmstat output. 2.6.10 Scheduler?

> I have noticed it several months ago.
> Maybe this thread could be useful for you.
> http://www.ussg.iu.edu/hypermail/linux/kernel/0408.2/1758.html
>
> According to my tests, 2.4 with -lck still performs better than 2.6, so I
> use 2.4 on production DB servers with any DB.
>
> Massimo

Thanks!
But seems that the thread is not finished... ;)

---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/

2005-02-01 10:50:16

by Jens Axboe

[permalink] [raw]
Subject: Re: Strange vmstat output. 2.6.10 Scheduler?

On Tue, Feb 01 2005, Catalin(ux aka Dino) BOIE wrote:
> >Well then that is why, the writes are taking quite a while to reach the
> >platter. Nothing is wrong, the drive is just slow :-)
> >
> >--
> >Jens Axboe
>
> I don't know what to say because sometimes works really good.
> bi=20000.

The vmstat you showed had bi and bo at the same time, so it's not so
surprising if bi often suffers if it is seeking all over the map to
write out the big chunk submitted.

How does it look with write cache enabled?

> I'm fighting with this problem for 1 month now and I realy tried to
> resolve the problem.

If postgres uses fsync() properly, you should be safe with write cache
enabled.

> P.S. I tried also with cfq and see no improvement.

Not surprising, there's not much the io scheduler can do for you here.

--
Jens Axboe

Subject: Re: Strange vmstat output. 2.6.10 Scheduler?

> The vmstat you showed had bi and bo at the same time, so it's not so
> surprising if bi often suffers if it is seeking all over the map to
> write out the big chunk submitted.

I understand.

> How does it look with write cache enabled?
Seems that almost the same:
0 1 12416 6068 6908 816292 0 0 1744 156 1276 1327 16 2 0 82
0 1 12416 6028 6908 816324 0 0 0 0 1270 1237 8 2 0 90
0 1 12416 5988 6908 816324 0 0 0 0 1255 454 2 2 0 96
0 2 12416 5956 6952 816324 0 0 0 80 1277 348 2 1 0 97
0 2 12416 5944 6952 816324 0 0 0 0 1257 322 2 2 0 96
0 2 12416 5960 6952 816324 0 0 0 0 1256 380 2 2 0 96
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 2 12416 5960 6952 816324 0 0 0 0 1262 337 2 1 0 97
0 1 12416 4936 6952 817676 0 0 3400 180 1293 5466 31 6 0 63
0 1 12416 5276 6896 817180 0 0 5600 8288 1329 1622 15 4 0 81
0 1 12416 5212 6896 817228 0 0 0 0 1245 1237 7 2 0 91
0 1 12416 5148 6896 817252 0 0 0 0 1250 1313 7 3 0 90
0 1 12416 5152 6896 817284 0 0 0 0 1264 1243 7 2 0 91
0 2 12416 5152 6944 817284 0 0 0 104 1257 377 1 4 0 95
0 2 12416 5160 6944 817284 0 0 0 44 1250 407 2 1 0 97
0 2 12416 5160 6944 817284 0 0 0 0 1265 287 2 0 0 98
0 2 12416 5928 6940 816508 0 0 344 336 1272 384 2 3 0 95
0 1 12416 5608 6892 817088 0 0 4440 140 1246 5390 24 7 0 69
0 1 12416 5584 6848 817140 0 0 7436 612 1268 1707 16 3 0 81
0 1 12416 5960 6848 816480 0 0 5900 8156 1300 1428 9 5 0 86
0 1 12416 5632 6844 816824 0 0 1528 196 1271 1402 15 3 0 82

> Not surprising, there's not much the io scheduler can do for you here.
>
> --
> Jens Axboe
>

---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/

Subject: Re: Strange vmstat output. 2.6.10 Scheduler?

I forgot to say that I disabled HT because I saw an improvement.

Also, the cs seems very high:
r b swpd free buff cache si so bi bo in cs us sy id wa
1 2 12416 6048 7040 816592 0 0 0 0 1299 483 2 1 0 97
0 2 12416 6016 7040 816592 0 0 0 0 1251 400 3 2 0 95
0 1 12416 6008 7040 816616 0 0 0 84 1294 409 1 3 0 96
2 1 12416 5500 7040 817372 0 0 1460 8 1240 11636 61 11 0 28
0 1 12416 6060 6972 816796 0 0 7620 8420 1260 2327 12 4 0 84
0 1 12416 6160 6924 816784 0 0 7512 588 1266 1369 16 5 0 79
0 2 12416 5032 6924 817964 0 0 2964 936 1249 1285 14 4 0 82
0 3 12416 6176 6992 816628 0 0 0 1552 1256 1187 8 3 0 89
0 3 12416 6164 6992 816668 0 0 0 1160 1248 1272 7 1 0 92
0 3 12416 6052 6992 816716 0 0 0 876 1242 1278 8 3 0 89
0 3 12416 6052 6992 816740 0 0 0 1256 1258 1261 9 2 0 89
0 3 12416 6028 6992 816772 0 0 0 1264 1266 1388 7 1 0 92
0 3 12416 5980 6992 816772 0 0 0 616 1236 338 1 1 0 98
0 4 12416 5028 6996 817900 0 0 844 1472 1250 350 2 2 0 96
0 4 12416 6064 6996 816700 0 0 6668 700 1256 514 5 2 0 93
0 3 12416 6180 6996 816532 0 0 3252 164 1237 370 10 2 0 88
0 3 12416 6116 6996 816532 0 0 0 1120 1267 413 2 2 0 96
0 2 12416 6116 6996 816532 0 0 0 1240 1275 395 2 1 0 97

It's normal?

---
Catalin(ux aka Dino) BOIE
catab at deuroconsult.ro
http://kernel.umbrella.ro/