2010-04-01 10:43:23

by Denys Fedoryshchenko

Subject: Re: endless sync on bdi_sched_wait()? 2.6.33.1

On Thursday 01 April 2010 01:12:54 Dave Chinner wrote:
> On Wed, Mar 31, 2010 at 07:07:31PM +0300, Denys Fedorysychenko wrote:
> > I have a proxy server with "loaded" squid. On some moment i did sync, and
> > expecting it to finish in reasonable time. Waited more than 30 minutes,
> > still "sync". Can be reproduced easily.
> >
> > Here is some stats and info:
> >
> > Linux SUPERPROXY 2.6.33.1-build-0051 #16 SMP Wed Mar 31 17:23:28 EEST
> > 2010 i686 GNU/Linux
> >
> > SUPERPROXY ~ # iostat -k -x -d 30
> > Linux 2.6.33.1-build-0051 (SUPERPROXY) 03/31/10 _i686_ (4 CPU)
> >
> > Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s    wkB/s avgrq-sz avgqu-sz    await  svctm  %util
> > sda        0.16    0.01   0.08    0.03    3.62     1.33    88.94     0.15  1389.89  59.15   0.66
> > sdb        4.14   61.25   6.22   25.55   44.52   347.21    24.66     2.24    70.60   2.36   7.49
> > sdc        4.37  421.28   9.95   98.31  318.27  2081.95    44.34    20.93   193.21   2.31  24.96
> > sdd        2.34  339.90   3.97  117.47   95.48  1829.52    31.70     1.73    14.23   8.09  98.20
>
>                                                                               ^^^^  ^^^^^
>
> /dev/sdd is IO bound doing small random writeback IO. A service time
> of 8ms implies that it is doing lots of large seeks. If you've got
> GBs of data to sync and that's the writeback pattern, then sync will
> most definitely take a long, long time.
>
> It may be that ext4 is allocating blocks far apart rather than close
> together (as appears to be the case for /dev/sdc), so maybe this is
> related to how the filesystems are aging or how full they are...
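
As a rough cross-check of the reading above, the sdd figures can be plugged
into a few lines of Python. This is only a sketch: it assumes %util is
approximately (r/s + w/s) * svctm, and the 1 GiB backlog is a hypothetical
figure, not something measured on this machine. The flush estimate also
assumes that no new dirty data arrives, which does not hold while squid is
running.

```python
# Figures copied from the sdd row of the iostat output quoted above.
reads_per_s = 3.97         # r/s
writes_per_s = 117.47      # w/s
write_kb_per_s = 1829.52   # wkB/s
svctm_ms = 8.09            # svctm


def utilisation_pct(iops, service_ms):
    """Approximate %util as busy milliseconds per second of wall time."""
    return iops * service_ms / 1000.0 * 100.0


def flush_seconds(dirty_kb, rate_kb_per_s):
    """Time to write dirty_kb at a fixed rate, assuming nothing new is dirtied."""
    return dirty_kb / rate_kb_per_s


iops = reads_per_s + writes_per_s
util = utilisation_pct(iops, svctm_ms)
avg_write_kb = write_kb_per_s / writes_per_s
# Hypothetical backlog of 1 GiB, chosen only for illustration.
one_gib_s = flush_seconds(1024 * 1024, write_kb_per_s)

print(f"{iops:.2f} IO/s at {svctm_ms} ms each: about {util:.1f}% busy")
print(f"average write size: about {avg_write_kb:.1f} kB")
print(f"1 GiB at {write_kb_per_s} kB/s: about {one_gib_s / 60:.1f} minutes")
```

The computed utilisation comes out close to the 98.20 that iostat reports,
which fits the reading that the disk is saturated by small writes rather than
by raw throughput.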
That's correct, it is quite a busy cache server.

If I stop squid (the cache), sync finishes fast enough.
If I don't, it takes more than an hour. Actually, I left that PC after 1 hour
and sync still had not finished. I don't think that is normal.
Probably sync picks up newly dirtied data and tries to flush it too, and
before it is done with that, more data arrives.
All I actually need is to sync the config directory. I cannot use fsync,
because there are multiple files that were opened earlier by other processes,
and sync does the trick in that case. I end up with a stuck process, and the
only fast way to recover the system is to kill the cache process, so that the
I/O pumping stops for a while and sync() has a chance to finish.
Sure, there is a way to just remount the config partition "ro", but I think
sync should flush only the pages that are dirty at the time it is called.

I will do more tests now and give exact numbers: how long sync takes with
squid running, and how long it takes if I kill squid shortly after starting
sync.
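
For the narrower goal of flushing just the config directory, one possible
approach is to fsync each file individually. On Linux, fsync() acts on the
file rather than on the descriptor that wrote the data, so a fresh read-only
descriptor opened by the syncing process should be enough even when other
processes made the writes. A minimal sketch, assuming a Linux system;
fsync_tree and the /config path mentioned afterwards are hypothetical names,
not anything from this thread:

```python
import os


def fsync_tree(root):
    """fsync every regular file and directory under root.

    Returns the number of files and directories that were synced.
    Symlinks, FIFOs and other special files are skipped.
    """
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)  # flushes the file's dirty data and metadata
            finally:
                os.close(fd)
            synced += 1
        # Sync the directory itself so new or renamed entries reach the disk.
        dir_fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
        synced += 1
    return synced
```

Calling fsync_tree("/config") would then touch only that tree. Whether this
stays fast on a given filesystem while the disks are under heavy writeback is
not something the sketch can show; it would need to be measured.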
>
> Cheers,
>
> Dave.
>


2010-04-01 11:14:06

by Dave Chinner

Subject: Re: endless sync on bdi_sched_wait()? 2.6.33.1

On Thu, Apr 01, 2010 at 01:42:42PM +0300, Denys Fedorysychenko wrote:
> That's correct, it is quite a busy cache server.
>
> If I stop squid (the cache), sync finishes fast enough.
> If I don't, it takes more than an hour. Actually, I left that PC after 1 hour
> and sync still had not finished. I don't think that is normal.
> Probably sync picks up newly dirtied data and tries to flush it too, and
> before it is done with that, more data arrives.
> All I actually need is to sync the config directory. I cannot use fsync,
> because there are multiple files that were opened earlier by other processes,
> and sync does the trick in that case. I end up with a stuck process, and the
> only fast way to recover the system is to kill the cache process, so that the
> I/O pumping stops for a while and sync() has a chance to finish.
> Sure, there is a way to just remount the config partition "ro", but I think
> sync should flush only the pages that are dirty at the time it is called.
>
> I will do more tests now and give exact numbers: how long sync takes with
> squid running, and how long it takes if I kill squid shortly after starting
> sync.

Ok. What would be interesting is regular output from /proc/meminfo
to see how the dirty memory is changing over the time the sync is
running....
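
One way to collect that output is a small sampler that prints the Dirty and
Writeback lines of /proc/meminfo at a fixed interval. This is a sketch only;
the function names, the 5 second interval and the sample count are
assumptions chosen for illustration:

```python
import time


def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Return {field: value_in_kB} for the requested /proc/meminfo fields."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            values[name] = int(rest.split()[0])
    return values


def sample(interval_s=5.0, count=12, path="/proc/meminfo"):
    """Print one timestamped line per sample so the trend is easy to follow."""
    for _ in range(count):
        with open(path) as f:
            values = parse_meminfo(f.read())
        fields = " ".join(f"{k}={v}kB" for k, v in sorted(values.items()))
        print(time.strftime("%H:%M:%S"), fields)
        time.sleep(interval_s)
```

Running sample() in one terminal while sync runs in another should show
whether Dirty keeps being refilled as fast as it is written back.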

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-04-01 20:15:14

by Jeff Moyer

Subject: Re: endless sync on bdi_sched_wait()? 2.6.33.1

Dave Chinner <[email protected]> writes:

> On Thu, Apr 01, 2010 at 01:42:42PM +0300, Denys Fedorysychenko wrote:
>> That's correct, it is quite a busy cache server.
>>
>> If I stop squid (the cache), sync finishes fast enough.
>> If I don't, it takes more than an hour. Actually, I left that PC after 1 hour
>> and sync still had not finished. I don't think that is normal.
>> Probably sync picks up newly dirtied data and tries to flush it too, and
>> before it is done with that, more data arrives.
>> All I actually need is to sync the config directory. I cannot use fsync,
>> because there are multiple files that were opened earlier by other processes,
>> and sync does the trick in that case. I end up with a stuck process, and the
>> only fast way to recover the system is to kill the cache process, so that the
>> I/O pumping stops for a while and sync() has a chance to finish.
>> Sure, there is a way to just remount the config partition "ro", but I think
>> sync should flush only the pages that are dirty at the time it is called.
>>
>> I will do more tests now and give exact numbers: how long sync takes with
>> squid running, and how long it takes if I kill squid shortly after starting
>> sync.
>
> Ok. What would be interesting is regular output from /proc/meminfo
> to see how the dirty memory is changing over the time the sync is
> running....

This sounds familiar:

http://lkml.org/lkml/2010/2/12/41

Cheers,
Jeff