2010-11-11 12:59:03

by Rogier Wolff

Subject: Sync semantics.


Hi,

What should I expect from a "sync" system call?

The manual says:

sync() first commits inodes to buffers, and then buffers to disk.

and then goes on to state:

... since version 1.3.20 Linux does actually wait.

[for the buffers to be handed over to the drive]

So how long can I expect a "sync" call to take?

I would expect that all buffers that are dirty at the time of the
"sync" call are written by the time that sync returns. I'm currently
bombarding my fileserver with some 40-60Mbytes per second of data to
be written (*). The fileserver has 8G of memory. So max 8000 Mb of
dirty buffers can be stored, right? The server writes an average of
(at least) 40Mb/second to disk. According to my calculator, I will
have to wait up to 200 seconds for the sync system call to return....
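
The estimate above can be written out as a quick calculation. This is only a sketch: it assumes the whole 8000 MB could be dirty at once and that nothing new is dirtied while the flush runs. The function name is made up for illustration.

```python
def worst_case_sync_seconds(dirty_mb: float, write_mb_per_s: float) -> float:
    """Seconds needed to flush a fixed backlog of dirty data.

    Assumes no new data is dirtied while the flush is in progress.
    """
    if write_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return dirty_mb / write_mb_per_s


if __name__ == "__main__":
    # Figures from the message: up to ~8000 MB dirty, at least 40 MB/s to disk.
    print(worst_case_sync_seconds(8000, 40))
```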


# time sync
0.000u 0.220s 2:22:23.96 0.0% 0+0k 0+0io 2pf+0w

Two hours 22 minutes.

I typed the "time sync" again, and it hasn't returned yet.... Actually
I don't expect it to before 6-hours-from-now because that's when the
clients will run out of data to send.

wolff 13706 0.0 0.0 1816 208 pts/12 D+ 12:08 0:00 sync
wolff 14116 0.0 0.0 1908 520 pts/34 S+ 13:48 0:00 grep sync

It's been running 100 minutes by now.....


(*) The three clients are each sending 20-35 Mb/second but are being
held up by the server, which doesn't seem to be handling more than about
40-50Mb/sec total.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ


2010-11-12 07:46:14

by Michal Svoboda

Subject: Re: Sync semantics.

Rogier Wolff wrote:
> What should I expect from a "sync" system call?
> # time sync
> 0.000u 0.220s 2:22:23.96 0.0% 0+0k 0+0io 2pf+0w
> Two hours 22 minutes.

I would also be interested to hear the answer to this question. Sync
and ongoing I/O simply don't seem to play well together, and you don't
even need anything as heavy as the load mentioned above.


Michal Svoboda



2010-11-15 02:39:38

by Dave Chinner

Subject: Re: Sync semantics.

On Thu, Nov 11, 2010 at 01:52:19PM +0100, Rogier Wolff wrote:
>
> Hi,
>
> What should I expect from a "sync" system call?
>
> The manual says:
>
> sync() first commits inodes to buffers, and then buffers to disk.
>
> and then goes on to state:
>
> ... since version 1.3.20 Linux does actually wait.
>
> [for the buffers to be handed over to the drive]
>
> So how long can I expect a "sync" call to take?
>
> I would expect that all buffers that are dirty at the time of the
> "sync" call are written by the time that sync returns. I'm currently
> bombarding my fileserver with some 40-60Mbytes per second of data to
> be written (*). The fileserver has 8G of memory. So max 8000 Mb of
> dirty buffers can be stored, right? The server writes an average of
> (at least) 40Mb/second to disk. According to my calculator, I will
> have to wait up to 200 seconds for the sync system call to return....
>
>
> # time sync
> 0.000u 0.220s 2:22:23.96 0.0% 0+0k 0+0io 2pf+0w

Depending on the kernel, sync will keep writing if you keep
dirtying. This should be mostly fixed in 2.6.36....
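
A toy model of the two behaviours may help here. It is not the kernel's writeback algorithm, just arithmetic under stated assumptions: a constant write rate, a constant dirtying rate, and (in the "snapshot" case) the disk serving only the data that was dirty when sync was called. The function and parameter names are hypothetical.

```python
from typing import Optional


def sync_seconds(backlog_mb: float, write_mb_s: float, dirty_mb_s: float,
                 chase_new_data: bool) -> Optional[float]:
    """Estimate how long sync takes; None means it never returns.

    chase_new_data=True  models a sync that keeps writing as long as
                         anything is dirty, including data dirtied later.
    chase_new_data=False models a sync that only waits for the data that
                         was dirty at the moment it was called.
    """
    if write_mb_s <= 0:
        raise ValueError("write rate must be positive")
    if not chase_new_data:
        return backlog_mb / write_mb_s
    net_drain = write_mb_s - dirty_mb_s
    if net_drain <= 0:
        # Data is dirtied at least as fast as it is written: no progress.
        return None
    return backlog_mb / net_drain


if __name__ == "__main__":
    # Rough figures from the thread: 8000 MB backlog, 40 MB/s to disk,
    # clients pushing about 50 MB/s.
    print(sync_seconds(8000, 40, 50, chase_new_data=True))
    print(sync_seconds(8000, 40, 50, chase_new_data=False))
```

With the thread's figures the "chasing" model cannot finish until the clients stop sending, which matches the multi-hour sync reported above.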

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-11-15 07:42:50

by Michal Svoboda

Subject: Re: Sync semantics.

Dave Chinner wrote:
> Depending on the kernel, sync will keep writing if you keep
> dirtying. This should be mostly fixed in 2.6.36....

Is that a "we hope that it is so but we are not sure" kind of "mostly",
or a "there are known cases when this is not true" one?

Michal Svoboda



2010-11-16 01:16:12

by Dave Chinner

Subject: Re: Sync semantics.

On Mon, Nov 15, 2010 at 08:42:41AM +0100, Michal Svoboda wrote:
> Dave Chinner wrote:
> > Depending on the kernel, sync will keep writing if you keep
> > dirtying. This should be mostly fixed in 2.6.36....
>
> Is that a "we hope that it is so but we are not sure" kind of "mostly",
> or a "there are known cases when this is not true" one?

The latter.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-11-16 14:32:14

by Pavel Machek

Subject: Re: Sync semantics.

Hi!

> What should I expect from a "sync" system call?
>
> The manual says:
>
> sync() first commits inodes to buffers, and then buffers to disk.
>
> and then goes on to state:
>
> ... since version 1.3.20 Linux does actually wait.
>
> [for the buffers to be handed over to the drive]
>
> So how long can I expect a "sync" call to take?
>
> I would expect that all buffers that are dirty at the time of the
> "sync" call are written by the time that sync returns. I'm currently
> bombarding my fileserver with some 40-60Mbytes per second of data to
> be written (*). The fileserver has 8G of memory. So max 8000 Mb of

Are you sure? Hitting 40MB/sec is hard when it involves seeking...

You may want to lower dirty_ratio...
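
As a rough illustration of what lowering dirty_ratio buys: the sketch below reads the current value from /proc/sys/vm/dirty_ratio and computes how much dirty data a given percentage allows. It treats the ratio as a percentage of total memory, which is a simplification (the kernel applies it to available memory), and the 10% figure in the example is purely illustrative.

```python
from pathlib import Path
from typing import Optional

DIRTY_RATIO_PATH = Path("/proc/sys/vm/dirty_ratio")


def read_dirty_ratio(path: Path = DIRTY_RATIO_PATH) -> Optional[int]:
    """Return the current vm.dirty_ratio, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def dirty_limit_mb(mem_mb: float, ratio_percent: float) -> float:
    """Approximate upper bound on dirty data for a given ratio."""
    return mem_mb * ratio_percent / 100.0


if __name__ == "__main__":
    print("current dirty_ratio:", read_dirty_ratio())
    # Illustrative only: 8000 MB of memory, ratio of 10%, 40 MB/s to disk.
    limit = dirty_limit_mb(8000, 10)
    print(limit, "MB dirty at most, about", limit / 40, "s to flush")
```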
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2010-11-17 08:09:46

by Rogier Wolff

Subject: Re: Sync semantics.

On Tue, Nov 16, 2010 at 03:31:49PM +0100, Pavel Machek wrote:
> > I would expect that all buffers that are dirty at the time of the
> > "sync" call are written by the time that sync returns. I'm currently
> > bombarding my fileserver with some 40-60Mbytes per second of data to
> > be written (*). The fileserver has 8G of memory. So max 8000 Mb of
>
> Are you sure? Hitting 40MB/sec is hard when it involves seeking...

Yeah... It's about 10 times slower than when no seeking is involved,
so that makes sense, doesn't it? The machine can sustain over 400 Mb
per second on linear reads:

procs -----------memory----------- ---swap-- -----io----- -system-- ----cpu----
 r  b   swpd   free    buff  cache   si   so     bi    bo   in   cs us sy id wa
 2  0      0  50908 6667292 502040    0    0 430064     0 2171 1677  0 23 66 11
 4  0      0  51280 6713952 501976    0    0 429596     0 2430 1889 16 28 44 12
 1  0      0  51768 6754884 502100    0    0 423388     0 2460 2100 13 28 47 13
 0  1      0  50760 6793392 502416    0    0 422892     0 2174 1796  0 21 68 10

Through the filesystem I get:

1073741824 bytes (1.1 GB) copied, 2.70151 s, 397 MB/s
1073741824 bytes (1.1 GB) copied, 2.62782 s, 409 MB/s

Which impresses me. In practice I seldom see the high
1xxMb/sec range (i.e. 120-150Mb per second happens, while 180-190 is rare).

On the other hand, in the same run I also get:
1073741824 bytes (1.1 GB) copied, 6.82678 s, 157 MB/s
1073741824 bytes (1.1 GB) copied, 6.66133 s, 161 MB/s
1073741824 bytes (1.1 GB) copied, 6.58995 s, 163 MB/s

which apparently is caused by these files being more fragmented. These
files (1Gb each) were written linearly, but some might have been
written while other 1G files (in a different directory) were being
written at the same time. I'm guessing these ended up more or less
interleaved.

Checking up on the fragmentation of these files, the fast ones have
about 600-800 fragments, while the slow ones have 1300-2000 fragments.

Mb/sec #frags
400 1252
493 865
391 755
393 606
395 819
206 937
159 901
173 1940
165 1806
157 1481
168 1351
179 2692
166 1541
154 1151
159 924
149 1228
155 1139
151 1103
150 1070
155 1160

There is SOME correlation but not 100%. This is on an 8x1T RAID.
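
To put a number on "some correlation", one could compute the Pearson coefficient over the table. The sketch below copies the 20 rows as (Mb/sec, fragments) pairs; the choice of Pearson's r is an assumption, not something stated in the message.

```python
from math import sqrt

# (throughput in Mb/sec, fragment count) pairs copied from the table above.
SAMPLES = [
    (400, 1252), (493, 865), (391, 755), (393, 606), (395, 819),
    (206, 937), (159, 901), (173, 1940), (165, 1806), (157, 1481),
    (168, 1351), (179, 2692), (166, 1541), (154, 1151), (159, 924),
    (149, 1228), (155, 1139), (151, 1103), (150, 1070), (155, 1160),
]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(f"r = {pearson(SAMPLES):.2f}")
```

A negative value means more fragments go with lower throughput; a value well short of -1 would agree with the "not 100%" remark.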

> You may want to lower dirty_ratio...

You know, what I would REALLY want is that when say 400Mb of dirty
buffers exist, the machine would start alternating between the two or
three areas that require writing. All these should be "linear". If you
switch only once every second or so, the "seeking time" is less than
1%. In that case, my server should be able to write up to 400Mb per
second, except that I can only supply 120Mb per second over the
Ethernet. But that would still be a 3x improvement over what the
machine can handle now.
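
The arithmetic behind this can be sketched as follows. The 10 ms cost per switch is an assumed figure for illustration (the message only says the overhead would be under 1%); the 400, 120 and 40 Mb/sec figures come from the text.

```python
def effective_throughput(linear_mb_s: float, dwell_s: float, seek_s: float) -> float:
    """Throughput when the disk writes linearly for dwell_s seconds,
    then spends seek_s seconds moving to the next area."""
    return linear_mb_s * dwell_s / (dwell_s + seek_s)


def seek_overhead(dwell_s: float, seek_s: float) -> float:
    """Fraction of wall-clock time lost to switching between areas."""
    return seek_s / (dwell_s + seek_s)


if __name__ == "__main__":
    SEEK_S = 0.010   # assumed cost of one switch, not from the thread
    DWELL_S = 1.0    # switch areas about once per second

    disk = effective_throughput(400, DWELL_S, SEEK_S)
    achievable = min(disk, 120)  # capped by what the Ethernet can supply
    print(f"overhead {seek_overhead(DWELL_S, SEEK_S):.2%}")
    print(f"disk {disk:.0f} Mb/sec, achievable {achievable:.0f} Mb/sec")
    print(f"improvement over 40 Mb/sec: {achievable / 40:.1f}x")
```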

In theory these things should work even better if things like
"dirty_ratio" are higher.

In the current situation, the "sync" call will return when the IO
system falls to "idle". The chance of "nothing needing writing"
increases as the amount of allowed dirty buffers gets lower. But the
problem is that sync keeps on waiting for those new buffers that have
become dirty AFTER the start of the sync call.

Suppose we have a mail handling daemon that just received an email
over the network. Instead of just saying "OK, I'll take over from
here", it prefers to write it to disk first, and calls sync, so that
should the power fail, the email is on permanent storage and can be
correctly handled.

This works just great, until someone manages to get the server to
continue to get new dirty buffers, so that the sync takes over ten
minutes, and the other side's MTA will time out.....
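
For comparison, a daemon in this position could flush only the file it cares about with fsync instead of calling the system-wide sync. The thread does not propose this; it is a minimal sketch assuming a POSIX system where a directory can be opened and fsync'ed, and whether it actually avoids the stall depends on the filesystem. The helper name is hypothetical.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and flush that file (and its directory entry)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush this file only, not every dirty buffer
    finally:
        os.close(fd)

    # Flush the containing directory so the new name survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A mail daemon would call durable_write(spool_path, message_bytes) before acknowledging the message to the sending side.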

Anyway, someone told me that it's been fixed, and sync won't behave
like this anymore.

Roger.

> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>
