2005-12-07 20:46:14

by Kenny Simpson

Subject: nfs question - ftruncate vs pwrite

Sorry about the previous partial message...

If a file is extended via ftruncate, the new empty pages are read in before the ftruncate
returns (taking 64 ms on my machine), but if the file is extended via pwrite, nothing is read in
and the system call is very quick (34 us).

Why is there such a difference? Is there another cheap way to grow a file and map in its new
pages? Am I missing some other semantic difference between ftruncate and a pwrite past the end of
the file?

Here is a test program; compile with -DABUSE to get the pwrite version.

thanks,
-Kenny





Attachments:
dtest.c (1.35 kB)
862959384-dtest.c
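
The attached dtest.c is not reproduced in this archive. As a rough sketch of the kind of test described above (not the original program), the C below grows a file by one chunk with either ftruncate() or, when built with -DABUSE, a one-byte pwrite() at the new last offset, and times only that call. The names grow_file and time_grow, the use of clock_gettime(), and the omission of the mmap step are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not taken from dtest.c: grow fd to new_size bytes.
 * Built with -DABUSE it writes a single zero byte at the last offset;
 * otherwise it calls ftruncate(). Returns 0 on success, -1 on error. */
static int grow_file(int fd, off_t new_size)
{
    assert(new_size > 0);
#ifdef ABUSE
    const char zero = 0;
    /* A write that ends at new_size extends the file as a side effect. */
    return pwrite(fd, &zero, 1, new_size - 1) == 1 ? 0 : -1;
#else
    return ftruncate(fd, new_size);
#endif
}

/* Grow the file at path by chunk bytes and return the seconds spent in
 * the grow call alone, or -1.0 on error. The file is created if missing. */
static double time_grow(const char *path, off_t chunk)
{
    struct timespec t0, t1;
    struct stat st;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1.0;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = grow_file(fd, st.st_size + chunk);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    if (rc != 0)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Pointing time_grow() at a file on an NFS mount, once for each build, should give numbers comparable in spirit to the two quoted above, though the exact figures depend on the client, server, and network.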

2005-12-07 21:14:19

by Peter Staubach

Subject: Re: nfs question - ftruncate vs pwrite

Kenny Simpson wrote:

>Sorry about the previous partial message...
>
>If a file is extended via ftruncate, the new empty pages are read in before the ftruncate
>returns (taking 64 ms on my machine), but if the file is extended via pwrite, nothing is read in
>and the system call is very quick (34 us).
>
>Why is there such a difference? Is there another cheap way to grow a file and map in its new
>pages? Am I missing some other semantic difference between ftruncate and a pwrite past the end of
>the file?
>

You might use tcpdump or Ethereal to see what the different traffic looks
like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
leads to a WRITE operation.

ps

2005-12-07 21:50:43

by Kenny Simpson

Subject: Re: nfs question - ftruncate vs pwrite

--- Peter Staubach wrote:
> You might use tcpdump or Ethereal to see what the different traffic looks
> like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
> leads to a WRITE operation.

Ethereal results interpreted with wild speculation:
The pwrite case:
This does a bunch of reads, but the server always returns a short read responding with EOF. It
seems that a pwrite does cause a getattr call, but that's it.
Once memory is exhausted, the pages are written out.

The ftruncate case:
This does a setattr, then does a read - this time the server responds with a large amount of
0's.

Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
extending the file... is it strictly necessary to read in pages of 0's from the server?

-Kenny




2005-12-08 04:53:57

by Trond Myklebust

Subject: Re: nfs question - ftruncate vs pwrite

On Wed, 2005-12-07 at 13:50 -0800, Kenny Simpson wrote:
> --- Peter Staubach wrote:
> > You might use tcpdump or Ethereal to see what the different traffic looks
> > like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
> > leads to a WRITE operation.
>
> Ethereal results interpreted with wild speculation:
> The pwrite case:
> This does a bunch of reads, but the server always returns a short read responding with EOF. It
> seems that a pwrite does cause a getattr call, but that's it.
> Once memory is exhausted, the pages are written out.
>
> The ftruncate case:
> This does a setattr, then does a read - this time the server responds with a large amount of
> 0's.

That is as expected. The ftruncate() causes an immediate change in
length of the file on the server, and so reads will. In the case of
pwrite(), that is cached on the client until you fsync/close, and so the
server returns short reads.

> Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
> extending the file... is it strictly necessary to read in pages of 0's from the server?

Possibly not, but is this a common case that is worth optimising for?
Note that use of the standard write() syscall as opposed to mmap() will
not trigger this avalanche of page-ins.

Cheers,
Trond
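
To make the write()-versus-mmap() contrast above concrete, here is a minimal C sketch of the two ways of appending. It only shows the two code paths; whether the second one actually triggers READ calls to the server depends on the NFS client, as discussed in this thread. The function names append_write and append_mmap are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Path 1: append through the write() family. The kernel copies buf into
 * the page cache; the process never touches a mapped page itself. */
static int append_write(int fd, const void *buf, size_t len)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return pwrite(fd, buf, len, st.st_size) == (ssize_t)len ? 0 : -1;
}

/* Path 2: extend with ftruncate(), then store through a shared mapping.
 * The first touch of each new page is a page fault that the filesystem
 * has to satisfy before the store can complete. */
static int append_mmap(int fd, const void *buf, size_t len)
{
    struct stat st;
    assert(len > 0);
    if (fstat(fd, &st) != 0)
        return -1;
    off_t old = st.st_size;
    if (ftruncate(fd, old + (off_t)len) != 0)
        return -1;

    /* mmap offsets must be page aligned, so map from the start of the
     * page that contains the old end of file. */
    long page = sysconf(_SC_PAGESIZE);
    off_t base = old - (old % page);
    size_t span = (size_t)(old - base) + len;
    char *p = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p + (old - base), buf, len);
    return munmap(p, span);
}
```

Both functions leave the same bytes in the file; the difference is only in how the data reaches the page cache.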

2005-12-08 05:00:28

by Trond Myklebust

Subject: Re: nfs question - ftruncate vs pwrite

On Wed, 2005-12-07 at 23:53 -0500, Trond Myklebust wrote:
> On Wed, 2005-12-07 at 13:50 -0800, Kenny Simpson wrote:
> > --- Peter Staubach wrote:
> > > You might use tcpdump or Ethereal to see what the different traffic looks
> > > like. I suspect that ftruncate() leads to a SETATTR operation while pwrite()
> > > leads to a WRITE operation.
> >
> > Ethereal results interpreted with wild speculation:
> > The pwrite case:
> > This does a bunch of reads, but the server always returns a short read responding with EOF. It
> > seems that a pwrite does cause a getattr call, but that's it.
> > Once memory is exhausted, the pages are written out.
> >
> > The ftruncate case:
> > This does a setattr, then does a read - this time the server responds with a large amount of
> > 0's.
>
> That is as expected. The ftruncate() causes an immediate change in
> length of the file on the server, and so reads will.

...Err...

...and so reads of the empty pages will succeed.

> In the case of
> pwrite(), that is cached on the client until you fsync/close, and so the
> server returns short reads.
>
> > Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
> > extending the file... is it strictly necessary to read in pages of 0's from the server?
>
> Possibly not, but is this a common case that is worth optimising for?
> Note that use of the standard write() syscall as opposed to mmap() will
> not trigger this avalanche of page-ins.
>
> Cheers,
> Trond

2005-12-08 16:15:28

by Kenny Simpson

Subject: Re: nfs question - ftruncate vs pwrite

--- Trond Myklebust wrote:
> That is as expected. The ftruncate() causes an immediate change in
> length of the file on the server, and so reads will. In the case of
> pwrite(), that is cached on the client until you fsync/close, and so the
> server returns short reads.
>
> > Since this is using the buffer cache (not opened with O_DIRECT), and since we know we are
> > extending the file... is it strictly necessary to read in pages of 0's from the server?
>
> Possibly not, but is this a common case that is worth optimising for?

I am attempting to write a low-latency logger. 'Low' meaning a system call is too slow (measured
at 0.3 microseconds) for each message. So I am trying to use the page cache to handle the
background scheduling of bulk writes to the server, and as an extra layer of reliability in the
event of a program crash. The use of pwrite seems to be the best option at this time, as spending
a few milliseconds for an ftruncate is a show-stopper.

I could also just write locally into a shared memory region, and have my own background copy to
the server, but this seems a bit wasteful when the page cache does most of this already, and can
optimize page-sized writes.

-Kenny
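
A minimal sketch of the logger design described above, under stated assumptions: the file is grown one window at a time with a one-byte pwrite(), the new window is mapped MAP_SHARED, and each message is then a plain memcpy() with no system call in the common case. The struct mlog type, the mlog_* function names, the 1 MiB window size, and the trim with ftruncate() at close (outside the hot path) are all hypothetical choices for illustration, not code from this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window size; must be a multiple of the page size. */
#define LOG_CHUNK ((size_t)1 << 20)

struct mlog {
    int fd;
    char *win;    /* current mapped window, or NULL */
    off_t base;   /* file offset of win[0] */
    size_t used;  /* bytes already written in the window */
};

/* Extend the file by one window with a single-byte pwrite, then map it. */
static int mlog_next_window(struct mlog *l, off_t base)
{
    const char zero = 0;
    if (l->win && munmap(l->win, LOG_CHUNK) != 0)
        return -1;
    l->win = NULL;
    if (pwrite(l->fd, &zero, 1, base + (off_t)LOG_CHUNK - 1) != 1)
        return -1;
    void *p = mmap(NULL, LOG_CHUNK, PROT_READ | PROT_WRITE, MAP_SHARED,
                   l->fd, base);
    if (p == MAP_FAILED)
        return -1;
    l->win = p;
    l->base = base;
    l->used = 0;
    return 0;
}

static int mlog_open(struct mlog *l, const char *path)
{
    l->win = NULL;
    l->base = 0;
    l->used = 0;
    l->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (l->fd < 0)
        return -1;
    if (mlog_next_window(l, 0) != 0) {
        close(l->fd);
        return -1;
    }
    return 0;
}

/* Copy msg into the mapping; only a full window costs system calls. */
static int mlog_append(struct mlog *l, const void *msg, size_t len)
{
    const char *src = msg;
    while (len > 0) {
        if (l->used == LOG_CHUNK &&
            mlog_next_window(l, l->base + (off_t)LOG_CHUNK) != 0)
            return -1;
        size_t n = LOG_CHUNK - l->used;
        if (n > len)
            n = len;
        memcpy(l->win + l->used, src, n);
        l->used += n;
        src += n;
        len -= n;
    }
    return 0;
}

/* Unmap, trim the unused tail of the last window, and close. */
static int mlog_close(struct mlog *l)
{
    off_t end = l->base + (off_t)l->used;
    int rc = 0;
    if (l->win && munmap(l->win, LOG_CHUNK) != 0)
        rc = -1;
    if (ftruncate(l->fd, end) != 0)
        rc = -1;
    if (close(l->fd) != 0)
        rc = -1;
    return rc;
}
```

If the program crashes before mlog_close() runs, the file keeps the zero-filled tail of the last window, so a reader of such a log would need to tolerate trailing zero bytes.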

