2012-10-15 21:50:00

by Juliusz Chroboczek

Subject: Write is not atomic?

Hi,

The Linux manual page for write(2) says:

The adjustment of the file offset and the write operation are
performed as an atomic step.

This is apparently an extension to POSIX, which says

This volume of IEEE Std 1003.1-2001 does not specify behavior of
concurrent writes to a file from multiple processes. Applications
should use some form of concurrency control.

The following fragment of code

int fd;
fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC, 0666);
fork();
write(fd, "Ouille", 6);
close(fd);

produces "OuilleOuille", as expected, on ext4 on two machines running
Linux 3.2 AMD64. However, over XFS on an old Pentium III at 500 MHz
running 2.6.32, it produces just "Ouille" roughly once in three times.
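
For anyone who wants to repeat the experiment with the return values checked, here is a minimal sketch. The helper name `run_once` and the choice to report the final file size are assumptions made for illustration; they are not part of the original fragment. The helper returns 12 when the two writes land back to back and 6 when they overlap.

```c
#define _XOPEN_SOURCE 700   /* expose fork() etc. under strict -std= modes */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the open/fork/write experiment once on `path`.
 * Returns the final file size in bytes, or -1 if any call failed. */
long run_once(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    /* Parent and child both reach this point and share one file offset. */
    ssize_t n = write(fd, "Ouille", 6);
    close(fd);

    if (pid == 0)
        _exit(n == 6 ? 0 : 1);  /* child: report its write via exit status */

    /* Parent: wait for the child, then check both writes succeeded. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || n != 6)
        return -1;

    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_size;    /* 12: back to back, 6: overlapping */
}
```

Calling `run_once("exemple")` in a loop and counting how often it returns 6 gives the kind of ratio quoted above.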

Sorry for not being able to give more test cases, but I cannot easily
change either the filesystem or the kernel on the Pentium server.

-- Juliusz Chroboczek

P.S. I'd appreciate being copied on any replies.


2012-10-15 22:21:38

by Max Filippov

Subject: Re: Write is not atomic?

On Tue, Oct 16, 2012 at 1:36 AM, Juliusz Chroboczek wrote:
> Hi,
>
> The Linux manual page for write(2) says:
>
> The adjustment of the file offset and the write operation are
> performed as an atomic step.
>
> This is apparently an extension to POSIX, which says
>
> This volume of IEEE Std 1003.1-2001 does not specify behavior of
> concurrent writes to a file from multiple processes. Applications
> should use some form of concurrency control.
>
> The following fragment of code
>
> int fd;
> fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC, 0666);
> fork();
> write(fd, "Ouille", 6);

You don't check the return code here; does the write succeed at all?

> close(fd);
>
> produces "OuilleOuille", as expected, on ext4 on two machines running
> Linux 3.2 AMD64. However, over XFS on an old Pentium III at 500 MHz
> running 2.6.32, it produces just "Ouille" roughly once in three times.

Does it ever produce e.g. OuOuilleille (as this is what atomicity is about
here)?

--
Thanks.
-- Max

2012-10-15 23:13:10

by Dave Chinner

Subject: Re: Write is not atomic?

On Mon, Oct 15, 2012 at 11:36:15PM +0200, Juliusz Chroboczek wrote:
> Hi,
>
> The Linux manual page for write(2) says:
>
> The adjustment of the file offset and the write operation are
> performed as an atomic step.

That's wrong. The file offset update is not synchronised at all with
the write, and for a shared fd the update will race.


> This is apparently an extension to POSIX, which says
>
> This volume of IEEE Std 1003.1-2001 does not specify behavior of
> concurrent writes to a file from multiple processes. Applications
> should use some form of concurrency control.

This is how Linux behaves.
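
One possible form of the "concurrency control" that POSIX asks for is a whole-file record lock taken around each write. The sketch below is an illustration only: the helper names `whole_file_lock` and `locked_write_twice` are made up here, and `fcntl()` record locks were chosen because they are owned per process, so parent and child exclude each other even though they share the descriptor.

```c
#define _XOPEN_SOURCE 700   /* expose fork() and record locks under strict modes */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: takes (F_WRLCK) or drops (F_UNLCK) a lock covering
 * the whole file, blocking until the request can be granted. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: same experiment as the fragment above, but each
 * write happens while holding the lock. Returns the final file size,
 * or -1 on error. */
long locked_write_twice(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    /* The write and the shared offset update both happen under the lock. */
    int ok = whole_file_lock(fd, F_WRLCK) == 0
          && write(fd, "Ouille", 6) == 6
          && whole_file_lock(fd, F_UNLCK) == 0;
    close(fd);

    if (pid == 0)
        _exit(ok ? 0 : 1);      /* child: report success via exit status */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!ok || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_size;
}
```

With the lock in place the second writer only reads the shared offset after the first has finished updating it, so the file should end up 12 bytes long.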

> The following fragment of code
>
> int fd;
> fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC, 0666);
> fork();
> write(fd, "Ouille", 6);
> close(fd);
>
> produces "OuilleOuille", as expected, on ext4 on two machines running
> Linux 3.2 AMD64. However, over XFS on an old Pentium III at 500 MHz
> running 2.6.32, it produces just "Ouille" roughly once in three times.

ext4, on 3.6:

$ for i in `seq 0 10000`; do ./a.out ; cat /mnt/scratch/foo ; echo ; done | sort |uniq -c
39 Ouille
9962 OuilleOuille
$

XFS, on the same kernel, hardware and block device:

$ for i in `seq 0 10000`; do ./a.out ; cat /mnt/scratch/foo ; echo ; done | sort |uniq -c
40 Ouille
9961 OuilleOuille
$

So both filesystems behave according to the POSIX definition of
concurrent writes....

Cheers,

Dave.
--
Dave Chinner

2012-10-15 23:34:48

by Philippe Troin

Subject: Re: Write is not atomic?

On Tue, 2012-10-16 at 10:13 +1100, Dave Chinner wrote:
> On Mon, Oct 15, 2012 at 11:36:15PM +0200, Juliusz Chroboczek wrote:
> > Hi,
> >
> > The Linux manual page for write(2) says:
> >
> > The adjustment of the file offset and the write operation are
> > performed as an atomic step.
>
> That's wrong. The file offset update is not synchronised at all with
> the write, and for a shared fd the update will race.

That's what O_APPEND or pread/pwrite are for.

> > This is apparently an extension to POSIX, which says
> >
> > This volume of IEEE Std 1003.1-2001 does not specify behavior of
> > concurrent writes to a file from multiple processes. Applications
> > should use some form of concurrency control.
>
> This is how Linux behaves.
>
> > The following fragment of code
> >
> > int fd;
> > fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC, 0666);
> > fork();
> > write(fd, "Ouille", 6);
> > close(fd);

can be replaced with:

int fd;
fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC | O_APPEND, 0666);
fork();
write(fd, "Ouille", 6);
close(fd);

or:

int fd;
fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC, 0666);
pid_t pid = fork();
/* parent (pid != 0) writes at offset 0, child (pid == 0) at offset 6 */
pwrite(fd, "Ouille", 6, strlen("Ouille")*(pid == 0));
close(fd);

(both code fragments untested)
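
A checked version of the pwrite() variant might look like the sketch below. The helper name `pwrite_twice` is hypothetical, and returning the final file size is a choice made here for easy checking. Because each process passes its own explicit offset, the shared file offset is never consulted.

```c
#define _XOPEN_SOURCE 700   /* expose fork() and pwrite() under strict modes */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: parent writes at offset 0, child at offset 6.
 * Returns the final file size in bytes, or -1 if any call failed. */
long pwrite_twice(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    /* Explicit, non-overlapping offsets: no dependence on the shared one. */
    off_t off = (pid == 0) ? 6 : 0;
    ssize_t n = pwrite(fd, "Ouille", 6, off);
    close(fd);

    if (pid == 0)
        _exit(n == 6 ? 0 : 1);  /* child: report its write via exit status */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != 6 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_size;
}
```

Whichever process runs first, the two 6-byte regions do not overlap, so the result does not depend on scheduling.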

Phil.

2012-10-15 23:36:35

by Juliusz Chroboczek

Subject: Re: Write is not atomic?

> You don't check the return code here; does the write succeed at all?

Yes, both writes return 6.

> Does it ever produce e.g. OuOuilleille

No.

> (as this is what atomicity is about here)?

I was referring to the claim that under Linux writing and adjusting the
file offset are performed as an atomic step, not to the atomicity of the
write operation itself.

-- Juliusz

2012-10-15 23:42:54

by Max Filippov

Subject: Re: Write is not atomic?

On Tue, Oct 16, 2012 at 3:24 AM, Philippe Troin wrote:
> On Tue, 2012-10-16 at 10:13 +1100, Dave Chinner wrote:
>> On Mon, Oct 15, 2012 at 11:36:15PM +0200, Juliusz Chroboczek wrote:
>> > The following fragment of code
>> >
>> > int fd;
>> > fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC, 0666);
>> > fork();
>> > write(fd, "Ouille", 6);
>> > close(fd);
>
> can be replaced with:
>
> int fd;
> fd = open("exemple", O_CREAT | O_WRONLY | O_TRUNC | O_APPEND, 0666);
> fork();
> write(fd, "Ouille", 6);
> close(fd);

This fails the same way as the original. I guess O_APPEND doesn't work this
way for writes to a shared file descriptor.

--
Thanks.
-- Max

2012-10-16 00:00:43

by Jochen Striepe

Subject: Re: Write is not atomic?

Hello,

On Mon, Oct 15, 2012 at 11:36:15PM +0200, Juliusz Chroboczek wrote:
> The Linux manual page for write(2) says:
>
> The adjustment of the file offset and the write operation are
> performed as an atomic step.

This seems out of context.

Over here write(2) reads:

If the file was open(2)ed with O_APPEND, the file offset is first
set to the end of the file before writing. The adjustment of the
file offset and the write operation are performed as an atomic
step.

Sounds different, doesn't it?
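
A way to check the O_APPEND wording on a given system is sketched below. The helper name `append_twice` is hypothetical. If the paragraph quoted above holds, each write is placed at the current end of file as one step, so on a local filesystem the result should be 12 bytes. That is a reading of the man page, not a guarantee for every kernel or filesystem, and an earlier reply in this thread reported a different outcome.

```c
#define _XOPEN_SOURCE 700   /* expose fork() etc. under strict -std= modes */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: same experiment as the original fragment, but the
 * file is opened with O_APPEND. Returns the final file size in bytes,
 * or -1 if any call failed. */
long append_twice(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC | O_APPEND, 0666);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    /* With O_APPEND the kernel seeks to end of file as part of the write. */
    ssize_t n = write(fd, "Ouille", 6);
    close(fd);

    if (pid == 0)
        _exit(n == 6 ? 0 : 1);  /* child: report its write via exit status */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != 6 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_size;
}
```

Running this many times on the filesystem in question and looking for any result other than 12 would show whether the append guarantee holds there.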


Hth,
Jochen.

2012-10-16 06:22:11

by Juliusz Chroboczek

Subject: Re: Write is not atomic?

> This seems out of context.

> If the file was open(2)ed with O_APPEND, the file offset is first
> set to the end of the file before writing. The adjustment of the
> file offset and the write operation are performed as an atomic
> step.

> Sounds different, doesn't it?

Yes, it does -- thanks for the clarification. (And thanks to Dave for
the interesting tests.)

Sorry for the confusion,

-- Juliusz