2013-08-01 11:36:31

by Ulrich Windl

Subject: Possible mmap() write() problem in SLES11 SP2 kernel

Hi folks!

I thought I'd let you know (maybe I'm wrong and the kernel is right):

I wrote a C program that maps a file into a private writable mapping. Then I modify the area a bit and use a single write() to write that area back to a file.

This worked fine in SLES11 kernel 3.0.74-0.6.10. However, with kernel 3.0.80-0.7 the write() fails with EFAULT if the output file is the same as the input file.

The strace is amazingly short (I removed the unrelated calls):
open("xxx", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=4416, ...}) = 0
mmap(NULL, 4416, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f85ac045000
close(3) = 0
open("xxx", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
write(3, 0x7f85ac045000, 4414) = -1 EFAULT (Bad address)
close(3) = 0
munmap(0x7f85ac045000, 4414) = 0

I want to have your attention if this should work, and you get my attention if this should not work. Note that the input file is closed before it's opened for write again. As the output file is typically shorter than the input, I didn't want to use a non-private mapping and a truncate, just in case you wonder...

Regards,
Ulrich

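For readers who want to try this, the traced sequence could be reproduced with a minimal sketch like the one below. It is not the original program: the helper name `rewrite_with_otrunc` and the one-byte modification are assumptions made for illustration. The report above says the write() fails with EFAULT on the newer kernel; the sketch simply returns whatever errno it observes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original program): repeats the traced
 * sequence on `path`. Returns 0 if the write() succeeded, otherwise the
 * errno of the first call that failed.
 * WARNING: the file is truncated by the second open(), so its contents
 * are lost if the write() fails.
 */
int rewrite_with_otrunc(const char *path)
{
    assert(path != NULL);

    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    if (fstat(fd, &st) < 0) {
        int e = errno;
        close(fd);
        return e;
    }

    size_t len = (size_t)st.st_size;
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    int map_errno = errno;      /* only meaningful if mmap() failed */
    close(fd);                  /* the mapping outlives the descriptor */
    if (map == MAP_FAILED)
        return map_errno;

    map[0] ^= 0x20;             /* "modify the area a bit" (private copy) */

    int result = 0;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        result = errno;
    } else {
        /* The source buffer is the mapping of the file just truncated. */
        if (write(fd, map, len) < 0)
            result = errno;
        close(fd);
    }
    munmap(map, len);
    return result;
}
```

Calling `rewrite_with_otrunc("xxx")` on a scratch copy of a file and comparing the return value with EFAULT shows which behaviour a given kernel has.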

2013-08-03 22:37:29

by Hugh Dickins

Subject: Re: Possible mmap() write() problem in SLES11 SP2 kernel

On Thu, 1 Aug 2013, Ulrich Windl wrote:
> Hi folks!
>
> I think I'd let you know (maybe I'm wrong, and the kernel is right):
>
> I write a C-program that maps a file into an private writable map. Then I modify the area a bit and use one write to write that area back to a file.
>
> This worked fine in SLES11 kernel 3.0.74-0.6.10. However with kernel 3.0.80-0.7 the write() fails with EFAULT if the output file is the same as the input file.

I wonder if you actually did exactly the same on both kernels.

>
> The strace is amazingly short (I removed the unrelated calls):

Providing that was very helpful.

> open("xxx", O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=4416, ...}) = 0
> mmap(NULL, 4416, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f85ac045000
> close(3) = 0
> open("xxx", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

The crucial point is the above O_TRUNC when you now open the file for
writing: that truncates the file to 0-length, which unmaps any pages
mapped from it into userspace. Even the privately modified COW pages:
that often seems surprising, but it is how mmap versus truncate is
specified to work.

> write(3, 0x7f85ac045000, 4414) = -1 EFAULT (Bad address)

If your program now touched a part of the mapping, it would get
SIGBUS, there being no pages of underlying object to page in from.
But since you're accessing the area from within a system call,
that simply fails with EFAULT.

> close(3) = 0
> munmap(0x7f85ac045000, 4414) = 0
>
> I want to have your attention if this should work, and you get my attention if this should not work.

It should not work.

> Note that the input file is closed before it's opened for write again. As the output file is typically shorter than the input, I didn't want to use a non-private mapping and a truncate, just in case you wonder...

(I didn't understand your logic there.)

Hugh

2013-08-05 06:54:59

by Ulrich Windl

Subject: Antw: Re: Possible mmap() write() problem in SLES11 SP2 kernel

>>> Hugh Dickins <[email protected]> wrote on 04.08.2013 at 00:37 in message
<[email protected]>:
> On Thu, 1 Aug 2013, Ulrich Windl wrote:
>> Hi folks!
>>
>> I think I'd let you know (maybe I'm wrong, and the kernel is right):
>>
>> I write a C-program that maps a file into an private writable map. Then I modify the area a bit and use one write to write that area back to a file.
>>
>> This worked fine in SLES11 kernel 3.0.74-0.6.10. However with kernel 3.0.80-0.7 the write() fails with EFAULT if the output file is the same as the input file.
>
> I wonder if you actually did exactly the same on both kernels.

Hi!

Thanks for replying! Actually, I did the same a few thousand times (with different files and different lengths) on the previous kernel, where it never failed, whereas on the newer kernel it always fails (it seems).

>
>>
>> The strace is amazingly short (I removed the unrelated calls):
>
> Providing that was very helpful.
>
>> open("xxx", O_RDONLY) = 3
>> fstat(3, {st_mode=S_IFREG|0644, st_size=4416, ...}) = 0
>> mmap(NULL, 4416, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f85ac045000
>> close(3) = 0
>> open("xxx", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
>
> The crucial point is the above O_TRUNC when you now open the file for
> writing: that truncates the file to 0-length, which unmaps any pages
> mapped from it into userspace. Even the privately modified COW pages:

Well, but the mapping is PRIVATE, so I assumed that, once mapped, changes to the map won't affect the file, just as changes to the file won't affect the map. Specifically, when re-opening the file for writing with O_TRUNC, I did not expect the map to become invalid. Also note that the munmap() still returns no error.
My manual page vaguely says: "It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region."

> that often seems surprising, but it is how mmap versus truncate is
> specified to work.
>
>> write(3, 0x7f85ac045000, 4414) = -1 EFAULT (Bad address)
>
> If your program now touched a part of the mapping, it would get
> SIGBUS, there being no pages of underlying object to page in from.
> But since you're accessing the area from within a system call,
> that simply fails with EFAULT.

OK, if things are like this, the older kernel must have been faulty.

>
>> close(3) = 0
>> munmap(0x7f85ac045000, 4414) = 0
>>
>> I want to have your attention if this should work, and you get my attention if this should not work.
>
> It should not work.
>
>> Note that the input file is closed before it's opened for write again. As the output file is typically shorter than the input, I didn't want to use a non-private mapping and a truncate, just in case you wonder...
>
> (I didn't understand your logic there.)

The alternative to write()ing a part of the PRIVATE area would be to work with a non-PRIVATE area that is truncated after flushing the changes. In principle, the same blocks could then be written multiple times (when you move data from later parts to earlier parts, i.e. from the far end closer to the beginning), so I thought a PRIVATE mapping plus one write() would avoid that. I had the choice of truncating while opening or truncating the extra data after write(). I chose the first alternative.

Maybe I'll re-design...

Thanks,
Ulrich

>
> Hugh
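
One possible redesign, following the second alternative mentioned above (truncating the extra data after write() instead of opening with O_TRUNC), might look like the sketch below. It is only an illustration under stated assumptions: the helper name `drop_prefix`, its `skip` parameter, and the "drop the first bytes" edit are hypothetical stand-ins for whatever modification the real program makes. The file is opened once with O_RDWR, so nothing is truncated while the private mapping is still needed.

```c
#define _XOPEN_SOURCE 700       /* for ftruncate() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: removes the first `skip` bytes of `path` in place.
 * Returns 0 on success, or the errno of the first call that failed.
 */
int drop_prefix(const char *path, size_t skip)
{
    assert(path != NULL);

    struct stat st;
    int fd = open(path, O_RDWR);        /* no O_TRUNC here */
    if (fd < 0)
        return errno;
    if (fstat(fd, &st) < 0) {
        int e = errno;
        close(fd);
        return e;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0 || skip > len) {
        close(fd);
        return EINVAL;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        int e = errno;
        close(fd);
        return e;
    }

    size_t keep = len - skip;
    memmove(map, map + skip, keep);     /* edits the private copy only */

    /* Write the shortened data back, retrying on short writes. */
    int result = 0;
    size_t done = 0;
    while (done < keep) {
        ssize_t n = write(fd, map + done, keep - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            result = errno;
            break;
        }
        done += (size_t)n;
    }

    /* Only now cut off the surplus; the mapping is no longer read. */
    if (result == 0 && ftruncate(fd, (off_t)keep) < 0)
        result = errno;

    munmap(map, len);
    if (close(fd) < 0 && result == 0)
        result = errno;
    return result;
}
```

If a crash between write() and ftruncate() must not leave a half-updated file, writing to a temporary file and rename()ing it over the original would be the more robust choice; the sketch above does not attempt that.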