> +suffices. However, if the user buffer is not page aligned and direct read
One more thing: direct writes can also corrupt data. Consider the
following scenario:
1) P1-T1 issues a DIO write (which starts DMA).
2) P1-T2 calls fork() and creates P2.
3) P1-T3 writes to the DIO target page. A COW break occurs, and the
   original DIO target page is now owned by P2.
4) P2 writes to the DIO target page. This does NOT cause a COW break,
   so it modifies the DIO target page data.
5) The DMA transfer writes invalid data to disk.
The details are described in the URLs you referenced.
> +runs in parallel with a
> +.BR fork (2)
> +of the reader process, it may happen that the read data is split between
> +pages owned by the original process and its child. Thus effectively read
> +data is corrupted.
>  .LP
>  The
>  .B O_DIRECT
On Wed, May 2, 2012 at 4:15 AM, KOSAKI Motohiro
<[email protected]> wrote:
>> +suffices. However, if the user buffer is not page aligned and direct read
>
> One more thing. direct write also makes data corruption. Think
> following scenario,
In the light of all of the comments, can someone revise the man-pages
patch that Jan sent?
Thanks,
Michael
> 1) P1-T1 uses DIO write (and starting dma)
> 2) P1-T2 call fork() and makes P2
> 3) P1-T3 write to the dio target page. and then, cow break occur and
>    original dio target pages is now owned by P2.
> 4) P2 write the dio target page. It now does NOT make cow break. and
>    now we break dio target page data.
> 5) DMA transfer write invalid data to disk.
>
> The detail is described in your refer URLs.
>
>
>> +runs in parallel with a
>> +.BR fork (2)
>> +of the reader process, it may happen that the read data is split between
>> +pages owned by the original process and its child. Thus effectively read
>> +data is corrupted.
>>  .LP
>>  The
>>  .B O_DIRECT
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
On 2 May 2012 03:56, Michael Kerrisk (man-pages) <[email protected]> wrote:
> On Wed, May 2, 2012 at 4:15 AM, KOSAKI Motohiro
> <[email protected]> wrote:
>>> +suffices. However, if the user buffer is not page aligned and direct read
>>
>> One more thing. direct write also makes data corruption. Think
>> following scenario,
>
> In the light of all of the comments, can someone revise the man-pages
> patch that Jan sent?
This does not quite describe the entire situation, but it is something
understandable to developers:
O_DIRECT IOs should never be run concurrently with the fork(2) system call
when the memory buffer is anonymous memory or comes from mmap(2)
with MAP_PRIVATE.
Any such IOs, whether submitted with an asynchronous IO interface or from
another thread in the process, should be quiesced before fork(2) is called.
Failure to do so can result in data corruption and undefined behavior in
parent and child processes.
This restriction does not apply when the memory buffer for the O_DIRECT
IOs comes from mmap(2) with MAP_SHARED or from shmat(2).
Is that on the right track? I feel it might be necessary to describe this
allowance for MAP_SHARED, because some databases may be doing
such things, and anyway it gives apps a potential way to make this work
if concurrent fork + DIO is very important.
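To illustrate the MAP_SHARED allowance in the proposed text, here is a minimal sketch in C. The helper names dio_buf_alloc_shared() and child_write_is_visible() are hypothetical and are not part of the patch; the sketch only assumes a Linux system where mmap(2) supports MAP_ANONYMOUS. It shows that a MAP_SHARED buffer is not copy-on-write across fork(2): a store made by the child lands in the same page the parent sees, so there is no COW break that could move a DIO target page to another process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map an I/O buffer that parent and child keep
 * sharing after fork(2), instead of a private copy-on-write mapping. */
void *dio_buf_alloc_shared(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* mmap(2) returns page-aligned memory. */
    assert((uintptr_t)p % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
    return p;
}

/* Hypothetical check: returns 1 if a store made by the child is visible
 * to the parent (no private copy was made), 0 if not, -1 on error. */
int child_write_is_visible(volatile unsigned char *buf)
{
    pid_t pid;
    int status;

    buf[0] = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        buf[0] = 0x5a;      /* child writes to the shared page */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return buf[0] == 0x5a;
}
```

A buffer obtained this way would then be passed to read(2) or write(2) on a file descriptor opened with O_DIRECT; the I/O itself is left out of the sketch because O_DIRECT support depends on the filesystem.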
On Wed, 2 May 2012, Nick Piggin wrote:
> On 2 May 2012 03:56, Michael Kerrisk (man-pages) <[email protected]> wrote:
> >
> > In the light of all of the comments, can someone revise the man-pages
> > patch that Jan sent?
>
> This does not quite describe the entire situation, but something understandable
> to developers:
>
> O_DIRECT IOs should never be run concurrently with fork(2) system call,
> when the memory buffer is anonymous memory, or comes from mmap(2)
> with MAP_PRIVATE.
>
> Any such IOs, whether submitted with asynchronous IO interface or from
> another thread in the process, should be quiesced before fork(2) is called.
> Failure to do so can result in data corruption and undefined behavior in
> parent and child processes.
>
> This restriction does not apply when the memory buffer for the O_DIRECT
> IOs comes from mmap(2) with MAP_SHARED or from shmat(2).
Nor does this restriction apply when the memory buffer has been advised
as MADV_DONTFORK with madvise(2), ensuring that it will not be available
to the child after fork(2).
>
>
>
> Is that on the right track? I feel it might be necessary to describe this
> allowance for MAP_SHARED, because some databases may be doing
> such things, and anyway it gives apps a potential way to make this work
> if concurrent fork + DIO is very important.
Looks good, but we do need a reference to MADV_DONTFORK, perhaps as above.
Hugh
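The MADV_DONTFORK exemption can be sketched in C as follows. This is an illustration only: dio_buf_alloc_dontfork() and dio_buf_free() are hypothetical names, and the sketch assumes Linux with MADV_DONTFORK available through madvise(2). The buffer comes from its own private anonymous mapping rather than from malloc(3), on the assumption that the advice works on whole pages and that unrelated heap data sharing those pages should not disappear from the child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate an I/O buffer that is not inherited by
 * children created with fork(2). */
void *dio_buf_alloc_dontfork(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Ask the kernel to leave this range out of the child's address
     * space, so fork(2) never marks these pages copy-on-write. */
    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }

    /* mmap(2) returns page-aligned memory. */
    assert((uintptr_t)p % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
    return p;
}

/* Hypothetical helper: release a buffer from dio_buf_alloc_dontfork(). */
int dio_buf_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

Note that a child touching the address of such a buffer would fault, since the range is simply absent after fork(2); the sketch assumes the child has no need for the buffer.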
On 2 May 2012 13:04, Hugh Dickins <[email protected]> wrote:
> On Wed, 2 May 2012, Nick Piggin wrote:
>> On 2 May 2012 03:56, Michael Kerrisk (man-pages) <[email protected]> wrote:
>> >
>> > In the light of all of the comments, can someone revise the man-pages
>> > patch that Jan sent?
>>
>> This does not quite describe the entire situation, but something understandable
>> to developers:
>>
>> O_DIRECT IOs should never be run concurrently with fork(2) system call,
>> when the memory buffer is anonymous memory, or comes from mmap(2)
>> with MAP_PRIVATE.
>>
>> Any such IOs, whether submitted with asynchronous IO interface or from
>> another thread in the process, should be quiesced before fork(2) is called.
>> Failure to do so can result in data corruption and undefined behavior in
>> parent and child processes.
>>
>> This restriction does not apply when the memory buffer for the O_DIRECT
>> IOs comes from mmap(2) with MAP_SHARED or from shmat(2).
>
> Nor does this restriction apply when the memory buffer has been advised
> as MADV_DONTFORK with madvise(2), ensuring that it will not be available
> to the child after fork(2).
Yes of course, I forgot that was exported too.
>
>>
>>
>>
>> Is that on the right track? I feel it might be necessary to describe this
>> allowance for MAP_SHARED, because some databases may be doing
>> such things, and anyway it gives apps a potential way to make this work
>> if concurrent fork + DIO is very important.
>
> Looks good, but we do need a reference to MADV_DONTFORK, perhaps as above.
Yep, thanks Hugh.
On Tue 01-05-12 20:04:15, Hugh Dickins wrote:
> On Wed, 2 May 2012, Nick Piggin wrote:
> > On 2 May 2012 03:56, Michael Kerrisk (man-pages) <[email protected]> wrote:
> > >
> > > In the light of all of the comments, can someone revise the man-pages
> > > patch that Jan sent?
> >
> > This does not quite describe the entire situation, but something understandable
> > to developers:
> >
> > O_DIRECT IOs should never be run concurrently with fork(2) system call,
> > when the memory buffer is anonymous memory, or comes from mmap(2)
> > with MAP_PRIVATE.
> >
> > Any such IOs, whether submitted with asynchronous IO interface or from
> > another thread in the process, should be quiesced before fork(2) is called.
> > Failure to do so can result in data corruption and undefined behavior in
> > parent and child processes.
> >
> > This restriction does not apply when the memory buffer for the O_DIRECT
> > IOs comes from mmap(2) with MAP_SHARED or from shmat(2).
>
> Nor does this restriction apply when the memory buffer has been advised
> as MADV_DONTFORK with madvise(2), ensuring that it will not be available
> to the child after fork(2).
Yes, I think with this addition the text is fine.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
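
For buffers that are neither MAP_SHARED nor MADV_DONTFORK, the agreed text asks that in-flight O_DIRECT IOs be quiesced before fork(2) is called. One possible way for an application to do that is sketched below in C. Everything here is an assumption made for illustration: the names dio_begin(), dio_end() and quiesced_fork() are hypothetical, and the sketch presumes that every thread brackets its O_DIRECT requests (including asynchronous submissions, until completion) with dio_begin() and dio_end().

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t dio_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t dio_cond = PTHREAD_COND_INITIALIZER;
static int dio_inflight;    /* O_DIRECT requests currently in flight */
static int fork_pending;    /* threads waiting to fork or forking now */

/* Call before submitting an O_DIRECT request. Blocks while a fork is
 * pending so that no new IO starts underneath it. */
void dio_begin(void)
{
    pthread_mutex_lock(&dio_lock);
    while (fork_pending > 0)
        pthread_cond_wait(&dio_cond, &dio_lock);
    dio_inflight++;
    pthread_mutex_unlock(&dio_lock);
}

/* Call once the O_DIRECT request has completed. */
void dio_end(void)
{
    pthread_mutex_lock(&dio_lock);
    assert(dio_inflight > 0);
    if (--dio_inflight == 0)
        pthread_cond_broadcast(&dio_cond);
    pthread_mutex_unlock(&dio_lock);
}

/* fork(2) wrapper: waits until no O_DIRECT request is in flight, then
 * forks while holding the lock so that none can start meanwhile. */
pid_t quiesced_fork(void)
{
    pid_t pid;

    pthread_mutex_lock(&dio_lock);
    fork_pending++;
    while (dio_inflight > 0)
        pthread_cond_wait(&dio_cond, &dio_lock);
    pid = fork();
    fork_pending--;
    pthread_cond_broadcast(&dio_cond);
    pthread_mutex_unlock(&dio_lock);
    return pid;
}
```

The wrapper only helps if every fork(2) in the process goes through it; a library that calls fork(2) directly would bypass the quiescing.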