2009-01-28 21:34:53

by Greg KH

Subject: open(2) says O_DIRECT works on 512 byte boundries?

In looking at open(2), it says that O_DIRECT works on 512-byte boundaries
with the 2.6 kernel release:

    Under Linux 2.4, transfer sizes, and the alignment of the user
    buffer and the file offset must all be multiples of the logical
    block size of the file system. Under Linux 2.6, alignment to
    512-byte boundaries suffices.

However, if you try to access an O_DIRECT-opened file with a buffer that
is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
read).

Is this just a mistake in the documentation? Or am I reading it
incorrectly?

I have a test program that shows this if anyone wants it.

thanks,

greg k-h
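
To make the quoted rule concrete, here is a minimal C sketch of the alignment check that the open(2) text describes for Linux 2.6. The helper name `dio_args_aligned` and the `DIO_ALIGN` constant are hypothetical, introduced only for illustration; they are not part of the test program mentioned above or of any kernel interface. The sketch shows that a buffer at "PAGE_SIZE aligned + 512 bytes" does satisfy the documented rule, which is why the observed failure is surprising.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumption: 512 is the alignment quoted from open(2) for Linux 2.6. */
#define DIO_ALIGN 512

/* Hypothetical helper: true if the user buffer address, the file offset
 * and the transfer length are all multiples of DIO_ALIGN. */
static bool dio_args_aligned(const void *buf, off_t offset, size_t length)
{
    return ((uintptr_t)buf % DIO_ALIGN) == 0 &&
           (offset % (off_t)DIO_ALIGN) == 0 &&
           (length % DIO_ALIGN) == 0;
}
```

For example, `dio_args_aligned((const void *)(uintptr_t)(4096 + 512), 0, 4096)` is true under this rule, while a buffer at `4096 + 100` is rejected.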


2009-01-29 00:42:21

by Robert Hancock

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?

Greg KH wrote:
> In looking at open(2), it says that O_DIRECT works on 512 byte boundries
> with the 2.6 kernel release:
> Under Linux 2.4, transfer sizes, and the alignment of the user
> buffer and the file offset must all be multiples of the logical
> block size of the file system. Under Linux 2.6, alignment to
> 512-byte boundaries suffices.
>
> However if you try to access an O_DIRECT opened file with a buffer that
> is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
> read.)
>
> Is this just a mistake in the documentation? Or am I reading it
> incorrectly?
>
> I have a test program that shows this if anyone wants it.

Well, it sounds like a bug to me. Even if it's not supported, if you do
such an access, surely the kernel should detect that and return EINVAL
or something rather than reading corrupted data.

2009-01-29 03:17:46

by Greg KH

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?

On Thu, Jan 29, 2009 at 03:59:12PM +1300, Michael Kerrisk wrote:
> On Thu, Jan 29, 2009 at 2:17 PM, Greg KH <[email protected]> wrote:
> >
> >
> >
> > On Wed, Jan 28, 2009 at 06:41:49PM -0600, Robert Hancock wrote:
> >>
> >>
> >> Greg KH wrote:
> >>> In looking at open(2), it says that O_DIRECT works on 512 byte boundries
> >>> with the 2.6 kernel release:
> >>> Under Linux 2.4, transfer sizes, and the alignment of the user
> >>> buffer and the file offset must all be multiples of the logical
> >>> block size of the file system. Under Linux 2.6, alignment to
> >>> 512-byte boundaries suffices.
> >>> However if you try to access an O_DIRECT opened file with a buffer that
> >>> is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
> >>> read.)
> >>> Is this just a mistake in the documentation? Or am I reading it
> >>> incorrectly?
> >>> I have a test program that shows this if anyone wants it.
> >>
> >> Well, it sounds like a bug to me.. even if it's not supported, if you do
> >> such an access, surely the kernel should detect that and return EINVAL or
> >> something rather than reading corrupted data..
> >
> > It doesn't. It says the read is successful, yet the data is not really
> > read into the buffer. Portions of it is, but not the amount we asked
> > for.
>
> Greg,
>
> Can you post your test program?

Sure, here it is. I'm still not quite sure it is valid, but at first
glance it seems to be.

Run it once with no arguments and all of the files will be created.
Then run it again with no offset being asked for:
    ./dma_thread -a 0
then with an offset:
    ./dma_thread -a 512

The second one breaks.

thanks,

greg k-h


Attachments:
dma_thread.c (6.43 kB)

2009-01-29 04:21:31

by Michael Kerrisk

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?




On Thu, Jan 29, 2009 at 2:17 PM, Greg KH <[email protected]> wrote:
>
>
>
> On Wed, Jan 28, 2009 at 06:41:49PM -0600, Robert Hancock wrote:
>>
>>
>> Greg KH wrote:
>>> In looking at open(2), it says that O_DIRECT works on 512 byte boundries
>>> with the 2.6 kernel release:
>>> Under Linux 2.4, transfer sizes, and the alignment of the user
>>> buffer and the file offset must all be multiples of the logical
>>> block size of the file system. Under Linux 2.6, alignment to
>>> 512-byte boundaries suffices.
>>> However if you try to access an O_DIRECT opened file with a buffer that
>>> is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
>>> read.)
>>> Is this just a mistake in the documentation? Or am I reading it
>>> incorrectly?
>>> I have a test program that shows this if anyone wants it.
>>
>> Well, it sounds like a bug to me.. even if it's not supported, if you do
>> such an access, surely the kernel should detect that and return EINVAL or
>> something rather than reading corrupted data..
>
> It doesn't. It says the read is successful, yet the data is not really
> read into the buffer. Portions of it is, but not the amount we asked
> for.

Greg,

Can you post your test program?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-01-29 05:15:00

by Kamezawa Hiroyuki

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?

On Wed, 28 Jan 2009 13:33:22 -0800
Greg KH <[email protected]> wrote:

> In looking at open(2), it says that O_DIRECT works on 512 byte boundries
> with the 2.6 kernel release:
> Under Linux 2.4, transfer sizes, and the alignment of the user
> buffer and the file offset must all be multiples of the logical
> block size of the file system. Under Linux 2.6, alignment to
> 512-byte boundaries suffices.
>
> However if you try to access an O_DIRECT opened file with a buffer that
> is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
> read.)
>

IIUC, this is not related to the 512-byte boundary. It is a race between
direct I/O and copy-on-write: a copy-on-write that happens while a page is
being read into via DIO is the problem.

It may be true that if the buffer is aligned to the page size, no
copy-on-write will happen in a usual program. But with HugeTLB pages, which
also do copy-on-write, data corruption will happen again, and requiring
HugeTLB-aligned buffers is nonsense.

Thanks,
-Kame
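
One way to keep copy-on-write away from a direct I/O buffer can be sketched as follows, under the assumption that the copy-on-write in question is triggered by a fork(2) in the same process. The idea is to mark the buffer with madvise(2) and MADV_DONTFORK so that the range is not made available to the child, and therefore is not shared copy-on-write. The helper names `dio_buffer_alloc` and `dio_buffer_free` are hypothetical. This is only an illustration of the mechanism, not the fix that was under discussion on the list.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous, page-aligned buffer and mark it
 * MADV_DONTFORK so that a later fork(2) does not share it with the child.
 * Returns NULL on failure. */
static void *dio_buffer_alloc(size_t len)
{
    assert(len > 0);

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* The child will not see this range at all after fork(2). */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Release a buffer obtained from dio_buffer_alloc(); returns 0 on success. */
static int dio_buffer_free(void *buf, size_t len)
{
    return munmap(buf, len);
}
```

Note that a child process must never touch such a buffer, since the mapping does not exist there.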


> Is this just a mistake in the documentation? Or am I reading it
> incorrectly?
>
> I have a test program that shows this if anyone wants it.
>
> thanks,
>
> greg k-h

2009-01-29 07:10:55

by KOSAKI Motohiro

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?

(CC to andrea)

> On Wed, 28 Jan 2009 13:33:22 -0800
> Greg KH <[email protected]> wrote:
>
> > In looking at open(2), it says that O_DIRECT works on 512 byte boundries
> > with the 2.6 kernel release:
> > Under Linux 2.4, transfer sizes, and the alignment of the user
> > buffer and the file offset must all be multiples of the logical
> > block size of the file system. Under Linux 2.6, alignment to
> > 512-byte boundaries suffices.
> >
> > However if you try to access an O_DIRECT opened file with a buffer that
> > is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
> > read.)
> >
>
> IIUC, it's not related to 512bytes boundary. Just a race between
> direct-io v.s. copy-on-write. Copy-on-Write while reading a page via DIO
> is a problem.

Yes.
Greg's reproducer is a bit misleading.

> for (j = 0; j < workers; j++) {
>         worker[j].offset = offset + j * PAGE_SIZE;
>         worker[j].buffer = buffer + align + j * PAGE_SIZE;
>         worker[j].length = PAGE_SIZE;
> }

This code means:
- if align == 0, each reader thread touches only one page,
  and each page is touched by only one thread.
- if align != 0, each reader thread touches two pages,
  and each page is touched by two threads.

So the race happens only if align != 0.
We discussed this issue with Andrea last month
(in the "Corruption with O_DIRECT and unaligned user buffers" thread).

As far as I know, he is working on fixing this issue now.
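
The page arithmetic behind this observation can be checked with a small C sketch. It assumes 4 KiB pages and that worker j's buffer starts at `align + j * PAGE_SZ` and is one page long, as in the loop quoted above. The names `PAGE_SZ`, `first_page`, `last_page`, `pages_touched` and `workers_share_page` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096  /* assumption: 4 KiB pages */

/* Index of the first page covered by worker j's buffer, which starts at
 * byte offset align + j * PAGE_SZ within the shared allocation. */
static size_t first_page(size_t align, size_t j)
{
    return (align + j * PAGE_SZ) / PAGE_SZ;
}

/* Index of the last page covered by worker j's PAGE_SZ-byte buffer. */
static size_t last_page(size_t align, size_t j)
{
    return (align + j * PAGE_SZ + PAGE_SZ - 1) / PAGE_SZ;
}

/* Number of distinct pages worker j's buffer spans. */
static size_t pages_touched(size_t align, size_t j)
{
    return last_page(align, j) - first_page(align, j) + 1;
}

/* True if the buffers of workers a and b overlap on at least one page. */
static bool workers_share_page(size_t align, size_t a, size_t b)
{
    return first_page(align, a) <= last_page(align, b) &&
           first_page(align, b) <= last_page(align, a);
}
```

With `align == 0`, every worker spans exactly one page and neighbouring workers share nothing. With `align == 512`, every worker spans two pages and workers j and j + 1 share one of them.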


>
> Maybe it's true that if buffer is aligned to page size, no copy-on-write will
> happen in usual program. But assuming HugeTLB page, which does Copy-on-Write,
> data corruption will happen again. HugeTLB aligned buffer is nonsense.




2009-01-29 15:40:36

by Jeff Moyer

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?

Greg KH <[email protected]> writes:

> On Thu, Jan 29, 2009 at 03:59:12PM +1300, Michael Kerrisk wrote:
>> On Thu, Jan 29, 2009 at 2:17 PM, Greg KH <[email protected]> wrote:
>> >
>> >
>> >
>> > On Wed, Jan 28, 2009 at 06:41:49PM -0600, Robert Hancock wrote:
>> >>
>> >>
>> >> Greg KH wrote:
>> >>> In looking at open(2), it says that O_DIRECT works on 512 byte boundries
>> >>> with the 2.6 kernel release:
>> >>> Under Linux 2.4, transfer sizes, and the alignment of the user
>> >>> buffer and the file offset must all be multiples of the logical
>> >>> block size of the file system. Under Linux 2.6, alignment to
>> >>> 512-byte boundaries suffices.
>> >>> However if you try to access an O_DIRECT opened file with a buffer that
>> >>> is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
>> >>> read.)
>> >>> Is this just a mistake in the documentation? Or am I reading it
>> >>> incorrectly?
>> >>> I have a test program that shows this if anyone wants it.
>> >>
>> >> Well, it sounds like a bug to me.. even if it's not supported, if you do
>> >> such an access, surely the kernel should detect that and return EINVAL or
>> >> something rather than reading corrupted data..
>> >
>> > It doesn't. It says the read is successful, yet the data is not really
>> > read into the buffer. Portions of it is, but not the amount we asked
>> > for.
>>
>> Greg,
>>
>> Can you post your test program?
>
> Sure, here it is. I'm still not quite sure it is valid, but at first
> glance it seems to be.
>
> Run it once with no arguments and all of the files will be created.
> Then run it again with no offset being asked for:
> ./dma_thread -a 0
> then with an offset:
> ./dma_thread -a 512
>
> The second one breaks.

There are several folks working on this. See "Corruption with O_DIRECT
and unaligned user buffers" on the linux-fsdevel list. There is also a
Red Hat bugzilla for this (471613) that several folks have been working
through.

Cheers,
Jeff

2009-01-30 06:19:15

by Greg KH

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?

On Thu, Jan 29, 2009 at 10:40:22AM -0500, Jeff Moyer wrote:
> Greg KH <[email protected]> writes:
>
> > On Thu, Jan 29, 2009 at 03:59:12PM +1300, Michael Kerrisk wrote:
> >> On Thu, Jan 29, 2009 at 2:17 PM, Greg KH <[email protected]> wrote:
> >> >
> >> >
> >> >
> >> > On Wed, Jan 28, 2009 at 06:41:49PM -0600, Robert Hancock wrote:
> >> >>
> >> >>
> >> >> Greg KH wrote:
> >> >>> In looking at open(2), it says that O_DIRECT works on 512 byte boundries
> >> >>> with the 2.6 kernel release:
> >> >>> Under Linux 2.4, transfer sizes, and the alignment of the user
> >> >>> buffer and the file offset must all be multiples of the logical
> >> >>> block size of the file system. Under Linux 2.6, alignment to
> >> >>> 512-byte boundaries suffices.
> >> >>> However if you try to access an O_DIRECT opened file with a buffer that
> >> >>> is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
> >> >>> read.)
> >> >>> Is this just a mistake in the documentation? Or am I reading it
> >> >>> incorrectly?
> >> >>> I have a test program that shows this if anyone wants it.
> >> >>
> >> >> Well, it sounds like a bug to me.. even if it's not supported, if you do
> >> >> such an access, surely the kernel should detect that and return EINVAL or
> >> >> something rather than reading corrupted data..
> >> >
> >> > It doesn't. It says the read is successful, yet the data is not really
> >> > read into the buffer. Portions of it is, but not the amount we asked
> >> > for.
> >>
> >> Greg,
> >>
> >> Can you post your test program?
> >
> > Sure, here it is. I'm still not quite sure it is valid, but at first
> > glance it seems to be.
> >
> > Run it once with no arguments and all of the files will be created.
> > Then run it again with no offset being asked for:
> > ./dma_thread -a 0
> > then with an offset:
> > ./dma_thread -a 512
> >
> > The second one breaks.
>
> There are several folks working on this. See "Corruption with O_DIRECT
> and unaligned user buffers" on the linux-fsdevel list. There is also a
> Red Hat bugzilla for this (471613) that several folks have been working
> through.

Thanks for the pointers to that, I'll follow along with it.

greg k-h

2009-01-30 06:19:30

by Greg KH

Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?

On Thu, Jan 29, 2009 at 04:10:39PM +0900, KOSAKI Motohiro wrote:
> (CC to andrea)
>
> > On Wed, 28 Jan 2009 13:33:22 -0800
> > Greg KH <[email protected]> wrote:
> >
> > > In looking at open(2), it says that O_DIRECT works on 512 byte boundries
> > > with the 2.6 kernel release:
> > > Under Linux 2.4, transfer sizes, and the alignment of the user
> > > buffer and the file offset must all be multiples of the logical
> > > block size of the file system. Under Linux 2.6, alignment to
> > > 512-byte boundaries suffices.
> > >
> > > However if you try to access an O_DIRECT opened file with a buffer that
> > > is PAGE_SIZE aligned + 512 bytes, it fails in a bad way (wrong data is
> > > read.)
> > >
> >
> > IIUC, it's not related to 512bytes boundary. Just a race between
> > direct-io v.s. copy-on-write. Copy-on-Write while reading a page via DIO
> > is a problem.
>
> Yes.
> Greg's reproducer is a bit misleading.
>
> > for (j = 0; j < workers; j++) {
> >         worker[j].offset = offset + j * PAGE_SIZE;
> >         worker[j].buffer = buffer + align + j * PAGE_SIZE;
> >         worker[j].length = PAGE_SIZE;
> > }
>
> this code mean,
> - if align == 0, reader thread touch only one page.
> and the page is touched only one thread.
> - if align != 0, reader thread touch two page.
> and the page is touched two thread.
>
> then, race is happend if align != 0.
> We discussed this issue with andrea last month.
> ("Corruption with O_DIRECT and unaligned user buffers" thread)
>
> As far as I know, he is working on fixing this issue now.

Thanks for the pointers, I'll go read the thread and follow up there.

greg k-h