2002-08-29 11:00:35

by Peter T. Breuer

Subject: O_DIRECT

What functions does a block driver have to implement in order to
support read/write when it has been opened with O_DIRECT from user
space?

I have made some experiments with plain read/write after opening with
O_DIRECT:

2.5.31:
/dev/ram0 open fails
file on ext2 r/w gives EINVAL
/dev/hdaN works

2.4.19:
/dev/ram0 r/w gives EINVAL
file on ext2 r/w gives EINVAL
/dev/hdaN r/w gives EINVAL

WTF? It's not a library issue - strace shows the syscalls happening
with the right flag set on the open.

Can someone put me out of my misery? Where the heck is this implemented
in the 2.5.31 IDE code, if it is there at all? There's no mention of
direct_IO. Clues?

What I ultimately want is to know what code I have to put into a block
device driver in order to support O_DIRECT on the device.

Peter


2002-08-29 11:18:25

by Peter T. Breuer

Subject: Re: O_DIRECT

"A month of sundays ago [email protected] wrote:"
> > What functions does a block driver have to implement in order to
> > support read/write when it has been opened with O_DIRECT from user
> > space?
> >
> > I have made some experiments with plain read/write after opening with
> > O_DIRECT:
> >
> > 2.5.31:
> > /dev/ram0 open fails
> > file on ext2 r/w gives EINVAL
> > /dev/hdaN works
> >
> > 2.4.19:
> > /dev/ram0 r/w gives EINVAL
> > file on ext2 r/w gives EINVAL
> > /dev/hdaN r/w gives EINVAL
> >
> > WTF? It's not a library issue - strace shows the syscalls happening
> > with the right flag set on the open.
>
> You should be able to get it to work on ext2. It works fine for me.
> Remember that the memory you read/write from must be page aligned (i.e.
> mmap /dev/zero not malloc) and reads and writes must be multiples of the
> page size. I think block devices work on 2.4 too, but I forget (otherwise
> you can use raw devices).

Thanks for the input. Well, I used the same test program on all, and
the buffer was aligned to 512 bytes (because I intended it to work
with raw character devices too). You are saying that it was luck ... OK,
I'll retest in a little while.

So I simply have to do "nothing" in the driver?

Peter

2002-08-29 12:10:40

by Peter T. Breuer

Subject: Re: O_DIRECT

"A month of sundays ago [email protected] wrote:"
> >
> > What functions does a block driver have to implement in order to
> > support read/write when it has been opened with O_DIRECT from user
> > space?

> Remember that the memory you read/write from must be page aligned (i.e.
> mmap /dev/zero not malloc) and reads and writes must be multiples of the
> page size. I think block devices work on 2.4 too, but I forget (otherwise
> you can use raw devices).

I do believe you are right. Multiples of 4096 seem to work fine. No
support needed in the block device driver.
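
For reference, a minimal sketch of the alignment rule above, assuming
Linux with glibc. The names DIO_ALIGN, dio_aligned and dio_read are
made up for illustration, and 4096 is only the multiple that worked in
these tests; the alignment a given kernel or device actually requires
may differ. The buffer comes from posix_memalign rather than malloc so
that its address is a multiple of the page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc only exposes O_DIRECT with this defined */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4096 is the multiple reported to work in this thread. */
#define DIO_ALIGN 4096

/* Return 1 if the buffer address, the length and the file offset are
 * all multiples of DIO_ALIGN, 0 otherwise. */
int dio_aligned(const void *buf, size_t len, off_t off)
{
    return (uintptr_t)buf % DIO_ALIGN == 0 &&
           len % DIO_ALIGN == 0 &&
           off >= 0 && off % DIO_ALIGN == 0;
}

/* Hypothetical helper: read len bytes at offset off from path with
 * O_DIRECT. On success, returns the byte count and stores the buffer
 * in *out (the caller frees it). Returns -1 on any error. */
ssize_t dio_read(const char *path, off_t off, size_t len, void **out)
{
    void *buf = NULL;
    int fd = -1;
    ssize_t n;

    if (len == 0 || len % DIO_ALIGN != 0 || off < 0 || off % DIO_ALIGN != 0)
        return -1;
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return -1;
#ifdef O_DIRECT
    fd = open(path, O_RDONLY | O_DIRECT);
#else
    (void)path;                /* no O_DIRECT on this platform: fail below */
#endif
    if (fd < 0) {
        free(buf);
        return -1;
    }
    n = pread(fd, buf, len, off);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}
```

A call such as dio_read("/dev/hdaN", 0, 4096, &buf) would then go
through the same path as the tests above; a 512-byte length or offset is
rejected before the syscall is ever made.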

Peter

2003-06-27 10:21:12

by Stephen C. Tweedie

Subject: Re: O_DIRECT

Hi Alan,

On Fri, 2003-06-27 at 10:40, Stephen C. Tweedie wrote:

> On Thu, 2003-06-26 at 21:21, Alan Cox wrote:
> > So it's now confirmed with 3 distros, two file systems and several
> > compilers. It certainly seems to be the O_DIRECT patches but I'll pull
> > them back out for the next -ac and check I guess

Ouch ouch ouch, there's a nasty merge conflict between the O_DIRECT patch
and an existing 64-bit rlimit chunk in -ac3. You really, really want
the change below. :-) Marcelo's tree appears OK, and this is a common
code path for all filesystems in -ac, so it matches the failure patterns
that far.

Cheers,
Stephen

--- mm/filemap.c.~1~ 2003-06-27 09:58:08.000000000 +0100
+++ mm/filemap.c 2003-06-27 11:13:07.000000000 +0100
@@ -2995,8 +2995,8 @@
 		}
 		/* Fix this up when we got to rlimit64 */
 		if (pos > 0xFFFFFFFFULL)
-			count = 0;
-		else if(count > limit - (u32)pos) {
+			*count = 0;
+		else if(*count > limit - (u32)pos) {
 			/* send_sig(SIGXFSZ, current, 0); */
 			*count = limit - (u32)pos;
 		}
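
To make the fix concrete, here is a user-space sketch of the same
clamp, under the assumption (taken from the patched lines) that count is
a pointer to the byte count. The function name clamp_to_limit is
invented for illustration, and the extra pos >= limit guard is my
addition to keep the subtraction from wrapping; it is not part of the
hunk. Without the dereference, "count = 0" nulls the pointer and the
comparison tests an address rather than a length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patched check: shrink *count so that a
 * write starting at pos does not run past limit. */
void clamp_to_limit(uint64_t pos, size_t *count, uint32_t limit)
{
    if (pos > 0xFFFFFFFFULL || (uint32_t)pos >= limit)
        *count = 0;                         /* nothing may be written */
    else if (*count > limit - (uint32_t)pos)
        *count = limit - (uint32_t)pos;     /* write only up to the limit */
}
```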


2003-06-27 10:52:48

by Alan Cox

Subject: Re: O_DIRECT

> Ouch ouch ouch, there's a nasty merge conflict between the O_DIRECT patch
> and an existing 64-bit rlimit chunk in -ac3. You really, really want
> the change below. :-) Marcelo's tree appears OK, and this is a common
> code path for all filesystems in -ac, so it matches the failure patterns
> that far.

Ouch indeed - OK, that's good, that means it's not the O_DIRECT stuff. Thanks
for figuring it out.