2003-02-12 21:09:35

by Bruno Diniz de Paula

Subject: O_DIRECT foolish question

Hi,

I am trying to use O_DIRECT to read ordinary files, and the read syscall
always returns 0 except when the file size equals the fs block size. Is
it true that I can only use O_DIRECT when the file size recorded in the
inode is a multiple of the block size?

Thanks and excuse me for the newbie question,

Bruno.
--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-12 21:57:39

by Chris Wedgwood

Subject: Re: O_DIRECT foolish question

On Wed, Feb 12, 2003 at 04:19:24PM -0500, Bruno Diniz de Paula wrote:

> I am trying to use O_DIRECT to read ordinary files and read syscall
> always returns 0, unless when the file size equals the fs block
> size.

Sounds correct.

> Is it true that I can only use O_DIRECT when the size of the file
> written in the inode is a multiple of block size?

You can usually only do O_DIRECT reads/writes in multiples of the
block size (or in some cases multiples of 512 bytes, though I'm not sure
whether that code is still around).

It depends on the filesystem to some extent.


--cw

2003-02-12 21:55:07

by Andrew Morton

Subject: Re: O_DIRECT foolish question

Bruno Diniz de Paula <[email protected]> wrote:
>
> Hi,
>
> I am trying to use O_DIRECT to read ordinary files and read syscall
> always returns 0, unless when the file size equals the fs block size.

It should be returning -1, with errno set to EINVAL.

> Is
> it true that I can only use O_DIRECT when the size of the file written
> in the inode is a multiple of block size?
>

The file can be of any size - the kernel will zero-fill any remaining bytes.

The address and length which you pass into the read() or write() system call
must both be a multiple of the filesystem block size.

It is always safe to just use the machine's page size for alignment
calculations - no filesystem has a blocksize larger than the pagesize.

A good way to do this is to run getpagesize(), and to then malloc a buffer
which is one page larger than you need. Then round that address up to the
next page boundary. And perform I/O into that memory with
multiple-of-page-size requests.
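
A minimal sketch of that malloc-and-round-up approach, for illustration only.
The helper name alloc_page_aligned is hypothetical, sysconf(_SC_PAGESIZE) is
used here in place of getpagesize(), and the page size is assumed to be a
power of two:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a page-aligned region of at least `len`
 * bytes by over-allocating one extra page and rounding the address up.
 * *raw receives the pointer that must later be passed to free().
 */
static void *alloc_page_aligned(size_t len, void **raw)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* The mask trick below only works for power-of-two page sizes. */
    assert(page != 0 && (page & (page - 1)) == 0);

    *raw = malloc(len + page);
    if (*raw == NULL)
        return NULL;

    /* Round up to the next page boundary (no-op if already aligned). */
    uintptr_t addr = ((uintptr_t)*raw + page - 1) & ~(uintptr_t)(page - 1);
    return (void *)addr;
}
```

The returned pointer is what would be handed to read() or write(); the raw
pointer is kept only so the allocation can be freed. Where posix_memalign()
is available it achieves the same result without the extra bookkeeping.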



In the 2.5 kernel the "must be a multiple of blocksize" requirement was
relaxed. We now support alignments and lengths down to the minimum which is
supported by the underlying device. Typically 512 bytes, but not always.

Portable applications should not assume that 512-byte alignment is supported.
They should query the device's alignment requirements via the BLKSSZGET ioctl
against (say) /dev/hda1. Or they can simply try 512, 1024, 2048, ... at
initialisation time.
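
As a rough illustration of the BLKSSZGET query mentioned above, something
along these lines could be used. The helper name device_sector_size is
hypothetical, the device path is whatever the caller supplies, and opening a
block device normally needs suitable permissions:

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKSSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the sector size reported by the block
 * device at `path`, or -1 if it cannot be opened or does not answer
 * the BLKSSZGET ioctl (for example because it is not a block device).
 */
static int device_sector_size(const char *path)
{
    int size = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, BLKSSZGET, &size) < 0)
        size = -1;
    close(fd);
    return size;
}
```

A caller would fall back to the page size (or to probing 512, 1024, 2048, ...)
when this returns -1.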

2003-02-12 22:20:03

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

On Wed, 2003-02-12 at 17:03, Andrew Morton wrote:
> Bruno Diniz de Paula <[email protected]> wrote:
> >
> > Hi,
> >
> > I am trying to use O_DIRECT to read ordinary files and read syscall
> > always returns 0, unless when the file size equals the fs block size.
>
> It should be returning -1, with errno set to EINVAL.

But I am using multiples of the page size for both the buffer alignment and
the buffer size (2nd and 3rd parameters of read). The issue is that when I
try to read files whose sizes are NOT multiples of the block size (and
therefore also not multiples of the page size), the read syscall returns 0
with no error. With files of size 4096, 8192 etc., everything works
fine. Indeed, no error should occur, since I am using the correct
alignment and size for the read. So the question remains: can I only read
files whose size is a multiple of the block size?

Thanks,

Bruno.

PS: I am running 2.4.20...

--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-12 22:32:37

by Chris Wedgwood

Subject: Re: O_DIRECT foolish question

On Wed, Feb 12, 2003 at 05:29:52PM -0500, Bruno Diniz de Paula wrote:

> But I am using multiples of page size in both buffer alignment and
> buffer size (2nd and 3rd parameters of read). The issue is that
> when I try to read files with sizes that are NOT multiples of block
> size (and therefore also not multiples of page size), the read
> syscall returns 0, with no errors.

What filesystem?

Can you send an strace of this occurring?

> So the question remains, am I able to read just files whose size is
> a multiple of block size?

No.

You ideally should be able to read any length file with O_DIRECT.
Even a 1-byte file.



--cw

2003-02-12 22:53:11

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

On Wed, 2003-02-12 at 17:42, Chris Wedgwood wrote:
> On Wed, Feb 12, 2003 at 05:29:52PM -0500, Bruno Diniz de Paula wrote:
>
> > But I am using multiples of page size in both buffer alignment and
> > buffer size (2nd and 3rd parameters of read). The issue is that
> > when I try to read files with sizes that are NOT multiples of block
> > size (and therefore also not multiples of page size), the read
> > syscall returns 0, with no errors.
>
> What filesystem?

ext2.

>
> Can you send an strace of this occurring?

execve("./testopen", ["./testopen"], [/* 30 vars */]) = 0
uname({sys="Linux", node="urca", ...}) = 0
brk(0) = 0x80497fc
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=57677, ...}) = 0
old_mmap(NULL, 57677, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40012000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0]Z\1\000"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=1102984, ...}) = 0
old_mmap(NULL, 1112740, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40021000
mprotect(0x40129000, 31396, PROT_NONE) = 0
old_mmap(0x40129000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x107000) = 0x40129000
old_mmap(0x4012f000, 6820, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4012f000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40131000
munmap(0x40012000, 57677) = 0
open("/var/tmp/testopen.txt", O_RDONLY|O_DIRECT) = 3
brk(0) = 0x80497fc
brk(0x804c7fc) = 0x804c7fc
brk(0) = 0x804c7fc
brk(0x804d000) = 0x804d000
read(3, "", 4096) = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 4), ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40012000
write(1, "0 bytes read from file.\n", 24) = 24
close(3) = 0
write(1, "Message: ", 9) = 9
munmap(0x40012000, 4096) = 0
exit_group(0) = ?

Thanks a lot,

Bruno.

--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-12 23:14:54

by Chris Wedgwood

Subject: Re: O_DIRECT foolish question

On Wed, Feb 12, 2003 at 06:02:58PM -0500, Bruno Diniz de Paula wrote:

> ext2.

are you able to test with another fs? (reiserfs and XFS also support
O_DIRECT)

> read(3, "", 4096) = 0

odd... I'm not sure why you get this

i tested locally here and it works as expected ... my test code is
appended.


--cw


Attachments:
(No filename) (331.00 B)
od.c (422.00 B)

2003-02-12 23:12:47

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

Just to complete the information, I am trying to read a file with 5
bytes, and here is the piece of code I am using:

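#define _GNU_SOURCE             /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        char *message;
        int len, pagesize = getpagesize();
        int fd = open("/var/tmp/testopen.txt", O_RDONLY|O_DIRECT);

        if (fd < 0) {
                printf("Unable to open file, errno is %d.\n", errno);
                return 1;
        }
        /* O_DIRECT needs a buffer aligned to the block (here: page) size. */
        if (posix_memalign((void **)&message, pagesize, pagesize) != 0)
                return 1;
        if ((len = read(fd, message, pagesize)) < 0) {
                perror("read");
        } else {
                printf("%d bytes read from file.\n", len);
                /* The buffer is not NUL-terminated, so bound the print. */
                printf("Message: %.*s", len, message);
        }
        free(message);
        close(fd);
        return 0;
}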

Thanks,

Bruno.

--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-12 23:23:44

by Chris Wedgwood

Subject: Re: O_DIRECT foolish question

On Wed, Feb 12, 2003 at 03:24:43PM -0800, Chris Wedgwood wrote:

> i tested locally here and it works as expected ... my test code is
> appended.

btw, edit the args to open() before testing, as I was messing about
before I sent this


--cw


2003-02-12 23:23:35

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

On Wed, 2003-02-12 at 18:24, Chris Wedgwood wrote:
> On Wed, Feb 12, 2003 at 06:02:58PM -0500, Bruno Diniz de Paula wrote:
>
> > ext2.
>
> are you able to test with another fs? (reiserfs and XFS also support
> O_DIRECT)

Unfortunately not, I just have ext2 partitions here...

>
> > read(3, "", 4096) = 0
>
> odd... I'm not sure why you get this
>
> i tested locally here and it works as expected ... my test code is
> appended.

But your code doesn't use O_DIRECT:

if ((h = open("test", O_RDONLY)) < 0)

Let me know whether the test works once O_DIRECT is included.

Bruno.

--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-12 23:28:58

by Chris Wedgwood

Subject: Re: O_DIRECT foolish question

On Wed, Feb 12, 2003 at 06:33:23PM -0500, Bruno Diniz de Paula wrote:

> But your code doesn't use O_DIRECT:

Sorry, you need to edit it (see my previous email). A better version
(appended) gives the following results.

cw:3@tapu(cw)$ cp od.c test
cw:3@tapu(cw)$ gcc -Wall od.c
cw:3@tapu(cw)$ ./a.out
read 503 bytes
read 0 bytes

> Let me know whether including O_DIRECT the test worked.

Seems to. I get 0 the second time around; presumably this is EOF, but
arguably it should return something else.

--cw


Attachments:
(No filename) (507.00 B)
od.c (503.00 B)

2003-02-12 23:39:46

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

On Wed, 2003-02-12 at 18:38, Chris Wedgwood wrote:

> Seems to. I get 0 the 2nd time about, presumably this is EOF but
> arguably it should return something else.

It didn't work for me. See the output:

diniz@urca:/var/tmp$ gcc -Wall od.c
diniz@urca:/var/tmp$ cp od.c test
diniz@urca:/var/tmp$ ./a.out
read 0 bytes
diniz@urca:/var/tmp$

What is your partition type? ext2?

Bruno.
--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-12 23:41:41

by Chris Wedgwood

Subject: Re: O_DIRECT foolish question

On Wed, Feb 12, 2003 at 06:49:35PM -0500, Bruno Diniz de Paula wrote:

> What is your partition type? ext2?

XFS.

I can't test e2fs right now as my test machine is running 2.5.60 where
it fails just as it does for you. I think both use generic_direct_IO
or whatever it's called so maybe I'll have a poke in there as to why
2.5.x is failing.



--cw

2003-02-13 00:26:31

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

On Wed, 2003-02-12 at 19:13, Chris Wedgwood wrote:
> If I had to guess, write should work more or less the same as reads
> (ie. I should be able to write aligned-but-smaller-than-page-sized
> blocks to the end of files).
>
> Testing this however shows this is *not* the case.

This is not the case. I have also tested here, and the file written is
always n*block_size bytes long. The problem with writing is that we can't
signal to the kernel that the actual data has finished and that from that
point on it should zero-fill the bytes. What is worse, the information about
the actual size is lost, since the write syscall will store what is passed
as the 3rd argument in the inode (field st_size of stat). This means
that after writing with O_DIRECT we can't read the data back correctly.
The exception is when we store the actual size together with the data and
ignore st_size when processing the file, for instance.

Well, I am sure I am completely wrong, because this doesn't make any
sense to me. Has anyone already dealt with this who can shed some
light on the discussion?

Thanks,

Bruno.

>
> Now, this *might* actually be the right thing to do ... if we allow
> 'small writes' how do we deal with larger writes once the file-write
> position is messed up?
>
> Heh... tricky stuff. Though required.
>
>
>
> --cw
--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-13 01:39:35

by Randy.Dunlap

Subject: Re: O_DIRECT foolish question

On 12 Feb 2003 18:22:35 -0500
Bruno Diniz de Paula <[email protected]> wrote:

| Just to complete the information, I am trying to read a file with 5
| bytes, and here is the piece of code I am using:
|
| char *message;
| int fd = open("/var/tmp/testopen.txt", O_RDONLY|O_DIRECT);
| int len, pagesize = getpagesize();
|
| posix_memalign((void **)&message, pagesize, pagesize);
| if(fd < 0) {
| printf("Unable to open file, errno is %d.\n", errno);
| } else {
| if((len = read(fd, message, pagesize)) < 0) {
| perror("read");
| } else {
| printf("%d bytes read from file.\n", len);
| printf("Message: %s", message);
| }
| }
| close(fd);
|
| Thanks,
|
| Bruno.



Here's what I get using Bruno's and cw's (od) programs:

          2.4.8|2.4.20    2.4.20             2.5.54
          ext2            ext3               ext2|ext3
          ====            ====               =========
od:       read 0 bytes    read: Inv. arg.    read: Inv. arg.
bruno:    0 bytes read    read: Inv. arg.    read: Inv. arg.

--
~Randy

2003-02-13 05:02:27

by Andrew Morton

Subject: Re: O_DIRECT foolish question

Bruno Diniz de Paula <[email protected]> wrote:
>
> On Wed, 2003-02-12 at 19:13, Chris Wedgwood wrote:
> > If I had to guess, write should work more or less the same as reads
> > (ie. I should be able to write aligned-but-smaller-than-page-sized
> > blocks to the end of files).
> >
> > Testing this however shows this is *not* the case.
>
> This is not the case, I have also tested here and the file written has
> n*block_size always. The problem with writing is that we can't sign to
> the kernel that the actual data has finished and from that point on it
> should zero-fill the bytes. And what is worse, the information about the
> actual size is lost, since the write syscall will store what is passed
> on the 3rd argument in the inode (field st_size of stat). This means
> that after writing using O_DIRECT we can't read data correctly anymore.
> The exception is when we write together with the data information about
> the actual size and process disregarding information from stat, for
> instance.
>
> Well, I am sure I am completely wrong because this doesn't make any
> sense for me. Someone that has already dealt with this and can bring a
> light to the discussion?
>

For writes, I don't think it is reasonable for the kernel to have to
handle byte-granular appends. O_DIRECT is different. For this case the
application should ftruncate the file back to the desired size prior to
closing it.
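
A rough sketch of that write-then-ftruncate pattern, under some assumptions:
the helper name write_padded_then_truncate is hypothetical, blk is taken to be
a power of two that satisfies the filesystem's O_DIRECT alignment rules, and
the data is written at offset 0 in a single request:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes at offset 0 of `fd` as a whole
 * number of `blk`-sized blocks (zero padded), then trim the file back
 * to `len`. Returns 0 on success, -1 on failure.
 */
static int write_padded_then_truncate(int fd, const void *data, size_t len,
                                      size_t blk)
{
    size_t padded;
    void *buf = NULL;
    int rc = -1;

    assert(blk != 0 && (blk & (blk - 1)) == 0);
    padded = (len + blk - 1) / blk * blk;   /* round up to whole blocks */
    if (padded == 0)
        return ftruncate(fd, 0);
    if (posix_memalign(&buf, blk, padded) != 0)
        return -1;

    memset(buf, 0, padded);
    memcpy(buf, data, len);

    /* One aligned, block-multiple write; then restore the real size. */
    if (pwrite(fd, buf, padded, 0) == (ssize_t)padded &&
        ftruncate(fd, (off_t)len) == 0)
        rc = 0;

    free(buf);
    return rc;
}
```

After the call, st_size reflects the real data length again, so a later
reader is not misled by the padding.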

For the short reads at EOF, the 2.4 kernel refuses to read anything, and
returns zero. The 2.5 kernel will return -EINVAL, which is better behaviour
(shouldn't make it just look like the file is shorter than it really is).

The ideal behaviour is that which I mistakenly described previously: we
should fill with zeroes and return the partial result. I'll look at
converting 2.5 to do that. As long as the changes are small - the direct-io
code does a ton of stuff, is complex, is not tested a lot and breakage tends
to be subtle.
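
Given the behaviours described above, an application can at least detect the
2.4-style "looks shorter than it really is" case by comparing what read()
delivers with st_size. This is only a sketch: the helper name
check_direct_read is hypothetical, and blk is assumed to be a power of two
acceptable for O_DIRECT on the filesystem in question.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a file from its current position using
 * aligned, `blk`-sized requests and compare the byte count with st_size.
 * Returns 0 if every byte was read, 1 if read() stopped early (zero or
 * short data before st_size was reached), and -1 on error such as EINVAL.
 */
static int check_direct_read(int fd, size_t blk, off_t *total)
{
    struct stat st;
    void *buf = NULL;
    ssize_t n;
    int rc, saved;

    assert(total != NULL);
    *total = 0;
    if (fstat(fd, &st) < 0 || posix_memalign(&buf, blk, blk) != 0)
        return -1;

    /* Stop after the first short read so the offset stays aligned. */
    do {
        n = read(fd, buf, blk);
        if (n > 0)
            *total += n;
    } while (n == (ssize_t)blk);

    saved = errno;              /* keep read()'s errno across free() */
    if (n < 0)
        rc = -1;
    else
        rc = (*total == st.st_size) ? 0 : 1;
    free(buf);
    errno = saved;
    return rc;
}
```

A return value of 1 on a file opened with O_DIRECT would correspond to the
ext2 result reported earlier in this thread.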

2003-02-13 15:12:15

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

Thanks, Andrew. So there is no chance of getting this working correctly on
the 2.4 kernel for now (I mean, reading files with size != n*block_size), and
I'd better give up on this... Is that the case, or do you think there is
still something that can be done to get this working on ext2 and the 2.4
kernel?

Bruno.

--
Bruno Diniz de Paula <[email protected]>
Rutgers University



2003-02-13 17:22:26

by Andrew Morton

Subject: Re: O_DIRECT foolish question

Bruno Diniz de Paula <[email protected]> wrote:
>
> Thanks, Andrew. So, no chances of getting this working correctly on 2.4
> kernel for now (I mean, reading files with size != n*block_size), and
> I'd better give up on this... Is it the case, or you think there is
> still something to do to get this working on ext2 and 2.4 kernel?
>

Oh I think we can probably fix this up. Can you test this diff?


diff -puN fs/buffer.c~o_direct-length-fix fs/buffer.c
--- 24/fs/buffer.c~o_direct-length-fix 2003-02-13 09:23:34.000000000 -0800
+++ 24-akpm/fs/buffer.c 2003-02-13 09:24:39.000000000 -0800
@@ -2107,7 +2107,7 @@ int generic_direct_IO(int rw, struct ino
 	int length;
 
 	length = iobuf->length;
-	nr_blocks = length / blocksize;
+	nr_blocks = (length + blocksize - 1) / blocksize;
 	/* build the blocklist */
 	for (i = 0; i < nr_blocks; i++, blocknr++) {
 		struct buffer_head bh;
@@ -2148,6 +2148,10 @@ int generic_direct_IO(int rw, struct ino
 	retval = brw_kiovec(rw, 1, &iobuf, inode->i_dev, iobuf->blocks, blocksize);
 	/* restore orig length */
 	iobuf->length = length;
+
+	/* Return correct value for reads at eof */
+	if (retval > 0 && retval > length)
+		retval = length;
  out:
 
 	return retval;

_
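
To see why the one-line change in the first hunk matters, the arithmetic can
be tried in isolation. This is a stand-alone sketch with hypothetical helper
names, not kernel code, and it assumes that length is the number of bytes
left in the file (5 in Bruno's test) and that the block-level transfer
reports whole blocks:

```c
#include <assert.h>
#include <stddef.h>

/* Block count before the patch: truncating division drops a partial block. */
static size_t nr_blocks_old(size_t length, size_t blocksize)
{
    assert(blocksize != 0);
    return length / blocksize;
}

/* Block count after the patch: round up so a partial tail block counts. */
static size_t nr_blocks_new(size_t length, size_t blocksize)
{
    assert(blocksize != 0);
    return (length + blocksize - 1) / blocksize;
}

/* Mirror of the second hunk: never report more bytes than were asked for. */
static long clamp_retval(long retval, long length)
{
    return (retval > 0 && retval > length) ? length : retval;
}
```

With length = 5 and blocksize = 4096, the old expression gives 0 blocks, so
nothing is read and read() returns 0; the new one gives 1 block, and the
clamp turns a 4096-byte transfer back into a 5-byte result. Negative error
codes pass through the clamp unchanged.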

2003-02-13 22:35:29

by Bruno Diniz de Paula

Subject: Re: O_DIRECT foolish question

Hi Andrew,

It worked perfectly on my box. Now I am going to try it in my experimental
environment, and I'll let you know whether everything is OK.

Thanks a lot,

Bruno.

PS: BTW, is this patch going to be added to 2.4 kernel?

--
Bruno Diniz de Paula <[email protected]>
Rutgers University

