2005-10-17 08:52:30

by li nux

Subject: Re: A problem about DIRECT IO on ext3



--- Jens Axboe <[email protected]> wrote:

> On Mon, Aug 29 2005, Erik Mouw wrote:
> > There are four prerequisites for direct IO:
> > - the file needs to be opened with O_DIRECT
> > - the buffer needs to be page aligned (hint: use getpagesize() instead
> >   of assuming that a page is 4k)
> > - reads and writes need to happen *in* multiples of the soft block size
> > - reads and writes need to happen *at* multiples of the soft block size
>
> Actually, the buffer only needs to be hard block size aligned, same goes
> for the chunk size used for reads/writes.
>
> --
> Jens Axboe
>
On 2.4 the open call succeeds with O_DIRECT,
but read returns -EINVAL for any block size (512, 1024 .. 16384):

open("/tmp/midstress_idx10", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 01001101270) = 4
read(3, 0xbfffdc40, 16384) = -1 EINVAL (Invalid argument)

How can I correct this problem?






2005-10-17 08:58:10

by li nux

Subject: Re: A problem about DIRECT IO on ext3



--- li nux <[email protected]> wrote:

>
>
> --- Jens Axboe <[email protected]> wrote:
>
> > On Mon, Aug 29 2005, Erik Mouw wrote:
> > > There are four prerequisites for direct IO:
> > > - the file needs to be opened with O_DIRECT
> > > - the buffer needs to be page aligned (hint: use getpagesize() instead
> > >   of assuming that a page is 4k)
> > > - reads and writes need to happen *in* multiples of the soft block size
> > > - reads and writes need to happen *at* multiples of the soft block size
> >
> > Actually, the buffer only needs to be hard block size aligned, same goes
> > for the chunk size used for reads/writes.
> >
> > --
> > Jens Axboe
> >
On 2.4 the open call succeeds with O_DIRECT,
but read returns -EINVAL for any block size (512, 1024 .. 16384):

open("/tmp/midstress_idx10", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 01001101270) = 3
read(3, 0xbfffdc40, 16384) = -1 EINVAL (Invalid argument)

How can I correct this problem?






2005-10-17 09:03:16

by Jens Axboe

Subject: Re: A problem about DIRECT IO on ext3

On Mon, Oct 17 2005, li nux wrote:
>
>
> --- Jens Axboe <[email protected]> wrote:
>
> > On Mon, Aug 29 2005, Erik Mouw wrote:
> > > There are four prerequisites for direct IO:
> > > - the file needs to be opened with O_DIRECT
> > > - the buffer needs to be page aligned (hint: use getpagesize() instead
> > >   of assuming that a page is 4k)
> > > - reads and writes need to happen *in* multiples of the soft block size
> > > - reads and writes need to happen *at* multiples of the soft block size
> >
> > Actually, the buffer only needs to be hard block size aligned, same goes
> > for the chunk size used for reads/writes.
> >
> > --
> > Jens Axboe
> >
> On 2.4 the open call succeeds with O_DIRECT,
> but read returns -EINVAL for any block size (512, 1024 .. 16384):
>
> open("/tmp/midstress_idx10", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 01001101270) = 4
> read(3, 0xbfffdc40, 16384) = -1 EINVAL (Invalid argument)
>
> How can I correct this problem?

See your buffer address, it's not aligned. You need to align that as
well. This is needed because the hardware will dma directly to the user
buffer, and to be on the safe side we require the same alignment as the
block layer will normally generate for file system io.

So in short, just align your read buffer to the same as your block size
and you will be fine. Example:

#define BS (4096)
#define MASK (BS - 1)
#define ALIGN(buf) (((unsigned long) (buf) + MASK) & ~(MASK))

char *ptr = malloc(BS + MASK);
char *buf = (char *) ALIGN(ptr);

read(fd, buf, BS);
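
A slightly fuller sketch of the same manual-alignment idea, for readers who
want something they can compile. The struct and function names
(aligned_buf, aligned_buf_alloc, aligned_buf_free) are made up for this
illustration and are not from the thread. It keeps the original malloc()
pointer so the memory can still be freed, and uses uintptr_t rather than
unsigned long for the address arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: keeps both pointers together. */
struct aligned_buf {
    void *raw;   /* pointer returned by malloc(); this is what gets freed */
    char *buf;   /* aligned pointer to hand to read()/write() */
};

/* Allocate `size` usable bytes aligned to `align` (must be a power of two).
 * Returns 0 on success, -1 if malloc() fails. */
static int aligned_buf_alloc(struct aligned_buf *ab, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Over-allocate by align - 1 so an aligned address always fits. */
    ab->raw = malloc(size + align - 1);
    if (ab->raw == NULL)
        return -1;

    /* Round the address up to the next multiple of `align`. */
    uintptr_t addr = ((uintptr_t)ab->raw + (align - 1)) & ~(uintptr_t)(align - 1);
    ab->buf = (char *)addr;
    return 0;
}

/* Release the buffer through the original malloc() pointer. */
static void aligned_buf_free(struct aligned_buf *ab)
{
    free(ab->raw);
    ab->raw = NULL;
    ab->buf = NULL;
}
```

With this, the read in the example above would become
read(fd, ab.buf, 4096) after aligned_buf_alloc(&ab, 4096, 4096).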

--
Jens Axboe

2005-10-17 09:12:35

by Grzegorz Kulewski

Subject: Re: A problem about DIRECT IO on ext3

On Mon, 17 Oct 2005, Jens Axboe wrote:
>> how to correct this problem ?
>
> See your buffer address, it's not aligned. You need to align that as
> well. This is needed because the hardware will dma directly to the user
> buffer, and to be on the safe side we require the same alignment as the
> block layer will normally generate for file system io.
>
> So in short, just align your read buffer to the same as your block size
> and you will be fine. Example:
>
> #define BS (4096)
> #define MASK (BS - 1)
> #define ALIGN(buf) (((unsigned long) (buf) + MASK) & ~(MASK))
>
> char *ptr = malloc(BS + MASK);
> char *buf = (char *) ALIGN(ptr);
>
> read(fd, buf, BS);

Shouldn't one use posix_memalign(3) for that?


Grzegorz Kulewski

2005-10-17 09:16:38

by Jens Axboe

Subject: Re: A problem about DIRECT IO on ext3

On Mon, Oct 17 2005, Grzegorz Kulewski wrote:
> On Mon, 17 Oct 2005, Jens Axboe wrote:
> >>how to correct this problem ?
> >
> >See your buffer address, it's not aligned. You need to align that as
> >well. This is needed because the hardware will dma directly to the user
> >buffer, and to be on the safe side we require the same alignment as the
> >block layer will normally generate for file system io.
> >
> >So in short, just align your read buffer to the same as your block size
> >and you will be fine. Example:
> >
> >#define BS (4096)
> >#define MASK (BS - 1)
> >#define ALIGN(buf) (((unsigned long) (buf) + MASK) & ~(MASK))
> >
> >char *ptr = malloc(BS + MASK);
> >char *buf = (char *) ALIGN(ptr);
> >
> >read(fd, buf, BS);
>
> Shouldn't one use posix_memalign(3) for that?

Dunno if one 'should', one 'can' if one wants to. I prefer to do it
manually so I don't have to jump through #define hoops to get at it
(which, btw, still doesn't expose it on this machine).

--
Jens Axboe

2005-10-17 09:41:42

by li nux

Subject: Re: A problem about DIRECT IO on ext3



--- Jens Axboe <[email protected]> wrote:

> On Mon, Oct 17 2005, Grzegorz Kulewski wrote:
> > On Mon, 17 Oct 2005, Jens Axboe wrote:
> > >>how to correct this problem ?
> > >
> > >See your buffer address, it's not aligned. You need to align that as
> > >well. This is needed because the hardware will dma directly to the user
> > >buffer, and to be on the safe side we require the same alignment as the
> > >block layer will normally generate for file system io.
> > >
> > >So in short, just align your read buffer to the same as your block size
> > >and you will be fine. Example:
> > >
> > >#define BS (4096)
> > >#define MASK (BS - 1)
> > >#define ALIGN(buf) (((unsigned long) (buf) + MASK) & ~(MASK))
> > >
> > >char *ptr = malloc(BS + MASK);
> > >char *buf = (char *) ALIGN(ptr);
> > >
> > >read(fd, buf, BS);
> >
> > Shouldn't one use posix_memalign(3) for that?
>
> Dunno if one 'should', one 'can' if one wants to. I prefer to do it
> manually so I don't have to jump through #define hoops to get at it
> (which, btw, still doesn't expose it on this machine).
>
> --
> Jens Axboe

Thanks a lot, Jens :-)
It's working now.
I did not have to make these adjustments on 2.6.
It looks to be more relaxed about alignment.

Can somebody please throw some light on how to find
your system's hard/soft block size ?





2005-10-17 09:50:54

by Jens Axboe

Subject: Re: A problem about DIRECT IO on ext3

On Mon, Oct 17 2005, li nux wrote:
>
>
> --- Jens Axboe <[email protected]> wrote:
>
> > On Mon, Oct 17 2005, Grzegorz Kulewski wrote:
> > > On Mon, 17 Oct 2005, Jens Axboe wrote:
> > > >>how to correct this problem ?
> > > >
> > > >See your buffer address, it's not aligned. You need to align that as
> > > >well. This is needed because the hardware will dma directly to the user
> > > >buffer, and to be on the safe side we require the same alignment as the
> > > >block layer will normally generate for file system io.
> > > >
> > > >So in short, just align your read buffer to the same as your block size
> > > >and you will be fine. Example:
> > > >
> > > >#define BS (4096)
> > > >#define MASK (BS - 1)
> > > >#define ALIGN(buf) (((unsigned long) (buf) + MASK) & ~(MASK))
> > > >
> > > >char *ptr = malloc(BS + MASK);
> > > >char *buf = (char *) ALIGN(ptr);
> > > >
> > > >read(fd, buf, BS);
> > >
> > > Shouldn't one use posix_memalign(3) for that?
> >
> > Dunno if one 'should', one 'can' if one wants to. I prefer to do it
> > manually so I don't have to jump through #define hoops to get at it
> > (which, btw, still doesn't expose it on this machine).
> >
> > --
> > Jens Axboe
>
> Thanx a lot Jens :-)
> Its working now.
> I did not have to make these adjustments on 2.6
> Is looks to be having more relaxation.

2.6 does have the option of checking the hardware dma requirement
separately, but for this path you should run into the same restrictions.
Perhaps you just got lucky when testing 2.6?

> Can somebody please throw some light on how to find
> your system's hard/soft block size ?

It's a per-device (or even per-partition, in case of mounted partitions)
setting, you can use the BLKBSZGET and BLKSSZGET ioctls to query for
soft/hard sector sizes.
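
A minimal sketch of querying both values with those ioctls, assuming Linux
and a block device node the caller is allowed to open (typically this needs
root). The helper name query_block_sizes is made up for this illustration,
and any device path passed to it (for example /dev/sda1) is an assumption
about the local system. Both ioctls are used here with an int argument.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKBSZGET, BLKSSZGET */

/* Hypothetical helper: query the soft block size and hard sector size of
 * the block device at `path`. Returns 0 on success, -1 on error with
 * errno set by open() or ioctl(). */
static int query_block_sizes(const char *path, int *soft, int *hard)
{
    assert(path != NULL && soft != NULL && hard != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = 0;
    /* BLKBSZGET: soft block size; BLKSSZGET: hard sector size. */
    if (ioctl(fd, BLKBSZGET, soft) < 0 || ioctl(fd, BLKSSZGET, hard) < 0)
        rc = -1;

    /* Preserve the ioctl() errno across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc;
}
```

Called on something that is not a block device, such as a regular file or
/dev/null, the ioctls fail and the helper returns -1.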

--
Jens Axboe

2005-10-17 16:36:49

by Badari Pulavarty

Subject: Re: A problem about DIRECT IO on ext3

On Mon, 2005-10-17 at 11:51 +0200, Jens Axboe wrote:
> On Mon, Oct 17 2005, li nux wrote:
> >
> >
> > --- Jens Axboe <[email protected]> wrote:
> >
> > > On Mon, Oct 17 2005, Grzegorz Kulewski wrote:
> > > > On Mon, 17 Oct 2005, Jens Axboe wrote:
> > > > >>how to correct this problem ?
> > > > >
> > > > >See your buffer address, it's not aligned. You need to align that as
> > > > >well. This is needed because the hardware will dma directly to the user
> > > > >buffer, and to be on the safe side we require the same alignment as the
> > > > >block layer will normally generate for file system io.
> > > > >
> > > > >So in short, just align your read buffer to the same as your block size
> > > > >and you will be fine. Example:
> > > > >
> > > > >#define BS (4096)
> > > > >#define MASK (BS - 1)
> > > > >#define ALIGN(buf) (((unsigned long) (buf) + MASK) & ~(MASK))
> > > > >
> > > > >char *ptr = malloc(BS + MASK);
> > > > >char *buf = (char *) ALIGN(ptr);
> > > > >
> > > > >read(fd, buf, BS);
> > > >
> > > > Shouldn't one use posix_memalign(3) for that?
> > >
> > > Dunno if one 'should', one 'can' if one wants to. I prefer to do it
> > > manually so I don't have to jump through #define hoops to get at it
> > > (which, btw, still doesn't expose it on this machine).
> > >
> > > --
> > > Jens Axboe
> >
> > Thanx a lot Jens :-)
> > Its working now.
> > I did not have to make these adjustments on 2.6
> > Is looks to be having more relaxation.
>
> 2.6 does have the option of checking the hardware dma requirement
> separately, but for this path you should run into the same restrictions.
> Perhaps you just got lucky when testing 2.6?

2.6 also has the same restriction. But if the "filesystem
blocksize alignment" (soft block size) check fails, we try to see
if it's aligned with the hard sector size (512). If so, we can do the IO.

2.4 fails if the offset or buffer is NOT filesystem blocksize
aligned. Period.

So it's possible that your buffer is at least 512-byte aligned,
thereby succeeding on 2.6.
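
The difference described above can be modelled in a few lines of user-space
C. This is only a sketch of the rule as stated in this message, not kernel
code; the function names and the hard_fallback flag are made up for the
illustration, and including the transfer length in the check is an
assumption taken from the prerequisites list quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `v` is a multiple of `align`, where `align` is a power of two. */
static bool is_aligned(uint64_t v, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v & (align - 1)) == 0;
}

/* Model of the alignment rule for one direct IO request.
 * hard_fallback = false: accept only soft-block-size alignment (the 2.4
 *                        behaviour as described above).
 * hard_fallback = true:  if that fails, retry the check against the hard
 *                        sector size (the 2.6 behaviour as described above). */
static bool dio_request_ok(uint64_t buf_addr, uint64_t offset, uint64_t len,
                           uint64_t soft_bs, uint64_t hard_bs,
                           bool hard_fallback)
{
    /* All three values are aligned exactly when their bitwise OR is. */
    uint64_t all = buf_addr | offset | len;

    if (is_aligned(all, soft_bs))
        return true;
    return hard_fallback && is_aligned(all, hard_bs);
}
```

For example, a buffer at an address such as 0x1200 (a multiple of 512 but
not of 4096) passes with the fallback enabled and fails without it, while
the address from the strace, 0xbfffdc40, fails either way.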

BTW, posix_memalign() or valloc() should be safe.
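
For reference, a minimal sketch of the posix_memalign() route. The wrapper
name dio_buf_alloc is made up for this illustration. The feature-test macro
at the top is one way to ask the C library to declare posix_memalign();
whether it is needed depends on the compiler flags and libc in use.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* ask <stdlib.h> to declare posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns `size` bytes aligned to `align`, or NULL on
 * failure. Unlike the manual approach, the result goes straight to free(). */
static void *dio_buf_alloc(size_t size, size_t align)
{
    void *p = NULL;

    /* posix_memalign() wants a power of two that is a multiple of
     * sizeof(void *). */
    assert(align % sizeof(void *) == 0 && (align & (align - 1)) == 0);

    /* Returns 0 on success and an error number (not errno) on failure. */
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}
```

A buffer for the 4096-byte read discussed in this thread would then be
dio_buf_alloc(4096, 4096), released later with free().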

>
> > Can somebody please throw some light on how to find
> > your system's hard/soft block size ?
>
> It's a per-device (or even per-partition, in case of mounted partitions)
> setting, you can use the BLKBSZGET and BLKSSZGET ioctls to query for
> soft/hard sector sizes.
>

Thanks,
Badari

2005-10-17 17:52:56

by Jens Axboe

Subject: Re: A problem about DIRECT IO on ext3

On Mon, Oct 17 2005, Badari Pulavarty wrote:
> On Mon, 2005-10-17 at 11:51 +0200, Jens Axboe wrote:
> > On Mon, Oct 17 2005, li nux wrote:
> > >
> > >
> > > --- Jens Axboe <[email protected]> wrote:
> > >
> > > > On Mon, Oct 17 2005, Grzegorz Kulewski wrote:
> > > > > On Mon, 17 Oct 2005, Jens Axboe wrote:
> > > > > >>how to correct this problem ?
> > > > > >
> > > > > >See your buffer address, it's not aligned. You need to align that as
> > > > > >well. This is needed because the hardware will dma directly to the user
> > > > > >buffer, and to be on the safe side we require the same alignment as the
> > > > > >block layer will normally generate for file system io.
> > > > > >
> > > > > >So in short, just align your read buffer to the same as your block size
> > > > > >and you will be fine. Example:
> > > > > >
> > > > > >#define BS (4096)
> > > > > >#define MASK (BS - 1)
> > > > > >#define ALIGN(buf) (((unsigned long) (buf) + MASK) & ~(MASK))
> > > > > >
> > > > > >char *ptr = malloc(BS + MASK);
> > > > > >char *buf = (char *) ALIGN(ptr);
> > > > > >
> > > > > >read(fd, buf, BS);
> > > > >
> > > > > Shouldn't one use posix_memalign(3) for that?
> > > >
> > > > Dunno if one 'should', one 'can' if one wants to. I prefer to do it
> > > > manually so I don't have to jump through #define hoops to get at it
> > > > (which, btw, still doesn't expose it on this machine).
> > > >
> > > > --
> > > > Jens Axboe
> > >
> > > Thanx a lot Jens :-)
> > > Its working now.
> > > I did not have to make these adjustments on 2.6
> > > Is looks to be having more relaxation.
> >
> > 2.6 does have the option of checking the hardware dma requirement
> > separately, but for this path you should run into the same restrictions.
> > Perhaps you just got lucky when testing 2.6?
>
> 2.6 also has the same restriction. But, if the "filesystem
> blocksize alignment" (soft block size) fails, we try to see
> if its aligned with hard sector size (512). If so, we can do the IO.
>
> 2.4 fails if the offset or buffer is NOT filesystem blocksize
> aligned. Period.

I'm aware of that, however this particular case was about the buffer
alignment (the address in the strace, 0xbfffdc40, is only 64-byte
aligned). And that should not work for 2.6 either.
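
The alignment of an address can be read off its lowest set bit. As a small
sketch (function names made up for this illustration), the strace address
0xbfffdc40 ends in 0x40, so the largest power of two dividing it is 64,
well short of a 512-byte sector:

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides `addr` (0 for a zero address).
 * Isolates the lowest set bit using the two's-complement identity x & -x. */
static uint64_t largest_alignment(uint64_t addr)
{
    return addr & (~addr + 1);
}

/* Nonzero if `addr` is a multiple of `sector` (a power of two). */
static int aligned_for_sector(uint64_t addr, uint64_t sector)
{
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return addr == 0 || largest_alignment(addr) >= sector;
}
```

Here largest_alignment(0xbfffdc40) evaluates to 64, so
aligned_for_sector(0xbfffdc40, 512) is false.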

The block-size alignment is really a separate property of correctness.

> BTW, posix_memalign() or valloc() should be safe.

Certainly.

--
Jens Axboe