2003-09-19 04:46:39

by Hu Gang

[permalink] [raw]
Subject: use O_DIRECT open file, when read will hang.

Hello all:

Steps to reproduce:

rm -f /tmp/1.log
touch /tmp/1.log
echo << EOF > /tmp/hang.c
#include <sys/types.h>
#include <asm/fcntl.h>

main()
{
int i;
char buf[1025];

i = open("/tmp/1.log", O_RDONLY | 040000, 0);
if ( i != -1) {
read(i, buf, 1);
}
printf("'%s'", buf);
}
EOF
gcc -o /tmp/hang /tmp/hang.c
/tmp/hang


--
Hu Gang / Steve
Email : [email protected], [email protected]
GPG FinePrint : 4099 3F1D AE01 1817 68F7 D499 A6C2 C418 86C8 610E
GPG Public Key: http://soulinfo.com/~hugang/HuGang.asc
MSN# : [email protected] [9:00AM - 5:30PM +8:00]
RLU# : 204016 [1999] (Register Linux User)


2003-09-19 05:54:08

by Andrew Morton

[permalink] [raw]
Subject: Re: use O_DIRECT open file, when read will hang.

Hugang <[email protected]> wrote:
>
> Hello all:
>
> Steps to reproduce:
>
> rm -f /tmp/1.log
> touch /tmp/1.log
> echo << EOF > /tmp/hang.c
> #include <sys/types.h>
> #include <asm/fcntl.h>
>
> main()
> {
> int i;
> char buf[1025];
>
> i = open("/tmp/1.log", O_RDONLY | 040000, 0);
> if ( i != -1) {
> read(i, buf, 1);
> }
> printf("'%s'", buf);
> }
> EOF
> gcc -o /tmp/hang /tmp/hang.c
> /tmp/hang

This is due to O_DIRECT-race-fixes.patch forgetting to drop locks
on error paths all over the place.

I think this patch plugs them all for block-based direct-io, but it needs
checking.

There's also the little matter of (say) NFS direct-io which doesn't go
through fs/direct-io.c at all; it will deadlock in a jiffy.

Must say that I am getting very concerned about the general state of the IO
paths in the -mm kernel.


fs/direct-io.c | 20 +++++++++++++-------
mm/filemap.c | 15 ++++++++++-----
2 files changed, 23 insertions(+), 12 deletions(-)

diff -puN fs/direct-io.c~O_DIRECT-race-fixes-fixes-2 fs/direct-io.c
--- 25/fs/direct-io.c~O_DIRECT-race-fixes-fixes-2 2003-09-18 22:33:26.000000000 -0700
+++ 25-akpm/fs/direct-io.c 2003-09-18 22:47:13.000000000 -0700
@@ -858,18 +858,15 @@ out:
static int
direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
const struct iovec *iov, loff_t offset, unsigned long nr_segs,
- unsigned blkbits, get_blocks_t get_blocks, dio_iodone_t end_io)
+ unsigned blkbits, get_blocks_t get_blocks, dio_iodone_t end_io,
+ struct dio *dio)
{
unsigned long user_addr;
int seg;
int ret = 0;
int ret2;
- struct dio *dio;
size_t bytes;

- dio = kmalloc(sizeof(*dio), GFP_KERNEL);
- if (!dio)
- return -ENOMEM;
dio->is_async = !is_sync_kiocb(iocb);

dio->bio = NULL;
@@ -1016,6 +1013,7 @@ blockdev_direct_IO(int rw, struct kiocb
unsigned bdev_blkbits = 0;
unsigned blocksize_mask = (1 << blkbits) - 1;
ssize_t retval = -EINVAL;
+ struct dio *dio;

if (bdev)
bdev_blkbits = blksize_bits(bdev_hardsect_size(bdev));
@@ -1041,8 +1039,16 @@ blockdev_direct_IO(int rw, struct kiocb
}
}

- retval = direct_io_worker(rw, iocb, inode, iov, offset,
- nr_segs, blkbits, get_blocks, end_io);
+ dio = kmalloc(sizeof(*dio), GFP_KERNEL);
+ retval = -ENOMEM;
+ if (!dio)
+ goto out;
+
+ return direct_io_worker(rw, iocb, inode, iov, offset,
+ nr_segs, blkbits, get_blocks, end_io, dio);
out:
+ up(&inode->i_sem);
+ if (S_ISREG(inode->i_mode))
+ up_read(&inode->i_alloc_sem);
return retval;
}
diff -puN mm/filemap.c~O_DIRECT-race-fixes-fixes-2 mm/filemap.c
--- 25/mm/filemap.c~O_DIRECT-race-fixes-fixes-2 2003-09-18 22:42:49.000000000 -0700
+++ 25-akpm/mm/filemap.c 2003-09-18 22:51:30.000000000 -0700
@@ -886,12 +886,12 @@ __generic_file_aio_read(struct kiocb *io
retval = 0;
if (!count)
goto out; /* skip atime */
- if (S_ISREG(inode->i_mode)) {
- down_read(&inode->i_alloc_sem);
- down(&inode->i_sem);
- }
size = i_size_read(inode);
if (pos < size) {
+ if (S_ISREG(inode->i_mode)) {
+ down_read(&inode->i_alloc_sem);
+ down(&inode->i_sem);
+ }
retval = generic_file_direct_IO(READ, iocb,
iov, pos, nr_segs);
if (retval >= 0 && !is_sync_kiocb(iocb))
@@ -2176,7 +2176,8 @@ generic_file_direct_IO(int rw, struct ki
loff_t offset, unsigned long nr_segs)
{
struct file *file = iocb->ki_filp;
- struct address_space *mapping = file->f_dentry->d_inode->i_mapping;
+ struct inode *inode = file->f_dentry->d_inode;
+ struct address_space *mapping = inode->i_mapping;
ssize_t retval;

if (mapping->nrpages) {
@@ -2190,6 +2191,10 @@ generic_file_direct_IO(int rw, struct ki
retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);
if (rw == WRITE && mapping->nrpages)
invalidate_inode_pages2(mapping);
+ return retval;
out:
+ up(&inode->i_sem);
+ if (S_ISREG(inode->i_mode))
+ up_read(&inode->i_alloc_sem);
return retval;
}

_

2003-09-19 13:05:17

by Richard B. Johnson

[permalink] [raw]
Subject: Re: use O_DIRECT open file, when read will hang.

On Fri, 19 Sep 2003, Hugang wrote:

Your script cannot work.
Also, nothing hangs.

> Hello all:
>
> Steps to reproduce:
>
> rm -f /tmp/1.log
> touch /tmp/1.log
> echo << EOF > /tmp/hang.c
^^^^______ cat, not echo

> #include <sys/types.h>
> #include <asm/fcntl.h>
>
> main()
> {
> int i;
> char buf[1025];
>
> i = open("/tmp/1.log", O_RDONLY | 040000, 0);
> if ( i != -1) {
> read(i, buf, 1);
> }
> printf("'%s'", buf);
> }
> EOF
> gcc -o /tmp/hang /tmp/hang.c
> /tmp/hang
>
>

This is a `strace` of it working:

getpid() = 14243
open("/tmp/1.log", O_RDONLY|0x4000) = 3
read(3, "", 1) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(4, 1), ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x400aa000
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "\'\'", 2) = 2
munmap(0x400aa000, 4096) = 0
_exit(2) = ?

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.


2003-09-19 16:40:31

by Badari Pulavarty

[permalink] [raw]
Subject: Re: use O_DIRECT open file, when read will hang.

On Thursday 18 September 2003 10:54 pm, Andrew Morton wrote:
> Hugang <[email protected]> wrote:
> > Hello all:
> >
> > Steps to reproduce:
> >
> > rm -f /tmp/1.log
> > touch /tmp/1.log
> > echo << EOF > /tmp/hang.c
> > #include <sys/types.h>
> > #include <asm/fcntl.h>
> >
> > main()
> > {
> > int i;
> > char buf[1025];
> >
> > i = open("/tmp/1.log", O_RDONLY | 040000, 0);
> > if ( i != -1) {
> > read(i, buf, 1);
> > }
> > printf("'%s'", buf);
> > }
> > EOF
> > gcc -o /tmp/hang /tmp/hang.c
> > /tmp/hang
>
> This is due to O_DIRECT-race-fixes.patch forgetting to drop locks
> on error paths all over the place.
>
> I think this patch plugs them all for block-based direct-io, but it needs
> checking.
>
> There's also the little matter of (say) NFS direct-io which doesn't go
> through fs/direct-io.c at all; it will deadlock in a jiffy.
>
> Must say that I am getting very concerned about the general state of the IO
> paths in the -mm kernel.

Andrew,

I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.
Unfortunately, this happens only with huge database benchmarks.
I still haven't narrowed it down.

So I am planning to do following to address your concerns.

1) I am going to do full code review of DIO & RAW code in -mm tree.

2) I want to test RAW & DIO code with various test cases we have.
- fsx:
on RAW, DIO on files, AIO on RAW, AIO on files, AIO-DIO on files

- rawiobench
on RAW, DIO on files, AIO on RAW, AIO on files, AIO-DIO on files

- database benchmarks:
on RAW, on files

- can I do fsx, rawiobench on NFS files to test DIO on NFS ?

- What else you would like to see ?

Thanks,
Badari

2003-09-19 16:56:31

by Andrew Morton

[permalink] [raw]
Subject: Re: use O_DIRECT open file, when read will hang.

Badari Pulavarty <[email protected]> wrote:
>
> I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.

What is "some kind of regression"?

> Unfortunately, this happens only with huge database benchmarks.
> I still haven't narrowed it down.

Use mm3 - it has fixes. Daniel McNeil reports that mm3 fixes the dbt2
problems he was seeing.

>
> - can I do fsx, rawiobench on NFS files to test DIO on NFS ?

Probably, but it will deadlock immediately.

All the i_sem and i_alloc_sem rework needs to be pushed down to the
blockdev_direct_IO level. That means reverting
O_DIRECT-race-fixes-fixes-2.patch first.


2003-09-19 17:39:41

by Badari Pulavarty

[permalink] [raw]
Subject: Re: use O_DIRECT open file, when read will hang.

On Friday 19 September 2003 09:57 am, Andrew Morton wrote:
> Badari Pulavarty <[email protected]> wrote:
> > I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.
>
> What is "some kind of regression"?

I am getting different errors with database: ( I don't see these with my
"dd" tests)

(1) sometimes open fails with EFAULT
(2) sometimes read/write fails with EFAULT

I have been running on "raw" for quite a while, i haven't seen this before.
I moved my logdevice to filesystem files, everything is fine.

>
> > Unfortunately, this happens only with huge database benchmarks.
> > I still haven't narrowed it down.
>
> Use mm3 - it has fixes. Daniel McNeil reports that mm3 fixes the dbt2
> problems he was seeing.

Okay. I will use -mm3.

Thanks,
Badari

2003-09-19 22:13:16

by Badari Pulavarty

[permalink] [raw]
Subject: Re: use O_DIRECT open file, when read will hang.

On Friday 19 September 2003 09:57 am, Andrew Morton wrote:
> Badari Pulavarty <[email protected]> wrote:
> > I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.
>
> What is "some kind of regression"?
>
> > Unfortunately, this happens only with huge database benchmarks.
> > I still haven't narrowed it down.
>
> Use mm3 - it has fixes. Daniel McNeil reports that mm3 fixes the dbt2
> problems he was seeing.

My database tests work fine with 2.6.0-test5-mm3. My database doesn't
complain about EFAULTs any more.

Thanks,
Badari