Hello all:
Steps to reproduce:
rm -f /tmp/1.log
touch /tmp/1.log
echo << EOF > /tmp/hang.c
#include <sys/types.h>
#include <asm/fcntl.h>
main()
{
int i;
char buf[1025];
i = open("/tmp/1.log", O_RDONLY | 040000, 0);
if ( i != -1) {
read(i, buf, 1);
}
printf("'%s'", buf);
}
EOF
gcc -o /tmp/hang /tmp/hang.c
/tmp/hang
--
Hu Gang / Steve
Email : [email protected], [email protected]
GPG FinePrint : 4099 3F1D AE01 1817 68F7 D499 A6C2 C418 86C8 610E
GPG Public Key: http://soulinfo.com/~hugang/HuGang.asc
MSN# : [email protected] [9:00AM - 5:30PM +8:00]
RLU# : 204016 [1999] (Register Linux User)
Hugang <[email protected]> wrote:
>
> Hello all:
>
> Steps to reproduce:
>
> rm -f /tmp/1.log
> touch /tmp/1.log
> echo << EOF > /tmp/hang.c
> #include <sys/types.h>
> #include <asm/fcntl.h>
>
> main()
> {
> int i;
> char buf[1025];
>
> i = open("/tmp/1.log", O_RDONLY | 040000, 0);
> if ( i != -1) {
> read(i, buf, 1);
> }
> printf("'%s'", buf);
> }
> EOF
> gcc -o /tmp/hang /tmp/hang.c
> /tmp/hang
This is due to O_DIRECT-race-fixes.patch forgetting to drop locks
on error paths all over the place.
I think this patch plugs them all for block-based direct-io, but it needs
checking.
There's also the little matter of (say) NFS direct-io which doesn't go
through fs/direct-io.c at all; it will deadlock in a jiffy.
Must say that I am getting very concerned about the general state of the IO
paths in the -mm kernel.
fs/direct-io.c | 20 +++++++++++++-------
mm/filemap.c | 15 ++++++++++-----
2 files changed, 23 insertions(+), 12 deletions(-)
diff -puN fs/direct-io.c~O_DIRECT-race-fixes-fixes-2 fs/direct-io.c
--- 25/fs/direct-io.c~O_DIRECT-race-fixes-fixes-2 2003-09-18 22:33:26.000000000 -0700
+++ 25-akpm/fs/direct-io.c 2003-09-18 22:47:13.000000000 -0700
@@ -858,18 +858,15 @@ out:
static int
direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
const struct iovec *iov, loff_t offset, unsigned long nr_segs,
- unsigned blkbits, get_blocks_t get_blocks, dio_iodone_t end_io)
+ unsigned blkbits, get_blocks_t get_blocks, dio_iodone_t end_io,
+ struct dio *dio)
{
unsigned long user_addr;
int seg;
int ret = 0;
int ret2;
- struct dio *dio;
size_t bytes;
- dio = kmalloc(sizeof(*dio), GFP_KERNEL);
- if (!dio)
- return -ENOMEM;
dio->is_async = !is_sync_kiocb(iocb);
dio->bio = NULL;
@@ -1016,6 +1013,7 @@ blockdev_direct_IO(int rw, struct kiocb
unsigned bdev_blkbits = 0;
unsigned blocksize_mask = (1 << blkbits) - 1;
ssize_t retval = -EINVAL;
+ struct dio *dio;
if (bdev)
bdev_blkbits = blksize_bits(bdev_hardsect_size(bdev));
@@ -1041,8 +1039,16 @@ blockdev_direct_IO(int rw, struct kiocb
}
}
- retval = direct_io_worker(rw, iocb, inode, iov, offset,
- nr_segs, blkbits, get_blocks, end_io);
+ dio = kmalloc(sizeof(*dio), GFP_KERNEL);
+ retval = -ENOMEM;
+ if (!dio)
+ goto out;
+
+ return direct_io_worker(rw, iocb, inode, iov, offset,
+ nr_segs, blkbits, get_blocks, end_io, dio);
out:
+ up(&inode->i_sem);
+ if (S_ISREG(inode->i_mode))
+ up_read(&inode->i_alloc_sem);
return retval;
}
diff -puN mm/filemap.c~O_DIRECT-race-fixes-fixes-2 mm/filemap.c
--- 25/mm/filemap.c~O_DIRECT-race-fixes-fixes-2 2003-09-18 22:42:49.000000000 -0700
+++ 25-akpm/mm/filemap.c 2003-09-18 22:51:30.000000000 -0700
@@ -886,12 +886,12 @@ __generic_file_aio_read(struct kiocb *io
retval = 0;
if (!count)
goto out; /* skip atime */
- if (S_ISREG(inode->i_mode)) {
- down_read(&inode->i_alloc_sem);
- down(&inode->i_sem);
- }
size = i_size_read(inode);
if (pos < size) {
+ if (S_ISREG(inode->i_mode)) {
+ down_read(&inode->i_alloc_sem);
+ down(&inode->i_sem);
+ }
retval = generic_file_direct_IO(READ, iocb,
iov, pos, nr_segs);
if (retval >= 0 && !is_sync_kiocb(iocb))
@@ -2176,7 +2176,8 @@ generic_file_direct_IO(int rw, struct ki
loff_t offset, unsigned long nr_segs)
{
struct file *file = iocb->ki_filp;
- struct address_space *mapping = file->f_dentry->d_inode->i_mapping;
+ struct inode *inode = file->f_dentry->d_inode;
+ struct address_space *mapping = inode->i_mapping;
ssize_t retval;
if (mapping->nrpages) {
@@ -2190,6 +2191,10 @@ generic_file_direct_IO(int rw, struct ki
retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);
if (rw == WRITE && mapping->nrpages)
invalidate_inode_pages2(mapping);
+ return retval;
out:
+ up(&inode->i_sem);
+ if (S_ISREG(inode->i_mode))
+ up_read(&inode->i_alloc_sem);
return retval;
}
_
On Fri, 19 Sep 2003, Hugang wrote:
Your script cannot work.
Also, nothing hangs.
> Hello all:
>
> Steps to reproduce:
>
> rm -f /tmp/1.log
> touch /tmp/1.log
> echo << EOF > /tmp/hang.c
^^^^______ cat, not echo
> #include <sys/types.h>
> #include <asm/fcntl.h>
>
> main()
> {
> int i;
> char buf[1025];
>
> i = open("/tmp/1.log", O_RDONLY | 040000, 0);
> if ( i != -1) {
> read(i, buf, 1);
> }
> printf("'%s'", buf);
> }
> EOF
> gcc -o /tmp/hang /tmp/hang.c
> /tmp/hang
>
>
This is a `strace` of it working:
getpid() = 14243
open("/tmp/1.log", O_RDONLY|0x4000) = 3
read(3, "", 1) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(4, 1), ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x400aa000
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "\'\'", 2) = 2
munmap(0x400aa000, 4096) = 0
_exit(2) = ?
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
On Thursday 18 September 2003 10:54 pm, Andrew Morton wrote:
> Hugang <[email protected]> wrote:
> > Hello all:
> >
> > Steps to reproduce:
> >
> > rm -f /tmp/1.log
> > touch /tmp/1.log
> > echo << EOF > /tmp/hang.c
> > #include <sys/types.h>
> > #include <asm/fcntl.h>
> >
> > main()
> > {
> > int i;
> > char buf[1025];
> >
> > i = open("/tmp/1.log", O_RDONLY | 040000, 0);
> > if ( i != -1) {
> > read(i, buf, 1);
> > }
> > printf("'%s'", buf);
> > }
> > EOF
> > gcc -o /tmp/hang /tmp/hang.c
> > /tmp/hang
>
> This is due to O_DIRECT-race-fixes.patch forgetting to drop locks
> on error paths all over the place.
>
> I think this patch plugs them all for block-based direct-io, but it needs
> checking.
>
> There's also the little matter of (say) NFS direct-io which doesn't go
> through fs/direct-io.c at all; it will deadlock in a jiffy.
>
> Must say that I am getting very concerned about the general state of the IO
> paths in the -mm kernel.
Andrew,
I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.
Unfortunately, this happens only with huge database benchmarks.
I still haven't narrowed it down.
So I am planning to do following to address your concerns.
1) I am going to do full code review of DIO & RAW code in -mm tree.
2) I want to test RAW & DIO code with various test cases we have.
- fsx:
on RAW, DIO on files, AIO on RAW, AIO on files, AIO-DIO on files
- rawiobench
on RAW, DIO on files, AIO on RAW, AIO on files, AIO-DIO on files
- database benchmarks:
on RAW, on files
- can I do fsx, rawiobench on NFS files to test DIO on NFS ?
- What else you would like to see ?
Thanks,
Badari
Badari Pulavarty <[email protected]> wrote:
>
> I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.
What is "some kind of regression"?
> Unfortunately, this happens only with huge database benchmarks.
> I still haven't narrowed it down.
Use mm3 - it has fixes. Daniel McNeil reports that mm3 fixes the dbt2
problems he was seeing.
>
> - can I do fsx, rawiobench on NFS files to test DIO on NFS ?
Probably, but it will deadlock immediately.
All the i_sem and i_alloc_sem rework needs to be pushed down to the
blockdev_direct_IO level. That means reverting
O_DIRECT-race-fixes-fixes-2.patch first.
On Friday 19 September 2003 09:57 am, Andrew Morton wrote:
> Badari Pulavarty <[email protected]> wrote:
> > I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.
>
> What is "some kind of regression"?
I am getting different errors with database: ( I don't see these with my
"dd" tests)
(1) sometimes open fails with EFAULT
(2) sometimes read/write fails with EFAULT
I have been running on "raw" for quite a while, i haven't seen this before.
I moved my logdevice to filesystem files, everything is fine.
>
> > Unfortunately, this happens only with huge database benchmarks.
> > I still haven't narrowed it down.
>
> Use mm3 - it has fixes. Daniel McNeil reports that mm3 fixes the dbt2
> problems he was seeing.
Okay. I will use -mm3.
Thanks,
Badari
On Friday 19 September 2003 09:57 am, Andrew Morton wrote:
> Badari Pulavarty <[email protected]> wrote:
> > I am also seeing some kind of regression on raw in 2.6.0-test5-mm2.
>
> What is "some kind of regression"?
>
> > Unfortunately, this happens only with huge database benchmarks.
> > I still haven't narrowed it down.
>
> Use mm3 - it has fixes. Daniel McNeil reports that mm3 fixes the dbt2
> problems he was seeing.
My database tests work fine with 2.6.0-test5-mm3. My database doesn't
complain about EFAULTs any more.
Thanks,
Badari