2003-08-06 09:49:27

by RaTao

Subject: [email protected] O_DIRECT


Hi!

While testing linux-2.6.0-test2-mm4 I noticed two things:

- O_DIRECT doesn't work, at least on ext3, with an I/O block size different
from the filesystem's block size. (It fails with a 512-byte block size, at
least.) This works in 2.6.0-test2, from 512 to 4096.

- vmstat doesn't show bi/bo for O_DIRECT disk access.
Tested with I/O aligned to the filesystem's block size.
This works in 2.6.0-test2.
(This one may be a feature rather than a bug; I really don't know.)

Just to let you know! :)

If I can help with something, feel free to ask. I tried to review -mm4
but it's too big for me, so I can't point to where the "problem" is...
Anyway, I suspect the AIO stuff ;)

Have fun,
RaTao




2003-08-06 19:16:26

by Andrew Morton

Subject: Re: [email protected] O_DIRECT

RaTao wrote:
>
>
> Hi!
>
> While testing linux-2.6.0-test2-mm4 I noticed two things:
>
> - O_DIRECT doesn't work, at least on ext3, with an I/O block size different
> from the filesystem's block size. (It fails with a 512-byte block size, at
> least.) This works in 2.6.0-test2, from 512 to 4096.

It works OK here.

> - vmstat doesn't show bi/bo for O_DIRECT's disk access.

It does here.


I'd be suspecting your test app: is it checking the return value of all
syscalls?

2003-08-06 21:31:50

by RaTao

Subject: Re: linux-2.6.0-test2-mm4 O_DIRECT


Hi!

I've corrected my (don't know how) misspelled subject :)

Andrew Morton wrote:

[..snip..]
>
>
> It works OK here.
>
>
>>- vmstat doesn't show bi/bo for O_DIRECT's disk access.
>
>
> It does here.
>

Maybe I goofed somewhere. I can't test it again today; I'll do it tomorrow.


>
> I'd be suspecting your test app: is it checking the return value of all
> syscalls?

I'll double check.
Thanks,

Ratao



2003-08-06 22:37:51

by Daniel McNeil

Subject: Re: linux-2.6.0-test2-mm4 O_DIRECT

O_DIRECT also works for me on ext3 using regular write and async i/o
using 512-byte i/o.

Is your buffer alignment correct?
O_DIRECT requires a 512-byte aligned buffer.

Daniel
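
The two checks raised in this thread (test the return value of every syscall, and hand O_DIRECT a 512-byte aligned buffer) can be sketched in user space as below. This is only an illustration, not the test app discussed above: the helper names `is_aligned`, `alloc_aligned` and `direct_write_once` are made up for the example, and the 512-byte figure is taken from the message above; the real alignment requirement may depend on the kernel, filesystem and device.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* assumption: degrade to buffered I/O if not exposed */
#endif

/* Return nonzero if ptr is a multiple of align (align must be a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Allocate len bytes aligned to align; returns NULL on failure. */
static void *alloc_aligned(size_t align, size_t len)
{
	void *buf = NULL;

	if (posix_memalign(&buf, align, len) != 0)
		return NULL;
	return buf;
}

/*
 * Write one io_size-byte block to path with O_DIRECT (hypothetical helper).
 * io_size is assumed to be a multiple of 512.
 * Returns 0 on success or a negative errno value, so no failure is silent.
 */
static int direct_write_once(const char *path, size_t io_size)
{
	int ret = 0;
	ssize_t n;
	void *buf;
	int fd;

	buf = alloc_aligned(512, io_size);
	if (!buf)
		return -ENOMEM;
	memset(buf, 0xab, io_size);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
	if (fd < 0) {
		ret = -errno;
		goto out_free;
	}

	n = write(fd, buf, io_size);
	if (n < 0)
		ret = -errno;          /* e.g. EINVAL for a misaligned size or buffer */
	else if ((size_t)n != io_size)
		ret = -EIO;            /* short write: treated as an error in this sketch */

	if (close(fd) < 0 && ret == 0)
		ret = -errno;
out_free:
	free(buf);
	return ret;
}
```

Calling `direct_write_once("hello.data", 512)` and printing `strerror(-ret)` on a nonzero result would show whether a failure comes from open() or write() rather than from the kernel silently misbehaving.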

2003-08-07 01:53:13

by RaTao

Subject: Re: linux-2.6.0-test2-mm4 O_DIRECT

Hi Andrew, Daniel!

Just recompiled test2-mm4 and I tested it with:

time iozone -f hello.data -i 0 -s 50000 -r 1 -I

(-I enables O_DIRECT; I straced it and the open() call has the O_DIRECT flag)

and everything works great! My app is working too!! I can't understand
it but everything is fine, now.

I'm sorry for wasting your time :(

Thanks,
Ratao


2003-08-07 02:18:14

by J.C. Wren

Subject: 2.6.0-test2, compact flash, IDE, and kobject errors

As I've mentioned before, I have an embedded Linux system that has been
running 2.2.12, and I'm looking at bringing it up to a modern kernel for some
features that we're looking at implementing. It's a 386EX system, 2MB flash
(for BIOS and kernel), 8MB RAM, and a 32MB compact flash card.

With both 2.5.69 (last version I attempted) and 2.6.0-test2, when the CF is
probed, I would get a message indicating kobject had failed with an EEXIST
error code. After the kernel spits out the message about the HD size,
I'd get:

hda: hda1
hda: hda1
(then the kobject failure and stack trace)

After a lot of printk's, I determined that the kernel is attempting to register
the partition or drive twice. This happens because in fs/partitions/check.c,
register_disk() calls blkdev_get(). If blkdev_get() sees the media change
flag set, it calls rescan_partitions(), which causes the partition to be
registered. After it returns, register_disk() calls add_partition(), which
results in the kernel throwing a kobject error that it's already registered.

The solution that I think is correct (the audience LAUGH sign is now lit) is
to add a 'hdx=removable' and 'hdx=notremovable' config parameter. If you are
booting from a removable media device, such as a CF card (and certain items
like floppies seem to be special-cased out, which I'm guessing is why you
don't see this on certain media types), this flag would override the
removable flag determined by the probe. And for whatever reason someone
might want to, a non-removable device could be marked as removable.

I need to clean out a bunch of printks, but if this isn't the totally wrong
approach, I'll submit a patch for it. So far, this patch seems to have fixed
my problem.

One question I do have is that e2fsck seems phenomenally slower under
2.6.0-test2 than 2.2.12. It's the same version of e2fsck, so I'm guessing
the disk throughput is slower (it's all PIO), but I'm not sure what in the
IDE driver could have cut the disk throughput to a half or a third. Any
thoughts on that would be greatly appreciated.

--John

Subject: Re: 2.6.0-test2, compact flash, IDE, and kobject errors


On Wed, 6 Aug 2003, J.C. Wren wrote:

> As I've mentioned before, I have an embedded Linux system that has been
> running 2.2.12, and I'm looking at bringing it up to a modern kernel for some
> features that we're looking at implementing. It's a 386EX system, 2MB flash
> (for BIOS and kernel), 8MB RAM, and a 32MB compact flash card.
>
> With both 2.5.69 (last version I attempted) and 2.6.0-test2, when the CF is
> probed, I would get a message indicating kobject had failed with an EEXIST
> error code. After the kernel spits out the message about the HD size,
> I'd get:
>
> hda: hda1
> hda: hda1
> (then the kobject failure and stack trace)
>
> After a lot of printk's, I determined that the kernel is attempting to register
> the partition or drive twice. This happens because in fs/partitions/check.c,
> register_disk() calls blkdev_get(). If blkdev_get() sees the media change
> flag set, it calls rescan_partitions(), which causes the partition to be
> registered. After it returns, register_disk() calls add_partition(), which
> results in the kernel throwing a kobject error that it's already registered.
>
> The solution that I think is correct (the audience LAUGH sign is now lit) is
> to add a 'hdx=removable' and 'hdx=notremovable' config parameter. If you are
> booting from a removable media device, such as a CF card (and certain items
> like floppies seem to be special-cased out, which I'm guessing is why you
> don't see this on certain media types), this flag would override the
> removable flag determined by the probe. And for whatever reason someone
> might want to, a non-removable device could be marked as removable.

Known problem. "ide-cs stack_dump" thread :-).
Does this patch help?

 drivers/ide/ide-disk.c   |    7 +++++++
 drivers/ide/ide-floppy.c |    8 +++++++-
 include/linux/ide.h      |    1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff -puN drivers/ide/ide-disk.c~ide-attach-flag drivers/ide/ide-disk.c
--- linux-2.6.0-test2-bk3/drivers/ide/ide-disk.c~ide-attach-flag 2003-08-05 01:43:03.312872768 +0200
+++ linux-2.6.0-test2-bk3-root/drivers/ide/ide-disk.c 2003-08-05 01:48:44.197050496 +0200
@@ -1790,6 +1790,12 @@ static int idedisk_ioctl(struct inode *i
 static int idedisk_media_changed(struct gendisk *disk)
 {
 	ide_drive_t *drive = disk->private_data;
+
+	/* do not scan partitions twice if we are attaching this device */
+	if (drive->attach) {
+		drive->attach = 0;
+		return 0;
+	}
 	/* if removable, always assume it was changed */
 	return drive->removable;
 }
@@ -1848,6 +1854,7 @@ static int idedisk_attach(ide_drive_t *d
 	g->flags = drive->removable ? GENHD_FL_REMOVABLE : 0;
 	set_capacity(g, current_capacity(drive));
 	g->fops = &idedisk_ops;
+	drive->attach = 1;
 	add_disk(g);
 	return 0;
 failed:
diff -puN drivers/ide/ide-floppy.c~ide-attach-flag drivers/ide/ide-floppy.c
--- linux-2.6.0-test2-bk3/drivers/ide/ide-floppy.c~ide-attach-flag 2003-08-05 01:43:06.710356272 +0200
+++ linux-2.6.0-test2-bk3-root/drivers/ide/ide-floppy.c 2003-08-05 01:48:59.546716992 +0200
@@ -2006,7 +2006,12 @@ static int idefloppy_media_changed(struc
 {
 	ide_drive_t *drive = disk->private_data;
 	idefloppy_floppy_t *floppy = drive->driver_data;
-
+
+	/* do not scan partitions twice if we are attaching this device */
+	if (drive->attach) {
+		drive->attach = 0;
+		return 0;
+	}
 	return test_and_clear_bit(IDEFLOPPY_MEDIA_CHANGED, &floppy->flags);
 }

@@ -2061,6 +2066,7 @@ static int idefloppy_attach (ide_drive_t
 	strcpy(g->devfs_name, drive->devfs_name);
 	g->flags = drive->removable ? GENHD_FL_REMOVABLE : 0;
 	g->fops = &idefloppy_ops;
+	drive->attach = 1;
 	add_disk(g);
 	return 0;
 failed:
diff -puN include/linux/ide.h~ide-attach-flag include/linux/ide.h
--- linux-2.6.0-test2-bk3/include/linux/ide.h~ide-attach-flag 2003-08-05 01:43:14.735136320 +0200
+++ linux-2.6.0-test2-bk3-root/include/linux/ide.h 2003-08-05 01:45:19.069234664 +0200
@@ -711,6 +711,7 @@ typedef struct ide_drive_s {
 	unsigned id_read : 1; /* 1=id read from disk 0 = synthetic */
 	unsigned noprobe : 1; /* from: hdx=noprobe */
 	unsigned removable : 1; /* 1 if need to do check_media_change */
+	unsigned attach : 1; /* set to 1 in ->attach() */
 	unsigned is_flash : 1; /* 1 if probed as flash */
 	unsigned forced_geom : 1; /* 1 if hdx=c,h,s was given at boot */
 	unsigned no_unmask : 1; /* disallow setting unmask bit */

_

> I need to clean out a bunch of printks, but if this isn't the totally wrong
> approach, I'll submit a patch for it. So far, this patch seems to have fixed
> my problem.

Can you send your patch (even with a bunch of printks)?

> One question I do have is that e2fsck seems phenomenally slower under
> 2.6.0-test2 than 2.2.12. It's the same version of e2fsck, so I'm guessing
> the disk throughput is slower (it's all PIO), but I'm not sure what in the
> IDE driver could have cut the disk throughput to a half or a third. Any
> thoughts on that would be greatly appreciated.

There was a bug in e2fsck resulting in CPU hogging.

What makes you think that the disk throughput is halved?
Can you check with hdparm?

Thanks,
--
Bartlomiej
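
The double registration described in this thread, and the one-shot flag the patch uses to avoid it, can be modelled in user space roughly as follows. This is a toy sketch, not kernel code: `struct toy_drive` and every `toy_*` function are invented for illustration, and the model collapses blkdev_get(), rescan_partitions() and add_partition() into a few lines based only on the description above.

```c
#include <assert.h>

/* Hypothetical stand-in for a drive; not the kernel's ide_drive_t. */
struct toy_drive {
	unsigned removable : 1;   /* device reports removable media */
	unsigned attach : 1;      /* one-shot flag set just before registration */
	int registrations;        /* successful registrations of the partition */
	int errors;               /* rejected duplicates (the EEXIST case) */
};

/* Registering the same partition twice is the failure from the report. */
static void toy_add_partition(struct toy_drive *d)
{
	if (d->registrations > 0) {
		d->errors++;
		return;
	}
	d->registrations++;
}

/* Mirrors the patched media_changed(): swallow the first query after attach. */
static int toy_media_changed(struct toy_drive *d)
{
	if (d->attach) {
		d->attach = 0;
		return 0;
	}
	return d->removable;
}

/* Models register_disk(): open the device, then add the partitions. */
static void toy_register_disk(struct toy_drive *d)
{
	if (toy_media_changed(d))   /* open path: rescan when media "changed" */
		toy_add_partition(d);
	toy_add_partition(d);       /* register_disk() adds partitions itself */
}

/* Run one attach and return the number of rejected duplicate registrations. */
static int toy_attach(int removable, int use_attach_flag)
{
	struct toy_drive d = { 0 };

	d.removable = removable ? 1 : 0;
	d.attach = use_attach_flag ? 1 : 0;
	toy_register_disk(&d);
	assert(d.registrations == 1);   /* the partition ends up registered once */
	return d.errors;
}
```

In this model `toy_attach(1, 0)` reports one rejected duplicate (the removable CF card without the flag), while `toy_attach(1, 1)` and `toy_attach(0, 0)` report none, which matches the behaviour the patch aims for.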