2002-09-01 03:26:04

by Thomas Molina

[permalink] [raw]
Subject: 2.5.33-bk testing

A patch had been previously sent to the list fixing the __FUNCTION__
problem in ip_nat_helper.c was missing. It was needed before the file
would compile.

I've beat on the floppy driver and it seems to work well. I haven't been
able to make it hiccup yet.

The IDE subsystem likewise seems stable in limited testing.


2002-09-01 05:17:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.33-bk testing

In article <[email protected]>,
Thomas Molina <[email protected]> wrote:
>
>I've beat on the floppy driver and it seems to work well. I haven't been
>able to make it hiccup yet.

Really? I was looking at my cleanups, and I'm fairly certain they are
buggy. I just pushed an update to the BK tree a few minutes ago that I
think should fix it, but since I'm not a floppy user myself I can only
go by the source and trying to imagine everything that could go wrong.

Anyway, can you please pull the floppy patch from BK or apply this patch
on top of clean 2.5.33 to see if it makes any difference?

[ Now, in all honesty this should _only_ make a difference in case the
floppy request queue empties and another proces comes in and adds a
new request while the previous one is pending, so the timing for the
bug triggering is probably fairly interesting. Which may explain why
you find the floppy driver solid even without this.

Alternatively, I'm just full of crap, and this "fix" is bogus. In
either case I'd like to know about it ;]

Thanks for testing,

Linus

----
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/08/31 [email protected] 1.576
# De-queue the floppy request late, so that the higher levels
# know the floppy driver is busy.
# --------------------------------------------
#
diff -Nru a/drivers/block/floppy.c b/drivers/block/floppy.c
--- a/drivers/block/floppy.c Sat Aug 31 22:25:41 2002
+++ b/drivers/block/floppy.c Sat Aug 31 22:25:41 2002
@@ -2299,12 +2299,11 @@
return;
add_blkdev_randomness(major(dev));
floppy_off(DEVICE_NR(dev));
+ blkdev_dequeue_request(req);
end_that_request_last(req);

/* Get the next request */
req = elv_next_request(QUEUE);
- if (req)
- blkdev_dequeue_request(req);
CURRENT = req;
}

@@ -2939,7 +2938,6 @@
unlock_fdc();
return;
}
- blkdev_dequeue_request(req);
CURRENT = req;
}
if (major(CURRENT->rq_dev) != MAJOR_NR)

2002-09-01 06:08:02

by David Miller

[permalink] [raw]
Subject: Re: 2.5.33-bk testing

From: Thomas Molina <[email protected]>
Date: Sat, 31 Aug 2002 22:23:43 -0500 (CDT)

A patch had been previously sent to the list fixing the __FUNCTION__
problem in ip_nat_helper.c was missing. It was needed before the file
would compile.

I've got this fixed in my tree now, and I'll push it on to
Linus.

2002-09-01 11:55:25

by Mikael Pettersson

[permalink] [raw]
Subject: Re: 2.5.33-bk testing

On Sun, 1 Sep 2002 05:24:58 +0000 (UTC), Linus Torvalds wrote:
>In article <[email protected]>,
>Thomas Molina <[email protected]> wrote:
>>
>>I've beat on the floppy driver and it seems to work well. I haven't been
>>able to make it hiccup yet.
>
>Really? I was looking at my cleanups, and I'm fairly certain they are
>buggy. I just pushed an update to the BK tree a few minutes ago that I
>think should fix it, but since I'm not a floppy user myself I can only
>go by the source and trying to imagine everything that could go wrong.

I was able to get 2.5.33 (even with your patch) to corrupt data
in a few seconds: writes (with dd) put corrupted data on the
media, and reads (again with dd) returns data that doesn't
match what's on the media.

The patch below is an update of the floppy workarounds patch
I've been maintaining since the problems began in 2.5.13.
With this patch I'm able to reliably read and write to the
raw /dev/fd0 device. I'm not suggesting that my hack to
bdev->bd_block_size is the correct fix, but maybe someone who
understands the block I/O system can see what's going on and
do a proper fix.

There is also a fix to the broken {,un}register_sys_device() calls.
The driver forgets to unregister in several cases, leading to data
structure corruption.

/Mikael

diff -ruN linux-2.5.33.linus/drivers/block/floppy.c linux-2.5.33.floppy/drivers/block/floppy.c
--- linux-2.5.33.linus/drivers/block/floppy.c Sun Sep 1 13:25:00 2002
+++ linux-2.5.33.floppy/drivers/block/floppy.c Sun Sep 1 13:24:30 2002
@@ -3710,6 +3710,10 @@
UDRS->fd_ref = 0;
}
floppy_release_irq_and_dma();
+
+ /* undo ++bd_openers in floppy_open() */
+ --inode->i_bdev->bd_openers;
+
return 0;
}

@@ -3802,6 +3806,44 @@
invalidate_buffers(mk_kdev(FLOPPY_MAJOR,old_dev));
}

+ /* Problems:
+ * 1. fs/block_dev.c:do_open() calls bd_set_size() which
+ * changes bdev's block size after floppy_open() has
+ * returned. For some reason, this causes data corruption.
+ * 2. The device size appears to be zero on first access
+ * sometimes, especially if the first access is a write.
+ * Workarounds:
+ * 1. Set bdev block size equal to the hardsect/queue size.
+ * This seems to cure the data corruption problem for raw
+ * accesses. VFS accesses to mounted floppies still cause
+ * data corruption for unknown reasons.
+ * 2. Fake a constant 1.44MB i_size.
+ * 1&2. ++bdev->bd_openers to bypass do_open()'s changes.
+ * floppy_release() does a --bdev->bd_openers.
+ */
+ {
+ struct block_device *bdev = inode->i_bdev;
+ struct backing_dev_info *bdi;
+ sector_t sect;
+
+ bdev->bd_offset = 0;
+
+ sect = (1440 * 1024) >> 9;
+
+ /* like bd_set_size() but don't mutate block_size */
+ bdev->bd_inode->i_size = (loff_t)sect << 9;
+ bdev->bd_block_size = bdev_hardsect_size(bdev);
+ bdev->bd_inode->i_blkbits = blksize_bits(block_size(bdev));
+
+ bdi = blk_get_backing_dev_info(bdev);
+ if (bdi == NULL)
+ bdi = &default_backing_dev_info;
+ inode->i_data.backing_dev_info = bdi;
+ bdev->bd_inode->i_data.backing_dev_info = bdi;
+
+ ++bdev->bd_openers;
+ }
+
/* Allow ioctls if we have write-permissions even if read-only open.
* Needed so that programs such as fdrawcmd still can work on write
* protected disks */
@@ -4239,8 +4281,6 @@
{
int i,unit,drive;

- register_sys_device(&device_floppy);
-
raw_cmd = NULL;

devfs_handle = devfs_mk_dir (NULL, "floppy", NULL);
@@ -4367,6 +4407,9 @@
register_disk(NULL, mk_kdev(MAJOR_NR,TOMINOR(drive)+i*4),
1, &floppy_fops, 0);
}
+
+ register_sys_device(&device_floppy);
+
return have_no_fdc;
}

@@ -4549,6 +4592,7 @@
{
int dummy;

+ unregister_sys_device(&device_floppy);
devfs_unregister (devfs_handle);
unregister_blkdev(MAJOR_NR, "fd");

2002-09-01 16:35:19

by Thomas Molina

[permalink] [raw]
Subject: Re: 2.5.33-bk testing

On Sun, 1 Sep 2002, Mikael Pettersson wrote:

> I was able to get 2.5.33 (even with your patch) to corrupt data
> in a few seconds: writes (with dd) put corrupted data on the
> media, and reads (again with dd) returns data that doesn't
> match what's on the media.

I don't know why I didn't see the corruption the first time around.
However, I repeated the tests with Linus' floppy update and saw the same
thing you did.
dd if=/dev/fd0 of=floppyimage
produced corrupt output, as did a dd write to floppy command.

> The patch below is an update of the floppy workarounds patch
> I've been maintaining since the problems began in 2.5.13.
> With this patch I'm able to reliably read and write to the
> raw /dev/fd0 device. I'm not suggesting that my hack to
> bdev->bd_block_size is the correct fix, but maybe someone who
> understands the block I/O system can see what's going on and
> do a proper fix.

Thanks. Adding your workaround produced correct outputs. I hope we can
some other eyes on the code and get it integrated into mainline.

2002-09-01 17:51:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.33-bk testing


On Sun, 1 Sep 2002, Mikael Pettersson wrote:
>
> The patch below is an update of the floppy workarounds patch
> I've been maintaining since the problems began in 2.5.13.
> With this patch I'm able to reliably read and write to the
> raw /dev/fd0 device. I'm not suggesting that my hack to
> bdev->bd_block_size is the correct fix, but maybe someone who
> understands the block I/O system can see what's going on and
> do a proper fix.

It looks like what your fix does is to force a 512-byte blocksize on the
floppy.

Which implies to me that the floppy driver is broken for other blocksizes.

Does your patch still leave the floppy driver broken for something like a
mounted minix or ext2 filesystem? Those have 1kB blocksizes, and will set
it to that. If the non-512B blocksize in the floppy driver is broken, then
such mounted filesystems should not work reliably either.

Linus

2002-09-01 19:04:13

by Thomas Molina

[permalink] [raw]
Subject: Re: 2.5.33-bk testing

On Sun, 1 Sep 2002, Linus Torvalds wrote:

> It looks like what your fix does is to force a 512-byte blocksize on the
> floppy.
>
> Which implies to me that the floppy driver is broken for other blocksizes.
>
> Does your patch still leave the floppy driver broken for something like a
> mounted minix or ext2 filesystem? Those have 1kB blocksizes, and will set
> it to that. If the non-512B blocksize in the floppy driver is broken, then
> such mounted filesystems should not work reliably either.

You are correct. Making an ext2 filesystem on the floppy is indeed
broken. Attempting to read/write from the resulting filesystem results in
the following type errors:

Sep 1 13:18:23 dad kernel: EXT2-fs error (device fd(2,0)):
ext2_check_page: bad entry in directory #11: unaligned directory entry -
offset=7168, inode=1431634935, rec_len=50, name_len=0
Sep 1 13:20:20 dad kernel: EXT2-fs error (device fd(2,0)):
ext2_check_page: bad entry in directory #11: rec_len is smaller than
minimal - offset=7168, inode=0, rec_len=0, name_len=0
Sep 1 13:22:29 dad kernel: EXT2-fs warning: mounting fs with errors,
running e2fsck is recommended
Sep 1 13:23:09 dad kernel: EXT2-fs error (device fd(2,0)):
ext2_check_page: bad entry in directory #11: directory entry across
blocks - offset=7168, inode=861697588, rec_len=12592, name_len=119
Sep 1 13:24:38 dad kernel: EXT2-fs error (device fd(2,0)):
ext2_check_page: bad entry in directory #11: rec_len is smaller than
minimal - offset=7168, inode=0, rec_len=0, name_len=0

Putting a text file on the floppy shows 512 byte chunks of the file being
moved around depending on how many times the file is read/written.

2002-09-01 19:12:21

by Mikael Pettersson

[permalink] [raw]
Subject: Re: 2.5.33-bk testing

On Sun, 1 Sep 2002 11:03:44 -0700 (PDT), Linus Torvalds wrote:
>On Sun, 1 Sep 2002, Mikael Pettersson wrote:
>>
>> The patch below is an update of the floppy workarounds patch
>> I've been maintaining since the problems began in 2.5.13.
>> With this patch I'm able to reliably read and write to the
>> raw /dev/fd0 device. I'm not suggesting that my hack to
>> bdev->bd_block_size is the correct fix, but maybe someone who
>> understands the block I/O system can see what's going on and
>> do a proper fix.
>
>It looks like what your fix does is to force a 512-byte blocksize on the
>floppy.
>
>Which implies to me that the floppy driver is broken for other blocksizes.

I did not test other block sizes. I just tested with the hardsect size
and noticed that that combo worked.

>Does your patch still leave the floppy driver broken for something like a
>mounted minix or ext2 filesystem? Those have 1kB blocksizes, and will set
>it to that. If the non-512B blocksize in the floppy driver is broken, then
>such mounted filesystems should not work reliably either.

In kernels 2.5.13--2.5.32, it was always the case that with my hack,
a mounted ext2 would seem to work for a while and then break down,
e.g. when unmounted and fsck:d. I haven't checked 2.5.33.

Also, attempts to put lilo on ext2 on /dev/fd0 would throw the
kernel into an infinite stream of "buffer layer error!" from
buffer.c:__find_get_block_slow().

/Mikael

2002-09-01 19:22:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.33-bk testing


On Sun, 1 Sep 2002, Mikael Pettersson wrote:
>
> In kernels 2.5.13--2.5.32, it was always the case that with my hack,
> a mounted ext2 would seem to work for a while and then break down,
> e.g. when unmounted and fsck:d. I haven't checked 2.5.33.

Ok, that seems to clinch it - that definitely means that the floppy driver
is confused by non-512-byte buffers. And that in turn is almost certainly
due to some subtle bug in the bio rework.

The only bug I really fixed in 2.5.33 was completely unrelated, and had to
do with the setup for the block open() routine not setting up the request
queue - which caused problems for the media change logic.

Linus