LinuxLists.cc - [PATCH 0/8] floppy: cleanup patches

2010-06-09 18:54:20

Subject: [PATCH 0/8] floppy: cleanup patches

When going through floppy driver addressing some of the timing
bugs, I also saw these leftover things that ought to be cleaned up.

2010-06-09 18:46:46

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/8] floppy: cleanup patches

On Wed, 9 Jun 2010, Stephen Hemminger wrote:
>
> When going through floppy driver addressing some of the timing
> bugs, I also saw these leftover things that ought to be cleaned up.

Ack on them all as far as I'm concerned. I suspect the _real_ bugs in
floppy.c are about various paths not holding the floppy_lock spinlock, but
the patches all look fine and certainly don't make anything worse.

Mind re-sending them after 2.6.35, though? (Or maybe somebody like Greg
wants to queue them up in their drievr trees)

Linus

2010-06-09 18:54:10

by Stephen Hemminger

[permalink] [raw]

Subject: Re: [PATCH 0/8] floppy: cleanup patches

On Wed, 9 Jun 2010 11:46:13 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Wed, 9 Jun 2010, Stephen Hemminger wrote:
> >
> > When going through floppy driver addressing some of the timing
> > bugs, I also saw these leftover things that ought to be cleaned up.
>
> Ack on them all as far as I'm concerned. I suspect the _real_ bugs in
> floppy.c are about various paths not holding the floppy_lock spinlock, but
> the patches all look fine and certainly don't make anything worse.
>
> Mind re-sending them after 2.6.35, though? (Or maybe somebody like Greg
> wants to queue them up in their drievr trees)
>
> Linus

Current plan is to convert all the timers and old BH code to go through
a single workqueue. This avoids all the races and cleans up the cancellation
logic as well without having lots of other changes.

--

2010-06-09 19:05:39

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH 0/8] floppy: cleanup patches

On Wed, 9 Jun 2010, Stephen Hemminger wrote:
>
> Current plan is to convert all the timers and old BH code to go through
> a single workqueue. This avoids all the races and cleans up the cancellation
> logic as well without having lots of other changes.

That will definitely help. We still end up with the things that call the
request queue directly as a second (nonworkqueue) thread, but I think
that's what the floppy_lock has _traditionally_ protected against, so it
might all work out and largely fix the locking.

Linus

2010-06-10 06:43:35

by Thomas Meyer

[permalink] [raw]

Subject: Re: [PATCH 0/8] floppy: cleanup patches

Am Mittwoch, den 09.06.2010, 11:34 -0700 schrieb Stephen Hemminger:
> When going through floppy driver addressing some of the timing
> bugs, I also saw these leftover things that ought to be cleaned up.
>

With your patches applied on top of 2.6.34 I still get this timeout
while resuming from ram:

[ 1086.971283] ata2.00: configured for UDMA/133
[ 1089.986024]
[ 1089.986026] floppy driver state
[ 1089.986026] -------------------
[ 1089.986032] now=4295757282 last interrupt=4295281727 diff=475555 last called handler=main_command_interrupt
[ 1089.986033] timeout_message=lock fdc
[ 1089.986034] last output bytes:
[ 1089.986035] 2 90 4295192821
[ 1089.986036] 1 90 4295192821
[ 1089.986037] d 90 4295192821
[ 1089.986038] 2 90 4295192821
[ 1089.986039] e 90 4295192821
[ 1089.986040] 1b 90 4295192821
[ 1089.986041] ff 90 4295192821
[ 1089.986042] f 80 4295281121
[ 1089.986043] 0 90 4295281121
[ 1089.986044] 0 91 4295281121
[ 1089.986045] 8 81 4295281129
[ 1089.986046] 45 80 4295281499
[ 1089.986047] 0 90 4295281499
[ 1089.986048] 0 90 4295281499
[ 1089.986049] 0 90 4295281499
[ 1089.986050] 3 90 4295281499
[ 1089.986051] 2 90 4295281499
[ 1089.986052] 4 90 4295281499
[ 1089.986053] 1b 90 4295281499
[ 1089.986054] ff 90 4295281499
[ 1089.986055] last result at 4295281741
[ 1089.986056] last redo_fd_request at 4295281741
[ 1089.986060] status=a
[ 1089.986061] fdc_busy=1
[ 1089.986063] do_floppy=reset_interrupt
[ 1089.986064] cont=ffffffffa0009ae0
[ 1089.986065] current_req=(null)
[ 1089.986066] command_status=-1
[ 1089.986067]
[ 1089.986069] floppy0: floppy timeout called
[ 1089.986094] PM: resume of devices complete after 10536.229 msecs
[ 1089.986319] PM: resume devices took 10.537 seconds
[ 1089.986320] ------------[ cut here ]------------
[ 1089.986325] WARNING: at kernel/power/suspend_test.c:53 suspend_devices_and_enter+0xd2/0x220()
[ 1089.986326] Hardware name: MS-7250
[ 1089.986327] Component: resume devices, time: 10537
[ 1089.986328] Modules linked in: rndis_wlan olympic forcedeth floppy [last unloaded: scsi_wait_scan]
[ 1089.986333] Pid: 15510, comm: pm-suspend Not tainted 2.6.34 #1
[ 1089.986334] Call Trace:
[ 1089.986340] [<ffffffff81063283>] ? warn_slowpath_common+0x73/0xb0
[ 1089.986343] [<ffffffff81063320>] ? warn_slowpath_fmt+0x40/0x50
[ 1089.986345] [<ffffffff81095132>] ? suspend_devices_and_enter+0xd2/0x220
[ 1089.986348] [<ffffffff810953a8>] ? enter_state+0x128/0x150
[ 1089.986350] [<ffffffff810949cd>] ? state_store+0x8d/0xf0
[ 1089.986353] [<ffffffff81165215>] ? sysfs_write_file+0xd5/0x160
[ 1089.986356] [<ffffffff811029e8>] ? vfs_write+0xb8/0x1a0
[ 1089.986360] [<ffffffff810aff51>] ? audit_syscall_entry+0x1c1/0x1f0
[ 1089.986362] [<ffffffff811032ce>] ? sys_write+0x4e/0x90
[ 1089.986365] [<ffffffff81027e6b>] ? system_call_fastpath+0x16/0x1b
[ 1089.986367] ---[ end trace 210bc7fac7be9fbd ]---
[ 1089.986432] PM: Finishing wakeup.
[ 1089.986433] Restarting tasks ...

Is this okay?

2010-06-11 17:31:03

by Stephen Hemminger

[permalink] [raw]

Subject: Re: [PATCH 0/8] floppy: cleanup patches

On Thu, 10 Jun 2010 08:43:32 +0200
Thomas Meyer <[email protected]> wrote:

> Am Mittwoch, den 09.06.2010, 11:34 -0700 schrieb Stephen Hemminger:
> > When going through floppy driver addressing some of the timing
> > bugs, I also saw these leftover things that ought to be cleaned up.
> >
>
> With your patches applied on top of 2.6.34 I still get this timeout
> while resuming from ram:

Not clear, did the problem occur before my patches? The posted set
of patches should not change visible functionalty or fix any observable
bug. Floppy driver would be bug for bug the same.

--

2010-06-11 17:32:31

by Stephen Hemminger

[permalink] [raw]

Subject: [RFC] floppy: use single threaded workqueue

Fix races between timers and bottom half by using delayed work
and a single threaded queue.

Still needs more testing.

Signed-off-by: Stephen Hemminger <[email protected]>

---
drivers/block/floppy.c | 172 ++++++++++++++++++++++++++-----------------------
1 file changed, 94 insertions(+), 78 deletions(-)

--- a/drivers/block/floppy.c 2010-06-11 09:47:59.000000000 -0700
+++ b/drivers/block/floppy.c 2010-06-11 10:06:01.451555907 -0700
@@ -553,7 +553,7 @@ static void floppy_ready(void);
static void floppy_start(void);
static void process_fd_request(void);
static void recalibrate_floppy(void);
-static void floppy_shutdown(unsigned long);
+static void floppy_shutdown(struct work_struct *);

static int floppy_request_regions(int);
static void floppy_release_regions(int);
@@ -590,6 +590,8 @@ static int buffer_max = -1;
static struct floppy_fdc_state fdc_state[N_FDC];
static int fdc; /* current fdc */

+static struct workqueue_struct *floppy_workq;
+
static struct floppy_struct *_floppy = floppy_type;
static unsigned char current_drive;
static long current_count_sectors;
@@ -626,16 +628,15 @@ static inline void set_debugt(void) { }
static inline void debugt(const char *func, const char *msg) { }
#endif /* DEBUGT */

-typedef void (*timeout_fn)(unsigned long);
-static DEFINE_TIMER(fd_timeout, floppy_shutdown, 0, 0);

+static DECLARE_DELAYED_WORK(fd_timeout, floppy_shutdown);
static const char *timeout_message;

static void is_alive(const char *func, const char *message)
{
/* this routine checks whether the floppy driver is "alive" */
if (test_bit(0, &fdc_busy) && command_status < 2 &&
- !timer_pending(&fd_timeout)) {
+ !delayed_work_pending(&fd_timeout)) {
DPRINT("%s: timeout handler died. %s\n", func, message);
}
}
@@ -663,15 +664,18 @@ static int output_log_pos;

static void __reschedule_timeout(int drive, const char *message)
{
+ unsigned long delay;
+
if (drive == current_reqD)
drive = current_drive;
- del_timer(&fd_timeout);
+
if (drive < 0 || drive >= N_DRIVE) {
- fd_timeout.expires = jiffies + 20UL * HZ;
+ delay = 20UL * HZ;
drive = 0;
} else
- fd_timeout.expires = jiffies + UDP->timeout;
- add_timer(&fd_timeout);
+ delay = UDP->timeout;
+
+ queue_delayed_work(floppy_workq, &fd_timeout, delay);
if (UDP->flags & FD_DEBUG)
DPRINT("reschedule timeout %s\n", message);
timeout_message = message;
@@ -886,11 +890,11 @@ static int _lock_fdc(int drive, bool int

set_current_state(TASK_RUNNING);
remove_wait_queue(&fdc_wait, &wait);
- flush_scheduled_work();
+ flush_workqueue(floppy_workq);
}
command_status = FD_COMMAND_NONE;

- __reschedule_timeout(drive, "lock fdc");
+ reschedule_timeout(drive, "lock fdc");
set_fdc(drive);
return 0;
}
@@ -901,23 +905,15 @@ static int _lock_fdc(int drive, bool int
/* unlocks the driver */
static void unlock_fdc(void)
{
- unsigned long flags;
-
- raw_cmd = NULL;
if (!test_bit(0, &fdc_busy))
DPRINT("FDC access conflict!\n");

- if (do_floppy)
- DPRINT("device interrupt still active at FDC release: %pf!\n",
- do_floppy);
+ raw_cmd = NULL;
command_status = FD_COMMAND_NONE;
- spin_lock_irqsave(&floppy_lock, flags);
- del_timer(&fd_timeout);
+ __cancel_delayed_work(&fd_timeout);
+ do_floppy = NULL;
cont = NULL;
clear_bit(0, &fdc_busy);
- if (current_req || blk_peek_request(floppy_queue))
- do_fd_request(floppy_queue);
- spin_unlock_irqrestore(&floppy_lock, flags);
wake_up(&fdc_wait);
}

@@ -989,26 +985,24 @@ static DECLARE_WORK(floppy_work, NULL);

static void schedule_bh(void (*handler)(void))
{
+ WARN_ON(work_pending(&floppy_work));
+
PREPARE_WORK(&floppy_work, (work_func_t)handler);
- schedule_work(&floppy_work);
+ queue_work(floppy_workq, &floppy_work);
}

-static DEFINE_TIMER(fd_timer, NULL, 0, 0);
+static DECLARE_DELAYED_WORK(fd_timer, NULL);

static void cancel_activity(void)
{
- unsigned long flags;
-
- spin_lock_irqsave(&floppy_lock, flags);
do_floppy = NULL;
- PREPARE_WORK(&floppy_work, (work_func_t)empty);
- del_timer(&fd_timer);
- spin_unlock_irqrestore(&floppy_lock, flags);
+ cancel_delayed_work_sync(&fd_timer);
+ cancel_work_sync(&floppy_work);
}

/* this function makes sure that the disk stays in the drive during the
* transfer */
-static void fd_watchdog(void)
+static void fd_watchdog(struct work_struct *arg)
{
debug_dcl(DP->flags, "calling disk change from watchdog\n");

@@ -1018,21 +1012,21 @@ static void fd_watchdog(void)
cont->done(0);
reset_fdc();
} else {
- del_timer(&fd_timer);
- fd_timer.function = (timeout_fn)fd_watchdog;
- fd_timer.expires = jiffies + HZ / 10;
- add_timer(&fd_timer);
+ cancel_delayed_work(&fd_timer);
+ PREPARE_DELAYED_WORK(&fd_timer, fd_watchdog);
+ queue_delayed_work(floppy_workq,
+ &fd_timer, HZ / 10);
}
}

static void main_command_interrupt(void)
{
- del_timer(&fd_timer);
+ cancel_delayed_work(&fd_timer);
cont->interrupt();
}

/* waits for a delay (spinup or select) to pass */
-static int fd_wait_for_completion(unsigned long delay, timeout_fn function)
+static int fd_wait_for_completion(unsigned long expires, work_func_t function)
{
if (FDCS->reset) {
reset_fdc(); /* do the reset during sleep to win time
@@ -1041,11 +1035,11 @@ static int fd_wait_for_completion(unsign
return 1;
}

- if (time_before(jiffies, delay)) {
- del_timer(&fd_timer);
- fd_timer.function = function;
- fd_timer.expires = delay;
- add_timer(&fd_timer);
+ if (time_before(jiffies, expires)) {
+ cancel_delayed_work(&fd_timer);
+ PREPARE_DELAYED_WORK(&fd_timer, function);
+ queue_delayed_work(floppy_workq, &fd_timer,
+ expires - jiffies);
return 1;
}
return 0;
@@ -1394,7 +1388,7 @@ static int fdc_dtr(void)
*/
FDCS->dtr = raw_cmd->rate & 3;
return fd_wait_for_completion(jiffies + 2UL * HZ / 100,
- (timeout_fn)floppy_ready);
+ (work_func_t)floppy_ready);
} /* fdc_dtr */

static void tell_sector(void)
@@ -1499,7 +1493,7 @@ static void setup_rw_floppy(void)
int flags;
int dflags;
unsigned long ready_date;
- timeout_fn function;
+ work_func_t function;

flags = raw_cmd->flags;
if (flags & (FD_RAW_READ | FD_RAW_WRITE))
@@ -1513,9 +1507,9 @@ static void setup_rw_floppy(void)
*/
if (time_after(ready_date, jiffies + DP->select_delay)) {
ready_date -= DP->select_delay;
- function = (timeout_fn)floppy_start;
+ function = (work_func_t)floppy_start;
} else
- function = (timeout_fn)setup_rw_floppy;
+ function = (work_func_t)setup_rw_floppy;

/* wait until the floppy is spinning fast enough */
if (fd_wait_for_completion(ready_date, function))
@@ -1545,7 +1539,7 @@ static void setup_rw_floppy(void)
inr = result();
cont->interrupt();
} else if (flags & FD_RAW_NEED_DISK)
- fd_watchdog();
+ fd_watchdog(NULL);
}

static int blind_seek;
@@ -1855,20 +1849,22 @@ static void show_floppy(void)
pr_info("do_floppy=%pf\n", do_floppy);
if (work_pending(&floppy_work))
pr_info("floppy_work.func=%pf\n", floppy_work.func);
- if (timer_pending(&fd_timer))
- pr_info("fd_timer.function=%pf\n", fd_timer.function);
- if (timer_pending(&fd_timeout)) {
- pr_info("timer_function=%pf\n", fd_timeout.function);
- pr_info("expires=%lu\n", fd_timeout.expires - jiffies);
- pr_info("now=%lu\n", jiffies);
- }
+ if (delayed_work_pending(&fd_timer))
+ pr_info("delayed work.function=%p expires=%ld\n",
+ fd_timer.work.func,
+ fd_timer.timer.expires - jiffies);
+ if (delayed_work_pending(&fd_timeout))
+ pr_info("timer_function=%p expires=%ld\n",
+ fd_timeout.work.func,
+ fd_timeout.timer.expires - jiffies);
+
pr_info("cont=%p\n", cont);
pr_info("current_req=%p\n", current_req);
pr_info("command_status=%d\n", command_status);
pr_info("\n");
}

-static void floppy_shutdown(unsigned long data)
+static void floppy_shutdown(struct work_struct *arg)
{
unsigned long flags;

@@ -1923,7 +1919,7 @@ static int start_motor(void (*function)(

/* wait_for_completion also schedules reset if needed. */
return fd_wait_for_completion(DRS->select_date + DP->select_delay,
- (timeout_fn)function);
+ (work_func_t)function);
}

static void floppy_ready(void)
@@ -2853,6 +2849,20 @@ static int make_raw_rw_request(void)
return 2;
}

+static struct request *get_fd_request(void)
+{
+ struct request *req;
+
+ spin_lock_bh(floppy_queue->queue_lock);
+ req = blk_fetch_request(floppy_queue);
+ if (!req)
+ unlock_fdc();
+ spin_unlock_bh(floppy_queue->queue_lock);
+
+ return req;
+}
+
+
static void redo_fd_request(void)
{
int drive;
@@ -2864,16 +2874,10 @@ static void redo_fd_request(void)

do_request:
if (!current_req) {
- struct request *req;
+ struct request *req = get_fd_request();

- spin_lock_irq(floppy_queue->queue_lock);
- req = blk_fetch_request(floppy_queue);
- spin_unlock_irq(floppy_queue->queue_lock);
- if (!req) {
- do_floppy = NULL;
- unlock_fdc();
+ if (!req)
return;
- }
current_req = req;
}
drive = (long)current_req->rq_disk->private_data;
@@ -2949,13 +2953,17 @@ static void do_fd_request(struct request
current_req->cmd_flags);
return;
}
- if (test_bit(0, &fdc_busy)) {
+ if (test_and_set_bit(0, &fdc_busy)) {
/* fdc busy, this new request will be treated when the
current one is done */
is_alive(__func__, "old request running");
return;
}
- lock_fdc(MAXTIMEOUT, false);
+
+ command_status = FD_COMMAND_NONE;
+
+ __reschedule_timeout(MAXTIMEOUT, "fd_request");
+ set_fdc(0);
process_fd_request();
is_alive(__func__, "");
}
@@ -4211,10 +4219,16 @@ static int __init floppy_init(void)
if (err)
goto out_unreg_blkdev;

+ floppy_workq = create_singlethread_workqueue("floppy");
+ if (!floppy_workq) {
+ err = -ENOMEM;
+ goto out_unreg_driver;
+ }
+
floppy_queue = blk_init_queue(do_fd_request, &floppy_lock);
if (!floppy_queue) {
err = -ENOMEM;
- goto out_unreg_driver;
+ goto out_destroy_workq;
}
blk_queue_max_hw_sectors(floppy_queue, 64);

@@ -4247,7 +4261,7 @@ static int __init floppy_init(void)
use_virtual_dma = can_use_virtual_dma & 1;
fdc_state[0].address = FDC1;
if (fdc_state[0].address == -1) {
- del_timer(&fd_timeout);
+ cancel_delayed_work(&fd_timeout);
err = -ENODEV;
goto out_unreg_region;
}
@@ -4258,7 +4272,7 @@ static int __init floppy_init(void)
fdc = 0; /* reset fdc in case of unexpected interrupt */
err = floppy_grab_irq_and_dma();
if (err) {
- del_timer(&fd_timeout);
+ cancel_delayed_work(&fd_timeout);
err = -EBUSY;
goto out_unreg_region;
}
@@ -4315,13 +4329,13 @@ static int __init floppy_init(void)
user_reset_fdc(-1, FD_RESET_ALWAYS, false);
}
fdc = 0;
- del_timer(&fd_timeout);
+ cancel_delayed_work(&fd_timeout);
current_drive = 0;
initialized = true;
if (have_no_fdc) {
DPRINT("no floppy controllers found\n");
err = have_no_fdc;
- goto out_flush_work;
+ goto out_release_dma;
}

for (drive = 0; drive < N_DRIVE; drive++) {
@@ -4336,7 +4350,7 @@ static int __init floppy_init(void)

err = platform_device_register(&floppy_device[drive]);
if (err)
- goto out_flush_work;
+ goto out_release_dma;

err = device_create_file(&floppy_device[drive].dev,
&dev_attr_cmos);
@@ -4355,13 +4369,14 @@ static int __init floppy_init(void)

out_unreg_platform_dev:
platform_device_unregister(&floppy_device[drive]);
-out_flush_work:
- flush_scheduled_work();
+out_release_dma:
if (atomic_read(&usage_count))
floppy_release_irq_and_dma();
out_unreg_region:
blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
blk_cleanup_queue(floppy_queue);
+out_destroy_workq:
+ destroy_workqueue(floppy_workq);
out_unreg_driver:
platform_driver_unregister(&floppy_driver);
out_unreg_blkdev:
@@ -4426,7 +4441,7 @@ static int floppy_grab_irq_and_dma(void)
* We might have scheduled a free_irq(), wait it to
* drain first:
*/
- flush_scheduled_work();
+ flush_workqueue(floppy_workq);

if (fd_request_irq()) {
DPRINT("Unable to grab IRQ%d for the floppy driver\n",
@@ -4518,9 +4533,9 @@ static void floppy_release_irq_and_dma(v
pr_info("motor off timer %d still active\n", drive);
#endif

- if (timer_pending(&fd_timeout))
+ if (delayed_work_pending(&fd_timeout))
pr_info("floppy timer still active:%s\n", timeout_message);
- if (timer_pending(&fd_timer))
+ if (delayed_work_pending(&fd_timer))
pr_info("auxiliary floppy timer still active\n");
if (work_pending(&floppy_work))
pr_info("work still pending\n");
@@ -4580,8 +4595,9 @@ static void __exit floppy_module_exit(vo
put_disk(disks[drive]);
}

- del_timer_sync(&fd_timeout);
- del_timer_sync(&fd_timer);
+ cancel_delayed_work_sync(&fd_timeout);
+ cancel_delayed_work_sync(&fd_timer);
+ destroy_workqueue(floppy_workq);
blk_cleanup_queue(floppy_queue);

if (atomic_read(&usage_count))

2010-06-11 18:56:26

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC] floppy: use single threaded workqueue

On Fri, Jun 11, 2010 at 10:32 AM, Stephen Hemminger
<[email protected]> wrote:
> Fix races between timers and bottom half by using delayed work
> and a single threaded queue.
>
> Still needs more testing.

Looks simple and straightforward to me. Ack. (And ack on the "needs
more testing" part too, of course).

Linus

2010-06-13 19:23:33

by Thomas Meyer

[permalink] [raw]

Subject: Re: [PATCH 0/8] floppy: cleanup patches

----- Original Message -----
From: "Stephen Hemminger" <[email protected]>
> On Thu, 10 Jun 2010 08:43:32 +0200
> Thomas Meyer <[email protected]> wrote:
>
>> Am Mittwoch, den 09.06.2010, 11:34 -0700 schrieb Stephen Hemminger:
>> > When going through floppy driver addressing some of the timing
>> > bugs, I also saw these leftover things that ought to be cleaned up.
>> >
>>
>> With your patches applied on top of 2.6.34 I still get this timeout
>> while resuming from ram:
>
> Not clear, did the problem occur before my patches?

Yes, the timeout occured before your patches, i.e. plain 2.6.34.

> The posted set
> of patches should not change visible functionalty or fix any observable
> bug. Floppy driver would be bug for bug the same.

I guess, it's okay then.

kind regards
thomas

2010-07-21 20:03:11

by Marcelo Tosatti

[permalink] [raw]

Subject: Re: [RFC] floppy: use single threaded workqueue

On Fri, Jun 11, 2010 at 10:32:23AM -0700, Stephen Hemminger wrote:
> Fix races between timers and bottom half by using delayed work
> and a single threaded queue.
>
> Still needs more testing.
>
> Signed-off-by: Stephen Hemminger <[email protected]>
>
> ---
> drivers/block/floppy.c | 172 ++++++++++++++++++++++++++-----------------------
> 1 file changed, 94 insertions(+), 78 deletions(-)

Stephen,

Can confirm this fixes the following oops seen under KVM guests.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000035
IP: [<ffffffff812141f0>] setup_rw_floppy+0x229/0x2ca
PGD 284b8067 PUD 3d79f067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/resource
CPU 1
Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables
scsi_wait_scan uhci_hcd ohci_hcd ehci_hcd

Pid: 28, comm: events/1 Not tainted 2.6.35-rc5-35033-g14e84ff #221
/Bochs RIP: 0010:[<ffffffff812141f0>] [<ffffffff812141f0>] setup_rw_floppy+0x229/0x2ca
RSP: 0018:ffff88003e1a1e60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000009 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000000
RBP: 00000000000000da R08: 0000000000000000 R09: ffffffff81214bdb
R10: 0000000000000000 R11: ffffffff817f7660 R12: 0000000000000000
R13: 0000000000000008 R14: ffffffff81214bdb R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880001a40000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000035 CR3: 000000003a428000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/1 (pid: 28, threadinfo ffff88003e1a0000, task
ffff88003e19a8f0)
Stack:
0000000000000000 ffff88003e1a1ef8 ffffffff81692f00 ffff880001a56680
<0> ffff88003e19a8f0 ffffffff810466f1 0000000000000000 ffff88003e19a8f0
<0> ffffffff810495ad ffff88003e1a1ea8 ffff88003e1a1ea8 ffff88003e1a1ef8
Call Trace:
[<ffffffff810466f1>] ? worker_thread+0x150/0x1f0
[<ffffffff810495ad>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff810465a1>] ? worker_thread+0x0/0x1f0
[<ffffffff810492bf>] ? kthread+0x79/0x81
[<ffffffff81002be4>] ? kernel_thread_helper+0x4/0x10

2010-07-22 19:11:04

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC] floppy: use single threaded workqueue

On Wed, Jul 21, 2010 at 12:55 PM, Marcelo Tosatti <[email protected]> wrote:
>
> Can confirm this fixes the following oops seen under KVM guests.
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000035

Hmm. That does make me want to merge it early, but at the same time I
suspect it has gotten almost no testing on real hardware. So despite
looking "obvious" and fixing one bug, I'm a bit nervous and I suspect
that the prudent thing to do is still to take it early in the 2.3.36
merge window, and just mark it for stable.

Is anybody a big floppy user (on _real_ hardware, preferably with more
than a couple machines) and could test it? I do feel kind of stupid
for not taking it, but I don't have the hardware or the interest in
really installing any in my current machines, and even if it looks
fine and fixes some issue, I just don't feel safe enough without some
more confirmation... As far as I know, the floppy oops thing has
_only_ been seen on KVM (likely due to timing that just doesn't happen
on real hardware), and while the KVM floppy emulation is probably
fairly good, it also clearly isn't equivalent to hardware.

Linus

2010-07-22 22:23:18

by David Lang

[permalink] [raw]

Subject: Re: [RFC] floppy: use single threaded workqueue

I don't use floopies much anymore, but I've got a fair number of machines
around that have floppy drives (purchased across about 6 years with from a
couple vendors and a variety of motherboards)

If someone can go to the effort of documenting what testing you want done
I may be able to do it.

I would be installing new builds onto the machines to test them, so
you can do testing that locks up the boxes without impacting anything.

the easier it is to do the tests, the more machines I can test on
(bootable CD images that I could just pop in and then check the screen
later would be ideal ;-)

these systems will not have network access.

David Lang

On Thu, 22 Jul 2010, Linus
Torvalds wrote:

> On Wed, Jul 21, 2010 at 12:55 PM, Marcelo Tosatti <[email protected]> wrote:
>>
>> Can confirm this fixes the following oops seen under KVM guests.
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000035
>
> Hmm. That does make me want to merge it early, but at the same time I
> suspect it has gotten almost no testing on real hardware. So despite
> looking "obvious" and fixing one bug, I'm a bit nervous and I suspect
> that the prudent thing to do is still to take it early in the 2.3.36
> merge window, and just mark it for stable.
>
> Is anybody a big floppy user (on _real_ hardware, preferably with more
> than a couple machines) and could test it? I do feel kind of stupid
> for not taking it, but I don't have the hardware or the interest in
> really installing any in my current machines, and even if it looks
> fine and fixes some issue, I just don't feel safe enough without some
> more confirmation... As far as I know, the floppy oops thing has
> _only_ been seen on KVM (likely due to timing that just doesn't happen
> on real hardware), and while the KVM floppy emulation is probably
> fairly good, it also clearly isn't equivalent to hardware.
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2010-07-22 23:01:10

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC] floppy: use single threaded workqueue

On Thu, Jul 22, 2010 at 3:22 PM, <[email protected]> wrote:
>
> I don't use floopies much anymore, but I've got a fair number of machines
> around that have floppy drives (purchased across about 6 years with from a
> couple vendors and a variety of motherboards)
>
> If someone can go to the effort of documenting what testing you want done I
> may be able to do it.

It doesn't really need to be all that extensive. The "interesting"
operations tend to be (apart from just reading and writing data, of
course):

- formatting a floppy (it's special, and historically relatively
often broke without people noticing for a while)

- the floppy format auto-detection

- floppy disk change detection

so just formatting floppies to a couple of different formats (ie do
you possibly have DD and HD floppies?), mkfs them and write something
to them, and then moving them to another machine and checking that
reading the data off them through the auto-detected formats (/dev/fd0)
works and gives the right results (just sha1sum the files).

The disk change detect can be a bit hard to see. It's unreliable with
some media, so iirc we always flush caches after the last close (we
didn't use to do that, and the disk change detection needed to just be
reliable - and you could test that the disk change logic worked by
just timing a read of the media and seeing if it came out of the
cache). I think for floppies, the thing to see is if format detection
works correctly when you switch between formats. Perhaps also the
read-only marker (ie switch the floppy between read-only and
read-write, and see that the status of the floppy is correctly
noticed: you should get a nice error if you try to write a read-only
floppy, rather than getting IO errors).

I can't think of anything else relevant.

Linus

2010-08-22 09:07:56

by Amos Jianjun Kong

[permalink] [raw]

Subject: Re: [RFC] floppy: use single threaded workqueue

On Fri, Jul 23, 2010 at 6:22 AM, <[email protected]> wrote:
> I don't use floopies much anymore, but I've got a fair number of machines
> around that have floppy drives (purchased across about 6 years with from a
> couple vendors and a variety of motherboards)
>
> If someone can go to the effort of documenting what testing you want done I
> may be able to do it.
>
> I would be installing new builds onto the machines to test them, so you can
> do testing that locks up the boxes without impacting anything.
>
> the easier it is to do the tests, the more machines I can test on (bootable
> CD images that I could just pop in and then check the screen later would be
> ideal ;-)
>
> these systems will not have network access.
>
> David Lang

Hello Davide,

Any feedback of testing this patch on real hardware with Linus' methods ?
Thanks.

Amos

2010-08-23 22:22:32

by Stephen Hemminger

[permalink] [raw]

Subject: Re: [RFC] floppy: use single threaded workqueue

On Sun, 22 Aug 2010 17:07:49 +0800
Amos Kong <[email protected]> wrote:

> On Fri, Jul 23, 2010 at 6:22 AM, <[email protected]> wrote:
> > I don't use floopies much anymore, but I've got a fair number of machines
> > around that have floppy drives (purchased across about 6 years with from a
> > couple vendors and a variety of motherboards)
> >
> > If someone can go to the effort of documenting what testing you want done I
> > may be able to do it.
> >
> > I would be installing new builds onto the machines to test them, so you can
> > do testing that locks up the boxes without impacting anything.
> >
> > the easier it is to do the tests, the more machines I can test on (bootable
> > CD images that I could just pop in and then check the screen later would be
> > ideal ;-)
> >
> > these systems will not have network access.
> >
> > David Lang
>
>
> Hello Davide,
>
> Any feedback of testing this patch on real hardware with Linus' methods ?
> Thanks.

I did some testing, saw a glitch but wasn't sure if it was a 2.6.35
or floppy specific. But haven't investigated furthur because of impending
Vyatta product release. Plan to revisit this week.