2009-01-17 21:46:16

by Stefan Richter

[permalink] [raw]
Subject: [PATCH 0/4] firewire: cope with temporary connection loss

In the following series, the 1st patch enables the firewire stack to
tolerate brief disappearance of nodes similarly as the ieee1394 stack
has been taught in Linux 2.6.28.

This finally allows me to push the patch
Date: Wed, 19 Mar 2008 22:02:40 +0100 (CET)
From: Stefan Richter <[email protected]>
Subject: firewire: insist on successive self ID complete events
into the mainline, which is a fix that potentially prevents kernel
panics. This fix had the unfortunate side effect that the firewire-core
topology code destroyed and re-created node representations more often
than before. Until now, this caused high-level drivers (firewire-sbp2
and userspace drivers) to be unbound and rebound, so that for example
mounted filesystems were lost.

Besides the condition which triggered the "insist on successive self ID
complete events" precautions, there are also other occasions at which
nodes vanish briefly and reappear. See the changelog of patch 1/4.
firewire-core is now able to keep drivers bound until the nodes come
back, if they come back quickly.

The latter 3 patches are cleanups.

[PATCH 1/4] firewire: keep highlevel drivers attached during brief connection loss
[PATCH 2/4] firewire: core: clean up includes
[PATCH 3/4] firewire: core: move some functions
[PATCH 4/4] firewire: core: remove condition which is always false

drivers/firewire/fw-device.c | 231 +++++++++++++++++++++++------------
drivers/firewire/fw-device.h | 14 +-
2 files changed, 166 insertions(+), 79 deletions(-)
--
Stefan Richter
-=====-==--= ---= =---=
http://arcgraph.de/sr/


2009-01-17 21:46:45

by Stefan Richter

[permalink] [raw]
Subject: [PATCH 1/4] firewire: keep highlevel drivers attached during brief connection loss

Date:
From: Stefan Richter <[email protected]>
Subject: firewire: keep highlevel drivers attached during brief connection loss

There are situations when nodes vanish from the bus and come back in
quickly thereafter:
- When certain bus-powered hubs are plugged in,
- when certain devices are plugged into 6-port hubs,
- when certain disk enclosures are switched from self-power to bus
power or vice versa and break the daisy chain during the transition,
- when the user plugs a cable out and quickly plugs it back in, e.g.
to reorder a daisy chain (works on Mac OS X if done quickly enough),
- when certain hubs temporarily malfunction during high bus traffic.

Until now, firewire-core reported affected nodes as lost to the
highlevel drivers (firewire-sbp2 and userspace drivers). We now delay
the destruction of device representations until after at least two
seconds after the last bus reset. If a "new" device is detected in this
period whose bus information block and root directory header match that
of a device which is pending for deletion, we resurrect that device and
send update calls to highlevel drivers.

Signed-off-by: Stefan Richter <[email protected]>
---
drivers/firewire/fw-device.c | 121 ++++++++++++++++++++++++++++-------
drivers/firewire/fw-device.h | 1
2 files changed, 100 insertions(+), 22 deletions(-)

Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -25,6 +25,7 @@
#include <linux/device.h>
#include <linux/delay.h>
#include <linux/idr.h>
+#include <linux/jiffies.h>
#include <linux/string.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>
@@ -632,12 +633,38 @@ struct fw_device *fw_device_get_by_devt(
return device;
}

+/*
+ * These defines control the retry behavior for reading the config
+ * rom. It shouldn't be necessary to tweak these; if the device
+ * doesn't respond to a config rom read within 10 seconds, it's not
+ * going to respond at all. As for the initial delay, a lot of
+ * devices will be able to respond within half a second after bus
+ * reset. On the other hand, it's not really worth being more
+ * aggressive than that, since it scales pretty well; if 10 devices
+ * are plugged in, they're all getting read within one second.
+ */
+
+#define MAX_RETRIES 10
+#define RETRY_DELAY (3 * HZ)
+#define INITIAL_DELAY (HZ / 2)
+#define SHUTDOWN_DELAY (2 * HZ)
+
static void fw_device_shutdown(struct work_struct *work)
{
struct fw_device *device =
container_of(work, struct fw_device, work.work);
int minor = MINOR(device->device.devt);

+ if (time_is_after_jiffies(device->card->reset_jiffies + SHUTDOWN_DELAY)) {
+ schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+ return;
+ }
+
+ if (atomic_cmpxchg(&device->state,
+ FW_DEVICE_GONE,
+ FW_DEVICE_SHUTDOWN) != FW_DEVICE_GONE)
+ return;
+
fw_device_cdev_remove(device);
device_for_each_child(&device->device, NULL, shutdown_unit);
device_unregister(&device->device);
@@ -645,6 +672,7 @@ static void fw_device_shutdown(struct wo
down_write(&fw_device_rwsem);
idr_remove(&fw_device_idr, minor);
up_write(&fw_device_rwsem);
+
fw_device_put(device);
}

@@ -652,25 +680,63 @@ static struct device_type fw_device_type
.release = fw_device_release,
};

+static void fw_device_update(struct work_struct *work);
+
/*
- * These defines control the retry behavior for reading the config
- * rom. It shouldn't be necessary to tweak these; if the device
- * doesn't respond to a config rom read within 10 seconds, it's not
- * going to respond at all. As for the initial delay, a lot of
- * devices will be able to respond within half a second after bus
- * reset. On the other hand, it's not really worth being more
- * aggressive than that, since it scales pretty well; if 10 devices
- * are plugged in, they're all getting read within one second.
+ * If a device was pending for deletion because its node went away but its
+ * bus info block and root directory header matches that of a newly discovered
+ * device, revive the existing fw_device.
+ * The newly allocated fw_device becomes obsolete instead.
*/
+static int lookup_existing_device(struct device *dev, void *data)
+{
+ struct fw_device *old = fw_device(dev);
+ struct fw_device *new = data;
+ struct fw_card *card = new->card;
+ int match = 0;
+
+ down_read(&fw_device_rwsem); /* serialize config_rom access */
+ spin_lock_irq(&card->lock); /* serialize node access */
+
+ if (memcmp(old->config_rom, new->config_rom, 6 * 4) == 0 &&
+ atomic_cmpxchg(&old->state,
+ FW_DEVICE_GONE,
+ FW_DEVICE_RUNNING) == FW_DEVICE_GONE) {
+ struct fw_node *current_node = new->node;
+ struct fw_node *obsolete_node = old->node;
+
+ new->node = obsolete_node;
+ new->node->data = new;
+ old->node = current_node;
+ old->node->data = old;

-#define MAX_RETRIES 10
-#define RETRY_DELAY (3 * HZ)
-#define INITIAL_DELAY (HZ / 2)
+ old->max_speed = new->max_speed;
+ old->node_id = current_node->node_id;
+ smp_wmb(); /* update node_id before generation */
+ old->generation = card->generation;
+ old->config_rom_retries = 0;
+ fw_notify("rediscovered device %s\n", dev_name(dev));
+
+ PREPARE_DELAYED_WORK(&old->work, fw_device_update);
+ schedule_delayed_work(&old->work, 0);
+
+ if (current_node == card->root_node)
+ fw_schedule_bm_work(card, 0);
+
+ match = 1;
+ }
+
+ spin_unlock_irq(&card->lock);
+ up_read(&fw_device_rwsem);
+
+ return match;
+}

static void fw_device_init(struct work_struct *work)
{
struct fw_device *device =
container_of(work, struct fw_device, work.work);
+ struct device *revived_dev;
int minor, err;

/*
@@ -694,6 +760,15 @@ static void fw_device_init(struct work_s
return;
}

+ revived_dev = device_find_child(device->card->device,
+ device, lookup_existing_device);
+ if (revived_dev) {
+ put_device(revived_dev);
+ fw_device_release(&device->device);
+
+ return;
+ }
+
device_initialize(&device->device);

fw_device_get(device);
@@ -732,9 +807,10 @@ static void fw_device_init(struct work_s
* fw_node_event().
*/
if (atomic_cmpxchg(&device->state,
- FW_DEVICE_INITIALIZING,
- FW_DEVICE_RUNNING) == FW_DEVICE_SHUTDOWN) {
- fw_device_shutdown(work);
+ FW_DEVICE_INITIALIZING,
+ FW_DEVICE_RUNNING) == FW_DEVICE_GONE) {
+ PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
+ schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
} else {
if (device->config_rom_retries)
fw_notify("created device %s: GUID %08x%08x, S%d00, "
@@ -845,8 +921,8 @@ static void fw_device_refresh(struct wor

case REREAD_BIB_UNCHANGED:
if (atomic_cmpxchg(&device->state,
- FW_DEVICE_INITIALIZING,
- FW_DEVICE_RUNNING) == FW_DEVICE_SHUTDOWN)
+ FW_DEVICE_INITIALIZING,
+ FW_DEVICE_RUNNING) == FW_DEVICE_GONE)
goto gone;

fw_device_update(work);
@@ -877,8 +953,8 @@ static void fw_device_refresh(struct wor
create_units(device);

if (atomic_cmpxchg(&device->state,
- FW_DEVICE_INITIALIZING,
- FW_DEVICE_RUNNING) == FW_DEVICE_SHUTDOWN)
+ FW_DEVICE_INITIALIZING,
+ FW_DEVICE_RUNNING) == FW_DEVICE_GONE)
goto gone;

fw_notify("refreshed device %s\n", dev_name(&device->device));
@@ -888,8 +964,9 @@ static void fw_device_refresh(struct wor
give_up:
fw_notify("giving up on refresh of device %s\n", dev_name(&device->device));
gone:
- atomic_set(&device->state, FW_DEVICE_SHUTDOWN);
- fw_device_shutdown(work);
+ atomic_set(&device->state, FW_DEVICE_GONE);
+ PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
+ schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
out:
if (node_id == card->root_node->node_id)
fw_schedule_bm_work(card, 0);
@@ -994,9 +1071,9 @@ void fw_node_event(struct fw_card *card,
*/
device = node->data;
if (atomic_xchg(&device->state,
- FW_DEVICE_SHUTDOWN) == FW_DEVICE_RUNNING) {
+ FW_DEVICE_GONE) == FW_DEVICE_RUNNING) {
PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
- schedule_delayed_work(&device->work, 0);
+ schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
}
break;
}
Index: linux/drivers/firewire/fw-device.h
===================================================================
--- linux.orig/drivers/firewire/fw-device.h
+++ linux/drivers/firewire/fw-device.h
@@ -29,6 +29,7 @@
enum fw_device_state {
FW_DEVICE_INITIALIZING,
FW_DEVICE_RUNNING,
+ FW_DEVICE_GONE,
FW_DEVICE_SHUTDOWN,
};


--
Stefan Richter
-=====-==--= ---= =---=
http://arcgraph.de/sr/

2009-01-17 21:47:20

by Stefan Richter

[permalink] [raw]
Subject: [PATCH 2/4] firewire: core: clean up includes

Signed-off-by: Stefan Richter <[email protected]>
---
drivers/firewire/fw-device.c | 20 +++++++++++---------
drivers/firewire/fw-device.h | 13 +++++++++++--
2 files changed, 22 insertions(+), 11 deletions(-)

Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -18,24 +18,26 @@
* Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*/

-#include <linux/module.h>
-#include <linux/wait.h>
-#include <linux/errno.h>
-#include <linux/kthread.h>
-#include <linux/device.h>
+#include <linux/ctype.h>
#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/errno.h>
#include <linux/idr.h>
#include <linux/jiffies.h>
-#include <linux/string.h>
+#include <linux/kobject.h>
+#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/workqueue.h>
+
#include <asm/system.h>
-#include <linux/ctype.h>
-#include "fw-transaction.h"
-#include "fw-topology.h"
+
#include "fw-device.h"
+#include "fw-topology.h"
+#include "fw-transaction.h"

void fw_csr_iterator_init(struct fw_csr_iterator *ci, u32 * p)
{
Index: linux/drivers/firewire/fw-device.h
===================================================================
--- linux.orig/drivers/firewire/fw-device.h
+++ linux/drivers/firewire/fw-device.h
@@ -19,11 +19,17 @@
#ifndef __fw_device_h
#define __fw_device_h

+#include <linux/device.h>
#include <linux/fs.h>
-#include <linux/cdev.h>
#include <linux/idr.h>
-#include <linux/rwsem.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/sysfs.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+
#include <asm/atomic.h>

enum fw_device_state {
@@ -39,6 +45,9 @@ struct fw_attribute_group {
struct attribute *attrs[11];
};

+struct fw_node;
+struct fw_card;
+
/*
* Note, fw_device.generation always has to be read before fw_device.node_id.
* Use SMP memory barriers to ensure this. Otherwise requests will be sent

--
Stefan Richter
-=====-==--= ---= =---=
http://arcgraph.de/sr/

2009-01-17 21:47:45

by Stefan Richter

[permalink] [raw]
Subject: [PATCH 3/4] firewire: core: move some functions

Signed-off-by: Stefan Richter <[email protected]>
---
drivers/firewire/fw-device.c | 90 +++++++++++++++++------------------
1 file changed, 44 insertions(+), 46 deletions(-)

Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -155,27 +155,6 @@ struct bus_type fw_bus_type = {
};
EXPORT_SYMBOL(fw_bus_type);

-static void fw_device_release(struct device *dev)
-{
- struct fw_device *device = fw_device(dev);
- struct fw_card *card = device->card;
- unsigned long flags;
-
- /*
- * Take the card lock so we don't set this to NULL while a
- * FW_NODE_UPDATED callback is being handled or while the
- * bus manager work looks at this node.
- */
- spin_lock_irqsave(&card->lock, flags);
- device->node->data = NULL;
- spin_unlock_irqrestore(&card->lock, flags);
-
- fw_node_put(device->node);
- kfree(device->config_rom);
- kfree(device);
- fw_card_put(card);
-}
-
int fw_device_enable_phys_dma(struct fw_device *device)
{
int generation = device->generation;
@@ -678,11 +657,53 @@ static void fw_device_shutdown(struct wo
fw_device_put(device);
}

+static void fw_device_release(struct device *dev)
+{
+ struct fw_device *device = fw_device(dev);
+ struct fw_card *card = device->card;
+ unsigned long flags;
+
+ /*
+ * Take the card lock so we don't set this to NULL while a
+ * FW_NODE_UPDATED callback is being handled or while the
+ * bus manager work looks at this node.
+ */
+ spin_lock_irqsave(&card->lock, flags);
+ device->node->data = NULL;
+ spin_unlock_irqrestore(&card->lock, flags);
+
+ fw_node_put(device->node);
+ kfree(device->config_rom);
+ kfree(device);
+ fw_card_put(card);
+}
+
static struct device_type fw_device_type = {
- .release = fw_device_release,
+ .release = fw_device_release,
};

-static void fw_device_update(struct work_struct *work);
+static int update_unit(struct device *dev, void *data)
+{
+ struct fw_unit *unit = fw_unit(dev);
+ struct fw_driver *driver = (struct fw_driver *)dev->driver;
+
+ if (is_fw_unit(dev) && driver != NULL && driver->update != NULL) {
+ down(&dev->sem);
+ driver->update(unit);
+ up(&dev->sem);
+ }
+
+ return 0;
+}
+
+static void fw_device_update(struct work_struct *work)
+{
+ struct fw_device *device =
+ container_of(work, struct fw_device, work.work);
+
+ fw_device_cdev_update(device);
+ device_for_each_child(&device->device, NULL, update_unit);
+}

/*
* If a device was pending for deletion because its node went away but its
@@ -850,29 +871,6 @@ static void fw_device_init(struct work_s
put_device(&device->device); /* our reference */
}

-static int update_unit(struct device *dev, void *data)
-{
- struct fw_unit *unit = fw_unit(dev);
- struct fw_driver *driver = (struct fw_driver *)dev->driver;
-
- if (is_fw_unit(dev) && driver != NULL && driver->update != NULL) {
- down(&dev->sem);
- driver->update(unit);
- up(&dev->sem);
- }
-
- return 0;
-}
-
-static void fw_device_update(struct work_struct *work)
-{
- struct fw_device *device =
- container_of(work, struct fw_device, work.work);
-
- fw_device_cdev_update(device);
- device_for_each_child(&device->device, NULL, update_unit);
-}
-
enum {
REREAD_BIB_ERROR,
REREAD_BIB_GONE,

--
Stefan Richter
-=====-==--= ---= =---=
http://arcgraph.de/sr/

2009-01-17 21:48:29

by Stefan Richter

[permalink] [raw]
Subject: [PATCH 4/4] firewire: core: remove condition which is always false

reread_bus_info_block() only gets to see devices whose config_rom_length
is at least 6 (ROM header, bus info block, root directory header).

Signed-off-by: Stefan Richter <[email protected]>
---
drivers/firewire/fw-device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -887,15 +887,15 @@ static int reread_bus_info_block(struct
for (i = 0; i < 6; i++) {
if (read_rom(device, generation, i, &q) != RCODE_COMPLETE)
return REREAD_BIB_ERROR;

if (i == 0 && q == 0)
return REREAD_BIB_GONE;

- if (i > device->config_rom_length || q != device->config_rom[i])
+ if (q != device->config_rom[i])
return REREAD_BIB_CHANGED;
}

return REREAD_BIB_UNCHANGED;
}

static void fw_device_refresh(struct work_struct *work)

--
Stefan Richter
-=====-==--= ---= =---=
http://arcgraph.de/sr/

2009-01-24 19:35:55

by Stefan Richter

[permalink] [raw]
Subject: [PATCH 5/4] firewire: core: optimize card shutdown

This fixes a regression by "firewire: keep highlevel drivers attached
during brief connection loss": There were 2 seconds unnecessary waiting
added to the shutdown procedure of each controller.

We use card->link as status flag to signal the device handler that there
is no use to wait for a come-back.

Signed-off-by: Stefan Richter <[email protected]>
---
drivers/firewire/fw-card.c | 2 +-
drivers/firewire/fw-device.c | 6 ++++--
2 files changed, 5 insertions(+), 3 deletions(-)

Index: linux/drivers/firewire/fw-card.c
===================================================================
--- linux.orig/drivers/firewire/fw-card.c
+++ linux/drivers/firewire/fw-card.c
@@ -495,7 +495,7 @@ void fw_core_remove_card(struct fw_card
fw_core_initiate_bus_reset(card, 1);

mutex_lock(&card_mutex);
- list_del(&card->link);
+ list_del_init(&card->link);
mutex_unlock(&card_mutex);

/* Set up the dummy driver. */
Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -636,7 +636,8 @@ static void fw_device_shutdown(struct wo
container_of(work, struct fw_device, work.work);
int minor = MINOR(device->device.devt);

- if (time_is_after_jiffies(device->card->reset_jiffies + SHUTDOWN_DELAY)) {
+ if (time_is_after_jiffies(device->card->reset_jiffies + SHUTDOWN_DELAY)
+ && !list_empty(&device->card->link)) {
schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
return;
}
@@ -1073,7 +1074,8 @@ void fw_node_event(struct fw_card *card,
if (atomic_xchg(&device->state,
FW_DEVICE_GONE) == FW_DEVICE_RUNNING) {
PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
- schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+ schedule_delayed_work(&device->work,
+ list_empty(&card->link) ? 0 : SHUTDOWN_DELAY);
}
break;
}

--
Stefan Richter
-=====-==--= ---= ==---
http://arcgraph.de/sr/