2006-10-17 14:51:48

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH] Block on access to temporarily unavailable pci device


The existing implementation of pci_block_user_cfg_access() was recently
criticised for providing out of date information and for returning errors
on write, which applications won't be expecting.

This reimplementation uses an rw semaphore to block accesses. I examined
a couple of other alternatives (a mutex, which would unnecessarily
serialise kernel BIST users; a per-device mutex, which would take another
16 bytes per pci device; a custom queue), I felt the rwsem provided the
best tradeoffs.

I'll post the patch I used to test blocking device accesses separately.

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index ea16805..29581a2 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -1,6 +1,7 @@
#include <linux/pci.h>
#include <linux/module.h>
#include <linux/ioport.h>
+#include <linux/rwsem.h>

#include "pci.h"

@@ -12,6 +13,18 @@ #include "pci.h"
static DEFINE_SPINLOCK(pci_lock);

/*
+ * Prevent the user from accessing PCI config space when it's unsafe to do
+ * so. Some devices require this during BIST and we're required to prevent
+ * it during D-state transitions. It could be made per-device, but it doesn't
+ * seem worthwhile for such a rare occurrence.
+ *
+ * User accesses are the 'writer' as only one is allowed at a time. Kernel
+ * blocking the user is the 'reader' as multiple devices can be blocked at
+ * the same time.
+ */
+static DECLARE_RWSEM(pci_user_sem);
+
+/*
* Wrappers for all PCI configuration access functions. They just check
* alignment, do locking and call the low-level functions pointed to
* by pci_dev->ops.
@@ -63,15 +76,6 @@ EXPORT_SYMBOL(pci_bus_write_config_byte)
EXPORT_SYMBOL(pci_bus_write_config_word);
EXPORT_SYMBOL(pci_bus_write_config_dword);

-static u32 pci_user_cached_config(struct pci_dev *dev, int pos)
-{
- u32 data;
-
- data = dev->saved_config_space[pos/sizeof(dev->saved_config_space[0])];
- data >>= (pos % sizeof(dev->saved_config_space[0])) * 8;
- return data;
-}
-
#define PCI_USER_READ_CONFIG(size,type) \
int pci_user_read_config_##size \
(struct pci_dev *dev, int pos, type *val) \
@@ -80,13 +84,12 @@ int pci_user_read_config_##size \
int ret = 0; \
u32 data = -1; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
+ down_write(&pci_user_sem); \
spin_lock_irqsave(&pci_lock, flags); \
- if (likely(!dev->block_ucfg_access)) \
- ret = dev->bus->ops->read(dev->bus, dev->devfn, \
+ ret = dev->bus->ops->read(dev->bus, dev->devfn, \
pos, sizeof(type), &data); \
- else if (pos < sizeof(dev->saved_config_space)) \
- data = pci_user_cached_config(dev, pos); \
spin_unlock_irqrestore(&pci_lock, flags); \
+ up_write(&pci_user_sem); \
*val = (type)data; \
return ret; \
}
@@ -98,11 +101,12 @@ int pci_user_write_config_##size \
unsigned long flags; \
int ret = -EIO; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
+ down_write(&pci_user_sem); \
spin_lock_irqsave(&pci_lock, flags); \
- if (likely(!dev->block_ucfg_access)) \
- ret = dev->bus->ops->write(dev->bus, dev->devfn, \
+ ret = dev->bus->ops->write(dev->bus, dev->devfn, \
pos, sizeof(type), val); \
spin_unlock_irqrestore(&pci_lock, flags); \
+ up_write(&pci_user_sem); \
return ret; \
}

@@ -117,21 +121,12 @@ PCI_USER_WRITE_CONFIG(dword, u32)
* pci_block_user_cfg_access - Block userspace PCI config reads/writes
* @dev: pci device struct
*
- * This function blocks any userspace PCI config accesses from occurring.
- * When blocked, any writes will be bit bucketed and reads will return the
- * data saved using pci_save_state for the first 64 bytes of config
- * space and return 0xff for all other config reads.
- **/
+ * When user access is blocked, any reads or writes to config space will
+ * sleep until access is unblocked again.
+ */
void pci_block_user_cfg_access(struct pci_dev *dev)
{
- unsigned long flags;
-
- pci_save_state(dev);
-
- /* spinlock to synchronize with anyone reading config space now */
- spin_lock_irqsave(&pci_lock, flags);
- dev->block_ucfg_access = 1;
- spin_unlock_irqrestore(&pci_lock, flags);
+ down_read(&pci_user_sem);
}
EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);

@@ -140,14 +135,9 @@ EXPORT_SYMBOL_GPL(pci_block_user_cfg_acc
* @dev: pci device struct
*
* This function allows userspace PCI config accesses to resume.
- **/
+ */
void pci_unblock_user_cfg_access(struct pci_dev *dev)
{
- unsigned long flags;
-
- /* spinlock to synchronize with anyone reading saved config space */
- spin_lock_irqsave(&pci_lock, flags);
- dev->block_ucfg_access = 0;
- spin_unlock_irqrestore(&pci_lock, flags);
+ up_read(&pci_user_sem);
}
EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 2dde821..5d06837 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -6174,6 +6174,7 @@ static int ipr_reset_start_bist(struct i
int rc;

ENTER;
+ pci_save_state(ioa_cfg->pdev);
pci_block_user_cfg_access(ioa_cfg->pdev);
rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START);

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3632282..4dbcbbd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -174,7 +174,6 @@ struct pci_dev {
unsigned int is_busmaster:1; /* device is busmaster */
unsigned int no_msi:1; /* device may not use msi */
unsigned int no_d1d2:1; /* only allow d0 or d3 */
- unsigned int block_ucfg_access:1; /* userspace config space access is blocked */
unsigned int broken_parity_status:1; /* Device generates false positive parity */
unsigned int msi_enabled:1;
unsigned int msix_enabled:1;


2006-10-17 14:54:07

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

On Tue, Oct 17, 2006 at 08:51:46AM -0600, Matthew Wilcox wrote:
> I'll post the patch I used to test blocking device accesses separately.

Here it is. Not for applying.

Nacked-by: Matthew Wilcox <[email protected]>

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index a1d2e97..267bf76 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -175,6 +175,20 @@ msi_bus_store(struct device *dev, struct
return count;
}

+static ssize_t
+block_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (*buf == '0')
+ pci_unblock_user_cfg_access(pdev);
+ else if (*buf == '1')
+ pci_block_user_cfg_access(pdev);
+
+ return count;
+}
+
struct device_attribute pci_dev_attrs[] = {
__ATTR_RO(resource),
__ATTR_RO(vendor),
@@ -189,6 +203,7 @@ struct device_attribute pci_dev_attrs[]
__ATTR(broken_parity_status,(S_IRUGO|S_IWUSR),
broken_parity_status_show,broken_parity_status_store),
__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
+ __ATTR(block, 0200, NULL, block_store),
__ATTR_NULL,
};

2006-10-17 21:25:54

by Brian King

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

Matthew Wilcox wrote:
> The existing implementation of pci_block_user_cfg_access() was recently
> criticised for providing out of date information and for returning errors
> on write, which applications won't be expecting.
>
> This reimplementation uses an rw semaphore to block accesses. I examined
> a couple of other alternatives (a mutex, which would unnecessarily
> serialise kernel BIST users; a per-device mutex, which would take another
> 16 bytes per pci device; a custom queue), I felt the rwsem provided the
> best tradeoffs.

Nack. This changes pci_block_user_cfg_access such that it can now sleep,
which does not work for the ipr driver, which uses it to block during BIST.
The ipr driver needs to be able to call this function at interrupt level
when it receives an interrupt that its scsi adapter has received a fatal
error and requires BIST to recover. The only way I see for ipr to be able
to use the changed interface would require I create a kernel thread in
the ipr driver for this specific purpose.

Brian

>
> I'll post the patch I used to test blocking device accesses separately.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
>
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index ea16805..29581a2 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -1,6 +1,7 @@
> #include <linux/pci.h>
> #include <linux/module.h>
> #include <linux/ioport.h>
> +#include <linux/rwsem.h>
>
> #include "pci.h"
>
> @@ -12,6 +13,18 @@ #include "pci.h"
> static DEFINE_SPINLOCK(pci_lock);
>
> /*
> + * Prevent the user from accessing PCI config space when it's unsafe to do
> + * so. Some devices require this during BIST and we're required to prevent
> + * it during D-state transitions. It could be made per-device, but it doesn't
> + * seem worthwhile for such a rare occurrence.
> + *
> + * User accesses are the 'writer' as only one is allowed at a time. Kernel
> + * blocking the user is the 'reader' as multiple devices can be blocked at
> + * the same time.
> + */
> +static DECLARE_RWSEM(pci_user_sem);
> +
> +/*
> * Wrappers for all PCI configuration access functions. They just check
> * alignment, do locking and call the low-level functions pointed to
> * by pci_dev->ops.
> @@ -63,15 +76,6 @@ EXPORT_SYMBOL(pci_bus_write_config_byte)
> EXPORT_SYMBOL(pci_bus_write_config_word);
> EXPORT_SYMBOL(pci_bus_write_config_dword);
>
> -static u32 pci_user_cached_config(struct pci_dev *dev, int pos)
> -{
> - u32 data;
> -
> - data = dev->saved_config_space[pos/sizeof(dev->saved_config_space[0])];
> - data >>= (pos % sizeof(dev->saved_config_space[0])) * 8;
> - return data;
> -}
> -
> #define PCI_USER_READ_CONFIG(size,type) \
> int pci_user_read_config_##size \
> (struct pci_dev *dev, int pos, type *val) \
> @@ -80,13 +84,12 @@ int pci_user_read_config_##size \
> int ret = 0; \
> u32 data = -1; \
> if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
> + down_write(&pci_user_sem); \
> spin_lock_irqsave(&pci_lock, flags); \
> - if (likely(!dev->block_ucfg_access)) \
> - ret = dev->bus->ops->read(dev->bus, dev->devfn, \
> + ret = dev->bus->ops->read(dev->bus, dev->devfn, \
> pos, sizeof(type), &data); \
> - else if (pos < sizeof(dev->saved_config_space)) \
> - data = pci_user_cached_config(dev, pos); \
> spin_unlock_irqrestore(&pci_lock, flags); \
> + up_write(&pci_user_sem); \
> *val = (type)data; \
> return ret; \
> }
> @@ -98,11 +101,12 @@ int pci_user_write_config_##size \
> unsigned long flags; \
> int ret = -EIO; \
> if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
> + down_write(&pci_user_sem); \
> spin_lock_irqsave(&pci_lock, flags); \
> - if (likely(!dev->block_ucfg_access)) \
> - ret = dev->bus->ops->write(dev->bus, dev->devfn, \
> + ret = dev->bus->ops->write(dev->bus, dev->devfn, \
> pos, sizeof(type), val); \
> spin_unlock_irqrestore(&pci_lock, flags); \
> + up_write(&pci_user_sem); \
> return ret; \
> }
>
> @@ -117,21 +121,12 @@ PCI_USER_WRITE_CONFIG(dword, u32)
> * pci_block_user_cfg_access - Block userspace PCI config reads/writes
> * @dev: pci device struct
> *
> - * This function blocks any userspace PCI config accesses from occurring.
> - * When blocked, any writes will be bit bucketed and reads will return the
> - * data saved using pci_save_state for the first 64 bytes of config
> - * space and return 0xff for all other config reads.
> - **/
> + * When user access is blocked, any reads or writes to config space will
> + * sleep until access is unblocked again.
> + */
> void pci_block_user_cfg_access(struct pci_dev *dev)
> {
> - unsigned long flags;
> -
> - pci_save_state(dev);
> -
> - /* spinlock to synchronize with anyone reading config space now */
> - spin_lock_irqsave(&pci_lock, flags);
> - dev->block_ucfg_access = 1;
> - spin_unlock_irqrestore(&pci_lock, flags);
> + down_read(&pci_user_sem);
> }
> EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
>
> @@ -140,14 +135,9 @@ EXPORT_SYMBOL_GPL(pci_block_user_cfg_acc
> * @dev: pci device struct
> *
> * This function allows userspace PCI config accesses to resume.
> - **/
> + */
> void pci_unblock_user_cfg_access(struct pci_dev *dev)
> {
> - unsigned long flags;
> -
> - /* spinlock to synchronize with anyone reading saved config space */
> - spin_lock_irqsave(&pci_lock, flags);
> - dev->block_ucfg_access = 0;
> - spin_unlock_irqrestore(&pci_lock, flags);
> + up_read(&pci_user_sem);
> }
> EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
> diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
> index 2dde821..5d06837 100644
> --- a/drivers/scsi/ipr.c
> +++ b/drivers/scsi/ipr.c
> @@ -6174,6 +6174,7 @@ static int ipr_reset_start_bist(struct i
> int rc;
>
> ENTER;
> + pci_save_state(ioa_cfg->pdev);
> pci_block_user_cfg_access(ioa_cfg->pdev);
> rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START);
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3632282..4dbcbbd 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -174,7 +174,6 @@ struct pci_dev {
> unsigned int is_busmaster:1; /* device is busmaster */
> unsigned int no_msi:1; /* device may not use msi */
> unsigned int no_d1d2:1; /* only allow d0 or d3 */
> - unsigned int block_ucfg_access:1; /* userspace config space access is blocked */
> unsigned int broken_parity_status:1; /* Device generates false positive parity */
> unsigned int msi_enabled:1;
> unsigned int msix_enabled:1;


--
Brian King
eServer Storage I/O
IBM Linux Technology Center

2006-10-18 14:38:22

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] Block on access to temporarily unavailable pci device

On Tue, 17 Oct 2006, Brian King wrote:

> Nack. This changes pci_block_user_cfg_access such that it can now sleep,
> which does not work for the ipr driver, which uses it to block during BIST.
> The ipr driver needs to be able to call this function at interrupt level
> when it receives an interrupt that its scsi adapter has received a fatal
> error and requires BIST to recover. The only way I see for ipr to be able
> to use the changed interface would require I create a kernel thread in
> the ipr driver for this specific purpose.

How about calling execute_in_process_context()?

You have to do _something_, because a user task could be about to read the
configuration space at the exact moment you want to start the BIST. That
means ipr would have to wait until the user access is finished, which
means it has to be prepared to sleep one way or another.

Alan Stern

2006-10-18 14:51:43

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] Block on access to temporarily unavailable pci device

On Wed, Oct 18, 2006 at 10:38:20AM -0400, Alan Stern wrote:
> You have to do _something_, because a user task could be about to read the
> configuration space at the exact moment you want to start the BIST. That
> means ipr would have to wait until the user access is finished, which
> means it has to be prepared to sleep one way or another.

Actually, it only has to spin until the user has finished accessing
config space. See the patch I just posted.

2006-10-18 14:51:08

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device


The existing implementation of pci_block_user_cfg_access() was recently
criticised for providing out of date information and for returning errors
on write, which applications won't be expecting.

This reimplementation uses a global wait queue and a bit per device.
I've open-coded prepare_to_wait() / finish_wait() as I could optimise
it significantly by knowing that the pci_lock protected us at all points.

It looked a bit funny to be doing a spin_unlock_irqsave(); schedule(),
so I used spin_lock_irq() for the _user versions of pci_read_config and
pci_write_config. Not carrying a flags pointer around made the code
much less nasty.

I also addressed the potential issue with nested attempts to block.
Now pci_block_user_cfg_access() can return -EBUSY if it's already blocked,
and pci_unblock_user_cfg_access() will WARN if you try to unblock an
already unblocked device.

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index ea16805..a27a6c0 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -1,6 +1,7 @@
#include <linux/pci.h>
#include <linux/module.h>
#include <linux/ioport.h>
+#include <linux/wait.h>

#include "pci.h"

@@ -63,30 +64,42 @@ EXPORT_SYMBOL(pci_bus_write_config_byte)
EXPORT_SYMBOL(pci_bus_write_config_word);
EXPORT_SYMBOL(pci_bus_write_config_dword);

-static u32 pci_user_cached_config(struct pci_dev *dev, int pos)
-{
- u32 data;
+/*
+ * The following routines are to prevent the user from accessing PCI config
+ * space when it's unsafe to do so. Some devices require this during BIST and
+ * we're required to prevent it during D-state transitions.
+ *
+ * We have a bit per device to indicate it's blocked and a global wait queue
+ * for callers to sleep on until devices are unblocked.
+ */
+static DECLARE_WAIT_QUEUE_HEAD(pci_ucfg_wait);

- data = dev->saved_config_space[pos/sizeof(dev->saved_config_space[0])];
- data >>= (pos % sizeof(dev->saved_config_space[0])) * 8;
- return data;
+static noinline void pci_wait_ucfg(struct pci_dev *dev)
+{
+ DECLARE_WAITQUEUE(wait, current);
+
+ __add_wait_queue(&pci_ucfg_wait, &wait);
+ do {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock_irq(&pci_lock);
+ schedule();
+ spin_lock_irq(&pci_lock);
+ } while (dev->block_ucfg_access);
+ __remove_wait_queue(&pci_ucfg_wait, &wait);
}

#define PCI_USER_READ_CONFIG(size,type) \
int pci_user_read_config_##size \
(struct pci_dev *dev, int pos, type *val) \
{ \
- unsigned long flags; \
int ret = 0; \
u32 data = -1; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
- spin_lock_irqsave(&pci_lock, flags); \
- if (likely(!dev->block_ucfg_access)) \
- ret = dev->bus->ops->read(dev->bus, dev->devfn, \
+ spin_lock_irq(&pci_lock); \
+ if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev); \
+ ret = dev->bus->ops->read(dev->bus, dev->devfn, \
pos, sizeof(type), &data); \
- else if (pos < sizeof(dev->saved_config_space)) \
- data = pci_user_cached_config(dev, pos); \
- spin_unlock_irqrestore(&pci_lock, flags); \
+ spin_unlock_irq(&pci_lock); \
*val = (type)data; \
return ret; \
}
@@ -95,14 +108,13 @@ #define PCI_USER_WRITE_CONFIG(size,type)
int pci_user_write_config_##size \
(struct pci_dev *dev, int pos, type val) \
{ \
- unsigned long flags; \
int ret = -EIO; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
- spin_lock_irqsave(&pci_lock, flags); \
- if (likely(!dev->block_ucfg_access)) \
- ret = dev->bus->ops->write(dev->bus, dev->devfn, \
+ spin_lock_irq(&pci_lock); \
+ if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev); \
+ ret = dev->bus->ops->write(dev->bus, dev->devfn, \
pos, sizeof(type), val); \
- spin_unlock_irqrestore(&pci_lock, flags); \
+ spin_unlock_irq(&pci_lock); \
return ret; \
}

@@ -117,21 +129,21 @@ PCI_USER_WRITE_CONFIG(dword, u32)
* pci_block_user_cfg_access - Block userspace PCI config reads/writes
* @dev: pci device struct
*
- * This function blocks any userspace PCI config accesses from occurring.
- * When blocked, any writes will be bit bucketed and reads will return the
- * data saved using pci_save_state for the first 64 bytes of config
- * space and return 0xff for all other config reads.
- **/
-void pci_block_user_cfg_access(struct pci_dev *dev)
+ * When user access is blocked, any reads or writes to config space will
+ * sleep until access is unblocked again. We don't allow nesting of
+ * block/unblock calls.
+ */
+int pci_block_user_cfg_access(struct pci_dev *dev)
{
unsigned long flags;
+ int result;

- pci_save_state(dev);
-
- /* spinlock to synchronize with anyone reading config space now */
spin_lock_irqsave(&pci_lock, flags);
+ result = dev->block_ucfg_access ? -EBUSY : 0;
dev->block_ucfg_access = 1;
spin_unlock_irqrestore(&pci_lock, flags);
+
+ return result;
}
EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);

@@ -140,14 +152,19 @@ EXPORT_SYMBOL_GPL(pci_block_user_cfg_acc
* @dev: pci device struct
*
* This function allows userspace PCI config accesses to resume.
- **/
+ */
void pci_unblock_user_cfg_access(struct pci_dev *dev)
{
unsigned long flags;

- /* spinlock to synchronize with anyone reading saved config space */
spin_lock_irqsave(&pci_lock, flags);
+
+ /* A second unblock implies a failure to notice an attempt to nested
+ * block, which we don't support. */
+ WARN_ON(!dev->block_ucfg_access);
+
dev->block_ucfg_access = 0;
+ wake_up_all(&pci_ucfg_wait);
spin_unlock_irqrestore(&pci_lock, flags);
}
EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 2dde821..5d06837 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -6174,6 +6174,7 @@ static int ipr_reset_start_bist(struct i
int rc;

ENTER;
+ pci_save_state(ioa_cfg->pdev);
pci_block_user_cfg_access(ioa_cfg->pdev);
rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START);

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3632282..8725d1f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -634,7 +634,7 @@ int ht_create_irq(struct pci_dev *dev,
void ht_destroy_irq(unsigned int irq);
#endif /* CONFIG_HT_IRQ */

-extern void pci_block_user_cfg_access(struct pci_dev *dev);
+extern int __must_check pci_block_user_cfg_access(struct pci_dev *dev);
extern void pci_unblock_user_cfg_access(struct pci_dev *dev);

/*

2006-10-18 14:57:24

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

On Wed, Oct 18, 2006 at 08:51:04AM -0600, Matthew Wilcox wrote:
> This reimplementation uses a global wait queue and a bit per device.
> I've open-coded prepare_to_wait() / finish_wait() as I could optimise
> it significantly by knowing that the pci_lock protected us at all points.

I forgot to report how I've tested it. Using the 'test' patch from
yesterday, I have two processes running lspci in an infinite loop. When
I write a 1 to /sys/bus/pci/devices/0000\:00\:01.0/block, they both
halt. When I write a 0, they both resume.

Then I tried doing echo 0 >/sys/bus/pci/devices/0000\:00\:01.0/block a
second time, and I got the WARNing I expected.

animal:~# echo 1 >/sys/bus/pci/devices/0000\:00\:01.0/block
animal:~# echo 1 >/sys/bus/pci/devices/0000\:00\:01.0/block
animal:~# echo 0 >/sys/bus/pci/devices/0000\:00\:01.0/block

They both resumed (as expected; we don't support nesting).

animal:~# echo 1 >/sys/bus/pci/devices/0000\:00\:01.0/block
animal:~# echo 1 >/sys/bus/pci/devices/0000\:00\:01.1/block
animal:~# echo 0 >/sys/bus/pci/devices/0000\:00\:01.0/block
animal:~# echo 0 >/sys/bus/pci/devices/0000\:00\:01.1/block

Both resumed only after the second echo 0 (as lspci reads all devices
before printing anything).

It's not exactly rigorous testing, but I couldn't think of any other
cases to try.

2006-10-18 15:12:48

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

Matthew Wilcox wrote:

> I also addressed the potential issue with nested attempts to block.
> Now pci_block_user_cfg_access() can return -EBUSY if it's already blocked,
> and pci_unblock_user_cfg_access() will WARN if you try to unblock an
> already unblocked device.

Why not just WARN if it is already blocked as well? Presumably if the
driver needs nested blocking, it is going to have to carry some state
to know when to do the final unblock anyway, so it may as well just
not do these redundant blocks either.

** or ** just do a counting block/unblock to properly support nesting
them. That is, go one way or the other, and do it properly.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-10-18 15:16:05

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

On Thu, Oct 19, 2006 at 01:12:42AM +1000, Nick Piggin wrote:
> Matthew Wilcox wrote:
>
> >I also addressed the potential issue with nested attempts to block.
> >Now pci_block_user_cfg_access() can return -EBUSY if it's already blocked,
> >and pci_unblock_user_cfg_access() will WARN if you try to unblock an
> >already unblocked device.
>
> Why not just WARN if it is already blocked as well? Presumably if the

Because the driver can check the return value and fail the higher level
operation.

> driver needs nested blocking, it is going to have to carry some state
> to know when to do the final unblock anyway, so it may as well just
> not do these redundant blocks either.

No driver needs nested blocking, but blocks may be imposed on a driver's
device by the PCI core, for example.

> ** or ** just do a counting block/unblock to properly support nesting
> them. That is, go one way or the other, and do it properly.

I don't think that's necessary. Right now, we have one user (IPR's
BIST) and I'm trying to add a second (D-state transitions). We don't
need nested blocks, but we do need to fail in the highly unlikely case
someone tries to do them.

If it turns out we actually need them later, let's revisit this when we
have a user.

2006-10-18 15:27:09

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

Matthew Wilcox wrote:
> On Thu, Oct 19, 2006 at 01:12:42AM +1000, Nick Piggin wrote:

>>Why not just WARN if it is already blocked as well? Presumably if the
>
>
> Because the driver can check the return value and fail the higher level
> operation.
>
>
>>driver needs nested blocking, it is going to have to carry some state
>>to know when to do the final unblock anyway, so it may as well just
>>not do these redundant blocks either.
>
>
> No driver needs nested blocking, but blocks may be imposed on a driver's
> device by the PCI core, for example.

So doesn't that mean it needs nested blocking? Surely that's better than
failing the operation due to an implementation detail.


>>** or ** just do a counting block/unblock to properly support nesting
>>them. That is, go one way or the other, and do it properly.
>
>
> I don't think that's necessary. Right now, we have one user (IPR's
> BIST) and I'm trying to add a second (D-state transitions). We don't
> need nested blocks, but we do need to fail in the highly unlikely case
> someone tries to do them.

Well if you really don't need nested blocking and you want to fail in
the highly unlikely case that someone tries... add the WARN.

> If it turns out we actually need them later, let's revisit this when we
> have a user.

I don't know much about the pci layer, so I'm sure you know better than
me whether they are needed or not. But if you think they are not, then
do the WARN rather than return failure.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2006-10-18 15:49:11

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

Ar Mer, 2006-10-18 am 08:51 -0600, ysgrifennodd Matthew Wilcox:
> The existing implementation of pci_block_user_cfg_access() was recently
> criticised for providing out of date information and for returning errors
> on write, which applications won't be expecting.

It was also favoured by some of us as well. In addition this whole issue
was extensively debated in the past to select the current approach. That
said I do like the approach of a short wait *specifically* on power
transitions, its bounded in time, its neat and it makes sense.

We must be very very sure its never triggered in the real world any
other way (eg your >block testing must be impossible)

So unless you distinguish between "back in a moment", "back someday" and
"not coming back" it isn't useful. Thus your patch is incomplete as it
does not provide the cache that is also needed. At least I don't think X
hanging forever mid mode switch is terribly useful...

> I also addressed the potential issue with nested attempts to block.
> Now pci_block_user_cfg_access() can return -EBUSY if it's already blocked,
> and pci_unblock_user_cfg_access() will WARN if you try to unblock an
> already unblocked device.

Gak that is IMHO a mistake. Just keep a counter. You've just added
another "what the hell do I do if this returns -EBUSY" problem for all
the driver authors.

> Signed-off-by: Matthew Wilcox <[email protected]>

NAK for all the above reasons. This is a cure looking for a disease, and
its a cure which is worse than the theoretical disease it wants to fix.


2006-10-18 15:53:01

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] Block on access to temporarily unavailable pci device

On Wed, 18 Oct 2006, Matthew Wilcox wrote:

> On Wed, Oct 18, 2006 at 10:38:20AM -0400, Alan Stern wrote:
> > You have to do _something_, because a user task could be about to read the
> > configuration space at the exact moment you want to start the BIST. That
> > means ipr would have to wait until the user access is finished, which
> > means it has to be prepared to sleep one way or another.
>
> Actually, it only has to spin until the user has finished accessing
> config space. See the patch I just posted.

I stand corrected.

Don't you want the user process to wait in TASK_INTERRUPTIBLE? It would
require only a very simple change.

Alan Stern

2006-10-18 16:04:10

by Alan

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] Block on access to temporarily unavailable pci device

Ar Mer, 2006-10-18 am 11:52 -0400, ysgrifennodd Alan Stern:
> Don't you want the user process to wait in TASK_INTERRUPTIBLE? It would
> require only a very simple change.

That just makes the problem even worse, to go with the kernel driver
"what the hell do do if.." we get a user space one thats based around
incompatibility with the existing behaviour.

There are much saner ways to sort that out without breaking the API

If its going to be a bounded short wait -> pause
If its might be a long wait -> cached
If its gone for good then error

If the user specified O_NDELAY then -EWOULDBLOCK not wait

That way you don't break anything and you get sensible Unix semantics.
The wait queue Matthew added also means select() can be fitted up to do
the right thing for the O_NDELAY case.

Alan

2006-10-18 16:09:29

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] Block on access to temporarily unavailable pci device

On Wed, Oct 18, 2006 at 05:05:02PM +0100, Alan Cox wrote:
> If the user specified O_NDELAY then -EWOULDBLOCK not wait

sysfs doesn't give us the struct file, so we can't tell.

2006-10-18 16:20:46

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

On Wed, Oct 18, 2006 at 04:50:51PM +0100, Alan Cox wrote:
> Ar Mer, 2006-10-18 am 08:51 -0600, ysgrifennodd Matthew Wilcox:
> > The existing implementation of pci_block_user_cfg_access() was recently
> > criticised for providing out of date information and for returning errors
> > on write, which applications won't be expecting.
>
> It was also favoured by some of us as well. In addition this whole issue
> was extensively debated in the past to select the current approach. That
> said I do like the approach of a short wait *specifically* on power
> transitions, its bounded in time, its neat and it makes sense.
>
> We must be very very sure its never triggered in the real world any
> other way (eg your >block testing must be impossible)

Oh yes, absolutely. That's why I posted it with a Nacked-by.

> So unless you distinguish between "back in a moment", "back someday" and
> "not coming back" it isn't useful. Thus your patch is incomplete as it
> does not provide the cache that is also needed. At least I don't think X
> hanging forever mid mode switch is terribly useful...

I don't see how that's possible. If the driver forgets to call
pci_unblock_user_cfg_access(), that's a bug in the kernel driver.
The current user is limited to a two-second delay and the one I'm
proposing introducing is a delay measued in milli- or microseconds.
An extra two-second delay while you BIST your IPR device and change
modes in X at the same time (does X really scan all devices when it's
changing mode settings? That's odd) doesn't strike me as a huge failure.

> > I also addressed the potential issue with nested attempts to block.
> > Now pci_block_user_cfg_access() can return -EBUSY if it's already blocked,
> > and pci_unblock_user_cfg_access() will WARN if you try to unblock an
> > already unblocked device.
>
> Gak that is IMHO a mistake. Just keep a counter. You've just added
> another "what the hell do I do if this returns -EBUSY" problem for all
> the driver authors.

You fail the operation if it returns busy. Or you loop. It's really up
to you, the driver author. You know what operation you're trying to do,
you know what makes more sense.

> NAK for all the above reasons. This is a cure looking for a disease, and
> its a cure which is worse than the theoretical disease it wants to fix.

Absolutely not. It's an attempt to solve a tricky problem. Devfs, now
there was a cure looking for a disease.

It seemed to me there was consensus that blocking was a better approach
than returning a cached value or returning an error. If we're
decided that returning a cached value is the better approach, then
the patch to fix it is trivial; move the pci_set_state() call from the
pci_block_user_cfg_access() function to the IPR driver. Done and dusted.

2006-10-18 16:39:53

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

Ar Mer, 2006-10-18 am 10:20 -0600, ysgrifennodd Matthew Wilcox:
> I don't see how that's possible. If the driver forgets to call
> pci_unblock_user_cfg_access(), that's a bug in the kernel driver.

Lets leave bugs aside thats a different problem. The goal isn't to be
bug proof

> The current user is limited to a two-second delay and the one I'm
> proposing introducing is a delay measued in milli- or microseconds.
> An extra two-second delay while you BIST your IPR device and change
> modes in X at the same time (does X really scan all devices when it's
> changing mode settings? That's odd) doesn't strike me as a huge failure.

X scans all the devices when it sets up so only a video device one would
hang mid mode set.

> You fail the operation if it returns busy. Or you loop. It's really up
> to you, the driver author. You know what operation you're trying to do,
> you know what makes more sense.

But I've no idea who, what or why and that makes it hard to handle. If
the thing refcounts then if there are two reasons to be blocked we are
fine and the last reason goes away we resume - it does make it more easy
to make mistakes. If it isnt ref counting I'd prefer block repeated is a
BUG() not a "driver figure this out"

> It seemed to me there was consensus that blocking was a better approach
> than returning a cached value or returning an error. If we're
> decided that returning a cached value is the better approach, then
> the patch to fix it is trivial; move the pci_set_state() call from the
> pci_block_user_cfg_access() function to the IPR driver. Done and dusted.

That would have been my choice but the blocking approach seems to be
getting somewhere so it seems worth thrashing out in full.

2006-10-18 16:45:40

by Alan

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] Block on access to temporarily unavailable pci device

Ar Mer, 2006-10-18 am 10:09 -0600, ysgrifennodd Matthew Wilcox:
> On Wed, Oct 18, 2006 at 05:05:02PM +0100, Alan Cox wrote:
> > If the user specified O_NDELAY then -EWOULDBLOCK not wait
>
> sysfs doesn't give us the struct file, so we can't tell.

I'm all for fixing sysfs on that point so that it can behave sensibly.

2006-10-18 17:14:21

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device

On Wed, Oct 18, 2006 at 05:39:52PM +0100, Alan Cox wrote:
> > The current user is limited to a two-second delay and the one I'm
> > proposing introducing is a delay measued in milli- or microseconds.
> > An extra two-second delay while you BIST your IPR device and change
> > modes in X at the same time (does X really scan all devices when it's
> > changing mode settings? That's odd) doesn't strike me as a huge failure.
>
> X scans all the devices when it sets up so only a video device one would
> hang mid mode set.

OK. So the only possible X interaction currently is a D-state transition.

> > You fail the operation if it returns busy. Or you loop. It's really up
> > to you, the driver author. You know what operation you're trying to do,
> > you know what makes more sense.
>
> But I've no idea who, what or why and that makes it hard to handle. If
> the thing refcounts then if there are two reasons to be blocked we are
> fine and the last reason goes away we resume - it does make it more easy
> to make mistakes. If it isnt ref counting I'd prefer block repeated is a
> BUG() not a "driver figure this out"

Thinking about this a bit more, we only *need* to block userspace from
accessing a device while it's going to cause lockups if we access the
device. And we'll cause the lockup ourselves if we try to do more than
one of these operations at a time. So BUG_ON is clearly the right
approach. Of course, the backtrace might well finger the wrong culprit --
if someone forgot to release the block earlier, it'll catch the second
attempt rather than the missed (or infinitely delayed) unblock. I don't
think it matters too much, and I don't see a nice way to capture the
other task (do a backtrace to a buffer somewhere in a special debug mode
...?)

2006-10-19 15:41:30

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH] Block on access to temporarily unavailable pci device [version 3]


The existing implementation of pci_block_user_cfg_access() was recently
criticised for providing out of date information and for returning errors
on write, which applications won't be expecting.

This reimplementation uses a global wait queue and a bit per device.
I've open-coded prepare_to_wait() / finish_wait() as I could optimise
it significantly by knowing that the pci_lock protected us at all points.

It looked a bit funny to be doing a spin_unlock_irqsave(); schedule(),
so I used spin_lock_irq() for the _user versions of pci_read_config and
pci_write_config. Not carrying a flags pointer around made the code
much less nasty.

Attempts to block an already blocked device hit a BUG() and attempts to
unblock an already unblocked device hit a WARN(). If we need to block
access to a device from userspace, it's because it's unsafe for even
another bit of the kernel to access the device. An attempt to block
a device for a second time means we're about to access the device to
perform some other operation, which could provoke undefined behaviour
from the device.

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index ea16805..73a58c7 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -1,6 +1,7 @@
#include <linux/pci.h>
#include <linux/module.h>
#include <linux/ioport.h>
+#include <linux/wait.h>

#include "pci.h"

@@ -63,30 +64,42 @@ EXPORT_SYMBOL(pci_bus_write_config_byte)
EXPORT_SYMBOL(pci_bus_write_config_word);
EXPORT_SYMBOL(pci_bus_write_config_dword);

-static u32 pci_user_cached_config(struct pci_dev *dev, int pos)
-{
- u32 data;
+/*
+ * The following routines are to prevent the user from accessing PCI config
+ * space when it's unsafe to do so. Some devices require this during BIST and
+ * we're required to prevent it during D-state transitions.
+ *
+ * We have a bit per device to indicate it's blocked and a global wait queue
+ * for callers to sleep on until devices are unblocked.
+ */
+static DECLARE_WAIT_QUEUE_HEAD(pci_ucfg_wait);

- data = dev->saved_config_space[pos/sizeof(dev->saved_config_space[0])];
- data >>= (pos % sizeof(dev->saved_config_space[0])) * 8;
- return data;
+static noinline void pci_wait_ucfg(struct pci_dev *dev)
+{
+ DECLARE_WAITQUEUE(wait, current);
+
+ __add_wait_queue(&pci_ucfg_wait, &wait);
+ do {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock_irq(&pci_lock);
+ schedule();
+ spin_lock_irq(&pci_lock);
+ } while (dev->block_ucfg_access);
+ __remove_wait_queue(&pci_ucfg_wait, &wait);
}

#define PCI_USER_READ_CONFIG(size,type) \
int pci_user_read_config_##size \
(struct pci_dev *dev, int pos, type *val) \
{ \
- unsigned long flags; \
int ret = 0; \
u32 data = -1; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
- spin_lock_irqsave(&pci_lock, flags); \
- if (likely(!dev->block_ucfg_access)) \
- ret = dev->bus->ops->read(dev->bus, dev->devfn, \
+ spin_lock_irq(&pci_lock); \
+ if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev); \
+ ret = dev->bus->ops->read(dev->bus, dev->devfn, \
pos, sizeof(type), &data); \
- else if (pos < sizeof(dev->saved_config_space)) \
- data = pci_user_cached_config(dev, pos); \
- spin_unlock_irqrestore(&pci_lock, flags); \
+ spin_unlock_irq(&pci_lock); \
*val = (type)data; \
return ret; \
}
@@ -95,14 +108,13 @@ #define PCI_USER_WRITE_CONFIG(size,type)
int pci_user_write_config_##size \
(struct pci_dev *dev, int pos, type val) \
{ \
- unsigned long flags; \
int ret = -EIO; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
- spin_lock_irqsave(&pci_lock, flags); \
- if (likely(!dev->block_ucfg_access)) \
- ret = dev->bus->ops->write(dev->bus, dev->devfn, \
+ spin_lock_irq(&pci_lock); \
+ if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev); \
+ ret = dev->bus->ops->write(dev->bus, dev->devfn, \
pos, sizeof(type), val); \
- spin_unlock_irqrestore(&pci_lock, flags); \
+ spin_unlock_irq(&pci_lock); \
return ret; \
}

@@ -117,21 +129,23 @@ PCI_USER_WRITE_CONFIG(dword, u32)
* pci_block_user_cfg_access - Block userspace PCI config reads/writes
* @dev: pci device struct
*
- * This function blocks any userspace PCI config accesses from occurring.
- * When blocked, any writes will be bit bucketed and reads will return the
- * data saved using pci_save_state for the first 64 bytes of config
- * space and return 0xff for all other config reads.
- **/
+ * When user access is blocked, any reads or writes to config space will
+ * sleep until access is unblocked again. We don't allow nesting of
+ * block/unblock calls.
+ */
void pci_block_user_cfg_access(struct pci_dev *dev)
{
unsigned long flags;
+ int was_blocked;

- pci_save_state(dev);
-
- /* spinlock to synchronize with anyone reading config space now */
spin_lock_irqsave(&pci_lock, flags);
+ was_blocked = dev->block_ucfg_access;
dev->block_ucfg_access = 1;
spin_unlock_irqrestore(&pci_lock, flags);
+
+ /* If we BUG() inside the pci_lock, we're guaranteed to hose
+ * the machine */
+ BUG_ON(was_blocked);
}
EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);

@@ -140,14 +154,19 @@ EXPORT_SYMBOL_GPL(pci_block_user_cfg_acc
* @dev: pci device struct
*
* This function allows userspace PCI config accesses to resume.
- **/
+ */
void pci_unblock_user_cfg_access(struct pci_dev *dev)
{
unsigned long flags;

- /* spinlock to synchronize with anyone reading saved config space */
spin_lock_irqsave(&pci_lock, flags);
+
+ /* This indicates a problem in the caller, but we don't need
+ * to kill them, unlike a double-block above. */
+ WARN_ON(!dev->block_ucfg_access);
+
dev->block_ucfg_access = 0;
+ wake_up_all(&pci_ucfg_wait);
spin_unlock_irqrestore(&pci_lock, flags);
}
EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 2dde821..5d06837 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -6174,6 +6174,7 @@ static int ipr_reset_start_bist(struct i
int rc;

ENTER;
+ pci_save_state(ioa_cfg->pdev);
pci_block_user_cfg_access(ioa_cfg->pdev);
rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START);

2006-10-19 16:32:48

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device [version 3]

Ar Iau, 2006-10-19 am 09:41 -0600, ysgrifennodd Matthew Wilcox:
> Signed-off-by: Matthew Wilcox <[email protected]>
>

Acked-by: Alan Cox <[email protected]>

Add a note to the block call that anyone blocking it for long times will
be unpopular.


2006-10-19 23:10:35

by Adam M Belay

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device [version 3]

On Thu, 2006-10-19 at 09:41 -0600, Matthew Wilcox wrote:
> Signed-off-by: Matthew Wilcox <[email protected]>

Acked-by: Adam Belay <[email protected]>

It would be nice if we had sysfs expose struct file at some point... :)

Thanks,
Adam


2006-10-19 23:53:18

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device [version 3]

On Thu, Oct 19, 2006 at 07:13:56PM -0400, Adam Belay wrote:
> It would be nice if we had sysfs expose struct file at some point... :)

No it would not. I have yet to see a valid reason why to do this
(everyone would be abusing sysfs even worse if it was present, you
should see some of the requests I've had over the years...)

thanks,

greg k-h

2006-10-20 21:51:01

by Brian King

[permalink] [raw]
Subject: Re: [PATCH] Block on access to temporarily unavailable pci device [version 3]

Matthew Wilcox wrote:
> Signed-off-by: Matthew Wilcox <[email protected]>

Acked-by: Brian King <[email protected]>

I tried this out on my machine with an ipr adapter, where
I forced the adapter through BIST using ipr's reset_host
sysfs attribute, all the while continually reading pci
config space through sysfs in a loop. Everything looked
good.

Brian

--
Brian King
eServer Storage I/O
IBM Linux Technology Center