2009-03-20 23:04:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH] PCI PM: Be extra careful when changing power states of devices

From: Rafael J. Wysocki <[email protected]>

The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
that setting the power state of a PCI device by
pci_raw_set_power_state() may sometimes fail. For this reason,
pci_raw_set_power_state() should not assume that the power state of
the device has actually changed after writing into its PMCSR.
Instead, it should read the value from there and use it to update
dev->current_state. It also is useful to print a warning if the
device's power state hasn't changed as expected.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -436,7 +436,7 @@ static inline int platform_pci_sleep_wak
*/
static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
{
- u16 pmcsr;
+ u16 pmcsr, new_pmcsr;
bool need_restore = false;

/* Check if we're already there */
@@ -498,7 +498,15 @@ static int pci_raw_set_power_state(struc
else if (state == PCI_D2 || dev->current_state == PCI_D2)
udelay(PCI_PM_D2_DELAY);

- dev->current_state = state;
+ pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &new_pmcsr);
+ if (new_pmcsr == pmcsr) {
+ dev->current_state = state;
+ } else {
+ dev->current_state = (new_pmcsr & PCI_PM_CTRL_STATE_MASK);
+ dev_warn(&dev->dev,
+ "failed to set power state to D%d, is D%d\n", state,
+ dev->current_state);
+ }

/* According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
* INTERFACE SPECIFICATION, REV. 1.2", a device transitioning


2009-03-22 21:13:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 1/2] PCI PM: Introduce __pci_set_power_state()

From: Rafael J. Wysocki <[email protected]>

The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
that setting the power state of a PCI device may sometimes require
multiple attempts to program the device's PMCSR and and/or an delay
longer than the default one. For this reason, introduce
__pci_set_power_state() that will take two additional arguments, the
number of attempts to program the power state of the device to be
made and the delay after writing a new value to the device's PMCSR.
Redefine pci_set_power_state() as __pci_set_power_state() using the
default values of the new arguments.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci.c | 63 +++++++++++++++++++++++++++++++++++++++-------------
include/linux/pci.h | 7 ++++-
2 files changed, 54 insertions(+), 16 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -426,6 +426,9 @@ static inline int platform_pci_sleep_wak
* given PCI device
* @dev: PCI device to handle.
* @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
+ * @attempts: How many times to try to change the power state of the device
+ * @delay: Delay after programming the new power state, in miliseconds (0 means
+ * use default)
*
* RETURN VALUE:
* -EINVAL if the requested state is invalid.
@@ -434,9 +437,13 @@ static inline int platform_pci_sleep_wak
* 0 if device already is in the requested state.
* 0 if device's power state has been successfully changed.
*/
-static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
+static int pci_raw_set_power_state(
+ struct pci_dev *dev,
+ pci_power_t state,
+ unsigned int attempts,
+ unsigned int delay)
{
- u16 pmcsr;
+ u16 pmcsr, new_pmcsr;
bool need_restore = false;

/* Check if we're already there */
@@ -488,17 +495,36 @@ static int pci_raw_set_power_state(struc
break;
}

- /* enter specified state */
- pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+ do {
+ /* Program the requested state */
+ pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);

- /* Mandatory power management transition delays */
- /* see PCI PM 1.1 5.6.1 table 18 */
- if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
- msleep(pci_pm_d3_delay);
- else if (state == PCI_D2 || dev->current_state == PCI_D2)
- udelay(PCI_PM_D2_DELAY);
+ /*
+ * If delay has not been specified, use mandatory PCI power
+ * management transition delays (see PCI PM 1.1 5.6.1 table 18).
+ */
+ if (delay)
+ msleep(delay);
+ else if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
+ msleep(pci_pm_d3_delay);
+ else if (state == PCI_D2 || dev->current_state == PCI_D2)
+ udelay(PCI_PM_D2_DELAY);
+
+ /* Check if the power state has actually changed */
+ pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL,
+ &new_pmcsr);
+ if (pmcsr == new_pmcsr) {
+ dev->current_state = state;
+ break;
+ }
+ } while (--attempts);

- dev->current_state = state;
+ if (pmcsr != new_pmcsr) {
+ dev->current_state = (new_pmcsr & PCI_PM_CTRL_STATE_MASK);
+ dev_warn(&dev->dev,
+ "failed to set power state to D%d, is D%d\n", state,
+ dev->current_state);
+ }

/* According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
* INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
@@ -540,9 +566,12 @@ void pci_update_current_state(struct pci
}

/**
- * pci_set_power_state - Set the power state of a PCI device
+ * __pci_set_power_state - Set the power state of a PCI device
* @dev: PCI device to handle.
* @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
+ * @attempts: How many times to try to change the power state of the device.
+ * @delay: Delay after programming the new power state, in miliseconds (0 means
+ * use default).
*
* Transition a device to a new power state, using the platform formware and/or
* the device's PCI PM registers.
@@ -554,7 +583,11 @@ void pci_update_current_state(struct pci
* 0 if device already is in the requested state.
* 0 if device's power state has been successfully changed.
*/
-int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
+int __pci_set_power_state(
+ struct pci_dev *dev,
+ pci_power_t state,
+ unsigned int attempts,
+ unsigned int delay)
{
int error;

@@ -590,7 +623,7 @@ int pci_set_power_state(struct pci_dev *
if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
return 0;

- error = pci_raw_set_power_state(dev, state);
+ error = pci_raw_set_power_state(dev, state, attempts, delay);

if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
/* Allow the platform to finalize the transition */
@@ -603,6 +636,7 @@ int pci_set_power_state(struct pci_dev *

return error;
}
+EXPORT_SYMBOL(__pci_set_power_state);

/**
* pci_choose_state - Choose the power state of a PCI device
@@ -2409,7 +2443,6 @@ EXPORT_SYMBOL(pci_assign_resource);
EXPORT_SYMBOL(pci_find_parent_resource);
EXPORT_SYMBOL(pci_select_bars);

-EXPORT_SYMBOL(pci_set_power_state);
EXPORT_SYMBOL(pci_save_state);
EXPORT_SYMBOL(pci_restore_state);
EXPORT_SYMBOL(pci_pme_capable);
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -689,7 +689,12 @@ size_t pci_get_rom_size(struct pci_dev *
/* Power management related routines */
int pci_save_state(struct pci_dev *dev);
int pci_restore_state(struct pci_dev *dev);
-int pci_set_power_state(struct pci_dev *dev, pci_power_t state);
+int __pci_set_power_state(struct pci_dev *dev, pci_power_t state,
+ unsigned int attempts, unsigned int delay);
+static inline int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
+{
+ return __pci_set_power_state(dev, state, 1, 0);
+}
pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);
void pci_pme_active(struct pci_dev *dev, bool enable);

2009-03-22 21:14:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 0/2] Make radeonfb use PCI PM core for suspendig device (was: Re: [RFC][PATCH] PCI PM: Be extra careful when changing power states of devices)

On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> that setting the power state of a PCI device by
> pci_raw_set_power_state() may sometimes fail. For this reason,
> pci_raw_set_power_state() should not assume that the power state of
> the device has actually changed after writing into its PMCSR.
> Instead, it should read the value from there and use it to update
> dev->current_state. It also is useful to print a warning if the
> device's power state hasn't changed as expected.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
to do something a bit different.

Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
driver not to open code PCI PM operations.

Patch 2/2 makes the driver use __pci_set_power_state().

Comments welcome.

Thanks,
Rafael

2009-03-22 21:14:34

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 2/2] radeonfb: Avoid open coding of PCI PM operations

From: Rafael J. Wysocki <[email protected]>

The radeonfb driver uses open coded programming of the low power
state D2 into the device, because it has to handle some chips
that don't follow the specification. Avoid that by making it use
__pci_set_power_state() for this purpose.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/video/aty/radeon_pm.c | 26 +++-----------------------
1 file changed, 3 insertions(+), 23 deletions(-)

Index: linux-2.6/drivers/video/aty/radeon_pm.c
===================================================================
--- linux-2.6.orig/drivers/video/aty/radeon_pm.c
+++ linux-2.6/drivers/video/aty/radeon_pm.c
@@ -2507,25 +2507,6 @@ static void radeon_reinitialize_QW(struc

#endif /* CONFIG_PPC_OF */

-static void radeonfb_whack_power_state(struct radeonfb_info *rinfo, pci_power_t state)
-{
- u16 pwr_cmd;
-
- for (;;) {
- pci_read_config_word(rinfo->pdev,
- rinfo->pm_reg+PCI_PM_CTRL,
- &pwr_cmd);
- if (pwr_cmd & 2)
- break;
- pwr_cmd = (pwr_cmd & ~PCI_PM_CTRL_STATE_MASK) | 2;
- pci_write_config_word(rinfo->pdev,
- rinfo->pm_reg+PCI_PM_CTRL,
- pwr_cmd);
- msleep(500);
- }
- rinfo->pdev->current_state = state;
-}
-
static void radeon_set_suspend(struct radeonfb_info *rinfo, int suspend)
{
u32 tmp;
@@ -2578,11 +2559,10 @@ static void radeon_set_suspend(struct ra
pci_disable_device(rinfo->pdev);
pci_save_state(rinfo->pdev);
/* The chip seems to need us to whack the PM register
- * repeatedly until it sticks. We do that -prior- to
- * calling pci_set_power_state()
+ * repeatedly until it sticks. However, with the 500 ms delay,
+ * it's better to give up after 10 unsuccessful attempts.
*/
- radeonfb_whack_power_state(rinfo, PCI_D2);
- pci_set_power_state(rinfo->pdev, PCI_D2);
+ __pci_set_power_state(rinfo->pdev, PCI_D2, 10, 500);
} else {
printk(KERN_DEBUG "radeonfb (%s): switching to D0 state...\n",
pci_name(rinfo->pdev));

2009-03-22 23:13:17

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH 1/2] PCI PM: Introduce __pci_set_power_state()

Hi.

It's not an error if we fail to set the desired state? (I see you have a
dev_warn only).

(If it is an error, the radeon_set_suspend patch needs to be adjusted to
not ignore the error code).

Regards,

Nigel

2009-03-23 00:12:44

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Make radeonfb use PCI PM core for suspendig device (was: Re: [RFC][PATCH] PCI PM: Be extra careful when changing power states of devices)

On Sun, 2009-03-22 at 22:08 +0100, Rafael J. Wysocki wrote:
> On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> > that setting the power state of a PCI device by
> > pci_raw_set_power_state() may sometimes fail. For this reason,
> > pci_raw_set_power_state() should not assume that the power state of
> > the device has actually changed after writing into its PMCSR.
> > Instead, it should read the value from there and use it to update
> > dev->current_state. It also is useful to print a warning if the
> > device's power state hasn't changed as expected.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
> to do something a bit different.
>
> Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
> driver not to open code PCI PM operations.
>
> Patch 2/2 makes the driver use __pci_set_power_state().
>
> Comments welcome.

No objection.

Ben.

2009-03-23 18:16:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH 1/2] PCI PM: Introduce __pci_set_power_state()

On Monday 23 March 2009, Nigel Cunningham wrote:
> Hi.

Hi,

> It's not an error if we fail to set the desired state? (I see you have a
> dev_warn only).

Theoretically, it is, but since we've never returned errors in such cases,
we'd probably see some unexpected suspend failures if we started to do that
now.

Thanks,
Rafael

2009-03-23 21:33:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Sunday 22 March 2009, Rafael J. Wysocki wrote:
> On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> > that setting the power state of a PCI device by
> > pci_raw_set_power_state() may sometimes fail. For this reason,
> > pci_raw_set_power_state() should not assume that the power state of
> > the device has actually changed after writing into its PMCSR.
> > Instead, it should read the value from there and use it to update
> > dev->current_state. It also is useful to print a warning if the
> > device's power state hasn't changed as expected.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
> to do something a bit different.
>
> Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
> driver not to open code PCI PM operations.
>
> Patch 2/2 makes the driver use __pci_set_power_state().
>
> Comments welcome.

Well, Jesse doesn't like these patches very much, so here's an alternative.

1/2 changes platform_pci_set_power_state() into an exported function (and uses
it to simplify pci_set_power_state() a bit) so that the radeonfb driver can use
it.

2/2 modifies the radeonfb driver itself.

Thanks,
Rafael

2009-03-23 21:33:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 2/2] radeonfb: Use platform_pci_set_power_state()

From: Rafael J. Wysocki <[email protected]>

Use platform_pci_set_power_state() to finalize the transition into D2
after programming the PMCSR of the device directly.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/video/aty/radeon_pm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/drivers/video/aty/radeon_pm.c
===================================================================
--- linux-2.6.orig/drivers/video/aty/radeon_pm.c
+++ linux-2.6/drivers/video/aty/radeon_pm.c
@@ -2582,7 +2582,7 @@ static void radeon_set_suspend(struct ra
* calling pci_set_power_state()
*/
radeonfb_whack_power_state(rinfo, PCI_D2);
- pci_set_power_state(rinfo->pdev, PCI_D2);
+ platform_pci_set_power_state(rinfo->pdev, PCI_D2);
} else {
printk(KERN_DEBUG "radeonfb (%s): switching to D0 state...\n",
pci_name(rinfo->pdev));

2009-03-23 21:33:59

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 1/2] PCI PM: Export platform_pci_set_power_state()

From: Rafael J. Wysocki <[email protected]>

The radeonfb driver needs to program the device's PMCSR directly due
to some quirky hardware it has to handle (see
http://bugzilla.kernel.org/show_bug.cgi?id=12846 for details) and
after doing that it needs to call the platform (usually ACPI) to
finish the power transition of the device. Currently it uses
pci_set_power_state() for this purpose, however making a specific
assumption about the internal behavior of this function, which has
changed recently so that this assumption is no longer satisfied.
For this reason, change platform_pci_set_power_state() into an
exported function so that the radeonfb driver can use it directly
instead of calling it via pci_set_power_state().

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci.c | 68 +++++++++++++++++++++++++++++-----------------------
include/linux/pci.h | 1
2 files changed, 40 insertions(+), 29 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -398,12 +398,6 @@ static inline bool platform_pci_power_ma
return pci_platform_pm ? pci_platform_pm->is_manageable(dev) : false;
}

-static inline int platform_pci_set_power_state(struct pci_dev *dev,
- pci_power_t t)
-{
- return pci_platform_pm ? pci_platform_pm->set_state(dev, t) : -ENOSYS;
-}
-
static inline pci_power_t platform_pci_choose_state(struct pci_dev *dev)
{
return pci_platform_pm ?
@@ -540,6 +534,31 @@ void pci_update_current_state(struct pci
}

/**
+ * platform_pci_set_power_state - Call platform to change device power state
+ * @dev: PCI device to handle
+ * @state: State to put the device into
+ *
+ * Allow the platform to change the power state of the device, for example via
+ * ACPI _PR0, _PS0 or some such.
+ */
+int platform_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
+{
+ int error = 0;
+
+ if (!pci_platform_pm)
+ return -ENOSYS;
+
+ if (pci_platform_pm->is_manageable(dev))
+ error = pci_platform_pm->set_state(dev, state);
+
+ if (!error)
+ pci_update_current_state(dev, state);
+
+ return error;
+}
+EXPORT_SYMBOL_GPL(platform_pci_set_power_state);
+
+/**
* pci_set_power_state - Set the power state of a PCI device
* @dev: PCI device to handle.
* @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
@@ -556,7 +575,7 @@ void pci_update_current_state(struct pci
*/
int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
{
- int error;
+ int error, platform_error = -ENODEV;

/* bound the state we're entering */
if (state > PCI_D3hot)
@@ -570,36 +589,27 @@ int pci_set_power_state(struct pci_dev *
* it into D0 (which would only happen on boot).
*/
return 0;
+ else if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
+ /*
+ * This device is quirked not to be put into D3, so don't put it
+ * in D3
+ */
+ return 0;

/* Check if we're already there */
if (dev->current_state == state)
return 0;

- if (state == PCI_D0) {
- /*
- * Allow the platform to change the state, for example via ACPI
- * _PR0, _PS0 and some such, but do not trust it.
- */
- int ret = platform_pci_power_manageable(dev) ?
- platform_pci_set_power_state(dev, PCI_D0) : 0;
- if (!ret)
- pci_update_current_state(dev, PCI_D0);
- }
- /* This device is quirked not to be put into D3, so
- don't put it in D3 */
- if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
- return 0;
+ if (state == PCI_D0)
+ platform_error = platform_pci_set_power_state(dev, PCI_D0);

error = pci_raw_set_power_state(dev, state);

- if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
- /* Allow the platform to finalize the transition */
- int ret = platform_pci_set_power_state(dev, state);
- if (!ret) {
- pci_update_current_state(dev, state);
- error = 0;
- }
- }
+ if (state > PCI_D0)
+ platform_error = platform_pci_set_power_state(dev, state);
+
+ if (!platform_error)
+ error = 0;

return error;
}
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -689,6 +689,7 @@ size_t pci_get_rom_size(struct pci_dev *
/* Power management related routines */
int pci_save_state(struct pci_dev *dev);
int pci_restore_state(struct pci_dev *dev);
+int platform_pci_set_power_state(struct pci_dev *dev, pci_power_t state);
int pci_set_power_state(struct pci_dev *dev, pci_power_t state);
pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);

2009-03-23 22:23:47

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Mon, 23 Mar 2009 22:30:09 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Sunday 22 March 2009, Rafael J. Wysocki wrote:
> > On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <[email protected]>
> > >
> > > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846
> > > shows that setting the power state of a PCI device by
> > > pci_raw_set_power_state() may sometimes fail. For this reason,
> > > pci_raw_set_power_state() should not assume that the power state
> > > of the device has actually changed after writing into its PMCSR.
> > > Instead, it should read the value from there and use it to update
> > > dev->current_state. It also is useful to print a warning if the
> > > device's power state hasn't changed as expected.
> > >
> > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > ---
> > OK, since the Ben's radeonfb fix for bug #12846 has been merged,
> > I'd like to do something a bit different.
> >
> > Patch 1/2 introduces __pci_set_power_state() that will allow the
> > radeonfb driver not to open code PCI PM operations.
> >
> > Patch 2/2 makes the driver use __pci_set_power_state().
> >
> > Comments welcome.
>
> Well, Jesse doesn't like these patches very much, so here's an
> alternative.
>
> 1/2 changes platform_pci_set_power_state() into an exported function
> (and uses it to simplify pci_set_power_state() a bit) so that the
> radeonfb driver can use it.
>
> 2/2 modifies the radeonfb driver itself.

The thing I didn't like was that it made the radeon driver use an
internal interface; I'd really prefer a proper return value from
pci_set_power_state, which in turn means auditing all its current
callers. But that doesn't seem worth it unless we see other drivers
needing something similar...

And if we did go with something like your first patch, I'd still rather
see the timeout done in the driver, rather than having the attempts &
delay included in the function...

--
Jesse Barnes, Intel Open Source Technology Center

2009-03-23 23:01:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Make radeonfb use PCI PM core for suspendig device (was: Re: [RFC][PATCH] PCI PM: Be extra careful when changing power states of devices)

On Monday 23 March 2009, Benjamin Herrenschmidt wrote:
> On Sun, 2009-03-22 at 22:08 +0100, Rafael J. Wysocki wrote:
> > On Saturday 21 March 2009, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <[email protected]>
> > >
> > > The story in http://bugzilla.kernel.org/show_bug.cgi?id=12846 shows
> > > that setting the power state of a PCI device by
> > > pci_raw_set_power_state() may sometimes fail. For this reason,
> > > pci_raw_set_power_state() should not assume that the power state of
> > > the device has actually changed after writing into its PMCSR.
> > > Instead, it should read the value from there and use it to update
> > > dev->current_state. It also is useful to print a warning if the
> > > device's power state hasn't changed as expected.
> > >
> > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > ---
> > OK, since the Ben's radeonfb fix for bug #12846 has been merged, I'd like
> > to do something a bit different.
> >
> > Patch 1/2 introduces __pci_set_power_state() that will allow the radeonfb
> > driver not to open code PCI PM operations.
> >
> > Patch 2/2 makes the driver use __pci_set_power_state().
> >
> > Comments welcome.
>
> No objection.

Great, thanks.

I also discussed the patchset with Jesse on IRC and we agreed it was better
than the alternative approaches. (So, please disregard the patches sent
earlier today).

Would you mind if I put 1/2 and 2/2 into the suspend tree?

Rafael

2009-03-24 01:00:48

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
> The thing I didn't like was that it made the radeon driver use an
> internal interface; I'd really prefer a proper return value from
> pci_set_power_state, which in turn means auditing all its current
> callers. But that doesn't seem worth it unless we see other drivers
> needing something similar...
>
> And if we did go with something like your first patch, I'd still rather
> see the timeout done in the driver, rather than having the attempts &
> delay included in the function...

So what ? The driver would call pci_set_power_state() until it stops
failing ?

I'm not too fan of that, because it will change the access pattern
to the chip:

- write PM to 2
- short delay
- read PM, see 0, return error
- driver does big delay
- write PM to 2
- short delay
- read PM ....

vs. the current sequence which is

- write PM to 2
- long delay
- read PM, be happy

Which -seems- to be pretty much what happens in practice, though on that chip,
I don't know for sure about others.

I'm worried of the possible side effects of the first sequence that you propose
since it would do 2 things potentially confusing to the HW:

- read PM after a short delay... it -should- be harmless but you know
HW as well as I do ...

- write PM to 2 a second time after the long delay. Again, it -should- be
harmless since the chip at this stage should already be in D2 state but god
knows how the HW will react.

I'm especially worried about the later in fact. Maybe we can minimize it by
having pci_set_power_state() dbl check the content of the PM reg before writing
to it...

Ben.

2009-03-24 01:14:37

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Tue, 24 Mar 2009 11:57:19 +1100
Benjamin Herrenschmidt <[email protected]> wrote:

> On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
> > The thing I didn't like was that it made the radeon driver use an
> > internal interface; I'd really prefer a proper return value from
> > pci_set_power_state, which in turn means auditing all its current
> > callers. But that doesn't seem worth it unless we see other drivers
> > needing something similar...
> >
> > And if we did go with something like your first patch, I'd still
> > rather see the timeout done in the driver, rather than having the
> > attempts & delay included in the function...
>
> So what ? The driver would call pci_set_power_state() until it stops
> failing ?

Yeah, that's what I had in mind.

> I'm not too fan of that, because it will change the access pattern
> to the chip:
>
> - write PM to 2
> - short delay
> - read PM, see 0, return error
> - driver does big delay
> - write PM to 2
> - short delay
> - read PM ....
>
> vs. the current sequence which is
>
> - write PM to 2
> - long delay
> - read PM, be happy
>
> Which -seems- to be pretty much what happens in practice, though on
> that chip, I don't know for sure about others.
>
> I'm worried of the possible side effects of the first sequence that
> you propose since it would do 2 things potentially confusing to the
> HW:
>
> - read PM after a short delay... it -should- be harmless but you know
> HW as well as I do ...
>
> - write PM to 2 a second time after the long delay. Again, it
> -should- be harmless since the chip at this stage should already be
> in D2 state but god knows how the HW will react.
>
> I'm especially worried about the later in fact. Maybe we can minimize
> it by having pci_set_power_state() dbl check the content of the PM
> reg before writing to it...

Honestly I'm not too happy about any of the approaches, but yeah I see
your point. The main thing is to prevent any config space access for
a specified time after the first D-state transition, which I think we
do correctly in the core. Beyond that, we just have to make sure the
core state is updated correctly; Rafael's first patch does that
correctly I think.

Actually now that I think of it, maybe Alex can get us details on the
errata here. If we just need a longer wait between a D-state transition
and the next config space access for this chip (or set of chips), we
could add that as a proper quirk to the core...

Alex, any thoughts about the bug & patch in
http://bugzilla.kernel.org/show_bug.cgi?id=12846 ? Looks like old
radeon chips need a workaround when transitioning from D0 -> D2...

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

2009-03-24 11:01:14

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Tuesday 24 March 2009, Jesse Barnes wrote:
> On Tue, 24 Mar 2009 11:57:19 +1100
> Benjamin Herrenschmidt <[email protected]> wrote:
>
> > On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
> > > The thing I didn't like was that it made the radeon driver use an
> > > internal interface; I'd really prefer a proper return value from
> > > pci_set_power_state, which in turn means auditing all its current
> > > callers. But that doesn't seem worth it unless we see other drivers
> > > needing something similar...
> > >
> > > And if we did go with something like your first patch, I'd still
> > > rather see the timeout done in the driver, rather than having the
> > > attempts & delay included in the function...
> >
> > So what ? The driver would call pci_set_power_state() until it stops
> > failing ?
>
> Yeah, that's what I had in mind.
>
> > I'm not too fan of that, because it will change the access pattern
> > to the chip:
> >
> > - write PM to 2
> > - short delay
> > - read PM, see 0, return error
> > - driver does big delay
> > - write PM to 2
> > - short delay
> > - read PM ....
> >
> > vs. the current sequence which is
> >
> > - write PM to 2
> > - long delay
> > - read PM, be happy
> >
> > Which -seems- to be pretty much what happens in practice, though on
> > that chip, I don't know for sure about others.
> >
> > I'm worried of the possible side effects of the first sequence that
> > you propose since it would do 2 things potentially confusing to the
> > HW:
> >
> > - read PM after a short delay... it -should- be harmless but you know
> > HW as well as I do ...
> >
> > - write PM to 2 a second time after the long delay. Again, it
> > -should- be harmless since the chip at this stage should already be
> > in D2 state but god knows how the HW will react.
> >
> > I'm especially worried about the later in fact. Maybe we can minimize
> > it by having pci_set_power_state() dbl check the content of the PM
> > reg before writing to it...
>
> Honestly I'm not too happy about any of the approaches, but yeah I see
> your point. The main thing is to prevent any config space access for
> a specified time after the first D-state transition, which I think we
> do correctly in the core. Beyond that, we just have to make sure the
> core state is updated correctly; Rafael's first patch does that
> correctly I think.

In fact I have yet another idea. If we use the "retransmission with exponential
backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
extra parameters to pci_set_power_state() and the radeon case will be covered
automatically. That should also cover any other devices having similar
problems IMO.

A patch to implement that is appended, please tell me what you think.

Thanks,
Rafael

---
drivers/pci/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++----------
include/linux/pci.h | 1 +
2 files changed, 41 insertions(+), 10 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -436,8 +436,10 @@ static inline int platform_pci_sleep_wak
*/
static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
{
- u16 pmcsr;
+ u16 pmcsr, pmcsr_r;
+ unsigned int delay;
bool need_restore = false;
+ int error = 0;

/* Check if we're already there */
if (dev->current_state == state)
@@ -488,17 +490,45 @@ static int pci_raw_set_power_state(struc
break;
}

- /* enter specified state */
- pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
-
- /* Mandatory power management transition delays */
- /* see PCI PM 1.1 5.6.1 table 18 */
+ /*
+ * Mandatory power management transition delays, in microseconds
+ * (see PCI PM 1.1 5.6.1 table 18).
+ */
if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
- msleep(pci_pm_d3_delay);
+ delay = pci_pm_d3_delay * 1000;
else if (state == PCI_D2 || dev->current_state == PCI_D2)
- udelay(PCI_PM_D2_DELAY);
+ delay = PCI_PM_D2_DELAY;
+ else
+ delay = 0;
+
+ /*
+ * Write the new value to the control register, wait as long as needed
+ * and check if the value read back from the register is the same as
+ * the written one. If not, repeat with exponential backoff.
+ */
+ do {
+ pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+ if (delay) {
+ if (delay < 1000)
+ udelay(delay);
+ else
+ msleep(DIV_ROUND_UP(delay, 1000));
+ delay <<= 1;
+ }
+ pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr_r);
+ if (pmcsr == pmcsr_r) {
+ dev->current_state = state;
+ break;
+ }
+ } while (delay && delay <= PCI_PM_MAX_DELAY);

- dev->current_state = state;
+ if (pmcsr != pmcsr_r) {
+ dev->current_state = (pmcsr_r & PCI_PM_CTRL_STATE_MASK);
+ dev_warn(&dev->dev,
+ "failed to set power state to D%d, currently in D%d\n",
+ state, dev->current_state);
+ error = -ENODEV;
+ }

/* According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
* INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
@@ -518,7 +548,7 @@ static int pci_raw_set_power_state(struc
if (dev->bus->self)
pcie_aspm_pm_state_change(dev->bus->self);

- return 0;
+ return error;
}

/**
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -117,6 +117,7 @@ typedef int __bitwise pci_power_t;
#define PCI_UNKNOWN ((pci_power_t __force) 5)
#define PCI_POWER_ERROR ((pci_power_t __force) -1)

+#define PCI_PM_MAX_DELAY 2000000
#define PCI_PM_D2_DELAY 200
#define PCI_PM_D3_WAIT 10
#define PCI_PM_BUS_WAIT 50

2009-03-24 21:18:00

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Tue, 2009-03-24 at 12:00 +0100, Rafael J. Wysocki wrote:

> In fact I have yet another idea. If we use the "retransmission with exponential
> backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
> extra parameters to pci_set_power_state() and the radeon case will be covered
> automatically. That should also cover any other devices having similar
> problems IMO.
>
> A patch to implement that is appended, please tell me what you think.

This is crazy....

Most devices don't need that and those who are a bit "touchy" may well
puke on being written to too fast while they are in the middle of
the transition.

Let's wait to see if Alex from ATI has an answer here, and if not, I
would suggest keeping the dirt inside radeonfb.

Ben.

2009-03-24 22:01:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Tuesday 24 March 2009, Benjamin Herrenschmidt wrote:
> On Tue, 2009-03-24 at 12:00 +0100, Rafael J. Wysocki wrote:
>
> > In fact I have yet another idea. If we use the "retransmission with exponential
> > backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
> > extra parameters to pci_set_power_state() and the radeon case will be covered
> > automatically. That should also cover any other devices having similar
> > problems IMO.
> >
> > A patch to implement that is appended, please tell me what you think.
>
> This is crazy....
>
> Most devices don't need that and those who are a bit "touchy" may well
> puke on being written to too fast while they are in the middle of
> the transition.
>
> Let's wait to see if Alex from ATI has an answer here, and if not, I
> would suggest keeping the dirt inside radeonfb.

OK, so what would you like to do exactly?

The only thing I see is not to set current_state to PCI_D2 before calling the
final pci_set_power_state().

Rafael

2009-03-24 22:04:50

by Alex Deucher

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On 3/23/09, Jesse Barnes <[email protected]> wrote:
> On Tue, 24 Mar 2009 11:57:19 +1100
> Benjamin Herrenschmidt <[email protected]> wrote:
>
> > On Mon, 2009-03-23 at 15:23 -0700, Jesse Barnes wrote:
> > > The thing I didn't like was that it made the radeon driver use an
> > > internal interface; I'd really prefer a proper return value from
> > > pci_set_power_state, which in turn means auditing all its current
> > > callers. But that doesn't seem worth it unless we see other drivers
> > > needing something similar...
> > >
> > > And if we did go with something like your first patch, I'd still
> > > rather see the timeout done in the driver, rather than having the
> > > attempts & delay included in the function...
> >
> > So what ? The driver would call pci_set_power_state() until it stops
> > failing ?
>
> Yeah, that's what I had in mind.
>
> > I'm not too fan of that, because it will change the access pattern
> > to the chip:
> >
> > - write PM to 2
> > - short delay
> > - read PM, see 0, return error
> > - driver does big delay
> > - write PM to 2
> > - short delay
> > - read PM ....
> >
> > vs. the current sequence which is
> >
> > - write PM to 2
> > - long delay
> > - read PM, be happy
> >
> > Which -seems- to be pretty much what happens in practice, though on
> > that chip, I don't know for sure about others.
> >
> > I'm worried of the possible side effects of the first sequence that
> > you propose since it would do 2 things potentially confusing to the
> > HW:
> >
> > - read PM after a short delay... it -should- be harmless but you know
> > HW as well as I do ...
> >
> > - write PM to 2 a second time after the long delay. Again, it
> > -should- be harmless since the chip at this stage should already be
> > in D2 state but god knows how the HW will react.
> >
> > I'm especially worried about the later in fact. Maybe we can minimize
> > it by having pci_set_power_state() dbl check the content of the PM
> > reg before writing to it...
>
> Honestly I'm not too happy about any of the approaches, but yeah I see
> your point. The main thing is to prevent any config space access for
> a specified time after the first D-state transition, which I think we
> do correctly in the core. Beyond that, we just have to make sure the
> core state is updated correctly; Rafael's first patch does that
> correctly I think.
>
> Actually now that I think of it, maybe Alex can get us details on the
> errata here. If we just need a longer wait between a D-state transition
> and the next config space access for this chip (or set of chips), we
> could add that as a proper quirk to the core...
>
> Alex, any thoughts about the bug & patch in
> http://bugzilla.kernel.org/show_bug.cgi?id=12846 ? Looks like old
> radeon chips need a workaround when transitioning from D0 -> D2...

I don't see any errata for this, but it's hard to find stuff as chips get older.

Alex

2009-03-24 22:25:43

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/2] Export platform_pci_set_power_state() and make radeonfb use it

On Tuesday 24 March 2009, Benjamin Herrenschmidt wrote:
> On Tue, 2009-03-24 at 12:00 +0100, Rafael J. Wysocki wrote:
>
> > In fact I have yet another idea. If we use the "retransmission with exponential
> > backoff" algorithm in pci_raw_set_power_state(), we won't have to add any
> > extra parameters to pci_set_power_state() and the radeon case will be covered
> > automatically. That should also cover any other devices having similar
> > problems IMO.
> >
> > A patch to implement that is appended, please tell me what you think.
>
> This is crazy....
>
> Most devices don't need that and those who are a bit "touchy" may well
> puke on being written to too fast while they are in the middle of
> the transition.
>
> Let's wait to see if Alex from ATI has an answer here, and if not, I
> would suggest keeping the dirt inside radeonfb.

BTW, whatever you do inside of radeonfb will be based on specific assumptions
regarding how pci_set_power_state() works internally, which makes that
extremely fragile. This way you'll also make the core depend on the radeonfb
driver to some extent, which I don't think is desirable.

Thanks,
Rafael