Brian King <[email protected]> wrote:
>
> +void pci_block_user_cfg_access(struct pci_dev *dev)
> +{
> + unsigned long flags;
> +
> + pci_save_state(dev);
> + spin_lock_irqsave(&pci_lock, flags);
> + dev->block_ucfg_access = 1;
> + spin_unlock_irqrestore(&pci_lock, flags);
Are you sure the locking in here is meaningful? All it will really do is
give you a couple of barriers.
Andrew Morton wrote:
> Brian King <[email protected]> wrote:
>
>>+void pci_block_user_cfg_access(struct pci_dev *dev)
>>+{
>>+ unsigned long flags;
>>+
>>+ pci_save_state(dev);
>>+ spin_lock_irqsave(&pci_lock, flags);
>>+ dev->block_ucfg_access = 1;
>>+ spin_unlock_irqrestore(&pci_lock, flags);
>
>
> Are you sure the locking in here is meaningful? All it will really do is
> give you a couple of barriers.
Actually, it is meaningful. It synchronizes the blocking of pci config
accesses with other pci config accesses that may be going on when this
function is called. Without the locking, the API cannot guarantee that
no further user-initiated PCI config accesses will be started after
this function is called.
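To make that concrete, the user config access path this pairs with tests
the flag while holding the same pci_lock. A rough sketch of that path
(illustrative only; the accessor name is made up and the real accessor
isn't quoted in this thread, but the field and lock names are the ones
from the patch):

	static DEFINE_SPINLOCK(pci_lock);

	int pci_user_read_config_dword(struct pci_dev *dev, int pos, u32 *val)
	{
		unsigned long flags;
		int ret = 0;

		spin_lock_irqsave(&pci_lock, flags);
		if (dev->block_ucfg_access)
			/* blocked: serve the read from the state saved at block
			 * time (assumes a dword-aligned pos in the first 64 bytes) */
			*val = dev->saved_config_space[pos / 4];
		else
			/* not blocked: go out to the hardware via the bus accessor */
			ret = dev->bus->ops->read(dev->bus, dev->devfn, pos, 4, val);
		spin_unlock_irqrestore(&pci_lock, flags);

		return ret;
	}

Because pci_block_user_cfg_access() sets the flag under the same lock,
any user access that already holds pci_lock completes before the flag
flips, and anything that starts afterwards sees the flag set.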
Brian
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
On Fri, Sep 02, 2005 at 06:56:35PM -0500, Brian King wrote:
> Andrew Morton wrote:
> >Brian King <[email protected]> wrote:
> >
> >>+void pci_block_user_cfg_access(struct pci_dev *dev)
> >>+{
> >>+ unsigned long flags;
> >>+
> >>+ pci_save_state(dev);
> >>+ spin_lock_irqsave(&pci_lock, flags);
> >>+ dev->block_ucfg_access = 1;
> >>+ spin_unlock_irqrestore(&pci_lock, flags);
> >
> >
> >Are you sure the locking in here is meaningful? All it will really do is
> >give you a couple of barriers.
>
> Actually, it is meaningful. It synchronizes the blocking of pci config
> accesses with other pci config accesses that may be going on when this
> function is called. Without the locking, the API cannot guarantee that
> no further user-initiated PCI config accesses will be started after
> this function is called.
I don't have the impression you understood what Andrew wrote.
dev->block_ucfg_access = 1 is essentially an atomic operation.
AFAIK, use of the pci_lock doesn't solve any race conditions that mb()
wouldn't solve.
If you had:
spin_lock_irqsave(&pci_lock, flags);
pci_save_state(dev);
dev->block_ucfg_access = 1;
spin_unlock_irqrestore(&pci_lock, flags);
Then I could buy your argument since the flag now implies
we need to atomically save state and set the flag.
grant
Grant Grundler writes:
> On Fri, Sep 02, 2005 at 06:56:35PM -0500, Brian King wrote:
> > Andrew Morton wrote:
> > >Brian King <[email protected]> wrote:
> > >
> > >>+void pci_block_user_cfg_access(struct pci_dev *dev)
> > >>+{
> > >>+ unsigned long flags;
> > >>+
> > >>+ pci_save_state(dev);
> > >>+ spin_lock_irqsave(&pci_lock, flags);
> > >>+ dev->block_ucfg_access = 1;
> > >>+ spin_unlock_irqrestore(&pci_lock, flags);
> > >
> > >
> > >Are you sure the locking in here is meaningful? All it will really do is
> > >give you a couple of barriers.
> >
> > Actually, it is meaningful. It synchronizes the blocking of pci config
> > accesses with other pci config accesses that may be going on when this
> > function is called. Without the locking, the API cannot guarantee that
> > no further user-initiated PCI config accesses will be started after
> > this function is called.
>
> I don't have the impression you understood what Andrew wrote.
> dev->block_ucfg_access = 1 is essentially an atomic operation.
> AFAIK, use of the pci_lock doesn't solve any race conditions that mb()
> wouldn't solve.
Think about it. Taking the lock ensures that we don't do the
assignment (dev->block_ucfg_access = 1) while any other cpu has the
pci_lock. In other words, the reason for taking the lock is so that
we wait until nobody else is doing an access, not to make others wait.
> If you had:
> spin_lock_irqsave(&pci_lock, flags);
> pci_save_state(dev);
> dev->block_ucfg_access = 1;
> spin_unlock_irqrestore(&pci_lock, flags);
>
> Then I could buy your argument since the flag now implies
> we need to atomically save state and set the flag.
That's probably a good thing to do too.
Paul.
On Sat, Sep 03, 2005 at 09:11:30AM +1000, Paul Mackerras wrote:
> Think about it. Taking the lock ensures that we don't do the
> assignment (dev->block_ucfg_access = 1) while any other cpu has the
> pci_lock. In other words, the reason for taking the lock is so that
> we wait until nobody else is doing an access, not to make others wait.
The block_ucfg_access field is only used when making the choice to
use saved state or call the PCI bus cfg accessor.
I don't see what problem waiting solves here, since any CPU already
accessing real cfg space will finish what it is doing anyway;
i.e., it has already made the choice to access real cfg space.
We just need to make sure successive accesses to cfg space
for this device only access the saved state. For that, a memory barrier
around all uses of block_ucfg_access should be sufficient.
Or do you think I'm still drinking the wrong color Kool-Aid?
> > If you had:
> > spin_lock_irqsave(&pci_lock, flags);
> > pci_save_state(dev);
> > dev->block_ucfg_access = 1;
> > spin_unlock_irqrestore(&pci_lock, flags);
> >
> > Then I could buy your argument since the flag now implies
> > we need to atomically save state and set the flag.
>
> That's probably a good thing to do too.
One needs to verify pci_lock isn't acquired in pci_save_state()
(or some other obvious deadlock).
It would make sense to block pci cfg *writes* to that device
while we save the state.
grant
Grant Grundler wrote:
> On Sat, Sep 03, 2005 at 09:11:30AM +1000, Paul Mackerras wrote:
>
>>Think about it. Taking the lock ensures that we don't do the
>>assignment (dev->block_ucfg_access = 1) while any other cpu has the
>>pci_lock. In other words, the reason for taking the lock is so that
>>we wait until nobody else is doing an access, not to make others wait.
>
>
> The block_ucfg_access field is only used when making the choice to
> use saved state or call the PCI bus cfg accessor.
> I don't see what problem waiting solves here, since any CPU already
> accessing real cfg space will finish what it is doing anyway;
> i.e., it has already made the choice to access real cfg space.
> We just need to make sure successive accesses to cfg space
> for this device only access the saved state. For that, a memory barrier
> around all uses of block_ucfg_access should be sufficient.
> Or do you think I'm still drinking the wrong color Kool-Aid?
Without the locking, we introduce a race condition.
CPU 0                              CPU 1

                                   pci_block_user_cfg_access
                                     pci_save_state
pci_read_user_config_space
  check block_ucfg_access
                                     set block_ucfg_access
                                   other code that puts the device
                                   in a state such that it cannot
                                   handle read config i/o, such as
                                   running BIST.

pci read config
In this scenario, if the real read on the left happens after the flag is
set to block user config accesses, then the thread that set the flag
could go off and start BIST or do something else to put the pci device
in a state where it cannot accept real config I/O and we end up with a
target abort, which is exactly what this patch is attempting to fix.
Granted, for the specific usage scenario in ipr, where I am using this
to block config space over BIST, I use a pci config write to start BIST,
which would end up being a point of synchronization, but that seems a
bit non-obvious and limits how the patch can be used by others...
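For what it's worth, the intended ipr-style usage looks roughly like this
(illustrative only, not the actual ipr code; the helper name and the wait
parameter are made up, and the wait time is adapter-specific):

	#include <linux/pci.h>
	#include <linux/delay.h>

	static void example_run_bist(struct pci_dev *pdev, unsigned int bist_wait_ms)
	{
		/* snapshot config space and block user config accesses */
		pci_block_user_cfg_access(pdev);

		/* kick off BIST; the device may not handle config reads while it runs */
		pci_write_config_byte(pdev, PCI_BIST, PCI_BIST_START);
		msleep(bist_wait_ms);

		/* device is sane again; let user config accesses hit real hardware */
		pci_unblock_user_cfg_access(pdev);
	}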
>>>If you had:
>>> spin_lock_irqsave(&pci_lock, flags);
>>> pci_save_state(dev);
>>> dev->block_ucfg_access = 1;
>>> spin_unlock_irqrestore(&pci_lock, flags);
>>>
>>>Then I could buy your argument since the flag now implies
>>>we need to atomically save state and set the flag.
>>
>>That's probably a good thing to do too.
>
>
> One needs to verify pci_lock isn't acquired in pci_save_state()
> (or some other obvious deadlock).
Unfortunately, it is... Every pci config access grabs the lock, so we
would need to use some special code that did not acquire the lock in
pci_save_state if we wanted to do such a thing.
Brian
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
On Sat, Sep 03, 2005 at 06:37:52PM -0500, Brian King wrote:
...
> Without the locking, we introduce a race condition.
>
> CPU 0                              CPU 1
>
>                                    pci_block_user_cfg_access
>                                      pci_save_state
> pci_read_user_config_space
>   check block_ucfg_access
>                                      set block_ucfg_access
>                                    other code that puts the device
>                                    in a state such that it cannot
>                                    handle read config i/o, such as
>                                    running BIST.
>
> pci read config
Ok, this is a good example - I see what the problem is.
You could use the following sequence too then:
pci_block_user_cfg_access
	pci_save_state
	block_ucfg_access = 1
	mb()
	while (spin_is_locked(&pci_lock))
		cpu_relax();
Think this is sufficient?
> Granted, for the specific usage scenario in ipr, where I am using this
> to block config space over BIST, I use a pci config write to start BIST,
> which would end up being a point of synchronization, but that seems a
> bit non-obvious and limits how the patch can be used by others...
Yes, agreed.
grant
Grant Grundler wrote:
> On Sat, Sep 03, 2005 at 06:37:52PM -0500, Brian King wrote:
> ...
>
>>Without the locking, we introduce a race condition.
>>
>> CPU 0                              CPU 1
>>
>>                                    pci_block_user_cfg_access
>>                                      pci_save_state
>> pci_read_user_config_space
>>   check block_ucfg_access
>>                                      set block_ucfg_access
>>                                    other code that puts the device
>>                                    in a state such that it cannot
>>                                    handle read config i/o, such as
>>                                    running BIST.
>>
>> pci read config
>
>
> Ok, this is a good example - I see what the problem is.
> You could use the following sequence too then:
> pci_block_user_cfg_access
> 	pci_save_state
> 	block_ucfg_access = 1
> 	mb()
> 	while (spin_is_locked(&pci_lock))
> 		cpu_relax();
>
> Think this is sufficient?
That should work also. Here is an updated patch.
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
On Mon, Sep 05, 2005 at 01:31:20PM -0500, Brian King wrote:
> That should work also. Here is an updated patch.
...
The code looks good...but it got me thinking.
...
> +void pci_block_user_cfg_access(struct pci_dev *dev)
> +{
> + pci_save_state(dev);
> + dev->block_ucfg_access = 1;
> + mb();
> + while (spin_is_locked(&pci_lock))
> + cpu_relax();
> +}
> +EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
> +
> +/**
> + * pci_unblock_user_cfg_access - Unblock userspace PCI config reads/writes
> + * @dev: pci device struct
> + *
> + * This function allows userspace PCI config accesses to resume.
> + **/
> +void pci_unblock_user_cfg_access(struct pci_dev *dev)
> +{
> + dev->block_ucfg_access = 0;
> +}
Shouldn't pci_unblock_user_cfg_access() have a similar construct
as pci_block_user_cfg_access()?
I'm thinking we don't want to pull the rug out from under someone
who is accessing the saved state, right?
Or does something else guarantee that?
It wasn't obvious from this diff alone.
thanks,
grant
Grant Grundler wrote:
> On Mon, Sep 05, 2005 at 01:31:20PM -0500, Brian King wrote:
>>+void pci_block_user_cfg_access(struct pci_dev *dev)
>>+{
>>+ pci_save_state(dev);
>>+ dev->block_ucfg_access = 1;
>>+ mb();
>>+ while (spin_is_locked(&pci_lock))
>>+ cpu_relax();
>>+}
>>+EXPORT_SYMBOL_GPL(pci_block_user_cfg_access);
>>+
>>+/**
>>+ * pci_unblock_user_cfg_access - Unblock userspace PCI config reads/writes
>>+ * @dev: pci device struct
>>+ *
>>+ * This function allows userspace PCI config accesses to resume.
>>+ **/
>>+void pci_unblock_user_cfg_access(struct pci_dev *dev)
>>+{
>>+ dev->block_ucfg_access = 0;
>>+}
>
>
> Shouldn't pci_unblock_user_cfg_access() have a similar construct
> as pci_block_user_cfg_access()?
>
> I'm thinking we don't want to pull the rug out from under someone
> who is accessing the saved state, right?
> Or does something else guarantee that?
>
> It wasn't obvious from this diff alone.
Sounds reasonable enough. Updated patch attached.
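Roughly, the unblock path now mirrors the block path (a sketch of the
idea only; the attached patch is the real thing):

	void pci_unblock_user_cfg_access(struct pci_dev *dev)
	{
		dev->block_ucfg_access = 0;
		mb();
		/* wait for anyone still working off the saved state to finish */
		while (spin_is_locked(&pci_lock))
			cpu_relax();
	}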
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
Grant Grundler writes:
> Ok, this is a good example - I see what the problem is.
> You could use the following sequence too then:
> pci_block_user_cfg_access
> 	pci_save_state
> 	block_ucfg_access = 1
> 	mb()
> 	while (spin_is_locked(&pci_lock))
> 		cpu_relax();
>
> Think this is sufficient?
Maybe, but it seems like a bad idea to me. It's longer, it's less
obvious what's happening, and it precludes the sorts of optimization
that we do on ppc64 where a cpu that is waiting for a lock can
give its time slice to the cpu that is holding the lock (on systems
where the hypervisor time-slices multiple virtual cpus on one physical
cpu).
What's wrong with just doing spin_lock/spin_unlock?
Paul.
On Wed, Sep 07, 2005 at 03:49:37PM +1000, Paul Mackerras wrote:
> Maybe, but it seems like a bad idea to me. It's longer, it's less
> obvious what's happening,
I would argue it is more obvious. People looking at the code
are immediately going to realize it was a deliberate choice to
not use a spinlock.
> and it precludes the sorts of optimization
> that we do on ppc64 where a cpu that is waiting for a lock can
> give its time slice to the cpu that is holding the lock (on systems
> where the hypervisor time-slices multiple virtual cpus on one physical
> cpu).
cpu_relax() doesn't do that?
> What's wrong with just doing spin_lock/spin_unlock?
it's not wrong - just misleading IMHO. There is no
"critical section" in that particular chunk of code.
If cpu_relax() doesn't allow time-slice donation, then I guess
spin_lock/spin_unlock with only a comment inside explaining why
would be ok too.
thanks,
grant
Grant Grundler writes:
> I would argue it is more obvious. People looking at the code
> are immediately going to realize it was a deliberate choice to
> not use a spinlock.
It achieves exactly the same effect as spin_lock/spin_unlock, just
more verbosely. :)
> > and it precludes the sorts of optimization
> > that we do on ppc64 where a cpu that is waiting for a lock can
> > give its time slice to the cpu that is holding the lock (on systems
> > where the hypervisor time-slices multiple virtual cpus on one physical
> > cpu).
>
> cpu_relax() doesn't do that?
No, how can it, when it doesn't get told which virtual cpu to yield
to? spin_lock knows which virtual cpu to yield to because we store
that in the lock variable.
> > What's wrong with just doing spin_lock/spin_unlock?
>
> it's not wrong - just misleading IMHO. There is no
> "critical section" in that particular chunk of code.
Other code is using the lock to ensure the atomicity of a compound
action, which involves testing the flag and taking some action
based on the value of the flag. We take the lock to preserve that
atomicity. Locks are often used to make a set of compound actions
atomic with respect to each other, which is what we're doing here.
> If cpu_relax() doesn't allow time-slice donation, then I guess
> spin_lock/spin_unlock with only a comment inside explaining why
> would be ok too.
Preferable, in fact. :)
Paul.
On Thu, Sep 08, 2005 at 08:39:15AM +1000, Paul Mackerras wrote:
> Grant Grundler writes:
>
> > I would argue it is more obvious. People looking at the code
> > are immediately going to realize it was a deliberate choice to
> > not use a spinlock.
>
> It achieves exactly the same effect as spin_lock/spin_unlock, just
> more verbosely. :)
Not exactly identical. spin_is_locked() doesn't attempt to acquire
the lock and thus force exclusive access to the cacheline.
Other than that, I agree with you.
I don't see a problem with being verbose here since putting
a spinlock around something that's already atomic (assignment)
even caught Andrew's attention.
> > cpu_relax() doesn't do that?
>
> No, how can it, when it doesn't get told which virtual cpu to yield
> to? spin_lock knows which virtual cpu to yield to because we store
> that in the lock variable.
Ah, ok. I was expecting cpu_relax() to just pick a "random" different one.
:^)
> > it's not wrong - just misleading IMHO. There is no
> > "critical section" in that particular chunk of code.
>
> Other code is using the lock to ensure the atomicity of a compound
> > action, which involves testing the flag and taking some action
> based on the value of the flag. We take the lock to preserve that
> atomicity. Locks are often used to make a set of compound actions
> atomic with respect to each other, which is what we're doing here.
Yes.
We agree this chunk only needs to wait until the lock is released.
Other critical sections need to acquire/release the lock.
> > If cpu_relax() doesn't allow time-slice donation, then I guess
> > spin_lock/spin_unlock with only a comment inside explaining why
> > would be ok too.
>
> Preferable, in fact. :)
Ok. I was just suggesting the alternative since
Andrew (correctly) questioned the use of a spinlock
and it didn't look right to me either.
thanks,
grant
Grant Grundler wrote:
> On Thu, Sep 08, 2005 at 08:39:15AM +1000, Paul Mackerras wrote:
>
>>Grant Grundler writes:
>>
>>
>>>I would argue it is more obvious. People looking at the code
>>>are immediately going to realize it was a deliberate choice to
>>>not use a spinlock.
>>
>>It achieves exactly the same effect as spin_lock/spin_unlock, just
>>more verbosely. :)
>
>
> Not exactly identical. spin_is_locked() doesn't attempt to acquire
> the lock and thus force exclusive access to the cacheline.
> Other than that, I agree with you.
>
> I don't see a problem with being verbose here since putting
> a spinlock around something that's already atomic (assignment)
> even caught Andrew's attention.
I reverted the patch to use a spinlock and added a comment. How
does this look?
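Roughly, the block side is back to the original hunk plus a comment, so
the next reader doesn't trip over the same question (a sketch; the
attachment is authoritative):

	void pci_block_user_cfg_access(struct pci_dev *dev)
	{
		unsigned long flags;

		pci_save_state(dev);

		/*
		 * The lock isn't protecting the assignment itself; it makes us
		 * wait for any user config access already holding pci_lock to
		 * finish, and guarantees later accesses see the flag set.
		 */
		spin_lock_irqsave(&pci_lock, flags);
		dev->block_ucfg_access = 1;
		spin_unlock_irqrestore(&pci_lock, flags);
	}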
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
On Wed, Sep 07, 2005 at 10:05:30PM -0500, Brian King wrote:
> I reverted the patch to use a spinlock and added a comment.
> How does this look?
Fine with me. Ball is in akpm/Paul's court.
grant