2004-10-05 22:42:29

by Jesse Barnes

Subject: [PATCH] I/O space write barrier

I've integrated BenH's latest comments. If it turns out that his platform
actually needs this (it may in the future, if they implement the other
barriers they'd like), they can trivially update their definition of mmiowb().
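As a rough illustration of what such a per-platform definition might look like: a platform whose I/O writes are already ordered can make mmiowb() expand to nothing, while one that needs it maps it to a platform-specific operation. CONFIG_MMIO_WRITES_UNORDERED and platform_mmio_flush() are hypothetical names used only for this sketch; they are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical counter so the sketch's behaviour can be observed. */
static unsigned int platform_flushes;

/* Hypothetical platform hook: a real one would issue whatever CPU or
 * chipset operation forces outstanding MMIO writes into order. */
static inline void platform_mmio_flush(void)
{
	platform_flushes++;
}

#ifdef CONFIG_MMIO_WRITES_UNORDERED	/* hypothetical switch */
#define mmiowb()	platform_mmio_flush()
#else
/* I/O writes are already ordered here, so mmiowb() does nothing.  The
 * do { } while (0) form makes "mmiowb();" exactly one statement, so it
 * behaves like a function call even inside an unbraced if/else. */
#define mmiowb()	do { } while (0)
#endif
```

Drivers then call mmiowb() unconditionally, and only the platforms that need the barrier pay for it.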

On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to
I/O space issued by different CPUs aren't ordered with respect to each other.
For the most part, this isn't a problem, since drivers generally hold a
spinlock around code that does writeX() calls. But if the last operation a
driver does before releasing the lock is a write, and another CPU then takes
the lock and immediately does a write, the second CPU's write could arrive at
the device before the first's.

This patch adds an mmiowb() call to handle this sort of situation, and adds
some documentation describing I/O ordering issues to deviceiobook.tmpl. The
idea is to mirror the regular, cacheable memory barrier operation, wmb().
Example of the problem this new macro solves:

CPU A: spin_lock_irqsave(&dev_lock, flags);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags);
...
CPU B: spin_lock_irqsave(&dev_lock, flags);
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: spin_unlock_irqrestore(&dev_lock, flags);

In this case, newval2 could be written to ring_ptr before newval, even though
CPU B only took the lock after CPU A released it. Fixing it is easy, though:

CPU A: spin_lock_irqsave(&dev_lock, flags);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: mmiowb(); /* ensure no other writes beat us to the device */
CPU A: spin_unlock_irqrestore(&dev_lock, flags);
...
CPU B: spin_lock_irqsave(&dev_lock, flags);
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: mmiowb();
CPU B: spin_unlock_irqrestore(&dev_lock, flags);
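The fixed sequence could be wrapped in a small driver helper. The sketch below is a user-space model only, assuming a pthread mutex in place of dev_lock and a plain volatile variable in place of the device register behind ring_ptr; post_ring_update() and ring_reg are hypothetical names, and the mmiowb() here is merely a compiler barrier, not the platform barrier this patch adds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Stand-ins (assumptions, not kernel code): in a driver these would be a
 * spinlock taken with spin_lock_irqsave() and an ioremap()ed register. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile uint32_t ring_reg;
static volatile uint32_t *const ring_ptr = &ring_reg;

/* Stand-in for writel(): a plain volatile store. */
static void writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Stand-in for mmiowb(): only stops the compiler from reordering memory
 * accesses across this point.  A real platform mmiowb() would also order
 * the preceding MMIO write ahead of writes from the next lock holder. */
static void mmiowb(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* Hypothetical helper: post one ring update under the device lock. */
static void post_ring_update(uint32_t newval)
{
	pthread_mutex_lock(&dev_lock);
	writel(newval, ring_ptr);
	mmiowb();	/* barrier after the last write, before the unlock */
	pthread_mutex_unlock(&dev_lock);
}
```

The important detail is placement: the barrier sits inside the critical section, after the last writel() and before the unlock, so it takes effect before the next CPU can acquire the lock.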

Note that this doesn't address a related case, where the driver needs a given
write to actually reach the device before proceeding. That should be handled
by immediately reading a register from the card that has no side effects.
According to the PCI spec, a read cannot pass earlier posted writes, so by the
time the read completes, all of those writes will have arrived at the device.
If no such register is available (in the case of card resets, perhaps),
reading from config space is sufficient (though it may return all ones if the
card isn't responding to read cycles). I've tried to describe how mmiowb()
differs from PCI posted write flushing in the patch to deviceiobook.tmpl.
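The read-back approach can be sketched as a user-space model in which ordinary memory stands in for ioremap()ed registers. REG_DOORBELL, REG_STATUS, fake_regs and write_and_flush() are hypothetical names; on real hardware the register read back must be one whose reads have no side effects, and the flushing effect comes from PCI ordering rules, not from anything in this C code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, modelled as plain memory. */
#define REG_DOORBELL	0	/* hypothetical register the driver writes */
#define REG_STATUS	1	/* hypothetical side-effect-free register */
static volatile uint32_t fake_regs[2];

/* Stand-ins for writel()/readl(): plain volatile accesses. */
static void writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

static uint32_t readl(const volatile uint32_t *addr)
{
	return *addr;
}

/* Hypothetical helper: write a register, then read a harmless one.  On a
 * PCI bus the read cannot complete until earlier posted writes have
 * reached the device, so returning implies the write has landed. */
static uint32_t write_and_flush(volatile uint32_t *regs, uint32_t val)
{
	writel(val, &regs[REG_DOORBELL]);
	return readl(&regs[REG_STATUS]);
}
```

Unlike mmiowb(), which only orders writes relative to the next lock holder, this pattern waits for the write itself to arrive, so it costs a full bus round trip.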

Patches to use this new primitive in various drivers will come separately,
probably via the SCSI tree.

Signed-off-by: Jesse Barnes <[email protected]>

Thanks,
Jesse


Attachments:
(No filename) (2.49 kB)
mmiowb-6.patch (18.64 kB)