2004-09-27 18:06:28

by Jesse Barnes

[permalink] [raw]
Subject: [PATCH] I/O space write barrier

Received a few comments and no complaints, so I'm sending this patch in for
inclusion.

On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to
I/O space aren't ordered coming from different CPUs. For the most part, this
isn't a problem since drivers generally spinlock around code that does writeX
calls, but if the last operation a driver does before it releases a lock is a
write and some other CPU takes the lock and immediately does a write, it's
possible the second CPU's write could arrive before the first's.

This patch adds a mmiowb() call to deal with this sort of situation, and
adds some documentation describing I/O ordering issues to deviceiobook.tmpl.
The idea is to mirror the regular, cacheable memory barrier operation, wmb.
Example of the problem this new macro solves:

CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: spin_unlock_irqrestore(&dev_lock, flags)

In this case, newval2 could be written to ring_ptr before newval. Fixing it
is easy though:

CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: mmiowb(); /* ensure no other writes beat us to the device */
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: mmiowb();
CPU B: spin_unlock_irqrestore(&dev_lock, flags)

Note that this doesn't address a related case where the driver may want to
actually make a given write get to the device before proceeding. This should
be dealt with by immediately reading a register from the card that has no
side effects. According to the PCI spec, that will guarantee that all writes
have arrived before being sent to the target bus. If no such register is
available (in the case of card resets perhaps), reading from config space is
sufficient (though it may return all ones if the card isn't responding to
read cycles). I've tried to describe how mmiowb() differs from PCI posted
write flushing in the patch to deviceiobook.tmpl.

Patches to use this new primitive in various drivers will come separately,
probably via the SCSI tree.

Signed-off-by: Jesse Barnes <[email protected]>

Thanks,
Jesse


Attachments:
(No filename) (2.36 kB)
mmiowb-3.patch (20.87 kB)
Download all attachments

2004-09-29 10:33:23

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

G'day,

On Mon, Sep 27, 2004 at 11:03:39AM -0700, Jesse Barnes wrote:
> On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to
> I/O space aren't ordered coming from different CPUs. For the most part, this
> isn't a problem since drivers generally spinlock around code that does writeX
> calls, but if the last operation a driver does before it releases a lock is a
> write and some other CPU takes the lock and immediately does a write, it's
> possible the second CPU's write could arrive before the first's.
>
> This patch adds a mmiowb() call to deal with this sort of situation, and
> adds some documentation describing I/O ordering issues to deviceiobook.tmpl.
> The idea is to mirror the regular, cacheable memory barrier operation, wmb.
>[...]
> Patches to use this new primitive in various drivers will come separately,
> probably via the SCSI tree.

Ok, here's a patch for the tg3 network driver to use mmiowb(). Tests
over the last couple of days has shown that it solves the oopses in
tg3_tx() that I reported and attempted to patch some time ago:

http://marc.theaimsgroup.com/?l=linux-netdev&m=108538612421774&w=2

The CPU usage of the mmiowb() approach is also significantly better
than doing PCI reads to flush the writes (by setting the existing
TG3_FLAG_MBOX_WRITE_REORDER flag). In an artificial CPU-constrained
test on a ProPack kernel, the same amount of CPU work for the REORDER
solution pushes 85.1 MB/s over 2 NICs compared to 146.5 MB/s for the
mmiowb() solution.


--- linux.orig/drivers/net/tg3.c 2004-09-22 17:20:45.%N +1000
+++ linux/drivers/net/tg3.c 2004-09-29 19:45:16.%N +1000
@@ -44,6 +44,19 @@
#include <asm/pbm.h>
#endif

+#ifndef mmiowb
+/*
+ * mmiowb() is a memory-mapped I/O write boundary, useful for
+ * preserving send ring update ordering between multiple CPUs
+ * Define it if it doesn't exist.
+ */
+#ifdef CONFIG_IA64_SGI_SN2
+#define mmiowb() sn_mmiob()
+#else
+#define mmiowb()
+#endif
+#endif
+
#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
#define TG3_VLAN_TAG_USED 1
#else
@@ -2725,6 +2738,7 @@ next_pkt_nopost:
tw32_rx_mbox(MAILBOX_RCV_JUMBO_PROD_IDX + TG3_64BIT_REG_LOW,
sw_idx);
}
+ mmiowb();

return received;
}
@@ -3172,6 +3186,7 @@ static int tg3_start_xmit(struct sk_buff
netif_stop_queue(dev);

out_unlock:
+ mmiowb();
spin_unlock_irqrestore(&tp->tx_lock, flags);

dev->trans_start = jiffies;


Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

2004-09-29 20:36:52

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Wed, 29 Sep 2004 20:36:46 +1000
Greg Banks <[email protected]> wrote:

> Ok, here's a patch for the tg3 network driver to use mmiowb(). Tests
> over the last couple of days has shown that it solves the oopses in
> tg3_tx() that I reported and attempted to patch some time ago:
>
> http://marc.theaimsgroup.com/?l=linux-netdev&m=108538612421774&w=2
>
> The CPU usage of the mmiowb() approach is also significantly better
> than doing PCI reads to flush the writes (by setting the existing
> TG3_FLAG_MBOX_WRITE_REORDER flag). In an artificial CPU-constrained
> test on a ProPack kernel, the same amount of CPU work for the REORDER
> solution pushes 85.1 MB/s over 2 NICs compared to 146.5 MB/s for the
> mmiowb() solution.

Please put this macro in asm/io.h or similar and make sure
every platform has it implemented or provides a NOP version.

A lot of people are going to get this wrong btw. The only
way it's really going to be cured across the board is if someone
like yourself who understands this audits all of the drivers.

2004-09-29 20:44:59

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Wednesday, September 29, 2004 1:35 pm, David S. Miller wrote:
> On Wed, 29 Sep 2004 20:36:46 +1000
>
> Greg Banks <[email protected]> wrote:
> > Ok, here's a patch for the tg3 network driver to use mmiowb(). Tests
> > over the last couple of days has shown that it solves the oopses in
> > tg3_tx() that I reported and attempted to patch some time ago:
> >
> > http://marc.theaimsgroup.com/?l=linux-netdev&m=108538612421774&w=2
> >
> > The CPU usage of the mmiowb() approach is also significantly better
> > than doing PCI reads to flush the writes (by setting the existing
> > TG3_FLAG_MBOX_WRITE_REORDER flag). In an artificial CPU-constrained
> > test on a ProPack kernel, the same amount of CPU work for the REORDER
> > solution pushes 85.1 MB/s over 2 NICs compared to 146.5 MB/s for the
> > mmiowb() solution.
>
> Please put this macro in asm/io.h or similar and make sure
> every platform has it implemented or provides a NOP version.

The patch that actually implements mmiowb() already does this, I think Greg
just used his patch for testing. The proper way to do it of course is to
just use mmiowb() where needed in tg3 after the write barrier patch gets in.

> A lot of people are going to get this wrong btw. The only
> way it's really going to be cured across the board is if someone
> like yourself who understands this audits all of the drivers.

Yep, just like PCI posting (though many people seem to have a grasp on that
now).

Thanks,
Jesse

2004-09-29 20:51:24

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Wed, 29 Sep 2004 13:43:55 -0700
Jesse Barnes <[email protected]> wrote:

> The patch that actually implements mmiowb() already does this, I think Greg
> just used his patch for testing. The proper way to do it of course is to
> just use mmiowb() where needed in tg3 after the write barrier patch gets in.

Perfect, please send me a tg3 patch once the mmiowb() bits
go into the tree.

Thanks a lot.

2004-09-29 22:58:15

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Monday, September 27, 2004 11:03 am, Jesse Barnes wrote:
> Received a few comments and no complaints, so I'm sending this patch in for
> inclusion.

This one fixes up the documentation a bit, but is otherwise the same.

Signed-off-by: Jesse Barnes <[email protected]>

Jesse


Attachments:
(No filename) (276.00 B)
mmiowb-5.patch (20.95 kB)
Download all attachments

2004-09-30 02:11:49

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Thu, 2004-09-30 at 06:50, David S. Miller wrote:
> On Wed, 29 Sep 2004 13:43:55 -0700
> Jesse Barnes <[email protected]> wrote:
>
> > The patch that actually implements mmiowb() already does this, I think Greg
> > just used his patch for testing.

Yes, that hunk will be unnecessary when Jesse's patch goes in.

> The proper way to do it of course is to
> > just use mmiowb() where needed in tg3 after the write barrier patch gets in.
>
> Perfect, please send me a tg3 patch once the mmiowb() bits
> go into the tree.

Will do.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


2004-09-30 07:16:15

by Jeremy Higdon

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

Here's the qla1280 patch to go with Jesse's mmiowb patch.
Comments are verbose to satisfy a request from Mr. Bottomley.

signed-off-by: Jeremy Higdon <[email protected]>

===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c 2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c 2004-09-29 23:43:30 -07:00
@@ -3397,8 +3397,22 @@
"qla1280_64bit_start_scsi: Wakeup RISC for pending command\n");
sp->flags |= SRB_SENT;
ha->actthreads++;
+
+ /*
+ * Update request index to mailbox4 (Request Queue In).
+ * The mmiowb() ensures that this write is ordered with writes by other
+ * CPUs. Without the mmiowb(), it is possible for the following:
+ * CPUA posts write of index 5 to mailbox4
+ * CPUA releases host lock
+ * CPUB acquires host lock
+ * CPUB posts write of index 6 to mailbox4
+ * On PCI bus, order reverses and write of 6 posts, then index 5,
+ * causing chip to issue full queue of stale commands
+ * The mmiowb() prevents future writes from crossing the barrier.
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+ mmiowb();

out:
if (status)
@@ -3665,8 +3679,22 @@
"for pending command\n");
sp->flags |= SRB_SENT;
ha->actthreads++;
+
+ /*
+ * Update request index to mailbox4 (Request Queue In).
+ * The mmiowb() ensures that this write is ordered with writes by other
+ * CPUs. Without the mmiowb(), it is possible for the following:
+ * CPUA posts write of index 5 to mailbox4
+ * CPUA releases host lock
+ * CPUB acquires host lock
+ * CPUB posts write of index 6 to mailbox4
+ * On PCI bus, order reverses and write of 6 posts, then index 5,
+ * causing chip to issue full queue of stale commands
+ * The mmiowb() prevents future writes from crossing the barrier.
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+ mmiowb();

out:
if (status)
@@ -3776,9 +3804,21 @@
} else
ha->request_ring_ptr++;

- /* Set chip new ring index. */
+ /*
+ * Update request index to mailbox4 (Request Queue In).
+ * The mmiowb() ensures that this write is ordered with writes by other
+ * CPUs. Without the mmiowb(), it is possible for the following:
+ * CPUA posts write of index 5 to mailbox4
+ * CPUA releases host lock
+ * CPUB acquires host lock
+ * CPUB posts write of index 6 to mailbox4
+ * On PCI bus, order reverses and write of 6 posts, then index 5,
+ * causing chip to issue full queue of stale commands
+ * The mmiowb() prevents future writes from crossing the barrier.
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+ mmiowb();

LEAVE("qla1280_isp_cmd");
}

2004-09-30 21:33:30

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Thu, 30 Sep 2004, Jeremy Higdon wrote:

> Here's the qla1280 patch to go with Jesse's mmiowb patch.
> Comments are verbose to satisfy a request from Mr. Bottomley.
>
> signed-off-by: Jeremy Higdon <[email protected]>
>
> ===== drivers/scsi/qla1280.c 1.65 vs edited =====
> --- 1.65/drivers/scsi/qla1280.c 2004-07-28 20:59:10 -07:00
> +++ edited/drivers/scsi/qla1280.c 2004-09-29 23:43:30 -07:00
> @@ -3397,8 +3397,22 @@
> "qla1280_64bit_start_scsi: Wakeup RISC for pending command\n");
> sp->flags |= SRB_SENT;
> ha->actthreads++;
> +
> + /*
> + * Update request index to mailbox4 (Request Queue In).
> + * The mmiowb() ensures that this write is ordered with writes by other
> + * CPUs. Without the mmiowb(), it is possible for the following:
> + * CPUA posts write of index 5 to mailbox4
> + * CPUA releases host lock
> + * CPUB acquires host lock
> + * CPUB posts write of index 6 to mailbox4
> + * On PCI bus, order reverses and write of 6 posts, then index 5,
> + * causing chip to issue full queue of stale commands
> + * The mmiowb() prevents future writes from crossing the barrier.
> + * See Documentation/DocBook/deviceiobook.tmpl for more information.
> + */

A pretty obvious note: instead of repeating this nice but pretty lengthy
comment 3 times in the same file, wouldn't it be better to write at
further locations something like

/* Enforce IO-ordering. See comment in <function> for details. */

Also helps if you later have to modify the comment, or move it, or add
more mmiowb()s, or do some other modifications.

Thanks
Guennadi
---
Guennadi Liakhovetski

2004-10-04 20:47:44

by Albert Cahalan

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

> diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
> --- a/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> +++ b/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> @@ -197,6 +197,8 @@
> #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c))
> #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c))
>
> +#define mmiowb() asm volatile ("eieio" ::: "memory")
> +
> /*
> * Map in an area of physical address space, for accessing
> * I/O devices etc.

I don't think this is right. For ppc, eieio is
already included as part of the assembly for the
IO operations. If you could delete that, great,
but I suspect that nearly all drivers would break.

There's an existing eieio() inline function that
you could use, instead of fresh assembly code.

BTW, the "eieio" name is better. The "wb" part
of "mmiowb" looks like "write back" to me, as if
it were some sort of cache push operation. It is
also lacking an appropriate song. :-)


2004-10-04 21:22:32

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Monday, October 4, 2004 1:39 pm, Albert Cahalan wrote:
> > diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
> > --- a/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > +++ b/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > @@ -197,6 +197,8 @@
> > #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c))
> > #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c))
> >
> > +#define mmiowb() asm volatile ("eieio" ::: "memory")
> > +
> > /*
> > * Map in an area of physical address space, for accessing
> > * I/O devices etc.
>
> I don't think this is right. For ppc, eieio is
> already included as part of the assembly for the
> IO operations. If you could delete that, great,
> but I suspect that nearly all drivers would break.

Ok, if it's covered than mmiowb() can just be empty for ppc.

> BTW, the "eieio" name is better. The "wb" part
> of "mmiowb" looks like "write back" to me, as if
> it were some sort of cache push operation. It is
> also lacking an appropriate song. :-)

It's supposed to be 'write barrier' just like wmb is a write memory barrier,
so is mmiowb a memory-mapped I/O write barrier. Make sense?

Thanks,
Jesse

2004-10-05 00:37:28

by Albert Cahalan

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Mon, 2004-10-04 at 17:20, Jesse Barnes wrote:
> On Monday, October 4, 2004 1:39 pm, Albert Cahalan wrote:
> > > diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
> > > --- a/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > > +++ b/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > > @@ -197,6 +197,8 @@
> > > #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c))
> > > #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c))
> > >
> > > +#define mmiowb() asm volatile ("eieio" ::: "memory")
> > > +
> > > /*
> > > * Map in an area of physical address space, for accessing
> > > * I/O devices etc.
> >
> > I don't think this is right. For ppc, eieio is
> > already included as part of the assembly for the
> > IO operations. If you could delete that, great,
> > but I suspect that nearly all drivers would break.
>
> Ok, if it's covered than mmiowb() can just be empty for ppc.

Ideally, it would be eieio, and the eieio in each
of the IO operations would be removed. Finding and
fixing all the drivers that break looks impossible
though; most driver developers will be on x86 boxes.

> > BTW, the "eieio" name is better. The "wb" part
> > of "mmiowb" looks like "write back" to me, as if
> > it were some sort of cache push operation. It is
> > also lacking an appropriate song. :-)
>
> It's supposed to be 'write barrier' just like wmb is a write memory barrier,
> so is mmiowb a memory-mapped I/O write barrier. Make sense?

In that case: wmmiob

(or something longer, like mmio_write_fence maybe)

As a name, "wmb" sucks almost as much as "cli" and "sti" do.
It dates back to the Alpha port, where it's an opcode.


2004-10-05 01:27:59

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Tue, 2004-10-05 at 10:32, Albert Cahalan wrote:


> Ideally, it would be eieio, and the eieio in each
> of the IO operations would be removed. Finding and
> fixing all the drivers that break looks impossible
> though; most driver developers will be on x86 boxes.

I don't agree. IO operations shouldn't be relaxed by
default. That's really asking too much of driver writers

Ben.

2004-10-05 02:27:17

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Monday, October 04, 2004 6:22 pm, Benjamin Herrenschmidt wrote:
> On Tue, 2004-10-05 at 10:32, Albert Cahalan wrote:
> > Ideally, it would be eieio, and the eieio in each
> > of the IO operations would be removed. Finding and
> > fixing all the drivers that break looks impossible
> > though; most driver developers will be on x86 boxes.
>
> I don't agree. IO operations shouldn't be relaxed by
> default. That's really asking too much of driver writers

I agree, it's hard to get right, especially when you've got a large base of
drivers with hard to find assumptions about ordering.

What about mmiowb()? Should it be eieio? I don't want to post another patch
if I don't have to...

Thanks,
Jesse

2004-10-05 02:33:54

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Monday, October 04, 2004 5:32 pm, Albert Cahalan wrote:
> Ideally, it would be eieio, and the eieio in each
> of the IO operations would be removed. Finding and
> fixing all the drivers that break looks impossible
> though; most driver developers will be on x86 boxes.

Well, I won't pretend to understand how the PPC ordering rules work, so I'll
defer to benh on that one.

> In that case: wmmiob
>
> (or something longer, like mmio_write_fence maybe)
>
> As a name, "wmb" sucks almost as much as "cli" and "sti" do.
> It dates back to the Alpha port, where it's an opcode.

The other option I briefly considered was wwjd(), but I don't think He has an
official position on posted write ordering.

Jesse

2004-10-05 03:10:09

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier


> I agree, it's hard to get right, especially when you've got a large base of
> drivers with hard to find assumptions about ordering.
>
> What about mmiowb()? Should it be eieio? I don't want to post another patch
> if I don't have to...

I don't understand the whole story...

If normal accesses aren't relaxed (and I think they shouldn't be), then
there is no point in a barrier here.... If you need an explicit barrier
for explicitely relaxed accesses, then yes.

That doesn't solve my need of MMIO vs. memory unless you are trying to
cover that as well, in which case it should be a sync.

Ben.


2004-10-05 15:37:03

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Monday, October 4, 2004 8:04 pm, Benjamin Herrenschmidt wrote:
> > I agree, it's hard to get right, especially when you've got a large base
> > of drivers with hard to find assumptions about ordering.
> >
> > What about mmiowb()? Should it be eieio? I don't want to post another
> > patch if I don't have to...
>
> I don't understand the whole story...
>
> If normal accesses aren't relaxed (and I think they shouldn't be), then
> there is no point in a barrier here.... If you need an explicit barrier
> for explicitely relaxed accesses, then yes.

This macro is only supposed to deal with writes from different CPUs that may
arrive out of order, nothing else. It sounds like PPC won't allow that
normally, so I can be an empty definition.

> That doesn't solve my need of MMIO vs. memory unless you are trying to
> cover that as well, in which case it should be a sync.

No, I think that has to be covered separately.

Jesse

2004-10-05 22:48:43

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Wed, 2004-10-06 at 01:33, Jesse Barnes wrote:

> This macro is only supposed to deal with writes from different CPUs that may
> arrive out of order, nothing else. It sounds like PPC won't allow that
> normally, so I can be an empty definition.

I don't understand that neither. You can never guarantee any ordering
between writes from different CPUs unless you have a sinlock. If you
have an ordering problem with spinlocks, then it's a totally different
issue, a bit more like MMIO vs. cacheable mem that we have on PPC. If
this is the problem you are trying to chase, then we could use such a
barrier on ppc too and make it a hard sync, but it has nothing to do
with the write barrier we already have in our IO accessors...

> > That doesn't solve my need of MMIO vs. memory unless you are trying to
> > cover that as well, in which case it should be a sync.
>
> No, I think that has to be covered separately.

How so ? Again, this whole "ordering of writes between different CPU" makes
absolutely no sense to me.

Ben.


2004-10-05 23:16:59

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Tuesday, October 5, 2004 3:41 pm, Benjamin Herrenschmidt wrote:
> On Wed, 2004-10-06 at 01:33, Jesse Barnes wrote:
> > This macro is only supposed to deal with writes from different CPUs that
> > may arrive out of order, nothing else. It sounds like PPC won't allow
> > that normally, so I can be an empty definition.
>
> I don't understand that neither. You can never guarantee any ordering
> between writes from different CPUs unless you have a sinlock. If you
> have an ordering problem with spinlocks, then it's a totally different
> issue, a bit more like MMIO vs. cacheable mem that we have on PPC.

Right.

> If
> this is the problem you are trying to chase, then we could use such a
> barrier on ppc too and make it a hard sync, but it has nothing to do
> with the write barrier we already have in our IO accessors...

Ok.

>
> > > That doesn't solve my need of MMIO vs. memory unless you are trying to
> > > cover that as well, in which case it should be a sync.
> >
> > No, I think that has to be covered separately.
>
> How so ? Again, this whole "ordering of writes between different CPU" makes
> absolutely no sense to me.

It's like you said above. I meant that ordering of writes to I/O space and
memory space should be dealt with differently on PPC, as you've said before.
I guess you need new barrier types for that?

Jesse

2004-10-05 23:58:43

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

Benjamin> I don't understand that neither. You can never guarantee
Benjamin> any ordering between writes from different CPUs unless
Benjamin> you have a sinlock. If you have an ordering problem with
Benjamin> spinlocks, then it's a totally different issue, a bit
Benjamin> more like MMIO vs. cacheable mem that we have on PPC. If
Benjamin> this is the problem you are trying to chase, then we
Benjamin> could use such a barrier on ppc too and make it a hard
Benjamin> sync, but it has nothing to do with the write barrier we
Benjamin> already have in our IO accessors...

As I understand it, the problem is that on some Itanium boxes, it's
possible to have the following code run:

CPU 1 CPU 2
spin_lock(&devlock);
writel(foo);
spin_unlock(&devlock);

spin_lock(&devlock);
writel(bar);
spin_unlock(&devlock);

and still have bar arrive at the device before foo. One possibility
would be to add a read after the writel() to flush the posted write.
The proposed mmiowb() function is somewhat lighter weight -- it
guarantees that later writes will not hit the device before any writes
already issued, but it doesn't say anything about writes making it all
the way to the device.

I could be wrong but I think that the eieio in the ppc IO write
functions should be strong enough that mmiowb() can be a no-op.

By the way, are the ordering rules different for ppc32 and ppc64? I
notice that the ppc32 out_xxx() functions do eieio while the ppc64
versions do a full sync.

Thanks,
Roland


2004-10-06 01:51:35

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Wed, 2004-10-06 at 09:57, Roland Dreier wrote:

> I could be wrong but I think that the eieio in the ppc IO write
> functions should be strong enough that mmiowb() can be a no-op.
>
> By the way, are the ordering rules different for ppc32 and ppc64? I
> notice that the ppc32 out_xxx() functions do eieio while the ppc64
> versions do a full sync.

ppc32 and ppc64 are identical by spec, but the current chips smaller
store queues are such that we didn't epxerience on ppc32 the amount
of issues we had on ppc64.

eieio will not order a cacheable store vs. a non-cacheable store by
spec, which is the root of our problem on ppc and why we had to change
some of these into sync's. Extended the semantics of mmiowb() to a more
generic ordering of MMIO vs. "the rest of the world" would help us as
I don't beleive in defining yet-another barrier and have it properly
used by device driver writers.

Ben.


2004-10-16 00:38:54

by Jeremy Higdon

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Thu, Sep 30, 2004 at 11:21:39PM +0200, Guennadi Liakhovetski wrote:
>
> A pretty obvious note: instead of repeating this nice but pretty lengthy
> comment 3 times in the same file, wouldn't it be better to write at
> further locations something like
>
> /* Enforce IO-ordering. See comment in <function> for details. */
>
> Also helps if you later have to modify the comment, or move it, or add
> more mmiowb()s, or do some other modifications.
>
> Thanks
> Guennadi
> ---
> Guennadi Liakhovetski

Suggestion applied. Now that the mmiowb is in the mm patch, this
patch to qla1280 should be ready for inclusion therein also.

signed-off-by: Jeremy Higdon <[email protected]>


--- linux-2.6.9-rc4.orig/drivers/scsi/qla1280.c 2004-10-15 17:21:21.000000000 -0700
+++ linux-2.6.9-rc4/drivers/scsi/qla1280.c 2004-10-15 17:23:08.000000000 -0700
@@ -3409,7 +3409,8 @@
sp->flags |= SRB_SENT;
ha->actthreads++;
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+ /* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
+ mmiowb();

out:
if (status)
@@ -3677,7 +3678,8 @@
sp->flags |= SRB_SENT;
ha->actthreads++;
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+ /* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
+ mmiowb();

out:
if (status)
@@ -3787,9 +3789,21 @@
} else
ha->request_ring_ptr++;

- /* Set chip new ring index. */
+ /*
+ * Update request index to mailbox4 (Request Queue In).
+ * The mmiowb() ensures that this write is ordered with writes by other
+ * CPUs. Without the mmiowb(), it is possible for the following:
+ * CPUA posts write of index 5 to mailbox4
+ * CPUA releases host lock
+ * CPUB acquires host lock
+ * CPUB posts write of index 6 to mailbox4
+ * On PCI bus, order reverses and write of 6 posts, then index 5,
+ * causing chip to issue full queue of stale commands
+ * The mmiowb() prevents future writes from crossing the barrier.
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+ mmiowb();

LEAVE("qla1280_isp_cmd");
}

2004-10-16 03:20:49

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Fri, Oct 15, 2004 at 05:38:09PM -0700, Jeremy Higdon wrote:
> - (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
> + /* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
> + mmiowb();

I really don't think we want a comment by every mmiowb() explaining what
it does. We needed one by the write flush because it had two potential
meanings, and we didn't want people overoptimising it away. But mmiowb()
is clear and unambiguous.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-10-16 03:31:38

by Jeremy Higdon

[permalink] [raw]
Subject: Re: [PATCH] I/O space write barrier

On Sat, Oct 16, 2004 at 04:20:44AM +0100, Matthew Wilcox wrote:
> On Fri, Oct 15, 2004 at 05:38:09PM -0700, Jeremy Higdon wrote:
> > - (void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
> > + /* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
> > + mmiowb();
>
> I really don't think we want a comment by every mmiowb() explaining what
> it does. We needed one by the write flush because it had two potential
> meanings, and we didn't want people overoptimising it away. But mmiowb()
> is clear and unambiguous.

This seems to be a case of not being able to please everyone.
James Bottomley asked for documentation on the usage of mmiowb().
Guennadi asked for one copy of the documentation and the references
to that in other places.

I don't really care that much, but this is the third version of this
patch, where the only difference is comments. If it's all right, let
this go in, and you can submit patches to change the comments later.

I believe James' idea was that qla1280 would be the "example" driver.

jeremy