2006-09-11 04:04:10

by Benjamin Herrenschmidt

Subject: [RFC] MMIO accessors & barriers documentation


Ok, here's formal documentation of the proposed accessor semantics. It
still contains a couple of questions (see [* Question]) that need
answering before we can start implementing anything, so I'm waiting for
feedback here. The Questions are grouped at the end of the document to
avoid cluttering it.

I've deliberately not included Segher's proposal of having the
"writel/readl" type accessors behave differently based on an ioremap
flag. There are pros and cons to that approach, but it is almost a
separate debate: we should first define the semantics we need, and
that's what this document attempts to do.


*** Definitions of MMIO accessors and IO related barriers semantics ***


* I * Ordering requirements:
============================

First, let's define 4 types of ordering requirements that can be
provided by MMIO accessors:

1- MMIO + MMIO: This type of ordering means that two consecutive MMIO
accesses performed by one processor are issued in program order on the
bus. Reads can't cross writes. Writes can't be re-ordered vs. each
other. There is no implication for MMIOs issued by different CPUs nor
for non-MMIO accesses.

2- memory W + MMIO W: This type of ordering means that a store to main
memory that is performed in program order before an MMIO store to a
device must be visible to that device before the MMIO store reaches it.
For example: an update of a DMA descriptor in memory is visible to the
chip before the MMIO write that causes the chip to go fetch it (see the
sketch at the end of this section). This is purely a store ordering; no
assumption is made about reads.
3- MMIO R + memory R: This type of ordering means that an MMIO read will
be effectively performed (the result returned by the device to the
processor) before a following read from memory. That is, the value
returned by that following read is what was present in the coherency
domain after the MMIO read completed. For example: reading a DMA
"pointer" from a device with an MMIO read, and then fetching the data in
memory up to that pointer.

4- MMIO W + spin_unlock: This type of ordering means that MMIO stores
followed by a spin unlock will have all reached the host PCI bridge
before the unlocking is visible to other CPUs. For example, two CPUs
have a locked section (same spinlock) issuing some MMIO stores to the
same device. Such ordering means that both sets of MMIO stores will not
be interleaved when reaching the host PCI controller (and thus the
device). All MMIO stores from one locked section will be performed
before all MMIO stores from the other.

[Note] Rule #4 is strictly specific to MMIO stores followed by a
spin_unlock(). There is no ordering requirement provided by Linux to
ensure ordering of an MMIO store followed by a generic memory store
unless an explicit barrier is used:

[ -> Question 1]
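
To make rules #2, #3 and #4 concrete, here is a minimal driver-like
sketch using the fully ordered accessors. The descriptor layout, register
offsets, lock and function names are hypothetical, not taken from any
real driver:

#include <linux/io.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct my_desc {                        /* hypothetical DMA descriptor */
        u32 addr;
        u32 len_flags;
};

static DEFINE_SPINLOCK(my_hw_lock);

/* Rules #2 and #4: memory stores, then an MMIO doorbell, under a lock. */
static void my_tx_kick(struct my_desc *ring, int idx, u32 buf, u32 len,
                       void __iomem *regs)
{
        spin_lock(&my_hw_lock);
        ring[idx].addr = buf;                   /* memory W */
        ring[idx].len_flags = len | 0x80000000; /* memory W: mark valid */
        /* Rule #2: both memory stores above are visible to the device
         * before this MMIO store reaches it. */
        writel(idx, regs + 0x10);               /* hypothetical DESC_KICK */
        /* Rule #4: the MMIO store has reached the host bridge before the
         * unlock is visible, so two CPUs' doorbells can't interleave. */
        spin_unlock(&my_hw_lock);
}

/* Rule #3: an MMIO read followed by memory reads of DMA'd data. */
static u32 my_rx_peek(struct my_desc *ring, void __iomem *regs)
{
        u32 tail = readl(regs + 0x14);          /* hypothetical RX_TAIL */
        /* Rule #3: this memory read sees what was in the coherency
         * domain after the MMIO read completed. */
        return ring[tail].len_flags;
}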

* II * Accessors:
=================

We provide 3 classes of accessors:

Class 1: Ordered accessors
--------------------------

[Note] None of these accessors will provide write combining

1- {read,write}{b,w,l,q} : Those accessors provide all MMIO ordering
requirements. They are thus called "fully ordered". That is #1, #2 and
#4 for writes and #1 and #3 for reads.

[ -> Question 2]

2- PIO accessors (all of them, that is inb...inl, ins*, out
equivalents,...): Those are fully ordered, all ordering rules apply. They
are slow anyways :)

3- memcpy_to_io, memcpy_from_io: #1 semantics apply (all MMIO loads or
stores are performed in order to each other). #2+#4 (stores) or #3
(loads) semantics apply to the operation as a whole. That is #2: all
previous memory stores are globally visible before the first MMIO store
of memcpy_to_io; #3: the last MMIO read (and thus all previous ones too,
due to rule #1) has been fully performed before a subsequent memory
read is performed by memcpy_from_io; and #4: all MMIO stores performed
by memcpy_to_io will have reached the host bridge before the effects of a
subsequent spin_unlock are visible.

4- io{read,write}{8,16,32}[be]: Those have the same semantics as 1 for
MMIO and the same semantics as 2 for PIO. As for the "repeat" versions
of those, they follow the semantics of memcpy_to_io and memcpy_from_io
(the only difference being the lack of increment of the MMIO address).

Class 2: Partially relaxed accessors
------------------------------------

[Note] Stores using those accessors will provide write combining on MMIO
(not PIO) regions which have been mapped with the appropriate <insert
call name here, TBD, possibly ioremap_wc>

1- __{read,write}{b,w,l,q} : Those accessors provide only ordering rule
#1. That is, MMIOs are ordered vs. each other as issued by one CPU.
Barriers are required to ensure ordering vs. memory and vs. locks (see
"Barriers" section).

2- __io{read,write}{8,16,32}[be] (optional ?) : Those have the same
semantics as 1 for MMIO, and provide the full ordering requirements as
defined in Class 1 for PIO.

3- __memcpy_to_io, __memcpy_from_io: Those provide only requirement #1;
that is, the MMIOs within the copy are performed in order and are in
order vs. preceding and subsequent MMIOs executed on the same CPU.

Class 3: Fully relaxed accessors
--------------------------------

[Note] Stores using those accessors will provide write combining the
same way as Class 2 accessors.

1- __raw_{read,write}{b,w,l,q} : Those accessors provide no ordering
rule whatsoever. They also provide no endian swapping. They are
essentially the equivalent of a direct load/store instruction to/from
the MMIO space. Access is done in platform native endian.


* III * IO related barriers
===========================

Some of the above accessors do not provide all ordering rules defined in
* I *, thus explicit barriers are provided to enforce those ordering
rules:

1- io_to_io_barrier() : This barrier provides ordering requirement #1
between two MMIO accesses. It's to be used in conjunction with fully
relaxed accessors of Class 3.

2- memory_to_io_wb() : This barrier provides ordering requirement #2
between a memory store and an MMIO store. It can be used in conjunction
with write accessors of Class 2 and 3.

3- io_to_memory_rb(value) : This barrier provides ordering requirement
#3 between an MMIO read and a subsequent read from memory. For
implementation purposes on some architectures, the value actually read
by the MMIO read shall be passed as an argument to this barrier. (This
makes it possible to generate the appropriate CPU instruction magic to
force the CPU to consider the value as being "used" and thus force the
read to be performed immediately). It can be used in conjunction with
read accessors of Class 2 and 3.

4- io_to_lock_wb() : This barrier provides ordering requirement #4
between an MMIO store and a subsequent spin_unlock(). It can be used in
conjunction with write accessors of Class 2 and 3.

[ -> Question 3]
[ -> Question 4]
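
To illustrate how those barriers pair with the relaxed accessors, here
is a minimal sketch re-using the hypothetical my_desc ring and my_hw_lock
from the earlier sketch. The relaxed accessors and barrier names are the
proposed ones from this document (they do not exist as such in the tree
yet) and the register offsets are hypothetical:

/* Class 3 + barrier #1: two raw MMIO stores that must stay in order. */
static void my_fifo_push(void __iomem *regs, u32 data)
{
        __raw_writel(data, regs + 0x00);   /* FIFO data (no byteswap) */
        io_to_io_barrier();                /* keep the two MMIO stores ordered */
        __raw_writel(1, regs + 0x04);      /* FIFO "consume" trigger */
}

/* Class 2 + barriers #2 and #4: relaxed doorbell write under a lock. */
static void my_tx_kick_relaxed(struct my_desc *ring, int idx, u32 buf,
                               u32 len, void __iomem *regs)
{
        spin_lock(&my_hw_lock);
        ring[idx].addr = buf;
        ring[idx].len_flags = len | 0x80000000;
        memory_to_io_wb();                 /* rule #2: memory W before MMIO W */
        __writel(idx, regs + 0x10);        /* relaxed doorbell write */
        io_to_lock_wb();                   /* rule #4: MMIO W before the unlock */
        spin_unlock(&my_hw_lock);
}

/* Class 2 + barrier #3: relaxed MMIO read before reading DMA'd memory. */
static u32 my_rx_peek_relaxed(struct my_desc *ring, void __iomem *regs)
{
        u32 tail = __readl(regs + 0x14);
        io_to_memory_rb(tail);             /* pass the value actually read */
        return ring[tail].len_flags;
}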

[Note] Barriers commonly used by drivers and not described here are the
memory-to-memory read and write barriers (rmb, wmb, mb). Those are
necessary when manipulating data structures in memory that are accessed
at the same time via DMA. The rules here are identical to the usual SMP
data ordering rules and are beyond the scope of this document.

[ -> Question 5]

* IV * Mixing of accessors
==========================

There are few rules concerning the mixing of accessors of the different
ordering Classes. Basically, when accessors of different classes share
an ordering rule, that rule applies. If not, it doesn't. For
example:

writel followed by __writel : both accessors provide rule #1, thus it
applies and stores are visible in order. Since the previous writel will
have ordered previous stores to memory, the second __writel naturally
benefits from this despite the fact that __writel doesn't normally
provide that semantic.
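
As a minimal sketch of the example above (hypothetical register
offsets):

static void my_mixing_example(void __iomem *regs, u32 cfg)
{
        writel(1, regs + 0x00);     /* ordered store: also orders prior
                                     * memory stores (rule #2) */
        __writel(cfg, regs + 0x04); /* relaxed store: rule #1 is shared,
                                     * so it is still issued after the
                                     * writel above */
}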

Ben.


Questions:
==========

[* Question 1] Should Rule #4 be generalized to MMIO store followed by a
memory store ? (as spin_unlock is essentially a wmb followed by a
memory store) or do we need to keep a rule specific for locks to avoid
arch-specific pitfalls on some architectures ? In that case, do we need a
specific barrier to provide MMIO store followed by a memory store ? That
sort of ordering is not generally useful and is generally expensive as
it requires accessing the PCI host bridge to enforce that the previous
MMIO stores have reached the bus. Drivers generally don't need such a
rule or a barrier, as they have to deal with write posting anyway, and
thus use an MMIO read to provide the necessary synchronisation when it
makes sense.

[* Question 2] : Do we actually want the "ordered" accessors to also provide
ordering rule #4 in the general case ? This can be very expensive on
some architectures like ia64 where, I think, it has to actually access
the PCI host bridge to provide the guarantee that the previous MMIO
stores have reached it before the unlock is made visible to the
coherency domain. If we decide not to, then an explicit barrier will
still be needed in most drivers before spin_unlock(). This is the
current mmiowb() barrier that I'm proposing to rename (section * III *).
A way to provide that ordering requirement with less performance impact
is to instead set a per-cpu flag in writeX(), and test it in
spin_unlock() which would then do the barrier only if the flag is set.
It's to be measured whether the impact on unrelated spin_unlock() is low
enough to make that solution realistic.
If we decide to not enforce rule #4 for ordered accessors, and thus
require the barrier before spin_unlock, the above trick could still be
implemented as a debug option to "detect" the lack of appropriate
barriers.
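
For reference, a minimal sketch of that per-cpu flag trick, assuming the
proposed io_to_lock_wb() barrier from section * III *; the flag and
wrapper names are hypothetical, this is not an existing kernel API:

#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <linux/io.h>

static DEFINE_PER_CPU(int, mmio_pending);

/* Hypothetical ordered store that records "an MMIO store is pending";
 * assumes it runs with preemption disabled (e.g. under a spinlock). */
static inline void my_writel_tracked(u32 val, void __iomem *addr)
{
        writel(val, addr);
        __this_cpu_write(mmio_pending, 1);
}

/* Hypothetical unlock that only pays for the barrier when needed. */
static inline void my_spin_unlock(spinlock_t *lock)
{
        if (__this_cpu_read(mmio_pending)) {
                io_to_lock_wb();        /* proposed barrier, section * III * */
                __this_cpu_write(mmio_pending, 0);
        }
        spin_unlock(lock);
}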

[* Question 3] If we decide that accessors of Class 1 do not provide rule
#4, then this barrier is to be used for all classes of accessors, except
maybe PIO which should always be fully ordered.

[* Question 4] Would it be a useful optimisation on archs like ia64 to
require this accessor to take the struct device of the device as an
argument (which can be NULL for a "generic" barrier) or does it not matter ?

[* Question 5] Should we document the rules for memory-memory barriers
here as well ? (and give examples, like live updating of a network
driver ring descriptor entry)



2006-09-11 08:35:10

by Alan

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Mon, 2006-09-11 at 14:03 +1000, Benjamin Herrenschmidt wrote:
> be interleaved when reaching the host PCI controller (and thus the

"a host PCI controller". The semantics with multiple independant PCI
busses are otherwise evil.

> 1- {read,write}{b,w,l,q} : Those accessors provide all MMIO ordering
> requirements. They are thus called "fully ordered". That is #1, #2 and
> #4 for writes and #1 and #3 for reads.

#4 may be incredibly expensive on NUMA boxes.

> 3- memcpy_to_io, memcpy_from_io: #1 semantics apply (all MMIO loads or
> stores are performed in order to each other). #2+#4 (stores) or #3

What is "in order" here. "In ascending order of address" would be
tighter.

> 1- __{read,write}{b,w,l,q} : Those accessors provide only ordering rule
> #1. That is, MMIOs are ordered vs. each other as issued by one CPU.
> Barriers are required to ensure ordering vs. memory and vs. locks (see
> "Barriers" section).

"Except where the underlying device is marked as cachable or
prefetchable"

Q2:
> coherency domain. If we decide not to, then an explicit barrier will
> still be needed in most drivers before spin_unlock(). This is the
> current mmiowb() barrier that I'm proposing to rename (section * III *).

I think we need mmiowb() still anyway (for __writel etc)

> If we decide to not enforce rule #4 for ordered accessors, and thus
> require the barrier before spin_unlock, the above trick, could still be
> implemented as a debug option to "detect" the lack of appropriate
> barriers.

This I think is an excellent idea.

> [* Question 3] If we decide that accessors of Class 1 do not provide rule
> #4, then this barrier is to be used for all classes of accessors, except
> maybe PIO which should always be fully ordered.

On x86 PIO (outb/inb) etc are always ordered and always stall until the
cycle completes on the device.

> [* Question 5] Should we document the rules for memory-memory barriers
> here as well ? (and give examples, like live updating of a network
> driver ring descriptor entry)
>

Update the existing docs


2006-09-11 09:18:45

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Mon, 2006-09-11 at 09:57 +0100, Alan Cox wrote:
> On Mon, 2006-09-11 at 14:03 +1000, Benjamin Herrenschmidt wrote:
> > be interleaved when reaching the host PCI controller (and thus the
>
> "a host PCI controller". The semantics with multiple independant PCI
> busses are otherwise evil.

Ok.

> > 1- {read,write}{b,w,l,q} : Those accessors provide all MMIO ordering
> > requirements. They are thus called "fully ordered". That is #1, #2 and
> > #4 for writes and #1 and #3 for reads.
>
> #4 may be incredibly expensive on NUMA boxes.

Yes, and that's why there is Question #2 :)

I don't care either way for PowerPC at this point, but it's an open
question and I'd like folks like you to tell me what you prefer.

> > 3- memcpy_to_io, memcpy_from_io: #1 semantics apply (all MMIO loads or
> > stores are performed in order to each other). #2+#4 (stores) or #3
>
> What is "in order" here. "In ascending order of address" would be
> tighter.

In program order. Every time I say "in order", I mean "in program
order". I agree that this is not enough precision as it's not obvious
that memcpy will copy in ascending order of addresses (it doesn't have
to), I'll add that precision... or not. That could be another question.
What do we want here ? I would rather have those strongly ordered for
Class 1.

> > 1- __{read,write}{b,w,l,q} : Those accessors provide only ordering rule
> > #1. That is, MMIOs are ordered vs. each other as issued by one CPU.
> > Barriers are required to ensure ordering vs. memory and vs. locks (see
> > "Barriers" section).
>
> "Except where the underlying device is marked as cachable or
> prefetchable"

You aren't supposed to use MMIO accessors on cacheable memory, are you ?
On PowerPC, even if using cacheable mappings, they would still be
visible in order to the coherency domain, though being cacheable, there
is indeed no saying in what order they'll end up hitting the PCI host
bridge. In fact, I know of platforms (like Apple G5s) which cannot cope
with cacheable mappings of anything behind HT... I'd keep use of
cacheable mappings as an arch-specific special case for now, and that
definitely doesn't allow for MMIO accessors ...

> Q2:
> > coherency domain. If we decide not to, then an explicit barrier will
> > still be needed in most drivers before spin_unlock(). This is the
> > current mmiowb() barrier that I'm proposing to rename (section * III *).
>
> I think we need mmiowb() still anyway (for __writel etc)

Oh, we surely have a barrier providing that semantic (I call it
io_to_lock_wb() in my proposal, and it can be #defined to mmiowb to ease
driver migration). The question is whether we want rule #4 to be enforced
by accessors of Class 1 or not ..

> > If we decide to not enforce rule #4 for ordered accessors, and thus
> > require the barrier before spin_unlock, the above trick, could still be
> > implemented as a debug option to "detect" the lack of appropriate
> > barriers.
>
> This I think is an excellent idea.

Thanks :)

> > [* Question 3] If we decide that accessors of Class 1 do not provide rule
> > #4, then this barrier is to be used for all classes of accessors, except
> > maybe PIO which should always be fully ordered.
>
> On x86 PIO (outb/inb) etc are always ordered and always stall until the
> cycle completes on the device.

Yes and I think that as far as PIO is concerned, we shall remain as
close as possible to x86. PIO is mostly used by "old stuff", that is
drivers that are likely not to have been adapted/audited to understand
ordering issues, and is generally slow anyway. Thus even if we decide to
relax rule #4 for Class 1 MMIO accessors, I'd be tempted to keep it for
PIO (and config space too btw)

> > [* Question 5] Should we document the rules for memory-memory barriers
> > here as well ? (and give examples, like live updating of a network
> > driver ring descriptor entry)
> >
>
> Update the existing docs

Ok.

Thanks for your comments. I'll wait for more of these and post an
updated version tomorrow. I'm still waiting for your preference
regarding whether or not to include rule #4 for Class 1 (ordered) MMIO accessors.

Cheers,
Ben.


2006-09-11 09:46:29

by Alan

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Mon, 2006-09-11 at 19:17 +1000, Benjamin Herrenschmidt wrote:
> > > 3- memcpy_to_io, memcpy_from_io: #1 semantics apply (all MMIO loads or
> > > stores are performed in order to each other). #2+#4 (stores) or #3
> >
> > What is "in order" here. "In ascending order of address" would be
> > tighter.
>
> In program order. Every time I say "in order", I mean "in program
> order". I agree that this is not enough precision as it's not obvious
> that memcpy will copy in ascending order of addresses (it doesn't have
> to), I'll add that precision... or not. THat could be another question.
> What do we want here ? I would rather have those strongly ordered for
> Class 1.

I'd rather memcpy_to/from_io only made guarantees about the start/end of
the transfer and not order of read/writes or size of read/writes. The
reason being that a more restrictive sequence can be efficiently
expressed using read/writefoo but the reverse is not true.

> > "Except where the underlying device is marked as cachable or
> > prefetchable"
>
> You aren't supposed to use MMIO accessors on cacheable memory, are you ?

Why not? Providing it is in MMIO space: consider ROMs for example, or
for the write path consider frame buffers.

> with cacheable mappings of anything behind HT... I'd keep use of
> cacheable mapping as an arch specific special case for now, and that
> definitely doesn't allow for MMIO accessors ...

I'm describing existing semantics 8)


2006-09-11 10:00:51

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation

> I'd rather memcpy_to/from_io only made guarantees about the start/end of
> the transfer and not order of read/writes or size of read/writes. The
> reason being that a more restrictive sequence can be efficiently
> expressed using read/writefoo but the reverse is not true.

Ok, so we would define ordering on the first and last accesses (being
the first and last in ascending addresses order) and leave it free to
the implementation to do what it wants in between. Is that ok ?

> > > "Except where the underlying device is marked as cachable or
> > > prefetchable"
> >
> > You aren't supposed to use MMIO accessors on cacheable memory, are you ?
>
> Why not. Providing it is in MMIO space, consider ROMs for example or
> write path consider frame buffers.

If we consider cacheable accesses, we need to also provide cache
flushing primitives as MMIO devices are generally not coherent. Take for
example the case of the frame buffer: you may want to upload a texture,
and later use it with the engine. You need a way in between to make sure
all the cached dirty lines have been pushed to the device before you
start the engine. Since we provide no generically usable functions for
doing such cache coherency on MMIO space, I'd rather keep usage of MMIO
accessors on cacheable storage undefined. That is, add a simple note at
the top of the file that the rules defined here only apply to
non-cacheable mappings. Is that ok ?

> > with cacheable mappings of anything behind HT... I'd keep use of
> > cacheable mapping as an arch specific special case for now, and that
> > definitely doesn't allow for MMIO accessors ...
>
> I'm describing existing semantics 8)

Well, there are no clear existing semantics, at least not globally across
all archs, for cacheable access to MMIO, so yeah, let's say that ordering
on cacheable storage is left undefined :)

Ben.


2006-09-11 17:06:13

by Alan

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Mon, 2006-09-11 at 19:59 +1000, Benjamin Herrenschmidt wrote:
> Ok, so we would define ordering on the first and last accesses (being
> the first and last in ascending addresses order) and leave it free to
> the implementation to do what it wants in between. Is that ok ?

Not sure you can go that far. I'd stick to "_fromio/_toio" transfer
blocks of data efficiently between host and bus addresses. The
guarantees are the same as readl/writel respectively with respect to the
start and end of the transfer.

[How do you define start and end addresses with memcpy_fromio(foo, bar,
4) for example ]


2006-09-11 18:39:12

by Jesse Barnes

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Sunday, September 10, 2006 9:03 pm, Benjamin Herrenschmidt wrote:
> 1- {read,write}{b,w,l,q} : Those accessors provide all MMIO ordering
> requirements. They are thus called "fully ordered". That is #1, #2 and
> #4 for writes and #1 and #3 for reads.

Fine.

> 2- PIO accessors (all of them, that is inb...inl, ins*, out
> equivalents,...): Those are fully ordered, all ordering rules apply. They
> are slow anyways :)

Yeah, I think these are already defined to operate this way. Not sure if that
fact is documented clearly though (haven't checked).

> 3- memcpy_to_io, memcpy_from_io: #1 semantics apply (all MMIO loads or
> stores are performed in order to each other). #2+#4 (stores) or #3
> (loads) semantics apply to the operation as a whole. That is #2: all
> previous memory stores are globally visible before the first MMIO store
> of memcpy_to_io, #3: The last MMIO read (and thus all previous ones too
> due to rule #1) have been fully performed before a subsequent memory
> read is performed by memcpy_from_io. And #4: all MMIO stores performed
> by memcpy_to_io will have reached the host bridge before the effect of a
> subsequent spin_unlock are visible.

See Alan's comments here. I don't think the intra-memcpy semantics have to be
defined as strongly as you say here... it should be enough to treat the whole
memcpy as a unit, not specifying what happens inside but rather defining it
to be strongly ordered wrt previous and subsequent code.

> 4- io{read,write}{8,16,32}[be]: Those have the same semantics as 1 for
> MMIO and the same semantics as 2 for PIO. As for the "repeat" versions
> of those, they follow the semantics of memcpy_to_io and memcpy_from_io
> (the only difference being the lack of increment of the MMIO address).

This reminds me... when these routines were added I asked that they be defined
as having weak ordering wrt DMA (does linux-arch have archives?), but then I
think Linus changed his mind?

> 1- __{read,write}{b,w,l,q} : Those accessors provide only ordering rule
> #1. That is, MMIOs are ordered vs. each other as issued by one CPU.
> Barriers are required to ensure ordering vs. memory and vs. locks (see
> "Barriers" section).

Ok, but I still don't like the naming. __ implies some sort of implementation
detail and doesn't communicate meaning very clearly. But I'm not going to
argue too much about it.

> Some of the above accessors do not provide all ordering rules define in
> * I *, thus explicit barriers are provided to enforce those ordering
> rules:
>
> 1- io_to_io_barrier() : This barrier provides ordering requirement #1
> between two MMIO accesses. It's to be used in conjuction with fully
> relaxed accessors of Class 3.

Ok, basically mb() but for I/O space.

> 2- memory_to_io_wb() : This barrier provides ordering requirement #2
> between a memory store and an MMIO store. It can be used in conjunction
> with write accessors of Class 2 and 3.
>
> 3- io_to_memory_rb(value) : This barrier provides ordering requirement
> #3 between an MMIO read and a subsequent read from memory. For
> implementation purposes on some architectures, the value actually read
> by the MMIO read shall be passed as an argument to this barrier. (This
> allows to generate the appropriate CPU instruction magic to force the
> CPU to consider the value as being "used" and thus force the read to be
> performed immediately). It can be used in conjunction with read
> accessors of Class 2 and 3

These sound fine. I think PPC64 is the only platform that will need them?

> 4- io_to_lock_wb() : This barrier provides ordering requirement #4
> between an MMIO store and a subsequent spin_unlock(). It can be used in
> conjunction with write accessors of Class 2 and 3.

Ok.

> [Note] A barrier commonly used by drivers and not described here are the
> memory-to-memory read and write barriers (rmb, wmb, mb). Those are
> necessary when manipulating data structures in memory that are accessed
> at the same time via DMA. The rules here are identical to the usual SMP
> data ordering rules and are beyond the scope of this document.

Unless as Alan suggests these barriers are also documented in
memory-barriers.txt (probably a good place).

> [* Question 1] Should Rule #4 be generalized to MMIO store followed by a
> memory store ? (as spin_unlock are essentially a wmb followed by a
> memory store) or do we need to keep a rule specific for locks to avoid
> arch specific pitfalls on some architecture ? In that case, do we need a
> specific barrier to provide MMIO store followed by a memory store ? That
> sort of ordering is not generally useful and is generally expensive as
> it requires to access the PCI host bridge to enforce that the previous
> MMIO stores have reached the bus. Drivers generally don't need such a
> rule or a barrier, as they have to deal with write posting anyway, and
> thus use an MMIO read to provide the necessary synchronisation when it
> makes sense.

But isn't this how you'll implement io_to_lock_wb() on PPC anyway? If so,
might be best to name it and document it that way (though keeping the idea of
barriering before unlocking prominent in the documentation).

> [* Question 2] : Do we actually want the "ordered" accessors to also
> provide ordering rule #4 in the general case ?

Isn't that the whole point of making the regular readX/writeX strongly
ordered? To get rid of the need for mmiowb() in the general case and make it
into a performance optimization to be used in conjunction with __writeX?

> If we decide to not enforce rule #4 for ordered accessors, and thus
> require the barrier before spin_unlock, the above trick, could still be
> implemented as a debug option to "detect" the lack of appropriate
> barriers.

I think this should be done in any case, and I think it can be done in generic
code (using per-cpu counters in the spinlock and mmiowb() routines); it's a
good idea.

> [* Question 3] If we decide that accessors of Class 1 do not provide rule
> #4, then this barrier is to be used for all classes of accessors, except
> maybe PIO which should always be fully ordered.

Right, though see above about my understanding of the genesis of this
discussion. :)

> [* Question 4] Would it be a useful optimisation on archs like ia64 to
> require this accessor to take the struct device of the device as an
> argument (with can NULL for a "generic" barrier) or it doesn't matter ?

For ia64 in particular it doesn't matter, though there was speculation several
years ago that it might be necessary. No actual examples stepped forward though,
so the current implementation doesn't take an argument.

> [* Question 5] Should we document the rules for memory-memory barriers
> here as well ? (and give examples, like live updating of a network
> driver ring descriptor entry)

Should probably be added to memory-barriers.txt.

Thanks,
Jesse

2006-09-11 21:34:34

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Mon, 2006-09-11 at 18:26 +0100, Alan Cox wrote:
> On Mon, 2006-09-11 at 19:59 +1000, Benjamin Herrenschmidt wrote:
> > Ok, so we would define ordering on the first and last accesses (being
> > the first and last in ascending addresses order) and leave it free to
> > the implementation to do what it wants in between. Is that ok ?
>
> Not sure you can go that far. I'd stick to "_fromio/_toio" transfer
> blocks of data efficiently between host and bus addresses. The
> guarantees are the same as readl/writel respectively with respect to the
> start and end of the transfer.
>
> [How do you define start and end addresses with memcpy_fromio(foo, bar,
> 4) for example ]

Ok. So they behave like a writel or a readl globally with respect to
other accesses, but there is no guarantee about the order or size of the
individual transfers making them up.

Ben


2006-09-11 21:46:33

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation


> > 2- memory_to_io_wb() : This barrier provides ordering requirement #2
> > between a memory store and an MMIO store. It can be used in conjunction
> > with write accessors of Class 2 and 3.
> >
> > 3- io_to_memory_rb(value) : This barrier provides ordering requirement
> > #3 between an MMIO read and a subsequent read from memory. For
> > implementation purposes on some architectures, the value actually read
> > by the MMIO read shall be passed as an argument to this barrier. (This
> > allows to generate the appropriate CPU instruction magic to force the
> > CPU to consider the value as being "used" and thus force the read to be
> > performed immediately). It can be used in conjunction with read
> > accessors of Class 2 and 3
>
> These sound fine. I think PPC64 is the only platform that will need them?

Ah ? What about the comment in e1000 saying that it needs a wmb()
between descriptor updates in memory and the mmio to kick them ? That
would typically be a memory_to_io_wb(). Or are your MMIOs ordered vs.
your cacheable stores ?

> > 4- io_to_lock_wb() : This barrier provides ordering requirement #4
> > between an MMIO store and a subsequent spin_unlock(). It can be used in
> > conjunction with write accessors of Class 2 and 3.
>
> Ok.
>
> > [Note] A barrier commonly used by drivers and not described here are the
> > memory-to-memory read and write barriers (rmb, wmb, mb). Those are
> > necessary when manipulating data structures in memory that are accessed
> > at the same time via DMA. The rules here are identical to the usual SMP
> > data ordering rules and are beyond the scope of this document.
>
> Unless as Alan suggests these barriers are also documented in
> memory-barriers.txt (probably a good place).

They are, but I was thinking about providing more IO-like examples. I
suppose I could refer to memory-barriers.txt from here and update it
with IO-like examples.

> > [* Question 1] Should Rule #4 be generalized to MMIO store followed by a
> > memory store ? (as spin_unlock are essentially a wmb followed by a
> > memory store) or do we need to keep a rule specific for locks to avoid
> > arch specific pitfalls on some architecture ? In that case, do we need a
> > specific barrier to provide MMIO store followed by a memory store ? That
> > sort of ordering is not generally useful and is generally expensive as
> > it requires to access the PCI host bridge to enforce that the previous
> > MMIO stores have reached the bus. Drivers generally don't need such a
> > rule or a barrier, as they have to deal with write posting anyway, and
> > thus use an MMIO read to provide the necessary synchronisation when it
> > makes sense.
>
> But isn't this how you'll implement io_to_lock_wb() on PPC anyway? If so,
> might be best to name it and document it that way (though keeping the idea of
> barriering before unlocking prominent in the documentation).

Well, the whole question is what the Linux semantics guarantee to
driver writers (across archs), not what PowerPC implements :) I'd
rather not add guarantees that aren't useful to drivers even if all
current implementations happen to provide them. I'm trying to find a
case where ordering MMIO W + memory W is useful and I can't see any,
since the MMIO W can take an arbitrary amount of time to reach the
device anyway. The lock rule seems to be the only useful one, thus the
only one I think I'll guarantee.

> > [* Question 2] : Do we actually want the "ordered" accessors to also
> > provide ordering rule #4 in the general case ?
>
> Isn't that the whole point of making the regular readX/writeX strongly
> ordered? To get rid of the need for mmiowb() in the general case and make it
> into a performance optimization to be used in conjunction with __writeX?

Well, as far as I'm concerned, the whole point is rule #2 and #3 :)
Those are the ones biting us on PowerPC (we haven't seen the lock
problem but then it can't happen the way our current accessors are
written. However, if we change our accessors to provide rule #2 more
specifically, we'll end up with 2 sync instructions in writel, one for
rule #2 before the store and one for rule #4, thus we go from expensive
to very expensive). It's also my understanding that mmiowb is very
expensive on ia64 and gets worse as the box grows bigger.

Hence the question: do we provide -fully- ordered accessors in class 1,
or do we provide -mostly- ordered accessors, ordered in all ways except
rule #4 vs. locks? ia64 is afaik by far the platform taking the biggest
hit if you have to provide #4, so I'm interested in your point of view
here.

> > If we decide to not enforce rule #4 for ordered accessors, and thus
> > require the barrier before spin_unlock, the above trick, could still be
> > implemented as a debug option to "detect" the lack of appropriate
> > barriers.
>
> I think this should be done in any case, and I think it can be done in generic
> code (using per-cpu counters in the spinlock and mmiowb() routines); it's a
> good idea.

We don't need counters, just a flag. We did a test implementation, seems
to work. We also clear the flag in spin_lock. That means that MMIOs
issued before a lock aren't ordered vs. the locked section. But because
of rule #1, they should be ordered vs. other MMIOs inside the locked
section and thus implicitly get ordered anyway.

> > [* Question 3] If we decide that accessors of Class 1 do not provide rule
> > #4, then this barrier is to be used for all classes of accessors, except
> > maybe PIO which should always be fully ordered.
>
> Right, though see above about my understanding of the genesis of this
> discussion. :)

As far as I'm concerned, the genesis of this discussion is rules #2 and #3,
not #4 :) Though the latter quickly came up, of course.

> > [* Question 4] Would it be a useful optimisation on archs like ia64 to
> > require this accessor to take the struct device of the device as an
> > argument (with can NULL for a "generic" barrier) or it doesn't matter ?
>
> For ia64 in particular it doesn't matter, though there was speculation several
> years that it might be necessary. No actual examples stepped forward though,
> so the current implementation doesn't take an argument.

Ok. My question is whether it would improve the implementation to take
it. If we define a new macro with a new name, we can do it....

> > [* Question 5] Should we document the rules for memory-memory barriers
> > here as well ? (and give examples, like live updating of a network
> > driver ring descriptor entry)
>
> Should probably be added to memory-barriers.txt.

Yup, agreed.

Cheers,
Ben.


2006-09-11 21:54:33

by Jeff Garzik

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation

Benjamin Herrenschmidt wrote:
> Ah ? What about the comment in e1000 saying that it needs a wmb()
> between descriptor updates in memory and the mmio to kick them ? That
> would typically be a memory_to_io_wb(). Or are your MMIOs ordered cs.
> your cacheable stores ?

That's likely just following existing practice found in many network
drivers. The following two design patterns have been copied across a
great many network drivers:

1) When in a loop, reading through a DMA ring, put an "rmb()" at the top
of the loop, to ensure that the compiler does not optimize out all
memory loads after the first.

2) Use "wmb()" to ensure that just-written-to memory is visible to a PCI
device that will be reading said memory region via DMA.

I don't claim that either of these is correct, just that it's existing
practice, perhaps in some cases perpetuated by my own arch ignorance.
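
For illustration, a hedged sketch of how those two patterns typically
look; the descriptor layout, helpers and register offset are
hypothetical, not from any particular driver:

/* Pattern 1: rmb() at the top of the RX loop, after the ownership test. */
static void my_rx_clean(struct my_rx_desc *ring, int *idx, int ring_size)
{
        while (my_desc_owned_by_host(&ring[*idx])) {    /* hypothetical test */
                rmb();  /* don't read the rest of the descriptor before
                         * the ownership check has really been done */
                my_process_rx(&ring[*idx]);             /* hypothetical helper */
                *idx = (*idx + 1) % ring_size;
        }
}

/* Pattern 2: wmb() so a just-written descriptor is visible to the device
 * before the doorbell MMIO tells it to go fetch it. */
static void my_xmit_kick(struct my_tx_desc *ring, int idx,
                         void __iomem *regs)
{
        my_fill_tx_desc(&ring[idx]);    /* hypothetical helper: memory stores */
        wmb();
        writel(idx, regs + 0x18);       /* hypothetical TX tail register */
}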

So, in a perfect world where I was designing my own API, I would create
two new API functions:

prepare_to_read_dma_memory()
and
make_memory_writes_visible_to_dmaing_devices()

and leave the existing APIs untouched. Those are the two fundamental
operations that are needed.

Jeff


2006-09-11 22:05:36

by Jesse Barnes

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Monday, September 11, 2006 2:45 pm, Benjamin Herrenschmidt wrote:
> > These sound fine. I think PPC64 is the only platform that will need
> > them?
>
> Ah ? What about the comment in e1000 saying that it needs a wmb()
> between descriptor updates in memory and the mmio to kick them ? That
> would typically be a memory_to_io_wb(). Or are your MMIOs ordered cs.
> your cacheable stores ?

I think that's a separate issue? As Jeff points out, those macros are
intended to provide memory vs. I/O ordering, but isn't PPC the only platform
that will reorder accesses so aggressively and independently? I don't think
ia64 for example will reorder them separately, so a regular memory barrier
*should* be enough to ensure ordering in both domains.

> They are, but I was thinking about providing more IO-like examples. I
> suppose I could refer to memory-barriers.txt from here and update it
> with IO-like examples.

Yeah, either way. Not sure if adding more I/O examples to the existing doc is
better or worse than an I/O specific document.

> > But isn't this how you'll implement io_to_lock_wb() on PPC anyway? If
> > so, might be best to name it and document it that way (though keeping the
> > idea of barriering before unlocking prominent in the documentation).
>
> Well, the whole question is what does the linux semantics guarantee to
> driver writers (accross archs), not what PowerPC implements :) I'd
> rather not add guarantees that aren't useful to drivers even if all
> current implementations happen to provide them. I'm trying to find a
> case where ordering MMIO W + memory W is useful and I can't see any
> since the MMIO W will take any time to go to the device anyway. The lock
> rule seems to be the only useful, thus the only I think I'll guarantee.

Sure, that's fair. If any potential application of the more precise semantics
is just theoretical, we may as well limit our guarantees to locks only.

> Well, as far as I'm concerned, the whole point is rule #2 and #3 :)
> Those are the ones biting us on PowerPC (we haven't seen the lock
> problem but then it can't happen the way our current accessors are
> written. However, if we change our accessors to provide rule #2 more
> specifically, we'll end up with 2 sync instructions in writel, one for
> rule #2 before the store and one for rule #4, thus we go from expensive
> to very expensive). It's also my understanding that mmiowb is very
> expensive on ia64 and gets worse as the box grows bigger.

Yeah, that's true (I see your point about being more worried about other
things on PPC as well ;).

> Hence the question: do we provide -fully- ordered accessors in class 1,
> or do we provide -mostly- ordered accessors, ordered in all means except
> rule #4 vs locks. ia64 is afaik by far the platform taking the biggest
> hit if you have to provide #4, so I'm interesting in your point of view
> here.

Either way is fine with me as long as we have a way to get at the fast and
loose stuff (and required barriers of course) in a portable way. And that we
don't regress the existing users of mmiowb().

> We don't need counters, just a flag. We did a test implementation, seems
> to work. We also clear the flag in spin_lock. That means that MMIOs
> issued before a lock aren't ordered vs. the locked section. But because
> of rule #1, they should be ordered vs. other MMIOs inside the locked
> section and thus implicitely get ordered anyway.

Oh right, a flag would be enough. Is it good enough for -mm yet? Might be
fun to run on an Altix machine with a bunch of supported devices (not that I
work with them anymore...).

> > For ia64 in particular it doesn't matter, though there was speculation
> > several years that it might be necessary. No actual examples stepped
> > forward though, so the current implementation doesn't take an argument.
>
> Ok. My question is wether it would improve the implementation to take
> it. If we define a new macro with a new name, we can do it....

Right, but unless there's a real need at this point, we probably shouldn't
bother. Let the poor sucker with the future machine needing the device
argument do the work. :)

Thanks,
Jesse

2006-09-11 22:57:44

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation

On Mon, 2006-09-11 at 17:54 -0400, Jeff Garzik wrote:
> Benjamin Herrenschmidt wrote:
> > Ah ? What about the comment in e1000 saying that it needs a wmb()
> > between descriptor updates in memory and the mmio to kick them ? That
> > would typically be a memory_to_io_wb(). Or are your MMIOs ordered cs.
> > your cacheable stores ?
>
> That's likely just following existing practice found in many network
> drivers. The following two design patterns have been copied across a
> great many network drivers:

Well, I was mentioning that one specifically because of this comment:

/* Force memory writes to complete before letting h/w
 * know there are new descriptors to fetch. (Only
 * applicable for weak-ordered memory model archs,
 * such as IA-64). */

Which made me ask whether ia64 was or was not ordering a memory store
followed by an MMIO store; that is, do ia64's -current- accessors provide
rule #2 (memory W + MMIO W) or not, and would it benefit from not having
to provide it with my new partially relaxed accessors ?

> 1) When in a loop, reading through a DMA ring, put an "rmb()" at the top
> of the loop, to ensure that the compiler does not optimize out all
> memory loads after the first.

and rmb is heavy-handed for a compiler barrier :) what you might need on
some platforms is an rmb between the MMIO read of whatever status/index
register and the following memory reads of descriptors, and you may want
an rmb in cases where it matters whether the chip has been changing a
value behind your back (which it generally doesn't) but that's pretty
much it....

> 2) Use "wmb()" to ensure that just-written-to memory is visible to a PCI
> device that will be reading said memory region via DMA.

That will definitely help on PowerPC with our current accessors which
are mostly ordered except for that rule #2 I mentioned above.

> I don't claim that either of these is correct, just that's existing
> practice, perhaps in some case perpetuated by my own arch ignorance.

No worries :) That's also why I'm trying to describe precisely what
semantics are provided by the MMIO accessors with real world examples in
a way that is not arch-dependent. The 4 "rules" I've listed in the first
part are precisely what should be needed for drivers, then I list the
accessors and what rules they are guaranteed to comply with, then I list
the barriers that allow enforcing those ordering rules when the
accessors don't.

> So, in a perfect world where I was designing my own API, I would create
> two new API functions:
>
> prepare_to_read_dma_memory()
> and
> make_memory_writes_visible_to_dmaing_devices()
>
> and leave the existing APIs untouched. Those are the two fundamental
> operations that are needed.

Well, the argument currently is to make writel and readl imply the above
barriers by making them fully ordered (and slow on some platforms), and
to also provide more weakly ordered routines along with barriers for
people who know what they are doing. The above 2 barriers are what I've called
io_to_memory_rb() and memory_to_io_wb() (actually,
prepare_to_read_dma_memory() by itself doesn't really make much sense.
It does in conjunction with an MMIO read to flush DMA buffers, in which
case the barrier provides an ordering guarantee that the memory reads
will only be performed after the MMIO read has fully completed).

Ben.


2006-09-11 23:01:37

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation


> I think that's a separate issue? As Jeff points out, those macros are
> intended to provide memory vs. I/O ordering, but isn't PPC the only platform
> that will reorder accesses so aggressively and independently? I don't think
> ia64 for example will reorder them separately, so a regular memory barrier
> *should* be enough to ensure ordering in both domains.

Well, I don't know, that's what I'm asking since the comment in the
driver specifically mentions IA64 :)

> > Hence the question: do we provide -fully- ordered accessors in class 1,
> > or do we provide -mostly- ordered accessors, ordered in all means except
> > rule #4 vs locks. ia64 is afaik by far the platform taking the biggest
> > hit if you have to provide #4, so I'm interesting in your point of view
> > here.
>
> Either way is fine with me as long as we have a way to get at the fast and
> loose stuff (and required barriers of course) in a portable way. And that we
> don't regress the existing users of mmiowb().

Well, existing users of mmiowb() will regress in performance if we
decide that class 1 (ordered) accessors do imply rule #4 (ordering with
locks) since they'll end up doing redundant mmiowb's ;) but then,
they'll be affected anyway due to the sheer number of mmiowb's (one per
IO) unless you implement the trick I described, which would bring down
the cost to nothing except maybe the test in spin_unlock (which I still
need to measure on PowerPC).

> > We don't need counters, just a flag. We did a test implementation, seems
> > to work. We also clear the flag in spin_lock. That means that MMIOs
> > issued before a lock aren't ordered vs. the locked section. But because
> > of rule #1, they should be ordered vs. other MMIOs inside the locked
> > section and thus implicitely get ordered anyway.
>
> Oh right, a flag would be enough. Is it good enough for -mm yet? Might be
> fun to run on an Altix machine with a bunch of supported devices (not that I
> work with them anymore...).

The PowerPC patch is probably good enough for 2.6.18 in fact :) I'll let
Paulus post what he has. It's fairly ppc specific in the actual
implementation though.

> > > For ia64 in particular it doesn't matter, though there was speculation
> > > several years that it might be necessary. No actual examples stepped
> > > forward though, so the current implementation doesn't take an argument.
> >
> > Ok. My question is wether it would improve the implementation to take
> > it. If we define a new macro with a new name, we can do it....
>
> Right, but unless there's a real need at this point, we probably shouldn't
> bother. Let the poor sucker with the future machine needing the device
> argument do the work. :)

Ok :)

Ben.


2006-09-11 23:08:29

by Roland Dreier

Subject: Re: [RFC] MMIO accessors & barriers documentation

Benjamin> and rmb is heavy handed for a compiler barrier :) what
Benjamin> you might need on some platforms is an rmb between the
Benjamin> MMIO read of whatever status/index register and the
Benjamin> following memory reads of descriptors, and you may want
Benjamin> an rmb in case where it matters if the chip has been
Benjamin> changing a value behind your back (which it generally
Benjamin> doesn't) but that's pretty much it....

In drivers/infiniband/hw/mthca/mthca_eq.c, there is:

while ((eqe = next_eqe_sw(eq))) {
        /*
         * Make sure we read EQ entry contents after we've
         * checked the ownership bit.
         */
        rmb();

        switch (eqe->type) {

where next_eqe_sw() checks a "valid" bit of a 32-byte event queue
entry that is DMA-ed into memory by the device. The device is careful
to write the valid bit (byte actually) last, but on PowerPC 970
without the rmb(), we actually saw the CPU reordering the read of
eqe->type (which is another field of the EQ entry written by the
device) so it happened before the entry was valid, but then executing
the check of the valid bit far enough into the future so that the
entry tested as valid.

This isn't that surprising: if you had two CPUs, with one CPU writing
into a queue and the other CPU polling the queue, you would obviously
need smp_rmb() on the CPU doing the reading. But somehow it's not
quite as obvious when a device plays the role of one of the CPUs.

Of course there's no MMIO anywhere in sight here, so this isn't
directly applicable I guess.

- R.

2006-09-11 23:19:33

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation


> where next_eqe_sw() checks a "valid" bit of a 32-byte event queue
> entry that is DMA-ed into memory by the device. The device is careful
> to write the valid bit (byte actually) last, but on PowerPC 970
> without the rmb(), we actually saw the CPU reordering the read of
> eqe->type (which is another field of the EQ entry written by the
> device) so it happened before the entry was valid, but then executing
> the check of the valid bit far enough into the future so that the
> entry tested as valid.

Yes, the CPU can perfectly well load it before the previous load, indeed.
I'm sure that wouldn't be PowerPC-specific. In this case, it would be a
speculative load (since there is a data dependency you would think
it's ok, but it's not on CPUs that do speculative execution).

> This isn't that surprising: if you had two CPUs, with one CPU writing
> into a queue and the other CPU polling the queue, you would obviously
> need smp_rmb() on the CPU doing the reading. But somehow it's not
> quite as obvious when a device plays the role of one of the CPUs.
>
> Of course there's no MMIO anywhere in sight here, so this isn't
> directly applicable I guess.

It's a "normal" case memory barrier in this case. Same as for SMP. Yup.

Ben.


2006-09-11 23:24:54

by Jeff Garzik

Subject: Re: [RFC] MMIO accessors & barriers documentation

Benjamin Herrenschmidt wrote:
> Well, the argument currently is to make writel and readl imply the above
> barriers by making them fully ordered (and slow on some platforms) and
> so also provide more weakly ordered routines along with barriers for
> people who know what they do. The above 2 barriers are what I've called
> io_to_memory_rb() and memory_to_io_wb() (actually,
> prepare_to_read_dma_memory() by itself doesn't really make much sense.
> It does in conjunction with an MMIO read to flush DMA buffers, in which
> case the barrier provides an ordering guarantee that the memory reads
> will only be performed after the MMIO read has fully completed).

<jgarzik throws a monkey wrench into the works>

I think focusing on MMIO just confuses the issue.

wmb() is often used to make sure a memory store is visible to a
busmastering PCI device... before the code proceeds with some more
transactions in the memory space shared by the host and PCI device.

prepare_to_read_dma_memory() is the operation that an ethernet driver's
RX code wants. And this is _completely_ unrelated to MMIO. It just
wants to make sure that the device and host are looking at the same
data. Often this involves polling a DMA descriptor (or index, stored
inside DMA-able memory) looking for changes.

flush_my_writes_to_dma_memory() is the operation that an ethernet
driver's TX code wants, to precede either an MMIO "poke" or any other
non-MMIO operation where the driver needs to be certain that the write
is visible to the PCI device, should the PCI device desire to read that
area of memory.

Jeff




2006-09-12 00:47:19

by Benjamin Herrenschmidt

Subject: Re: [RFC] MMIO accessors & barriers documentation


> wmb() is often used to make sure a memory store is visible to a
> busmastering PCI device... before the code proceeds with some more
> transactions in the memory space shared by the host and PCI device.

Yes and that's a different issue. It's purely a matter of memory-to-memory
barriers, and we already have these well defined.

The problem _is_ with MMIO :) There, you have some ordering issues
happening with some processors that we need to handle, hence the whole
discussion. See my discussion of your examples below.

> prepare_to_read_dma_memory() is the operation that an ethernet driver's
> RX code wants. And this is _completely_ unrelated to MMIO. It just
> wants to make sure that the device and host are looking at the same
> data. Often this involves polling a DMA descriptor (or index, stored
> inside DMA-able memory) looking for changes.

Why would you need a barrier other than a compiler barrier() for that ?

All you need for such operations that do not involve MMIO are the
standard wmb(), rmb() and mb() with their usual semantics, and polling
for something to change isn't something that requires any of these. Only
a compiler barrier (or an ugly volatile, maybe). Though having a
subsequent read from memory that must be done after that change happened
is indeed the job of rmb().

This has nothing to do with MMIO and is not what I'm describing in the
document. MMIO has its own issues, especially when it comes to MMIO vs.
memory coherency. I thought I described them well enough; apparently
not.

> flush_my_writes_to_dma_memory() is the operation that an ethernet
> driver's TX code wants, to precede either an MMIO "poke" or any other
> non-MMIO operation where the driver needs to be certain that the write
> is visible to the PCI device, should the PCI device desire to read that
> area of memory.

That's the problem. You need -different- types of barriers depending on
whether the subsequent operation to "poke" the device is an MMIO or an
update in memory. Again, the whole problem is that on some out-of-order
architectures, non-cacheable storage is in a completely different domain
than cacheable storage, and ordering between them requires specific
barriers unless you want to ditch performance.

Thus in your 2 above examples, we have:

1- Descriptor update followed by MMIO poke. That needs ordering rule #2
in my list (memory W + MMIO W), which is not provided today by the
PowerPC writel(), but should be according to the discussions we had, and
which would be provided by the memory_to_io_wb() barrier in my list if
you chose to use the relaxed-ordering __writel() version instead for
performance.

2- Descriptor update followed by update of an index in memory (so no
MMIO involved). This is a standard memory ordering issue and thus a
simple wmb() is needed there.

Currently, the PowerPC writel(), as I just said, doesn't provide
ordering for your example #1, but the PowerPC wmb() does provide the
semantics of both memory/memory coherency and memory/MMIO coherency
(thus making it more expensive than necessary in the memory/memory
case).
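
To make the two cases concrete, here's a hedged sketch; the ring layout
is hypothetical and memory_to_io_wb()/__writel() are the proposed names
from my document, not existing functions:

/* Case 1: descriptor update followed by an MMIO poke: needs rule #2,
 * supplied here explicitly since __writel() is relaxed. */
static void my_case1_mmio_poke(struct my_desc *ring, int idx, u32 len,
                               void __iomem *regs)
{
        ring[idx].len_flags = len | 0x80000000; /* memory store */
        memory_to_io_wb();                      /* memory W before MMIO W */
        __writel(idx, regs + 0x10);             /* hypothetical doorbell */
}

/* Case 2: descriptor update followed by an index update in memory (no
 * MMIO involved): a plain wmb() is all that is needed. */
static void my_case2_memory_index(struct my_desc *ring, int idx, u32 len,
                                  u32 *shared_idx)
{
        ring[idx].len_flags = len | 0x80000000; /* memory store */
        wmb();                                  /* memory W before memory W */
        *shared_idx = idx;                      /* index in DMA-able memory */
}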

My goal here is to:

- remove the problem for people who don't understand the issues by
making writel() etc... fully ordered vs. memory for the cases that
matter to drivers. Thus the -only- case that driver writers would have
to care about if using those accessors is the memory-memory case in your
second example.

- provide relaxed __writel etc... for people who -do- understand those
issues and want to improve the performance of the hot path of their driver.
In order to make this actually optimal and safe, I need to precisely
define in what way it is relaxed, what precise ordering semantics are
provided, and provide specific barriers for each of these.

That's what I documented. If you think my document is not clear enough,
I would be happy to have your input on how to make it clearer. Maybe
some introduction explaining the difference above ? (re-using your
examples).

There are still a few questions that I listed about what we want to
provide. The main one is the ordering of MMIO vs. spin_unlock. Do we
want to provide that in the default writel or do we accept that we still
require a barrier in that case even when using "ordered" versions of the
accessors because the performance cost would be too high.

So far, I tend to prefer being fully ordered (and thus not requiring the
barrier) but I wanted some feedback there. So far, everybody has
carefully avoided voicing a firm opinion on that one though :)

Ben.


2006-09-12 05:33:23

by Albert Cahalan

Subject: Re: [RFC] MMIO accessors & barriers documentation

Benjamin Herrenschmidt writes:

> 1- io_to_io_barrier() : This barrier provides ordering requirement #1
> between two MMIO accesses. It's to be used in conjunction with fully
> relaxed accessors of Class 3.
>
> 2- memory_to_io_wb() : This barrier provides ordering requirement #2
> between a memory store and an MMIO store. It can be used in conjunction
> with write accessors of Class 2 and 3.
>
> 3- io_to_memory_rb(value) : This barrier provides ordering requirement
> #3 between an MMIO read and a subsequent read from memory. For
> implementation purposes on some architectures, the value actually read
> by the MMIO read shall be passed as an argument to this barrier. (This
> allows to generate the appropriate CPU instruction magic to force the
> CPU to consider the value as being "used" and thus force the read to be
> performed immediately). It can be used in conjunction with read
> accessors of Class 2 and 3
>
> 4- io_to_lock_wb() : This barrier provides ordering requirement #4
> between an MMIO store and a subsequent spin_unlock(). It can be used in
> conjunction with write accessors of Class 2 and 3.

These can really multiply: read or write, RAM and various types
of IO space, etc.

Let's have a generic arch-provided macro and let gcc do some work
for us.

Example usage:
fence(FENCE_READ_RAM|FENCE_READ_PCI_IO, FENCE_WRITE_PCI_MMIO);

Example implementation for PowerPC:

#define PPC_RAM   (FENCE_READ_RAM | FENCE_WRITE_RAM)
#define PPC_MMIO  (FENCE_READ_PCI_MMIO  | FENCE_READ_PCI_CONFIG  | \
                   FENCE_READ_PCI_RAM   | FENCE_READ_PCI_IO      | \
                   FENCE_WRITE_PCI_MMIO | FENCE_WRITE_PCI_CONFIG | \
                   FENCE_WRITE_PCI_RAM  | FENCE_WRITE_PCI_IO)
#define PPC_OTHER (~(PPC_RAM | PPC_MMIO))

#define fence(before, after) do {                                      \
        if ((before) & PPC_RAM && (after) & PPC_MMIO)                  \
                __asm__ __volatile__ ("sync" : : : "memory");          \
        else if ((before) & PPC_MMIO && (after) & PPC_RAM)             \
                __asm__ __volatile__ ("sync" : : : "memory");          \
        else if (((before) | (after)) & PPC_OTHER)                     \
                __asm__ __volatile__ ("sync" : : : "memory");          \
        else if ((before) && (after))                                  \
                __asm__ __volatile__ ("eieio" : : : "memory");         \
} while (0)

2006-09-12 05:48:33

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation


> > 4- io_to_lock_wb() : This barrier provides ordering requirement #4
> > between an MMIO store and a subsequent spin_unlock(). It can be used in
> > conjunction with write accessors of Class 2 and 3.
>
> These can really multiply: read or write, RAM and various types
> of IO space, etc.

No, they can't. They don't depend on the bus type but on the processor
memory model. Only #4 might have some more annoying dependencies, but in
practice it's still manageable. I think I've defined the 4 base rules
that are useful for drivers and the barriers that provide them -- unless
you can show me an example where something else is needed.

> Let's have a generic arch-provided macro and let gcc do some work
> for us.
>
> Example usage:
> fence(FENCE_READ_RAM|FENCE_READ_PCI_IO, FENCE_WRITE_PCI_MMIO);

<snip>

That's terribly ugly imho.

Ben.


2006-09-12 05:50:11

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation

Alan Cox <[email protected]> writes:

>> > "Except where the underlying device is marked as cachable or
>> > prefetchable"
>>
>> You aren't supposed to use MMIO accessors on cacheable memory, are you ?
>
> Why not. Providing it is in MMIO space, consider ROMs for example or
> write path consider frame buffers.

Frame buffers are rarely cacheable as such; on x86 they are usually
write-combining, which means that the writes can be merged and
possibly reordered while they are being written, but they can't be
cached. Most arches, I believe, have something that roughly corresponds
to write combining.

Ensuring we can still use this optimization for MMIO space is
moderately important.

Eric



2006-09-12 05:57:27

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation


> Frame buffers are rarely cacheable as such; on x86 they are usually
> write-combining, which means that the writes can be merged and
> possibly reordered while they are being written, but they can't be
> cached. Most arches, I believe, have something that roughly corresponds
> to write combining.
>
> Ensuring we can still use this optimization for MMIO space is
> moderately important.

I haven't gone into too much detail about write combining (we need to do
something about it but I don't want to mix problems) but I did define
that the ordered accessors aren't guaranteed to provide write combining
on storage mapped with WC enabled, while the relaxed or non-ordered ones
are. That should be enough at this point.

Later, we should look into providing an ioremap_wc() and possibly page
table flags for write combining userland mappings. Time to get rid of
MTRRs for graphics :) And infiniband-style stuff seems to want that too.
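For what it's worth, the driver-side usage could eventually look like
the sketch below (ioremap_wc() doesn't exist yet, the name is just the
one floated above, and the BAR handling is made up):

    void __iomem *fb = ioremap_wc(pci_resource_start(pdev, 0),
                                  pci_resource_len(pdev, 0));
    if (fb) {
        memset_io(fb, 0, pci_resource_len(pdev, 0)); /* stores may combine */
        iounmap(fb);
    }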

Ben.


2006-09-12 06:29:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation

Benjamin Herrenschmidt <[email protected]> writes:

> I haven't gone into too much detail about write combining (we need to do
> something about it but I don't want to mix problems) but I did define
> that the ordered accessors aren't guaranteed to provide write combining
> on storage mapped with WC enabled, while the relaxed or non-ordered ones
> are. That should be enough at this point.

Sounds good.

> Later, we should look into providing an ioremap_wc() and possibly page
> table flags for write combining userland mappings. Time to get rid of
> MTRRs for graphics :) And infiniband-style stuff seems to want that too.

ioremap_wc is actually the easy half. I have an old patch that handles
that. The trick is to make certain multiple people don't map the same
thing with different attributes. Unfortunately I haven't had time to
work through that one yet.

Eric

2006-09-12 07:13:59

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation


> ioremap_wc is actually the easy half. I have an old patch that handles
> that. The trick is to make certain multiple people don't map the same
> thing with different attributes. Unfortunately I haven't had time to
> work through that one yet.

Actually, that's interesting, because I need exactly the opposite on
PowerPC, I think... That is, people will -need- to do both a WC and a
non-WC mapping if they want to be able to issue stores that are
guaranteed not to be combined.

The problem I've seen is that at least one processor (the Cell), and
maybe more, seems to be combining between threads on the same CPU
(unless the stores are issued to a guarded mapping, which prevents
combining completely; that is the sort of mapping we currently do with
ioremap).

That means that it's impossible to prevent combining with explicit
barriers. For example:

Thread 0                        Thread 1
store to A                      store to A+1
barrier                         barrier
          \                     /
           \                   /
            \                 /
        the store unit might see:
            store to A
            store to A+1
            barrier
            barrier

That is, the stores aren't tagged with their source thread, and thus the
non-cacheable store unit will not prevent combining between them.

Again, it might just be a Cell CPU bug in which case we may have to just
disable use of WC on that processor, period. But it might be a more
generic problem too, we need to investigate.

If the problem ends up being widespread, the only ways I see to prevent
the combining from happening are to do a dual mapping as I explained
earlier, or maybe to have drivers always do the stores that must not be
combined under a spinlock, with appropriate use of
io_to_lock_barrier() (mmiowb()).
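For reference, that spinlock variant is basically just the usual
mmiowb() pattern (a sketch; whether it actually defeats the cross-thread
combining on Cell is exactly what needs to be verified -- the register
names are made up):

    spin_lock(&dev->lock);
    writel(val, dev->wc_regs + REG_FOO);  /* store that must not be combined */
    io_to_lock_barrier();                 /* i.e. mmiowb() today */
    spin_unlock(&dev->lock);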

Anyway, let's not pollute this discussion with that too much now :)

Ben.


2006-09-12 15:20:44

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation

> Actually, that's interesting, because I need exactly the opposite on
> PowerPC, I think... That is, people will -need- to do both a WC and a
> non-WC mapping if they want to be able to issue stores that are
> guaranteed not to be combined.

Or you do the sane thing and just not allow two threads of execution
access to the same I/O device at the same time.

> The problem I've seen is that at least one processor (the Cell), and
> maybe more, seems to be combining between threads on the same CPU
> (unless the stores are issued to a guarded mapping, which prevents
> combining completely; that is the sort of mapping we currently do with
> ioremap).
>
> That means that it's impossible to prevent combining with explicit
> barriers. For example:

Now compare this with the similar scenario for "normal" MMIO, where
we do store;sync (or sync;store or even sync;store;sync) for every
writel() -- exactly the same problem.

> Again, it might just be a Cell CPU bug in which case we may have to
> just
> disable use of WC on that processor, period. But it might be a more
> generic problem too, we need to investigate.

It's a bit like why IA64 has mmiowb(). Not quite the same, but similar.

> If the problem ends up being widespread, the only ways I see to
> prevent the combining from happening are to do a dual mapping as I
> explained earlier, or maybe to have drivers always do the stores that
> must not be combined under a spinlock, with appropriate use of
> io_to_lock_barrier() (mmiowb()).

Better lock at a higher level than just per instruction.

Some devices that want to support multiple clients at the same time
have multiple identical "register files", one for each client, to
prevent this and other problems (and it's useful anyway).

> Anyway, let's not pollute this discussion with that too much now :)

Au contraire -- if you're proposing to hugely invasively change some
core interface, and add millions of little barriers(*), you better
explain how this is going to help us tackle the problems (like WC) that
we are starting to see already, and that will be a big deal in the
near future.

Now I'm saying there's no way to make the barriers needed for write-
combining efficient, unless those barriers can take advantage of the
ordering rules of the path all the way from the CPU to the device;
i.e. make those barriers bus-specific. The MMIO and memory-like-space
read/write accessors will have to follow suit. Non-WC stuff can take
advantage of bus-specific rules as well, e.g. the things you are
proposing, which, face it, are really just designed for PCI.

And even today, looking only at PCI, we already have two different
kinds of drivers. On one hand, the ones that use the PCI ordering
rules, with wmb() and mmiowb() [your #2 and #4; #1 is implicit on PCI
(everything pushes posted writes); and #3 is covered by the twi;isync
we have in readX()], which work correctly on PowerPC today. On the
other hand, the drivers that pretend PCI is a bus where everything is
strongly ordered (even vs. main memory), which do not all work on
PowerPC in today's kernel [devices not doing DMA might seem to work
fine, since #4 is hard to break and even if you do it's not often
fatal or bad at all; heck, we have this one device now where breaking
#2 "almost" works -- it took almost two full kernel release cycles for
anyone to notice].
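To spell out what that first kind of driver does today (a sketch only;
the register and field names are made up):

    /* #2: descriptor store must reach memory before the doorbell write */
    desc->status = cpu_to_le32(DESC_READY);
    wmb();
    writel(DOORBELL_KICK, ioaddr + REG_DOORBELL);

    /* #4: MMIO stores ordered before the unlock */
    mmiowb();
    spin_unlock(&lp->lock);

    /* #3: handled inside readl() itself on PowerPC (the twi;isync) */
    head = readl(ioaddr + REG_RX_HEAD);
    len  = le32_to_cpu(rx_ring[head].len);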

If you change the rules you'll have to audit *all* existing device
drivers.

So, again: unless we make the I/O accessors and barriers bus-specific,
we'll end up with millions(**) of slightly different barriers and
whatnot, in an attempt to get decent performance out of our devices;
and we will never reach that goal. Also, no device driver author
will ever know what barrier to use where and when.

Now if we _do_ make it all bus-specific, we still might have quite
a few barriers in total, but only a few per bus type -- and they
can have descriptive names that explain where to use them. Maybe,
just maybe, we'll for the first time see a device driver that gets
it right ;-)

I still like the idea of overloading the semantics of readX()/writeX()
to do whatever is needed for the region that is mapped for their
arguments, but you can introduce pci_readl() and friends for all I care,
it's a separate issue... If you want to keep the nice short names
with different semantics though, well, have fun fixing device drivers
for the next twenty(***) years.


Segher



(*) Yes I know I'm exaggerating.
(**) It's a habit :-)
(***) Did it again... It's more like fifteen years really.

2006-09-12 15:32:25

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation

> prepare_to_read_dma_memory() is the operation that an ethernet
> driver's RX code wants. And this is _completely_ unrelated to
> MMIO. It just wants to make sure that the device and host are
> looking at the same data. Often this involves polling a DMA
> descriptor (or index, stored inside DMA-able memory) looking for
> changes.
>
> flush_my_writes_to_dma_memory() is the operation that an ethernet
> driver's TX code wants, to precede either an MMIO "poke" or any
> other non-MMIO operation where the driver needs to be certain that
> the write is visible to the PCI device, should the PCI device
> desire to read that area of memory.

Because those are the operations, those should be the actual
function names, too (well, prefixed with pci_). Architectures
can implement them in whatever way is appropriate, or perhaps default
to some ultra-strong semantics if they prefer; driver writers
should not have to know about the underlying mechanics (like why
we need which barriers).
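Purely to illustrate the naming idea (the pci_ prefix is the suggestion
above; mapping them straight onto the existing barriers like this is
just one possible implementation, on a platform where PCI DMA memory is
kept coherent):

    /* drivers say what they mean... */
    #define pci_flush_my_writes_to_dma_memory()   wmb()
    #define pci_prepare_to_read_dma_memory()      rmb()

    /* ...and each arch maps that onto whatever its memory model needs. */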


Segher

2006-09-12 21:23:45

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation


> Or you do the sane thing and just not allow two threads of execution
> access to the same I/O device at the same time.

Why ? Some devices are designed to be able to handle that...

> Now compare this with the similar scenario for "normal" MMIO, where
> we do store;sync (or sync;store or even sync;store;sync) for every
> writel() -- exactly the same problem.

What problem? "Normal" MMIO doesn't get combined, thus there is no
problem. Of course there is no guarantee of ordering of the stores from
the 2 CPUs unless there is a spinlock etc. etc... but we are talking
about a case where that is acceptable here. However, combining is not.

> Better lock at a higher level than just per instruction.
>
> Some devices that want to support multiple clients at the same time
> have multiple identical "register files", one for each client, to
> prevent this and other problems (and it's useful anyway).

Yes, they do, and what happens if those register "files" happen to be
consecutive in the address space and the CPU suddenly combines a store
to the last register of one "file" and an unrelated store from another
thread to the first register of the other?

This is a very specific problem that has nothing to do with your "grand
general case". It means that, at least on Cell, you cannot use explicit
barriers to guarantee the absence of write combining. It's as simple as
that. All I need to figure out now is whether that problem is specific
to one CPU implementation or more general; in the latter case, we'll
have to figure out some way to provide an interface.

> > Anyway, let's not pollute this discussion with that too much now :)
>
> Au contraire -- if you're proposing to hugely invasively change some
> core interface, and add millions of little barriers(*), you better
> explain how this is going to help us tackle the problems (like WC) that
> we are starting to see already, and that will be a big deal in the
> near future.

No, this is totally irrelevant. I'm proposing a simple change (nothing
invasive there) to the MMIO accessors of weakly ordered platforms only,
to make them guarantee ordering like x86 etc., and I'm proposing the
-addition- (which is not something I would call invasive) of -one-
class of partially relaxed accessors and the -few- (damn, there are only
4 of them) barriers that precisely match the semantics that drivers
need. Oh, and making sure those semantics are well defined, or they are
useless.

This has strictly nothing to do with WC and mixing things up will only
confuse the discussion and guarantee that we'll never get anything done.

<snip useless digression>

Ben.


2006-09-13 00:12:32

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation

>> Or you do the sane thing and just not allow two threads of execution
>> access to the same I/O device at the same time.
>
> Why ? Some devices are designed to be able to handle that...

Sure, but not many -- and even then, you normally get a separate
MMIO area to write to for each thread. Not really different.

>> Now compare this with the similar scenario for "normal" MMIO, where
>> we do store;sync (or sync;store or even sync;store;sync) for every
>> writel() -- exactly the same problem.
>
> What problem? "Normal" MMIO doesn't get combined, thus there is no
> problem. Of course there is no guarantee of ordering of the stores
> from the 2 CPUs unless there is a spinlock etc. etc... but we are
> talking about a case where that is acceptable here. However,
> combining is not.

As an example, the first access might set off a DMA, and the 2nd MMIO
interferes. That's not necessarily acceptable. Now you might point
me to the spinlock again, but I'll just point you right back to your
original example, because that's my whole point.

>> Better lock at a higher level than just per instruction.
>>
>> Some devices that want to support multiple clients at the same time
>> have multiple identical "register files", one for each client, to
>> prevent this and other problems (and it's useful anyway).
>
> Yes, they do, and what happens if those register "files" happen to be
> consecutive in the address space and the CPU suddenly combines a store
> to the last register of one "file" and an unrelated store from another
> thread to the first register of the other?

That's why those devices rely on the CPU's not combining over the edges
of (typically) 4kB pages.

> This is a very specific problem that has nothing to do with your
> "grand general case".

Oh, I have no "grand general case"; my main argument still is to have
accessors _per bus_ (per bus type really, archs can make it more
specific if they want).

In the "grand general case", you have to do lowest-common-denominator
for everything, and you're increasingly forcing yourself into that
corner.

>>> Anyway, let's not pollute this discussion with that too much now :)
>>
>> Au contraire -- if you're proposing to hugely invasively change some
>> core interface, and add millions of little barriers(*), you better
>> explain how this is going to help us tackle the problems (like WC)
>> that
>> we are starting to see already, and that will be a big deal in the
>> near future.
>
> No, this is totally irrelevant.

"The (near) future [and it's only not right now because Linux is
dragging
behind] is totally irrelevant, only my current this-second itch is?"

> I'm proposing a simple change (nothing
> invasive there) to the MMIO accessors of weakly ordered platforms
> only,
> to make them guarantee ordering like x86 etc...

Please explain what drivers will need changes because of this. Not just
the few you really care about, but _all_ that could be plugged into
PowerPC
machines' PCI busses, and might need changes because of changing the
ordering semantics of readX()/writeX() from the supposed standard Linux
semantics (i.e., the x86 semantics).

> and I'm proposing the
> -addition- (which is not something I would call invasive) of -one-
> class of partially relaxed accessors and the -few- (damn, there are
> only 4 of them) barriers that precisely match the semantics that
> drivers need. Oh, and making sure those semantics are well defined, or
> they are useless.

Erm, wait a minute, I might start to understand now... You want all
drivers that you care about to be converted to use __readX()/__writeX()
instead? How is this going to help, exactly?

> This has strictly nothing to do with WC and mixing things up will only
> confuse the discussion and guarantee that we'll never get anything
> done.

No, it _has_ to do with WC. If the Linux I/O API is going to be
changed/amended/expanded/mot du jour, we'd better do it in such a way
that we get a positive outlook on the problems that we will have to
face next (or, in this case, that we should be handling already,
really).

> <snip useless digression>

Very constructive.


Segher

2006-09-13 01:34:59

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC] MMIO accessors & barriers documentation


> Please explain what drivers will need changes because of this. Not just
> the few you really care about, but _all_ that could be plugged into
> PowerPC
> machines' PCI busses, and might need changes because of changing the
> ordering semantics of readX()/writeX() from the supposed standard Linux
> semantics (i.e., the x86 semantics).

They won't. They will still work, and in some (many?) cases work
better, due to the removal of a potential bug, since lots of drivers
don't have a barrier where they should with the relaxed semantics. So
the net effect is positive here.

Now, it also means that we -can- start improving the drivers we care
about to use the relaxed semantics and benefit from that. And since the
semantics are well defined, all archs with some sort of relaxed ordering
will be able to benefit in one way or another.

In addition, it will allow a small optimisation on PowerPC vs. the
current situation by slightly relaxing wmb(), which currently has to be
a full sync because it might be used to order memory vs. MMIO; it will
no longer need to do that (it will go back to being a pure memory store
barrier).
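On PowerPC, that relaxation would be something along the lines of the
sketch below (assuming lwsync is sufficient to order cacheable stores
against each other on the processors we care about). Currently:

    /* has to be a full sync, because wmb() may be ordering a memory
     * store vs. a subsequent MMIO store */
    #define wmb()   __asm__ __volatile__ ("sync" : : : "memory")

versus, once MMIO ordering is handled by writel() and the explicit IO
barriers:

    /* only needs to order cacheable stores against each other */
    #define wmb()   __asm__ __volatile__ ("lwsync" : : : "memory")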

Anyway, Paul has a patch we are testing that makes our writel/readl
synchronous (by moving the sync to before the store in writel, adding an
eieio before readl, and doing the per-cpu trick so spin_unlock magically
does a sync when a writel occurred). With that, we'll get full
correctness with no more syncs in writel than we had before. We are
running some benchmarks here now to see what kind of performance impact
it has overall, and if we are happy, that can make it into 2.6.18 and
close the problem of drivers assuming MMIO is ordered vs. memory, at
least.
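Roughly, the idea is something like the sketch below. This is not
Paul's actual patch, just the shape of the mechanism; the io_sync field
and the SYNC_IO name are my own:

    /* writel(): sync before the store so that prior memory stores are
     * visible to the device first (rule #2), then remember that MMIO
     * was done so the next spin_unlock() can order it (rule #4) */
    static inline void writel(u32 val, volatile void __iomem *addr)
    {
        __asm__ __volatile__ ("sync" : : : "memory");
        *(volatile u32 *)addr = cpu_to_le32(val);  /* little-endian store */
        get_paca()->io_sync = 1;
    }

    /* called from spin_unlock(): only pay for the sync if MMIO
     * actually happened inside the critical section */
    #define SYNC_IO() do {                                      \
            if (unlikely(get_paca()->io_sync)) {                \
                    __asm__ __volatile__ ("sync" : : : "memory"); \
                    get_paca()->io_sync = 0;                    \
            }                                                   \
    } while (0)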

Then, in a -separate- step, we can provide a set of relaxed accessors
that will allow for additional performance improvements on the hot path
of selected drivers.

I'm tired of arguing the same things over and over again here anyway;
I'll post a new version of the document including some of the feedback
we already got, and will submit it for inclusion along with a
__writel/__readl implementation for powerpc (and a generic one that
defaults to readl/writel) for the 2.6.19 timeframe.

We'll see from there if there are more constructive comments.

Ben.