LinuxLists.cc - [git pull] drm patches for 2.6.27-rc1

2008-10-17 21:29:50

Subject: [git pull] drm patches for 2.6.27-rc1

by Thomas Hellström

Subject: Re: [git pull] drm patches for 2.6.27-rc1

Linus Torvalds wrote:
> On Sat, 18 Oct 2008, Keith Packard wrote:
>
>> The basic plan is to have four new functions (yes, I'm making up names
>> here):
>>
>> struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
>>                                            int bar,
>>                                            int prot);
>> void io_mapping_free(struct io_mapping *mapping);
>>
>> void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
>> void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
>>
>
> The important thing is that mappings need to be per-CPU, so the above may
> work, but only if it's designed so that "io_reserve_pci_resource()" will
> actually reserve space for 'nr_possible_cpu' page mappings, and then the
> "io_[un]map_atomic()" functions do per-CPU mappings.
>
> Anything else is a disaster, because anything else implies TLB shootdown.
>
> And quite frankly, even so, we'd possibly still be _better_ off with just
> exposing the "kmap_atomic_pfn()" functionality even so. Because quite
> frankly, your "io_reserve_pci_resource()" infrastructure is going to
> inevitably be more complex and slower than the rather efficient
> kmap_atomic_pfn() thing we have.
>
> [ The *non-atomic* kmap() functions are fairly high-overhead, in that they
> want to keep track of cached mappings and remember page addresses etc.
> So those are the ones we don't want to support for non-HIGHMEM setups.
>
> But the atomic kmaps are pretty simple, and really only need some
> trivial FIXMAP support. We could easily extend it for x86-64, methinks,
> and do it for x86-32 even when we don't do HIGHMEM.
>
> Ingo? ]
>
> One small detail: we currently have "kmap_atomic_pfn()" and
> "kmap_atomic_prot()", and we really should make the fundamental core
> operation be "kmap_atomic_pfn_prot()", and have everything be done in
> terms of that. Looking at it, it also looks like kmap_atomic_prot() is
> actually incorrect right now, and doesn't do a "prot" thing for
> non-highmem pages, but just returns "page_address(page);"
>
Actually, a "kmap_atomic_prot_pfn()" has been lurking in the drm repos
for some time now, but hasn't been suggested for upstream. It was
intended for drivers that require quick in-kernel patching of
write-combined io and highmem pages. The latter is a common situation
for PCIE graphics devices with their own MMU, so IMHO an exported
kmap_atomic_pfn_prot() would be a big help in such cases.

/Thomas

> Linus
>
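
The layering Linus describes above (one core "pfn + prot" operation, with the other variants as thin wrappers, backed by one mapping slot per CPU) can be modelled in user space roughly as follows. This is only a sketch: every sim_* name, the slot structure and the CPU count are hypothetical and are not kernel APIs, and the caller passes the CPU number explicitly where the kernel would use the current CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define SIM_NR_CPUS 4

enum sim_prot { SIM_PROT_KERNEL, SIM_PROT_WC };

struct sim_slot {
	unsigned long pfn;	/* page frame currently installed in the slot */
	enum sim_prot prot;	/* protection/caching attribute of the mapping */
};

/*
 * One slot per CPU: installing a mapping only touches state that belongs
 * to the calling CPU, which is the property that avoids TLB shootdowns.
 */
static struct sim_slot sim_slots[SIM_NR_CPUS];

/* Core operation: every other variant is expressed in terms of this one. */
static struct sim_slot *sim_map_atomic_pfn_prot(int cpu, unsigned long pfn,
						enum sim_prot prot)
{
	struct sim_slot *slot;

	if (cpu < 0 || cpu >= SIM_NR_CPUS)
		return NULL;
	slot = &sim_slots[cpu];
	slot->pfn = pfn;
	slot->prot = prot;
	return slot;
}

/* Wrapper: default protection, same code path as the core operation. */
static struct sim_slot *sim_map_atomic_pfn(int cpu, unsigned long pfn)
{
	return sim_map_atomic_pfn_prot(cpu, pfn, SIM_PROT_KERNEL);
}
```

Because the wrapper adds nothing but a default argument, a "prot" that is honoured in the core operation is honoured everywhere, which is the inconsistency Linus points out in the existing kmap_atomic_prot().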

2008-10-18 20:21:15

On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote:

> But i think the direction of the new GEM code is subtly wrong here,
> because it tries to manage memory even on 64-bit systems. IMO it should
> just map the _whole_ graphics aperture (non-cached) and be done with it.
> There's no faster method at managing pages than the CPU doing a TLB fill
> from pagetables.

Yeah, we're stuck thinking that we "can't" map the aperture because it's
too large, but with a 64-bit kernel, we should be able to keep it mapped
permanently.

Of course, the io_reserve_pci_resource and io_map_atomic functions could
do precisely that, as kmap_atomic does on non-HIGHMEM systems today.

> The only real API need i see is on 32-bit: with a 1GB or 2GB graphics
> aperture we just cannot map that permanently, so kmap_atomic() is a
> necessity. We can certainly extend that to non-highmem as well.

Yes, this is where exposing an io-specific atomic mapping function will
remain necessary for some time.

> And if i understood your
> workload correctly you want to do tens of thousands of map/unmap/remap
> events per frame generated - depending on the type of the 3D app/engine.

Yeah, data transfer from CPU to GPU is through a pwrite interface, and
we perform the transfer within the kernel using map/unmap operations on
the aperture as those are WC and hence do not require clflush.

> Or am i missing something subtle? Why do you want the overhead of kmap
> on 64-bit?

We don't, but I think it would be nice to have a common API that works
across all 32-bit configurations as well as 64-bit systems.

--
[email protected]

2008-10-18 22:32:44

by Ingo Molnar

Subject: Re: [git pull] drm patches for 2.6.27-rc1

by Ingo Molnar

Subject: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

* Keith Packard <[email protected]> wrote:

> On Sat, 2008-10-18 at 21:14 -0700, Keith Packard wrote:
> > On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
> >
> > > Mind sending patches for this? :-)
>
> Here's a patch for the i915 driver that includes the new API. Tested
> on x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in
> the i915 directory for this patch, but it should go elsewhere.
>
> The new APIs are:
>
> io_reserve_create_wc
> io_reserve_free
> io_reserve_map_atomic_wc
> io_reserve_unmap_atomic
> io_reserve_map_wc
> io_reserve_unmap

very nice!

I think we need a somewhat different abstraction though.

Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move
to generic code.

Secondly, wouldn't the right abstraction be to attach this functionality
to 'struct resource'? [or at least create a second struct that embeds
struct resource]

this abstraction is definitely not a PCI thing and not a
detached-from-everything thing, it's an IO resource thing. We could make
it a property of struct resource:

struct resource {
	resource_size_t start;
	resource_size_t end;
	const char *name;
	unsigned long flags;
	struct resource *parent, *sibling, *child;
+	void *mapping;
};

The APIs would be:

int io_resource_init_mapping(struct resource *res);
void io_resource_free_mapping(struct resource *res);
void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
void io_resource_unmap(struct resource *res, void *kaddr);

Note how simple and consistent it all gets: IO resources already know
their physical location and their size limits. Being able to cache an
ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a
relatively simple and natural extension to the concept.

i think that would be quite acceptable - and the APIs could just
transparently work on it. This would also allow the PCI code to
automatically unmap any cached mappings from resources, when the driver
deinitializes.

Linus, Jesse, what do you think?

i think we need to finalize the API names and their abstraction level,
and then could even merge those APIs into v2.6.28 on a fast path, to
enable you to use it. It does not interact with anything else so it
should be safe to do.

(i'd not suggest to merge the i915 bits just yet - but that's obviously
not my call.)

Ingo
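
A user-space model of the four calls proposed above might look like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: struct sim_resource copies only some of the fields quoted above, malloc() stands in for ioremap(), the pfn argument is taken to be relative to the start of the resource, and all sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SIM_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct resource plus the proposed field. */
struct sim_resource {
	unsigned long start;
	unsigned long end;	/* inclusive, as in struct resource */
	const char *name;
	void *mapping;		/* cached mapping, NULL until initialised */
};

/* Map the whole resource once and cache the result in the resource. */
static int sim_resource_init_mapping(struct sim_resource *res)
{
	if (res->mapping)
		return 0;
	res->mapping = malloc(res->end - res->start + 1);
	return res->mapping ? 0 : -1;
}

static void sim_resource_free_mapping(struct sim_resource *res)
{
	free(res->mapping);
	res->mapping = NULL;
}

/* With a cached mapping, mapping a page is plain pointer arithmetic. */
static void *sim_resource_map(struct sim_resource *res, unsigned long pfn,
			      unsigned long offset)
{
	unsigned long byte = pfn * SIM_PAGE_SIZE + offset;

	/* The resource already knows its own size limits. */
	if (!res->mapping || byte > res->end - res->start)
		return NULL;
	return (char *)res->mapping + byte;
}

/* Nothing to undo while the whole resource stays mapped. */
static void sim_resource_unmap(struct sim_resource *res, void *kaddr)
{
	(void)res;
	(void)kaddr;
}
```

In this model the bounds check needs no extra arguments because start and end already live in the resource, which is the consistency argument made above. A 32-bit variant would do real work in map and unmap instead of pointer arithmetic.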

2008-10-19 18:01:19

by Arjan van de Ven

Subject: Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

On Sun, 19 Oct 2008 19:53:20 +0200
Ingo Molnar <[email protected]> wrote:

>
> struct resource {
> resource_size_t start;
> resource_size_t end;
> const char *name;
> unsigned long flags;
> struct resource *parent, *sibling, *child;
> + void *mapping;
> };
>
> The APIs would be:
>
> int io_resource_init_mapping(struct resource *res);
> void io_resource_free_mapping(struct resource *res);
> void * io_resource_map(struct resource *res, pfn_t pfn,
>                        unsigned long offset);
> void io_resource_unmap(struct resource *res, void *kaddr);
>
> Note how simple and consistent it all gets: IO resources already know
> their physical location and their size limits. Being able to cache an
> ioremap in a mapping [and being able to use atomic kmaps on 32-bit]
> is a relatively simple and natural extension to the concept.

and making a simple wrapper around this that turns "struct pci_dev,
barnr" into a resource would make sense too, but yes.

We need one more

io_resource_force_cachability(struct resource, cachetype)

or maybe only

io_resource_force_writecombine(struct resource)

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
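
The two additions suggested in this message, a wrapper from (device, BAR number) to a resource and a call that forces the caching type, could be sketched in user space as follows. All types and sim_* names are hypothetical; in particular the cache-type enum and the fixed BAR count of six are assumptions made for the model.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NUM_BARS 6	/* assumption for this model */

enum sim_cache { SIM_CACHE_DEFAULT, SIM_CACHE_UC, SIM_CACHE_WC };

struct sim_resource {
	unsigned long start, end;
	enum sim_cache cache;	/* caching type future mappings must use */
};

struct sim_pci_dev {
	struct sim_resource resource[SIM_NUM_BARS];
};

/* Wrapper: turn "device + BAR number" into the resource describing it. */
static struct sim_resource *sim_pci_bar_resource(struct sim_pci_dev *dev,
						 int bar)
{
	if (bar < 0 || bar >= SIM_NUM_BARS)
		return NULL;
	return &dev->resource[bar];
}

/* General form: record the caching type on the resource itself. */
static int sim_resource_force_cachability(struct sim_resource *res,
					   enum sim_cache type)
{
	if (!res)
		return -1;
	res->cache = type;
	return 0;
}

/* Narrower form: the write-combining case only. */
static int sim_resource_force_writecombine(struct sim_resource *res)
{
	return sim_resource_force_cachability(res, SIM_CACHE_WC);
}
```

Keeping the caching type on the resource means the map calls need no extra argument, at the cost of one caching type per resource.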

2008-10-19 19:09:26

by Eric Anholt

Subject: Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

On Sun, 2008-10-19 at 19:53 +0200, Ingo Molnar wrote:
> * Keith Packard <[email protected]> wrote:
>
> > On Sat, 2008-10-18 at 21:14 -0700, Keith Packard wrote:
> > > On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
> > >
> > > > Mind sending patches for this? :-)
> >
> > Here's a patch for the i915 driver that includes the new API. Tested
> > on x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in
> > the i915 directory for this patch, but it should go elsewhere.
> >
> > The new APIs are:
> >
> > io_reserve_create_wc
> > io_reserve_free
> > io_reserve_map_atomic_wc
> > io_reserve_unmap_atomic
> > io_reserve_map_wc
> > io_reserve_unmap
>
> very nice!
>
> I think we need a somewhat different abstraction though.
>
> Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move
> to generic code.
>
> Secondly, wouldn't the right abstraction be to attach this functionality
> to 'struct resource'? [or at least create a second struct that embeds
> struct resource]
>
> this abstraction is definitely not a PCI thing and not a
> detached-from-everything thing, it's an IO resource thing. We could make
> it a property of struct resource:
>
> struct resource {
> resource_size_t start;
> resource_size_t end;
> const char *name;
> unsigned long flags;
> struct resource *parent, *sibling, *child;
> + void *mapping;
> };
>
> The APIs would be:
>
> int io_resource_init_mapping(struct resource *res);
> void io_resource_free_mapping(struct resource *res);
> void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
> void io_resource_unmap(struct resource *res, void *kaddr);
>
> Note how simple and consistent it all gets: IO resources already know
> their physical location and their size limits. Being able to cache an
> ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a
> relatively simple and natural extension to the concept.
>
> i think that would be quite acceptable - and the APIs could just
> transparently work on it. This would also allow the PCI code to
> automatically unmap any cached mappings from resources, when the driver
> deinitializes.
>
> Linus, Jesse, what do you think?
>
> i think we need to finalize the API names and their abstraction level,
> and then could even merge those APIs into v2.6.28 on a fast path, to
> enable you to use it. It does not interact with anything else so it
> should be safe to do.

This API needs the cacheability control, which I don't see in it
currently. In the past we've been relying on an MTRR over the GTT
resulting in all of our UC- mappings getting us the correct WC behavior,
but now there aren't enough MTRRs to go around and graphics loses out
(at about a 20% CPU cost as a conservative estimate). The primary goal
of the new API is to let us eat PAT costs up front so we don't have to
at runtime.

Second, we need to know when we're doing a mapping whether we're
affected by atomic scheduling restrictions. Right now our plan has been
to try doing page-by-page
io_map_atomic_wc()/copy_from_user_inatomic()/io_unmap_atomic(), and if
we fail at that at some point (map returns NULL or we get a partial
completion from copy_from_user_inatomic) then fall back to io_map_wc()
and copy_from_user() the whole thing at once. That gets us good
performance on both x86 with highmem and x86-64, and not too shabby
performance on x86 non-highmem.
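
The fast-path/fallback plan in the paragraph above can be modelled in user space as in the sketch below. It is not the driver code: the aperture and the user buffer are ordinary arrays, sim_copy_inatomic() is a hypothetical stand-in for the atomic user copy that reports how many bytes it could not copy, and sim_atomic_budget is a made-up fault-injection knob used to force a partial completion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096UL

/* Hypothetical fault injection: total bytes the atomic copy may still move. */
static size_t sim_atomic_budget = (size_t)-1;

/* Returns the number of bytes NOT copied; non-zero means partial completion. */
static size_t sim_copy_inatomic(void *dst, const void *src, size_t n)
{
	size_t done = n < sim_atomic_budget ? n : sim_atomic_budget;

	memcpy(dst, src, done);
	sim_atomic_budget -= done;
	return n - done;
}

enum sim_path { SIM_FAST, SIM_SLOW };

static enum sim_path sim_pwrite(char *aperture, const char *user, size_t len)
{
	size_t off = 0;

	while (off < len) {
		size_t chunk = len - off < SIM_PAGE_SIZE ? len - off
							 : SIM_PAGE_SIZE;
		/* Stands in for atomically mapping one aperture page. */
		char *page = aperture + off;
		size_t left = sim_copy_inatomic(page, user + off, chunk);
		/* The atomic unmap of that page would go here. */

		if (left) {
			/* Fallback: non-atomic map, copy the whole thing. */
			memcpy(aperture, user, len);
			return SIM_SLOW;
		}
		off += chunk;
	}
	return SIM_FAST;
}
```

The model redoes the entire copy on fallback, matching "the whole thing at once" above. Resuming from the failing offset would be a possible refinement, but it is not something this message proposes.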

Also, while it's rare, there have been graphics cards (looking at you,
S3) where BARs were expensive for some reason and they stuffed both the
framebuffer and registers into one PCI BAR, where you want the FB to be
WC and the registers to be UC. Not sure if they would be supportable
with this API or not. And if it's not, I'm not sure how much we care to
design for them, but it's something to potentially consider.

Finally, I'm confused by the pfn and offset args to io_resource_map,
when I expected something parallel to ioremap but with our resource arg
added.

--
Eric Anholt
[email protected] [email protected]

2008-10-19 21:05:17

Subject: Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

On Thu, 2008-10-23 at 10:05 +0200, Ingo Molnar wrote:

> Any ballpark-figure numbers you can share with us?

For one quake-3 based game we use for performance regression checking,
64-bit kernels run about 18 times faster now. That's the difference
between using a zero-cost dynamic mapping and using ioremap_wc for each
page.

--
[email protected]

2008-10-23 20:22:54

On Thu, 2008-10-23 at 13:38 -0700, Andrew Morton wrote:

> I guess one could reimplement kmap_atomic_pfn() to call this. Sometime.

The goal is to stop needing this function fairly soon and replace it
with a 'real' io-mapping implementation for 32-bit processors.

> Given that all highmem-implementing architectures must use the same
> declaration here, we might as well put it into include/linux/highmem.h.
> Although that goes against current mistakes^Wcode.

I'd hate to break with a long tradition.

> Does powerpc32 still implement highmem? It seems that way. You broke
> it, no?

Powerpc32 doesn't have kmap_atomic_pfn either. Seems like the set of
HIGHMEM functions is not uniform across architectures.

--
[email protected]

2008-10-23 21:26:04

by Linus Torvalds

Subject: Re: Adding kmap_atomic_prot_pfn (was: [git pull] drm patches for 2.6.27-rc1)

On Thu, 23 Oct 2008, Andrew Morton wrote:
>
> Given that all highmem-implementing architectures must use the same
> declaration here, we might as well put it into include/linux/highmem.h.
> Although that goes against current mistakes^Wcode.
>
> Does powerpc32 still implement highmem? It seems that way. You broke
> it, no?

This really shouldn't be in highmem.h AT ALL.

The whole point of that function has absolutely nothing to do with
highmem, and it *must* be useful on non-highmem configurations to be
appropriate.

So I'd much rather create a new <linux/kmap.h> or something. Or just
expose this from <asm/fixmap.h> or something. Let's not confuse this
with highmem, even if the implementation _historically_ was due to that.

Linus

2008-10-24 01:50:38

On Thu, 2008-10-23 at 19:48 -0700, Linus Torvalds wrote:

> I'm not entirely sure who wants to own up to owning that particular part
> of code, and is willing to make kmap_atomic_prot_pfn() also work in the
> absence of CONFIG_HIGHMEM.

All of the kmap_atomic functions *do* work without CONFIG_HIGHMEM, they
just don't do what we want in this case. Without knowing the history, it
seems fairly clear that the kmap functions are designed to map physical
memory pages into the kernel virtual address space. On small 32-bit
systems, that's trivial, you just use the direct map (as one does on
64-bit systems). The magic fixmap entries make this work with
CONFIG_HIGHMEM as well.

So, I fear touching the kmap API as it appears to have a specific and
useful purpose, irrespective of the memory size the kernel is configured
for.

What I've proposed is that we create a new io-space specific set of
fixmap APIs. On CONFIG_HIGHMEM, they'd just use the existing kmap_atomic
mechanism, but on small 32-bit systems, we'd enable the fixmaps and have
some for that environment as well.

> So I would suspect that if you guys actually write a patch, and make sure
> that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send it to
> Ingo, good things will happen.

Ok, we can give this a try.

> and it probably should all work automatically. The kmap_atomic() stuff
> really should be almost totally independent of CONFIG_HIGHMEM, since it's
> really much more closely related to fixmaps than HIGHMEM.

As above, I think kmap_atomic should be left alone as a way of quickly
mapping memory pages. There are users of both kmap_atomic_prot (xen)
and kmap_atomic_pfn (crash dumps).

> I guess there may be some debugging code that depends on HIGHMEM (eg that
> whole testing for whether a page is a highmem page or not), so it might be
> a _bit_ more than just moving code around, but I really didn't look
> closer.
>
> Then, there's the issue of 64-bit, and just mapping everything there, and
> the interface to that. I liked the trivial extension to "struct resource"
> to have a "cached mapping" pointer. So if we can just make it pass
> resources around and get a page that way (and not even need kmap() on
> 64-bit architectures), that would be good.

The io_mapping API I proposed does precisely this. On 32-bit systems, it
uses kmap_atomic for each page access while on 64-bit systems it uses
ioremap_wc at device init time and then just uses an offset for each
page access.

Hiding this detail behind an API leaves the driver code independent of
this particular choice.
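
How one API can hide that choice is sketched below as a user-space model. It is an assumption-laden illustration, not the proposed patch: the sim_* names are hypothetical, a runtime flag replaces what would presumably be a compile-time 32-bit/64-bit split, and a plain array stands in for the aperture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an io_mapping object; not the kernel structure. */
struct sim_io_mapping {
	char *aperture;		/* stands in for the physical aperture */
	char *base;		/* whole-aperture mapping, NULL if per-page */
	unsigned long page_maps;	/* per-page mappings set up so far */
};

/* "permanent" models the 64-bit case: map everything once at init time. */
static void sim_io_mapping_create(struct sim_io_mapping *m, char *aperture,
				  int permanent)
{
	m->aperture = aperture;
	m->base = permanent ? aperture : NULL;
	m->page_maps = 0;
}

/* The driver calls this the same way in both configurations. */
static void *sim_io_mapping_map_atomic(struct sim_io_mapping *m,
				       unsigned long offset)
{
	if (m->base)
		return m->base + offset;	/* just an offset */

	m->page_maps++;			/* models one atomic page mapping */
	return m->aperture + offset;
}

/* Nothing to tear down in this model; a per-page mapping would be dropped. */
static void sim_io_mapping_unmap_atomic(struct sim_io_mapping *m, void *vaddr)
{
	(void)m;
	(void)vaddr;
}
```

A caller gets the same address either way and never learns which strategy produced it; only the page_maps counter, added here for illustration, shows the difference in work done per access.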

> It's too late for -rc1 (which I'm planning on cutting within the hour),
> and if it ends up being complex, I guess that it means this will be a
> 2.6.29 issue.

If we do end up pushing this out to 2.6.29, I'd like to see
kmap_atomic_prot_pfn in place as a stop-gap so that PAT performance on
32-bit systems is reasonable. I don't think too many people are running
desktop systems without CONFIG_HIGHMEM at this point, and if so, we can
just suggest that perhaps they change their configuration.

> But if it really is just a matter of mostly just some trivial code
> movement and both the gfx and x86 people are all happy, I'll happily take
> it for -rc2, and we can just leave this all behind us.

I'll try to get something working in the next day or so and see how it
looks. My plan at this point is to create new API for 32-bit systems:

	void *io_map_atomic_wc(unsigned long pfn);
	void io_unmap_atomic(void *addr);

With this, I can switch my existing io_mapping API over to an
io-specific interface and implement those using the fixmap code.

--
[email protected]

2008-10-24 04:51:01

by Randy Dunlap

On Fri, 2008-10-24 at 07:53 -0700, Linus Torvalds wrote:

> Actually, on 32-bit, the 'prot' should be there, as should the starting
> physical page. Otherwise the two interfaces would be very odd, and you'd
> have to repeat those arguments in all callers (ie both in "prepare" and
> in the actual "access").

I suppose. What I did instead was create _wc versions of both the
prepare and access functions to eliminate the need for additional data.
Either way is fine with me; I took the route which didn't require an
additional allocation.

--
[email protected]

2008-10-24 19:00:40

* Jonathan Corbet <[email protected]> wrote:

> Having looked at this some, I have one, tiny little suggestion:
>
> > +With this mapping object, individual pages can be mapped either atomically
> > +or not, depending on the necessary scheduling environment. Of course, atomic
> > +maps are more efficient:
> > +
> > +	void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
> > +				       unsigned long offset)
>
> Should the documentation for this function (perhaps the
> certainly-forthcoming kerneldoc comments :) mention loudly that this
> function uses KM_USER0? This isn't kmap(), and doesn't look much
> like it; someday some developer might get an ugly surprise when they
> try to use that slot simultaneously for something else.

definitely worth a comment.

Ingo
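
The surprise Jonathan describes can be shown with a small user-space model of named mapping slots. Everything here is hypothetical (the sim_* names, the slot structure, the choice to refuse a busy slot); it only illustrates why a function that silently uses a fixed slot should say so in its documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of named atomic-mapping slots on one CPU. */
enum sim_km_type { SIM_KM_USER0, SIM_KM_USER1, SIM_KM_NR };

struct sim_slot {
	unsigned long pfn;
	int busy;
};

static struct sim_slot sim_slots[SIM_KM_NR];

/* This model refuses a busy slot so that the conflict becomes visible. */
static struct sim_slot *sim_map_slot(enum sim_km_type type, unsigned long pfn)
{
	struct sim_slot *slot = &sim_slots[type];

	if (slot->busy)
		return NULL;
	slot->pfn = pfn;
	slot->busy = 1;
	return slot;
}

static void sim_unmap_slot(struct sim_slot *slot)
{
	slot->busy = 0;
}

/*
 * Stand-in for the io mapping call: the slot it uses is an implementation
 * detail that does not appear anywhere in its signature.
 */
static struct sim_slot *sim_io_map_atomic(unsigned long pfn)
{
	return sim_map_slot(SIM_KM_USER0, pfn);
}
```

A caller that already holds SIM_KM_USER0 for its own purposes cannot tell from the signature of sim_io_map_atomic() that the two will collide, whereas the same caller using SIM_KM_USER1 is unaffected.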