2008-06-23 17:35:06

by Keith Packard

Subject: kmap_atomic_pfn for PCI BAR access?

The graphics memory BAR is generally fairly good sized; on Intel chips,
it's between 256M and 1G (and growing). I want to write data into this
region from kernel space, but it's really too big to map the whole thing
into kernel address space, especially on 32-bit systems. ioremap is not
a good option here -- it's way too slow.

With CONFIG_HIGHMEM enabled, I can use kmap_atomic_pfn (well, actually
the kmap_atomic_prot_pfn included in the DRM tree) and things work quite
well -- performance is good, with barely any measurable time spent in
the PTE whacking (~1%).

However, with CONFIG_HIGHMEM disabled, there aren't any PTEs reserved
for this kind of mapping fun. This makes me suspect that abusing
kmap_atomic for this operation would not be appreciated.

Should I use kmap_atomic_pfn to reach my PCI BAR like this?

Would it be reasonable to supply a patch that made this work even
without CONFIG_HIGHMEM?

--
[email protected]



2008-06-26 01:18:28

by Jeremy Fitzhardinge

Subject: Re: kmap_atomic_pfn for PCI BAR access?

Keith Packard wrote:
> The graphics memory BAR is generally fairly good sized; on Intel chips,
> it's between 256M and 1G (and growing). I want to write data into this
> region from kernel space, but it's really too big to map the whole thing
> into kernel address space, especially on 32-bit systems. ioremap is not
> a good option here -- it's way too slow.
>
> With CONFIG_HIGHMEM enabled, I can use kmap_atomic_pfn (well, actually
> the kmap_atomic_prot_pfn included in the DRM tree) and things work quite
> well -- performance is good, with barely any measurable time spent in
> the PTE whacking (~1%).
>
> However, with CONFIG_HIGHMEM disabled, there aren't any PTEs reserved
> for this kind of mapping fun. This makes me suspect that abusing
> kmap_atomic for this operation would not be appreciated.
>
> Should I use kmap_atomic_pfn to reach my PCI BAR like this?
>
> Would it be reasonable to supply a patch that made this work even
> without CONFIG_HIGHMEM?
>

Usually people use ioremap to map device memory. Wouldn't that work in
this case?

J

2008-06-26 01:23:22

by Dave Airlie

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Thu, Jun 26, 2008 at 11:18 AM, Jeremy Fitzhardinge <[email protected]> wrote:
> Keith Packard wrote:
>>
>> The graphics memory BAR is generally fairly good sized; on Intel chips,
>> it's between 256M and 1G (and growing). I want to write data into this
>> region from kernel space, but it's really too big to map the whole thing
>> into kernel address space, especially on 32-bit systems. ioremap is not
>> a good option here -- it's way too slow.
>>
>> With CONFIG_HIGHMEM enabled, I can use kmap_atomic_pfn (well, actually
>> the kmap_atomic_prot_pfn included in the DRM tree) and things work quite
>> well -- performance is good, with barely any measurable time spent in
>> the PTE whacking (~1%).
>>
>> However, with CONFIG_HIGHMEM disabled, there aren't any PTEs reserved
>> for this kind of mapping fun. This makes me suspect that abusing
>> kmap_atomic for this operation would not be appreciated.
>> Should I use kmap_atomic_pfn to reach my PCI BAR like this?
>>
>> Would it be reasonable to supply a patch that made this work even
>> without CONFIG_HIGHMEM?
>>
>
> Usually people use ioremap to map device memory. Wouldn't that work in this
> case?
>

"but it's really too big to map the whole thing
into kernel address space, especially on 32-bit systems. ioremap is not
a good option here -- it's way too slow."

From the original mail.

doing a tlb flush for iounmap is slow as all hell if you do it a lot,
and we can't afford to map the whole aperture, which can be 1GB.

Dave.

2008-06-26 03:11:28

by Jeremy Fitzhardinge

Subject: Re: kmap_atomic_pfn for PCI BAR access?

Dave Airlie wrote:
> On Thu, Jun 26, 2008 at 11:18 AM, Jeremy Fitzhardinge <[email protected]> wrote:
>
>> Keith Packard wrote:
>>
>>> The graphics memory BAR is generally fairly good sized; on Intel chips,
>>> it's between 256M and 1G (and growing). I want to write data into this
>>> region from kernel space, but it's really too big to map the whole thing
>>> into kernel address space, especially on 32-bit systems. ioremap is not
>>> a good option here -- it's way too slow.
>>>
>>> With CONFIG_HIGHMEM enabled, I can use kmap_atomic_pfn (well, actually
>>> the kmap_atomic_prot_pfn included in the DRM tree) and things work quite
>>> well -- performance is good, with barely any measurable time spent in
>>> the PTE whacking (~1%).
>>>
>>> However, with CONFIG_HIGHMEM disabled, there aren't any PTEs reserved
>>> for this kind of mapping fun. This makes me suspect that abusing
>>> kmap_atomic for this operation would not be appreciated.
>>> Should I use kmap_atomic_pfn to reach my PCI BAR like this?
>>>
>>> Would it be reasonable to supply a patch that made this work even
>>> without CONFIG_HIGHMEM?
>>>
>>>
>> Usually people use ioremap to map device memory. Wouldn't that work in this
>> case?
>>
>>
>
> "but it's really too big to map the whole thing
> into kernel address space, especially on 32-bit systems. ioremap is not
> a good option here -- it's way too slow."
>
> From the original mail.
>

Uh, yep.

> doing a tlb flush for iounmap is slow as all hell if you do it a lot,
> and we can't afford to map the whole aperture, which can be 1GB.
>

Maybe Nick's vmap reimplementation would help here. It effectively
allows you to map stuff into the vmalloc space, and do lazy tlb flushes
to mitigate the cost of map/unmap. He posted the patches a week or so ago.
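
Something like this, perhaps (a sketch only: vm_map_ram()/vm_unmap_ram()
are the names from Nick's posted series, not yet in mainline, and this
covers struct-page-backed memory; BAR pfns without struct pages would
need a pfn-based variant):

#include <linux/vmalloc.h>

/* Sketch using the entry points from Nick's posted patches: the
 * unmap defers its TLB flush, and the vmap layer batches the
 * flushes across many unmaps. */
static void copy_via_lazy_vmap(struct page **pages, unsigned int npages,
			       const void *src, size_t len)
{
	void *va = vm_map_ram(pages, npages, -1, PAGE_KERNEL);

	if (!va)
		return;
	memcpy(va, src, len);
	vm_unmap_ram(va, npages);	/* lazy: no immediate global flush */
}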

J

2008-06-26 04:36:25

by Arjan van de Ven

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Thu, 26 Jun 2008 11:23:11 +1000
"Dave Airlie" <[email protected]> wrote:

> On Thu, Jun 26, 2008 at 11:18 AM, Jeremy Fitzhardinge
> <[email protected]> wrote:
> > Keith Packard wrote:
> >>
> >> The graphics memory BAR is generally fairly good sized; on Intel
> >> chips, it's between 256M and 1G (and growing). I want to write
> >> data into this region from kernel space, but it's really too big
> >> to map the whole thing into kernel address space, especially on
> >> 32-bit systems. ioremap is not a good option here -- it's way too
> >> slow.
> >>
> >> With CONFIG_HIGHMEM enabled, I can use kmap_atomic_pfn (well,
> >> actually the kmap_atomic_prot_pfn included in the DRM tree) and
> >> things work quite well -- performance is good, with barely any
> >> measurable time spent in the PTE whacking (~1%).
> >>
> >> However, with CONFIG_HIGHMEM disabled, there aren't any PTEs
> >> reserved for this kind of mapping fun. This makes me suspect that
> >> abusing kmap_atomic for this operation would not be appreciated.
> >> Should I use kmap_atomic_pfn to reach my PCI BAR like this?
> >>
> >> Would it be reasonable to supply a patch that made this work even
> >> without CONFIG_HIGHMEM?
> >>
> >
> > Usually people use ioremap to map device memory. Wouldn't that
> > work in this case?
> >
>
> "but it's really too big to map the whole thing
> into kernel address space, especially on 32-bit systems. ioremap is
> not a good option here -- it's way too slow."
>
> From the original mail.
>
> doing a tlb flush for iounmap is slow as all hell if you do it a lot,
> and we can't afford to map the whole aperture, which can be 1GB.

well kmap does a tlb flush as well... you can't get away from doing a
flush if you change a CPU mapping.
What you CAN do is play tricks and flush only once in a while and make
sure you don't recycle mappings in the meantime (like kmap does).

I can totally see doing an iounmap_lazy() and then having an
iounmap_flush_lazy() thing, or something like that...
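
In sketch form (purely hypothetical -- neither function exists; the
names just follow the suggestion above):

#include <linux/kernel.h>
#include <asm/tlbflush.h>

/* Track one dirty range covering all deferred unmaps. */
static unsigned long lazy_start = ULONG_MAX, lazy_end;

void iounmap_lazy(void __iomem *addr, size_t size)
{
	unsigned long va = (unsigned long)addr;

	/* page-table teardown elided; the point is to skip the TLB
	 * flush here and just remember the range, which must not be
	 * reused until the flush below has run */
	lazy_start = min(lazy_start, va);
	lazy_end = max(lazy_end, va + size);
}

void iounmap_flush_lazy(void)
{
	/* one IPI-broadcast flush covers every deferred unmap */
	flush_tlb_kernel_range(lazy_start, lazy_end);
	lazy_start = ULONG_MAX;
	lazy_end = 0;
}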


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-06-26 05:02:19

by Dave Airlie

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Thu, Jun 26, 2008 at 2:36 PM, Arjan van de Ven <[email protected]> wrote:
> On Thu, 26 Jun 2008 11:23:11 +1000
> "Dave Airlie" <[email protected]> wrote:
>
>> On Thu, Jun 26, 2008 at 11:18 AM, Jeremy Fitzhardinge
>> <[email protected]> wrote:
>> > Keith Packard wrote:
>> >>
>> >> The graphics memory BAR is generally fairly good sized; on Intel
>> >> chips, it's between 256M and 1G (and growing). I want to write
>> >> data into this region from kernel space, but it's really too big
>> >> to map the whole thing into kernel address space, especially on
>> >> 32-bit systems. ioremap is not a good option here -- it's way too
>> >> slow.
>> >>
>> >> With CONFIG_HIGHMEM enabled, I can use kmap_atomic_pfn (well,
>> >> actually the kmap_atomic_prot_pfn included in the DRM tree) and
>> >> things work quite well -- performance is good, with barely any
>> >> measurable time spent in the PTE whacking (~1%).
>> >>
>> >> However, with CONFIG_HIGHMEM disabled, there aren't any PTEs
>> >> reserved for this kind of mapping fun. This makes me suspect that
>> >> abusing kmap_atomic for this operation would not be appreciated.
>> >> Should I use kmap_atomic_pfn to reach my PCI BAR like this?
>> >>
>> >> Would it be reasonable to supply a patch that made this work even
>> >> without CONFIG_HIGHMEM?
>> >>
>> >
>> > Usually people use ioremap to map device memory. Wouldn't that
>> > work in this case?
>> >
>>
>> "but it's really too big to map the whole thing
>> into kernel address space, especially on 32-bit systems. ioremap is
>> not a good option here -- it's way too slow."
>>
>> From the original mail.
>>
>> doing a tlb flush for iounmap is slow as all hell if you do it a lot,
>> and we can't afford to map the whole aperture, which can be 1GB.
>
> well kmap does a tlb flush as well... you can't get away from doing a
> flush if you change a CPU mapping.
> What you CAN do is play tricks and flush only once in a while and make
> sure you don't recycle mappings in the meantime (like kmap does).
>
> I can totally see doing an iounmap_lazy() and then having an
> iounmap_flush_lazy() thing, or something like that...
>

That's why keithp wants something like kmap_atomic but for ioremap
instead of kmaps.

This would be for short temporary ioremaps like kmap_atomic is.

Dave.

2008-06-26 05:33:34

by Keith Packard

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Thu, 2008-06-26 at 15:02 +1000, Dave Airlie wrote:

> Thats why keithp wants something like kmap_atomic but for ioremap
> instead of kmaps.

Right, the usage is precisely the same as kmap_atomic -- reading and
writing across a wide range of physical addresses without either needing
a permanent mapping or taking the huge cost of TLB flushing.

The existing kmap_atomic_pfn is exactly what I need (modulo the lack of
prot bits), except that it only handles non-memory pages on kernels with
CONFIG_HIGHMEM set.

It seems like making this function work on non-HIGHMEM kernels would
require only the reservation of the same few PTEs that HIGHMEM kernels
use, along with suitable hacking to make them work.
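
Roughly this, say (a sketch only: it assumes kmap_pte and the
FIX_KMAP_* fixmap slots, which today exist only under CONFIG_HIGHMEM,
would also be reserved on !HIGHMEM kernels):

/* Sketch: an atomic per-CPU pfn mapping that needs nothing from
 * HIGHMEM beyond the reserved fixmap PTEs. */
void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type,
			   pgprot_t prot)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	pagefault_disable();

	idx = type + KM_TYPE_NR * smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	set_pte(kmap_pte - idx, pfn_pte(pfn, prot));
	/* local flush only -- no cross-CPU IPI; the flush could
	 * equally sit on the kunmap side, as kmap_atomic does it */
	__flush_tlb_one(vaddr);

	return (void *)vaddr;
}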

> This would be for short temporary ioremaps like kmap_atomic is.

For an integrated graphics device, this is just an optimization. I take
physical pages and map them into the graphics GTT, which makes them
visible to the CPU up in I/O space. Then I map addresses in the GTT back
to the CPU with kmap_atomic_pfn and voila -- WC-mapped access to regular
pages, all without touching the low memory mappings.

Before trying this, we just mapped the 'real' address of the page and
then used clflush to get the contents out to memory where the graphics
device could pick it up. However, the clflush is fairly expensive,
enough so that using the WC mapping turns out to be faster in practice.
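
For comparison, the old path looked roughly like this
(clflush_cache_range() is the x86 helper; the surrounding code is
illustrative):

#include <linux/highmem.h>
#include <asm/cacheflush.h>

/* Sketch of the old path: write through the page's normal cached
 * mapping, then push the lines out so the GPU sees them. */
static void write_page_cached(struct page *page, const void *src)
{
	void *dst = kmap_atomic(page, KM_USER0);

	memcpy(dst, src, PAGE_SIZE);
	clflush_cache_range(dst, PAGE_SIZE);	/* the expensive part */
	kunmap_atomic(dst, KM_USER0);
}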

Beyond simple integrated graphics performance benefits, we're looking
towards discrete graphics cards where we need to write to VRAM, which can
only be made visible through the aperture; in that environment, we're
stuck choosing between ioremap (urf) and the same kmap_atomic_pfn as
above. In this case, there's no question that kmap_atomic_pfn will be a
huge performance benefit.

--
[email protected]



2008-06-26 05:36:45

by Keith Packard

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Wed, 2008-06-25 at 21:36 -0700, Arjan van de Ven wrote:

> well kmap does a tlb flush as well...

kmap_atomic

--
[email protected]



2008-06-26 06:00:39

by Jeremy Fitzhardinge

Subject: Re: kmap_atomic_pfn for PCI BAR access?

Keith Packard wrote:
> On Wed, 2008-06-25 at 21:36 -0700, Arjan van de Ven wrote:
>
>
>> well kmap does a tlb flush as well...
>>
>
> kmap_atomic
>

Anything that does any kind of pagetable manipulation needs to do tlb
flushes. kunmap_atomic handles the flushing.

J

2008-06-26 06:02:26

by Dave Airlie

Subject: Re: kmap_atomic_pfn for PCI BAR access?

> >
>
> Anything that does any kind of pagetable manipulation needs to do tlb flushes.
> kunmap_atomic handles the flushing.

It doesn't, however, need to do an IPI dance, which is the worst part of
doing a TLB flush on SMP machines.

Flushing local CPU TLBs is bad, but doing IPIs and waiting for everyone
else is brutal.
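
The contrast, as a sketch (the two primitives are the real x86 ones;
the wrapper is illustrative):

#include <asm/tlbflush.h>

static void flush_cost_contrast(unsigned long vaddr,
				unsigned long start, unsigned long end)
{
	/* local: invalidate one PTE on this CPU only -- all a
	 * per-CPU atomic kmap slot ever needs */
	__flush_tlb_one(vaddr);

	/* global: IPI every other CPU and wait for each to ack --
	 * what a full kernel-range flush (e.g. for iounmap) pays */
	flush_tlb_kernel_range(start, end);
}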

Dave.

2008-06-26 10:34:52

by Arjan van de Ven

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Wed, 25 Jun 2008 22:36:29 -0700
Keith Packard <[email protected]> wrote:

> On Wed, 2008-06-25 at 21:36 -0700, Arjan van de Ven wrote:
>
> > well kmap does a tlb flush as well...
>
> kmap_atomic

that needs to do, and does, a flush as well.
Granted, it's a per-CPU flush, but it's a flush nevertheless.



--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-06-26 10:36:35

by Arjan van de Ven

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Thu, 26 Jun 2008 07:02:15 +0100 (IST)
Dave Airlie <[email protected]> wrote:

> > >
> >
> > Anything that does any kind of pagetable manipulation needs to do
> > tlb flushes. kunmap_atomic handles the flushing.
>
> It doesn't, however, need to do an IPI dance, which is the worst part
> of doing a TLB flush on SMP machines.
>
> Flushing local CPU TLBs is bad, but doing IPIs and waiting for
> everyone else is brutal.

but you only do IPIs in the multithreaded case, to the CPUs where you're
running; otoh I guess games are multithreaded nowadays



--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-06-26 13:11:55

by Arjan van de Ven

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Thu, 26 Jun 2008 03:36:14 -0700
Arjan van de Ven <[email protected]> wrote:

> On Thu, 26 Jun 2008 07:02:15 +0100 (IST)
> Dave Airlie <[email protected]> wrote:
>
> > > >
> > >
> > > Anything that does any kind of pagetable manipulation needs to do
> > > tlb flushes. kunmap_atomic handles the flushing.
> >
> > It doesn't, however, need to do an IPI dance, which is the worst
> > part of doing a TLB flush on SMP machines.
> >
> > Flushing local CPU TLBs is bad, but doing IPIs and waiting for
> > everyone else is brutal.
>
> but you only do IPIs in the multithreaded case, to the CPUs where you're
> running; otoh I guess games are multithreaded nowadays
eh, never mind, this is totally bogus:
yes, a full IPI is needed.

note that with ioremap you may need certain tricks as well to deal with
cache type aliasing, depending on the type of ioremap you want (wc?).



--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-07-07 06:54:28

by Nick Piggin

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Thursday 26 June 2008 13:11, Jeremy Fitzhardinge wrote:
> Dave Airlie wrote:

> > doing a tlb flush for iounmap is slow as all hell if you do it a lot,
> > and we can't afford to map the whole aperture, which can be 1GB.
>
> Maybe Nick's vmap reimplementation would help here. It effectively
> allows you to map stuff into the vmalloc space, and do lazy tlb flushes
> to mitigate the cost of map/unmap. He posted the patches a week or so ago.

Yeah, it can _really_ help. I'd posted some performance numbers with
the patch, which might prompt you to take another look at ioremap.

One thing I still haven't implemented in that patch is CPU-local
mappings (which need only a local TLB flush)... I found that,
after the improvements I did implement, they didn't help much for
my workloads, so I suspect you might find the same thing... But anyway,
if you really need the per-CPU mappings, it should be possible to
implement them rather generically in the vmap layer.

I would be very interested to know what sort of results you see with it
(compared to kmap_atomic and compared to vanilla ioremap).

Thanks,
Nick

2008-07-07 07:05:54

by Arjan van de Ven

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Mon, 7 Jul 2008 16:53:54 +1000
Nick Piggin <[email protected]> wrote:

> On Thursday 26 June 2008 13:11, Jeremy Fitzhardinge wrote:
> > Dave Airlie wrote:
>
> > > doing a tlb flush for iounmap is slow as all hell if you do it a
> > > lot, and we can't afford to map the whole aperture, which can be 1GB.
> >
> > Maybe Nick's vmap reimplementation would help here. It effectively
> > allows you to map stuff into the vmalloc space, and do lazy tlb
> > flushes to mitigate the cost of map/unmap. He posted the patches
> > a week or so ago.
>
> Yeah, it can _really_ help. I'd posted some performance numbers with
> the patch which might prompt you to take another look at ioremap.
>
> One thing I still haven't implemented in that patch is CPU-local
> mappings (which need only a local TLB flush)... I found that,
> after the improvements I did implement, they didn't help much for
> my workloads, so I suspect you might find the same thing...

just to complicate things: with ioremap() you have to deal with the
various cache-coherency requirements... making things lazy and per-CPU
complicates that a ton (and I suspect won't actually make things
cheaper).

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-07-07 11:19:53

by Nick Piggin

Subject: Re: kmap_atomic_pfn for PCI BAR access?

On Monday 07 July 2008 17:05, Arjan van de Ven wrote:
> On Mon, 7 Jul 2008 16:53:54 +1000

> > One thing I still haven't implemented in that patch is CPU-local
> > mappings (which need only a local TLB flush)... I found that,
> > after the improvements I did implement, they didn't help much for
> > my workloads, so I suspect you might find the same thing...
>
> just to complicate things with ioremap() you have to deal with the
> various caching-coherency requirements.. making things lazy and per CPU
> complicates that a ton (and I suspect won't actually make things
> cheaper)

Yeah, my vmap rewrite has a vm_unmap_aliases() call, which the
page attribute code calls.

But these flushes are not really any different between the atomic
kmap area and my vmap lazy flushing: the atomic kmap code still
has to do a broadcast global TLB flush.

The fact is simply that changing cache attributes is always going
to be an expensive operation and lazy mappings are not going to
really change that -- because changing attributes is going to
require a TLB flush on all CPUs _anyway_, which is the expensive
thing. All we really have to do with lazy mappings is tear down
their page tables before CPA's TLB flush -- if there were no lazy
unmappings in that time, then there's no extra work to do.
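
In sketch form (vm_unmap_aliases() is from my vmap rewrite; the
surrounding function just illustrates the ordering, not the real
cpa code):

static int cpa_example(void)
{
	/* 1. tear down any lazily-kept kernel aliases first, so the
	 *    flush below covers them too */
	vm_unmap_aliases();

	/* 2. rewrite the kernel PTEs with the new cache attributes
	 *    (details elided -- this is what cpa already does) */

	/* 3. the one broadcast TLB flush cpa must do anyway */
	flush_tlb_all();
	return 0;
}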

... or, it occurs to me that you might have been referring to
something else: the problem of simply changing the cache attributes
on the pages that you wish to ioremap now. That's going to be slow,
sure, but these guys don't seem to need such an attribute change
anyway, because they are using kmap_atomic_pfn (which doesn't change
attributes).