2007-12-13 23:58:19

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

Forward port of ioremap.patch to x86 tree.

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
---

Index: linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c
===================================================================
--- linux-2.6.24-rc4.orig/arch/x86/mm/ioremap_64.c 2007-12-11 15:48:35.000000000 -0800
+++ linux-2.6.24-rc4/arch/x86/mm/ioremap_64.c 2007-12-11 15:49:52.000000000 -0800
@@ -147,7 +147,7 @@
EXPORT_SYMBOL(__ioremap);

/**
- * ioremap_nocache - map bus memory into CPU space
+ * ioremap_nocache - map bus memory into CPU space uncached
* @offset: bus address of the memory
* @size: size of the resource to map
*
@@ -175,6 +175,30 @@
EXPORT_SYMBOL(ioremap_nocache);

/**
+ * ioremap_wc - map bus memory into CPU space write combined
+ * @offset: bus address of the memory
+ * @size: size of the resource to map
+ *
+ * ioremap_wc performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked write combining.
+ * Write combining allows faster writes to some hardware devices.
+ * See also iounmap_nocache for more details.
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+{
+ return __ioremap(phys_addr, size, _PAGE_WC);
+}
+EXPORT_SYMBOL(ioremap_wc);
+
+
+/**
* iounmap - Free a IO remapping
* @addr: virtual address from ioremap_*
*
Index: linux-2.6.24-rc4/include/asm-x86/io_64.h
===================================================================
--- linux-2.6.24-rc4.orig/include/asm-x86/io_64.h 2007-12-11 14:24:56.000000000 -0800
+++ linux-2.6.24-rc4/include/asm-x86/io_64.h 2007-12-11 15:49:52.000000000 -0800
@@ -142,7 +142,8 @@
* it's useful if some control registers are in such an area and write combining
* or read caching is not desirable:
*/
-extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long size);
+extern void __iomem * ioremap_nocache(unsigned long offset, unsigned long size);
+extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
extern void iounmap(volatile void __iomem *addr);
extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);


--


2007-12-14 04:17:37

by Roland Dreier

[permalink] [raw]
Subject: Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

> --- linux-2.6.24-rc4.orig/include/asm-x86/io_64.h 2007-12-11 14:24:56.000000000 -0800
> +++ linux-2.6.24-rc4/include/asm-x86/io_64.h 2007-12-11 15:49:52.000000000 -0800
> @@ -142,7 +142,8 @@
> * it's useful if some control registers are in such an area and write combining
> * or read caching is not desirable:
> */
> -extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long size);
> +extern void __iomem * ioremap_nocache(unsigned long offset, unsigned long size);
> +extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);

I think ioremap_wc() needs to be available on all archs for this to be
really useful to drivers. It can be a fallback to ioremap_nocache()
everywhere except 64-bit x86, but it's not nice for every driver that
wants to use this to need an "#ifdef X86" or whatever.

Also I didn't see anything like pgprot_wc() in the patchset (although
I just skimmed quickly for now). The use case I actually have would
be in a a driver's .mmap method, where I want to map device registers
into userspace with write-combining turned on:

vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);
io_remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE, vma->vm_page_prot);

where the pfn points into a PCI BAR (which will only be mapped once,
so no issues with conflicting PATs or anything like that).

- R.

2007-12-14 04:31:32

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

Roland Dreier <[email protected]> writes:

> > --- linux-2.6.24-rc4.orig/include/asm-x86/io_64.h 2007-12-11
> 14:24:56.000000000 -0800
> > +++ linux-2.6.24-rc4/include/asm-x86/io_64.h 2007-12-11 15:49:52.000000000
> -0800
> > @@ -142,7 +142,8 @@
> > * it's useful if some control registers are in such an area and write
> combining
> > * or read caching is not desirable:
> > */
> > -extern void __iomem * ioremap_nocache (unsigned long offset, unsigned long
> size);
> > +extern void __iomem * ioremap_nocache(unsigned long offset, unsigned long
> size);
> > +extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
>
> I think ioremap_wc() needs to be available on all archs for this to be
> really useful to drivers. It can be a fallback to ioremap_nocache()
> everywhere except 64-bit x86, but it's not nice for every driver that
> wants to use this to need an "#ifdef X86" or whatever.
>
> Also I didn't see anything like pgprot_wc() in the patchset (although
pgprot_writcombined.

> I just skimmed quickly for now). The use case I actually have would
> be in a a driver's .mmap method, where I want to map device registers
> into userspace with write-combining turned on:
>
> vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);
> io_remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE,
> vma->vm_page_prot);
>
> where the pfn points into a PCI BAR (which will only be mapped once,
> so no issues with conflicting PATs or anything like that).

Eric

2007-12-14 04:32:57

by Roland Dreier

[permalink] [raw]
Subject: Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

> > Also I didn't see anything like pgprot_wc() in the patchset (although

> pgprot_writcombined.

Oh I see it now (pgprot_writecombine() actually).

However the same comment as before applies: there needs to be a
fallback to pgprot_noncached() for all other architectures so that
drivers can actually use it in a sane way.

- R.

2007-12-14 04:51:46

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

Roland Dreier <[email protected]> writes:

> > > Also I didn't see anything like pgprot_wc() in the patchset (although
>
> > pgprot_writcombined.
>
> Oh I see it now (pgprot_writecombine() actually).
>
> However the same comment as before applies: there needs to be a
> fallback to pgprot_noncached() for all other architectures so that
> drivers can actually use it in a sane way.

Sounds reasonable.

There are three distinct pieces to this.
- Getting arch/x86 caught up the state art in linux
- Getting the state of the art generalized so everyone can use it.
- Figuring how to generally do the proper conflict checking so we
don't shoot ourselves in the head by accident.

They are all independent problems and can be solved in any order.

It should be the conflict checking that is the actual bottleneck.
The rest is just gravy.

Eric

2007-12-14 21:42:08

by Suresh Siddha

[permalink] [raw]
Subject: Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

On Thu, Dec 13, 2007 at 09:48:36PM -0700, Eric W. Biederman wrote:
> Roland Dreier <[email protected]> writes:
>
> > > > Also I didn't see anything like pgprot_wc() in the patchset (although
> >
> > > pgprot_writcombined.
> >
> > Oh I see it now (pgprot_writecombine() actually).
> >
> > However the same comment as before applies: there needs to be a
> > fallback to pgprot_noncached() for all other architectures so that
> > drivers can actually use it in a sane way.
>
> Sounds reasonable.

We will fix it in next rev.

> It should be the conflict checking that is the actual bottleneck.
> The rest is just gravy.

Yes. We are looking for comments for our proposal to track the
reserved/non-reserved regions some what different.
This is the critical issue which had been holding off PAT for years now...

<snip from the other mails>

Change x86_64 identity map to only map non-reserved memory. This helps
to handle UC/WC mapping of reserved region in a much simple manner
(we don't have to do cpa any more, as such not keep track of the actual
reference counts. We still track all the usages to keep the mappings
consistent. We just avoid the headache of splitting mattr regions for
managing ref counts for every individual usage of the reserved area).

For now, we don't track RAM pages using memattr infrastructure. This is because,
memattr infrastructure is not enough. i.e., while the page is getting
tracked using memattr infrastructure, potentially the page can get
freed(a bug that we need to catch, to avoid attribute aliasing).
For example, a driver does ioremap_uc and an application mapped the
same page using /dev/mem. When the driver does iounamp and free the page,
/dev/mem mapping is still live and we run into aliasing issue.

Can we use the existing page struct to keep track of the attribute
and usage? /dev/mem mappings then can increment the page ref count and not
allow to free the page while the /dev/mem mappings are active. And allow
/dev/mem to map only those pages which are marked reserved (which the driver
does before doing iomap).

Or when a WB mapping through /dev/mem is active, don't allow any driver
to map the page as UC.. Can we do this tracking for RAM pages through
struct page. Or there any issues we should keep in mind..

thanks,
suresh

2007-12-14 23:19:38

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

> For now, we don't track RAM pages using memattr infrastructure. This is because,
> memattr infrastructure is not enough. i.e., while the page is getting
> tracked using memattr infrastructure,

> potentially the page can get
> freed(a bug that we need to catch, to avoid attribute aliasing).

That would be a bug in the caller. But anybody who sets non standard
attributes should go through memattr first.

The only exception is /dev/mem -- not sure how to handle this best.
Perhaps only allow memattr for non memory for there.

-Andi

2007-12-18 08:31:55

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC PATCH 06/12] PAT 64b: Add ioremap_wc support

"Siddha, Suresh B" <[email protected]> writes:

> Yes. We are looking for comments for our proposal to track the
> reserved/non-reserved regions some what different.
> This is the critical issue which had been holding off PAT for years now...

The mattr infrastructure appears to do a decent job of handling the
reserved page mapping case. It essentially reinvents struct
vm_area_struct, and so I expect we can do things a little more easily
if we use the existing vm_area_struct with it's vm_page_prot member
for our checks, all rooted at a dummy reserved page inode. That way
we don't need to do anything special on unmap.

For normal pages we always have them in the kernel mapping and
we use them there. change_page_attr also comes in from the AGP
drivers and changes the caching attributes on a few of those. So when
mapping a normal page we need to require it to be write-back or
whatever change_page_attr has set it to. I expect 2 bits
of page->flags with the proper default can handle that.

change_page_attr needs to check non kernel mappings of a page
and either fix them or fail.

If we perform the checks I have described for normal pages
in /dev/mem (in remap_pfn_pages?) that should be our
most difficult case handled.

Eric

> <snip from the other mails>
>
> Change x86_64 identity map to only map non-reserved memory. This helps
> to handle UC/WC mapping of reserved region in a much simple manner
> (we don't have to do cpa any more, as such not keep track of the actual
> reference counts. We still track all the usages to keep the mappings
> consistent. We just avoid the headache of splitting mattr regions for
> managing ref counts for every individual usage of the reserved
> area).

Well we do want to early map the ``isa'' region.
>
> For now, we don't track RAM pages using memattr infrastructure. This is because,
> memattr infrastructure is not enough. i.e., while the page is getting
> tracked using memattr infrastructure, potentially the page can get
> freed(a bug that we need to catch, to avoid attribute aliasing).
> For example, a driver does ioremap_uc and an application mapped the
> same page using /dev/mem. When the driver does iounamp and free the page,
> /dev/mem mapping is still live and we run into aliasing issue.

/dev/mem is particular weird because it doesn't own the page, and thus
will always be the second user if we are talking about pages in ram.

> Can we use the existing page struct to keep track of the attribute
> and usage?

Yes but not the way you describe below.

> /dev/mem mappings then can increment the page ref count and not
> allow to free the page while the /dev/mem mappings are active. And allow
> /dev/mem to map only those pages which are marked reserved (which the driver
> does before doing iomap).

Part of the usefulness of /dev/mem is that it can do silly things like
map pages someone else in the kernel is using. /dev/mem by it's very
nature does not own ram pages so we need to handle that differently.

> Or when a WB mapping through /dev/mem is active, don't allow any driver
> to map the page as UC.. Can we do this tracking for RAM pages through
> struct page. Or there any issues we should keep in mind..

I think some bits in page->flags should do the trick. The semantics
of change_page_attr are interesting in this case.

Eric