Drivers use mmap followed by pgprot_* and remap_pfn_range or vm_insert_pfn
in order to export reserved memory to userspace. Currently, such mappings are
not tracked and hence not kept consistent with other mappings (/dev/mem,
pci resource, ioremap) of the same memory that may exist in the system.

The following patchset adds x86 PAT attribute tracking and untracking for
pfnmap related APIs.

The first three patches in the patchset change the generic mm code to fit
in this tracking. The last four patches are x86 specific and make things work
with the x86 PAT code. The patchset also introduces the pgprot_writecombine
interface, which gives a writecombine mapping when enabled, falling back to
pgprot_noncached otherwise.

This patch:

While working on x86 PAT, we faced some hurdles with tracking
remap_pfn_range() regions, as we do not have any information to say
whether a PFNMAP mapping is linear for the entire vma range or
made up of smaller granularity regions within the vma.

A simple solution to this is to use vm_pgoff as an indicator for
linear mapping over the vma region. Currently, remap_pfn_range
only sets vm_pgoff for COW mappings. The patch below changes the
logic and sets vm_pgoff irrespective of COW. This will still not
be enough for the case where pfn is zero (vma region mapped to
physical address zero). But for all the other cases, we can look at
pfnmap VMAs and say whether the mapping is for the entire vma region
or not.

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
---
include/linux/mm.h | 9 +++++++++
mm/memory.c | 7 +++----
2 files changed, 12 insertions(+), 4 deletions(-)
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2008-12-17 17:24:31.000000000 -0800
+++ linux-2.6/mm/memory.c 2008-12-18 10:10:46.000000000 -0800
@@ -1575,11 +1575,10 @@ int remap_pfn_range(struct vm_area_struc
* behaviour that some programs depend on. We mark the "original"
* un-COW'ed pages by matching them up with "vma->vm_pgoff".
*/
- if (is_cow_mapping(vma->vm_flags)) {
- if (addr != vma->vm_start || end != vma->vm_end)
- return -EINVAL;
+ if (addr == vma->vm_start && end == vma->vm_end)
vma->vm_pgoff = pfn;
- }
+ else if (is_cow_mapping(vma->vm_flags))
+ return -EINVAL;
vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2008-12-17 17:24:31.000000000 -0800
+++ linux-2.6/include/linux/mm.h 2008-12-18 10:10:46.000000000 -0800
@@ -145,6 +145,15 @@ extern pgprot_t protection_map[16];
#define FAULT_FLAG_WRITE 0x01 /* Fault was a write access */
#define FAULT_FLAG_NONLINEAR 0x02 /* Fault was via a nonlinear mapping */
+static inline int is_linear_pfn_mapping(struct vm_area_struct *vma)
+{
+ return ((vma->vm_flags & VM_PFNMAP) && vma->vm_pgoff);
+}
+
+static inline int is_pfn_mapping(struct vm_area_struct *vma)
+{
+ return (vma->vm_flags & VM_PFNMAP);
+}
/*
* vm_fault is filled by the the pagefault handler and passed to the vma's
--
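
For readers following the thread, here is a rough userspace model of the
check the patch introduces. It is only a sketch: struct model_vma, the
MODEL_* flag values and the model_* functions are hypothetical stand-ins,
not kernel definitions, and the model assumes vm_pgoff starts out as zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel definitions; the values are illustrative only. */
#define MODEL_VM_PFNMAP 0x1UL
#define MODEL_VM_COW    0x2UL	/* stands in for is_cow_mapping() being true */

struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;	/* assumed to be 0 before the call */
	unsigned long vm_flags;
};

/*
 * Mirrors the patched check: record pfn in vm_pgoff whenever the whole
 * vma is covered; only COW mappings reject a partial range.
 */
static int model_remap_pfn_range(struct model_vma *vma, unsigned long addr,
				 unsigned long pfn, unsigned long size)
{
	unsigned long end;

	assert(vma != NULL);
	end = addr + size;

	if (addr == vma->vm_start && end == vma->vm_end)
		vma->vm_pgoff = pfn;
	else if (vma->vm_flags & MODEL_VM_COW)
		return -EINVAL;

	vma->vm_flags |= MODEL_VM_PFNMAP;
	return 0;
}

/* Same test as is_linear_pfn_mapping() in the patch, on the toy struct. */
static bool model_is_linear_pfn_mapping(const struct model_vma *vma)
{
	return (vma->vm_flags & MODEL_VM_PFNMAP) && vma->vm_pgoff;
}
```

In this model a full-range mapping at a non-zero pfn is reported as linear,
a partial non-COW mapping is accepted but not reported as linear, and a
full-range mapping at pfn zero is also not reported as linear, which is the
limitation the changelog mentions.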
On Thu, Dec 18, 2008 at 11:41:27AM -0800, [email protected] wrote:
This is fine by me, however:
1. Can you add some comments to say "this is not for core vm but for pat,
oh and a pgoff of zero is not going to work".
2. Can you please justify to me (or the changelog) roughly why PAT wants
to know if the mapping is linear or not? Presumably it has to handle
both types? If performance wasn't an issue, then you could manually scan
the ptes to verify (which would solve your zero-offset bug). etc.
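
To make the pte scan in point 2 concrete, a userspace sketch of the idea
might look like the following. It assumes the pfns behind a vma have
already been collected into a plain array, one entry per page; that array
and the name model_ptes_are_linear are hypothetical, and a real scan would
have to walk the page tables instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when every page maps the pfn that follows its predecessor,
 * i.e. the range is one linear block. This works for a starting pfn of
 * zero as well, at the cost of touching every entry.
 */
static bool model_ptes_are_linear(const unsigned long *pfns, size_t npages)
{
	size_t i;

	assert(pfns != NULL || npages == 0);

	for (i = 1; i < npages; i++) {
		if (pfns[i] != pfns[0] + i)
			return false;
	}
	return true;
}
```

The cost is linear in the number of pages, which is the performance
trade-off raised in the same point.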
On Thu, Dec 18, 2008 at 01:27:28PM -0800, Nick Piggin wrote:
>
> This is fine by me, however:
> 1. Can you add some comments to say "this is not for core vm but for pat,
> oh and a pgoff of zero is not going to work".
OK. Will add comments about both the points.
> 2. Can you please justify to me (or the changelog) roughly why PAT wants
> to know if the mapping is linear or not? Presumably it has to handle
> both types? If performance wasn't an issue, then you could manually scan
> the ptes to verify (which would solve your zero-offset bug). etc.
The main reason is performance. If we know it is linear, we can track the entire
region as one block and do the reserve/free for the entire region. But if it is
not linear, then we have to reserve the memtype of physical addresses page by
page. This will not be optimal, as it makes reserve and free slower. Almost all
users that we find in the kernel today (at least on x86) are linear.
Thanks,
Venki
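
As a rough illustration of the cost difference described above, the sketch
below counts how many reserve operations each case needs. Everything in it
is hypothetical (struct model_tracker, model_reserve, model_track_vma and
the 4096-byte page size); it is not the PAT memtype code, only a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for the model */

struct model_tracker {
	size_t reserve_calls;	/* number of reserve operations issued */
};

/* Stand-in for one memtype reservation; it only counts the call. */
static void model_reserve(struct model_tracker *t, unsigned long paddr,
			  unsigned long size)
{
	(void)paddr;
	(void)size;
	t->reserve_calls++;
}

/*
 * Linear vma: one reservation covers the whole region.
 * Non-linear vma: one reservation per page. A real implementation would
 * look up each page's physical address; the model just steps through a
 * contiguous range to count the calls.
 */
static void model_track_vma(struct model_tracker *t, unsigned long paddr,
			    unsigned long size, bool linear)
{
	unsigned long off;

	assert(t != NULL);

	if (linear) {
		model_reserve(t, paddr, size);
		return;
	}
	for (off = 0; off < size; off += MODEL_PAGE_SIZE)
		model_reserve(t, paddr + off, MODEL_PAGE_SIZE);
}
```

For a 16-page region the linear path issues a single reservation while the
page-by-page path issues sixteen, and the same ratio applies on free.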
On Thu, Dec 18, 2008 at 02:10:57PM -0800, Pallipadi, Venkatesh wrote:
> On Thu, Dec 18, 2008 at 01:27:28PM -0800, Nick Piggin wrote:
> >
> > This is fine by me, however:
> > 1. Can you add some comments to say "this is not for core vm but for pat,
> > oh and a pgoff of zero is not going to work".
>
> OK. Will add comments about both the points.
>
> > 2. Can you please justify to me (or the changelog) roughly why PAT wants
> > to know if the mapping is linear or not? Presumably it has to handle
> > both types? If performance wasn't an issue, then you could manually scan
> > the ptes to verify (which would solve your zero-offset bug). etc.
>
> The main reason is performance. If we know it is linear, we can track the entire
> region as one block and do the reserve/free for the entire region. But if it is
> not linear, then we have to reserve the memtype of physical addresses page by
> page. This will not be optimal, as it makes reserve and free slower. Almost all
> users that we find in the kernel today (at least on x86) are linear.
OK, so it is not a bug to miss the zero pgoff case then. That's good
to know and should be added to the comments.