On 06/20, Oleg Nesterov wrote:
>
> Let me first send the changes which look "obviously correct" to me.
> Perhaps I'll send more patches on top of this later.
But lets also fix another unmap/remap bug before the cleanups...
This series doesn't depend on the previous mremap fixes.
special_mapping_fault() is absolutely broken. It seems it was always
wrong, but this didn't matter until vdso/vvar started to use more than
one page.
I am not sure about 1/3. As the changelog says the name is not very
accurate, and I do not really like the vma->fault != NULL check. But
I hope this can work, and we can change this helper later if needed.
Please review.
Oleg.
include/linux/mm.h | 5 +++++
mm/memory.c | 13 ++++++-------
mm/mmap.c | 14 +++-----------
3 files changed, 14 insertions(+), 18 deletions(-)
Preparation. Add the new simple helper, vma_is_anonymous(vma),
and change handle_pte_fault() to use it. It will have more users.
The name is not very accurate, say mmap_mem/VM_PFNMAP vma is not
anonymous. Perhaps it should be named vma_has_fault() instead.
But it matches the logic in mmap.c/memory.c, see next changes.
Signed-off-by: Oleg Nesterov <[email protected]>
---
include/linux/mm.h | 5 +++++
mm/memory.c | 13 ++++++-------
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0755b9f..a0fe3d3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1225,6 +1225,11 @@ static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr)
return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
}
+static inline bool vma_is_anonymous(struct vm_area_struct *vma)
+{
+ return !vma->vm_ops || !vma->vm_ops->fault;
+}
+
static inline int stack_guard_page_start(struct vm_area_struct *vma,
unsigned long addr)
{
diff --git a/mm/memory.c b/mm/memory.c
index 22e037e..f50ed81 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3242,13 +3242,12 @@ static int handle_pte_fault(struct mm_struct *mm,
barrier();
if (!pte_present(entry)) {
if (pte_none(entry)) {
- if (vma->vm_ops) {
- if (likely(vma->vm_ops->fault))
- return do_fault(mm, vma, address, pte,
- pmd, flags, entry);
- }
- return do_anonymous_page(mm, vma, address,
- pte, pmd, flags);
+ if (vma_is_anonymous(vma))
+ return do_anonymous_page(mm, vma, address,
+ pte, pmd, flags);
+ else
+ return do_fault(mm, vma, address, pte, pmd,
+ flags, entry);
}
return do_swap_page(mm, vma, address,
pte, pmd, flags, entry);
--
1.5.5.1
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <assert.h>
void *find_vdso_vaddr(void)
{
FILE *perl;
char buf[32] = {};
perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;"
"/^(.*?)-.*vdso/ && print hex $1 while <>'", "r");
fread(buf, sizeof(buf), 1, perl);
fclose(perl);
return (void *)atol(buf);
}
#define PAGE_SIZE 4096
int main(void)
{
void *vdso = find_vdso_vaddr();
assert(vdso);
// of course they should differ, and they do so far
printf("vdso pages differ: %d\n",
!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
// split into 2 vma's
assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0);
// force another fault on the next check
assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0);
// now they no longer differ, the 2nd vm_pgoff is wrong
printf("vdso pages differ: %d\n",
!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
return 0;
}
Output:
vdso pages differ: 1
vdso pages differ: 0
This is because split_vma() correctly updates ->vm_pgoff, but the logic
in insert_vm_struct() and special_mapping_fault() is absolutely broken,
so the fault at vdso + PAGE_SIZE return the 1st page. The same happens
if you simply unmap the 1st page.
special_mapping_fault() does:
pgoff = vmf->pgoff - vma->vm_pgoff;
and this is _only_ correct if vma->vm_start mmaps the first page from
->vm_private_data array.
vdso or any other user of install_special_mapping() is not anonymous,
it has the "backing storage" even if it is just the array of pages.
So we actually need to make vm_pgoff work as an offset in this array.
Note: this also allows to fix another problem: currently gdb can't access
"[vvar]" memory because in this case special_mapping_fault() doesn't work.
Now that we can use ->vm_pgoff we can implement ->access() and fix this.
Signed-off-by: Oleg Nesterov <[email protected]>
---
mm/mmap.c | 12 ++----------
1 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index bb50cac..992417f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2871,7 +2871,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
* using the existing file pgoff checks and manipulations.
* Similarly in do_mmap_pgoff and in do_brk.
*/
- if (!vma->vm_file) {
+ if (vma_is_anonymous(vma)) {
BUG_ON(vma->anon_vma);
vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
}
@@ -3013,21 +3013,13 @@ static int special_mapping_fault(struct vm_area_struct *vma,
pgoff_t pgoff;
struct page **pages;
- /*
- * special mappings have no vm_file, and in that case, the mm
- * uses vm_pgoff internally. So we have to subtract it from here.
- * We are allowed to do this because we are the mm; do not copy
- * this code into drivers!
- */
- pgoff = vmf->pgoff - vma->vm_pgoff;
-
if (vma->vm_ops == &legacy_special_mapping_vmops)
pages = vma->vm_private_data;
else
pages = ((struct vm_special_mapping *)vma->vm_private_data)->
pages;
- for (; pgoff && *pages; ++pages)
+ for (pgoff = vmf->pgoff; pgoff && *pages; ++pages)
pgoff--;
if (*pages) {
--
1.5.5.1
Test-case:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <assert.h>
void *find_vdso_vaddr(void)
{
FILE *perl;
char buf[32] = {};
perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;"
"/^(.*?)-.*vdso/ && print hex $1 while <>'", "r");
fread(buf, sizeof(buf), 1, perl);
fclose(perl);
return (void *)atol(buf);
}
#define PAGE_SIZE 4096
void *get_unmapped_area(void)
{
void *p = mmap(0, PAGE_SIZE, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS, -1,0);
assert(p != MAP_FAILED);
munmap(p, PAGE_SIZE);
return p;
}
char save[2][PAGE_SIZE];
int main(void)
{
void *vdso = find_vdso_vaddr();
void *page[2];
assert(vdso);
memcpy(save, vdso, sizeof (save));
// force another fault on the next check
assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0);
page[0] = mremap(vdso,
PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE,
get_unmapped_area());
page[1] = mremap(vdso + PAGE_SIZE,
PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE,
get_unmapped_area());
assert(page[0] != MAP_FAILED && page[1] != MAP_FAILED);
printf("match: %d %d\n",
!memcmp(save[0], page[0], PAGE_SIZE),
!memcmp(save[1], page[1], PAGE_SIZE));
return 0;
}
fails without this patch. Before the previous commit it gets the wrong
page, now it segfaults (which is imho better).
This is because copy_vma() wrongly assumes that if vma->vm_file == NULL
is irrelevant until the first fault which will use do_anonymous_page().
This is obviously wrong for the special mapping.
Signed-off-by: Oleg Nesterov <[email protected]>
---
mm/mmap.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 992417f..2185cd9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2905,7 +2905,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
* If anonymous vma has not yet been faulted, update new pgoff
* to match new location, to increase its chance of merging.
*/
- if (unlikely(!vma->vm_file && !vma->anon_vma)) {
+ if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) {
pgoff = addr >> PAGE_SHIFT;
faulted_in_anon_vma = false;
}
--
1.5.5.1
Forgot to add Andy...
And forgot to mention. As for vdso in particular, I'd actually prefer
to make it have ->vm_file != NULL so that uprobe-in-vdso could work.
But this is not that simple, and I think these fixes (if correct) make
sense in any case, whatever we do with vdso.
On 06/21, Oleg Nesterov wrote:
>
> On 06/20, Oleg Nesterov wrote:
> >
> > Let me first send the changes which look "obviously correct" to me.
> > Perhaps I'll send more patches on top of this later.
>
> But lets also fix another unmap/remap bug before the cleanups...
> This series doesn't depend on the previous mremap fixes.
>
> special_mapping_fault() is absolutely broken. It seems it was always
> wrong, but this didn't matter until vdso/vvar started to use more than
> one page.
>
> I am not sure about 1/3. As the changelog says the name is not very
> accurate, and I do not really like the vma->fault != NULL check. But
> I hope this can work, and we can change this helper later if needed.
>
> Please review.
>
> Oleg.
>
> include/linux/mm.h | 5 +++++
> mm/memory.c | 13 ++++++-------
> mm/mmap.c | 14 +++-----------
> 3 files changed, 14 insertions(+), 18 deletions(-)
On 06/21, Oleg Nesterov wrote:
>
> Forgot to add Andy...
Add Pavel ;)
I never understood why ->mremap() lives in file_operations, not in
vm_operations_struct. To me vma->vm_file->f_op in move_vma() just
looks strange, vma->vm_ops->mremap(new_vma) looks "obviously better".
And afaics more useful. CRIU remaps vdso, but this does not update
mm->context.vdso. OK, probably this does not matter currently, CRIU
can't c/r the compat tasks, and 64-bit apps do not use context.vdso.
Afaics. Still, I think we might want to have special_mapping_remap()
and we can't do this because ->vm_file == NULL.
And perhaps other architectures can depend on the "correct" value
in >context.vdso more then x86, I dunno...
In short. Shouldn't we move ->mremap() to vm_operations_struct before
it has another user? We need to fix aio.c, but this is trivial.
No?
Oleg.
On Mon, Jun 22, 2015 at 5:47 PM, Oleg Nesterov <[email protected]> wrote:
>
> In short. Shouldn't we move ->mremap() to vm_operations_struct before
> it has another user? We need to fix aio.c, but this is trivial.
>
> No?
Sounds like the right thing to do, but maybe there's some reason I
miss that it's the way it is..
Linus
On Mon, Jun 22, 2015 at 5:47 PM, Oleg Nesterov <[email protected]> wrote:
> On 06/21, Oleg Nesterov wrote:
>>
>> Forgot to add Andy...
>
> Add Pavel ;)
>
> I never understood why ->mremap() lives in file_operations, not in
> vm_operations_struct. To me vma->vm_file->f_op in move_vma() just
> looks strange, vma->vm_ops->mremap(new_vma) looks "obviously better".
>
> And afaics more useful. CRIU remaps vdso, but this does not update
> mm->context.vdso. OK, probably this does not matter currently, CRIU
> can't c/r the compat tasks, and 64-bit apps do not use context.vdso.
> Afaics. Still, I think we might want to have special_mapping_remap()
> and we can't do this because ->vm_file == NULL.
I would like this. Then I could clean up and resubmit my patch to
keep context.vdso up to date.
Oleg, can you let me know what patch, if any, I should be reviewing?
--Andy
On 06/23/2015 03:47 AM, Oleg Nesterov wrote:
> On 06/21, Oleg Nesterov wrote:
>>
>> Forgot to add Andy...
>
> Add Pavel ;)
>
> I never understood why ->mremap() lives in file_operations, not in
> vm_operations_struct. To me vma->vm_file->f_op in move_vma() just
> looks strange, vma->vm_ops->mremap(new_vma) looks "obviously better".
>
> And afaics more useful. CRIU remaps vdso, but this does not update
> mm->context.vdso. OK, probably this does not matter currently, CRIU
> can't c/r the compat tasks, and 64-bit apps do not use context.vdso.
Yup, the context.vdso exists not for all vdso-s. There should have been
a patch for power arch adding this change.
> Afaics. Still, I think we might want to have special_mapping_remap()
> and we can't do this because ->vm_file == NULL.
For aio (the single for now mapping with mremap callback) the vm_file
is not NULL.
> And perhaps other architectures can depend on the "correct" value
> in >context.vdso more then x86, I dunno...
>
> In short. Shouldn't we move ->mremap() to vm_operations_struct before
> it has another user? We need to fix aio.c, but this is trivial.
>
> No?
I will be OK with this change :)
-- Pavel
On 06/23, Pavel Emelyanov wrote:
>
> On 06/23/2015 03:47 AM, Oleg Nesterov wrote:
>
> > Afaics. Still, I think we might want to have special_mapping_remap()
> > and we can't do this because ->vm_file == NULL.
>
> For aio (the single for now mapping with mremap callback) the vm_file
> is not NULL.
I meant, vdso can't hook ->remap because vm_file is NULL.
> > In short. Shouldn't we move ->mremap() to vm_operations_struct before
> > it has another user? We need to fix aio.c, but this is trivial.
> >
> > No?
>
> I will be OK with this change :)
OK, nobody objects, I'll send a patch...
Oleg.