2004-04-19 14:04:18

by Jamie Lokier

[permalink] [raw]
Subject: Non-linear mappings and truncate/madvise(MADV_DONTNEED)

A couple of thoughts on non-linear mappings. Vanilla 2.6.5.

I'm reading madvise_dontneed() and thinking about that zap_page_range()
call. It'll wipe non-linear file offset ptes, won't it?

MADV_DONTNEED is actually a reasonable thing to do with a non-linear
mapping, when you no longer need some of the pages. You could argue
that losing the offsets is acceptable in this case, but I think it's a
poor argument. The offsets should be preserved while zapping the ptes.

Then there's vmtruncate() and invalidate_mmap_range() which calls
zap_page_range(). When you call truncate(), the non-linear offsets
appear to be lost (I'm reading the code, not testing it) for the part
of each VMA corresponding to where the linear mapping would have been.

That means (a) a peculiar part of the mapping is lost, and (b) some of
the truncated pages will stay mapped, if they're in a part of a VMA
which didn't get wiped by the linear calculation.

Do any of the latest objrmap patches fix these problems? Have I
misdiagnosed these problems?

Both would be fixed by changing zap_page_range() to preserve the
non-linear offsets. Here's a patch which does exactly that. I'm sure
there's a fatal flaw: for example aren't page table pages supposed to
be empty after an mm is released? This patch is to show an idea only...
Compiled but not tested.

Your comments would be appreciated.

Thanks,
-- Jamie

--- orig-2.6.5/mm/memory.c 2004-04-14 08:30:04.000000000 +0100
+++ dual-2.6.5/mm/memory.c 2004-04-19 13:57:30.000000000 +0100
@@ -385,10 +385,10 @@
}

static void
-zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
+zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t * pmd,
unsigned long address, unsigned long size)
{
- unsigned long offset;
+ unsigned long offset, pgidx;
pte_t *ptep;

if (pmd_none(*pmd))
@@ -399,11 +399,12 @@
return;
}
ptep = pte_offset_map(pmd, address);
+ pgidx = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
offset = address & ~PMD_MASK;
if (offset + size > PMD_SIZE)
size = PMD_SIZE - offset;
size &= PAGE_MASK;
- for (offset=0; offset < size; ptep++, offset += PAGE_SIZE) {
+ for (offset=0; offset < size; ptep++, pgidx++, offset += PAGE_SIZE) {
pte_t pte = *ptep;
if (pte_none(pte))
continue;
@@ -420,14 +421,18 @@
if (page->mapping && pte_young(pte) &&
!PageSwapCache(page))
mark_page_accessed(page);
+ /* Preserve non-linear offsets. */
+ if ((pgidx >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)) != page->index) {
+ set_pte(ptep, pgoff_to_pte(page->index));
+ BUG_ON(!pte_file(*ptep));
+ }
tlb->freed++;
page_remove_rmap(page, ptep);
tlb_remove_page(tlb, page);
}
}
- } else {
- if (!pte_file(pte))
- free_swap_and_cache(pte_to_swp_entry(pte));
+ } else if (!pte_file(pte)) { /* Preserve non-linear offsets. */
+ free_swap_and_cache(pte_to_swp_entry(pte));
pte_clear(ptep);
}
}
@@ -435,7 +440,7 @@
}

static void
-zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
+zap_pmd_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pgd_t * dir,
unsigned long address, unsigned long size)
{
pmd_t * pmd;
@@ -453,7 +458,7 @@
if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
end = ((address + PGDIR_SIZE) & PGDIR_MASK);
do {
- zap_pte_range(tlb, pmd, address, end - address);
+ zap_pte_range(tlb, vma, pmd, address, end - address);
address = (address + PMD_SIZE) & PMD_MASK;
pmd++;
} while (address < end);
@@ -474,7 +479,7 @@
dir = pgd_offset(vma->vm_mm, address);
tlb_start_vma(tlb, vma);
do {
- zap_pmd_range(tlb, dir, address, end - address);
+ zap_pmd_range(tlb, vma, dir, address, end - address);
address = (address + PGDIR_SIZE) & PGDIR_MASK;
dir++;
} while (address && (address < end));


2004-04-19 15:01:56

by Hugh Dickins

[permalink] [raw]
Subject: Re: Non-linear mappings and truncate/madvise(MADV_DONTNEED)

On Mon, 19 Apr 2004, Jamie Lokier wrote:
> A couple of thoughts on non-linear mappings. Vanilla 2.6.5.
>
> I'm reading madvise_dontneed() and thinking about that zap_page_range()
> call. It'll wipe non-linear file offset ptes, won't it?

Yes, at present.

> MADV_DONTNEED is actually a reasonable thing to do with a non-linear
> mapping, when you no longer need some of the pages. You could argue
> that losing the offsets is acceptable in this case, but I think it's a
> poor argument. The offsets should be preserved while zapping the ptes.

Yes. And I think it also implies that the ->populate functions
are wrong to fail beyond EOF, should just set up file ptes there.

> Then there's vmtruncate() and invalidate_mmap_range() which calls
> zap_page_range(). When you call truncate(), the non-linear offsets
> appear to be lost (I'm reading the code, not testing it) for the part
> of each VMA corresponding to where the linear mapping would have been.
>
> That means (a) a peculiar part of the mapping is lost, and (b) some of
> the truncated pages will stay mapped, if they're in a part of a VMA
> which didn't get wiped by the linear calculation.
>
> Do any of the latest objrmap patches fix these problems? Have I
> misdiagnosed these problems?

rmap 6 nonlinear truncation (which never appeared on LKML, though
sent twice) fixed most of this, and went into 2.6.6-rc1-bk4 last
night: please check it out.

But I just converted madvise_dontneed by rote, adding a NULL arg to
zap_page_range, missing your point that it should respect nonlinearity.

And I made the zap_details structure private to mm/memory.c since I
hadn't noticed anything outside needing it: I'll fix that up later
and post a patch.

I'm haven't and don't intend to change the behaviour of ->populate,
without agreement from others - Ingo? Jamie?

Hugh

2004-04-22 07:54:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: Non-linear mappings and truncate/madvise(MADV_DONTNEED)


On Mon, 19 Apr 2004, Hugh Dickins wrote:

> rmap 6 nonlinear truncation (which never appeared on LKML, though
> sent twice) fixed most of this, and went into 2.6.6-rc1-bk4 last
> night: please check it out.
>
> But I just converted madvise_dontneed by rote, adding a NULL arg to
> zap_page_range, missing your point that it should respect nonlinearity.
>
> And I made the zap_details structure private to mm/memory.c since I
> hadn't noticed anything outside needing it: I'll fix that up later and
> post a patch.
>
> I'm haven't and don't intend to change the behaviour of ->populate,
> without agreement from others - Ingo? Jamie?

feel free. I've got followup work, protection bits stored in the swap pte,
thus per-page protection possible via remap_file_pages_prot(). (earlier
-mm trees had this but it clashed with objrmap which has priority.)

Ingo