2005-11-06 08:09:39

by Nick Piggin

Subject: [rfc][patch 0/14] mm: performance improvements

The following patchset is a set of performance optimisations
for the mm subsystem. They mainly focus on the page allocator
because that is a very hot path for kbuild, which is my target
workload.

The performance improvements are not documented in detail yet, so the
patches are not intended for merging yet. Also included are some rmap
optimisations that Hugh probably won't have time to ACK for a while.

However, a slightly older patchset was able to decrease kernel
residency by about 5% for UP, and 7.5% for SMP on a dual Xeon
doing kbuild.

--
SUSE Labs, Novell Inc.



2005-11-06 08:18:42

by Nick Piggin

Subject: [patch 2/14] mm: pte prefetch

Prefetch ptes a line ahead. Worth 25% on ia64 when doing big forks.

Index: linux-2.6/include/asm-generic/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-generic/pgtable.h
+++ linux-2.6/include/asm-generic/pgtable.h
@@ -196,6 +196,33 @@ static inline void ptep_set_wrprotect(st
})
#endif

+#ifndef __HAVE_ARCH_PTE_PREFETCH
+#define PTES_PER_LINE (L1_CACHE_BYTES / sizeof(pte_t))
+#define PTE_LINE_MASK (~(PTES_PER_LINE - 1))
+#define ADDR_PER_LINE (PTES_PER_LINE << PAGE_SHIFT)
+#define ADDR_LINE_MASK (~(ADDR_PER_LINE - 1))
+
+#define pte_prefetch(pte, addr, end) \
+({ \
+ unsigned long __nextline = ((addr) + ADDR_PER_LINE) & ADDR_LINE_MASK; \
+ if (__nextline < (end)) \
+ prefetch(pte + PTES_PER_LINE); \
+})
+
+#define pte_prefetch_start(pte, addr, end) \
+({ \
+ prefetch(pte); \
+ pte_prefetch(pte, addr, end); \
+})
+
+#define pte_prefetch_next(pte, addr, end) \
+({ \
+ unsigned long __addr = (addr); \
+ if (!(__addr & ~ADDR_LINE_MASK)) /* We hit a new cacheline */ \
+ pte_prefetch(pte, __addr, end); \
+})
+#endif
+
#ifndef __ASSEMBLY__
/*
* When walking page tables, we usually want to skip any p?d_none entries;
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -437,6 +437,8 @@ again:
if (!dst_pte)
return -ENOMEM;
src_pte = pte_offset_map_nested(src_pmd, addr);
+ pte_prefetch_start(src_pte, addr, end);
+
src_ptl = pte_lockptr(src_mm, src_pmd);
spin_lock(src_ptl);

@@ -458,7 +460,8 @@ again:
}
copy_one_pte(dst_mm, src_mm, dst_pte, src_pte, vma, addr, rss);
progress += 8;
- } while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
+ } while (dst_pte++, src_pte++, addr += PAGE_SIZE,
+ pte_prefetch_next(src_pte, addr, end), addr != end);

spin_unlock(src_ptl);
pte_unmap_nested(src_pte - 1);
@@ -561,6 +564,7 @@ static unsigned long zap_pte_range(struc
int anon_rss = 0;

pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+ pte_prefetch_start(pte, addr, end);
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
@@ -629,7 +633,8 @@ static unsigned long zap_pte_range(struc
if (!pte_file(ptent))
free_swap_and_cache(pte_to_swp_entry(ptent));
pte_clear_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
+ } while (pte++, addr += PAGE_SIZE, pte_prefetch_next(pte, addr, end),
+ (addr != end && *zap_work > 0));

add_mm_rss(mm, file_rss, anon_rss);
pte_unmap_unlock(pte - 1, ptl);


Attachments:
mm-pte-prefetch.patch (2.59 kB)
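
For a sense of the stride involved: with, say, 64-byte cache lines, 8-byte
ptes and 4 KB pages (the exact values vary by architecture and config),
PTES_PER_LINE is 64 / 8 = 8 and ADDR_PER_LINE is 8 << 12 = 32 KB.
pte_prefetch_start() then pulls in the current line plus the next one, and
pte_prefetch_next() fires only when addr crosses a 32 KB boundary, i.e. one
prefetch per 8 ptes walked, keeping the walk one cache line ahead of the
copy/zap loop. With 128-byte lines the stride simply doubles.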

2005-11-06 08:18:04

by Nick Piggin

Subject: [patch 1/14] mm: opt rmqueue

Slightly optimise some page allocation and freeing functions by
taking advantage of knowing whether or not interrupts are disabled.

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -373,11 +373,10 @@ static int
free_pages_bulk(struct zone *zone, int count,
struct list_head *list, unsigned int order)
{
- unsigned long flags;
struct page *page = NULL;
int ret = 0;

- spin_lock_irqsave(&zone->lock, flags);
+ spin_lock(&zone->lock);
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;
while (!list_empty(list) && count--) {
@@ -387,12 +386,13 @@ free_pages_bulk(struct zone *zone, int c
__free_pages_bulk(page, zone, order);
ret++;
}
- spin_unlock_irqrestore(&zone->lock, flags);
+ spin_unlock(&zone->lock);
return ret;
}

void __free_pages_ok(struct page *page, unsigned int order)
{
+ unsigned long flags;
LIST_HEAD(list);
int i;

@@ -410,7 +410,9 @@ void __free_pages_ok(struct page *page,
free_pages_check(__FUNCTION__, page + i);
list_add(&page->lru, &list);
kernel_map_pages(page, 1<<order, 0);
+ local_irq_save(flags);
free_pages_bulk(page_zone(page), 1, &list, order);
+ local_irq_restore(flags);
}


@@ -526,12 +528,11 @@ static struct page *__rmqueue(struct zon
static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long count, struct list_head *list)
{
- unsigned long flags;
int i;
int allocated = 0;
struct page *page;

- spin_lock_irqsave(&zone->lock, flags);
+ spin_lock(&zone->lock);
for (i = 0; i < count; ++i) {
page = __rmqueue(zone, order);
if (page == NULL)
@@ -539,7 +540,7 @@ static int rmqueue_bulk(struct zone *zon
allocated++;
list_add_tail(&page->lru, list);
}
- spin_unlock_irqrestore(&zone->lock, flags);
+ spin_unlock(&zone->lock);
return allocated;
}

@@ -576,6 +577,7 @@ void drain_remote_pages(void)
#if defined(CONFIG_PM) || defined(CONFIG_HOTPLUG_CPU)
static void __drain_pages(unsigned int cpu)
{
+ unsigned long flags;
struct zone *zone;
int i;

@@ -587,8 +589,10 @@ static void __drain_pages(unsigned int c
struct per_cpu_pages *pcp;

pcp = &pset->pcp[i];
+ local_irq_save(flags);
pcp->count -= free_pages_bulk(zone, pcp->count,
&pcp->list, 0);
+ local_irq_restore(flags);
}
}
}
@@ -726,16 +730,14 @@ buffered_rmqueue(struct zone *zone, int
if (pcp->count <= pcp->low)
pcp->count += rmqueue_bulk(zone, 0,
pcp->batch, &pcp->list);
- if (pcp->count) {
+ if (likely(pcp->count)) {
page = list_entry(pcp->list.next, struct page, lru);
list_del(&page->lru);
pcp->count--;
}
local_irq_restore(flags);
put_cpu();
- }
-
- if (page == NULL) {
+ } else {
spin_lock_irqsave(&zone->lock, flags);
page = __rmqueue(zone, order);
spin_unlock_irqrestore(&zone->lock, flags);


Attachments:
mm-pagealloc-opt.patch (2.84 kB)

2005-11-06 08:19:29

by Nick Piggin

Subject: [patch 3/14] mm: release opt

Optimise some pagevec functions by not re-enabling irqs while
switching lru locks.

Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c
+++ linux-2.6/mm/swap.c
@@ -220,10 +220,13 @@ void release_pages(struct page **pages,

pagezone = page_zone(page);
if (pagezone != zone) {
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
+ spin_lock_prefetch(&pagezone->lru_lock);
+ if (!zone)
+ local_irq_disable();
+ else
+ spin_unlock(&zone->lru_lock);
zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
+ spin_lock(&zone->lru_lock);
}
if (TestClearPageLRU(page))
del_page_from_lru(zone, page);
@@ -297,10 +300,12 @@ void __pagevec_lru_add(struct pagevec *p
struct zone *pagezone = page_zone(page);

if (pagezone != zone) {
+ if (!zone)
+ local_irq_disable();
if (zone)
- spin_unlock_irq(&zone->lru_lock);
+ spin_unlock(&zone->lru_lock);
zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
+ spin_lock(&zone->lru_lock);
}
if (TestSetPageLRU(page))
BUG();
@@ -324,10 +329,12 @@ void __pagevec_lru_add_active(struct pag
struct zone *pagezone = page_zone(page);

if (pagezone != zone) {
+ if (!zone)
+ local_irq_disable();
if (zone)
- spin_unlock_irq(&zone->lru_lock);
+ spin_unlock(&zone->lru_lock);
zone = pagezone;
- spin_lock_irq(&zone->lru_lock);
+ spin_lock(&zone->lru_lock);
}
if (TestSetPageLRU(page))
BUG();


Attachments:
mm-release-opt.patch (1.46 kB)

2005-11-06 08:20:06

by Nick Piggin

Subject: [patch 4/14] mm: rmap opt

Slightly optimise rmap functions by minimising atomic operations when
we know there will be no concurrent modifications.

Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -71,6 +71,7 @@ void __anon_vma_link(struct vm_area_stru
* rmap interfaces called when adding or removing pte of page
*/
void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
+void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
void page_add_file_rmap(struct page *);
void page_remove_rmap(struct page *);

Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1337,14 +1337,15 @@ static int do_wp_page(struct mm_struct *
inc_mm_counter(mm, anon_rss);
dec_mm_counter(mm, file_rss);
}
+
flush_cache_page(vma, address, pfn);
entry = mk_pte(new_page, vma->vm_page_prot);
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
ptep_establish(vma, address, page_table, entry);
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
+ page_add_new_anon_rmap(new_page, vma, address);
lru_cache_add_active(new_page);
- page_add_anon_rmap(new_page, vma, address);

/* Free the old page.. */
new_page = old_page;
@@ -1796,9 +1797,8 @@ static int do_anonymous_page(struct mm_s
if (!pte_none(*page_table))
goto release;
inc_mm_counter(mm, anon_rss);
+ page_add_new_anon_rmap(page, vma, address);
lru_cache_add_active(page);
- SetPageReferenced(page);
- page_add_anon_rmap(page, vma, address);
} else {
/* Map the ZERO_PAGE - vm_page_prot is readonly */
page = ZERO_PAGE(address);
@@ -1924,11 +1924,10 @@ retry:
entry = mk_pte(new_page, vma->vm_page_prot);
if (write_access)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
- set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
+ page_add_new_anon_rmap(new_page, vma, address);
lru_cache_add_active(new_page);
- page_add_anon_rmap(new_page, vma, address);
} else if (!(vma->vm_flags & VM_RESERVED)) {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
@@ -1939,6 +1938,7 @@ retry:
goto unlock;
}

+ set_pte_at(mm, address, page_table, entry);
/* no need to invalidate: a not-present page shouldn't be cached */
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -440,6 +440,26 @@ int page_referenced(struct page *page, i
}

/**
+ * page_set_anon_rmap - setup new anonymous rmap
+ * @page: the page to add the mapping to
+ * @vma: the vm area in which the mapping is added
+ * @address: the user virtual address mapped
+ */
+static void __page_set_anon_rmap(struct page *page,
+ struct vm_area_struct *vma, unsigned long address)
+{
+ struct anon_vma *anon_vma = vma->anon_vma;
+
+ BUG_ON(!anon_vma);
+ anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
+ page->mapping = (struct address_space *) anon_vma;
+
+ page->index = linear_page_index(vma, address);
+
+ inc_page_state(nr_mapped);
+}
+
+/**
* page_add_anon_rmap - add pte mapping to an anonymous page
* @page: the page to add the mapping to
* @vma: the vm area in which the mapping is added
@@ -450,21 +470,28 @@ int page_referenced(struct page *page, i
void page_add_anon_rmap(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
- if (atomic_inc_and_test(&page->_mapcount)) {
- struct anon_vma *anon_vma = vma->anon_vma;
-
- BUG_ON(!anon_vma);
- anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
- page->mapping = (struct address_space *) anon_vma;
-
- page->index = linear_page_index(vma, address);
-
- inc_page_state(nr_mapped);
- }
+ if (atomic_inc_and_test(&page->_mapcount))
+ __page_set_anon_rmap(page, vma, address);
/* else checking page index and mapping is racy */
}

/**
+ * page_add_new_anon_rmap - add pte mapping to a new anonymous page
+ * @page: the page to add the mapping to
+ * @vma: the vm area in which the mapping is added
+ * @address: the user virtual address mapped
+ *
+ * same as page_add_anon_rmap but must only be called on *new* pages.
+ */
+void page_add_new_anon_rmap(struct page *page,
+ struct vm_area_struct *vma, unsigned long address)
+{
+ atomic_set(&page->_mapcount, 0); /* elevate count by 1 (starts at -1) */
+ __page_set_anon_rmap(page, vma, address);
+}
+
+
+/**
* page_add_file_rmap - add pte mapping to a file page
* @page: the page to add the mapping to
*
@@ -487,21 +514,28 @@ void page_add_file_rmap(struct page *pag
*/
void page_remove_rmap(struct page *page)
{
- if (atomic_add_negative(-1, &page->_mapcount)) {
+ int fast = (page_mapcount(page) == 1) &
+ PageAnon(page) & (!PageSwapCache(page));
+
+ /* fast page may become SwapCache here, but nothing new will map it. */
+ if (fast)
+ reset_page_mapcount(page);
+ else if (atomic_add_negative(-1, &page->_mapcount))
BUG_ON(page_mapcount(page) < 0);
- /*
- * It would be tidy to reset the PageAnon mapping here,
- * but that might overwrite a racing page_add_anon_rmap
- * which increments mapcount after us but sets mapping
- * before us: so leave the reset to free_hot_cold_page,
- * and remember that it's only reliable while mapped.
- * Leaving it set also helps swapoff to reinstate ptes
- * faster for those pages still in swapcache.
- */
if (page_test_and_clear_dirty(page))
set_page_dirty(page);
- dec_page_state(nr_mapped);
- }
+ else
+ return; /* non zero mapcount */
+ /*
+ * It would be tidy to reset the PageAnon mapping here,
+ * but that might overwrite a racing page_add_anon_rmap
+ * which increments mapcount after us but sets mapping
+ * before us: so leave the reset to free_hot_cold_page,
+ * and remember that it's only reliable while mapped.
+ * Leaving it set also helps swapoff to reinstate ptes
+ * faster for those pages still in swapcache.
+ */
+ dec_page_state(nr_mapped);
}

/*
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -182,6 +182,7 @@ extern void __mod_page_state(unsigned lo

#define PageReferenced(page) test_bit(PG_referenced, &(page)->flags)
#define SetPageReferenced(page) set_bit(PG_referenced, &(page)->flags)
+#define __SetPageReferenced(page) __set_bit(PG_referenced, &(page)->flags)
#define ClearPageReferenced(page) clear_bit(PG_referenced, &(page)->flags)
#define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)


Attachments:
mm-rmap-opt.patch (6.69 kB)

2005-11-06 08:21:22

by Nick Piggin

Subject: [patch 5/14] mm: set_page_refs opt

Inline set_page_refs. Remove mm/internal.h

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -19,7 +19,6 @@
#include <linux/module.h>
#include <asm/dma.h>
#include <asm/io.h>
-#include "internal.h"

/*
* Access to this subsystem has to be serialized externally. (this is
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -38,7 +38,6 @@
#include <linux/vmalloc.h>

#include <asm/tlbflush.h>
-#include "internal.h"

/*
* MCD - HACK: Find somewhere to initialize this EARLY, or make this
@@ -448,23 +447,6 @@ expand(struct zone *zone, struct page *p
return page;
}

-void set_page_refs(struct page *page, int order)
-{
-#ifdef CONFIG_MMU
- set_page_count(page, 1);
-#else
- int i;
-
- /*
- * We need to reference all the pages for this order, otherwise if
- * anyone accesses one of the pages with (get/put) it will be freed.
- * - eg: access_process_vm()
- */
- for (i = 0; i < (1 << order); i++)
- set_page_count(page + i, 1);
-#endif /* CONFIG_MMU */
-}
-
/*
* This page is about to be returned from the page allocator
*/
Index: linux-2.6/mm/internal.h
===================================================================
--- linux-2.6.orig/mm/internal.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* internal.h: mm/ internal definitions
- *
- * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells ([email protected])
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-/* page_alloc.c */
-extern void set_page_refs(struct page *page, int order);
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -315,6 +315,23 @@ struct page {
#define set_page_count(p,v) atomic_set(&(p)->_count, v - 1)
#define __put_page(p) atomic_dec(&(p)->_count)

+static inline void set_page_refs(struct page *page, int order)
+{
+#ifdef CONFIG_MMU
+ set_page_count(page, 1);
+#else
+ int i;
+
+ /*
+ * We need to reference all the pages for this order, otherwise if
+ * anyone accesses one of the pages with (get/put) it will be freed.
+ * - eg: access_process_vm()
+ */
+ for (i = 0; i < (1 << order); i++)
+ set_page_count(page + i, 1);
+#endif /* CONFIG_MMU */
+}
+
extern void FASTCALL(__page_cache_release(struct page *));

#ifdef CONFIG_HUGETLB_PAGE


Attachments:
mm-set_page_refs-opt.patch (2.70 kB)

2005-11-06 08:22:32

by Nick Piggin

Subject: [patch 7/14] mm: remove bad_range

bad_range is supposed to be a temporary check. It would be a pity to throw
it out. Make it depend on CONFIG_DEBUG_VM instead.

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -78,6 +78,7 @@ int min_free_kbytes = 1024;
unsigned long __initdata nr_kernel_pages;
unsigned long __initdata nr_all_pages;

+#ifdef CONFIG_DEBUG_VM
static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
{
int ret = 0;
@@ -119,6 +120,13 @@ static int bad_range(struct zone *zone,
return 0;
}

+#else
+static inline int bad_range(struct zone *zone, struct page *page)
+{
+ return 0;
+}
+#endif
+
static void bad_page(const char *function, struct page *page)
{
printk(KERN_EMERG "Bad page state at %s (in process '%s', page %p)\n",
Index: linux-2.6/lib/Kconfig.debug
===================================================================
--- linux-2.6.orig/lib/Kconfig.debug
+++ linux-2.6/lib/Kconfig.debug
@@ -172,7 +172,8 @@ config DEBUG_VM
bool "Debug VM"
depends on DEBUG_KERNEL
help
- Enable this to debug the virtual-memory system.
+ Enable this to turn on extended checks in the virtual-memory system
+ that may impact performance.

If unsure, say N.


Attachments:
mm-remove-bad_range.patch (1.28 kB)

2005-11-06 08:22:01

by Nick Piggin

Subject: [patch 6/14] mm: microopt conditions

Micro-optimise some conditionals where we don't need lazy (short-circuit)
evaluation.

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -339,9 +339,9 @@ static inline void __free_pages_bulk (st

static inline void free_pages_check(const char *function, struct page *page)
{
- if ( page_mapcount(page) ||
- page->mapping != NULL ||
- page_count(page) != 0 ||
+ if (unlikely(page_mapcount(page) |
+ (page->mapping != NULL) |
+ (page_count(page) != 0) |
(page->flags & (
1 << PG_lru |
1 << PG_private |
@@ -351,7 +351,7 @@ static inline void free_pages_check(cons
1 << PG_slab |
1 << PG_swapcache |
1 << PG_writeback |
- 1 << PG_reserved )))
+ 1 << PG_reserved ))))
bad_page(function, page);
if (PageDirty(page))
__ClearPageDirty(page);
@@ -452,9 +452,9 @@ expand(struct zone *zone, struct page *p
*/
static void prep_new_page(struct page *page, int order)
{
- if ( page_mapcount(page) ||
- page->mapping != NULL ||
- page_count(page) != 0 ||
+ if (unlikely(page_mapcount(page) |
+ (page->mapping != NULL) |
+ (page_count(page) != 0) |
(page->flags & (
1 << PG_lru |
1 << PG_private |
@@ -465,7 +465,7 @@ static void prep_new_page(struct page *p
1 << PG_slab |
1 << PG_swapcache |
1 << PG_writeback |
- 1 << PG_reserved )))
+ 1 << PG_reserved ))))
bad_page(__FUNCTION__, page);

page->flags &= ~(1 << PG_uptodate | 1 << PG_error |


Attachments:
mm-microopt-conditions.patch (1.49 kB)
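
To spell out the rationale: with ||, the compiler has to preserve
short-circuit evaluation, which tends to mean a conditional branch per term;
with |, all the terms (cheap, side-effect-free loads here) can be evaluated
unconditionally and the combined result tested once. A contrived
illustration, not kernel code:

/* contrived example only; the page_alloc.c checks follow the same
 * pattern with page fields instead of plain ints */
static int any_set_lazy(int a, int b, int c)
{
        return a || b || c;                     /* short-circuit: may branch per term */
}

static int any_set_eager(int a, int b, int c)
{
        return (a != 0) | (b != 0) | (c != 0);  /* evaluate all, test once */
}

Whether the branch-free form actually wins depends on the compiler and CPU,
which is why this is flagged as a micro-optimisation.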

2005-11-06 08:23:03

by Nick Piggin

Subject: [patch 8/14] mm: remove pcp_low

pcp->low is useless.

Index: linux-2.6/include/linux/mmzone.h
===================================================================
--- linux-2.6.orig/include/linux/mmzone.h
+++ linux-2.6/include/linux/mmzone.h
@@ -46,7 +46,6 @@ struct zone_padding {

struct per_cpu_pages {
int count; /* number of pages in the list */
- int low; /* low watermark, refill needed */
int high; /* high watermark, emptying needed */
int batch; /* chunk size for buddy add/remove */
struct list_head list; /* the list of pages */
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -712,7 +712,7 @@ buffered_rmqueue(struct zone *zone, int

pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
local_irq_save(flags);
- if (pcp->count <= pcp->low)
+ if (!pcp->count)
pcp->count += rmqueue_bulk(zone, 0,
pcp->batch, &pcp->list);
if (likely(pcp->count)) {
@@ -1324,10 +1324,9 @@ void show_free_areas(void)
pageset = zone_pcp(zone, cpu);

for (temperature = 0; temperature < 2; temperature++)
- printk("cpu %d %s: low %d, high %d, batch %d used:%d\n",
+ printk("cpu %d %s: high %d, batch %d used:%d\n",
cpu,
temperature ? "cold" : "hot",
- pageset->pcp[temperature].low,
pageset->pcp[temperature].high,
pageset->pcp[temperature].batch,
pageset->pcp[temperature].count);
@@ -1765,14 +1764,12 @@ inline void setup_pageset(struct per_cpu

pcp = &p->pcp[0]; /* hot */
pcp->count = 0;
- pcp->low = 0;
- pcp->high = 6 * batch;
+ pcp->high = 4 * batch;
pcp->batch = max(1UL, 1 * batch);
INIT_LIST_HEAD(&pcp->list);

pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
- pcp->low = 0;
pcp->high = 2 * batch;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
@@ -2169,12 +2166,10 @@ static int zoneinfo_show(struct seq_file
seq_printf(m,
"\n cpu: %i pcp: %i"
"\n count: %i"
- "\n low: %i"
"\n high: %i"
"\n batch: %i",
i, j,
pageset->pcp[j].count,
- pageset->pcp[j].low,
pageset->pcp[j].high,
pageset->pcp[j].batch);
}


Attachments:
mm-remove-pcp-low.patch (2.20 kB)

2005-11-06 08:23:32

by Nick Piggin

Subject: [patch 9/14] mm: page_state opt

Optimise page_state manipulations by introducing a direct accessor
to page_state fields that does not disable interrupts; callers must
then provide their own exclusion, either by disabling interrupts
themselves or by never updating the field from interrupt context.

Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -138,6 +138,7 @@ extern void get_page_state_node(struct p
extern void get_full_page_state(struct page_state *ret);
extern unsigned long __read_page_state(unsigned long offset);
extern void __mod_page_state(unsigned long offset, unsigned long delta);
+extern unsigned long *__page_state(unsigned long offset);

#define read_page_state(member) \
__read_page_state(offsetof(struct page_state, member))
@@ -150,16 +151,26 @@ extern void __mod_page_state(unsigned lo
#define add_page_state(member,delta) mod_page_state(member, (delta))
#define sub_page_state(member,delta) mod_page_state(member, 0UL - (delta))

-#define mod_page_state_zone(zone, member, delta) \
- do { \
- unsigned offset; \
- if (is_highmem(zone)) \
- offset = offsetof(struct page_state, member##_high); \
- else if (is_normal(zone)) \
- offset = offsetof(struct page_state, member##_normal); \
- else \
- offset = offsetof(struct page_state, member##_dma); \
- __mod_page_state(offset, (delta)); \
+#define page_state(member) (*__page_state(offsetof(struct page_state, member)))
+
+#define state_zone_offset(zone, member) \
+({ \
+ unsigned offset; \
+ if (is_highmem(zone)) \
+ offset = offsetof(struct page_state, member##_high); \
+ else if (is_normal(zone)) \
+ offset = offsetof(struct page_state, member##_normal); \
+ else \
+ offset = offsetof(struct page_state, member##_dma); \
+ offset; \
+})
+
+#define page_state_zone(zone, member) \
+ (*__page_state(state_zone_offset(zone, member)))
+
+#define mod_page_state_zone(zone, member, delta) \
+ do { \
+ __mod_page_state(state_zone_offset(zone, member), (delta)); \
} while (0)

/*
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -400,8 +400,6 @@ void __free_pages_ok(struct page *page,

arch_free_page(page, order);

- mod_page_state(pgfree, 1 << order);
-
#ifndef CONFIG_MMU
if (order > 0)
for (i = 1 ; i < (1 << order) ; ++i)
@@ -413,6 +411,7 @@ void __free_pages_ok(struct page *page,
list_add(&page->lru, &list);
kernel_map_pages(page, 1<<order, 0);
local_irq_save(flags);
+ page_state(pgfree) += 1 << order;
free_pages_bulk(page_zone(page), 1, &list, order);
local_irq_restore(flags);
}
@@ -662,12 +661,12 @@ static void fastcall free_hot_cold_page(
arch_free_page(page, 0);

kernel_map_pages(page, 1, 0);
- inc_page_state(pgfree);
if (PageAnon(page))
page->mapping = NULL;
free_pages_check(page);
pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
local_irq_save(flags);
+ page_state(pgfree)++;
list_add(&page->lru, &pcp->list);
pcp->count++;
if (pcp->count >= pcp->high)
@@ -704,42 +703,50 @@ static struct page *
buffered_rmqueue(struct zone *zone, int order, gfp_t gfp_flags)
{
unsigned long flags;
- struct page *page = NULL;
+ struct page *page;
int cold = !!(gfp_flags & __GFP_COLD);
+ int cpu = get_cpu();

if (order == 0) {
struct per_cpu_pages *pcp;

- pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
+ pcp = &zone_pcp(zone, cpu)->pcp[cold];
local_irq_save(flags);
- if (!pcp->count)
+ if (!pcp->count) {
pcp->count += rmqueue_bulk(zone, 0,
pcp->batch, &pcp->list);
- if (likely(pcp->count)) {
- page = list_entry(pcp->list.next, struct page, lru);
- list_del(&page->lru);
- pcp->count--;
+ if (unlikely(!pcp->count))
+ goto failed;
}
- local_irq_restore(flags);
- put_cpu();
+ page = list_entry(pcp->list.next, struct page, lru);
+ list_del(&page->lru);
+ pcp->count--;
} else {
spin_lock_irqsave(&zone->lock, flags);
page = __rmqueue(zone, order);
- spin_unlock_irqrestore(&zone->lock, flags);
+ spin_unlock(&zone->lock);
+ if (!page)
+ goto failed;
}

- if (page != NULL) {
- BUG_ON(bad_range(zone, page));
- mod_page_state_zone(zone, pgalloc, 1 << order);
- prep_new_page(page, order);
+ page_state_zone(zone, pgalloc) += 1 << order;
+ local_irq_restore(flags);
+ put_cpu();

- if (gfp_flags & __GFP_ZERO)
- prep_zero_page(page, order, gfp_flags);
+ BUG_ON(bad_range(zone, page));
+ prep_new_page(page, order);

- if (order && (gfp_flags & __GFP_COMP))
- prep_compound_page(page, order);
- }
+ if (gfp_flags & __GFP_ZERO)
+ prep_zero_page(page, order, gfp_flags);
+
+ if (order && (gfp_flags & __GFP_COMP))
+ prep_compound_page(page, order);
return page;
+
+failed:
+ local_irq_restore(flags);
+ put_cpu();
+ return NULL;
}

/*
@@ -1215,6 +1222,15 @@ unsigned long __read_page_state(unsigned
return ret;
}

+unsigned long *__page_state(unsigned long offset)
+{
+ void* ptr;
+ ptr = &__get_cpu_var(page_states);
+ return (unsigned long*)(ptr + offset);
+}
+
+EXPORT_SYMBOL(__page_state);
+
void __mod_page_state(unsigned long offset, unsigned long delta)
{
unsigned long flags;
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -641,17 +641,18 @@ static void shrink_cache(struct zone *zo
goto done;

max_scan -= nr_scan;
- if (current_is_kswapd())
- mod_page_state_zone(zone, pgscan_kswapd, nr_scan);
- else
- mod_page_state_zone(zone, pgscan_direct, nr_scan);
nr_freed = shrink_list(&page_list, sc);
- if (current_is_kswapd())
- mod_page_state(kswapd_steal, nr_freed);
- mod_page_state_zone(zone, pgsteal, nr_freed);
sc->nr_to_reclaim -= nr_freed;

- spin_lock_irq(&zone->lru_lock);
+ local_irq_disable();
+ if (current_is_kswapd()) {
+ page_state_zone(zone, pgscan_kswapd) += nr_scan;
+ page_state(kswapd_steal) += nr_freed;
+ } else
+ page_state_zone(zone, pgscan_direct) += nr_scan;
+ page_state_zone(zone, pgsteal) += nr_freed;
+
+ spin_lock(&zone->lru_lock);
/*
* Put back any unfreeable pages.
*/
@@ -813,11 +814,13 @@ refill_inactive_zone(struct zone *zone,
}
}
zone->nr_active += pgmoved;
- spin_unlock_irq(&zone->lru_lock);
- pagevec_release(&pvec);
+ spin_unlock(&zone->lru_lock);
+
+ page_state_zone(zone, pgrefill) += pgscanned;
+ page_state(pgdeactivate) += pgdeactivate;
+ local_irq_enable();

- mod_page_state_zone(zone, pgrefill, pgscanned);
- mod_page_state(pgdeactivate, pgdeactivate);
+ pagevec_release(&pvec);
}

/*
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -456,7 +456,11 @@ static void __page_set_anon_rmap(struct

page->index = linear_page_index(vma, address);

- inc_page_state(nr_mapped);
+ /*
+ * nr_mapped state can be updated without turning off
+ * interrupts because it is not modified via interrupt.
+ */
+ page_state(nr_mapped)++;
}

/**
@@ -503,7 +507,7 @@ void page_add_file_rmap(struct page *pag
BUG_ON(!pfn_valid(page_to_pfn(page)));

if (atomic_inc_and_test(&page->_mapcount))
- inc_page_state(nr_mapped);
+ page_state(nr_mapped)++;
}

/**
@@ -535,7 +539,7 @@ void page_remove_rmap(struct page *page)
* Leaving it set also helps swapoff to reinstate ptes
* faster for those pages still in swapcache.
*/
- dec_page_state(nr_mapped);
+ page_state(nr_mapped)--;
}

/*


Attachments:
mm-page_state-opt.patch (7.49 kB)

2005-11-06 08:24:07

by Nick Piggin

Subject: [patch 10/14] mm: single pcp list

Use a single pcp list.

Having a hot and a cold pcp list means that cold pages are overlooked
when when a hot page is needed but none available. So a workload that is
doing heavy page reclaim will not take much advantage of the pcps for
minimising zone lock contention for the pages it is freeing up.

The same wastage applies the other way (e.g. when the hot list fills up
and the cold list is empty). The patch also takes care of that.

Disallow cold page allocation from taking hot pages though.

Index: linux-2.6/include/linux/mmzone.h
===================================================================
--- linux-2.6.orig/include/linux/mmzone.h
+++ linux-2.6/include/linux/mmzone.h
@@ -44,15 +44,13 @@ struct zone_padding {
#define ZONE_PADDING(name)
#endif

-struct per_cpu_pages {
+struct per_cpu_pageset {
+ struct list_head list; /* the list of pages */
int count; /* number of pages in the list */
+ int cold_count; /* number of cold pages in the list */
int high; /* high watermark, emptying needed */
int batch; /* chunk size for buddy add/remove */
- struct list_head list; /* the list of pages */
-};

-struct per_cpu_pageset {
- struct per_cpu_pages pcp[2]; /* 0: hot. 1: cold */
#ifdef CONFIG_NUMA
unsigned long numa_hit; /* allocated in intended node */
unsigned long numa_miss; /* allocated in non intended node */
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -533,10 +533,8 @@ static int rmqueue_bulk(struct zone *zon
void drain_remote_pages(void)
{
struct zone *zone;
- int i;
unsigned long flags;

- local_irq_save(flags);
for_each_zone(zone) {
struct per_cpu_pageset *pset;

@@ -544,17 +542,16 @@ void drain_remote_pages(void)
if (zone->zone_pgdat->node_id == numa_node_id())
continue;

- pset = zone->pageset[smp_processor_id()];
- for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
- struct per_cpu_pages *pcp;
-
- pcp = &pset->pcp[i];
- if (pcp->count)
- pcp->count -= free_pages_bulk(zone, pcp->count,
- &pcp->list, 0);
+ local_irq_save(flags);
+ if (zone->zone_pgdat->node_id != numa_node_id()) {
+ pset = zone->pageset[smp_processor_id()];
+ if (pset->count)
+ pset->count -= free_pages_bulk(zone,
+ pset->count, &pset->list, 0);
+ pset->cold_count = min(pset->cold_count, pset->count);
}
+ local_irq_restore(flags);
}
- local_irq_restore(flags);
}
#endif

@@ -563,21 +560,16 @@ static void __drain_pages(unsigned int c
{
unsigned long flags;
struct zone *zone;
- int i;

for_each_zone(zone) {
struct per_cpu_pageset *pset;

pset = zone_pcp(zone, cpu);
- for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
- struct per_cpu_pages *pcp;
-
- pcp = &pset->pcp[i];
- local_irq_save(flags);
- pcp->count -= free_pages_bulk(zone, pcp->count,
- &pcp->list, 0);
- local_irq_restore(flags);
- }
+ local_irq_save(flags);
+ pset->count -= free_pages_bulk(zone, pset->count,
+ &pset->list, 0);
+ pset->cold_count = min(pset->cold_count, pset->count);
+ local_irq_restore(flags);
}
}
#endif /* CONFIG_PM || CONFIG_HOTPLUG_CPU */
@@ -655,7 +647,8 @@ static void FASTCALL(free_hot_cold_page(
static void fastcall free_hot_cold_page(struct page *page, int cold)
{
struct zone *zone = page_zone(page);
- struct per_cpu_pages *pcp;
+ struct per_cpu_pageset *pset;
+ struct list_head *entry;
unsigned long flags;

arch_free_page(page, 0);
@@ -664,13 +657,21 @@ static void fastcall free_hot_cold_page(
if (PageAnon(page))
page->mapping = NULL;
free_pages_check(page);
- pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
+ pset = zone_pcp(zone, get_cpu());
local_irq_save(flags);
page_state(pgfree)++;
- list_add(&page->lru, &pcp->list);
- pcp->count++;
- if (pcp->count >= pcp->high)
- pcp->count -= free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
+ pset->count++;
+ entry = &pset->list;
+ if (cold) {
+ pset->cold_count++;
+ entry = entry->prev; /* tail */
+ }
+ list_add(&page->lru, entry);
+ if (pset->count > pset->high) {
+ pset->count -= free_pages_bulk(zone, pset->batch,
+ &pset->list, 0);
+ pset->cold_count = min(pset->cold_count, pset->count);
+ }
local_irq_restore(flags);
put_cpu();
}
@@ -708,19 +709,31 @@ buffered_rmqueue(struct zone *zone, int
int cpu = get_cpu();

if (order == 0) {
- struct per_cpu_pages *pcp;
+ struct per_cpu_pageset *pset;
+ struct list_head *entry;

- pcp = &zone_pcp(zone, cpu)->pcp[cold];
+ pset = zone_pcp(zone, cpu);
local_irq_save(flags);
- if (!pcp->count) {
- pcp->count += rmqueue_bulk(zone, 0,
- pcp->batch, &pcp->list);
- if (unlikely(!pcp->count))
+ if (!pset->count || (cold && !pset->cold_count &&
+ pset->count <= pset->high - (pset->high>>2))) {
+ int count;
+ count = rmqueue_bulk(zone, 0,pset->batch, &pset->list);
+ if (unlikely(!count))
goto failed;
+ pset->count += count;
+ pset->cold_count += count;
}
- page = list_entry(pcp->list.next, struct page, lru);
+
+ pset->count--;
+ entry = pset->list.next;
+ if (cold) {
+ if (pset->cold_count)
+ pset->cold_count--;
+ entry = pset->list.prev;
+ }
+ pset->cold_count = min(pset->cold_count, pset->count);
+ page = list_entry(entry, struct page, lru);
list_del(&page->lru);
- pcp->count--;
} else {
spin_lock_irqsave(&zone->lock, flags);
page = __rmqueue(zone, order);
@@ -1318,7 +1331,7 @@ void si_meminfo_node(struct sysinfo *val
void show_free_areas(void)
{
struct page_state ps;
- int cpu, temperature;
+ int cpu;
unsigned long active;
unsigned long inactive;
unsigned long free;
@@ -1335,17 +1348,11 @@ void show_free_areas(void)
printk("\n");

for_each_cpu(cpu) {
- struct per_cpu_pageset *pageset;
+ struct per_cpu_pageset *pset = zone_pcp(zone, cpu);

- pageset = zone_pcp(zone, cpu);
-
- for (temperature = 0; temperature < 2; temperature++)
- printk("cpu %d %s: high %d, batch %d used:%d\n",
- cpu,
- temperature ? "cold" : "hot",
- pageset->pcp[temperature].high,
- pageset->pcp[temperature].batch,
- pageset->pcp[temperature].count);
+ printk("cpu %d: high %d, batch %d, pages %d, cold %d\n",
+ cpu, pset->high, pset->batch,
+ pset->count, pset->cold_count);
}
}

@@ -1774,21 +1781,12 @@ static int __devinit zone_batchsize(stru

inline void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
{
- struct per_cpu_pages *pcp;
-
memset(p, 0, sizeof(*p));
-
- pcp = &p->pcp[0]; /* hot */
- pcp->count = 0;
- pcp->high = 4 * batch;
- pcp->batch = max(1UL, 1 * batch);
- INIT_LIST_HEAD(&pcp->list);
-
- pcp = &p->pcp[1]; /* cold*/
- pcp->count = 0;
- pcp->high = 2 * batch;
- pcp->batch = max(1UL, batch/2);
- INIT_LIST_HEAD(&pcp->list);
+ p->count = 0;
+ p->cold_count = 0;
+ p->high = 6 * batch;
+ p->batch = max(1UL, 1 * batch);
+ INIT_LIST_HEAD(&p->list);
}

#ifdef CONFIG_NUMA
@@ -2168,27 +2166,15 @@ static int zoneinfo_show(struct seq_file
")"
"\n pagesets");
for (i = 0; i < ARRAY_SIZE(zone->pageset); i++) {
- struct per_cpu_pageset *pageset;
- int j;
+ struct per_cpu_pageset *pset;

- pageset = zone_pcp(zone, i);
- for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
- if (pageset->pcp[j].count)
- break;
- }
- if (j == ARRAY_SIZE(pageset->pcp))
- continue;
- for (j = 0; j < ARRAY_SIZE(pageset->pcp); j++) {
- seq_printf(m,
- "\n cpu: %i pcp: %i"
- "\n count: %i"
- "\n high: %i"
- "\n batch: %i",
- i, j,
- pageset->pcp[j].count,
- pageset->pcp[j].high,
- pageset->pcp[j].batch);
- }
+ pset = zone_pcp(zone, i);
+ seq_printf(m,
+ "\n cpu: %i"
+ "\n count: %i"
+ "\n high: %i"
+ "\n batch: %i",
+ i, pset->count, pset->high, pset->batch);
#ifdef CONFIG_NUMA
seq_printf(m,
"\n numa_hit: %lu"
@@ -2197,12 +2183,12 @@ static int zoneinfo_show(struct seq_file
"\n interleave_hit: %lu"
"\n local_node: %lu"
"\n other_node: %lu",
- pageset->numa_hit,
- pageset->numa_miss,
- pageset->numa_foreign,
- pageset->interleave_hit,
- pageset->local_node,
- pageset->other_node);
+ pset->numa_hit,
+ pset->numa_miss,
+ pset->numa_foreign,
+ pset->interleave_hit,
+ pset->local_node,
+ pset->other_node);
#endif
}
seq_printf(m,


Attachments:
mm-single-pcp-list.patch (8.37 kB)

2005-11-06 08:24:35

by Nick Piggin

Subject: [patch 11/14] mm: increase pcp size

Increasing pageset size gives improvements on kbuild on my Xeon.

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -1784,7 +1784,7 @@ inline void setup_pageset(struct per_cpu
memset(p, 0, sizeof(*p));
p->count = 0;
p->cold_count = 0;
- p->high = 6 * batch;
+ p->high = 16 * batch;
p->batch = max(1UL, 1 * batch);
INIT_LIST_HEAD(&p->list);
}


Attachments:
mm-increase-pcp-size.patch (472.00 B)

2005-11-06 08:26:03

by Nick Piggin

Subject: [patch 14/14] mm: page_alloc cleanups

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -431,8 +431,7 @@ void __free_pages_ok(struct page *page,
*
* -- wli
*/
-static inline struct page *
-expand(struct zone *zone, struct page *page,
+static inline void expand(struct zone *zone, struct page *page,
int low, int high, struct free_area *area)
{
unsigned long size = 1 << high;
@@ -446,7 +445,6 @@ expand(struct zone *zone, struct page *p
area->nr_free++;
set_page_order(&page[size], high);
}
- return page;
}

/*
@@ -498,7 +496,8 @@ static struct page *__rmqueue(struct zon
rmv_page_order(page);
area->nr_free--;
zone->free_pages -= 1UL << order;
- return expand(zone, page, order, current_order, area);
+ expand(zone, page, order, current_order, area);
+ return page;
}

return NULL;
@@ -513,19 +512,16 @@ static int rmqueue_bulk(struct zone *zon
unsigned long count, struct list_head *list)
{
int i;
- int allocated = 0;
- struct page *page;

spin_lock(&zone->lock);
for (i = 0; i < count; ++i) {
- page = __rmqueue(zone, order);
- if (page == NULL)
+ struct page *page = __rmqueue(zone, order);
+ if (unlikely(page == NULL))
break;
- allocated++;
list_add_tail(&page->lru, list);
}
spin_unlock(&zone->lock);
- return allocated;
+ return i;
}

#ifdef CONFIG_NUMA


Attachments:
mm-page_alloc-cleanups.patch (1.38 kB)

2005-11-06 08:25:08

by Nick Piggin

Subject: [patch 12/14] mm: variable pcp size

The previous increase in pcp list size will probably be too much for
huge NUMA machines, despite advances in keeping remote pagesets in check.
Make pcp sizes for remote zones much smaller (slightly smaller than before
the increase), and take advantage of this to increase local pcp list size
again.

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -1779,13 +1779,14 @@ static int __devinit zone_batchsize(stru
return batch;
}

-inline void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+static inline void setup_pageset(struct per_cpu_pageset *p,
+ unsigned long size, unsigned long batch)
{
memset(p, 0, sizeof(*p));
p->count = 0;
p->cold_count = 0;
- p->high = 16 * batch;
- p->batch = max(1UL, 1 * batch);
+ p->high = max(1UL, size);
+ p->batch = max(1UL, batch);
INIT_LIST_HEAD(&p->list);
}

@@ -1819,13 +1820,19 @@ static int __devinit process_zones(int c
struct zone *zone, *dzone;

for_each_zone(zone) {
+ unsigned long size, batch;

zone->pageset[cpu] = kmalloc_node(sizeof(struct per_cpu_pageset),
GFP_KERNEL, cpu_to_node(cpu));
if (!zone->pageset[cpu])
goto bad;

- setup_pageset(zone->pageset[cpu], zone_batchsize(zone));
+ batch = zone_batchsize(zone);
+ if (cpu_to_node(cpu) == zone->zone_pgdat->node_id)
+ size = batch * 32;
+ else
+ size = batch * 4;
+ setup_pageset(zone->pageset[cpu], size, batch);
}

return 0;
@@ -1923,9 +1930,9 @@ static __devinit void zone_pcp_init(stru
#ifdef CONFIG_NUMA
/* Early boot. Slab allocator not functional yet */
zone->pageset[cpu] = &boot_pageset[cpu];
- setup_pageset(&boot_pageset[cpu],0);
+ setup_pageset(&boot_pageset[cpu], 0, 0);
#else
- setup_pageset(zone_pcp(zone,cpu), batch);
+ setup_pageset(zone_pcp(zone, cpu), batch * 32, batch);
#endif
}
printk(KERN_DEBUG " %s zone: %lu pages, LIFO batch:%lu\n",


Attachments:
mm-variable-pcp-size.patch (1.93 kB)
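
To put rough numbers on the last two patches (illustrative only; the real
batch value comes from zone_batchsize() and depends on zone size): with
4 KB pages and a batch of, say, 16 pages, the previous patch's
high = 16 * batch lets each cpu hold up to 256 pages (1 MB) per zone,
while this patch allows 32 * batch = 512 pages (2 MB) for zones on the
local node but only 4 * batch = 64 pages (256 KB) for remote zones, so on
a large NUMA machine each cpu no longer pins a megabyte or more of every
remote zone in its pagesets.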

2005-11-06 08:25:30

by Nick Piggin

Subject: [patch 13/14] mm: cleanup zone_pcp

Use zone_pcp everywhere, even though the NUMA code "knows" the internal
details of the zone. This stops other people from copying that trick, and
it looks nicer.

Also, only print the pagesets of online cpus in zoneinfo.

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -544,7 +544,7 @@ void drain_remote_pages(void)

local_irq_save(flags);
if (zone->zone_pgdat->node_id != numa_node_id()) {
- pset = zone->pageset[smp_processor_id()];
+ pset = zone_pcp(zone, smp_processor_id());
if (pset->count)
pset->count -= free_pages_bulk(zone,
pset->count, &pset->list, 0);
@@ -1822,9 +1822,9 @@ static int __devinit process_zones(int c
for_each_zone(zone) {
unsigned long size, batch;

- zone->pageset[cpu] = kmalloc_node(sizeof(struct per_cpu_pageset),
+ zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
GFP_KERNEL, cpu_to_node(cpu));
- if (!zone->pageset[cpu])
+ if (!zone_pcp(zone, cpu))
goto bad;

batch = zone_batchsize(zone);
@@ -1832,7 +1832,7 @@ static int __devinit process_zones(int c
size = batch * 32;
else
size = batch * 4;
- setup_pageset(zone->pageset[cpu], size, batch);
+ setup_pageset(zone_pcp(zone, cpu), size, batch);
}

return 0;
@@ -1840,8 +1840,8 @@ bad:
for_each_zone(dzone) {
if (dzone == zone)
break;
- kfree(dzone->pageset[cpu]);
- dzone->pageset[cpu] = NULL;
+ kfree(zone_pcp(dzone, cpu));
+ zone_pcp(dzone, cpu) = NULL;
}
return -ENOMEM;
}
@@ -1929,8 +1929,8 @@ static __devinit void zone_pcp_init(stru
for (cpu = 0; cpu < NR_CPUS; cpu++) {
#ifdef CONFIG_NUMA
/* Early boot. Slab allocator not functional yet */
- zone->pageset[cpu] = &boot_pageset[cpu];
setup_pageset(&boot_pageset[cpu], 0, 0);
+ zone_pcp(zone, cpu) = &boot_pageset[cpu];
#else
setup_pageset(zone_pcp(zone, cpu), batch * 32, batch);
#endif
@@ -2172,7 +2172,7 @@ static int zoneinfo_show(struct seq_file
seq_printf(m,
")"
"\n pagesets");
- for (i = 0; i < ARRAY_SIZE(zone->pageset); i++) {
+ for_each_online_cpu(i) {
struct per_cpu_pageset *pset;

pset = zone_pcp(zone, i);


Attachments:
mm-cleanup-zone_pcp.patch (2.16 kB)

2005-11-06 08:35:19

by Arjan van de Ven

Subject: Re: [patch 2/14] mm: pte prefetch

On Sun, 2005-11-06 at 19:20 +1100, Nick Piggin wrote:
> 2/14
>
> plain text document attachment (mm-pte-prefetch.patch)
> Prefetch ptes a line ahead. Worth 25% on ia64 when doing big forks.
>
> Index: linux-2.6/include/asm-generic/pgtable.h
> ===================================================================
> --- linux-2.6.orig/include/asm-generic/pgtable.h
> +++ linux-2.6/include/asm-generic/pgtable.h
> @@ -196,6 +196,33 @@ static inline void ptep_set_wrprotect(st
> })
> #endif
>
> +#ifndef __HAVE_ARCH_PTE_PREFETCH
> +#define PTES_PER_LINE (L1_CACHE_BYTES / sizeof(pte_t))
> +#define PTE_LINE_MASK (~(PTES_PER_LINE - 1))
> +#define ADDR_PER_LINE (PTES_PER_LINE << PAGE_SHIFT)
> +#define ADDR_LINE_MASK (~(ADDR_PER_LINE - 1))
> +
> +#define pte_prefetch(pte, addr, end) \
> +({ \
> + unsigned long __nextline = ((addr) + ADDR_PER_LINE) & ADDR_LINE_MASK; \
> + if (__nextline < (end)) \
> + prefetch(pte + PTES_PER_LINE); \
> +})
> +

Are you sure this is right? At least on PCs, a branch-predictor miss
is very expensive, and might well be more expensive than the gain
you get from a prefetch.


2005-11-06 08:49:34

by Nick Piggin

Subject: Re: [patch 2/14] mm: pte prefetch

Arjan van de Ven wrote:
> On Sun, 2005-11-06 at 19:20 +1100, Nick Piggin wrote:
>
>>2/14
>>
>>plain text document attachment (mm-pte-prefetch.patch)
>>Prefetch ptes a line ahead. Worth 25% on ia64 when doing big forks.
>>
>>Index: linux-2.6/include/asm-generic/pgtable.h
>>===================================================================
>>--- linux-2.6.orig/include/asm-generic/pgtable.h
>>+++ linux-2.6/include/asm-generic/pgtable.h
>>@@ -196,6 +196,33 @@ static inline void ptep_set_wrprotect(st
>> })
>> #endif
>>
>>+#ifndef __HAVE_ARCH_PTE_PREFETCH
>>+#define PTES_PER_LINE (L1_CACHE_BYTES / sizeof(pte_t))
>>+#define PTE_LINE_MASK (~(PTES_PER_LINE - 1))
>>+#define ADDR_PER_LINE (PTES_PER_LINE << PAGE_SHIFT)
>>+#define ADDR_LINE_MASK (~(ADDR_PER_LINE - 1))
>>+
>>+#define pte_prefetch(pte, addr, end) \
>>+({ \
>>+ unsigned long __nextline = ((addr) + ADDR_PER_LINE) & ADDR_LINE_MASK; \
>>+ if (__nextline < (end)) \
>>+ prefetch(pte + PTES_PER_LINE); \
>>+})
>>+
>
>
> are you sure this is right? at least on pc's having a branch predictor
> miss is very expensive and might well be more expensive than the gain
> you get from a prefetch
>

Yeah, not 100% sure about this one, which is why it has been sitting
around for so long.

It gives about 25% on a contrived fork workload on an ia64 system, which
is probably about its best-case workload+architecture. I haven't found
any notable regressions but it definitely isn't going to be any faster
when the page tables are in cache.

So long as I haven't found a real-world workload that is improved by
the patch, I won't be trying to get it merged.

--
SUSE Labs, Novell Inc.


2005-11-06 17:37:18

by Bob Picco

[permalink] [raw]
Subject: Re: [patch 7/14] mm: remove bad_range

Nick Piggin wrote: [Sun Nov 06 2005, 03:24:40AM EST]
> 7/14
>
> --
> SUSE Labs, Novell Inc.
>

> bad_range is supposed to be a temporary check. It would be a pity to throw
> it out. Make it depend on CONFIG_DEBUG_VM instead.
>
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -78,6 +78,7 @@ int min_free_kbytes = 1024;
> unsigned long __initdata nr_kernel_pages;
> unsigned long __initdata nr_all_pages;
>
> +#ifdef CONFIG_DEBUG_VM
> static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
> {
> int ret = 0;
> @@ -119,6 +120,13 @@ static int bad_range(struct zone *zone,
> return 0;
> }
>
> +#else
> +static inline int bad_range(struct zone *zone, struct page *page)
> +{
> + return 0;
> +}
> +#endif
> +
> static void bad_page(const char *function, struct page *page)
> {
> printk(KERN_EMERG "Bad page state at %s (in process '%s', page %p)\n",
> Index: linux-2.6/lib/Kconfig.debug
> ===================================================================
> --- linux-2.6.orig/lib/Kconfig.debug
> +++ linux-2.6/lib/Kconfig.debug
> @@ -172,7 +172,8 @@ config DEBUG_VM
> bool "Debug VM"
> depends on DEBUG_KERNEL
> help
> - Enable this to debug the virtual-memory system.
> + Enable this to turn on extended checks in the virtual-memory system
> + that may impact performance.
>
> If unsure, say N.
>
Nick,

I don't think you can do it this way. On ia64, VIRTUAL_MEM_MAP depends on
CONFIG_HOLES_IN_ZONE and on the pfn_valid check within bad_range. Holes in
memory (MMIO etc.) won't have a page structure.

bob

2005-11-06 17:37:52

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 1/14] mm: opt rmqueue

Nick Piggin <[email protected]> writes:

> 1/14
>
> --
> SUSE Labs, Novell Inc.
>
> Slightly optimise some page allocation and freeing functions by
> taking advantage of knowing whether or not interrupts are disabled.

Another thing that could optimize that would be to use local_t
for the per-zone statistics and the VM statistics (I have an
old patch for the latter; it needs polishing up for the current
kernel).
With an architecture optimized for it (like i386/x86-64) they
generate much better code.

-Andi
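
For readers unfamiliar with local_t, a minimal sketch of the idea (this is
not Andi's patch; the structure and helper below are hypothetical, only the
local_t and per-cpu primitives are real kernel APIs). The point is that on
i386/x86-64 the local_add() itself is a single add instruction on per-cpu
memory, which is interrupt-safe on the local cpu without any cli/sti:

#include <linux/percpu.h>
#include <asm/local.h>

/* hypothetical per-cpu counter block, standing in for page_state */
struct vm_local_stats {
        local_t pgalloc;
        local_t pgfree;
};
static DEFINE_PER_CPU(struct vm_local_stats, vm_local_stats);

/* callers only need preemption disabled (e.g. via get_cpu()), not
 * local_irq_save(), because the add cannot be torn by an interrupt
 * on this cpu. */
static inline void count_pgfree(unsigned long nr)
{
        local_add(nr, &__get_cpu_var(vm_local_stats).pgfree);
}

Whether this is still a win on architectures where local_t falls back to an
ll/sc or atomic sequence is the question Nick raises later in the thread.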

2005-11-07 00:56:25

by Nick Piggin

Subject: Re: [patch 7/14] mm: remove bad_range

Bob Picco wrote:
> Nick Piggin wrote: [Sun Nov 06 2005, 03:24:40AM EST]
>
>>7/14
>>
>>--
>>SUSE Labs, Novell Inc.
>>
>
>
>>bad_range is supposed to be a temporary check. It would be a pity to throw
>>it out. Make it depend on CONFIG_DEBUG_VM instead.
>>
>>Index: linux-2.6/mm/page_alloc.c
>>===================================================================
>>--- linux-2.6.orig/mm/page_alloc.c
>>+++ linux-2.6/mm/page_alloc.c
>>@@ -78,6 +78,7 @@ int min_free_kbytes = 1024;
>> unsigned long __initdata nr_kernel_pages;
>> unsigned long __initdata nr_all_pages;
>>
>>+#ifdef CONFIG_DEBUG_VM
>> static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
>> {
>> int ret = 0;
>>@@ -119,6 +120,13 @@ static int bad_range(struct zone *zone,
>> return 0;
>> }
>>
>>+#else
>>+static inline int bad_range(struct zone *zone, struct page *page)
>>+{
>>+ return 0;
>>+}
>>+#endif
>>+
>> static void bad_page(const char *function, struct page *page)
>> {
>> printk(KERN_EMERG "Bad page state at %s (in process '%s', page %p)\n",
>>Index: linux-2.6/lib/Kconfig.debug
>>===================================================================
>>--- linux-2.6.orig/lib/Kconfig.debug
>>+++ linux-2.6/lib/Kconfig.debug
>>@@ -172,7 +172,8 @@ config DEBUG_VM
>> bool "Debug VM"
>> depends on DEBUG_KERNEL
>> help
>>- Enable this to debug the virtual-memory system.
>>+ Enable this to turn on extended checks in the virtual-memory system
>>+ that may impact performance.
>>
>> If unsure, say N.
>>
>
> Nick,
>
> I don't think you can do it this way. On ia64 VIRTUAL_MEM_MAP depends on
> CONFIG_HOLES_IN_ZONE and the check within bad_range for pfn_valid. Holes in
> memory (MMIO and etc.) won't have a page structure.
>

Hmm, right - in __free_pages_bulk.

Could we make a different call here, or is the full array of bad_range
checks required?

Thanks,
Nick

--
SUSE Labs, Novell Inc.


2005-11-07 01:04:16

by Nick Piggin

Subject: Re: [patch 1/14] mm: opt rmqueue

Andi Kleen wrote:
> Nick Piggin <[email protected]> writes:
>
>
>>1/14
>>
>>--
>>SUSE Labs, Novell Inc.
>>
>>Slightly optimise some page allocation and freeing functions by
>>taking advantage of knowing whether or not interrupts are disabled.
>
>
> Another thing that could optimize that would be to use local_t
> for the per zone statistics and the VM statistics (i have an
> old patch for the later, needs polishing up for the current
> kernel)
> With an architecture optimized for it (like i386/x86-64) they
> generate much better code.
>

Yes, all this turning on and off of interrupts does have a
significant cost here.

With the full patchset applied, most of the hot-path statistics
get moved under code paths that already require interrupts to be off;
however, there are still a few I didn't get around to doing.
zone_statistics on CONFIG_NUMA, for example.

I wonder if local_t is still good on architectures like ppc64
where it still requires an ll/sc sequence?

Nick

--
SUSE Labs, Novell Inc.


2005-11-07 01:39:03

by Christoph Hellwig

Subject: Re: [rfc][patch 0/14] mm: performance improvements

Could you _please_ send your patches inline? Skipping to an attachment
every time to read the description and patch is very awkward. We can
allow that as an exception for people who send a single patch occasionally,
but for a huge patch series it's highly annoying.

2005-11-07 01:40:33

by Christoph Hellwig

Subject: Re: [patch 5/14] mm: set_page_refs opt

On Sun, Nov 06, 2005 at 07:23:30PM +1100, Nick Piggin wrote:
> 5/14
>
> --
> SUSE Labs, Novell Inc.
>

> Inline set_page_refs. Remove mm/internal.h

So why don't you keep the inline function in mm/internal.h? This isn't
really stuff we want driver writers to use, ever.

2005-11-07 01:43:19

by Nick Piggin

Subject: Re: [patch 5/14] mm: set_page_refs opt

Christoph Hellwig wrote:
> On Sun, Nov 06, 2005 at 07:23:30PM +1100, Nick Piggin wrote:
>
>>5/14
>>
>>--
>>SUSE Labs, Novell Inc.
>>
>
>
>>Inline set_page_refs. Remove mm/internal.h
>
>
> So why don't you keep the inline function in mm/internal.h? This isn't
> really stuff we want driver writers to use, ever.
>
>

There are plenty of things in the linux/ headers which driver
writers shouldn't use.

That said, I think your idea is a good one, and one has to start
somewhere. I'll make that change, thanks.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-11-07 01:49:26

by Nick Piggin

[permalink] [raw]
Subject: Re: [rfc][patch 0/14] mm: performance improvements

Christoph Hellwig wrote:
> Could you _please_ send your patches inline? Skipping to an attachment
> every time to read the description and patch is very awkward. We can
> allow that as an exception for people who send a single patch occasionally,
> but for a huge patch series it's highly annoying.
>

Yeah, my mailer traditionally breaks them and not many people have
complained. I was hoping people were just allowing an exception for
me because I'm cool, but I guess not :(

Maybe time to switch mailers... I'll see what I can do.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-11-07 02:59:45

by Bob Picco

[permalink] [raw]
Subject: Re: [patch 7/14] mm: remove bad_range

Nick Piggin wrote: [Sun Nov 06 2005, 07:58:26PM EST]
> Bob Picco wrote:
> >Nick Piggin wrote: [Sun Nov 06 2005, 03:24:40AM EST]
> >
> >>7/14
> >>
> >>--
> >>SUSE Labs, Novell Inc.
> >>
> >
> >
> >>bad_range is supposed to be a temporary check. It would be a pity to throw
> >>it out. Make it depend on CONFIG_DEBUG_VM instead.
> >>
> >>Index: linux-2.6/mm/page_alloc.c
> >>===================================================================
> >>--- linux-2.6.orig/mm/page_alloc.c
> >>+++ linux-2.6/mm/page_alloc.c
> >>@@ -78,6 +78,7 @@ int min_free_kbytes = 1024;
> >>unsigned long __initdata nr_kernel_pages;
> >>unsigned long __initdata nr_all_pages;
> >>
> >>+#ifdef CONFIG_DEBUG_VM
> >>static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
> >>{
> >> int ret = 0;
> >>@@ -119,6 +120,13 @@ static int bad_range(struct zone *zone,
> >> return 0;
> >>}
> >>
> >>+#else
> >>+static inline int bad_range(struct zone *zone, struct page *page)
> >>+{
> >>+ return 0;
> >>+}
> >>+#endif
> >>+
> >>static void bad_page(const char *function, struct page *page)
> >>{
> >> printk(KERN_EMERG "Bad page state at %s (in process '%s', page %p)\n",
> >>Index: linux-2.6/lib/Kconfig.debug
> >>===================================================================
> >>--- linux-2.6.orig/lib/Kconfig.debug
> >>+++ linux-2.6/lib/Kconfig.debug
> >>@@ -172,7 +172,8 @@ config DEBUG_VM
> >> bool "Debug VM"
> >> depends on DEBUG_KERNEL
> >> help
> >>- Enable this to debug the virtual-memory system.
> >>+ Enable this to turn on extended checks in the virtual-memory system
> >>+ that may impact performance.
> >>
> >> If unsure, say N.
> >>
> >
> >Nick,
> >
> >I don't think you can do it this way. On ia64, VIRTUAL_MEM_MAP depends on
> >CONFIG_HOLES_IN_ZONE and on the pfn_valid check within bad_range. Holes in
> >memory (MMIO, etc.) won't have a page structure.
> >
>
> Hmm, right - in __free_pages_bulk.
>
> Could we make a different call here, or is the full array of bad_range
> checks required?
Not the full array, just the pfn_valid call. CONFIG_HOLES_IN_ZONE is
already used in page_alloc.c, so perhaps use it just in __free_pages_bulk
as a replacement for the bad_range call that isn't within a BUG_ON check.
It's somewhat of a wart, but it's already there. Otherwise we might want
an arch_holes_in_zone inline, which is only required by ia64 and is a
no-op for other arches.

The only place I didn't look closely is the BUG_ON in expand. I'll do that
tomorrow.
>
> Thanks,
> Nick
>
> --
you're welcome,

bob
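
A rough sketch of the arch_holes_in_zone-style hook Bob mentions, following
the __HAVE_ARCH_ override convention used elsewhere in the generic headers.
The names below (__HAVE_ARCH_PAGE_IN_ZONE_HOLE, page_in_zone_hole) are
hypothetical, not existing kernel symbols.

/* Generic fallback: most architectures have no holes inside zones. */
#ifndef __HAVE_ARCH_PAGE_IN_ZONE_HOLE
static inline int page_in_zone_hole(struct page *page)
{
	return 0;
}
#endif

/*
 * An architecture such as ia64 (VIRTUAL_MEM_MAP) would define
 * __HAVE_ARCH_PAGE_IN_ZONE_HOLE and implement the check via pfn_valid():
 *
 *	static inline int page_in_zone_hole(struct page *page)
 *	{
 *		return !pfn_valid(page_to_pfn(page));
 *	}
 */

The buddy-merging loop could then call the helper unconditionally, with the
check compiling away to nothing everywhere except ia64.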

2005-11-07 03:03:47

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 7/14] mm: remove bad_range

Bob Picco wrote:
> Nick Piggin wrote: [Sun Nov 06 2005, 07:58:26PM EST]
>

>>Hmm, right - in __free_pages_bulk.
>>
>>Could we make a different call here, or is the full array of bad_range
>>checks required?
>
> Not the full array, just the pfn_valid call. CONFIG_HOLES_IN_ZONE is
> already used in page_alloc.c, so perhaps use it just in __free_pages_bulk
> as a replacement for the bad_range call that isn't within a BUG_ON check.
> It's somewhat of a wart, but it's already there. Otherwise we might want
> an arch_holes_in_zone inline, which is only required by ia64 and is a
> no-op for other arches.
>

Ideally, yes, it would be hidden away in an arch-specific header file.

In the meantime I will just replace it with an ifdefed pfn_valid call.

> The only place I didn't look closely is the BUG_ON in expand. I'll do that
> tomorrow.
>

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com
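
As a concrete illustration of the ifdef'ed pfn_valid replacement Nick
describes, here is a sketch of how the check in the buddy-merging loop
might look (the helper name is made up, and this is not the change that
was actually merged):

/*
 * Only configurations with holes inside zones (ia64's VIRTUAL_MEM_MAP)
 * need to check that the buddy page actually has a struct page.
 */
static inline int buddy_in_hole(struct page *buddy)
{
#ifdef CONFIG_HOLES_IN_ZONE
	return !pfn_valid(page_to_pfn(buddy));
#else
	return 0;
#endif
}

	/* ... in __free_pages_bulk(), replacing the bad_range() call ... */
	if (buddy_in_hole(buddy))
		break;	/* no struct page for the buddy; stop merging */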

2005-11-07 03:23:59

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 1/14] mm: opt rmqueue

On Monday 07 November 2005 02:06, Nick Piggin wrote:

> Yes, all this turning on and off of interrupts does have a
> significant cost here.

How did you find out?

>
> With the full patchset applied, most of the hot path statistics
> get put under areas that already require interrupts to be off;
> however, there are still a few I didn't get around to doing.
> zone_statistics on CONFIG_NUMA, for example.

These should just be local_t

>
> I wonder if local_t is still good on architectures like ppc64
> where it still requires an ll/sc sequence?

The current default fallback local_t doesn't require that. It uses
different fields indexed by !!in_interrupt()

-Andi
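
A rough sketch of the split-counter fallback Andi describes, purely for
illustration (the type and macro names are made up; this is not the real
asm-generic implementation):

#include <linux/hardirq.h>	/* in_interrupt() */

/*
 * Keep one counter for process context and one for interrupt context,
 * so an update needs neither an atomic op nor an irq-off section; a
 * read simply sums the two halves. As with local_t, the counter is
 * assumed to be per-cpu and the caller deals with preemption.
 */
typedef struct {
	unsigned long count[2];		/* [0] process, [1] interrupt */
} local_counter_t;

#define local_counter_inc(l)	((l)->count[!!in_interrupt()]++)
#define local_counter_add(l, i)	((l)->count[!!in_interrupt()] += (i))
#define local_counter_read(l)	((l)->count[0] + (l)->count[1])

On architectures without cheap local atomics this avoids the ll/sc
sequence entirely, at the cost of one extra word per counter and reads
that may be slightly stale, which is harmless for statistics.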

2005-11-07 03:40:54

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 1/14] mm: opt rmqueue

Andi Kleen wrote:
> On Monday 07 November 2005 02:06, Nick Piggin wrote:
>
>
>>Yes, all this turning on and off of interrupts does have a
>>significant cost here.
>
>
> How did you find out?
>

By measuring the actual performance improvement on kbuild.
Not to mention that profiles for things like mod_page_state
drop dramatically, though you can't use that alone to be sure
of an improvement.

>
>>With the full patchset applied, most of the hot path statistics
>>get put under areas that already require interrupts to be off;
>>however, there are still a few I didn't get around to doing.
>>zone_statistics on CONFIG_NUMA, for example.
>
>
> These should just be local_t
>

Yep.

>
>>I wonder if local_t is still good on architectures like ppc64
>>where it still requires an ll/sc sequence?
>
>
> The current default fallback local_t doesn't require that. It uses
> different fields indexed by !!in_interrupt()
>

Right, I didn't see that. ppc(32), then.

I think maybe for struct page_state there is not so much point
in using local_t because the hot page allocator paths can easily
be covered under the interrupt critical sections.

The other fields aren't very hot, and using local_t would bloat
this up by many cachelines on 64-bit architectures like ppc64,
and would probably make them noticeably more expensive on 32-bit
architectures like ppc.

Actually, the NUMA fields in the pcp lists can probably also
just be put under the interrupt-off section that the page
allocator uses. At least it should be much easier to do when
Seth's __alloc_pages cleanup goes in. I'll keep it in mind.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-11-07 03:57:58

by Paul Jackson

[permalink] [raw]
Subject: Re: [rfc][patch 0/14] mm: performance improvements

> Maybe time to switch mailers... I'll see what I can do.

I recommend using a dedicated tool (a patchbomb script) to send patches,
not one's email client. It lets you prepare everything ahead of time in
your favorite editor, and obtains optimum results.

See the script I use, at:

http://www.speakeasy.org/~pj99/sgi/sendpatchset

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-11-07 04:49:46

by Nick Piggin

[permalink] [raw]
Subject: Re: [rfc][patch 0/14] mm: performance improvements

Paul Jackson wrote:
>>Maybe time to switch mailers... I'll see what I can do.
>
>
> I recommend using a dedicated tool (a patchbomb script) to send patches,
> not one's email client. It lets you prepare everything ahead of time in
> your favorite editor, and obtains optimum results.
>
> See the script I use, at:
>
> http://www.speakeasy.org/~pj99/sgi/sendpatchset
>

Probably the best idea. I hadn't worried about those until now,
although I have several fairly large patchsets floating around.

Thanks,
Nick

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-11-13 02:39:01

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 9/14] mm: page_state opt

Nick Piggin <[email protected]> writes:

> 9/14
>
> --
> SUSE Labs, Novell Inc.
>
> Optimise page_state manipulations by introducing a direct accessor
> to page_state fields without disabling interrupts, in which case
> the callers must provide their own locking (by either disabling interrupts
> or not updating from interrupt context).

I have a patchkit (which I need to update for the current kernel)
that replaces this with local_t. It gives much better code, is much
simpler, and doesn't require turning off interrupts anywhere.

-Andi