2009-06-09 19:03:58

by Johannes Weiner

[permalink] [raw]
Subject: [patch v3] swap: virtual swap readahead

[resend with lists cc'd, sorry]

Hi,

here is a new iteration of the virtual swap readahead. Per Hugh's
suggestion, I moved the pte collecting to the callsite and thus out
ouf swap code. Unfortunately, I had to bound page_cluster due to an
array of that many swap entries on the stack, but I think it is better
to limit the cluster size to a sane maximum than using dynamic
allocation for this purpose.

Thanks all for the helpful suggestions. KAMEZAWA-san and Minchan, I
didn't incorporate your ideas in this patch as I think they belong in
a different one with their own justifications. I didn't ignore them.

Hannes

---
The current swap readahead implementation reads a physically
contiguous group of swap slots around the faulting page to take
advantage of the disk head's position and in the hope that the
surrounding pages will be needed soon as well.

This works as long as the physical swap slot order approximates the
LRU order decently, otherwise it wastes memory and IO bandwidth to
read in pages that are unlikely to be needed soon.

However, the physical swap slot layout diverges from the LRU order
with increasing swap activity, i.e. high memory pressure situations,
and this is exactly the situation where swapin should not waste any
memory or IO bandwidth as both are the most contended resources at
this point.

Another approximation for LRU-relation is the VMA order as groups of
VMA-related pages are usually used together.

This patch combines both the physical and the virtual hint to get a
good approximation of pages that are sensible to read ahead.

When both diverge, we either read unrelated data, seek heavily for
related data, or, what this patch does, just decrease the readahead
efforts.

To achieve this, we have essentially two readahead windows of the same
size: one spans the virtual, the other one the physical neighborhood
of the faulting page. We only read where both areas overlap.

Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Wu Fengguang <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Minchan Kim <[email protected]>
---
include/linux/swap.h | 4 ++-
kernel/sysctl.c | 7 ++++-
mm/memory.c | 55 +++++++++++++++++++++++++++++++++++++++++
mm/shmem.c | 4 +--
mm/swap_state.c | 67 ++++++++++++++++++++++++++++++++++++++-------------
5 files changed, 116 insertions(+), 21 deletions(-)

version 3:
o move pte selection to callee (per Hugh)
o limit ra ptes to one pmd entry to avoid multiple
locking/mapping of highptes (per Hugh)

version 2:
o fall back to physical ra window for shmem
o add documentation to the new ra algorithm (per Andrew)

--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
return found_page;
}

-/**
- * swapin_readahead - swap in pages in hope we need them soon
- * @entry: swap entry of this memory
- * @gfp_mask: memory allocation flags
- * @vma: user vma this address belongs to
- * @addr: target address for mempolicy
- *
- * Returns the struct page for entry and addr, after queueing swapin.
- *
+/*
* Primitive swap readahead code. We simply read an aligned block of
* (1 << page_cluster) entries in the swap area. This method is chosen
* because it doesn't cost us any seek time. We also make sure to queue
* the 'original' request together with the readahead ones...
- *
- * This has been extended to use the NUMA policies from the mm triggering
- * the readahead.
- *
- * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
*/
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
- struct vm_area_struct *vma, unsigned long addr)
+static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
+ struct vm_area_struct *vma, unsigned long addr)
{
int nr_pages;
struct page *page;
@@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
lru_add_drain(); /* Push any new pages onto the LRU now */
return read_swap_cache_async(entry, gfp_mask, vma, addr);
}
+
+/**
+ * swapin_readahead - swap in pages in hope we need them soon
+ * @entry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vma: user vma this address belongs to
+ * @addr: target address for mempolicy
+ * @entries: swap slots to consider reading
+ * @nr_entries: number of @entries
+ * @cluster: readahead window size in swap slots
+ *
+ * Returns the struct page for entry and addr, after queueing swapin.
+ *
+ * This has been extended to use the NUMA policies from the mm
+ * triggering the readahead.
+ *
+ * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
+ */
+struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
+ struct vm_area_struct *vma, unsigned long addr,
+ swp_entry_t *entries, int nr_entries,
+ unsigned long cluster)
+{
+ unsigned long pmin, pmax;
+ int i;
+
+ if (!entries) /* XXX: shmem case */
+ return swapin_readahead_phys(entry, gfp_mask, vma, addr);
+ pmin = swp_offset(entry) & ~(cluster - 1);
+ pmax = pmin + cluster;
+ for (i = 0; i < nr_entries; i++) {
+ swp_entry_t swp = entries[i];
+ struct page *page;
+
+ if (swp_type(swp) != swp_type(entry))
+ continue;
+ if (swp_offset(swp) > pmax)
+ continue;
+ if (swp_offset(swp) < pmin)
+ continue;
+ page = read_swap_cache_async(swp, gfp_mask, vma, addr);
+ if (!page)
+ break;
+ page_cache_release(page);
+ }
+ lru_add_drain(); /* Push any new pages onto the LRU now */
+ return read_swap_cache_async(entry, gfp_mask, vma, addr);
+}
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr);
extern struct page *swapin_readahead(swp_entry_t, gfp_t,
- struct vm_area_struct *vma, unsigned long addr);
+ struct vm_area_struct *vma, unsigned long addr,
+ swp_entry_t *entries, int nr_entries,
+ unsigned long cluster);

/* linux/mm/swapfile.c */
extern long nr_swap_pages;
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
}

/*
+ * The readahead window is the virtual area around the faulting page,
+ * where the physical proximity of the swap slots is taken into
+ * account as well in swapin_readahead().
+ *
+ * While the swap allocation algorithm tries to keep LRU-related pages
+ * together on the swap backing, it is not reliable on heavy thrashing
+ * systems where concurrent reclaimers allocate swap slots and/or most
+ * anonymous memory pages are already in swap cache.
+ *
+ * On the virtual side, subgroups of VMA-related pages are usually
+ * used together, which gives another hint to LRU relationship.
+ *
+ * By taking both aspects into account, we get a good approximation of
+ * which pages are sensible to read together with the faulting one.
+ */
+static int swap_readahead_ptes(struct mm_struct *mm,
+ unsigned long addr, pmd_t *pmd,
+ swp_entry_t *entries,
+ unsigned long cluster)
+{
+ unsigned long window, min, max, limit;
+ spinlock_t *ptl;
+ pte_t *ptep;
+ int i, nr;
+
+ window = cluster << PAGE_SHIFT;
+ min = addr & ~(window - 1);
+ max = min + cluster;
+ /*
+ * To keep the locking/highpte mapping simple, stay
+ * within the PTE range of one PMD entry.
+ */
+ limit = addr & PMD_MASK;
+ if (limit > min)
+ min = limit;
+ limit = pmd_addr_end(addr, max);
+ if (limit < max)
+ max = limit;
+ limit = max - min;
+ ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
+ for (i = nr = 0; i < limit; i++)
+ if (is_swap_pte(ptep[i]))
+ entries[nr++] = pte_to_swp_entry(ptep[i]);
+ pte_unmap_unlock(ptep, ptl);
+ return nr;
+}
+
+/*
* We enter with non-exclusive mmap_sem (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
* We return with mmap_sem still held, but pte unmapped and unlocked.
@@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
delayacct_set_flag(DELAYACCT_PF_SWAPIN);
page = lookup_swap_cache(entry);
if (!page) {
+ int nr, cluster = 1 << page_cluster;
+ swp_entry_t entries[cluster];
+
grab_swap_token(); /* Contend for token _before_ read-in */
+ nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
page = swapin_readahead(entry,
- GFP_HIGHUSER_MOVABLE, vma, address);
+ GFP_HIGHUSER_MOVABLE, vma, address,
+ entries, nr, cluster);
if (!page) {
/*
* Back out if somebody else faulted in this pte
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
pvma.vm_pgoff = idx;
pvma.vm_ops = NULL;
pvma.vm_policy = spol;
- page = swapin_readahead(entry, gfp, &pvma, 0);
+ page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
return page;
}

@@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
struct shmem_inode_info *info, unsigned long idx)
{
- return swapin_readahead(entry, gfp, NULL, 0);
+ return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
}

static inline struct page *shmem_alloc_page(gfp_t gfp,
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8

static int ngroups_max = NGROUPS_MAX;

+static int page_cluster_max = 5;
+
#ifdef CONFIG_MODULES
extern char modprobe_path[];
#endif
@@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
.data = &page_cluster,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = &proc_dointvec,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ .extra2 = &page_cluster_max,
},
{
.ctl_name = VM_DIRTY_BACKGROUND,


2009-06-09 19:39:40

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> [resend with lists cc'd, sorry]

[and fixed Hugh's email. crap]

> Hi,
>
> here is a new iteration of the virtual swap readahead. Per Hugh's
> suggestion, I moved the pte collecting to the callsite and thus out
> ouf swap code. Unfortunately, I had to bound page_cluster due to an
> array of that many swap entries on the stack, but I think it is better
> to limit the cluster size to a sane maximum than using dynamic
> allocation for this purpose.
>
> Thanks all for the helpful suggestions. KAMEZAWA-san and Minchan, I
> didn't incorporate your ideas in this patch as I think they belong in
> a different one with their own justifications. I didn't ignore them.
>
> Hannes
>
> ---
> The current swap readahead implementation reads a physically
> contiguous group of swap slots around the faulting page to take
> advantage of the disk head's position and in the hope that the
> surrounding pages will be needed soon as well.
>
> This works as long as the physical swap slot order approximates the
> LRU order decently, otherwise it wastes memory and IO bandwidth to
> read in pages that are unlikely to be needed soon.
>
> However, the physical swap slot layout diverges from the LRU order
> with increasing swap activity, i.e. high memory pressure situations,
> and this is exactly the situation where swapin should not waste any
> memory or IO bandwidth as both are the most contended resources at
> this point.
>
> Another approximation for LRU-relation is the VMA order as groups of
> VMA-related pages are usually used together.
>
> This patch combines both the physical and the virtual hint to get a
> good approximation of pages that are sensible to read ahead.
>
> When both diverge, we either read unrelated data, seek heavily for
> related data, or, what this patch does, just decrease the readahead
> efforts.
>
> To achieve this, we have essentially two readahead windows of the same
> size: one spans the virtual, the other one the physical neighborhood
> of the faulting page. We only read where both areas overlap.
>
> Signed-off-by: Johannes Weiner <[email protected]>
> Reviewed-by: Rik van Riel <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Wu Fengguang <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Cc: Minchan Kim <[email protected]>
> ---
> include/linux/swap.h | 4 ++-
> kernel/sysctl.c | 7 ++++-
> mm/memory.c | 55 +++++++++++++++++++++++++++++++++++++++++
> mm/shmem.c | 4 +--
> mm/swap_state.c | 67 ++++++++++++++++++++++++++++++++++++++-------------
> 5 files changed, 116 insertions(+), 21 deletions(-)
>
> version 3:
> o move pte selection to callee (per Hugh)
> o limit ra ptes to one pmd entry to avoid multiple
> locking/mapping of highptes (per Hugh)
>
> version 2:
> o fall back to physical ra window for shmem
> o add documentation to the new ra algorithm (per Andrew)
>
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
> return found_page;
> }
>
> -/**
> - * swapin_readahead - swap in pages in hope we need them soon
> - * @entry: swap entry of this memory
> - * @gfp_mask: memory allocation flags
> - * @vma: user vma this address belongs to
> - * @addr: target address for mempolicy
> - *
> - * Returns the struct page for entry and addr, after queueing swapin.
> - *
> +/*
> * Primitive swap readahead code. We simply read an aligned block of
> * (1 << page_cluster) entries in the swap area. This method is chosen
> * because it doesn't cost us any seek time. We also make sure to queue
> * the 'original' request together with the readahead ones...
> - *
> - * This has been extended to use the NUMA policies from the mm triggering
> - * the readahead.
> - *
> - * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> */
> -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> - struct vm_area_struct *vma, unsigned long addr)
> +static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
> + struct vm_area_struct *vma, unsigned long addr)
> {
> int nr_pages;
> struct page *page;
> @@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
> lru_add_drain(); /* Push any new pages onto the LRU now */
> return read_swap_cache_async(entry, gfp_mask, vma, addr);
> }
> +
> +/**
> + * swapin_readahead - swap in pages in hope we need them soon
> + * @entry: swap entry of this memory
> + * @gfp_mask: memory allocation flags
> + * @vma: user vma this address belongs to
> + * @addr: target address for mempolicy
> + * @entries: swap slots to consider reading
> + * @nr_entries: number of @entries
> + * @cluster: readahead window size in swap slots
> + *
> + * Returns the struct page for entry and addr, after queueing swapin.
> + *
> + * This has been extended to use the NUMA policies from the mm
> + * triggering the readahead.
> + *
> + * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> + */
> +struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> + struct vm_area_struct *vma, unsigned long addr,
> + swp_entry_t *entries, int nr_entries,
> + unsigned long cluster)
> +{
> + unsigned long pmin, pmax;
> + int i;
> +
> + if (!entries) /* XXX: shmem case */
> + return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> + pmin = swp_offset(entry) & ~(cluster - 1);
> + pmax = pmin + cluster;
> + for (i = 0; i < nr_entries; i++) {
> + swp_entry_t swp = entries[i];
> + struct page *page;
> +
> + if (swp_type(swp) != swp_type(entry))
> + continue;
> + if (swp_offset(swp) > pmax)
> + continue;
> + if (swp_offset(swp) < pmin)
> + continue;
> + page = read_swap_cache_async(swp, gfp_mask, vma, addr);
> + if (!page)
> + break;
> + page_cache_release(page);
> + }
> + lru_add_drain(); /* Push any new pages onto the LRU now */
> + return read_swap_cache_async(entry, gfp_mask, vma, addr);
> +}
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
> extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
> struct vm_area_struct *vma, unsigned long addr);
> extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> - struct vm_area_struct *vma, unsigned long addr);
> + struct vm_area_struct *vma, unsigned long addr,
> + swp_entry_t *entries, int nr_entries,
> + unsigned long cluster);
>
> /* linux/mm/swapfile.c */
> extern long nr_swap_pages;
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
> }
>
> /*
> + * The readahead window is the virtual area around the faulting page,
> + * where the physical proximity of the swap slots is taken into
> + * account as well in swapin_readahead().
> + *
> + * While the swap allocation algorithm tries to keep LRU-related pages
> + * together on the swap backing, it is not reliable on heavy thrashing
> + * systems where concurrent reclaimers allocate swap slots and/or most
> + * anonymous memory pages are already in swap cache.
> + *
> + * On the virtual side, subgroups of VMA-related pages are usually
> + * used together, which gives another hint to LRU relationship.
> + *
> + * By taking both aspects into account, we get a good approximation of
> + * which pages are sensible to read together with the faulting one.
> + */
> +static int swap_readahead_ptes(struct mm_struct *mm,
> + unsigned long addr, pmd_t *pmd,
> + swp_entry_t *entries,
> + unsigned long cluster)
> +{
> + unsigned long window, min, max, limit;
> + spinlock_t *ptl;
> + pte_t *ptep;
> + int i, nr;
> +
> + window = cluster << PAGE_SHIFT;
> + min = addr & ~(window - 1);
> + max = min + cluster;
> + /*
> + * To keep the locking/highpte mapping simple, stay
> + * within the PTE range of one PMD entry.
> + */
> + limit = addr & PMD_MASK;
> + if (limit > min)
> + min = limit;
> + limit = pmd_addr_end(addr, max);
> + if (limit < max)
> + max = limit;
> + limit = max - min;
> + ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> + for (i = nr = 0; i < limit; i++)
> + if (is_swap_pte(ptep[i]))
> + entries[nr++] = pte_to_swp_entry(ptep[i]);
> + pte_unmap_unlock(ptep, ptl);
> + return nr;
> +}
> +
> +/*
> * We enter with non-exclusive mmap_sem (to exclude vma changes,
> * but allow concurrent faults), and pte mapped but not yet locked.
> * We return with mmap_sem still held, but pte unmapped and unlocked.
> @@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
> delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> page = lookup_swap_cache(entry);
> if (!page) {
> + int nr, cluster = 1 << page_cluster;
> + swp_entry_t entries[cluster];
> +
> grab_swap_token(); /* Contend for token _before_ read-in */
> + nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
> page = swapin_readahead(entry,
> - GFP_HIGHUSER_MOVABLE, vma, address);
> + GFP_HIGHUSER_MOVABLE, vma, address,
> + entries, nr, cluster);
> if (!page) {
> /*
> * Back out if somebody else faulted in this pte
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
> pvma.vm_pgoff = idx;
> pvma.vm_ops = NULL;
> pvma.vm_policy = spol;
> - page = swapin_readahead(entry, gfp, &pvma, 0);
> + page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
> return page;
> }
>
> @@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
> static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
> struct shmem_inode_info *info, unsigned long idx)
> {
> - return swapin_readahead(entry, gfp, NULL, 0);
> + return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
> }
>
> static inline struct page *shmem_alloc_page(gfp_t gfp,
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
>
> static int ngroups_max = NGROUPS_MAX;
>
> +static int page_cluster_max = 5;
> +
> #ifdef CONFIG_MODULES
> extern char modprobe_path[];
> #endif
> @@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
> .data = &page_cluster,
> .maxlen = sizeof(int),
> .mode = 0644,
> - .proc_handler = &proc_dointvec,
> + .proc_handler = &proc_dointvec_minmax,
> + .strategy = &sysctl_intvec,
> + .extra1 = &zero,
> + .extra2 = &page_cluster_max,
> },
> {
> .ctl_name = VM_DIRTY_BACKGROUND,
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2009-06-10 05:03:54

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > [resend with lists cc'd, sorry]
>
> [and fixed Hugh's email. crap]
>
> > Hi,
> >
> > here is a new iteration of the virtual swap readahead. Per Hugh's
> > suggestion, I moved the pte collecting to the callsite and thus out
> > ouf swap code. Unfortunately, I had to bound page_cluster due to an
> > array of that many swap entries on the stack, but I think it is better
> > to limit the cluster size to a sane maximum than using dynamic
> > allocation for this purpose.

Hi Johannes,

When stress testing your patch, I found it triggered many OOM kills.
Around the time of last OOMs, the memory usage is:

total used free shared buffers cached
Mem: 474 468 5 0 0 239
-/+ buffers/cache: 229 244
Swap: 1023 221 802

Thanks,
Fengguang
---

full kernel log:

[ 472.528487] /usr/games/glch invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 472.537228] Pid: 4361, comm: /usr/games/glch Not tainted 2.6.30-rc8-mm1 #301
[ 472.544293] Call Trace:
[ 472.546762] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 472.552259] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 472.558010] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 472.563250] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 472.568991] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 472.574499] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 472.580858] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 472.586871] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 472.592614] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 472.599222] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 472.605926] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 472.610987] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 472.616558] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 472.621786] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 472.627874] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 472.633658] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 472.639258] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 472.644413] Mem-Info:
[ 472.646698] Node 0 DMA per-cpu:
[ 472.649855] CPU 0: hi: 0, btch: 1 usd: 0
[ 472.654649] CPU 1: hi: 0, btch: 1 usd: 0
[ 472.659439] Node 0 DMA32 per-cpu:
[ 472.662774] CPU 0: hi: 186, btch: 31 usd: 114
[ 472.667560] CPU 1: hi: 186, btch: 31 usd: 81
[ 472.672350] Active_anon:43340 active_file:774 inactive_anon:46297
[ 472.672351] inactive_file:2095 unevictable:4 dirty:0 writeback:0 unstable:0
[ 472.672352] free:1334 slab:13888 mapped:3528 pagetables:7580 bounce:0
[ 472.692012] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:4892kB inactive_anon:6200kB active_file:12kB inactive_file:172kB unevictable:0kB present:15164kB pages_scanned:6752 all_unreclaimable? no
[ 472.711031] lowmem_reserve[]: 0 483 483 483
[ 472.715313] Node 0 DMA32 free:3320kB min:2768kB low:3460kB high:4152kB active_anon:168468kB inactive_anon:179064kB active_file:3084kB inactive_file:8208kB unevictable:16kB present:495008kB pages_scanned:265856 all_unreclaimable? no
[ 472.735793] lowmem_reserve[]: 0 0 0 0
[ 472.739546] Node 0 DMA: 21*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[ 472.750386] Node 0 DMA32: 220*4kB 23*8kB 17*16kB 14*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3320kB
[ 472.761754] 63776 total pagecache pages
[ 472.765589] 9263 pages in swap cache
[ 472.769162] Swap cache stats: add 166054, delete 156791, find 14174/51560
[ 472.775943] Free swap = 689708kB
[ 472.779264] Total swap = 1048568kB
[ 472.786832] 131072 pages RAM
[ 472.789713] 9628 pages reserved
[ 472.792861] 86958 pages shared
[ 472.795921] 56805 pages non-shared
[ 472.799325] Out of memory: kill process 3514 (run-many-x-apps) score 1495085 or a child
[ 472.807327] Killed process 3516 (xeyes)
[ 473.861300] gnobots2 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[ 473.868615] Pid: 4533, comm: gnobots2 Not tainted 2.6.30-rc8-mm1 #301
[ 473.875196] Call Trace:
[ 473.877669] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 473.883155] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 473.888919] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 473.894141] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 473.899881] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 473.905362] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 473.911711] [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[ 473.917276] [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[ 473.923451] [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[ 473.929194] [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[ 473.934677] [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[ 473.940585] [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[ 473.946152] [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[ 473.951898] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 473.957464] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 473.962601] Mem-Info:
[ 473.964870] Node 0 DMA per-cpu:
[ 473.968036] CPU 0: hi: 0, btch: 1 usd: 0
[ 473.972818] CPU 1: hi: 0, btch: 1 usd: 0
[ 473.977601] Node 0 DMA32 per-cpu:
[ 473.980930] CPU 0: hi: 186, btch: 31 usd: 78
[ 473.985718] CPU 1: hi: 186, btch: 31 usd: 79
[ 473.990512] Active_anon:43366 active_file:728 inactive_anon:46639
[ 473.990513] inactive_file:2442 unevictable:4 dirty:0 writeback:0 unstable:0
[ 473.990515] free:1187 slab:13677 mapped:3344 pagetables:7560 bounce:0
[ 474.010136] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:4872kB inactive_anon:6360kB active_file:28kB inactive_file:96kB unevictable:0kB present:15164kB pages_scanned:15568 all_unreclaimable? no
[ 474.029143] lowmem_reserve[]: 0 483 483 483
[ 474.033403] Node 0 DMA32 free:2740kB min:2768kB low:3460kB high:4152kB active_anon:168592kB inactive_anon:180308kB active_file:2884kB inactive_file:9672kB unevictable:16kB present:495008kB pages_scanned:627904 all_unreclaimable? yes
[ 474.053974] lowmem_reserve[]: 0 0 0 0
[ 474.057721] Node 0 DMA: 16*4kB 3*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[ 474.068556] Node 0 DMA32: 105*4kB 6*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2740kB
[ 474.079825] 64075 total pagecache pages
[ 474.083660] 9277 pages in swap cache
[ 474.087235] Swap cache stats: add 166129, delete 156852, find 14175/51619
[ 474.094011] Free swap = 690168kB
[ 474.097327] Total swap = 1048568kB
[ 474.104333] 131072 pages RAM
[ 474.107225] 9628 pages reserved
[ 474.110363] 84659 pages shared
[ 474.113409] 57530 pages non-shared
[ 474.116816] Out of memory: kill process 3514 (run-many-x-apps) score 1490267 or a child
[ 474.124811] Killed process 3593 (gthumb)
[ 480.443446] gnome-network-p invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[ 480.451749] Pid: 5242, comm: gnome-network-p Not tainted 2.6.30-rc8-mm1 #301
[ 480.458883] Call Trace:
[ 480.461362] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 480.467248] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 480.473025] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 480.478294] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 480.484050] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 480.489546] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 480.495920] [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[ 480.501509] [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[ 480.507718] [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[ 480.513477] [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[ 480.518982] [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[ 480.524917] [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[ 480.530515] [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[ 480.536273] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 480.541865] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 480.547023] Mem-Info:
[ 480.549305] Node 0 DMA per-cpu:
[ 480.552485] CPU 0: hi: 0, btch: 1 usd: 0
[ 480.557293] CPU 1: hi: 0, btch: 1 usd: 0
[ 480.562106] Node 0 DMA32 per-cpu:
[ 480.565450] CPU 0: hi: 186, btch: 31 usd: 166
[ 480.570260] CPU 1: hi: 186, btch: 31 usd: 54
[ 480.575072] Active_anon:43200 active_file:1328 inactive_anon:46633
[ 480.575077] inactive_file:2266 unevictable:4 dirty:0 writeback:0 unstable:0
[ 480.575081] free:1175 slab:13522 mapped:4094 pagetables:7430 bounce:0
[ 480.594826] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5048kB inactive_anon:6228kB active_file:24kB inactive_file:92kB unevictable:0kB present:15164kB pages_scanned:20576 all_unreclaimable? yes
[ 480.613968] lowmem_reserve[]: 0 483 483 483
[ 480.618302] Node 0 DMA32 free:2696kB min:2768kB low:3460kB high:4152kB active_anon:167804kB inactive_anon:180304kB active_file:5324kB inactive_file:9012kB unevictable:16kB present:495008kB pages_scanned:698592 all_unreclaimable? yes
[ 480.638902] lowmem_reserve[]: 0 0 0 0
[ 480.642709] Node 0 DMA: 15*4kB 1*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
[ 480.653661] Node 0 DMA32: 100*4kB 5*8kB 15*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2696kB
[ 480.665062] 64296 total pagecache pages
[ 480.668909] 9027 pages in swap cache
[ 480.672486] Swap cache stats: add 166520, delete 157493, find 14190/51963
[ 480.679265] Free swap = 697604kB
[ 480.682590] Total swap = 1048568kB
[ 480.692920] 131072 pages RAM
[ 480.695835] 9628 pages reserved
[ 480.698989] 83496 pages shared
[ 480.702055] 56997 pages non-shared
[ 480.705460] Out of memory: kill process 3514 (run-many-x-apps) score 1233725 or a child
[ 480.713480] Killed process 3620 (gedit)
[ 485.239788] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 485.247180] Pid: 3407, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[ 485.253879] Call Trace:
[ 485.256340] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 485.261825] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 485.267587] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 485.272810] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 485.278556] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 485.284034] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 485.290383] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 485.296384] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 485.302127] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 485.308729] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 485.315421] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 485.320471] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 485.326044] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 485.331264] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 485.337348] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 485.343091] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 485.348660] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 485.353794] Mem-Info:
[ 485.356074] Node 0 DMA per-cpu:
[ 485.359238] CPU 0: hi: 0, btch: 1 usd: 0
[ 485.364022] CPU 1: hi: 0, btch: 1 usd: 0
[ 485.368805] Node 0 DMA32 per-cpu:
[ 485.372130] CPU 0: hi: 186, btch: 31 usd: 86
[ 485.376917] CPU 1: hi: 186, btch: 31 usd: 65
[ 485.381704] Active_anon:43069 active_file:1343 inactive_anon:46566
[ 485.381705] inactive_file:2264 unevictable:4 dirty:0 writeback:0 unstable:0
[ 485.381706] free:1177 slab:13765 mapped:3976 pagetables:7336 bounce:0
[ 485.401416] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5096kB inactive_anon:6228kB active_file:24kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:14624 all_unreclaimable? no
[ 485.420352] lowmem_reserve[]: 0 483 483 483
[ 485.424627] Node 0 DMA32 free:2708kB min:2768kB low:3460kB high:4152kB active_anon:167180kB inactive_anon:180036kB active_file:5348kB inactive_file:9072kB unevictable:16kB present:495008kB pages_scanned:700592 all_unreclaimable? yes
[ 485.445209] lowmem_reserve[]: 0 0 0 0
[ 485.448983] Node 0 DMA: 25*4kB 1*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 485.459812] Node 0 DMA32: 97*4kB 8*8kB 15*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2708kB
[ 485.470995] 64132 total pagecache pages
[ 485.474826] 8910 pages in swap cache
[ 485.478397] Swap cache stats: add 166970, delete 158060, find 14213/52337
[ 485.485171] Free swap = 704464kB
[ 485.488481] Total swap = 1048568kB
[ 485.495505] 131072 pages RAM
[ 485.498400] 9628 pages reserved
[ 485.501539] 80730 pages shared
[ 485.504593] 57330 pages non-shared
[ 485.507994] Out of memory: kill process 3514 (run-many-x-apps) score 1208843 or a child
[ 485.515986] Killed process 3653 (xpdf.bin)
[ 487.520227] blackjack invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[ 487.527723] Pid: 4579, comm: blackjack Not tainted 2.6.30-rc8-mm1 #301
[ 487.534650] Call Trace:
[ 487.537290] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 487.542782] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 487.548533] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 487.553767] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 487.559522] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 487.565003] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 487.571353] [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[ 487.576933] [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[ 487.583117] [<ffffffff810e9e19>] swapin_readahead+0xe9/0x170
[ 487.588860] [<ffffffff810d1167>] shmem_getpage+0x607/0x970
[ 487.594432] [<ffffffff810a9c8b>] ? delayacct_end+0x6b/0xa0
[ 487.600003] [<ffffffff810a9caa>] ? delayacct_end+0x8a/0xa0
[ 487.605571] [<ffffffff810a9d2f>] ? __delayacct_blkio_end+0x2f/0x50
[ 487.611837] [<ffffffff81542132>] ? io_schedule+0x82/0xb0
[ 487.617229] [<ffffffff8107ca35>] ? print_lock_contention_bug+0x25/0x120
[ 487.623927] [<ffffffff810c0970>] ? sync_page+0x0/0x80
[ 487.629060] [<ffffffff810c0700>] ? find_get_page+0x0/0x110
[ 487.634633] [<ffffffff81052702>] ? current_fs_time+0x22/0x30
[ 487.640372] [<ffffffff810d9983>] ? __do_fault+0x153/0x510
[ 487.645849] [<ffffffff8107ca35>] ? print_lock_contention_bug+0x25/0x120
[ 487.652542] [<ffffffff810d151a>] shmem_fault+0x4a/0x80
[ 487.657762] [<ffffffff812444a9>] shm_fault+0x19/0x20
[ 487.662819] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 487.668036] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 487.674125] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 487.679867] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 487.685434] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 487.690570] Mem-Info:
[ 487.692836] Node 0 DMA per-cpu:
[ 487.696003] CPU 0: hi: 0, btch: 1 usd: 0
[ 487.700790] CPU 1: hi: 0, btch: 1 usd: 0
[ 487.705578] Node 0 DMA32 per-cpu:
[ 487.708906] CPU 0: hi: 186, btch: 31 usd: 142
[ 487.713698] CPU 1: hi: 186, btch: 31 usd: 77
[ 487.718498] Active_anon:42533 active_file:677 inactive_anon:46561
[ 487.718499] inactive_file:3214 unevictable:4 dirty:0 writeback:0 unstable:0
[ 487.718500] free:1573 slab:13680 mapped:3351 pagetables:7308 bounce:0
[ 487.738125] Node 0 DMA free:2064kB min:84kB low:104kB high:124kB active_anon:5152kB inactive_anon:6328kB active_file:8kB inactive_file:92kB unevictable:0kB present:15164kB pages_scanned:1586 all_unreclaimable? no
[ 487.756958] lowmem_reserve[]: 0 483 483 483
[ 487.761221] Node 0 DMA32 free:4228kB min:2768kB low:3460kB high:4152kB active_anon:164980kB inactive_anon:180068kB active_file:2700kB inactive_file:12764kB unevictable:16kB present:495008kB pages_scanned:42720 all_unreclaimable? no
[ 487.781711] lowmem_reserve[]: 0 0 0 0
[ 487.785458] Node 0 DMA: 37*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2068kB
[ 487.796294] Node 0 DMA32: 271*4kB 105*8kB 16*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4228kB
[ 487.807722] 64270 total pagecache pages
[ 487.811557] 8728 pages in swap cache
[ 487.815132] Swap cache stats: add 167087, delete 158359, find 14218/52435
[ 487.821908] Free swap = 711028kB
[ 487.825220] Total swap = 1048568kB
[ 487.832277] 131072 pages RAM
[ 487.835178] 9628 pages reserved
[ 487.838317] 76338 pages shared
[ 487.841364] 57425 pages non-shared
[ 487.844768] Out of memory: kill process 3514 (run-many-x-apps) score 1201219 or a child
[ 487.852761] Killed process 3696 (xterm)
[ 487.857092] tty_ldisc_deref: no references.
[ 489.747066] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 489.754480] Pid: 5404, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[ 489.761179] Call Trace:
[ 489.763640] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 489.769123] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 489.774870] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 489.780090] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 489.785830] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 489.791315] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 489.797665] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 489.803672] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 489.809409] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 489.816020] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 489.822723] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 489.827771] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 489.833338] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 489.838565] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 489.844653] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 489.850404] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 489.855970] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 489.861101] Mem-Info:
[ 489.863375] Node 0 DMA per-cpu:
[ 489.866538] CPU 0: hi: 0, btch: 1 usd: 0
[ 489.871327] CPU 1: hi: 0, btch: 1 usd: 0
[ 489.876114] Node 0 DMA32 per-cpu:
[ 489.879450] CPU 0: hi: 186, btch: 31 usd: 139
[ 489.884235] CPU 1: hi: 186, btch: 31 usd: 168
[ 489.889020] Active_anon:42548 active_file:713 inactive_anon:46654
[ 489.889022] inactive_file:3551 unevictable:4 dirty:0 writeback:0 unstable:0
[ 489.889023] free:1191 slab:13619 mapped:3463 pagetables:7277 bounce:0
[ 489.908648] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5156kB inactive_anon:6324kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:18048 all_unreclaimable? yes
[ 489.927583] lowmem_reserve[]: 0 483 483 483
[ 489.931852] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:165036kB inactive_anon:180292kB active_file:2852kB inactive_file:14204kB unevictable:16kB present:495008kB pages_scanned:598624 all_unreclaimable? yes
[ 489.952505] lowmem_reserve[]: 0 0 0 0
[ 489.956255] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[ 489.967104] Node 0 DMA32: 67*4kB 16*8kB 20*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2764kB
[ 489.978371] 64571 total pagecache pages
[ 489.982209] 8716 pages in swap cache
[ 489.985779] Swap cache stats: add 167160, delete 158444, find 14228/52496
[ 489.992561] Free swap = 712436kB
[ 489.995878] Total swap = 1048568kB
[ 490.003023] 131072 pages RAM
[ 490.005917] 9628 pages reserved
[ 490.009051] 77164 pages shared
[ 490.012111] 57863 pages non-shared
[ 490.015516] Out of memory: kill process 3514 (run-many-x-apps) score 1193943 or a child
[ 490.023514] Killed process 3789 (gnome-terminal)
[ 490.042359] gnome-terminal invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[ 490.050059] Pid: 3817, comm: gnome-terminal Not tainted 2.6.30-rc8-mm1 #301
[ 490.057019] Call Trace:
[ 490.059490] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 490.064986] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 490.070743] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 490.075981] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 490.081738] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 490.087245] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 490.093606] [<ffffffff810f3f76>] alloc_page_vma+0x86/0x1c0
[ 490.099200] [<ffffffff810e9ce8>] read_swap_cache_async+0xd8/0x120
[ 490.105390] [<ffffffff810e9de5>] swapin_readahead+0xb5/0x170
[ 490.111157] [<ffffffff810dac5d>] do_swap_page+0x3fd/0x500
[ 490.116651] [<ffffffff810e9913>] ? lookup_swap_cache+0x13/0x30
[ 490.122581] [<ffffffff810da8da>] ? do_swap_page+0x7a/0x500
[ 490.128166] [<ffffffff810dc70e>] handle_mm_fault+0x44e/0x500
[ 490.133932] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 490.139510] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 490.144658] [<ffffffff8127600c>] ? __get_user_8+0x1c/0x23
[ 490.150157] [<ffffffff810806ad>] ? exit_robust_list+0x5d/0x160
[ 490.156088] [<ffffffff81077c4d>] ? trace_hardirqs_off+0xd/0x10
[ 490.162026] [<ffffffff81544f97>] ? _spin_unlock_irqrestore+0x67/0x70
[ 490.168473] [<ffffffff8104ae5d>] mm_release+0xed/0x100
[ 490.173707] [<ffffffff8104f653>] exit_mm+0x23/0x150
[ 490.178684] [<ffffffff81544f1b>] ? _spin_unlock_irq+0x2b/0x40
[ 490.184528] [<ffffffff81051208>] do_exit+0x138/0x880
[ 490.189593] [<ffffffff8105e757>] ? get_signal_to_deliver+0x67/0x430
[ 490.195967] [<ffffffff81051998>] do_group_exit+0x48/0xd0
[ 490.201373] [<ffffffff8105e9d4>] get_signal_to_deliver+0x2e4/0x430
[ 490.207653] [<ffffffff8100b332>] do_notify_resume+0xc2/0x820
[ 490.213410] [<ffffffff81012859>] ? sched_clock+0x9/0x10
[ 490.218743] [<ffffffff81077c85>] ? lock_release_holdtime+0x35/0x1c0
[ 490.225102] [<ffffffff810fd768>] ? vfs_read+0xc8/0x1a0
[ 490.230340] [<ffffffff8100c057>] sysret_signal+0x83/0xd9
[ 490.235750] Mem-Info:
[ 490.238041] Node 0 DMA per-cpu:
[ 490.241213] CPU 0: hi: 0, btch: 1 usd: 0
[ 490.246023] CPU 1: hi: 0, btch: 1 usd: 0
[ 490.250817] Node 0 DMA32 per-cpu:
[ 490.254173] CPU 0: hi: 186, btch: 31 usd: 139
[ 490.258976] CPU 1: hi: 186, btch: 31 usd: 169
[ 490.263781] Active_anon:42548 active_file:713 inactive_anon:46660
[ 490.263784] inactive_file:3551 unevictable:4 dirty:0 writeback:0 unstable:0
[ 490.263787] free:1191 slab:13619 mapped:3463 pagetables:7277 bounce:0
[ 490.283433] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5156kB inactive_anon:6324kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:18048 all_unreclaimable? yes
[ 490.302379] lowmem_reserve[]: 0 483 483 483
[ 490.306699] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:165036kB inactive_anon:180316kB active_file:2852kB inactive_file:14204kB unevictable:16kB present:495008kB pages_scanned:616288 all_unreclaimable? yes
[ 490.327380] lowmem_reserve[]: 0 0 0 0
[ 490.331178] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[ 490.342134] Node 0 DMA32: 67*4kB 16*8kB 20*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2764kB
[ 490.353506] 64571 total pagecache pages
[ 490.357357] 8716 pages in swap cache
[ 490.360943] Swap cache stats: add 167160, delete 158444, find 14228/52497
[ 490.367735] Free swap = 712436kB
[ 490.371063] Total swap = 1048568kB
[ 490.381335] 131072 pages RAM
[ 490.384247] 9628 pages reserved
[ 490.387398] 77163 pages shared
[ 490.390461] 57864 pages non-shared
[ 491.721918] tty_ldisc_deref: no references.
[ 507.974133] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 507.981095] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[ 507.987465] Call Trace:
[ 507.990171] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 507.995670] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 508.001413] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 508.006640] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 508.012378] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 508.017857] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 508.024207] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 508.030211] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 508.035951] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 508.042555] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 508.049248] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 508.054298] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 508.059864] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 508.065082] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 508.071170] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 508.076916] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 508.082488] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 508.087617] Mem-Info:
[ 508.089890] Node 0 DMA per-cpu:
[ 508.093045] CPU 0: hi: 0, btch: 1 usd: 0
[ 508.097831] CPU 1: hi: 0, btch: 1 usd: 0
[ 508.102618] Node 0 DMA32 per-cpu:
[ 508.105949] CPU 0: hi: 186, btch: 31 usd: 70
[ 508.110732] CPU 1: hi: 186, btch: 31 usd: 35
[ 508.115518] Active_anon:43375 active_file:1606 inactive_anon:46595
[ 508.115519] inactive_file:2431 unevictable:4 dirty:0 writeback:0 unstable:0
[ 508.115520] free:1171 slab:13500 mapped:4464 pagetables:7137 bounce:0
[ 508.135223] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5372kB inactive_anon:6304kB active_file:48kB inactive_file:152kB unevictable:0kB present:15164kB pages_scanned:18016 all_unreclaimable? yes
[ 508.154402] lowmem_reserve[]: 0 483 483 483
[ 508.158670] Node 0 DMA32 free:2684kB min:2768kB low:3460kB high:4152kB active_anon:168128kB inactive_anon:180076kB active_file:6376kB inactive_file:9572kB unevictable:16kB present:495008kB pages_scanned:574528 all_unreclaimable? yes
[ 508.179230] lowmem_reserve[]: 0 0 0 0
[ 508.182977] Node 0 DMA: 20*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[ 508.193806] Node 0 DMA32: 81*4kB 9*8kB 17*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2684kB
[ 508.204972] 64466 total pagecache pages
[ 508.208804] 8648 pages in swap cache
[ 508.212374] Swap cache stats: add 169110, delete 160462, find 14531/53889
[ 508.219151] Free swap = 723636kB
[ 508.222465] Total swap = 1048568kB
[ 508.229465] 131072 pages RAM
[ 508.232364] 9628 pages reserved
[ 508.235504] 80834 pages shared
[ 508.238558] 57150 pages non-shared
[ 508.241961] Out of memory: kill process 3514 (run-many-x-apps) score 1142844 or a child
[ 508.249954] Killed process 3828 (urxvt)
[ 508.254826] tty_ldisc_deref: no references.
[ 518.644007] /usr/games/gnom invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 518.652048] Pid: 4284, comm: /usr/games/gnom Not tainted 2.6.30-rc8-mm1 #301
[ 518.659110] Call Trace:
[ 518.661572] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 518.667060] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 518.672805] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 518.678036] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 518.683779] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 518.689265] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 518.695629] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 518.701648] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 518.707396] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 518.714015] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 518.720728] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 518.725782] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 518.731376] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 518.736610] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 518.742724] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 518.748470] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 518.754050] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 518.759186] Mem-Info:
[ 518.761457] Node 0 DMA per-cpu:
[ 518.764622] CPU 0: hi: 0, btch: 1 usd: 0
[ 518.769433] CPU 1: hi: 0, btch: 1 usd: 0
[ 518.774250] Node 0 DMA32 per-cpu:
[ 518.777607] CPU 0: hi: 186, btch: 31 usd: 122
[ 518.782429] CPU 1: hi: 186, btch: 31 usd: 140
[ 518.787320] Active_anon:43558 active_file:800 inactive_anon:46596
[ 518.787322] inactive_file:3200 unevictable:4 dirty:0 writeback:1 unstable:0
[ 518.787324] free:1170 slab:13276 mapped:3632 pagetables:7067 bounce:0
[ 518.806969] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5392kB inactive_anon:6284kB active_file:8kB inactive_file:192kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[ 518.825631] lowmem_reserve[]: 0 483 483 483
[ 518.829894] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168840kB inactive_anon:180100kB active_file:3192kB inactive_file:12608kB unevictable:16kB present:495008kB pages_scanned:2752 all_unreclaimable? no
[ 518.850287] lowmem_reserve[]: 0 0 0 0
[ 518.854034] Node 0 DMA: 17*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
[ 518.864860] Node 0 DMA32: 51*4kB 9*8kB 22*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[ 518.876047] 64523 total pagecache pages
[ 518.879879] 8754 pages in swap cache
[ 518.883453] Swap cache stats: add 169415, delete 160661, find 14593/54101
[ 518.890231] Free swap = 727320kB
[ 518.893549] Total swap = 1048568kB
[ 518.900474] 131072 pages RAM
[ 518.903375] 9628 pages reserved
[ 518.906522] 75910 pages shared
[ 518.909579] 57545 pages non-shared
[ 518.912975] Out of memory: kill process 3514 (run-many-x-apps) score 1125494 or a child
[ 518.920971] Killed process 3913 (gnome-system-mo)
[ 664.508168] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 664.514995] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[ 664.521111] Call Trace:
[ 664.523568] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 664.529049] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 664.534794] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 664.540021] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 664.545757] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 664.551235] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 664.557591] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 664.563593] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 664.569336] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 664.575947] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 664.582648] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 664.587710] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 664.593282] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 664.598508] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 664.604603] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 664.610357] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 664.615937] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 664.621071] Mem-Info:
[ 664.623341] Node 0 DMA per-cpu:
[ 664.626517] CPU 0: hi: 0, btch: 1 usd: 0
[ 664.631305] CPU 1: hi: 0, btch: 1 usd: 0
[ 664.636096] Node 0 DMA32 per-cpu:
[ 664.639430] CPU 0: hi: 186, btch: 31 usd: 108
[ 664.644229] CPU 1: hi: 186, btch: 31 usd: 104
[ 664.649022] Active_anon:42958 active_file:868 inactive_anon:46862
[ 664.649024] inactive_file:3541 unevictable:4 dirty:0 writeback:0 unstable:0
[ 664.649026] free:1182 slab:13288 mapped:3904 pagetables:7002 bounce:0
[ 664.668657] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5528kB inactive_anon:6256kB active_file:0kB inactive_file:56kB unevictable:0kB present:15164kB pages_scanned:17829 all_unreclaimable? yes
[ 664.687670] lowmem_reserve[]: 0 483 483 483
[ 664.691974] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:166304kB inactive_anon:181192kB active_file:3472kB inactive_file:14108kB unevictable:16kB present:495008kB pages_scanned:561984 all_unreclaimable? yes
[ 664.712637] lowmem_reserve[]: 0 0 0 0
[ 664.716412] Node 0 DMA: 21*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[ 664.727297] Node 0 DMA32: 83*4kB 9*8kB 17*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2724kB
[ 664.738494] 64381 total pagecache pages
[ 664.742329] 7902 pages in swap cache
[ 664.745909] Swap cache stats: add 174458, delete 166556, find 14826/56928
[ 664.752696] Free swap = 734732kB
[ 664.756012] Total swap = 1048568kB
[ 664.763953] 131072 pages RAM
[ 664.766845] 9628 pages reserved
[ 664.769992] 74903 pages shared
[ 664.773047] 58244 pages non-shared
[ 664.776465] Out of memory: kill process 3514 (run-many-x-apps) score 1094818 or a child
[ 664.784464] Killed process 3941 (gnome-help)
[ 700.167781] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[ 700.174355] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[ 700.180473] Call Trace:
[ 700.182949] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 700.188480] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 700.194247] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 700.199501] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 700.205257] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 700.210748] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 700.217115] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 700.223132] [<ffffffff810c73f9>] __get_free_pages+0x9/0x50
[ 700.228731] [<ffffffff8110e3c2>] __pollwait+0xc2/0x100
[ 700.233966] [<ffffffff814958c3>] unix_poll+0x23/0xc0
[ 700.239025] [<ffffffff81419a88>] sock_poll+0x18/0x20
[ 700.244095] [<ffffffff8110d969>] do_select+0x3e9/0x730
[ 700.249333] [<ffffffff8110d580>] ? do_select+0x0/0x730
[ 700.254575] [<ffffffff8110e300>] ? __pollwait+0x0/0x100
[ 700.259909] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.264976] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.270034] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.275093] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.280157] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.285223] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.290287] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.295360] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.300416] [<ffffffff8110e400>] ? pollwake+0x0/0x60
[ 700.305475] [<ffffffff8110deaf>] core_sys_select+0x1ff/0x330
[ 700.311225] [<ffffffff8110dcf8>] ? core_sys_select+0x48/0x330
[ 700.317068] [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[ 700.324109] [<ffffffff810fcf9a>] ? do_readv_writev+0x16a/0x1f0
[ 700.330037] [<ffffffff810706bc>] ? getnstimeofday+0x5c/0xf0
[ 700.335708] [<ffffffff8106aca9>] ? ktime_get_ts+0x59/0x60
[ 700.341207] [<ffffffff8110e23a>] sys_select+0x4a/0x110
[ 700.346450] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 700.352471] Mem-Info:
[ 700.354744] Node 0 DMA per-cpu:
[ 700.357931] CPU 0: hi: 0, btch: 1 usd: 0
[ 700.362728] CPU 1: hi: 0, btch: 1 usd: 0
[ 700.367528] Node 0 DMA32 per-cpu:
[ 700.370869] CPU 0: hi: 186, btch: 31 usd: 124
[ 700.375681] CPU 1: hi: 186, btch: 31 usd: 109
[ 700.380485] Active_anon:42750 active_file:1211 inactive_anon:46836
[ 700.380487] inactive_file:3834 unevictable:4 dirty:0 writeback:0 unstable:0
[ 700.380490] free:1185 slab:13047 mapped:4269 pagetables:6879 bounce:0
[ 700.400224] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5504kB inactive_anon:6244kB active_file:4kB inactive_file:20kB unevictable:0kB present:15164kB pages_scanned:21160 all_unreclaimable? no
[ 700.419171] lowmem_reserve[]: 0 483 483 483
[ 700.423495] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:165496kB inactive_anon:181100kB active_file:4840kB inactive_file:15316kB unevictable:16kB present:495008kB pages_scanned:749440 all_unreclaimable? yes
[ 700.444177] lowmem_reserve[]: 0 0 0 0
[ 700.447982] Node 0 DMA: 24*4kB 2*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[ 700.458919] Node 0 DMA32: 95*4kB 7*8kB 15*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2724kB
[ 700.470109] 64769 total pagecache pages
[ 700.473944] 7685 pages in swap cache
[ 700.477521] Swap cache stats: add 174858, delete 167173, find 14884/57219
[ 700.484305] Free swap = 756796kB
[ 700.487619] Total swap = 1048568kB
[ 700.495533] 131072 pages RAM
[ 700.498435] 9628 pages reserved
[ 700.501585] 75677 pages shared
[ 700.504647] 57992 pages non-shared
[ 700.508062] Out of memory: kill process 3514 (run-many-x-apps) score 920259 or a child
[ 700.515981] Killed process 3972 (gnome-dictionar)
[ 772.754850] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 772.762316] Pid: 3363, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[ 772.769042] Call Trace:
[ 772.771532] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 772.777056] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 772.782830] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 772.788093] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 772.793861] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 772.799371] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 772.805903] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 772.812044] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 772.817979] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 772.824833] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 772.831934] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 772.837201] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 772.843077] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 772.848298] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 772.854384] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 772.860126] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 772.865693] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 772.870831] Mem-Info:
[ 772.873099] Node 0 DMA per-cpu:
[ 772.876268] CPU 0: hi: 0, btch: 1 usd: 0
[ 772.881052] CPU 1: hi: 0, btch: 1 usd: 0
[ 772.885837] Node 0 DMA32 per-cpu:
[ 772.889177] CPU 0: hi: 186, btch: 31 usd: 119
[ 772.893970] CPU 1: hi: 186, btch: 31 usd: 131
[ 772.898771] Active_anon:42925 active_file:967 inactive_anon:46822
[ 772.898773] inactive_file:3951 unevictable:4 dirty:0 writeback:0 unstable:0
[ 772.898775] free:1195 slab:13130 mapped:4261 pagetables:6775 bounce:0
[ 772.918425] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5572kB inactive_anon:6228kB active_file:0kB inactive_file:28kB unevictable:0kB present:15164kB pages_scanned:1152 all_unreclaimable? no
[ 772.937282] lowmem_reserve[]: 0 483 483 483
[ 772.941583] Node 0 DMA32 free:2780kB min:2768kB low:3460kB high:4152kB active_anon:166128kB inactive_anon:181060kB active_file:3868kB inactive_file:15776kB unevictable:16kB present:495008kB pages_scanned:31168 all_unreclaimable? no
[ 772.962096] lowmem_reserve[]: 0 0 0 0
[ 772.965848] Node 0 DMA: 19*4kB 3*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[ 772.976695] Node 0 DMA32: 113*4kB 7*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2780kB
[ 772.987966] 64559 total pagecache pages
[ 772.991800] 7639 pages in swap cache
[ 772.995376] Swap cache stats: add 175606, delete 167967, find 14965/57706
[ 773.002155] Free swap = 761820kB
[ 773.005474] Total swap = 1048568kB
[ 773.012974] 131072 pages RAM
[ 773.015871] 9628 pages reserved
[ 773.019017] 75524 pages shared
[ 773.022066] 57891 pages non-shared
[ 773.025474] Out of memory: kill process 3514 (run-many-x-apps) score 892555 or a child
[ 773.033387] Killed process 4039 (sol)
[ 794.790990] NFS: Server wrote zero bytes, expected 120.
[ 822.483490] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 822.490772] Pid: 3308, comm: Xorg Not tainted 2.6.30-rc8-mm1 #301
[ 822.496918] Call Trace:
[ 822.499384] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 822.504871] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 822.510622] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 822.515851] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 822.521593] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 822.527081] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 822.533429] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 822.539434] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 822.545175] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 822.551788] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 822.558481] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 822.563528] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 822.569098] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 822.574327] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 822.580413] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 822.586157] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 822.591727] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 822.596859] Mem-Info:
[ 822.599136] Node 0 DMA per-cpu:
[ 822.602299] CPU 0: hi: 0, btch: 1 usd: 0
[ 822.607084] CPU 1: hi: 0, btch: 1 usd: 0
[ 822.611869] Node 0 DMA32 per-cpu:
[ 822.615198] CPU 0: hi: 186, btch: 31 usd: 91
[ 822.619985] CPU 1: hi: 186, btch: 31 usd: 98
[ 822.624773] Active_anon:43566 active_file:835 inactive_anon:46874
[ 822.624774] inactive_file:3327 unevictable:4 dirty:0 writeback:0 unstable:0
[ 822.624775] free:1187 slab:13349 mapped:3843 pagetables:6679 bounce:0
[ 822.644402] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5648kB inactive_anon:6260kB active_file:24kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:20672 all_unreclaimable? yes
[ 822.663507] lowmem_reserve[]: 0 483 483 483
[ 822.667773] Node 0 DMA32 free:2748kB min:2768kB low:3460kB high:4152kB active_anon:168616kB inactive_anon:181236kB active_file:3316kB inactive_file:13236kB unevictable:16kB present:495008kB pages_scanned:729026 all_unreclaimable? yes
[ 822.688432] lowmem_reserve[]: 0 0 0 0
[ 822.692178] Node 0 DMA: 16*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[ 822.703015] Node 0 DMA32: 53*4kB 31*8kB 15*16kB 16*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2748kB
[ 822.714282] 63870 total pagecache pages
[ 822.718120] 7714 pages in swap cache
[ 822.721687] Swap cache stats: add 177378, delete 169664, find 15255/58971
[ 822.728470] Free swap = 772080kB
[ 822.731787] Total swap = 1048568kB
[ 822.738767] 131072 pages RAM
[ 822.741648] 9628 pages reserved
[ 822.744800] 78480 pages shared
[ 822.747857] 58328 pages non-shared
[ 822.751262] Out of memory: kill process 3514 (run-many-x-apps) score 874039 or a child
[ 822.759173] Killed process 4071 (gnometris)
[ 838.434074] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 838.441560] Pid: 5500, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #301
[ 838.448286] Call Trace:
[ 838.450770] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 838.456279] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 838.462053] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 838.467299] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 838.473064] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 838.478570] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 838.484930] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 838.490953] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 838.496714] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 838.503346] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 838.510056] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 838.515121] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 838.520707] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 838.525955] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 838.532058] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 838.537819] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 838.543405] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 838.548553] Mem-Info:
[ 838.550844] Node 0 DMA per-cpu:
[ 838.554023] CPU 0: hi: 0, btch: 1 usd: 0
[ 838.558818] CPU 1: hi: 0, btch: 1 usd: 0
[ 838.563614] Node 0 DMA32 per-cpu:
[ 838.566959] CPU 0: hi: 186, btch: 31 usd: 174
[ 838.571767] CPU 1: hi: 186, btch: 31 usd: 87
[ 838.576579] Active_anon:43520 active_file:718 inactive_anon:46874
[ 838.576582] inactive_file:3607 unevictable:4 dirty:0 writeback:0 unstable:0
[ 838.576584] free:1193 slab:13228 mapped:4138 pagetables:6608 bounce:0
[ 838.596232] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5620kB inactive_anon:6260kB active_file:28kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:18848 all_unreclaimable? yes
[ 838.615367] lowmem_reserve[]: 0 483 483 483
[ 838.619678] Node 0 DMA32 free:2764kB min:2768kB low:3460kB high:4152kB active_anon:168460kB inactive_anon:181236kB active_file:2844kB inactive_file:14356kB unevictable:16kB present:495008kB pages_scanned:585548 all_unreclaimable? yes
[ 838.640372] lowmem_reserve[]: 0 0 0 0
[ 838.644163] Node 0 DMA: 18*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[ 838.655125] Node 0 DMA32: 109*4kB 7*8kB 16*16kB 14*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2732kB
[ 838.666499] 64009 total pagecache pages
[ 838.670350] 7656 pages in swap cache
[ 838.673941] Swap cache stats: add 177561, delete 169905, find 15273/59126
[ 838.680734] Free swap = 791892kB
[ 838.684060] Total swap = 1048568kB
[ 838.694532] 131072 pages RAM
[ 838.697436] 9628 pages reserved
[ 838.700590] 73594 pages shared
[ 838.703661] 58166 pages non-shared
[ 838.707076] Out of memory: kill process 3514 (run-many-x-apps) score 853023 or a child
[ 838.714995] Killed process 4104 (gnect)
[ 889.461532] scim-panel-gtk invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 889.469205] Pid: 3360, comm: scim-panel-gtk Not tainted 2.6.30-rc8-mm1 #301
[ 889.476177] Call Trace:
[ 889.478662] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 889.484172] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 889.489944] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 889.495191] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 889.500962] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 889.506455] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 889.512814] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 889.518831] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 889.524591] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 889.531220] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 889.537930] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 889.542994] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 889.548580] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 889.553829] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 889.559928] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 889.565694] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 889.571281] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 889.576428] Mem-Info:
[ 889.578716] Node 0 DMA per-cpu:
[ 889.581897] CPU 0: hi: 0, btch: 1 usd: 0
[ 889.586693] CPU 1: hi: 0, btch: 1 usd: 0
[ 889.591489] Node 0 DMA32 per-cpu:
[ 889.594838] CPU 0: hi: 186, btch: 31 usd: 27
[ 889.599639] CPU 1: hi: 186, btch: 31 usd: 52
[ 889.604447] Active_anon:43571 active_file:1739 inactive_anon:47198
[ 889.604450] inactive_file:2522 unevictable:4 dirty:0 writeback:0 unstable:0
[ 889.604453] free:1172 slab:13250 mapped:4789 pagetables:6476 bounce:0
[ 889.624188] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5672kB inactive_anon:6228kB active_file:0kB inactive_file:28kB unevictable:0kB present:15164kB pages_scanned:18758 all_unreclaimable? yes
[ 889.643237] lowmem_reserve[]: 0 483 483 483
[ 889.647549] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168612kB inactive_anon:182564kB active_file:6956kB inactive_file:10060kB unevictable:16kB present:495008kB pages_scanned:562004 all_unreclaimable? yes
[ 889.668244] lowmem_reserve[]: 0 0 0 0
[ 889.672043] Node 0 DMA: 19*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 889.683006] Node 0 DMA32: 85*4kB 8*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[ 889.694298] 63465 total pagecache pages
[ 889.698147] 7133 pages in swap cache
[ 889.701736] Swap cache stats: add 181169, delete 174036, find 15337/60473
[ 889.708527] Free swap = 795216kB
[ 889.711853] Total swap = 1048568kB
[ 889.722306] 131072 pages RAM
[ 889.725220] 9628 pages reserved
[ 889.728368] 73642 pages shared
[ 889.731430] 58217 pages non-shared
[ 889.734842] Out of memory: kill process 3314 (gnome-session) score 875272 or a child
[ 889.742589] Killed process 3345 (ssh-agent)
[ 889.753188] urxvt invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 889.760064] Pid: 3364, comm: urxvt Not tainted 2.6.30-rc8-mm1 #301
[ 889.766248] Call Trace:
[ 889.768709] [<ffffffff81544fc6>] ? _spin_unlock+0x26/0x30
[ 889.774212] [<ffffffff810c37bc>] oom_kill_process+0xdc/0x270
[ 889.779963] [<ffffffff810c3b1f>] ? badness+0x18f/0x300
[ 889.785202] [<ffffffff810c3dc5>] __out_of_memory+0x135/0x170
[ 889.790961] [<ffffffff810c3ef5>] out_of_memory+0xf5/0x180
[ 889.796460] [<ffffffff810c856c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 889.802839] [<ffffffff810f3e68>] alloc_pages_current+0x78/0x100
[ 889.808867] [<ffffffff810c0c6b>] __page_cache_alloc+0xb/0x10
[ 889.814622] [<ffffffff810ca900>] __do_page_cache_readahead+0x120/0x240
[ 889.821253] [<ffffffff810ca892>] ? __do_page_cache_readahead+0xb2/0x240
[ 889.827970] [<ffffffff810caa3c>] ra_submit+0x1c/0x20
[ 889.833050] [<ffffffff810c1487>] filemap_fault+0x3f7/0x400
[ 889.838635] [<ffffffff810d9883>] __do_fault+0x53/0x510
[ 889.843875] [<ffffffff81271ca0>] ? __down_read_trylock+0x20/0x60
[ 889.849989] [<ffffffff810dc489>] handle_mm_fault+0x1c9/0x500
[ 889.855753] [<ffffffff81548234>] do_page_fault+0x1c4/0x330
[ 889.861356] [<ffffffff81545a55>] page_fault+0x25/0x30
[ 889.866503] Mem-Info:
[ 889.868779] Node 0 DMA per-cpu:
[ 889.871969] CPU 0: hi: 0, btch: 1 usd: 0
[ 889.876771] CPU 1: hi: 0, btch: 1 usd: 0
[ 889.881590] Node 0 DMA32 per-cpu:
[ 889.884950] CPU 0: hi: 186, btch: 31 usd: 27
[ 889.889752] CPU 1: hi: 186, btch: 31 usd: 83
[ 889.894557] Active_anon:43568 active_file:1748 inactive_anon:47202
[ 889.894560] inactive_file:2532 unevictable:4 dirty:0 writeback:0 unstable:0
[ 889.894562] free:1172 slab:13256 mapped:4800 pagetables:6457 bounce:0
[ 889.914305] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5672kB inactive_anon:6244kB active_file:16kB inactive_file:36kB unevictable:0kB present:15164kB pages_scanned:18758 all_unreclaimable? yes
[ 889.933431] lowmem_reserve[]: 0 483 483 483
[ 889.937757] Node 0 DMA32 free:2676kB min:2768kB low:3460kB high:4152kB active_anon:168600kB inactive_anon:182564kB active_file:6976kB inactive_file:10092kB unevictable:16kB present:495008kB pages_scanned:572756 all_unreclaimable? yes
[ 889.958441] lowmem_reserve[]: 0 0 0 0
[ 889.962251] Node 0 DMA: 19*4kB 2*8kB 4*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 889.973218] Node 0 DMA32: 85*4kB 8*8kB 16*16kB 15*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2676kB
[ 889.984510] 63470 total pagecache pages
[ 889.988363] 7128 pages in swap cache
[ 889.991956] Swap cache stats: add 181169, delete 174041, find 15337/60473
[ 889.998764] Free swap = 795628kB
[ 890.002089] Total swap = 1048568kB
[ 890.012112] 131072 pages RAM
[ 890.015034] 9628 pages reserved
[ 890.018197] 73633 pages shared
[ 890.021274] 58191 pages non-shared
[ 890.024686] Out of memory: kill process 3314 (gnome-session) score 870770 or a child
[ 890.032441] Killed process 3363 (firefox-bin)

2009-06-10 06:41:32

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Tue, 9 Jun 2009 21:37:02 +0200
Johannes Weiner <[email protected]> wrote:

> On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > [resend with lists cc'd, sorry]
>
> [and fixed Hugh's email. crap]
>
> > Hi,
> >
> > here is a new iteration of the virtual swap readahead. Per Hugh's
> > suggestion, I moved the pte collecting to the callsite and thus out
> > ouf swap code. Unfortunately, I had to bound page_cluster due to an
> > array of that many swap entries on the stack, but I think it is better
> > to limit the cluster size to a sane maximum than using dynamic
> > allocation for this purpose.
> >
> > Thanks all for the helpful suggestions. KAMEZAWA-san and Minchan, I
> > didn't incorporate your ideas in this patch as I think they belong in
> > a different one with their own justifications. I didn't ignore them.
> >
> > Hannes
> >
> > ---
> > The current swap readahead implementation reads a physically
> > contiguous group of swap slots around the faulting page to take
> > advantage of the disk head's position and in the hope that the
> > surrounding pages will be needed soon as well.
> >
> > This works as long as the physical swap slot order approximates the
> > LRU order decently, otherwise it wastes memory and IO bandwidth to
> > read in pages that are unlikely to be needed soon.
> >
> > However, the physical swap slot layout diverges from the LRU order
> > with increasing swap activity, i.e. high memory pressure situations,
> > and this is exactly the situation where swapin should not waste any
> > memory or IO bandwidth as both are the most contended resources at
> > this point.
> >
> > Another approximation for LRU-relation is the VMA order as groups of
> > VMA-related pages are usually used together.
> >
> > This patch combines both the physical and the virtual hint to get a
> > good approximation of pages that are sensible to read ahead.
> >
> > When both diverge, we either read unrelated data, seek heavily for
> > related data, or, what this patch does, just decrease the readahead
> > efforts.
> >
> > To achieve this, we have essentially two readahead windows of the same
> > size: one spans the virtual, the other one the physical neighborhood
> > of the faulting page. We only read where both areas overlap.
> >
> > Signed-off-by: Johannes Weiner <[email protected]>
> > Reviewed-by: Rik van Riel <[email protected]>
> > Cc: Hugh Dickins <[email protected]>
> > Cc: Andi Kleen <[email protected]>
> > Cc: Wu Fengguang <[email protected]>
> > Cc: KAMEZAWA Hiroyuki <[email protected]>
> > Cc: Minchan Kim <[email protected]>
> > ---
> > include/linux/swap.h | 4 ++-
> > kernel/sysctl.c | 7 ++++-
> > mm/memory.c | 55 +++++++++++++++++++++++++++++++++++++++++
> > mm/shmem.c | 4 +--
> > mm/swap_state.c | 67 ++++++++++++++++++++++++++++++++++++++-------------
> > 5 files changed, 116 insertions(+), 21 deletions(-)
> >
> > version 3:
> > o move pte selection to callee (per Hugh)
> > o limit ra ptes to one pmd entry to avoid multiple
> > locking/mapping of highptes (per Hugh)
> >
> > version 2:
> > o fall back to physical ra window for shmem
> > o add documentation to the new ra algorithm (per Andrew)
> >
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -327,27 +327,14 @@ struct page *read_swap_cache_async(swp_e
> > return found_page;
> > }
> >
> > -/**
> > - * swapin_readahead - swap in pages in hope we need them soon
> > - * @entry: swap entry of this memory
> > - * @gfp_mask: memory allocation flags
> > - * @vma: user vma this address belongs to
> > - * @addr: target address for mempolicy
> > - *
> > - * Returns the struct page for entry and addr, after queueing swapin.
> > - *
> > +/*
> > * Primitive swap readahead code. We simply read an aligned block of
> > * (1 << page_cluster) entries in the swap area. This method is chosen
> > * because it doesn't cost us any seek time. We also make sure to queue
> > * the 'original' request together with the readahead ones...
> > - *
> > - * This has been extended to use the NUMA policies from the mm triggering
> > - * the readahead.
> > - *
> > - * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> > */
> > -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > - struct vm_area_struct *vma, unsigned long addr)
> > +static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
> > + struct vm_area_struct *vma, unsigned long addr)
> > {
> > int nr_pages;
> > struct page *page;
> > @@ -373,3 +360,51 @@ struct page *swapin_readahead(swp_entry_
> > lru_add_drain(); /* Push any new pages onto the LRU now */
> > return read_swap_cache_async(entry, gfp_mask, vma, addr);
> > }
> > +
> > +/**
> > + * swapin_readahead - swap in pages in hope we need them soon
> > + * @entry: swap entry of this memory
> > + * @gfp_mask: memory allocation flags
> > + * @vma: user vma this address belongs to
> > + * @addr: target address for mempolicy
> > + * @entries: swap slots to consider reading
> > + * @nr_entries: number of @entries
> > + * @cluster: readahead window size in swap slots
> > + *
> > + * Returns the struct page for entry and addr, after queueing swapin.
> > + *
> > + * This has been extended to use the NUMA policies from the mm
> > + * triggering the readahead.
> > + *
> > + * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
> > + */
> > +struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + swp_entry_t *entries, int nr_entries,
> > + unsigned long cluster)
> > +{
> > + unsigned long pmin, pmax;
> > + int i;
> > +
> > + if (!entries) /* XXX: shmem case */
> > + return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> > + pmin = swp_offset(entry) & ~(cluster - 1);
> > + pmax = pmin + cluster;
> > + for (i = 0; i < nr_entries; i++) {
> > + swp_entry_t swp = entries[i];
> > + struct page *page;
> > +
> > + if (swp_type(swp) != swp_type(entry))
> > + continue;
> > + if (swp_offset(swp) > pmax)
> > + continue;
> > + if (swp_offset(swp) < pmin)
> > + continue;
> > + page = read_swap_cache_async(swp, gfp_mask, vma, addr);
> > + if (!page)
> > + break;
> > + page_cache_release(page);
> > + }
> > + lru_add_drain(); /* Push any new pages onto the LRU now */
> > + return read_swap_cache_async(entry, gfp_mask, vma, addr);
> > +}
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -292,7 +292,9 @@ extern struct page *lookup_swap_cache(sw
> > extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
> > struct vm_area_struct *vma, unsigned long addr);
> > extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> > - struct vm_area_struct *vma, unsigned long addr);
> > + struct vm_area_struct *vma, unsigned long addr,
> > + swp_entry_t *entries, int nr_entries,
> > + unsigned long cluster);
> >
> > /* linux/mm/swapfile.c */
> > extern long nr_swap_pages;
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2440,6 +2440,54 @@ int vmtruncate_range(struct inode *inode
> > }
> >
> > /*
> > + * The readahead window is the virtual area around the faulting page,
> > + * where the physical proximity of the swap slots is taken into
> > + * account as well in swapin_readahead().
> > + *
> > + * While the swap allocation algorithm tries to keep LRU-related pages
> > + * together on the swap backing, it is not reliable on heavy thrashing
> > + * systems where concurrent reclaimers allocate swap slots and/or most
> > + * anonymous memory pages are already in swap cache.
> > + *
> > + * On the virtual side, subgroups of VMA-related pages are usually
> > + * used together, which gives another hint to LRU relationship.
> > + *
> > + * By taking both aspects into account, we get a good approximation of
> > + * which pages are sensible to read together with the faulting one.
> > + */
> > +static int swap_readahead_ptes(struct mm_struct *mm,
> > + unsigned long addr, pmd_t *pmd,
> > + swp_entry_t *entries,
> > + unsigned long cluster)
> > +{
> > + unsigned long window, min, max, limit;
> > + spinlock_t *ptl;
> > + pte_t *ptep;
> > + int i, nr;
> > +
> > + window = cluster << PAGE_SHIFT;
> > + min = addr & ~(window - 1);
> > + max = min + cluster;

Hmm, max = min + window ?

Thanks,
-Kame

> > + /*
> > + * To keep the locking/highpte mapping simple, stay
> > + * within the PTE range of one PMD entry.
> > + */
> > + limit = addr & PMD_MASK;
> > + if (limit > min)
> > + min = limit;
> > + limit = pmd_addr_end(addr, max);
> > + if (limit < max)
> > + max = limit;
> > + limit = max - min;
> > + ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> > + for (i = nr = 0; i < limit; i++)
> > + if (is_swap_pte(ptep[i]))
> > + entries[nr++] = pte_to_swp_entry(ptep[i]);
> > + pte_unmap_unlock(ptep, ptl);
> > + return nr;
> > +}
> > +
> > +/*
> > * We enter with non-exclusive mmap_sem (to exclude vma changes,
> > * but allow concurrent faults), and pte mapped but not yet locked.
> > * We return with mmap_sem still held, but pte unmapped and unlocked.
> > @@ -2466,9 +2514,14 @@ static int do_swap_page(struct mm_struct
> > delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> > page = lookup_swap_cache(entry);
> > if (!page) {
> > + int nr, cluster = 1 << page_cluster;
> > + swp_entry_t entries[cluster];
> > +
> > grab_swap_token(); /* Contend for token _before_ read-in */
> > + nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
> > page = swapin_readahead(entry,
> > - GFP_HIGHUSER_MOVABLE, vma, address);
> > + GFP_HIGHUSER_MOVABLE, vma, address,
> > + entries, nr, cluster);
> > if (!page) {
> > /*
> > * Back out if somebody else faulted in this pte
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1148,7 +1148,7 @@ static struct page *shmem_swapin(swp_ent
> > pvma.vm_pgoff = idx;
> > pvma.vm_ops = NULL;
> > pvma.vm_policy = spol;
> > - page = swapin_readahead(entry, gfp, &pvma, 0);
> > + page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
> > return page;
> > }
> >
> > @@ -1178,7 +1178,7 @@ static inline void shmem_show_mpol(struc
> > static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
> > struct shmem_inode_info *info, unsigned long idx)
> > {
> > - return swapin_readahead(entry, gfp, NULL, 0);
> > + return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
> > }
> >
> > static inline struct page *shmem_alloc_page(gfp_t gfp,
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -112,6 +112,8 @@ static int min_percpu_pagelist_fract = 8
> >
> > static int ngroups_max = NGROUPS_MAX;
> >
> > +static int page_cluster_max = 5;
> > +
> > #ifdef CONFIG_MODULES
> > extern char modprobe_path[];
> > #endif
> > @@ -966,7 +968,10 @@ static struct ctl_table vm_table[] = {
> > .data = &page_cluster,
> > .maxlen = sizeof(int),
> > .mode = 0644,
> > - .proc_handler = &proc_dointvec,
> > + .proc_handler = &proc_dointvec_minmax,
> > + .strategy = &sysctl_intvec,
> > + .extra1 = &zero,
> > + .extra2 = &page_cluster_max,
> > },
> > {
> > .ctl_name = VM_DIRTY_BACKGROUND,
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to [email protected]. For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

2009-06-10 07:47:39

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

Hi Fengguang,

On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > [resend with lists cc'd, sorry]
> >
> > [and fixed Hugh's email. crap]
> >
> > > Hi,
> > >
> > > here is a new iteration of the virtual swap readahead. Per Hugh's
> > > suggestion, I moved the pte collecting to the callsite and thus out
> > > ouf swap code. Unfortunately, I had to bound page_cluster due to an
> > > array of that many swap entries on the stack, but I think it is better
> > > to limit the cluster size to a sane maximum than using dynamic
> > > allocation for this purpose.
>
> Hi Johannes,
>
> When stress testing your patch, I found it triggered many OOM kills.
> Around the time of last OOMs, the memory usage is:
>
> total used free shared buffers cached
> Mem: 474 468 5 0 0 239
> -/+ buffers/cache: 229 244
> Swap: 1023 221 802

Wow, that really confused me for a second as we shouldn't read more
pages ahead than without the patch, probably even less under stress.

So the problem has to be a runaway reading. And indeed, severe
stupidity here:

+ window = cluster << PAGE_SHIFT;
+ min = addr & ~(window - 1);
+ max = min + cluster;
+ /*
+ * To keep the locking/highpte mapping simple, stay
+ * within the PTE range of one PMD entry.
+ */
+ limit = addr & PMD_MASK;
+ if (limit > min)
+ min = limit;
+ limit = pmd_addr_end(addr, max);
+ if (limit < max)
+ max = limit;
+ limit = max - min;

The mistake is at the initial calculation of max. It should be

max = min + window;

The resulting problem is that min could get bigger than max when
cluster is bigger than PMD_SHIFT. Did you use page_cluster == 5?

The initial min is aligned to a value below the PMD boundary and max
based on it with a too small offset, staying below the PMD boundary as
well. When min is rounded up, this becomes a bit large:

limit = max - min;

So if my brain is already functioning, fixing the initial max should
be enough because either

o window is smaller than PMD_SIZE, than we won't round down
below a PMD boundary in the first place or

o window is bigger than PMD_SIZE, than we can round down below
a PMD boundary but adding window to that is garuanteed to
cross the boundary again

and thus max is always bigger than min.

Fengguang, does this make sense? If so, the patch below should fix
it.

Thank you,

Hannes

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm

window = cluster << PAGE_SHIFT;
min = addr & ~(window - 1);
- max = min + cluster;
+ max = min + window;
/*
* To keep the locking/highpte mapping simple, stay
* within the PTE range of one PMD entry.

2009-06-10 08:12:10

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> Hi Fengguang,
>
> On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > [resend with lists cc'd, sorry]
> > >
> > > [and fixed Hugh's email. crap]
> > >
> > > > Hi,
> > > >
> > > > here is a new iteration of the virtual swap readahead. Per Hugh's
> > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > ouf swap code. Unfortunately, I had to bound page_cluster due to an
> > > > array of that many swap entries on the stack, but I think it is better
> > > > to limit the cluster size to a sane maximum than using dynamic
> > > > allocation for this purpose.
> >
> > Hi Johannes,
> >
> > When stress testing your patch, I found it triggered many OOM kills.
> > Around the time of last OOMs, the memory usage is:
> >
> > total used free shared buffers cached
> > Mem: 474 468 5 0 0 239
> > -/+ buffers/cache: 229 244
> > Swap: 1023 221 802
>
> Wow, that really confused me for a second as we shouldn't read more
> pages ahead than without the patch, probably even less under stress.

Yup - swap readahead is much more challenging than sequential readahead,
in that it must be accurate enough given some really obscure patterns.

> So the problem has to be a runaway reading. And indeed, severe
> stupidity here:
>
> + window = cluster << PAGE_SHIFT;
> + min = addr & ~(window - 1);
> + max = min + cluster;
> + /*
> + * To keep the locking/highpte mapping simple, stay
> + * within the PTE range of one PMD entry.
> + */
> + limit = addr & PMD_MASK;
> + if (limit > min)
> + min = limit;
> + limit = pmd_addr_end(addr, max);
> + if (limit < max)
> + max = limit;
> + limit = max - min;
>
> The mistake is at the initial calculation of max. It should be
>
> max = min + window;
>
> The resulting problem is that min could get bigger than max when
> cluster is bigger than PMD_SHIFT. Did you use page_cluster == 5?

No I use the default 3.

btw, the mistake reflects bad named variables. How about rename
cluster => pages
window => bytes
?

> The initial min is aligned to a value below the PMD boundary and max
> based on it with a too small offset, staying below the PMD boundary as
> well. When min is rounded up, this becomes a bit large:
>
> limit = max - min;
>
> So if my brain is already functioning, fixing the initial max should
> be enough because either
>
> o window is smaller than PMD_SIZE, than we won't round down
> below a PMD boundary in the first place or
>
> o window is bigger than PMD_SIZE, than we can round down below
> a PMD boundary but adding window to that is garuanteed to
> cross the boundary again
>
> and thus max is always bigger than min.
>
> Fengguang, does this make sense? If so, the patch below should fix
> it.

Too bad, a quick test of the below patch freezes the box..

Thanks,
Fengguang

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm
>
> window = cluster << PAGE_SHIFT;
> min = addr & ~(window - 1);
> - max = min + cluster;
> + max = min + window;
> /*
> * To keep the locking/highpte mapping simple, stay
> * within the PTE range of one PMD entry.

2009-06-10 08:34:40

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, 10 Jun 2009 16:11:32 +0800
Wu Fengguang <[email protected]> wrote:

> On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > Hi Fengguang,
> >
> > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > [resend with lists cc'd, sorry]
> > > >
> > > > [and fixed Hugh's email. crap]
> > > >
> > > > > Hi,
> > > > >
> > > > > here is a new iteration of the virtual swap readahead. Per Hugh's
> > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > ouf swap code. Unfortunately, I had to bound page_cluster due to an
> > > > > array of that many swap entries on the stack, but I think it is better
> > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > allocation for this purpose.
> > >
> > > Hi Johannes,
> > >
> > > When stress testing your patch, I found it triggered many OOM kills.
> > > Around the time of last OOMs, the memory usage is:
> > >
> > > total used free shared buffers cached
> > > Mem: 474 468 5 0 0 239
> > > -/+ buffers/cache: 229 244
> > > Swap: 1023 221 802
> >
> > Wow, that really confused me for a second as we shouldn't read more
> > pages ahead than without the patch, probably even less under stress.
>
> Yup - swap readahead is much more challenging than sequential readahead,
> in that it must be accurate enough given some really obscure patterns.
>
> > So the problem has to be a runaway reading. And indeed, severe
> > stupidity here:
> >
> > + window = cluster << PAGE_SHIFT;
> > + min = addr & ~(window - 1);
> > + max = min + cluster;
> > + /*
> > + * To keep the locking/highpte mapping simple, stay
> > + * within the PTE range of one PMD entry.
> > + */
> > + limit = addr & PMD_MASK;
> > + if (limit > min)
> > + min = limit;
> > + limit = pmd_addr_end(addr, max);
> > + if (limit < max)
> > + max = limit;
> > + limit = max - min;
> >
> > The mistake is at the initial calculation of max. It should be
> >
> > max = min + window;
> >
> > The resulting problem is that min could get bigger than max when
> > cluster is bigger than PMD_SHIFT. Did you use page_cluster == 5?
>
> No I use the default 3.
>
> btw, the mistake reflects bad named variables. How about rename
> cluster => pages
> window => bytes
> ?
>
> > The initial min is aligned to a value below the PMD boundary and max
> > based on it with a too small offset, staying below the PMD boundary as
> > well. When min is rounded up, this becomes a bit large:
> >
> > limit = max - min;
> >
> > So if my brain is already functioning, fixing the initial max should
> > be enough because either
> >
> > o window is smaller than PMD_SIZE, than we won't round down
> > below a PMD boundary in the first place or
> >
> > o window is bigger than PMD_SIZE, than we can round down below
> > a PMD boundary but adding window to that is garuanteed to
> > cross the boundary again
> >
> > and thus max is always bigger than min.
> >
> > Fengguang, does this make sense? If so, the patch below should fix
> > it.
>
> Too bad, a quick test of the below patch freezes the box..
>

+ window = cluster << PAGE_SHIFT;
+ min = addr & ~(window - 1);
+ max = min + cluster;

max = min + window; # this is fixed. then,

+ /*
+ * To keep the locking/highpte mapping simple, stay
+ * within the PTE range of one PMD entry.
+ */
+ limit = addr & PMD_MASK;
+ if (limit > min)
+ min = limit;
+ limit = pmd_addr_end(addr, max);
+ if (limit < max)
+ max = limit;
+ limit = max - min;

limit = (max - min) >> PAGE_SHIFT;

+ ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
+ for (i = nr = 0; i < limit; i++)
+ if (is_swap_pte(ptep[i]))
+ entries[nr++] = pte_to_swp_entry(ptep[i]);
+ pte_unmap_unlock(ptep, ptl);

Cheer!,
-Kame

2009-06-10 08:56:49

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, Jun 10, 2009 at 04:32:49PM +0800, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Jun 2009 16:11:32 +0800
> Wu Fengguang <[email protected]> wrote:
>
> > On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > > Hi Fengguang,
> > >
> > > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > > [resend with lists cc'd, sorry]
> > > > >
> > > > > [and fixed Hugh's email. crap]
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > here is a new iteration of the virtual swap readahead. Per Hugh's
> > > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > > ouf swap code. Unfortunately, I had to bound page_cluster due to an
> > > > > > array of that many swap entries on the stack, but I think it is better
> > > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > > allocation for this purpose.
> > > >
> > > > Hi Johannes,
> > > >
> > > > When stress testing your patch, I found it triggered many OOM kills.
> > > > Around the time of last OOMs, the memory usage is:
> > > >
> > > > total used free shared buffers cached
> > > > Mem: 474 468 5 0 0 239
> > > > -/+ buffers/cache: 229 244
> > > > Swap: 1023 221 802
> > >
> > > Wow, that really confused me for a second as we shouldn't read more
> > > pages ahead than without the patch, probably even less under stress.
> >
> > Yup - swap readahead is much more challenging than sequential readahead,
> > in that it must be accurate enough given some really obscure patterns.
> >
> > > So the problem has to be a runaway reading. And indeed, severe
> > > stupidity here:
> > >
> > > + window = cluster << PAGE_SHIFT;
> > > + min = addr & ~(window - 1);
> > > + max = min + cluster;
> > > + /*
> > > + * To keep the locking/highpte mapping simple, stay
> > > + * within the PTE range of one PMD entry.
> > > + */
> > > + limit = addr & PMD_MASK;
> > > + if (limit > min)
> > > + min = limit;
> > > + limit = pmd_addr_end(addr, max);
> > > + if (limit < max)
> > > + max = limit;
> > > + limit = max - min;
> > >
> > > The mistake is at the initial calculation of max. It should be
> > >
> > > max = min + window;
> > >
> > > The resulting problem is that min could get bigger than max when
> > > cluster is bigger than PMD_SHIFT. Did you use page_cluster == 5?
> >
> > No I use the default 3.
> >
> > btw, the mistake reflects bad named variables. How about rename
> > cluster => pages
> > window => bytes
> > ?
> >
> > > The initial min is aligned to a value below the PMD boundary and max
> > > based on it with a too small offset, staying below the PMD boundary as
> > > well. When min is rounded up, this becomes a bit large:
> > >
> > > limit = max - min;
> > >
> > > So if my brain is already functioning, fixing the initial max should
> > > be enough because either
> > >
> > > o window is smaller than PMD_SIZE, than we won't round down
> > > below a PMD boundary in the first place or
> > >
> > > o window is bigger than PMD_SIZE, than we can round down below
> > > a PMD boundary but adding window to that is garuanteed to
> > > cross the boundary again
> > >
> > > and thus max is always bigger than min.
> > >
> > > Fengguang, does this make sense? If so, the patch below should fix
> > > it.
> >
> > Too bad, a quick test of the below patch freezes the box..
> >
>
> + window = cluster << PAGE_SHIFT;
> + min = addr & ~(window - 1);
> + max = min + cluster;
>
> max = min + window; # this is fixed. then,
>
> + /*
> + * To keep the locking/highpte mapping simple, stay
> + * within the PTE range of one PMD entry.
> + */
> + limit = addr & PMD_MASK;
> + if (limit > min)
> + min = limit;
> + limit = pmd_addr_end(addr, max);
> + if (limit < max)
> + max = limit;
> + limit = max - min;
>
> limit = (max - min) >> PAGE_SHIFT;
>
> + ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
> + for (i = nr = 0; i < limit; i++)
> + if (is_swap_pte(ptep[i]))
> + entries[nr++] = pte_to_swp_entry(ptep[i]);
> + pte_unmap_unlock(ptep, ptl);

Yes it worked! But then I run into page allocation failures:

[ 340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
[ 340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[ 340.651839] Call Trace:
[ 340.654289] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[ 340.660645] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[ 340.666472] [<ffffffff810f8608>] __kmalloc+0x198/0x250
[ 340.671786] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 340.678746] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 340.685527] [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[ 340.691356] [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[ 340.696771] [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[ 340.702518] [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[ 340.709301] [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[ 340.714529] [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[ 340.719578] [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[ 340.724969] [<ffffffff8106b236>] ? up_read+0x26/0x30
[ 340.730024] [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 340.736375] [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[ 340.741430] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 340.747434] Mem-Info:
[ 340.749730] Node 0 DMA per-cpu:
[ 340.752896] CPU 0: hi: 0, btch: 1 usd: 0
[ 340.757679] CPU 1: hi: 0, btch: 1 usd: 0
[ 340.762462] Node 0 DMA32 per-cpu:
[ 340.765797] CPU 0: hi: 186, btch: 31 usd: 161
[ 340.770582] CPU 1: hi: 186, btch: 31 usd: 0
[ 340.775367] Active_anon:38344 active_file:6556 inactive_anon:41644
[ 340.775368] inactive_file:4210 unevictable:4 dirty:1 writeback:10 unstable:1
[ 340.775370] free:3136 slab:15738 mapped:8023 pagetables:6294 bounce:0
[ 340.795166] Node 0 DMA free:2024kB min:84kB low:104kB high:124kB active_anon:5296kB inactive_anon:5772kB active_file:644kB inactive_file:612kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[ 340.814007] lowmem_reserve[]: 0 483 483 483
[ 340.818277] Node 0 DMA32 free:10520kB min:2768kB low:3460kB high:4152kB active_anon:148080kB inactive_anon:160804kB active_file:25580kB inactive_file:16228kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[ 340.838594] lowmem_reserve[]: 0 0 0 0
[ 340.842338] Node 0 DMA: 87*4kB 14*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2028kB
[ 340.853398] Node 0 DMA32: 2288*4kB 24*8kB 4*16kB 2*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 10560kB
[ 340.864874] 59895 total pagecache pages
[ 340.868720] 4176 pages in swap cache
[ 340.872315] Swap cache stats: add 99021, delete 94845, find 8313/23463
[ 340.878847] Free swap = 780376kB
[ 340.882178] Total swap = 1048568kB
[ 340.889619] 131072 pages RAM
[ 340.892527] 9628 pages reserved
[ 340.895677] 126767 pages shared
[ 340.898836] 60472 pages non-shared
[ 341.026977] Xorg: page allocation failure. order:4, mode:0x40d0
[ 341.032900] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[ 341.038989] Call Trace:
[ 341.041451] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[ 341.047801] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[ 341.053628] [<ffffffff810f7840>] __remote_slab_alloc_node+0xc0/0x130
[ 341.060073] [<ffffffff810f78e5>] __remote_slab_alloc+0x35/0xc0
[ 341.065983] [<ffffffff810f76e4>] ? __slab_alloc_page+0x314/0x3b0
[ 341.072070] [<ffffffff810f8528>] __kmalloc+0xb8/0x250
[ 341.077220] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 341.084184] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 341.090963] [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[ 341.096791] [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[ 341.102197] [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[ 341.107948] [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[ 341.114726] [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[ 341.119948] [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[ 341.124996] [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[ 341.130389] [<ffffffff8106b236>] ? up_read+0x26/0x30
[ 341.135436] [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 341.141787] [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[ 341.146848] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 341.152855] Mem-Info:
[ 341.155124] Node 0 DMA per-cpu:
[ 341.158289] CPU 0: hi: 0, btch: 1 usd: 0
[ 341.163074] CPU 1: hi: 0, btch: 1 usd: 0
[ 341.167878] Node 0 DMA32 per-cpu:
[ 341.171212] CPU 0: hi: 186, btch: 31 usd: 72
[ 341.176009] CPU 1: hi: 186, btch: 31 usd: 0
[ 341.180794] Active_anon:38344 active_file:6605 inactive_anon:41579
[ 341.180795] inactive_file:4180 unevictable:4 dirty:0 writeback:0 unstable:1
[ 341.180797] free:3147 slab:15867 mapped:8021 pagetables:6295 bounce:0
[ 341.200505] Node 0 DMA free:2028kB min:84kB low:104kB high:124kB active_anon:5284kB inactive_anon:5784kB active_file:644kB inactive_file:612kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[ 341.219339] lowmem_reserve[]: 0 483 483 483
[ 341.223605] Node 0 DMA32 free:10560kB min:2768kB low:3460kB high:4152kB active_anon:148092kB inactive_anon:160532kB active_file:25776kB inactive_file:16108kB unevictable:16kB present:495008kB pages_scanned:618 all_unreclaimable? no
[ 341.244093] lowmem_reserve[]: 0 0 0 0
[ 341.247851] Node 0 DMA: 87*4kB 14*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2028kB
[ 341.258769] Node 0 DMA32: 2296*4kB 18*8kB 5*16kB 2*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 10560kB
[ 341.270121] 59860 total pagecache pages
[ 341.273957] 4142 pages in swap cache
[ 341.277531] Swap cache stats: add 99071, delete 94929, find 8313/23465
[ 341.284052] Free swap = 780184kB
[ 341.287357] Total swap = 1048568kB
[ 341.294497] 131072 pages RAM
[ 341.297396] 9628 pages reserved
[ 341.300538] 126655 pages shared
[ 341.303674] 60501 pages non-shared
[ 357.833157] Xorg: page allocation failure. order:4, mode:0x40d0
[ 357.839105] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[ 357.845243] Call Trace:
[ 357.847737] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[ 357.854108] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[ 357.859965] [<ffffffff810f8608>] __kmalloc+0x198/0x250
[ 357.865263] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 357.872245] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 357.879029] [<ffffffff810ea8bb>] ? swap_info_get+0x6b/0xf0
[ 357.884626] [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[ 357.890396] [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[ 357.897190] [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[ 357.902412] [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[ 357.907460] [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[ 357.912873] [<ffffffff8106b236>] ? up_read+0x26/0x30
[ 357.917923] [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 357.924289] [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[ 357.929347] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 357.935350] Mem-Info:
[ 357.937630] Node 0 DMA per-cpu:
[ 357.940801] CPU 0: hi: 0, btch: 1 usd: 0
[ 357.945590] CPU 1: hi: 0, btch: 1 usd: 0
[ 357.950379] Node 0 DMA32 per-cpu:
[ 357.953728] CPU 0: hi: 186, btch: 31 usd: 159
[ 357.958513] CPU 1: hi: 186, btch: 31 usd: 0
[ 357.963300] Active_anon:38863 active_file:6095 inactive_anon:41764
[ 357.963301] inactive_file:4777 unevictable:4 dirty:0 writeback:18 unstable:0
[ 357.963302] free:2317 slab:15674 mapped:8121 pagetables:6408 bounce:0
[ 357.983105] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5268kB inactive_anon:5768kB active_file:644kB inactive_file:632kB unevictable:0kB present:15164kB pages_scanned:65 all_unreclaimable? no
[ 358.002033] lowmem_reserve[]: 0 483 483 483
[ 358.006331] Node 0 DMA32 free:7380kB min:2768kB low:3460kB high:4152kB active_anon:150124kB inactive_anon:161368kB active_file:23736kB inactive_file:18404kB unevictable:16kB present:495008kB pages_scanned:32 all_unreclaimable? no
[ 358.026802] lowmem_reserve[]: 0 0 0 0
[ 358.030561] Node 0 DMA: 81*4kB 11*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 358.041571] Node 0 DMA32: 1534*4kB 29*8kB 3*16kB 4*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7504kB
[ 358.052856] 60223 total pagecache pages
[ 358.056690] 4367 pages in swap cache
[ 358.060265] Swap cache stats: add 105056, delete 100689, find 9043/26609
[ 358.066954] Free swap = 774800kB
[ 358.070268] Total swap = 1048568kB
[ 358.077041] 131072 pages RAM
[ 358.079954] 9628 pages reserved
[ 358.083094] 128803 pages shared
[ 358.086237] 61031 pages non-shared
[ 507.741934] Xorg: page allocation failure. order:4, mode:0x40d0
[ 507.748019] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[ 507.754182] Call Trace:
[ 507.756636] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[ 507.762988] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[ 507.768812] [<ffffffff810f8608>] __kmalloc+0x198/0x250
[ 507.774048] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 507.781010] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 507.787798] [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[ 507.793636] [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[ 507.799043] [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[ 507.804788] [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[ 507.811572] [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[ 507.816788] [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[ 507.821847] [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[ 507.827244] [<ffffffff8106b236>] ? up_read+0x26/0x30
[ 507.832291] [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 507.838642] [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[ 507.843696] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 507.849699] Mem-Info:
[ 507.851973] Node 0 DMA per-cpu:
[ 507.855130] CPU 0: hi: 0, btch: 1 usd: 0
[ 507.859916] CPU 1: hi: 0, btch: 1 usd: 0
[ 507.864700] Node 0 DMA32 per-cpu:
[ 507.868036] CPU 0: hi: 186, btch: 31 usd: 0
[ 507.872819] CPU 1: hi: 186, btch: 31 usd: 30
[ 507.876816] Active_anon:34956 active_file:5472 inactive_anon:45220
[ 507.876816] inactive_file:6158 unevictable:4 dirty:13 writeback:2 unstable:0
[ 507.876816] free:1726 slab:15603 mapped:7450 pagetables:6818 bounce:0
[ 507.897413] Node 0 DMA free:2044kB min:84kB low:104kB high:124kB active_anon:5060kB inactive_anon:6028kB active_file:644kB inactive_file:624kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[ 507.916249] lowmem_reserve[]: 0 483 483 483
[ 507.920598] Node 0 DMA32 free:4488kB min:2768kB low:3460kB high:4152kB active_anon:134764kB inactive_anon:174852kB active_file:21244kB inactive_file:24008kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[ 507.940856] lowmem_reserve[]: 0 0 0 0
[ 507.944849] Node 0 DMA: 51*4kB 14*8kB 0*16kB 4*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2044kB
[ 507.955772] Node 0 DMA32: 888*4kB 1*8kB 0*16kB 3*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4488kB
[ 507.966871] 64772 total pagecache pages
[ 507.970702] 6574 pages in swap cache
[ 507.974276] Swap cache stats: add 161629, delete 155055, find 17122/59120
[ 507.981051] Free swap = 735792kB
[ 507.984361] Total swap = 1048568kB
[ 507.991453] 131072 pages RAM
[ 507.994364] 9628 pages reserved
[ 507.997503] 114413 pages shared
[ 508.000643] 59801 pages non-shared
[ 509.462416] NFS: Server wrote zero bytes, expected 756.
[ 580.369464] Xorg: page allocation failure. order:4, mode:0x40d0
[ 580.375400] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
[ 580.381522] Call Trace:
[ 580.384092] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
[ 580.390669] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
[ 580.396802] [<ffffffff810f8608>] __kmalloc+0x198/0x250
[ 580.402033] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 580.408992] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
[ 580.415775] [<ffffffff81079ead>] ? trace_hardirqs_on+0xd/0x10
[ 580.421607] [<ffffffff81542b49>] ? mutex_unlock+0x9/0x10
[ 580.427033] [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[ 580.432804] [<ffffffffa014be20>] ? i915_gem_execbuffer+0x0/0x11e0 [i915]
[ 580.439600] [<ffffffff81271f1a>] ? __up_read+0x2a/0xb0
[ 580.444824] [<ffffffff8110ba8d>] vfs_ioctl+0x7d/0xa0
[ 580.449889] [<ffffffff8110bb3a>] do_vfs_ioctl+0x8a/0x580
[ 580.455287] [<ffffffff8106b236>] ? up_read+0x26/0x30
[ 580.460353] [<ffffffff81544b04>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 580.466702] [<ffffffff8110c07a>] sys_ioctl+0x4a/0x80
[ 580.471751] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 580.477753] Mem-Info:
[ 580.480020] Node 0 DMA per-cpu:
[ 580.483189] CPU 0: hi: 0, btch: 1 usd: 0
[ 580.487977] CPU 1: hi: 0, btch: 1 usd: 0
[ 580.492767] Node 0 DMA32 per-cpu:
[ 580.496095] CPU 0: hi: 186, btch: 31 usd: 90
[ 580.500892] CPU 1: hi: 186, btch: 31 usd: 1
[ 580.505679] Active_anon:34315 active_file:5739 inactive_anon:45597
[ 580.505681] inactive_file:5830 unevictable:4 dirty:2 writeback:0 unstable:1
[ 580.505682] free:3781 slab:13422 mapped:6830 pagetables:7180 bounce:0
[ 580.525398] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5024kB inactive_anon:6012kB active_file:640kB inactive_file:608kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[ 580.544234] lowmem_reserve[]: 0 483 483 483
[ 580.548504] Node 0 DMA32 free:13108kB min:2768kB low:3460kB high:4152kB active_anon:132236kB inactive_anon:176376kB active_file:22316kB inactive_file:22712kB unevictable:16kB present:495008kB pages_scanned:417 all_unreclaimable? no
[ 580.568992] lowmem_reserve[]: 0 0 0 0
[ 580.572741] Node 0 DMA: 56*4kB 22*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[ 580.583661] Node 0 DMA32: 2995*4kB 23*8kB 1*16kB 1*32kB 4*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13108kB
[ 580.595010] 64782 total pagecache pages
[ 580.598845] 6586 pages in swap cache
[ 580.602421] Swap cache stats: add 185372, delete 178786, find 19755/72917
[ 580.609205] Free swap = 722720kB
[ 580.612513] Total swap = 1048568kB
[ 580.619688] 131072 pages RAM
[ 580.622586] 9628 pages reserved
[ 580.625726] 112220 pages shared
[ 580.628868] 58034 pages non-shared

Thanks,
Fengguang

2009-06-10 09:32:41

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, Jun 10, 2009 at 05:32:49PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 10 Jun 2009 16:11:32 +0800
> Wu Fengguang <[email protected]> wrote:
>
> > On Wed, Jun 10, 2009 at 03:45:08PM +0800, Johannes Weiner wrote:
> > > Hi Fengguang,
> > >
> > > On Wed, Jun 10, 2009 at 01:03:42PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 03:37:02AM +0800, Johannes Weiner wrote:
> > > > > On Tue, Jun 09, 2009 at 09:01:28PM +0200, Johannes Weiner wrote:
> > > > > > [resend with lists cc'd, sorry]
> > > > >
> > > > > [and fixed Hugh's email. crap]
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > here is a new iteration of the virtual swap readahead. Per Hugh's
> > > > > > suggestion, I moved the pte collecting to the callsite and thus out
> > > > > > ouf swap code. Unfortunately, I had to bound page_cluster due to an
> > > > > > array of that many swap entries on the stack, but I think it is better
> > > > > > to limit the cluster size to a sane maximum than using dynamic
> > > > > > allocation for this purpose.
> > > >
> > > > Hi Johannes,
> > > >
> > > > When stress testing your patch, I found it triggered many OOM kills.
> > > > Around the time of last OOMs, the memory usage is:
> > > >
> > > > total used free shared buffers cached
> > > > Mem: 474 468 5 0 0 239
> > > > -/+ buffers/cache: 229 244
> > > > Swap: 1023 221 802
> > >
> > > Wow, that really confused me for a second as we shouldn't read more
> > > pages ahead than without the patch, probably even less under stress.
> >
> > Yup - swap readahead is much more challenging than sequential readahead,
> > in that it must be accurate enough given some really obscure patterns.
> >
> > > So the problem has to be a runaway reading. And indeed, severe
> > > stupidity here:
> > >
> > > + window = cluster << PAGE_SHIFT;
> > > + min = addr & ~(window - 1);
> > > + max = min + cluster;
> > > + /*
> > > + * To keep the locking/highpte mapping simple, stay
> > > + * within the PTE range of one PMD entry.
> > > + */
> > > + limit = addr & PMD_MASK;
> > > + if (limit > min)
> > > + min = limit;
> > > + limit = pmd_addr_end(addr, max);
> > > + if (limit < max)
> > > + max = limit;
> > > + limit = max - min;
> > >
> > > The mistake is at the initial calculation of max. It should be
> > >
> > > max = min + window;
> > >
> > > The resulting problem is that min could get bigger than max when
> > > cluster is bigger than PMD_SHIFT. Did you use page_cluster == 5?
> >
> > No I use the default 3.
> >
> > btw, the mistake reflects bad named variables. How about rename
> > cluster => pages
> > window => bytes
> > ?

Proven twice, fixed in v4.

> > > The initial min is aligned to a value below the PMD boundary and max
> > > based on it with a too small offset, staying below the PMD boundary as
> > > well. When min is rounded up, this becomes a bit large:
> > >
> > > limit = max - min;
> > >
> > > So if my brain is already functioning, fixing the initial max should
> > > be enough because either
> > >
> > > o window is smaller than PMD_SIZE, than we won't round down
> > > below a PMD boundary in the first place or
> > >
> > > o window is bigger than PMD_SIZE, than we can round down below
> > > a PMD boundary but adding window to that is garuanteed to
> > > cross the boundary again
> > >
> > > and thus max is always bigger than min.
> > >
> > > Fengguang, does this make sense? If so, the patch below should fix
> > > it.
> >
> > Too bad, a quick test of the below patch freezes the box..
> >
>
> + window = cluster << PAGE_SHIFT;
> + min = addr & ~(window - 1);
> + max = min + cluster;
>
> max = min + window; # this is fixed. then,
>
> + /*
> + * To keep the locking/highpte mapping simple, stay
> + * within the PTE range of one PMD entry.
> + */
> + limit = addr & PMD_MASK;
> + if (limit > min)
> + min = limit;
> + limit = pmd_addr_end(addr, max);
> + if (limit < max)
> + max = limit;
> + limit = max - min;
>
> limit = (max - min) >> PAGE_SHIFT;

Head -> desk.

Fixed in v4, thank you.

Hannes

2009-06-10 09:45:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
>
> Yes it worked! But then I run into page allocation failures:
>
> [ 340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> [ 340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> [ 340.651839] Call Trace:
> [ 340.654289] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> [ 340.660645] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> [ 340.666472] [<ffffffff810f8608>] __kmalloc+0x198/0x250
> [ 340.671786] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> [ 340.678746] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]

Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
allocs.

But order-4 allocs failing isn't really strange, but it might indicate
this patch fragments stuff sooner, although I've seen these particular
failues before.

2009-06-10 10:00:25

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> >
> > Yes it worked! But then I run into page allocation failures:
> >
> > [ 340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > [ 340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > [ 340.651839] Call Trace:
> > [ 340.654289] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > [ 340.660645] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > [ 340.666472] [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > [ 340.671786] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > [ 340.678746] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
>
> Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> allocs.
>
> But order-4 allocs failing isn't really strange, but it might indicate
> this patch fragments stuff sooner, although I've seen these particular
> failues before.

Thanks for the tip. Where is it? I'd like to try it out :)

Despite of the xorg failures, the test was able to complete with the
listed timing. The numbers are the time each program is able to start:

before after
0.02 0.01 N xeyes
0.76 0.68 N firefox
1.88 1.89 N nautilus
3.17 3.25 N nautilus --browser
4.89 4.98 N gthumb
6.47 6.79 N gedit
8.16 8.56 N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
12.55 12.61 N xterm
14.57 14.99 N mlterm
17.06 17.16 N gnome-terminal
18.90 19.60 N urxvt
23.48 24.26 N gnome-system-monitor
26.52 27.13 N gnome-help
29.65 30.29 N gnome-dictionary
36.12 36.93 N /usr/games/sol
39.27 39.21 N /usr/games/gnometris
42.56 43.61 N /usr/games/gnect
47.03 47.40 N /usr/games/gtali
52.05 51.41 N /usr/games/iagno
55.42 56.21 N /usr/games/gnotravex
61.47 60.58 N /usr/games/mahjongg
67.11 64.68 N /usr/games/gnome-sudoku
75.15 72.42 N /usr/games/glines
79.70 78.61 N /usr/games/glchess
88.48 87.01 N /usr/games/gnomine
96.51 95.03 N /usr/games/gnotski
102.19 100.50 N /usr/games/gnibbles
114.93 108.97 N /usr/games/gnobots2
125.02 120.09 N /usr/games/blackjack
135.11 134.39 N /usr/games/same-gnome
154.50 159.99 N /usr/bin/gnome-window-properties
162.09 176.04 N /usr/bin/gnome-default-applications-properties
173.29 197.12 N /usr/bin/gnome-at-properties
188.21 221.15 N /usr/bin/gnome-typing-monitor
199.93 249.38 N /usr/bin/gnome-at-visual
206.95 272.87 N /usr/bin/gnome-sound-properties
224.49 302.03 N /usr/bin/gnome-at-mobility
234.11 325.73 N /usr/bin/gnome-keybinding-properties
248.59 358.64 N /usr/bin/gnome-about-me
276.27 402.30 N /usr/bin/gnome-display-properties
304.39 439.35 N /usr/bin/gnome-network-preferences
342.01 482.78 N /usr/bin/gnome-mouse-properties
388.58 528.54 N /usr/bin/gnome-appearance-properties
508.47 653.12 N /usr/bin/gnome-control-center
587.57 769.65 N /usr/bin/gnome-keyboard-properties
758.16 1021.65 N : oocalc
830.03 1124.14 N : oodraw
900.03 1246.52 N : ooimpress
993.91 1370.35 N : oomath
1081.89 1478.34 N : ooweb
1161.99 1595.85 N : oowriter

It's slower with the patch. Maybe we shall give it another run with
the vmalloc patch.

Thanks,
Fengguang

2009-06-10 10:05:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > >
> > > Yes it worked! But then I run into page allocation failures:
> > >
> > > [ 340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > > [ 340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > > [ 340.651839] Call Trace:
> > > [ 340.654289] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > > [ 340.660645] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > [ 340.666472] [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > [ 340.671786] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > [ 340.678746] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> >
> > Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> > allocs.
> >
> > But order-4 allocs failing isn't really strange, but it might indicate
> > this patch fragments stuff sooner, although I've seen these particular
> > failues before.
>
> Thanks for the tip. Where is it? I'd like to try it out :)

commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
Author: Jesse Barnes <[email protected]>
Date: Fri May 8 16:13:25 2009 -0700

drm/i915: allocate large pointer arrays with vmalloc

2009-06-10 11:32:42

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > >
> > > > Yes it worked! But then I run into page allocation failures:
> > > >
> > > > [ 340.639803] Xorg: page allocation failure. order:4, mode:0x40d0
> > > > [ 340.645744] Pid: 3258, comm: Xorg Not tainted 2.6.30-rc8-mm1 #303
> > > > [ 340.651839] Call Trace:
> > > > [ 340.654289] [<ffffffff810c8204>] __alloc_pages_nodemask+0x344/0x6c0
> > > > [ 340.660645] [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > [ 340.666472] [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > [ 340.671786] [<ffffffffa014bf9f>] ? i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > [ 340.678746] [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > >
> > > Jesse Barnes had a patch to add a vmalloc fallback to those largish kms
> > > allocs.
> > >
> > > But order-4 allocs failing isn't really strange, but it might indicate
> > > this patch fragments stuff sooner, although I've seen these particular
> > > failues before.
> >
> > Thanks for the tip. Where is it? I'd like to try it out :)
>
> commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> Author: Jesse Barnes <[email protected]>
> Date: Fri May 8 16:13:25 2009 -0700
>
> drm/i915: allocate large pointer arrays with vmalloc

Thanks! It is already in the -mm tree, but it missed on conversion :)

I'll retry with this patch tomorrow.

Thanks,
Fengguang
---

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 39f5c65..7132dbe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
}

if (args->num_cliprects != 0) {
- cliprects = drm_calloc(args->num_cliprects, sizeof(*cliprects),
- DRM_MEM_DRIVER);
+ cliprects = drm_calloc_large(args->num_cliprects,
+ sizeof(*cliprects));
if (cliprects == NULL)
goto pre_mutex_err;

@@ -3474,8 +3474,7 @@ err:
pre_mutex_err:
drm_free_large(object_list);
drm_free_large(exec_list);
- drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
- DRM_MEM_DRIVER);
+ drm_free_large(cliprects);

return ret;
}

2009-06-10 17:25:42

by Jesse Barnes

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Wed, 10 Jun 2009 04:32:14 -0700
"Wu, Fengguang" <[email protected]> wrote:

> On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> > On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > >
> > > > > Yes it worked! But then I run into page allocation failures:
> > > > >
> > > > > [ 340.639803] Xorg: page allocation failure. order:4,
> > > > > mode:0x40d0 [ 340.645744] Pid: 3258, comm: Xorg Not tainted
> > > > > 2.6.30-rc8-mm1 #303 [ 340.651839] Call Trace:
> > > > > [ 340.654289] [<ffffffff810c8204>]
> > > > > __alloc_pages_nodemask+0x344/0x6c0 [ 340.660645]
> > > > > [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > > [ 340.666472] [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > > [ 340.671786] [<ffffffffa014bf9f>] ?
> > > > > i915_gem_execbuffer+0x17f/0x11e0 [i915] [ 340.678746]
> > > > > [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > >
> > > > Jesse Barnes had a patch to add a vmalloc fallback to those
> > > > largish kms allocs.
> > > >
> > > > But order-4 allocs failing isn't really strange, but it might
> > > > indicate this patch fragments stuff sooner, although I've seen
> > > > these particular failues before.
> > >
> > > Thanks for the tip. Where is it? I'd like to try it out :)
> >
> > commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> > Author: Jesse Barnes <[email protected]>
> > Date: Fri May 8 16:13:25 2009 -0700
> >
> > drm/i915: allocate large pointer arrays with vmalloc
>
> Thanks! It is already in the -mm tree, but it missed on conversion :)
>
> I'll retry with this patch tomorrow.
>
> Thanks,
> Fengguang
> ---
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> b/drivers/gpu/drm/i915/i915_gem.c index 39f5c65..7132dbe 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev,
> void *data, }
>
> if (args->num_cliprects != 0) {
> - cliprects = drm_calloc(args->num_cliprects,
> sizeof(*cliprects),
> - DRM_MEM_DRIVER);
> + cliprects = drm_calloc_large(args->num_cliprects,
> + sizeof(*cliprects));
> if (cliprects == NULL)
> goto pre_mutex_err;
>
> @@ -3474,8 +3474,7 @@ err:
> pre_mutex_err:
> drm_free_large(object_list);
> drm_free_large(exec_list);
> - drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
> - DRM_MEM_DRIVER);
> + drm_free_large(cliprects);
>
> return ret;
> }

Kristian posted a fix to my drm_calloc_large function as well; one of
the size checks in drm_calloc_large (the one which decides whether to
use kmalloc or vmalloc) was just checking size instead of size * num,
so you may be hitting that.

Jesse

2009-06-11 05:22:38

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 11, 2009 at 01:25:16AM +0800, Barnes, Jesse wrote:
> On Wed, 10 Jun 2009 04:32:14 -0700
> "Wu, Fengguang" <[email protected]> wrote:
>
> > On Wed, Jun 10, 2009 at 06:05:14PM +0800, Peter Zijlstra wrote:
> > > On Wed, 2009-06-10 at 17:59 +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 05:42:56PM +0800, Peter Zijlstra wrote:
> > > > > On Wed, 2009-06-10 at 16:56 +0800, Wu Fengguang wrote:
> > > > > >
> > > > > > Yes it worked! But then I run into page allocation failures:
> > > > > >
> > > > > > [ 340.639803] Xorg: page allocation failure. order:4,
> > > > > > mode:0x40d0 [ 340.645744] Pid: 3258, comm: Xorg Not tainted
> > > > > > 2.6.30-rc8-mm1 #303 [ 340.651839] Call Trace:
> > > > > > [ 340.654289] [<ffffffff810c8204>]
> > > > > > __alloc_pages_nodemask+0x344/0x6c0 [ 340.660645]
> > > > > > [<ffffffff810f7489>] __slab_alloc_page+0xb9/0x3b0
> > > > > > [ 340.666472] [<ffffffff810f8608>] __kmalloc+0x198/0x250
> > > > > > [ 340.671786] [<ffffffffa014bf9f>] ?
> > > > > > i915_gem_execbuffer+0x17f/0x11e0 [i915] [ 340.678746]
> > > > > > [<ffffffffa014bf9f>] i915_gem_execbuffer+0x17f/0x11e0 [i915]
> > > > >
> > > > > Jesse Barnes had a patch to add a vmalloc fallback to those
> > > > > largish kms allocs.
> > > > >
> > > > > But order-4 allocs failing isn't really strange, but it might
> > > > > indicate this patch fragments stuff sooner, although I've seen
> > > > > these particular failues before.
> > > >
> > > > Thanks for the tip. Where is it? I'd like to try it out :)
> > >
> > > commit 8e7d2b2c6ecd3c21a54b877eae3d5be48292e6b5
> > > Author: Jesse Barnes <[email protected]>
> > > Date: Fri May 8 16:13:25 2009 -0700
> > >
> > > drm/i915: allocate large pointer arrays with vmalloc
> >
> > Thanks! It is already in the -mm tree, but it missed on conversion :)
> >
> > I'll retry with this patch tomorrow.
> >
> > Thanks,
> > Fengguang
> > ---
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index 39f5c65..7132dbe 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3230,8 +3230,8 @@ i915_gem_execbuffer(struct drm_device *dev,
> > void *data, }
> >
> > if (args->num_cliprects != 0) {
> > - cliprects = drm_calloc(args->num_cliprects,
> > sizeof(*cliprects),
> > - DRM_MEM_DRIVER);
> > + cliprects = drm_calloc_large(args->num_cliprects,
> > + sizeof(*cliprects));
> > if (cliprects == NULL)
> > goto pre_mutex_err;
> >
> > @@ -3474,8 +3474,7 @@ err:
> > pre_mutex_err:
> > drm_free_large(object_list);
> > drm_free_large(exec_list);
> > - drm_free(cliprects, sizeof(*cliprects) * args->num_cliprects,
> > - DRM_MEM_DRIVER);
> > + drm_free_large(cliprects);
> >
> > return ret;
> > }
>
> Kristian posted a fix to my drm_calloc_large function as well; one of
> the size checks in drm_calloc_large (the one which decides whether to
> use kmalloc or vmalloc) was just checking size instead of size * num,
> so you may be hitting that.

Yes, it is.

Unfortunately, after fixing it up the swap readahead patch still performs slow
(even worse this time):

before after
0.02 0.01 N xeyes
0.76 0.89 N firefox
1.88 2.21 N nautilus
3.17 3.41 N nautilus --browser
4.89 5.20 N gthumb
6.47 7.02 N gedit
8.16 8.90 N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
12.55 13.36 N xterm
14.57 15.57 N mlterm
17.06 18.11 N gnome-terminal
18.90 20.37 N urxvt
23.48 25.26 N gnome-system-monitor
26.52 27.84 N gnome-help
29.65 31.93 N gnome-dictionary
36.12 37.74 N /usr/games/sol
39.27 40.61 N /usr/games/gnometris
42.56 43.75 N /usr/games/gnect
47.03 47.85 N /usr/games/gtali
52.05 52.31 N /usr/games/iagno
55.42 55.61 N /usr/games/gnotravex
61.47 61.38 N /usr/games/mahjongg
67.11 65.07 N /usr/games/gnome-sudoku
75.15 70.36 N /usr/games/glines
79.70 74.96 N /usr/games/glchess
88.48 80.82 N /usr/games/gnomine
96.51 88.30 N /usr/games/gnotski
102.19 94.26 N /usr/games/gnibbles
114.93 102.02 N /usr/games/gnobots2
125.02 115.23 N /usr/games/blackjack
135.11 128.41 N /usr/games/same-gnome
154.50 153.05 N /usr/bin/gnome-window-properties
162.09 169.53 N /usr/bin/gnome-default-applications-properties
173.29 190.32 N /usr/bin/gnome-at-properties
188.21 212.70 N /usr/bin/gnome-typing-monitor
199.93 236.18 N /usr/bin/gnome-at-visual
206.95 261.88 N /usr/bin/gnome-sound-properties
224.49 304.66 N /usr/bin/gnome-at-mobility
234.11 336.73 N /usr/bin/gnome-keybinding-properties
248.59 374.03 N /usr/bin/gnome-about-me
276.27 433.86 N /usr/bin/gnome-display-properties
304.39 488.43 N /usr/bin/gnome-network-preferences
342.01 686.68 N /usr/bin/gnome-mouse-properties
388.58 769.21 N /usr/bin/gnome-appearance-properties
508.47 933.35 N /usr/bin/gnome-control-center
587.57 1193.27 N /usr/bin/gnome-keyboard-properties
[...]

Thanks,
Fengguang

2009-06-11 05:33:15

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Tue, 9 Jun 2009 21:01:28 +0200
Johannes Weiner <[email protected]> wrote:
> [resend with lists cc'd, sorry]
>
> +static int swap_readahead_ptes(struct mm_struct *mm,
> + unsigned long addr, pmd_t *pmd,
> + swp_entry_t *entries,
> + unsigned long cluster)
> +{
> + unsigned long window, min, max, limit;
> + spinlock_t *ptl;
> + pte_t *ptep;
> + int i, nr;
> +
> + window = cluster << PAGE_SHIFT;
> + min = addr & ~(window - 1);
> + max = min + cluster;

Johannes, I wonder there is no reason to use "alignment".
I think we just need to read "nearby" pages. Then, this function's
scan range should be

[addr - window/2, addr + window/2)
or some.

And here, too
> + if (!entries) /* XXX: shmem case */
> + return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> + pmin = swp_offset(entry) & ~(cluster - 1);
> + pmax = pmin + cluster;

pmin = swp_offset(entry) - cluster/2.
pmax = swp_offset(entry) + cluster/2.

I'm sorry if I miss a reason for using "alignment".

Thanks,
-Kame

2009-06-11 10:20:30

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> Unfortunately, after fixing it up the swap readahead patch still performs slow
> (even worse this time):

Thanks for doing the tests. Do you know if the time difference comes
from IO or CPU time?

Because one reason I could think of is that the original code walks
the readaround window in two directions, starting from the target each
time but immediately stops when it encounters a hole where the new
code just skips holes but doesn't abort readaround and thus might
indeed read more slots.

I have an old patch flying around that changed the physical ra code to
use a bitmap that is able to represent holes. If the increased time
is waiting for IO, I would be interested if that patch has the same
negative impact.

Hannes

> before after
> 0.02 0.01 N xeyes
> 0.76 0.89 N firefox
> 1.88 2.21 N nautilus
> 3.17 3.41 N nautilus --browser
> 4.89 5.20 N gthumb
> 6.47 7.02 N gedit
> 8.16 8.90 N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
> 12.55 13.36 N xterm
> 14.57 15.57 N mlterm
> 17.06 18.11 N gnome-terminal
> 18.90 20.37 N urxvt
> 23.48 25.26 N gnome-system-monitor
> 26.52 27.84 N gnome-help
> 29.65 31.93 N gnome-dictionary
> 36.12 37.74 N /usr/games/sol
> 39.27 40.61 N /usr/games/gnometris
> 42.56 43.75 N /usr/games/gnect
> 47.03 47.85 N /usr/games/gtali
> 52.05 52.31 N /usr/games/iagno
> 55.42 55.61 N /usr/games/gnotravex
> 61.47 61.38 N /usr/games/mahjongg
> 67.11 65.07 N /usr/games/gnome-sudoku
> 75.15 70.36 N /usr/games/glines
> 79.70 74.96 N /usr/games/glchess
> 88.48 80.82 N /usr/games/gnomine
> 96.51 88.30 N /usr/games/gnotski
> 102.19 94.26 N /usr/games/gnibbles
> 114.93 102.02 N /usr/games/gnobots2
> 125.02 115.23 N /usr/games/blackjack
> 135.11 128.41 N /usr/games/same-gnome
> 154.50 153.05 N /usr/bin/gnome-window-properties
> 162.09 169.53 N /usr/bin/gnome-default-applications-properties
> 173.29 190.32 N /usr/bin/gnome-at-properties
> 188.21 212.70 N /usr/bin/gnome-typing-monitor
> 199.93 236.18 N /usr/bin/gnome-at-visual
> 206.95 261.88 N /usr/bin/gnome-sound-properties
> 224.49 304.66 N /usr/bin/gnome-at-mobility
> 234.11 336.73 N /usr/bin/gnome-keybinding-properties
> 248.59 374.03 N /usr/bin/gnome-about-me
> 276.27 433.86 N /usr/bin/gnome-display-properties
> 304.39 488.43 N /usr/bin/gnome-network-preferences
> 342.01 686.68 N /usr/bin/gnome-mouse-properties
> 388.58 769.21 N /usr/bin/gnome-appearance-properties
> 508.47 933.35 N /usr/bin/gnome-control-center
> 587.57 1193.27 N /usr/bin/gnome-keyboard-properties

2009-06-12 01:59:37

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > (even worse this time):
>
> Thanks for doing the tests. Do you know if the time difference comes
> from IO or CPU time?
>
> Because one reason I could think of is that the original code walks
> the readaround window in two directions, starting from the target each
> time but immediately stops when it encounters a hole where the new
> code just skips holes but doesn't abort readaround and thus might
> indeed read more slots.
>
> I have an old patch flying around that changed the physical ra code to
> use a bitmap that is able to represent holes. If the increased time
> is waiting for IO, I would be interested if that patch has the same
> negative impact.

You can send me the patch :)

But for this patch it is IO bound. The CPU iowait field actually is
going up as the test goes on:

wfg@hp ~% dstat 10
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
3 3 89 4 0 1| 18k 27B| 0 0 | 0 0 |1530 1006
0 1 99 0 0 0| 0 0 | 31k 9609B| 0 0 |1071 444
1 1 97 1 0 0| 0 0 | 57k 13k| 0 0 |1139 870
30 31 24 13 0 3| 0 741k|1648k 294k| 0 370k|3666 10k
27 30 26 14 0 3| 361k 3227k|1264k 262k| 180k 1614k|3471 9457
25 25 29 18 0 2| 479k 4102k|2353k 285k| 240k 2051k|3707 9429
39 44 5 8 0 4| 256k 7646k|2711k 564k| 128k 3823k|7055 13k
33 18 17 30 0 2|1654k 4357k|2565k 306k| 830k 2366k|4033 10k
25 17 25 31 0 2|1130k 4053k|2540k 312k| 562k 1838k|3906 9722
26 17 15 38 0 3|2481k 7118k|3870k 456k|1244k 3559k|5301 11k
21 12 15 49 0 3|2406k 5041k|4389k 371k|1206k 2818k|4684 8747
26 15 12 42 0 4|3582k 7320k|5002k 484k|1784k 3362k|5675 9934
26 19 17 35 0 3|2412k 3452k|3165k 300k|1209k 1726k|4090 8727
26 15 13 43 0 3|2531k 5294k|3727k 350k|1281k 2738k|4570 8857
19 13 5 60 0 4|5471k 5148k|4661k 354k|2736k 2484k|4563 8084
16 9 10 62 0 2|3656k 1818k|3464k 189k|1815k 948k|3121 5361
22 15 5 54 0 4|5016k 3176k|5773k 412k|2524k 1549k|5337 10k
20 12 9 57 0 3|2277k 1528k|3405k 288k|1120k 764k|3786 7112
15 9 4 69 0 3|4410k 2786k|4233k 311k|2228k 1411k|4115 6685
20 12 10 56 0 2|3765k 1953k|2490k 159k|1863k 964k|2550 6832
26 14 22 36 0 2|1709k 569k|2969k 219k| 848k 279k|3229 8640
16 11 7 63 0 3|4095k 2934k|4986k 316k|2047k 1471k|4413 7165
18 11 3 66 0 3|4219k 1238k|3623k 247k|2119k 616k|3767 6728
16 12 5 64 0 3|4122k 2278k|4400k 343k|2066k 1184k|4325 7220
15 11 5 66 0 3|3715k 1467k|4760k 282k|1858k 824k|4130 5918
7 9 0 80 0 3|4986k 2773k|5811k 328k|2652k 1255k|4244 5173
9 6 10 74 0 2|4465k 846k|2100k 116k|2061k 420k|2106 2349
13 8 12 63 0 4|3813k 2607k|5926k 365k|1917k 1309k|4588 5611
6 6 0 84 0 3|3898k 1206k|4807k 236k|1976k 983k|3477 4210 missed 2 ticks
6 4 6 83 0 1|4312k 1281k| 679k 58k|2118k 255k|1618 2035
15 9 18 55 0 4|3489k 1354k|5087k 323k|1746k 713k|4396 5182
9 5 2 82 0 2|4026k 1134k|1792k 101k|2020k 548k|2183 3555
14 13 3 66 0 4|3269k 1974k|8776k 476k|1642k 1074k|5937 7077
10 8 3 77 0 2|4211k 1192k|3227k 196k|2092k 492k|3098 4070
7 6 7 78 0 3|3672k 2268k|4879k 234k|1833k 1134k|3490 3608
8 7 6 74 0 4|3782k 2708k|5389k 309k|1902k 1357k|4026 4887
1 6 0 91 0 2|4662k 33k|1720k 145k|2357k 117k|2587 2066
3 11 0 85 0 1|4285k 941k|1506k 78k|2118k 431k|2026 1968
5 8 0 83 0 4|4463k 3075k|5975k 364k|2219k 1729k|4167 4147
3 4 5 86 0 2|4004k 834k|2943k 137k|2027k 161k|2518 2195
3 3 0 93 0 2|3016k 974k|1979k 93k|1490k 676k|2034 1717
7 5 2 85 0 2|4066k 2286k|2617k 195k|2047k 954k|2955 3344
8 6 7 77 0 3|4247k 2599k|3422k 252k|2108k 1300k|3623 3129
8 4 12 72 0 3|4056k 1235k|4237k 201k|2028k 618k|3190 2675
5 7 0 84 0 3|3789k 1222k|5824k 314k|1955k 612k|3758 5173
0 5 0 94 0 1|3544k 418k| 646k 29k|1744k 216k|1527 989
1 3 0 94 0 2|3263k 263k|2193k 105k|1614k 165k|2173 1673
2 13 0 83 0 2|3252k 1124k|2546k 200k|1612k 521k|2832 2386
3 34 0 59 0 3|2959k 342k|7795k 325k|1472k 171k|4462 3451
5 22 2 67 0 4|2898k 1534k| 10M 452k|1452k 767k|4380 4124
9 12 12 66 0 2|3530k 479k|2890k 140k|1764k 240k|2453 2538
6 6 12 74 0 2|3334k 2631k|2660k 122k|1672k 1546k|2480 2070 missed 2 ticks
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
9 3 21 65 0 2|3750k 765k|3169k 134k|1872k 152k|2273 1921
5 6 1 83 0 4|3618k 1295k|6543k 330k|1891k 648k|4030 4131
3 5 2 87 0 2|3600k 1054k|2851k 173k|1720k 527k|2815 2687
4 7 1 83 0 5|3677k 1344k|6024k 314k|1844k 734k|3877 4376
4 5 3 85 0 3|3953k 933k|3196k 152k|1989k 405k|2618 2321
2 3 0 94 0 1|3106k 131k| 486k 24k|1544k 131k|1466 1374
2 3 0 93 0 1|3089k 672k|1454k 65k|1540k 362k|1825 1909
7 4 2 86 0 1|3393k 878k|1503k 84k|1694k 416k|1882 2033
9 3 25 62 0 2|3496k 1833k|1979k 90k|1748k 848k|2112 1797
6 4 3 84 0 3|3592k 861k|4340k 191k|1795k 432k|2926 3143
4 6 0 87 0 3|3399k 847k|3758k 186k|1740k 440k|2699 4299
1 2 0 97 0 1|2807k 365k| 685k 49k|1394k 168k|1175 840 missed 2 ticks
2 3 4 90 0 2|3183k 801k|2022k 87k|1568k 399k|1998 1561
2 3 2 91 0 2|3014k 726k|2214k 96k|1521k 368k|2072 1652
4 5 2 86 0 3|3344k 1686k|4970k 217k|1659k 838k|3209 2936
8 4 17 69 0 2|3026k 741k|1923k 107k|1510k 370k|1993 2227
8 4 23 63 0 2|3496k 1026k|2948k 129k|1754k 513k|2347 2048
6 7 2 81 0 4|3438k 1222k|5658k 272k|1746k 626k|3740 5708
0 5 0 94 0 1|2902k 30k|1012k 43k|1435k 0 |1637 1161
1 2 2 93 0 1|2968k 102k| 985k 59k|1471k 122k|1402 1101
4 5 1 88 0 3|3651k 1814k|3838k 170k|1840k 841k|2769 2382
2 2 1 94 0 1|2570k 344k| 500k 23k|1283k 214k|1360 1299
5 3 2 89 0 1|2728k 964k|1119k 70k|1378k 450k|1760 2024
8 3 24 64 0 1|2993k 967k| 737k 29k|1470k 468k|1432 1251
12 2 37 48 0 1|2547k 710k| 651k 26k|1274k 360k|1435 1199
9 3 26 60 0 2|3218k 1630k|3540k 153k|1612k 847k|2723 2174
3 4 5 85 0 3|3618k 870k|3796k 168k|1807k 414k|2653 2497
4 5 0 90 0 1|3134k 841k|1489k 81k|1591k 419k|1972 3498
1 2 0 97 0 1|2910k 349k| 816k 55k|1438k 191k|1525 1096
3 4 2 89 0 2|3240k 930k|2779k 122k|1610k 433k|2313 2036
4 5 0 89 0 2|3079k 1340k|4054k 184k|1549k 670k|2981 3567
2 6 1 90 0 1|2702k 256k|1080k 50k|1348k 178k|1658 1413
3 4 6 85 0 2|3798k 1128k|2208k 105k|1890k 513k|2194 1984
10 3 33 53 0 1|3619k 1239k|1147k 50k|1821k 620k|1708 1563
7 5 12 73 0 3|3689k 1795k|3633k 185k|1833k 898k|2744 2404 missed 2 ticks
4 4 4 85 0 3|3309k 282k|3728k 168k|1662k 166k|2661 2891
2 11 0 84 0 2|2989k 195k|3949k 186k|1530k 92k|2528 3687
0 2 0 96 0 1|2576k 67k|1148k 67k|1278k 40k|1668 1124
1 2 0 95 0 2|2680k 896k|2093k 94k|1317k 548k|2088 1564
1 2 0 95 0 1|2938k 809k|1769k 72k|1461k 279k|1825 1385
2 3 3 90 0 2|3099k 1158k|2854k 125k|1562k 611k|2317 1841
4 4 1 90 0 2|2806k 670k|2139k 94k|1398k 303k|2096 2173
9 5 11 73 0 2|2930k 1646k|2741k 122k|1454k 823k|2504 2515
11 3 29 56 0 1|3154k 1049k|1453k 85k|1578k 524k|1849 1599
5 4 5 84 0 2|3135k 489k|3718k 161k|1570k 268k|2806 2712
3 4 2 90 0 1|3010k 513k|1514k 82k|1530k 233k|1936 2989
3 4 0 91 0 2|2891k 378k|3174k 148k|1430k 196k|2562 2776
2 12 0 83 0 2|3146k 310k|3730k 184k|1569k 149k|2399 2101
3 3 0 93 0 1|2491k 358k|1628k 73k|1245k 179k|1837 1755

Thanks,
Fengguang

2009-06-15 18:26:11

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > (even worse this time):
> >
> > Thanks for doing the tests. Do you know if the time difference comes
> > from IO or CPU time?
> >
> > Because one reason I could think of is that the original code walks
> > the readaround window in two directions, starting from the target each
> > time but immediately stops when it encounters a hole where the new
> > code just skips holes but doesn't abort readaround and thus might
> > indeed read more slots.
> >
> > I have an old patch flying around that changed the physical ra code to
> > use a bitmap that is able to represent holes. If the increased time
> > is waiting for IO, I would be interested if that patch has the same
> > negative impact.
>
> You can send me the patch :)

Okay, attached is a rebase against latest -mmotm.

> But for this patch it is IO bound. The CPU iowait field actually is
> going up as the test goes on:

It's probably the larger ra window then which takes away the bandwidth
needed to load the new executables. This sucks. Would be nice to
have 'optional IO' for readahead that is dropped when normal-priority
IO requests are coming in... Oh, we have READA for bios. But it
doesn't seem to implement dropping requests on load (or I am blind).

Hannes

---

diff --git a/include/linux/swap.h b/include/linux/swap.h
index c88b366..119ad43 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -284,7 +284,7 @@ extern swp_entry_t get_swap_page(void);
extern swp_entry_t get_swap_page_of_type(int);
extern void swap_duplicate(swp_entry_t);
extern int swapcache_prepare(swp_entry_t);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
+extern pgoff_t valid_swaphandles(swp_entry_t, unsigned long *, unsigned long);
extern void swap_free(swp_entry_t);
extern void swapcache_free(swp_entry_t, struct page *page);
extern int free_swap_and_cache(swp_entry_t);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 42cd38e..c9f9c97 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -348,10 +348,10 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr)
{
- int nr_pages;
- struct page *page;
+ unsigned long nr_slots = 1 << page_cluster;
+ DECLARE_BITMAP(slots, nr_slots);
unsigned long offset;
- unsigned long end_offset;
+ pgoff_t base;

/*
* Get starting offset for readaround, and number of pages to read.
@@ -360,11 +360,15 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
* more likely that neighbouring swap pages came from the same node:
* so use the same "addr" to choose the same node for each swap read.
*/
- nr_pages = valid_swaphandles(entry, &offset);
- for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
- /* Ok, do the async read-ahead now */
- page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
- gfp_mask, vma, addr);
+ base = valid_swaphandles(entry, slots, nr_slots);
+ for (offset = find_first_bit(slots, nr_slots);
+ offset < nr_slots;
+ offset = find_next_bit(slots, nr_slots, offset + 1)) {
+ struct page *page;
+ swp_entry_t tmp;
+
+ tmp = swp_entry(swp_type(entry), base + offset);
+ page = read_swap_cache_async(tmp, gfp_mask, vma, addr);
if (!page)
break;
page_cache_release(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d1ade1a..27771dd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2163,25 +2163,28 @@ get_swap_info_struct(unsigned type)
return &swap_info[type];
}

+static int swap_inuse(unsigned long count)
+{
+ int swapcount = swap_count(count);
+ return swapcount && swapcount != SWAP_MAP_BAD;
+}
+
/*
* swap_lock prevents swap_map being freed. Don't grab an extra
* reference on the swaphandle, it doesn't matter if it becomes unused.
*/
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
+pgoff_t valid_swaphandles(swp_entry_t entry, unsigned long *slots,
+ unsigned long nr_slots)
{
struct swap_info_struct *si;
- int our_page_cluster = page_cluster;
- pgoff_t target, toff;
- pgoff_t base, end;
- int nr_pages = 0;
-
- if (!our_page_cluster) /* no readahead */
- return 0;
+ pgoff_t target, base, end;

+ bitmap_zero(slots, nr_slots);
si = &swap_info[swp_type(entry)];
target = swp_offset(entry);
- base = (target >> our_page_cluster) << our_page_cluster;
- end = base + (1 << our_page_cluster);
+ base = target & ~(nr_slots - 1);
+ end = base + nr_slots;
+
if (!base) /* first page is swap header */
base++;

@@ -2189,28 +2192,10 @@ int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
if (end > si->max) /* don't go beyond end of map */
end = si->max;

- /* Count contiguous allocated slots above our target */
- for (toff = target; ++toff < end; nr_pages++) {
- /* Don't read in free or bad pages */
- if (!si->swap_map[toff])
- break;
- if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
- break;
- }
- /* Count contiguous allocated slots below our target */
- for (toff = target; --toff >= base; nr_pages++) {
- /* Don't read in free or bad pages */
- if (!si->swap_map[toff])
- break;
- if (swap_count(si->swap_map[toff]) == SWAP_MAP_BAD)
- break;
- }
- spin_unlock(&swap_lock);
+ while (end-- > base)
+ if (end == target || swap_inuse(si->swap_map[end]))
+ set_bit(end - base, slots);

- /*
- * Indicate starting offset, and return number of pages to get:
- * if only 1, say 0, since there's then no readahead to be done.
- */
- *offset = ++toff;
- return nr_pages? ++nr_pages: 0;
+ spin_unlock(&swap_lock);
+ return base;
}

2009-06-17 22:45:26

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 9 Jun 2009 21:01:28 +0200
> Johannes Weiner <[email protected]> wrote:
> > [resend with lists cc'd, sorry]
> >
> > +static int swap_readahead_ptes(struct mm_struct *mm,
> > + unsigned long addr, pmd_t *pmd,
> > + swp_entry_t *entries,
> > + unsigned long cluster)
> > +{
> > + unsigned long window, min, max, limit;
> > + spinlock_t *ptl;
> > + pte_t *ptep;
> > + int i, nr;
> > +
> > + window = cluster << PAGE_SHIFT;
> > + min = addr & ~(window - 1);
> > + max = min + cluster;
>
> Johannes, I wonder there is no reason to use "alignment".

I am wondering too. I digged into the archives but the alignment
comes from a change older than what history.git documents, so I wasn't
able to find written down justification for this.

> I think we just need to read "nearby" pages. Then, this function's
> scan range should be
>
> [addr - window/2, addr + window/2)
> or some.
>
> And here, too
> > + if (!entries) /* XXX: shmem case */
> > + return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> > + pmin = swp_offset(entry) & ~(cluster - 1);
> > + pmax = pmin + cluster;
>
> pmin = swp_offset(entry) - cluster/2.
> pmax = swp_offset(entry) + cluster/2.
>
> I'm sorry if I miss a reason for using "alignment".

Perhas someone else knows a good reason for it, but I think it could
even be harmful.

Chances are that several processes fault around the same slots
simultaneously. By letting them all start at the same aligned offset
we have a maximum race between them and they all allocate pages for
the same slots concurrently.

By placing the window unaligned we decrease this overlapping, so it
sounds like a good idea.

It would increase the amount of readahead done even more, though, and
Fengguang already measured degradation in IO latency with my patch, so
this probably needs more changes to work well.

2009-06-18 09:20:04

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > (even worse this time):
> > >
> > > Thanks for doing the tests. Do you know if the time difference comes
> > > from IO or CPU time?
> > >
> > > Because one reason I could think of is that the original code walks
> > > the readaround window in two directions, starting from the target each
> > > time but immediately stops when it encounters a hole where the new
> > > code just skips holes but doesn't abort readaround and thus might
> > > indeed read more slots.
> > >
> > > I have an old patch flying around that changed the physical ra code to
> > > use a bitmap that is able to represent holes. If the increased time
> > > is waiting for IO, I would be interested if that patch has the same
> > > negative impact.
> >
> > You can send me the patch :)
>
> Okay, attached is a rebase against latest -mmotm.
>
> > But for this patch it is IO bound. The CPU iowait field actually is
> > going up as the test goes on:
>
> It's probably the larger ra window then which takes away the bandwidth
> needed to load the new executables. This sucks. Would be nice to
> have 'optional IO' for readahead that is dropped when normal-priority
> IO requests are coming in... Oh, we have READA for bios. But it
> doesn't seem to implement dropping requests on load (or I am blind).

Hi Hannes,

Sorry for the long delay! A bad news is that I get many oom with this patch:

[ 781.450862] Xorg invoked oom-killer: gfp_mask=0xd2, order=0, oom_adj=0
[ 781.457411] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[ 781.463511] Call Trace:
[ 781.465976] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 781.471462] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 781.477210] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 781.482449] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 781.488188] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 781.493666] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 781.500015] [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[ 781.505846] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 781.511857] [<ffffffff810e7fe8>] __vmalloc_area_node+0xf8/0x190
[ 781.517869] [<ffffffffa014c9b5>] ? i915_gem_execbuffer+0xb45/0x12f0 [i915]
[ 781.524835] [<ffffffff810e8121>] __vmalloc_node+0xa1/0xb0
[ 781.530346] [<ffffffffa014c9b5>] ? i915_gem_execbuffer+0xb45/0x12f0 [i915]
[ 781.537312] [<ffffffffa014bf2b>] ? i915_gem_execbuffer+0xbb/0x12f0 [i915]
[ 781.544192] [<ffffffff810e8281>] vmalloc+0x21/0x30
[ 781.549100] [<ffffffffa014c9b5>] i915_gem_execbuffer+0xb45/0x12f0 [i915]
[ 781.555920] [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[ 781.561789] [<ffffffffa00f5b7d>] drm_ioctl+0x12d/0x3d0 [drm]
[ 781.567569] [<ffffffffa014be70>] ? i915_gem_execbuffer+0x0/0x12f0 [i915]
[ 781.574383] [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[ 781.580225] [<ffffffff8110babd>] vfs_ioctl+0x7d/0xa0
[ 781.585287] [<ffffffff8110bb6a>] do_vfs_ioctl+0x8a/0x580
[ 781.590706] [<ffffffff81078f3a>] ? lockdep_sys_exit+0x2a/0x90
[ 781.596552] [<ffffffff81544b34>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 781.602929] [<ffffffff8110c0aa>] sys_ioctl+0x4a/0x80
[ 781.607995] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 781.614005] Mem-Info:
[ 781.616293] Node 0 DMA per-cpu:
[ 781.619471] CPU 0: hi: 0, btch: 1 usd: 0
[ 781.624278] CPU 1: hi: 0, btch: 1 usd: 0
[ 781.629080] Node 0 DMA32 per-cpu:
[ 781.632443] CPU 0: hi: 186, btch: 31 usd: 83
[ 781.637243] CPU 1: hi: 186, btch: 31 usd: 108
[ 781.642045] Active_anon:41057 active_file:2334 inactive_anon:47003
[ 781.642048] inactive_file:2148 unevictable:4 dirty:0 writeback:0 unstable:0
[ 781.642051] free:1180 slab:14177 mapped:4473 pagetables:7629 bounce:0
[ 781.661802] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5408kB inactive_anon:5676kB active_file:16kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:42276 all_unreclaimable? no
[ 781.680773] lowmem_reserve[]: 0 483 483 483
[ 781.685089] Node 0 DMA32 free:2704kB min:2768kB low:3460kB high:4152kB active_anon:158820kB inactive_anon:182224kB active_file:9320kB inactive_file:8592kB unevictable:16kB present:495008kB pages_scanned:673623 all_unreclaimable? yes
[ 781.705711] lowmem_reserve[]: 0 0 0 0
[ 781.709501] Node 0 DMA: 104*4kB 0*8kB 6*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[ 781.720553] Node 0 DMA32: 318*4kB 1*8kB 1*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2704kB
[ 781.731764] 61569 total pagecache pages
[ 781.735618] 6489 pages in swap cache
[ 781.739212] Swap cache stats: add 285146, delete 278657, find 31455/133061
[ 781.746092] Free swap = 709316kB
[ 781.749417] Total swap = 1048568kB
[ 781.759726] 131072 pages RAM
[ 781.762645] 9628 pages reserved
[ 781.765793] 95620 pages shared
[ 781.768862] 58466 pages non-shared
[ 781.772278] Out of memory: kill process 3487 (run-many-x-apps) score 1471069 or a child
[ 781.780291] Killed process 3488 (xeyes)
[ 781.830240] gtali invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[ 781.837208] Pid: 4113, comm: gtali Not tainted 2.6.30-rc8-mm1 #312
[ 781.843554] Call Trace:
[ 781.846233] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 781.851870] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 781.857615] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 781.862840] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 781.868578] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 781.874054] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 781.880401] [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[ 781.885969] [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[ 781.892147] [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[ 781.897886] [<ffffffff810dac73>] do_swap_page+0x403/0x510
[ 781.903366] [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[ 781.909279] [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[ 781.914850] [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[ 781.920587] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 781.926149] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 781.931287] Mem-Info:
[ 781.933559] Node 0 DMA per-cpu:
[ 781.936714] CPU 0: hi: 0, btch: 1 usd: 0
[ 781.941500] CPU 1: hi: 0, btch: 1 usd: 0
[ 781.946288] Node 0 DMA32 per-cpu:
[ 781.949615] CPU 0: hi: 186, btch: 31 usd: 84
[ 781.954402] CPU 1: hi: 186, btch: 31 usd: 109
[ 781.959192] Active_anon:41029 active_file:2334 inactive_anon:46908
[ 781.959193] inactive_file:2211 unevictable:4 dirty:0 writeback:0 unstable:0
[ 781.959194] free:1180 slab:14177 mapped:4492 pagetables:7608 bounce:0
[ 781.978897] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5296kB inactive_anon:5408kB active_file:16kB inactive_file:176kB unevictable:0kB present:15164kB pages_scanned:6816 all_unreclaimable? no
[ 781.997900] lowmem_reserve[]: 0 483 483 483
[ 782.002173] Node 0 DMA32 free:2704kB min:2768kB low:3460kB high:4152kB active_anon:158820kB inactive_anon:182224kB active_file:9320kB inactive_file:8668kB unevictable:16kB present:495008kB pages_scanned:674199 all_unreclaimable? yes
[ 782.022740] lowmem_reserve[]: 0 0 0 0
[ 782.026488] Node 0 DMA: 82*4kB 9*8kB 7*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[ 782.037309] Node 0 DMA32: 318*4kB 1*8kB 1*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2704kB
[ 782.048405] 61637 total pagecache pages
[ 782.052236] 6494 pages in swap cache
[ 782.055809] Swap cache stats: add 285154, delete 278660, find 31456/133069
[ 782.062672] Free swap = 709592kB
[ 782.065983] Total swap = 1048568kB
[ 782.072735] 131072 pages RAM
[ 782.075632] 9628 pages reserved
[ 782.078774] 95669 pages shared
[ 782.081822] 58413 pages non-shared
[ 782.085223] Out of memory: kill process 3487 (run-many-x-apps) score 1466556 or a child
[ 782.093215] Killed process 3566 (gthumb)
[ 790.063897] gnome-panel invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 790.071664] Pid: 3405, comm: gnome-panel Not tainted 2.6.30-rc8-mm1 #312
[ 790.078421] Call Trace:
[ 790.080902] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 790.086410] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 790.092159] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 790.097387] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 790.103135] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 790.108632] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 790.115001] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 790.121002] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 790.126745] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 790.133352] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 790.140057] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 790.145103] [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 790.150678] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 790.155902] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 790.161989] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 790.167738] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 790.173304] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 790.178441] Mem-Info:
[ 790.180714] Node 0 DMA per-cpu:
[ 790.183870] CPU 0: hi: 0, btch: 1 usd: 0
[ 790.188659] CPU 1: hi: 0, btch: 1 usd: 0
[ 790.193446] Node 0 DMA32 per-cpu:
[ 790.196783] CPU 0: hi: 186, btch: 31 usd: 43
[ 790.201569] CPU 1: hi: 186, btch: 31 usd: 31
[ 790.206359] Active_anon:41179 active_file:900 inactive_anon:46967
[ 790.206360] inactive_file:4104 unevictable:4 dirty:0 writeback:0 unstable:0
[ 790.206361] free:1165 slab:13961 mapped:3241 pagetables:7475 bounce:0
[ 790.225984] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5496kB inactive_anon:5800kB active_file:4kB inactive_file:220kB unevictable:0kB present:15164kB pages_scanned:26112 all_unreclaimable? yes
[ 790.245079] lowmem_reserve[]: 0 483 483 483
[ 790.249352] Node 0 DMA32 free:2648kB min:2768kB low:3460kB high:4152kB active_anon:159220kB inactive_anon:182068kB active_file:3596kB inactive_file:16196kB unevictable:16kB present:495008kB pages_scanned:875456 all_unreclaimable? yes
[ 790.270005] lowmem_reserve[]: 0 0 0 0
[ 790.273762] Node 0 DMA: 53*4kB 9*8kB 12*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 790.284681] Node 0 DMA32: 190*4kB 46*8kB 7*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2648kB
[ 790.295866] 62097 total pagecache pages
[ 790.299698] 6548 pages in swap cache
[ 790.303271] Swap cache stats: add 286032, delete 279484, find 31565/133879
[ 790.310137] Free swap = 717460kB
[ 790.313445] Total swap = 1048568kB
[ 790.320544] 131072 pages RAM
[ 790.323445] 9628 pages reserved
[ 790.326591] 85371 pages shared
[ 790.329641] 59742 pages non-shared
[ 790.333046] Out of memory: kill process 3487 (run-many-x-apps) score 1258333 or a child
[ 790.341039] Killed process 3599 (gedit)
[ 790.382081] gedit used greatest stack depth: 2064 bytes left
[ 792.149572] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[ 792.156786] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[ 792.162980] Call Trace:
[ 792.165429] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 792.170937] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 792.176691] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 792.181909] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 792.187653] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 792.193136] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 792.199490] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 792.205491] [<ffffffff810c7409>] __get_free_pages+0x9/0x50
[ 792.211060] [<ffffffff8110e402>] __pollwait+0xc2/0x100
[ 792.216283] [<ffffffff81495903>] unix_poll+0x23/0xc0
[ 792.221330] [<ffffffff81419ac8>] sock_poll+0x18/0x20
[ 792.226380] [<ffffffff8110d9a9>] do_select+0x3e9/0x730
[ 792.231597] [<ffffffff8110d5c0>] ? do_select+0x0/0x730
[ 792.236816] [<ffffffff8110e340>] ? __pollwait+0x0/0x100
[ 792.242126] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.247180] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.252227] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.257275] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.262331] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.267377] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.272422] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.277468] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.282519] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 792.287574] [<ffffffff8110deef>] core_sys_select+0x1ff/0x330
[ 792.293317] [<ffffffff8110dd38>] ? core_sys_select+0x48/0x330
[ 792.299162] [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[ 792.306204] [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[ 792.312034] [<ffffffff810706cc>] ? getnstimeofday+0x5c/0xf0
[ 792.317687] [<ffffffff8106acb9>] ? ktime_get_ts+0x59/0x60
[ 792.323169] [<ffffffff8110e27a>] sys_select+0x4a/0x110
[ 792.328387] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 792.334389] Mem-Info:
[ 792.336663] Node 0 DMA per-cpu:
[ 792.339824] CPU 0: hi: 0, btch: 1 usd: 0
[ 792.344612] CPU 1: hi: 0, btch: 1 usd: 0
[ 792.349397] Node 0 DMA32 per-cpu:
[ 792.352734] CPU 0: hi: 186, btch: 31 usd: 57
[ 792.357518] CPU 1: hi: 186, btch: 31 usd: 50
[ 792.362310] Active_anon:40862 active_file:1622 inactive_anon:47020
[ 792.362311] inactive_file:3746 unevictable:4 dirty:0 writeback:0 unstable:0
[ 792.362313] free:1187 slab:13902 mapped:4052 pagetables:7387 bounce:0
[ 792.382030] Node 0 DMA free:2012kB min:84kB low:104kB high:124kB active_anon:5428kB inactive_anon:5680kB active_file:0kB inactive_file:224kB unevictable:0kB present:15164kB pages_scanned:4992 all_unreclaimable? no
[ 792.400957] lowmem_reserve[]: 0 483 483 483
[ 792.405232] Node 0 DMA32 free:2736kB min:2768kB low:3460kB high:4152kB active_anon:158020kB inactive_anon:182284kB active_file:6488kB inactive_file:14760kB unevictable:16kB present:495008kB pages_scanned:876741 all_unreclaimable? yes
[ 792.425889] lowmem_reserve[]: 0 0 0 0
[ 792.429637] Node 0 DMA: 31*4kB 14*8kB 15*16kB 2*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 792.440651] Node 0 DMA32: 86*4kB 95*8kB 14*16kB 6*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2736kB
[ 792.451821] 62288 total pagecache pages
[ 792.455655] 6442 pages in swap cache
[ 792.459230] Swap cache stats: add 286223, delete 279781, find 31574/134040
[ 792.466100] Free swap = 723520kB
[ 792.469405] Total swap = 1048568kB
[ 792.476461] 131072 pages RAM
[ 792.479359] 9628 pages reserved
[ 792.482502] 86274 pages shared
[ 792.485547] 59031 pages non-shared
[ 792.488956] Out of memory: kill process 3487 (run-many-x-apps) score 1235901 or a child
[ 792.496952] Killed process 3626 (xpdf.bin)
[ 912.097890] gnome-control-c invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 912.105967] Pid: 5395, comm: gnome-control-c Not tainted 2.6.30-rc8-mm1 #312
[ 912.113042] Call Trace:
[ 912.115499] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 912.120994] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 912.126737] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 912.131961] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 912.137709] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 912.143193] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 912.149547] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 912.155551] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 912.161295] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 912.167904] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 912.174602] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 912.179650] [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 912.185221] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 912.190445] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 912.196539] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 912.202278] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 912.207840] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 912.212976] Mem-Info:
[ 912.215247] Node 0 DMA per-cpu:
[ 912.218402] CPU 0: hi: 0, btch: 1 usd: 0
[ 912.223190] CPU 1: hi: 0, btch: 1 usd: 0
[ 912.227979] Node 0 DMA32 per-cpu:
[ 912.231315] CPU 0: hi: 186, btch: 31 usd: 118
[ 912.236100] CPU 1: hi: 186, btch: 31 usd: 158
[ 912.240891] Active_anon:42350 active_file:809 inactive_anon:47098
[ 912.240892] inactive_file:2682 unevictable:4 dirty:0 writeback:3 unstable:0
[ 912.240893] free:1164 slab:13886 mapped:3078 pagetables:7561 bounce:0
[ 912.260546] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5456kB inactive_anon:5676kB active_file:4kB inactive_file:72kB unevictable:0kB present:15164kB pages_scanned:1920 all_unreclaimable? no
[ 912.279403] lowmem_reserve[]: 0 483 483 483
[ 912.283671] Node 0 DMA32 free:2600kB min:2768kB low:3460kB high:4152kB active_anon:163944kB inactive_anon:182600kB active_file:3232kB inactive_file:10644kB unevictable:16kB present:495008kB pages_scanned:571360 all_unreclaimable? yes
[ 912.304335] lowmem_reserve[]: 0 0 0 0
[ 912.308082] Node 0 DMA: 22*4kB 16*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[ 912.319093] Node 0 DMA32: 128*4kB 131*8kB 1*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2600kB
[ 912.330367] 62393 total pagecache pages
[ 912.334201] 7186 pages in swap cache
[ 912.337778] Swap cache stats: add 320003, delete 312817, find 34852/153688
[ 912.344648] Free swap = 714408kB
[ 912.347950] Total swap = 1048568kB
[ 912.355114] 131072 pages RAM
[ 912.358011] 9628 pages reserved
[ 912.361153] 84608 pages shared
[ 912.364199] 58138 pages non-shared
[ 912.367606] Out of memory: kill process 3487 (run-many-x-apps) score 1281073 or a child
[ 912.375604] Killed process 3669 (xterm)
[ 912.427936] tty_ldisc_deref: no references.
[ 912.480847] nautilus invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 912.487981] Pid: 3408, comm: nautilus Not tainted 2.6.30-rc8-mm1 #312
[ 912.494418] Call Trace:
[ 912.496876] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 912.502361] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 912.508100] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 912.513327] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 912.519067] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 912.524552] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 912.530902] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 912.536907] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 912.542645] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 912.549253] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 912.555946] [<ffffffff810a9c9b>] ? delayacct_end+0x6b/0xa0
[ 912.561517] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 912.566563] [<ffffffff810cacb3>] ondemand_readahead+0x163/0x2d0
[ 912.572563] [<ffffffff810caf25>] page_cache_sync_readahead+0x25/0x30
[ 912.579000] [<ffffffff810c141c>] filemap_fault+0x37c/0x400
[ 912.584576] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 912.589799] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 912.595888] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 912.601632] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 912.607206] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 912.612345] Mem-Info:
[ 912.614624] Node 0 DMA per-cpu:
[ 912.617787] CPU 0: hi: 0, btch: 1 usd: 0
[ 912.622570] CPU 1: hi: 0, btch: 1 usd: 0
[ 912.627353] Node 0 DMA32 per-cpu:
[ 912.630682] CPU 0: hi: 186, btch: 31 usd: 121
[ 912.635470] CPU 1: hi: 186, btch: 31 usd: 76
[ 912.640259] Active_anon:42310 active_file:830 inactive_anon:47085
[ 912.640260] inactive_file:2747 unevictable:4 dirty:0 writeback:0 unstable:0
[ 912.640261] free:1182 slab:13881 mapped:3111 pagetables:7523 bounce:0
[ 912.659881] Node 0 DMA free:2004kB min:84kB low:104kB high:124kB active_anon:5468kB inactive_anon:5784kB active_file:4kB inactive_file:56kB unevictable:0kB present:15164kB pages_scanned:5152 all_unreclaimable? no
[ 912.678724] lowmem_reserve[]: 0 483 483 483
[ 912.682990] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:163772kB inactive_anon:182556kB active_file:3316kB inactive_file:10932kB unevictable:16kB present:495008kB pages_scanned:51712 all_unreclaimable? no
[ 912.703478] lowmem_reserve[]: 0 0 0 0
[ 912.707226] Node 0 DMA: 21*4kB 16*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
[ 912.718239] Node 0 DMA32: 159*4kB 132*8kB 1*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2732kB
[ 912.729502] 62461 total pagecache pages
[ 912.733337] 7171 pages in swap cache
[ 912.736915] Swap cache stats: add 320011, delete 312840, find 34852/153696
[ 912.743782] Free swap = 715668kB
[ 912.747098] Total swap = 1048568kB
[ 912.754168] 131072 pages RAM
[ 912.757059] 9628 pages reserved
[ 912.760191] 84519 pages shared
[ 912.763248] 58139 pages non-shared
[ 912.766653] Out of memory: kill process 3487 (run-many-x-apps) score 1273781 or a child
[ 912.774647] Killed process 3762 (gnome-terminal)
[ 913.650490] tty_ldisc_deref: no references.
[ 914.671325] kerneloops-appl invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[ 914.679083] Pid: 3425, comm: kerneloops-appl Not tainted 2.6.30-rc8-mm1 #312
[ 914.686121] Call Trace:
[ 914.688575] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 914.694057] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 914.699800] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 914.705034] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 914.710791] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 914.716279] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 914.722640] [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[ 914.728208] [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[ 914.734391] [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[ 914.740139] [<ffffffff810dac73>] do_swap_page+0x403/0x510
[ 914.745632] [<ffffffff810c0710>] ? find_get_page+0x0/0x110
[ 914.751200] [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[ 914.757115] [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[ 914.762688] [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[ 914.768437] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 914.774005] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 914.779136] Mem-Info:
[ 914.781410] Node 0 DMA per-cpu:
[ 914.784572] CPU 0: hi: 0, btch: 1 usd: 0
[ 914.789367] CPU 1: hi: 0, btch: 1 usd: 0
[ 914.794156] Node 0 DMA32 per-cpu:
[ 914.797493] CPU 0: hi: 186, btch: 31 usd: 150
[ 914.802278] CPU 1: hi: 186, btch: 31 usd: 147
[ 914.807064] Active_anon:42324 active_file:1285 inactive_anon:47097
[ 914.807065] inactive_file:2225 unevictable:4 dirty:0 writeback:0 unstable:0
[ 914.807067] free:1185 slab:13908 mapped:3648 pagetables:7413 bounce:0
[ 914.826781] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5360kB inactive_anon:5784kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:17408 all_unreclaimable? yes
[ 914.845718] lowmem_reserve[]: 0 483 483 483
[ 914.849988] Node 0 DMA32 free:2724kB min:2768kB low:3460kB high:4152kB active_anon:163936kB inactive_anon:182604kB active_file:5140kB inactive_file:8908kB unevictable:16kB present:495008kB pages_scanned:581760 all_unreclaimable? yes
[ 914.870559] lowmem_reserve[]: 0 0 0 0
[ 914.874306] Node 0 DMA: 37*4kB 10*8kB 12*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2020kB
[ 914.885318] Node 0 DMA32: 119*4kB 139*8kB 7*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2724kB
[ 914.896588] 62441 total pagecache pages
[ 914.900417] 7199 pages in swap cache
[ 914.903999] Swap cache stats: add 320272, delete 313073, find 34864/153895
[ 914.910867] Free swap = 721224kB
[ 914.914193] Total swap = 1048568kB
[ 914.921489] 131072 pages RAM
[ 914.924370] 9628 pages reserved
[ 914.927519] 84507 pages shared
[ 914.930581] 57535 pages non-shared
[ 914.933989] Out of memory: kill process 3487 (run-many-x-apps) score 1213315 or a child
[ 914.941986] Killed process 3803 (urxvt)
[ 914.947298] tty_ldisc_deref: no references.
[ 919.983335] gnome-keyboard- invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
[ 919.991145] Pid: 5458, comm: gnome-keyboard- Not tainted 2.6.30-rc8-mm1 #312
[ 919.998198] Call Trace:
[ 920.000663] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 920.006157] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 920.011906] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 920.017135] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 920.022876] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 920.028357] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 920.034706] [<ffffffff810f3fb6>] alloc_page_vma+0x86/0x1c0
[ 920.040280] [<ffffffff810e9d08>] read_swap_cache_async+0xd8/0x120
[ 920.046460] [<ffffffff810e9f05>] swapin_readahead+0xb5/0x110
[ 920.052196] [<ffffffff810dac73>] do_swap_page+0x403/0x510
[ 920.057676] [<ffffffff810e9933>] ? lookup_swap_cache+0x13/0x30
[ 920.063592] [<ffffffff810da8ea>] ? do_swap_page+0x7a/0x510
[ 920.069165] [<ffffffff810dc72e>] handle_mm_fault+0x44e/0x500
[ 920.074901] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 920.080470] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 920.085604] Mem-Info:
[ 920.087875] Node 0 DMA per-cpu:
[ 920.091031] CPU 0: hi: 0, btch: 1 usd: 0
[ 920.095818] CPU 1: hi: 0, btch: 1 usd: 0
[ 920.100617] Node 0 DMA32 per-cpu:
[ 920.103947] CPU 0: hi: 186, btch: 31 usd: 89
[ 920.108734] CPU 1: hi: 186, btch: 31 usd: 119
[ 920.113524] Active_anon:42944 active_file:542 inactive_anon:46956
[ 920.113525] inactive_file:2652 unevictable:4 dirty:0 writeback:0 unstable:0
[ 920.113526] free:1169 slab:13893 mapped:3036 pagetables:7342 bounce:0
[ 920.133149] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5568kB inactive_anon:5772kB active_file:20kB inactive_file:164kB unevictable:0kB present:15164kB pages_scanned:22824 all_unreclaimable? yes
[ 920.152324] lowmem_reserve[]: 0 483 483 483
[ 920.156597] Node 0 DMA32 free:2668kB min:2768kB low:3460kB high:4152kB active_anon:166208kB inactive_anon:182052kB active_file:2148kB inactive_file:10444kB unevictable:16kB present:495008kB pages_scanned:650400 all_unreclaimable? yes
[ 920.177245] lowmem_reserve[]: 0 0 0 0
[ 920.180991] Node 0 DMA: 44*4kB 9*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[ 920.191903] Node 0 DMA32: 165*4kB 117*8kB 3*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2668kB
[ 920.203169] 62409 total pagecache pages
[ 920.207000] 7469 pages in swap cache
[ 920.210572] Swap cache stats: add 321003, delete 313534, find 34989/154507
[ 920.217436] Free swap = 725812kB
[ 920.220752] Total swap = 1048568kB
[ 920.227856] 131072 pages RAM
[ 920.230752] 9628 pages reserved
[ 920.233901] 78560 pages shared
[ 920.236958] 58011 pages non-shared
[ 920.240355] Out of memory: kill process 3487 (run-many-x-apps) score 1195965 or a child
[ 920.248346] Killed process 3889 (gnome-system-mo)
[ 920.993872] nautilus invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 921.001843] Pid: 3408, comm: nautilus Not tainted 2.6.30-rc8-mm1 #312
[ 921.008294] Call Trace:
[ 921.010757] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 921.016245] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 921.021995] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 921.027215] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 921.032954] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 921.038441] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 921.044805] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 921.050808] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 921.056549] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 921.063163] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 921.069868] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 921.074918] [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 921.080487] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 921.085717] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 921.091805] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 921.097552] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 921.103145] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 921.108280] Mem-Info:
[ 921.110556] Node 0 DMA per-cpu:
[ 921.113720] CPU 0: hi: 0, btch: 1 usd: 0
[ 921.118501] CPU 1: hi: 0, btch: 1 usd: 0
[ 921.123286] Node 0 DMA32 per-cpu:
[ 921.126614] CPU 0: hi: 186, btch: 31 usd: 25
[ 921.131400] CPU 1: hi: 186, btch: 31 usd: 58
[ 921.136187] Active_anon:42277 active_file:992 inactive_anon:46953
[ 921.136188] inactive_file:3279 unevictable:4 dirty:0 writeback:0 unstable:0
[ 921.136189] free:1183 slab:13728 mapped:3449 pagetables:7235 bounce:0
[ 921.155810] Node 0 DMA free:2016kB min:84kB low:104kB high:124kB active_anon:5540kB inactive_anon:5772kB active_file:20kB inactive_file:224kB unevictable:0kB present:15164kB pages_scanned:18464 all_unreclaimable? yes
[ 921.174995] lowmem_reserve[]: 0 483 483 483
[ 921.179259] Node 0 DMA32 free:2716kB min:2768kB low:3460kB high:4152kB active_anon:163568kB inactive_anon:182040kB active_file:3948kB inactive_file:12892kB unevictable:16kB present:495008kB pages_scanned:719674 all_unreclaimable? yes
[ 921.199914] lowmem_reserve[]: 0 0 0 0
[ 921.203661] Node 0 DMA: 50*4kB 7*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
[ 921.214577] Node 0 DMA32: 257*4kB 45*8kB 19*16kB 0*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2716kB
[ 921.225837] 63208 total pagecache pages
[ 921.229675] 7214 pages in swap cache
[ 921.233249] Swap cache stats: add 321070, delete 313856, find 34991/154562
[ 921.240112] Free swap = 730844kB
[ 921.243427] Total swap = 1048568kB
[ 921.250566] 131072 pages RAM
[ 921.253460] 9628 pages reserved
[ 921.256599] 79050 pages shared
[ 921.259646] 57895 pages non-shared
[ 921.263048] Out of memory: kill process 3487 (run-many-x-apps) score 1168892 or a child
[ 921.271042] Killed process 3917 (gnome-help)
[ 934.057490] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 934.065285] Pid: 3353, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[ 934.072425] Call Trace:
[ 934.074882] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 934.080382] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 934.086126] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 934.091349] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 934.097091] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 934.102568] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 934.108914] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 934.114922] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 934.120667] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 934.127269] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 934.133963] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 934.139018] [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 934.144593] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 934.149812] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 934.155898] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 934.161640] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 934.167208] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 934.172348] Mem-Info:
[ 934.174614] Node 0 DMA per-cpu:
[ 934.177775] CPU 0: hi: 0, btch: 1 usd: 0
[ 934.182560] CPU 1: hi: 0, btch: 1 usd: 0
[ 934.187342] Node 0 DMA32 per-cpu:
[ 934.190671] CPU 0: hi: 186, btch: 31 usd: 115
[ 934.195459] CPU 1: hi: 186, btch: 31 usd: 146
[ 934.200251] Active_anon:43024 active_file:1381 inactive_anon:46959
[ 934.200252] inactive_file:2292 unevictable:4 dirty:0 writeback:0 unstable:0
[ 934.200253] free:1170 slab:13755 mapped:4121 pagetables:7012 bounce:0
[ 934.219958] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5532kB inactive_anon:5756kB active_file:16kB inactive_file:248kB unevictable:0kB present:15164kB pages_scanned:18348 all_unreclaimable? yes
[ 934.239142] lowmem_reserve[]: 0 483 483 483
[ 934.243408] Node 0 DMA32 free:2680kB min:2768kB low:3460kB high:4152kB active_anon:166564kB inactive_anon:182080kB active_file:5508kB inactive_file:8920kB unevictable:16kB present:495008kB pages_scanned:689667 all_unreclaimable? yes
[ 934.263988] lowmem_reserve[]: 0 0 0 0
[ 934.267735] Node 0 DMA: 60*4kB 0*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[ 934.278662] Node 0 DMA32: 294*4kB 2*8kB 9*16kB 10*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2680kB
[ 934.289834] 62846 total pagecache pages
[ 934.293669] 7202 pages in swap cache
[ 934.297244] Swap cache stats: add 322861, delete 315659, find 35288/156117
[ 934.304107] Free swap = 758748kB
[ 934.307422] Total swap = 1048568kB
[ 934.314470] 131072 pages RAM
[ 934.317362] 9628 pages reserved
[ 934.320501] 76930 pages shared
[ 934.323549] 57149 pages non-shared
[ 934.326955] Out of memory: kill process 3487 (run-many-x-apps) score 1006662 or a child
[ 934.334948] Killed process 3952 (gnome-dictionar)
[ 934.340708] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 934.348622] Pid: 3353, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[ 934.355318] Call Trace:
[ 934.357768] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 934.363256] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 934.368998] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 934.372992] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 934.372992] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 934.385506] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 934.389481] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 934.397856] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 934.401848] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 934.410200] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 934.416894] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 934.421942] [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 934.425936] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 934.432734] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 934.438822] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 934.444566] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 934.448558] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 934.455262] Mem-Info:
[ 934.457533] Node 0 DMA per-cpu:
[ 934.460695] CPU 0: hi: 0, btch: 1 usd: 0
[ 934.464690] CPU 1: hi: 0, btch: 1 usd: 0
[ 934.470263] Node 0 DMA32 per-cpu:
[ 934.473589] CPU 0: hi: 186, btch: 31 usd: 172
[ 934.478377] CPU 1: hi: 186, btch: 31 usd: 145
[ 934.482373] Active_anon:42768 active_file:1390 inactive_anon:46967
[ 934.482373] inactive_file:2301 unevictable:4 dirty:0 writeback:0 unstable:0
[ 934.482373] free:1495 slab:13778 mapped:4137 pagetables:6916 bounce:0
[ 934.502869] Node 0 DMA free:2060kB min:84kB low:104kB high:124kB active_anon:5492kB inactive_anon:5788kB active_file:28kB inactive_file:252kB unevictable:0kB present:15164kB pages_scanned:0 all_unreclaimable? no
[ 934.521612] lowmem_reserve[]: 0 483 483 483
[ 934.525885] Node 0 DMA32 free:3920kB min:2768kB low:3460kB high:4152kB active_anon:165580kB inactive_anon:182080kB active_file:5532kB inactive_file:8952kB unevictable:16kB present:495008kB pages_scanned:0 all_unreclaimable? no
[ 934.545927] lowmem_reserve[]: 0 0 0 0
[ 934.549677] Node 0 DMA: 71*4kB 2*8kB 10*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
[ 934.560588] Node 0 DMA32: 588*4kB 10*8kB 9*16kB 10*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3920kB
[ 934.568475] 62739 total pagecache pages
[ 934.575685] 7086 pages in swap cache
[ 934.579254] Swap cache stats: add 322861, delete 315775, find 35288/156117
[ 934.586118] Free swap = 763384kB
[ 934.589433] Total swap = 1048568kB
[ 934.597155] 131072 pages RAM
[ 934.600036] 9628 pages reserved
[ 934.600235] 76640 pages shared
[ 934.606236] 56884 pages non-shared
[ 934.609634] Out of memory: kill process 3487 (run-many-x-apps) score 978701 or a child
[ 934.617540] Killed process 4014 (sol)
[ 1028.279307] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1028.286714] Pid: 5554, comm: firefox-bin Not tainted 2.6.30-rc8-mm1 #312
[ 1028.293414] Call Trace:
[ 1028.295874] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1028.301361] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1028.307109] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1028.312330] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1028.318069] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1028.323554] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1028.329900] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1028.335899] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 1028.341639] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 1028.348247] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 1028.354935] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 1028.359982] [<ffffffff810cacb3>] ondemand_readahead+0x163/0x2d0
[ 1028.365986] [<ffffffff810caf25>] page_cache_sync_readahead+0x25/0x30
[ 1028.372422] [<ffffffff810c141c>] filemap_fault+0x37c/0x400
[ 1028.377985] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 1028.383205] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 1028.389291] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 1028.395031] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 1028.400594] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 1028.405726] Mem-Info:
[ 1028.408001] Node 0 DMA per-cpu:
[ 1028.411161] CPU 0: hi: 0, btch: 1 usd: 0
[ 1028.416012] CPU 1: hi: 0, btch: 1 usd: 0
[ 1028.420860] Node 0 DMA32 per-cpu:
[ 1028.424346] CPU 0: hi: 186, btch: 31 usd: 125
[ 1028.429129] CPU 1: hi: 186, btch: 31 usd: 17
[ 1028.433914] Active_anon:41222 active_file:1015 inactive_anon:47978
[ 1028.433915] inactive_file:4149 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1028.433916] free:1168 slab:13459 mapped:4432 pagetables:6766 bounce:0
[ 1028.453622] Node 0 DMA free:2000kB min:84kB low:104kB high:124kB active_anon:5520kB inactive_anon:5776kB active_file:0kB inactive_file:84kB unevictable:0kB present:15164kB pages_scanned:16704 all_unreclaimable? no
[ 1028.472548] lowmem_reserve[]: 0 483 483 483
[ 1028.476811] Node 0 DMA32 free:2672kB min:2768kB low:3460kB high:4152kB active_anon:159368kB inactive_anon:186136kB active_file:4060kB inactive_file:16512kB unevictable:16kB present:495008kB pages_scanned:566633 all_unreclaimable? yes
[ 1028.497459] lowmem_reserve[]: 0 0 0 0
[ 1028.501203] Node 0 DMA: 56*4kB 0*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2000kB
[ 1028.512136] Node 0 DMA32: 278*4kB 3*8kB 4*16kB 8*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2672kB
[ 1028.523222] 64013 total pagecache pages
[ 1028.527049] 6900 pages in swap cache
[ 1028.530627] Swap cache stats: add 334539, delete 327639, find 36253/163064
[ 1028.537490] Free swap = 775384kB
[ 1028.540803] Total swap = 1048568kB
[ 1028.547522] 131072 pages RAM
[ 1028.550399] 9628 pages reserved
[ 1028.553550] 79539 pages shared
[ 1028.556607] 57450 pages non-shared
[ 1028.560008] Out of memory: kill process 3487 (run-many-x-apps) score 938661 or a child
[ 1028.567914] Killed process 4046 (gnometris)
[ 1162.209886] Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[ 1162.216441] Pid: 3272, comm: Xorg Not tainted 2.6.30-rc8-mm1 #312
[ 1162.222536] Call Trace:
[ 1162.224993] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1162.230485] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1162.236231] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1162.241461] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1162.247198] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1162.252677] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1162.259027] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1162.265027] [<ffffffff810c7409>] __get_free_pages+0x9/0x50
[ 1162.270599] [<ffffffff8110e402>] __pollwait+0xc2/0x100
[ 1162.275815] [<ffffffff81495903>] unix_poll+0x23/0xc0
[ 1162.280860] [<ffffffff81419ac8>] sock_poll+0x18/0x20
[ 1162.285907] [<ffffffff8110d9a9>] do_select+0x3e9/0x730
[ 1162.291129] [<ffffffff8110d5c0>] ? do_select+0x0/0x730
[ 1162.296349] [<ffffffff8110e340>] ? __pollwait+0x0/0x100
[ 1162.301659] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.306706] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.311748] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.316792] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.321840] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.326886] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.331933] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.336979] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.342029] [<ffffffff8110e440>] ? pollwake+0x0/0x60
[ 1162.347071] [<ffffffff8110deef>] core_sys_select+0x1ff/0x330
[ 1162.352807] [<ffffffff8110dd38>] ? core_sys_select+0x48/0x330
[ 1162.358644] [<ffffffffa014954c>] ? i915_gem_throttle_ioctl+0x4c/0x60 [i915]
[ 1162.365687] [<ffffffff81079ebd>] ? trace_hardirqs_on+0xd/0x10
[ 1162.371511] [<ffffffff810706cc>] ? getnstimeofday+0x5c/0xf0
[ 1162.377161] [<ffffffff8106acb9>] ? ktime_get_ts+0x59/0x60
[ 1162.382641] [<ffffffff8110e27a>] sys_select+0x4a/0x110
[ 1162.387863] [<ffffffff8100bf42>] system_call_fastpath+0x16/0x1b
[ 1162.393865] Mem-Info:
[ 1162.396132] Node 0 DMA per-cpu:
[ 1162.399294] CPU 0: hi: 0, btch: 1 usd: 0
[ 1162.404076] CPU 1: hi: 0, btch: 1 usd: 0
[ 1162.408858] Node 0 DMA32 per-cpu:
[ 1162.412185] CPU 0: hi: 186, btch: 31 usd: 161
[ 1162.416972] CPU 1: hi: 186, btch: 31 usd: 182
[ 1162.421762] Active_anon:42731 active_file:740 inactive_anon:48110
[ 1162.421763] inactive_file:2851 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1162.421764] free:1174 slab:13321 mapped:3702 pagetables:6595 bounce:0
[ 1162.441384] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5552kB inactive_anon:5812kB active_file:0kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:9376 all_unreclaimable? no
[ 1162.460128] lowmem_reserve[]: 0 483 483 483
[ 1162.464392] Node 0 DMA32 free:2688kB min:2768kB low:3460kB high:4152kB active_anon:165372kB inactive_anon:186628kB active_file:2960kB inactive_file:11404kB unevictable:16kB present:495008kB pages_scanned:675382 all_unreclaimable? yes
[ 1162.485048] lowmem_reserve[]: 0 0 0 0
[ 1162.488797] Node 0 DMA: 56*4kB 1*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2008kB
[ 1162.499720] Node 0 DMA32: 274*4kB 3*8kB 8*16kB 7*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2688kB
[ 1162.510803] 62374 total pagecache pages
[ 1162.514635] 6690 pages in swap cache
[ 1162.518210] Swap cache stats: add 344648, delete 337958, find 37585/169560
[ 1162.525071] Free swap = 796012kB
[ 1162.528385] Total swap = 1048568kB
[ 1162.535461] 131072 pages RAM
[ 1162.538352] 9628 pages reserved
[ 1162.541490] 73953 pages shared
[ 1162.544536] 58149 pages non-shared
[ 1162.547940] Out of memory: kill process 3487 (run-many-x-apps) score 918444 or a child
[ 1162.555846] Killed process 4079 (gnect)
[ 1162.634031] /usr/games/gnom invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1162.641791] Pid: 4259, comm: /usr/games/gnom Not tainted 2.6.30-rc8-mm1 #312
[ 1162.648843] Call Trace:
[ 1162.651302] [<ffffffff81545006>] ? _spin_unlock+0x26/0x30
[ 1162.656786] [<ffffffff810c37cc>] oom_kill_process+0xdc/0x270
[ 1162.662531] [<ffffffff810c3b2f>] ? badness+0x18f/0x300
[ 1162.667761] [<ffffffff810c3dd5>] __out_of_memory+0x135/0x170
[ 1162.673511] [<ffffffff810c3f05>] out_of_memory+0xf5/0x180
[ 1162.678995] [<ffffffff810c857c>] __alloc_pages_nodemask+0x6ac/0x6c0
[ 1162.685345] [<ffffffff810f3ea8>] alloc_pages_current+0x78/0x100
[ 1162.691347] [<ffffffff810c0c7b>] __page_cache_alloc+0xb/0x10
[ 1162.697086] [<ffffffff810ca910>] __do_page_cache_readahead+0x120/0x240
[ 1162.703701] [<ffffffff810ca8a2>] ? __do_page_cache_readahead+0xb2/0x240
[ 1162.710401] [<ffffffff810caa4c>] ra_submit+0x1c/0x20
[ 1162.715446] [<ffffffff810c1497>] filemap_fault+0x3f7/0x400
[ 1162.721012] [<ffffffff810d9893>] __do_fault+0x53/0x510
[ 1162.726236] [<ffffffff81271ce0>] ? __down_read_trylock+0x20/0x60
[ 1162.732333] [<ffffffff810dc4a9>] handle_mm_fault+0x1c9/0x500
[ 1162.738088] [<ffffffff81548274>] do_page_fault+0x1c4/0x330
[ 1162.743659] [<ffffffff81545a95>] page_fault+0x25/0x30
[ 1162.748793] Mem-Info:
[ 1162.751069] Node 0 DMA per-cpu:
[ 1162.754231] CPU 0: hi: 0, btch: 1 usd: 0
[ 1162.759021] CPU 1: hi: 0, btch: 1 usd: 0
[ 1162.763812] Node 0 DMA32 per-cpu:
[ 1162.767147] CPU 0: hi: 186, btch: 31 usd: 90
[ 1162.771930] CPU 1: hi: 186, btch: 31 usd: 89
[ 1162.776719] Active_anon:42484 active_file:760 inactive_anon:48078
[ 1162.776721] inactive_file:3351 unevictable:4 dirty:0 writeback:0 unstable:0
[ 1162.776722] free:1174 slab:13329 mapped:3807 pagetables:6487 bounce:0
[ 1162.796351] Node 0 DMA free:2008kB min:84kB low:104kB high:124kB active_anon:5532kB inactive_anon:5812kB active_file:4kB inactive_file:0kB unevictable:0kB present:15164kB pages_scanned:1408 all_unreclaimable? no
[ 1162.815110] lowmem_reserve[]: 0 483 483 483
[ 1162.819378] Node 0 DMA32 free:2688kB min:2768kB low:3460kB high:4152kB active_anon:164404kB inactive_anon:186500kB active_file:3036kB inactive_file:13404kB unevictable:16kB present:495008kB pages_scanned:40768 all_unreclaimable? no
[ 1162.839863] lowmem_reserve[]: 0 0 0 0
[ 1162.843612] Node 0 DMA: 57*4kB 1*8kB 11*16kB 2*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2012kB
[ 1162.854539] Node 0 DMA32: 274*4kB 4*8kB 8*16kB 7*32kB 1*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2696kB
[ 1162.865631] 62784 total pagecache pages
[ 1162.869465] 6595 pages in swap cache
[ 1162.873034] Swap cache stats: add 344648, delete 338053, find 37585/169561
[ 1162.879901] Free swap = 802992kB
[ 1162.883222] Total swap = 1048568kB
[ 1162.891314] 131072 pages RAM
[ 1162.894216] 9628 pages reserved
[ 1162.897365] 74036 pages shared
[ 1162.900414] 58276 pages non-shared
[ 1162.903825] Out of memory: kill process 3487 (run-many-x-apps) score 890891 or a child
[ 1162.911747] Killed process 4113 (gtali)


Thanks,
Fengguang

2009-06-18 09:29:59

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

Johannes,

On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 9 Jun 2009 21:01:28 +0200
> > Johannes Weiner <[email protected]> wrote:
> > > [resend with lists cc'd, sorry]
> > >
> > > +static int swap_readahead_ptes(struct mm_struct *mm,

I suspect the previous unfavorable results are due to comparing things
with/without the drm vmalloc patch. So I spent one day redo the whole
comparisons. The swap readahead patch shows neither big improvements
nor big degradations this time.

Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.

a) base kernel
b) base kernel + VM_EXEC protection
c) base kernel + VM_EXEC protection + swap readahead

(a) (b) (c)
0.02 0.02 0.01 N xeyes
0.78 0.92 0.77 N firefox
2.03 2.20 1.97 N nautilus
3.27 3.35 3.39 N nautilus --browser
5.10 5.28 4.99 N gthumb
6.74 7.06 6.64 N gedit
8.70 8.82 8.47 N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
11.05 10.95 10.94 N
13.03 12.72 12.79 N xterm
15.46 15.09 15.10 N mlterm
18.05 17.31 17.51 N gnome-terminal
20.59 19.90 19.98 N urxvt
23.45 22.82 22.67 N
25.74 25.16 24.96 N gnome-system-monitor
28.87 27.53 27.89 N gnome-help
32.37 31.17 31.89 N gnome-dictionary
36.60 35.18 35.16 N
39.76 38.04 37.64 N /usr/games/sol
43.05 42.17 40.33 N /usr/games/gnometris
47.70 47.08 43.48 N /usr/games/gnect
51.64 50.46 47.24 N /usr/games/gtali
56.26 54.58 50.83 N /usr/games/iagno
60.36 58.01 55.15 N /usr/games/gnotravex
65.79 62.92 59.28 N /usr/games/mahjongg
71.59 67.36 65.95 N /usr/games/gnome-sudoku
78.57 72.32 72.60 N /usr/games/glines
84.25 80.03 77.42 N /usr/games/glchess
90.65 88.11 83.66 N /usr/games/gnomine
97.75 95.13 89.38 N /usr/games/gnotski
102.99 101.59 95.05 N /usr/games/gnibbles
110.68 112.05 109.40 N /usr/games/gnobots2
117.23 121.58 120.05 N /usr/games/blackjack
125.15 133.59 130.91 N /usr/games/same-gnome
134.05 151.99 148.91 N
142.57 162.67 165.00 N /usr/bin/gnome-window-properties
156.29 174.54 183.84 N /usr/bin/gnome-default-applications-properties
168.37 190.38 200.99 N /usr/bin/gnome-at-properties
184.80 209.41 230.82 N /usr/bin/gnome-typing-monitor
202.05 226.52 250.02 N /usr/bin/gnome-at-visual
217.60 243.76 272.91 N /usr/bin/gnome-sound-properties
239.78 266.47 308.74 N /usr/bin/gnome-at-mobility
255.23 285.42 338.51 N /usr/bin/gnome-keybinding-properties
276.85 314.84 374.64 N /usr/bin/gnome-about-me
308.51 355.95 419.78 N /usr/bin/gnome-display-properties
341.27 401.22 463.55 N /usr/bin/gnome-network-preferences
393.42 451.27 517.24 N /usr/bin/gnome-mouse-properties
438.48 510.54 574.64 N /usr/bin/gnome-appearance-properties
616.09 671.44 760.49 N /usr/bin/gnome-control-center
879.69 879.45 918.87 N /usr/bin/gnome-keyboard-properties
1159.47 1076.29 1071.65 N
1701.82 1240.47 1280.77 N : oocalc
1921.14 1446.95 1451.82 N : oodraw
2262.40 1572.95 1698.37 N : ooimpress
2703.88 1714.53 1841.89 N : oomath
3464.54 1864.99 1983.96 N : ooweb
4040.91 2079.96 2185.53 N : oowriter
4668.16 2330.24 2365.17 N

Thanks,
Fengguang

2009-06-18 13:05:27

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > > (even worse this time):
> > > >
> > > > Thanks for doing the tests. Do you know if the time difference comes
> > > > from IO or CPU time?
> > > >
> > > > Because one reason I could think of is that the original code walks
> > > > the readaround window in two directions, starting from the target each
> > > > time but immediately stops when it encounters a hole where the new
> > > > code just skips holes but doesn't abort readaround and thus might
> > > > indeed read more slots.
> > > >
> > > > I have an old patch flying around that changed the physical ra code to
> > > > use a bitmap that is able to represent holes. If the increased time
> > > > is waiting for IO, I would be interested if that patch has the same
> > > > negative impact.
> > >
> > > You can send me the patch :)
> >
> > Okay, attached is a rebase against latest -mmotm.
> >
> > > But for this patch it is IO bound. The CPU iowait field actually is
> > > going up as the test goes on:
> >
> > It's probably the larger ra window then which takes away the bandwidth
> > needed to load the new executables. This sucks. Would be nice to
> > have 'optional IO' for readahead that is dropped when normal-priority
> > IO requests are coming in... Oh, we have READA for bios. But it
> > doesn't seem to implement dropping requests on load (or I am blind).
>
> Hi Hannes,
>
> Sorry for the long delay! A bad news is that I get many oom with this patch:

Okay, evaluating this test-patch any further probably isn't worth it.
It's too aggressive, I think readahead is stealing pages reclaimed by
other allocations which in turn oom.

Back to the original problem: you detected increased latency for
launching new applications, so they get less share of the IO bandwidth
than without the patch.

I can see two reasons for this:

a) the new heuristics don't work out and we read more unrelated
pages than before

b) we readahead more pages in total as the old code would stop at
holes, as described above

We can verify a) by comparing major fault numbers between the two
kernels with your testload. If they increase with my patch, we
anticipate the wrong slots and every fault has do the reading itself.

b) seems to be a trade-off. After all, the IO resources you have less
for new applications in your test is the bandwidth that is used by
swapping applications. My qsbench numbers are a sign for this as the
only IO going on is swap.

Of course, the theory is not to improve swap performance by increasing
the readahead window but to choose better readahead candidates. So I
will run your tests and qsbench with a smaller page cluster and see if
this improves both loads.

Let me no if that doesn't make sense :)

Thanks a lot for all your efforts so far,

Hannes

2009-06-18 13:13:18

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 18, 2009 at 05:29:47PM +0800, Wu Fengguang wrote:
> Johannes,
>
> On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> > On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Tue, 9 Jun 2009 21:01:28 +0200
> > > Johannes Weiner <[email protected]> wrote:
> > > > [resend with lists cc'd, sorry]
> > > >
> > > > +static int swap_readahead_ptes(struct mm_struct *mm,
>
> I suspect the previous unfavorable results are due to comparing things
> with/without the drm vmalloc patch. So I spent one day redo the whole
> comparisons. The swap readahead patch shows neither big improvements
> nor big degradations this time.

Thanks again! Nice. So according to this, vswapra doesn't increase
other IO latency (much) but boosts ongoing swap loads (quite some) (as
qsbench showed). Is that a result or what! :)

I will see how the tests described in the other mail work out.

> Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.
>
> a) base kernel
> b) base kernel + VM_EXEC protection
> c) base kernel + VM_EXEC protection + swap readahead
>
> (a) (b) (c)
> 0.02 0.02 0.01 N xeyes
> 0.78 0.92 0.77 N firefox
> 2.03 2.20 1.97 N nautilus
> 3.27 3.35 3.39 N nautilus --browser
> 5.10 5.28 4.99 N gthumb
> 6.74 7.06 6.64 N gedit
> 8.70 8.82 8.47 N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
> 11.05 10.95 10.94 N
> 13.03 12.72 12.79 N xterm
> 15.46 15.09 15.10 N mlterm
> 18.05 17.31 17.51 N gnome-terminal
> 20.59 19.90 19.98 N urxvt
> 23.45 22.82 22.67 N
> 25.74 25.16 24.96 N gnome-system-monitor
> 28.87 27.53 27.89 N gnome-help
> 32.37 31.17 31.89 N gnome-dictionary
> 36.60 35.18 35.16 N
> 39.76 38.04 37.64 N /usr/games/sol
> 43.05 42.17 40.33 N /usr/games/gnometris
> 47.70 47.08 43.48 N /usr/games/gnect
> 51.64 50.46 47.24 N /usr/games/gtali
> 56.26 54.58 50.83 N /usr/games/iagno
> 60.36 58.01 55.15 N /usr/games/gnotravex
> 65.79 62.92 59.28 N /usr/games/mahjongg
> 71.59 67.36 65.95 N /usr/games/gnome-sudoku
> 78.57 72.32 72.60 N /usr/games/glines
> 84.25 80.03 77.42 N /usr/games/glchess
> 90.65 88.11 83.66 N /usr/games/gnomine
> 97.75 95.13 89.38 N /usr/games/gnotski
> 102.99 101.59 95.05 N /usr/games/gnibbles
> 110.68 112.05 109.40 N /usr/games/gnobots2
> 117.23 121.58 120.05 N /usr/games/blackjack
> 125.15 133.59 130.91 N /usr/games/same-gnome
> 134.05 151.99 148.91 N
> 142.57 162.67 165.00 N /usr/bin/gnome-window-properties
> 156.29 174.54 183.84 N /usr/bin/gnome-default-applications-properties
> 168.37 190.38 200.99 N /usr/bin/gnome-at-properties
> 184.80 209.41 230.82 N /usr/bin/gnome-typing-monitor
> 202.05 226.52 250.02 N /usr/bin/gnome-at-visual
> 217.60 243.76 272.91 N /usr/bin/gnome-sound-properties
> 239.78 266.47 308.74 N /usr/bin/gnome-at-mobility
> 255.23 285.42 338.51 N /usr/bin/gnome-keybinding-properties
> 276.85 314.84 374.64 N /usr/bin/gnome-about-me
> 308.51 355.95 419.78 N /usr/bin/gnome-display-properties
> 341.27 401.22 463.55 N /usr/bin/gnome-network-preferences
> 393.42 451.27 517.24 N /usr/bin/gnome-mouse-properties
> 438.48 510.54 574.64 N /usr/bin/gnome-appearance-properties
> 616.09 671.44 760.49 N /usr/bin/gnome-control-center
> 879.69 879.45 918.87 N /usr/bin/gnome-keyboard-properties
> 1159.47 1076.29 1071.65 N
> 1701.82 1240.47 1280.77 N : oocalc
> 1921.14 1446.95 1451.82 N : oodraw
> 2262.40 1572.95 1698.37 N : ooimpress
> 2703.88 1714.53 1841.89 N : oomath
> 3464.54 1864.99 1983.96 N : ooweb
> 4040.91 2079.96 2185.53 N : oowriter
> 4668.16 2330.24 2365.17 N
>
> Thanks,
> Fengguang
>

2009-06-19 03:17:30

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 18, 2009 at 09:09:34PM +0800, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:29:47PM +0800, Wu Fengguang wrote:
> > Johannes,
> >
> > On Thu, Jun 18, 2009 at 06:41:49AM +0800, Johannes Weiner wrote:
> > > On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Tue, 9 Jun 2009 21:01:28 +0200
> > > > Johannes Weiner <[email protected]> wrote:
> > > > > [resend with lists cc'd, sorry]
> > > > >
> > > > > +static int swap_readahead_ptes(struct mm_struct *mm,
> >
> > I suspect the previous unfavorable results are due to comparing things
> > with/without the drm vmalloc patch. So I spent one day redo the whole
> > comparisons. The swap readahead patch shows neither big improvements
> > nor big degradations this time.
>
> Thanks again! Nice. So according to this, vswapra doesn't increase
> other IO latency (much) but boosts ongoing swap loads (quite some) (as
> qsbench showed). Is that a result or what! :)
>
> I will see how the tests described in the other mail work out.

And here are the /proc/vmstat contents after each test run :)

The pswpin number goes down radically in case (c) which seems
illogical.

pgpgin 8898235 pgpgin 4828771 pgpgin 1807731
pgpgout 1806868 pgpgout 1463644 pgpgout 1382244
==> pswpin 2222503 pswpin 1205137 pswpin 449877
pswpout 451716 pswpout 365910 pswpout 345560
pgalloc_dma 39883 pgalloc_dma 24343 pgalloc_dma 3547
pgalloc_dma32 11918819 pgalloc_dma32 6810775 pgalloc_dma32 6387602
pgalloc_normal 0 pgalloc_normal 0 pgalloc_normal 0
pgalloc_movable 0 pgalloc_movable 0 pgalloc_movable 0
pgfree 11961651 pgfree 6837658 pgfree 6396229
pgactivate 5771012 pgactivate 2999101 pgactivate 2341219
pgdeactivate 5909300 pgdeactivate 3140474 pgdeactivate 2481319
pgfault 4536082 pgfault 3468555 pgfault 3589046
==> pgmajfault 926383 pgmajfault 506265 pgmajfault 520010

Thanks,
Fengguang

> > Base kernel is 2.6.30-rc8-mm1 with drm vmalloc patch.
> >
> > a) base kernel
> > b) base kernel + VM_EXEC protection
> > c) base kernel + VM_EXEC protection + swap readahead
> >
> > (a) (b) (c)
> > 0.02 0.02 0.01 N xeyes
> > 0.78 0.92 0.77 N firefox
> > 2.03 2.20 1.97 N nautilus
> > 3.27 3.35 3.39 N nautilus --browser
> > 5.10 5.28 4.99 N gthumb
> > 6.74 7.06 6.64 N gedit
> > 8.70 8.82 8.47 N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
> > 11.05 10.95 10.94 N
> > 13.03 12.72 12.79 N xterm
> > 15.46 15.09 15.10 N mlterm
> > 18.05 17.31 17.51 N gnome-terminal
> > 20.59 19.90 19.98 N urxvt
> > 23.45 22.82 22.67 N
> > 25.74 25.16 24.96 N gnome-system-monitor
> > 28.87 27.53 27.89 N gnome-help
> > 32.37 31.17 31.89 N gnome-dictionary
> > 36.60 35.18 35.16 N
> > 39.76 38.04 37.64 N /usr/games/sol
> > 43.05 42.17 40.33 N /usr/games/gnometris
> > 47.70 47.08 43.48 N /usr/games/gnect
> > 51.64 50.46 47.24 N /usr/games/gtali
> > 56.26 54.58 50.83 N /usr/games/iagno
> > 60.36 58.01 55.15 N /usr/games/gnotravex
> > 65.79 62.92 59.28 N /usr/games/mahjongg
> > 71.59 67.36 65.95 N /usr/games/gnome-sudoku
> > 78.57 72.32 72.60 N /usr/games/glines
> > 84.25 80.03 77.42 N /usr/games/glchess
> > 90.65 88.11 83.66 N /usr/games/gnomine
> > 97.75 95.13 89.38 N /usr/games/gnotski
> > 102.99 101.59 95.05 N /usr/games/gnibbles
> > 110.68 112.05 109.40 N /usr/games/gnobots2
> > 117.23 121.58 120.05 N /usr/games/blackjack
> > 125.15 133.59 130.91 N /usr/games/same-gnome
> > 134.05 151.99 148.91 N
> > 142.57 162.67 165.00 N /usr/bin/gnome-window-properties
> > 156.29 174.54 183.84 N /usr/bin/gnome-default-applications-properties
> > 168.37 190.38 200.99 N /usr/bin/gnome-at-properties
> > 184.80 209.41 230.82 N /usr/bin/gnome-typing-monitor
> > 202.05 226.52 250.02 N /usr/bin/gnome-at-visual
> > 217.60 243.76 272.91 N /usr/bin/gnome-sound-properties
> > 239.78 266.47 308.74 N /usr/bin/gnome-at-mobility
> > 255.23 285.42 338.51 N /usr/bin/gnome-keybinding-properties
> > 276.85 314.84 374.64 N /usr/bin/gnome-about-me
> > 308.51 355.95 419.78 N /usr/bin/gnome-display-properties
> > 341.27 401.22 463.55 N /usr/bin/gnome-network-preferences
> > 393.42 451.27 517.24 N /usr/bin/gnome-mouse-properties
> > 438.48 510.54 574.64 N /usr/bin/gnome-appearance-properties
> > 616.09 671.44 760.49 N /usr/bin/gnome-control-center
> > 879.69 879.45 918.87 N /usr/bin/gnome-keyboard-properties
> > 1159.47 1076.29 1071.65 N
> > 1701.82 1240.47 1280.77 N : oocalc
> > 1921.14 1446.95 1451.82 N : oodraw
> > 2262.40 1572.95 1698.37 N : ooimpress
> > 2703.88 1714.53 1841.89 N : oomath
> > 3464.54 1864.99 1983.96 N : ooweb
> > 4040.91 2079.96 2185.53 N : oowriter
> > 4668.16 2330.24 2365.17 N
> >
> > Thanks,
> > Fengguang
> >


Attachments:
(No filename) (6.04 kB)
vmstat.0 (1.41 kB)
vmstat.1 (1.40 kB)
vmstat.2 (1.40 kB)
Download all attachments

2009-06-19 03:30:26

by Fengguang Wu

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Thu, Jun 18, 2009 at 09:01:21PM +0800, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> > On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > > Unfortunately, after fixing it up the swap readahead patch still performs slow
> > > > > > (even worse this time):
> > > > >
> > > > > Thanks for doing the tests. Do you know if the time difference comes
> > > > > from IO or CPU time?
> > > > >
> > > > > Because one reason I could think of is that the original code walks
> > > > > the readaround window in two directions, starting from the target each
> > > > > time but immediately stops when it encounters a hole where the new
> > > > > code just skips holes but doesn't abort readaround and thus might
> > > > > indeed read more slots.
> > > > >
> > > > > I have an old patch flying around that changed the physical ra code to
> > > > > use a bitmap that is able to represent holes. If the increased time
> > > > > is waiting for IO, I would be interested if that patch has the same
> > > > > negative impact.
> > > >
> > > > You can send me the patch :)
> > >
> > > Okay, attached is a rebase against latest -mmotm.
> > >
> > > > But for this patch it is IO bound. The CPU iowait field actually is
> > > > going up as the test goes on:
> > >
> > > It's probably the larger ra window then which takes away the bandwidth
> > > needed to load the new executables. This sucks. Would be nice to
> > > have 'optional IO' for readahead that is dropped when normal-priority
> > > IO requests are coming in... Oh, we have READA for bios. But it
> > > doesn't seem to implement dropping requests on load (or I am blind).
> >
> > Hi Hannes,
> >
> > Sorry for the long delay! A bad news is that I get many oom with this patch:
>
> Okay, evaluating this test-patch any further probably isn't worth it.
> It's too aggressive, I think readahead is stealing pages reclaimed by
> other allocations which in turn oom.

OK.

> Back to the original problem: you detected increased latency for
> launching new applications, so they get less share of the IO bandwidth

There are no "launch new app" phase. The test flow works like:

for all apps {
for all started apps {
activate its GUI window
}
start one new app
}

But yes, as time goes by, the test becomes more and more about
switching between existing windows under high memory pressure.

> than without the patch.
>
> I can see two reasons for this:
>
> a) the new heuristics don't work out and we read more unrelated
> pages than before
>
> b) we readahead more pages in total as the old code would stop at
> holes, as described above
>
> We can verify a) by comparing major fault numbers between the two

Plus pswpin numbers :) I found it significantly decreased when we do
pte swap readahead.. See another email.

> kernels with your testload. If they increase with my patch, we
> anticipate the wrong slots and every fault has do the reading itself.
>
> b) seems to be a trade-off. After all, the IO resources you have less
> for new applications in your test is the bandwidth that is used by
> swapping applications. My qsbench numbers are a sign for this as the
> only IO going on is swap.
>
> Of course, the theory is not to improve swap performance by increasing
> the readahead window but to choose better readahead candidates. So I
> will run your tests and qsbench with a smaller page cluster and see if
> this improves both loads.

The general principle is, any non sector number based readahead should
be really accurate in order to be a net gain. Because each readahead
page miss will lead to one disk seek, which is much more costly than
wasting a memory page.

Thanks,
Fengguang

2009-06-21 18:07:21

by Hugh Dickins

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

Hi Hannes,

On Thu, 18 Jun 2009, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
>
> Okay, evaluating this test-patch any further probably isn't worth it.
> It's too aggressive, I think readahead is stealing pages reclaimed by
> other allocations which in turn oom.
>
> Back to the original problem: you detected increased latency for
> launching new applications, so they get less share of the IO bandwidth
> than without the patch.
>
> I can see two reasons for this:
>
> a) the new heuristics don't work out and we read more unrelated
> pages than before
>
> b) we readahead more pages in total as the old code would stop at
> holes, as described above
>
> We can verify a) by comparing major fault numbers between the two
> kernels with your testload. If they increase with my patch, we
> anticipate the wrong slots and every fault has do the reading itself.
>
> b) seems to be a trade-off. After all, the IO resources you have less
> for new applications in your test is the bandwidth that is used by
> swapping applications. My qsbench numbers are a sign for this as the
> only IO going on is swap.
>
> Of course, the theory is not to improve swap performance by increasing
> the readahead window but to choose better readahead candidates. So I
> will run your tests and qsbench with a smaller page cluster and see if
> this improves both loads.

Hmm, sounds rather pessimistic; but I've not decided about it either.

May I please hand over to you this collection of adjustments to your
v3 virtual swap readahead patch, for you to merge in or split up or
mess around with, generally take ownership of, however you wish?
So you can keep adjusting shmem.c to match memory.c if necessary.

I still think your method looks a very good idea, though results have
not yet convinced me that it necessarily works out better in practice;
and I probably won't be looking at it again for a while.

The base for this patch was 2.6.30 + your v3.

* shmem_getpage() call shmem_swap_cluster() to collect vector of swap
entries for shmem_swapin(), while we still have them kmap'ped.

* Variable-sized arrays on stack are not popular: I forget whether the
kernel build still supports any gccs which can't manage them, but they
do obscure stack usage, and shmem_getpage is already a suspect for that
(because of the pseudo-vma usage which I hope to remove): should be fine
while you're experimenting, but in the end let's define PAGE_CLUSTER_MAX.

* Fix "> pmax" in swapin_readahead() to ">= pmax": of course this is
only a heuristic, so it wasn't accusably wrong; but we are trying for
a particular range, so it's right to reject < pmin and >= pmax there.

* Kamezawa-san's two one-liners to swap_readahead_ptes(), of course.

* Delete valid_swaphandles() once it's unused (though I can imagine a
useful test patch in which we could switch between old and new methods).

* swapin_readahead() was always poorly named: while you're changing its
behaviour, let's take the opportunity to rename it swapin_readaround();
yes, that triviality would be better as a separate patch.

Signed-off-by: Hugh Dickins <[email protected]>
---

include/linux/mm.h | 6 ++++
include/linux/swap.h | 5 +--
kernel/sysctl.c | 2 -
mm/memory.c | 16 ++++++------
mm/shmem.c | 47 +++++++++++++++++++++++++++++++++----
mm/swap_state.c | 46 +++---------------------------------
mm/swapfile.c | 52 -----------------------------------------
71 files changed, 64 insertions(+), 110 deletions(-)

--- 2.6.30-hv3/include/linux/mm.h 2009-06-10 04:05:27.000000000 +0100
+++ 2.6.30-hv4/include/linux/mm.h 2009-06-21 14:59:27.000000000 +0100
@@ -26,6 +26,12 @@ extern unsigned long max_mapnr;

extern unsigned long num_physpages;
extern void * high_memory;
+
+/*
+ * page_cluster limits swapin_readaround: tuned by /proc/sys/vm/page-cluster
+ * 1 << page_cluster is the maximum number of pages which may be read
+ */
+#define PAGE_CLUSTER_MAX 5
extern int page_cluster;

#ifdef CONFIG_SYSCTL
--- 2.6.30-hv3/include/linux/swap.h 2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/include/linux/swap.h 2009-06-21 14:59:27.000000000 +0100
@@ -291,7 +291,7 @@ extern void free_pages_and_swap_cache(st
extern struct page *lookup_swap_cache(swp_entry_t);
extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr);
-extern struct page *swapin_readahead(swp_entry_t, gfp_t,
+extern struct page *swapin_readaround(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr,
swp_entry_t *entries, int nr_entries,
unsigned long cluster);
@@ -303,7 +303,6 @@ extern void si_swapinfo(struct sysinfo *
extern swp_entry_t get_swap_page(void);
extern swp_entry_t get_swap_page_of_type(int);
extern int swap_duplicate(swp_entry_t);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
extern void swap_free(swp_entry_t);
extern int free_swap_and_cache(swp_entry_t);
extern int swap_type_of(dev_t, sector_t, struct block_device **);
@@ -378,7 +377,7 @@ static inline void swap_free(swp_entry_t
{
}

-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
+static inline struct page *swapin_readaround(swp_entry_t swp, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr)
{
return NULL;
--- 2.6.30-hv3/kernel/sysctl.c 2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/kernel/sysctl.c 2009-06-21 14:59:27.000000000 +0100
@@ -112,7 +112,7 @@ static int min_percpu_pagelist_fract = 8

static int ngroups_max = NGROUPS_MAX;

-static int page_cluster_max = 5;
+static int page_cluster_max = PAGE_CLUSTER_MAX;

#ifdef CONFIG_MODULES
extern char modprobe_path[];
--- 2.6.30-hv3/mm/memory.c 2009-06-21 14:55:44.000000000 +0100
+++ 2.6.30-hv4/mm/memory.c 2009-06-21 14:59:27.000000000 +0100
@@ -2440,9 +2440,9 @@ int vmtruncate_range(struct inode *inode
}

/*
- * The readahead window is the virtual area around the faulting page,
+ * The readaround window is the virtual area around the faulting page,
* where the physical proximity of the swap slots is taken into
- * account as well in swapin_readahead().
+ * account as well in swapin_readaround().
*
* While the swap allocation algorithm tries to keep LRU-related pages
* together on the swap backing, it is not reliable on heavy thrashing
@@ -2455,7 +2455,7 @@ int vmtruncate_range(struct inode *inode
* By taking both aspects into account, we get a good approximation of
* which pages are sensible to read together with the faulting one.
*/
-static int swap_readahead_ptes(struct mm_struct *mm,
+static int swap_readaround_ptes(struct mm_struct *mm,
unsigned long addr, pmd_t *pmd,
swp_entry_t *entries,
unsigned long cluster)
@@ -2467,7 +2467,7 @@ static int swap_readahead_ptes(struct mm

window = cluster << PAGE_SHIFT;
min = addr & ~(window - 1);
- max = min + cluster;
+ max = min + window;
/*
* To keep the locking/highpte mapping simple, stay
* within the PTE range of one PMD entry.
@@ -2478,7 +2478,7 @@ static int swap_readahead_ptes(struct mm
limit = pmd_addr_end(addr, max);
if (limit < max)
max = limit;
- limit = max - min;
+ limit = (max - min) >> PAGE_SHIFT;
ptep = pte_offset_map_lock(mm, pmd, min, &ptl);
for (i = nr = 0; i < limit; i++)
if (is_swap_pte(ptep[i]))
@@ -2515,11 +2515,11 @@ static int do_swap_page(struct mm_struct
page = lookup_swap_cache(entry);
if (!page) {
int nr, cluster = 1 << page_cluster;
- swp_entry_t entries[cluster];
+ swp_entry_t entries[1 << PAGE_CLUSTER_MAX];

grab_swap_token(); /* Contend for token _before_ read-in */
- nr = swap_readahead_ptes(mm, address, pmd, entries, cluster);
- page = swapin_readahead(entry,
+ nr = swap_readaround_ptes(mm, address, pmd, entries, cluster);
+ page = swapin_readaround(entry,
GFP_HIGHUSER_MOVABLE, vma, address,
entries, nr, cluster);
if (!page) {
--- 2.6.30-hv3/mm/shmem.c 2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/mm/shmem.c 2009-06-21 14:59:27.000000000 +0100
@@ -1134,7 +1134,8 @@ static struct mempolicy *shmem_get_sbmpo
#endif /* CONFIG_TMPFS */

static struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
- struct shmem_inode_info *info, unsigned long idx)
+ struct shmem_inode_info *info, unsigned long idx,
+ swp_entry_t *entries, int nr_entries, unsigned long cluster)
{
struct mempolicy mpol, *spol;
struct vm_area_struct pvma;
@@ -1148,7 +1149,8 @@ static struct page *shmem_swapin(swp_ent
pvma.vm_pgoff = idx;
pvma.vm_ops = NULL;
pvma.vm_policy = spol;
- page = swapin_readahead(entry, gfp, &pvma, 0, NULL, 0, 0);
+ page = swapin_readaround(entry, gfp, &pvma, 0,
+ entries, nr_entries, cluster);
return page;
}

@@ -1176,9 +1178,11 @@ static inline void shmem_show_mpol(struc
#endif /* CONFIG_TMPFS */

static inline struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
- struct shmem_inode_info *info, unsigned long idx)
+ struct shmem_inode_info *info, unsigned long idx,
+ swp_entry_t *entries, int nr_entries, unsigned long cluster)
{
- return swapin_readahead(entry, gfp, NULL, 0, NULL, 0, 0);
+ return swapin_readaround(entry, gfp, NULL, 0,
+ entries, nr_entries, cluster);
}

static inline struct page *shmem_alloc_page(gfp_t gfp,
@@ -1195,6 +1199,33 @@ static inline struct mempolicy *shmem_ge
}
#endif

+static int shmem_swap_cluster(swp_entry_t *entry, unsigned long idx,
+ swp_entry_t *entries, unsigned long cluster)
+{
+ unsigned long min, max, limit;
+ int i, nr;
+
+ limit = SHMEM_NR_DIRECT;
+ if (idx >= SHMEM_NR_DIRECT) {
+ idx -= SHMEM_NR_DIRECT;
+ idx %= ENTRIES_PER_PAGE;
+ limit = ENTRIES_PER_PAGE;
+ }
+
+ min = idx & ~(cluster - 1);
+ max = min + cluster;
+ if (max > limit)
+ max = limit;
+ entry -= (idx - min);
+ limit = max - min;
+
+ for (i = nr = 0; i < limit; i++) {
+ if (entry[i].val)
+ entries[nr++] = entry[i];
+ }
+ return nr;
+}
+
/*
* shmem_getpage - either get the page from swap or allocate a new one
*
@@ -1261,6 +1292,11 @@ repeat:
/* Look it up and read it in.. */
swappage = lookup_swap_cache(swap);
if (!swappage) {
+ int nr_entries, cluster = 1 << page_cluster;
+ swp_entry_t entries[1 << PAGE_CLUSTER_MAX];
+
+ nr_entries = shmem_swap_cluster(entry, idx,
+ entries, cluster);
shmem_swp_unmap(entry);
/* here we actually do the io */
if (type && !(*type & VM_FAULT_MAJOR)) {
@@ -1268,7 +1304,8 @@ repeat:
*type |= VM_FAULT_MAJOR;
}
spin_unlock(&info->lock);
- swappage = shmem_swapin(swap, gfp, info, idx);
+ swappage = shmem_swapin(swap, gfp, info, idx,
+ entries, nr_entries, cluster);
if (!swappage) {
spin_lock(&info->lock);
entry = shmem_swp_alloc(info, idx, sgp);
--- 2.6.30-hv3/mm/swap_state.c 2009-06-11 19:10:34.000000000 +0100
+++ 2.6.30-hv4/mm/swap_state.c 2009-06-21 14:59:27.000000000 +0100
@@ -325,58 +325,24 @@ struct page *read_swap_cache_async(swp_e
return found_page;
}

-/*
- * Primitive swap readahead code. We simply read an aligned block of
- * (1 << page_cluster) entries in the swap area. This method is chosen
- * because it doesn't cost us any seek time. We also make sure to queue
- * the 'original' request together with the readahead ones...
- */
-static struct page *swapin_readahead_phys(swp_entry_t entry, gfp_t gfp_mask,
- struct vm_area_struct *vma, unsigned long addr)
-{
- int nr_pages;
- struct page *page;
- unsigned long offset;
- unsigned long end_offset;
-
- /*
- * Get starting offset for readaround, and number of pages to read.
- * Adjust starting address by readbehind (for NUMA interleave case)?
- * No, it's very unlikely that swap layout would follow vma layout,
- * more likely that neighbouring swap pages came from the same node:
- * so use the same "addr" to choose the same node for each swap read.
- */
- nr_pages = valid_swaphandles(entry, &offset);
- for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
- /* Ok, do the async read-ahead now */
- page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
- gfp_mask, vma, addr);
- if (!page)
- break;
- page_cache_release(page);
- }
- lru_add_drain(); /* Push any new pages onto the LRU now */
- return read_swap_cache_async(entry, gfp_mask, vma, addr);
-}
-
/**
- * swapin_readahead - swap in pages in hope we need them soon
+ * swapin_readaround - swap in pages in hope we need them soon
* @entry: swap entry of this memory
* @gfp_mask: memory allocation flags
* @vma: user vma this address belongs to
* @addr: target address for mempolicy
* @entries: swap slots to consider reading
* @nr_entries: number of @entries
- * @cluster: readahead window size in swap slots
+ * @cluster: readaround window size in swap slots
*
* Returns the struct page for entry and addr, after queueing swapin.
*
* This has been extended to use the NUMA policies from the mm
- * triggering the readahead.
+ * triggering the readaround.
*
* Caller must hold down_read on the vma->vm_mm if vma is not NULL.
*/
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
+struct page *swapin_readaround(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr,
swp_entry_t *entries, int nr_entries,
unsigned long cluster)
@@ -384,8 +350,6 @@ struct page *swapin_readahead(swp_entry_
unsigned long pmin, pmax;
int i;

- if (!entries) /* XXX: shmem case */
- return swapin_readahead_phys(entry, gfp_mask, vma, addr);
pmin = swp_offset(entry) & ~(cluster - 1);
pmax = pmin + cluster;
for (i = 0; i < nr_entries; i++) {
@@ -394,7 +358,7 @@ struct page *swapin_readahead(swp_entry_

if (swp_type(swp) != swp_type(entry))
continue;
- if (swp_offset(swp) > pmax)
+ if (swp_offset(swp) >= pmax)
continue;
if (swp_offset(swp) < pmin)
continue;
--- 2.6.30-hv3/mm/swapfile.c 2009-03-23 23:12:14.000000000 +0000
+++ 2.6.30-hv4/mm/swapfile.c 2009-06-21 14:59:27.000000000 +0100
@@ -1984,55 +1984,3 @@ get_swap_info_struct(unsigned type)
{
return &swap_info[type];
}
-
-/*
- * swap_lock prevents swap_map being freed. Don't grab an extra
- * reference on the swaphandle, it doesn't matter if it becomes unused.
- */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
-{
- struct swap_info_struct *si;
- int our_page_cluster = page_cluster;
- pgoff_t target, toff;
- pgoff_t base, end;
- int nr_pages = 0;
-
- if (!our_page_cluster) /* no readahead */
- return 0;
-
- si = &swap_info[swp_type(entry)];
- target = swp_offset(entry);
- base = (target >> our_page_cluster) << our_page_cluster;
- end = base + (1 << our_page_cluster);
- if (!base) /* first page is swap header */
- base++;
-
- spin_lock(&swap_lock);
- if (end > si->max) /* don't go beyond end of map */
- end = si->max;
-
- /* Count contiguous allocated slots above our target */
- for (toff = target; ++toff < end; nr_pages++) {
- /* Don't read in free or bad pages */
- if (!si->swap_map[toff])
- break;
- if (si->swap_map[toff] == SWAP_MAP_BAD)
- break;
- }
- /* Count contiguous allocated slots below our target */
- for (toff = target; --toff >= base; nr_pages++) {
- /* Don't read in free or bad pages */
- if (!si->swap_map[toff])
- break;
- if (si->swap_map[toff] == SWAP_MAP_BAD)
- break;
- }
- spin_unlock(&swap_lock);
-
- /*
- * Indicate starting offset, and return number of pages to get:
- * if only 1, say 0, since there's then no readahead to be done.
- */
- *offset = ++toff;
- return nr_pages? ++nr_pages: 0;
-}

2009-06-21 18:41:19

by Johannes Weiner

[permalink] [raw]
Subject: Re: [patch v3] swap: virtual swap readahead

On Sun, Jun 21, 2009 at 07:07:03PM +0100, Hugh Dickins wrote:
> Hi Hannes,
>
> On Thu, 18 Jun 2009, Johannes Weiner wrote:
> > On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> >
> > Okay, evaluating this test-patch any further probably isn't worth it.
> > It's too aggressive, I think readahead is stealing pages reclaimed by
> > other allocations which in turn oom.
> >
> > Back to the original problem: you detected increased latency for
> > launching new applications, so they get less share of the IO bandwidth
> > than without the patch.
> >
> > I can see two reasons for this:
> >
> > a) the new heuristics don't work out and we read more unrelated
> > pages than before
> >
> > b) we readahead more pages in total as the old code would stop at
> > holes, as described above
> >
> > We can verify a) by comparing major fault numbers between the two
> > kernels with your testload. If they increase with my patch, we
> > anticipate the wrong slots and every fault has do the reading itself.
> >
> > b) seems to be a trade-off. After all, the IO resources you have less
> > for new applications in your test is the bandwidth that is used by
> > swapping applications. My qsbench numbers are a sign for this as the
> > only IO going on is swap.
> >
> > Of course, the theory is not to improve swap performance by increasing
> > the readahead window but to choose better readahead candidates. So I
> > will run your tests and qsbench with a smaller page cluster and see if
> > this improves both loads.
>
> Hmm, sounds rather pessimistic; but I've not decided about it either.

It seems the problem was not that real after all:

http://lkml.org/lkml/2009/6/18/109

> May I please hand over to you this collection of adjustments to your
> v3 virtual swap readahead patch, for you to merge in or split up or
> mess around with, generally take ownership of, however you wish?
> So you can keep adjusting shmem.c to match memory.c if necessary.

I will adopt them, thank you!

Hannes