2004-03-10 10:58:54

by Junichi Nomura

[permalink] [raw]
Subject: Re: [2.4] heavy-load under swap space shortage

--- linux/include/linux/swap.h 2004/02/19 04:12:39 1.1.1.26
+++ linux/include/linux/swap.h 2004/03/10 10:09:11
@@ -116,7 +116,7 @@ extern void swap_setup(void);
extern wait_queue_head_t kswapd_wait;
extern int FASTCALL(try_to_free_pages_zone(zone_t *, unsigned int));
extern int FASTCALL(try_to_free_pages(unsigned int));
-extern int vm_vfs_scan_ratio, vm_cache_scan_ratio, vm_lru_balance_ratio, vm_passes, vm_gfp_debug, vm_mapped_ratio;
+extern int vm_vfs_scan_ratio, vm_cache_scan_ratio, vm_lru_balance_ratio, vm_passes, vm_gfp_debug, vm_mapped_ratio, vm_anon_lru;

/* linux/mm/page_io.c */
extern void rw_swap_page(int, struct page *);
--- linux/include/linux/sysctl.h 2004/02/19 04:12:39 1.1.1.23
+++ linux/include/linux/sysctl.h 2004/03/10 10:09:11
@@ -156,6 +156,7 @@ enum
VM_MAPPED_RATIO=20, /* amount of unfreeable pages that triggers swapout */
VM_LAPTOP_MODE=21, /* kernel in laptop flush mode */
VM_BLOCK_DUMP=22, /* dump fs activity to log */
+ VM_ANON_LRU=23, /* immediately insert anon pages in the vm page lru */
};


--- linux/kernel/sysctl.c 2003/12/02 04:48:47 1.1.1.22
+++ linux/kernel/sysctl.c 2004/03/10 10:09:12
@@ -287,6 +287,8 @@ static ctl_table vm_table[] = {
&vm_cache_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
{VM_MAPPED_RATIO, "vm_mapped_ratio",
&vm_mapped_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
+ {VM_ANON_LRU, "vm_anon_lru",
+ &vm_anon_lru, sizeof(int), 0644, NULL, &proc_dointvec},
{VM_LRU_BALANCE_RATIO, "vm_lru_balance_ratio",
&vm_lru_balance_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
{VM_PASSES, "vm_passes",
--- linux/mm/memory.c 2003/12/02 04:48:47 1.1.1.31
+++ linux/mm/memory.c 2004/03/10 10:09:12
@@ -984,7 +984,8 @@ static int do_wp_page(struct mm_struct *
if (PageReserved(old_page))
++mm->rss;
break_cow(vma, new_page, address, page_table);
- lru_cache_add(new_page);
+ if (vm_anon_lru)
+ lru_cache_add(new_page);

/* Free the old page.. */
new_page = old_page;
@@ -1215,7 +1216,8 @@ static int do_anonymous_page(struct mm_s
mm->rss++;
flush_page_to_ram(page);
entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
- lru_cache_add(page);
+ if (vm_anon_lru)
+ lru_cache_add(page);
mark_page_accessed(page);
}

@@ -1270,7 +1272,8 @@ static int do_no_page(struct mm_struct *
}
copy_user_highpage(page, new_page, address);
page_cache_release(new_page);
- lru_cache_add(page);
+ if (vm_anon_lru)
+ lru_cache_add(page);
new_page = page;
}

--- linux/mm/vmscan.c 2004/02/19 04:12:33 1.1.1.32
+++ linux/mm/vmscan.c 2004/03/10 10:09:13
@@ -65,6 +65,27 @@ int vm_lru_balance_ratio = 2;
int vm_vfs_scan_ratio = 6;

/*
+ * "vm_anon_lru" selects whether to immediately insert anon pages
+ * in the lru. Immediately means as soon as they're allocated during
+ * the page faults.
+ *
+ * If this is set to 0, they're inserted only after the first
+ * swapout.
+ *
+ * Having anon pages immediately inserted in the lru allows the
+ * VM to know better when it's worthwhile to start swapping
+ * anonymous ram; it will start to swap earlier and it should
+ * swap smoother and faster, but it will decrease scalability
+ * on >16-way machines by an order of magnitude. Big SMP/NUMA
+ * definitely can't take a hit on a global spinlock at
+ * every anon page allocation; such machines want this off.
+ *
+ * Low-ram machines that swap all the time want to turn
+ * this on (i.e. leave it at the default of 1).
+ */
+int vm_anon_lru = 1;
+
+/*
* The swap-out function returns 1 if it successfully
* scanned all the pages it was asked to (`count').
* It returns zero if it couldn't do anything,


Attachments:
05_vm_22_vm-anon-lru-3_2.4.25.diff (3.55 kB)
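For anyone wanting to experiment with the knob this patch adds, it appears under the usual 2.4 VM sysctl paths (names taken from the ctl_table entry above; a sketch assuming a kernel with the patch applied):

```shell
# Read the current setting (1 = insert anon pages into the LRU at fault time)
cat /proc/sys/vm/vm_anon_lru

# Keep immediate insertion on, e.g. on a low-memory machine that swaps a lot
echo 1 > /proc/sys/vm/vm_anon_lru

# Or turn it off on a big SMP/NUMA box, via sysctl(8)
sysctl -w vm.vm_anon_lru=0
```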

2004-03-14 19:49:15

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [2.4] heavy-load under swap space shortage


Hi kernel colleagues,

At first I was skeptical about including this patch in v2.4 (due to the
freeze), but after thinking a bit more about it I have a few points in
favour of this modification (read Nomura's message below and the patch
to know what I'm talking about):

- It is off by default
- It is very simple (non-intrusive); it just changes the point at which
anonymous pages are inserted into the LRU.
- When turned on, I don't see it being a reason for introducing new
bugs.

What do you think of this?

On Wed, 10 Mar 2004 [email protected] wrote:

> After discussion with Hugh and recommendation from Andrea,
> it turns out that Andrea's 05_vm_22_vm-anon-lru-3 in 2.4.23aa2 solves
> the problem.
> ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa2/05_vm_22_vm-anon-lru-3
>
> The patch adds a sysctl which accelerates performance on
> huge-memory machines. It doesn't affect anything if turned off.
>
> Marcelo, could you apply this to 2.4.26-pre?
> (I attached the slightly modified patch in which the feature is turned
> off by default and which is cleanly applied to bk tree.)
>
>
> My test case was:
> - there is a process with a large anonymous mapping
> - there is a large amount of page cache and there are active I/O processes
> - there are not many file mappings
>
> So the problem happens in this way:
> - shrink_cache tries scanning the inactive list, in which most pages
> are anonymously mapped
> - it soon falls into swap_out because of too many anonymous pages
> - when there is no free swap space, it hardly frees anything
> - it retries, but soon calls swap_out again and again
>
> Without the patch, snapshot of readprofile looks like:
> 3590781 total
> 3289271 swap_out
> 212029 smp_call_function
> 22598 shrink_cache
> 21833 lru_cache_add
> 7787 get_user_pages
>
> Most of the time was spent in swap_out (contention on the page_table_lock).
>
> After applying the patch, the snapshot is like:
> 17420 total
> 3929 copy_page
> 3677 statm_pgd_range
> 1317 try_to_free_buffers
> 1312 __copy_user
> 593 scsi_make_request
>
> Best regards.


2004-03-14 19:55:17

by Rik van Riel

[permalink] [raw]
Subject: Re: [2.4] heavy-load under swap space shortage

On Sun, 14 Mar 2004, Marcelo Tosatti wrote:

> - It is off by default
> - It is very simple (non-intrusive); it just changes the point at which
> anonymous pages are inserted into the LRU.
> - When turned on, I don't see it being a reason for introducing new
> bugs.
>
> What do you think of this?

1) Yes, the patch is harmless enough.
2) As long as the default behaviour doesn't change from
how things are done now, the patch shouldn't introduce
any regressions, so it should be safe to apply.


--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-03-14 20:15:11

by Andrew Morton

[permalink] [raw]
Subject: Re: [2.4] heavy-load under swap space shortage

Marcelo Tosatti <[email protected]> wrote:
>
> At first I was skeptical about including this patch in v2.4 (due to the
> freeze), but after thinking a bit more about it I have a few points in
> favour of this modification (read Nomura's message below and the patch
> to know what I'm talking about):
>
> - It is off by default
> - It is very simple (non-intrusive); it just changes the point at which
> anonymous pages are inserted into the LRU.
> - When turned on, I don't see it being a reason for introducing new
> bugs.
>
> What do you think of this?

hm, I hadn't noticed that 2.4 was changed to _not_ add anon pages to the
LRU. I'd always regarded that as a workaround for pagemap_lru_lock
contention on large SMP machines which would never get beyond the SuSE
kernel.

Having a magic knob is a weak solution: the majority of people who are
affected by this problem won't know to turn it on.

I confess that I don't really understand the failure mode. So we have
zillions of anon pages which are not on the LRU. We call swap_out() and
scan all these pages, failing to find swapcache space for them.

Why does adding the pages to the LRU up-front solve the problem?

(And why cannot we lazily add these anon pages to the LRU in swap_out, and
avoid the need for the knob?)