Hello,
I'm using the 2.4.18 kernel and suspect there is a swapcache race.
I looked through the 2.4.19 patch but could not find a fix for it.
The race condition is described below.
Do you have any idea?
In a situation such as:
- two processes (process A and process B) sharing a memory space
- one of the pages in that space has been swapped out and is
  no longer in the swapcache
- process A runs on cpu0, process B runs on cpu1

When process A reads the address corresponding to the page, a page
fault occurs: cpu0 reads the swapped-out page into memory, calls
add_to_page_cache_unique() to add it to the swapcache, and then calls
lru_cache_add() to add it to the LRU list.

If process B reads the same address at that moment, cpu1 calls
do_swap_page(), and lookup_swap_cache() may succeed before cpu0 calls
lru_cache_add(); cpu1 will then mark the page active via the
subsequent mark_page_accessed() call.
lru_cache_add() checks whether the page is active, and if it is,
it calls BUG().
    process A (cpu0)                    process B (cpu1)
-----------------------------------------------------------
do_swap_page()
  spin_unlock(mm->page_table_lock)
                                        spin_lock(mm->page_table_lock)
  lookup_swap_cache()
  read_swap_cache_async()
    alloc_page()
    add_to_swap_cache_unique()
      __add_to_page_cache()
                                        do_swap_page()
                                          lookup_swap_cache()
                                          mark_page_accessed()
  lru_cache_add()
Best regards.
--
NOMURA, Jun'ichi <[email protected], [email protected]>
HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.
On Thu, 15 Aug 2002 [email protected] wrote:
>
> I'm using the 2.4.18 kernel and suspect there is a swapcache race.
> I looked through the 2.4.19 patch but could not find a fix for it.
I see a benign race but no oops.
> In a situation such as:
> - two processes (process A and process B) sharing a memory space
> - one of the pages in that space has been swapped out and is
>   no longer in the swapcache
> - process A runs on cpu0, process B runs on cpu1
>
> When process A reads the address corresponding to the page, a page
> fault occurs: cpu0 reads the swapped-out page into memory, calls
> add_to_page_cache_unique() to add it to the swapcache, and then calls
> lru_cache_add() to add it to the LRU list.
>
> If process B reads the same address at that moment, cpu1 calls
> do_swap_page(), and lookup_swap_cache() may succeed before cpu0 calls
> lru_cache_add(); cpu1 will then mark the page active via the
> subsequent mark_page_accessed() call.
I agree that B may get to mark_page_accessed() before A gets to
lru_cache_add(), but B will just SetPageReferenced. If a similar
process C is racing on the page too, its mark_page_accessed() would
call activate_page(), but that will see !PageLRU and do nothing.
> lru_cache_add() checks whether the page is active, and if it is,
> it calls BUG().
It cannot be made PageActive until after lru_cache_add has set PageLRU.
Hugh
Hugh Dickins wrote:
>
> On Thu, 15 Aug 2002 [email protected] wrote:
> >
> > I'm using the 2.4.18 kernel and suspect there is a swapcache race.
> > I looked through the 2.4.19 patch but could not find a fix for it.
>
> I see a benign race but no oops.
>
But look at lru_cache_add():
void lru_cache_add(struct page * page)
{
	if (!TestSetPageLRU(page)) {
		/* window here */
		spin_lock(&pagemap_lru_lock);
		add_page_to_inactive_list(page);
		spin_unlock(&pagemap_lru_lock);
	}
}
It sets PG_lru before adding the page to the LRU.
static inline void activate_page_nolock(struct page * page)
{
	if (PageLRU(page) && !PageActive(page)) {
		del_page_from_inactive_list(page);
		add_page_to_active_list(page);
	}
}

void activate_page(struct page * page)
{
	spin_lock(&pagemap_lru_lock);
	activate_page_nolock(page);
	spin_unlock(&pagemap_lru_lock);
}
So if activate_page() gets the lock inside that window, it will
delete a page from the LRU which isn't on it (memory corruption).
Then activate_page() sets PG_active and drops the lock.
lru_cache_add() then gets the lock and runs
add_page_to_inactive_list(), which BUGs on PG_active.
--- 2.4.19/mm/swap.c~lru-race	Thu Aug 15 13:03:48 2002
+++ 2.4.19-akpm/mm/swap.c	Thu Aug 15 13:04:19 2002
@@ -57,9 +57,10 @@ void activate_page(struct page * page)
  */
 void lru_cache_add(struct page * page)
 {
-	if (!TestSetPageLRU(page)) {
+	if (!PageLRU(page)) {
 		spin_lock(&pagemap_lru_lock);
-		add_page_to_inactive_list(page);
+		if (!TestSetPageLRU(page))
+			add_page_to_inactive_list(page);
 		spin_unlock(&pagemap_lru_lock);
 	}
 }
Andrew Morton wrote:
>
> ...
> --- 2.4.19/mm/swap.c~lru-race	Thu Aug 15 13:03:48 2002
> +++ 2.4.19-akpm/mm/swap.c	Thu Aug 15 13:04:19 2002
> @@ -57,9 +57,10 @@ void activate_page(struct page * page)
>   */
>  void lru_cache_add(struct page * page)
>  {
> -	if (!TestSetPageLRU(page)) {
> +	if (!PageLRU(page)) {
>  		spin_lock(&pagemap_lru_lock);
> -		add_page_to_inactive_list(page);
> +		if (!TestSetPageLRU(page))
> +			add_page_to_inactive_list(page);
>  		spin_unlock(&pagemap_lru_lock);
>  	}
>  }
Seems that I fixed this in 2.5.32. That set_bit outside the lock
gave me the willies, and I couldn't put my finger on why. It never
occurred to me that the page could be found via a pagecache lookup
in this manner.
In 2.5, it is effectively:
void lru_cache_add(struct page * page)
{
	spin_lock(&pagemap_lru_lock);
	if (TestSetPageLRU(page))
		BUG();
	add_page_to_inactive_list(page);
	spin_unlock(&pagemap_lru_lock);
}
which is what should be tested in 2.4. It's stricter, and significantly
faster.
On Thu, 15 Aug 2002, Andrew Morton wrote:
> Hugh Dickins wrote:
> > On Thu, 15 Aug 2002 [email protected] wrote:
> > >
> > > I'm using the 2.4.18 kernel and suspect there is a swapcache race.
> > > I looked through the 2.4.19 patch but could not find a fix for it.
> >
> > I see a benign race but no oops.
>
> But look at lru_cache_add():
>
> void lru_cache_add(struct page * page)
> {
> 	if (!TestSetPageLRU(page)) {
> 		/* window here */
> 		spin_lock(&pagemap_lru_lock);
> 		add_page_to_inactive_list(page);
> 		spin_unlock(&pagemap_lru_lock);
> 	}
> }
>
> It sets PG_lru before adding the page to the LRU.
Right you are! Thanks for correcting my dysanalysis.
Hugh
Thank you for the solutions.
Your first patch seems to fix the problem in 2.4.
As for the second fix from 2.5.32, it can't be applied to 2.4, can it?
try_to_swap_out() may call add_to_swap_cache() with a PG_lru page.
From: Andrew Morton <[email protected]>
Subject: Re: 2.4.18(19) swapcache oops
Date: Thu, 15 Aug 2002 13:21:17 -0700
> In 2.5, it is effectively:
>
> void lru_cache_add(struct page * page)
> {
> 	spin_lock(&pagemap_lru_lock);
> 	if (TestSetPageLRU(page))
> 		BUG();
> 	add_page_to_inactive_list(page);
> 	spin_unlock(&pagemap_lru_lock);
> }
>
> which is what should be tested in 2.4. It's stricter, and significantly
> faster.
Best regards.
--
NOMURA, Jun'ichi <[email protected], [email protected]>
HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.
[email protected] wrote:
>
> Thank you for the solutions.
> Your first patch seems to fix the problem in 2.4.
You must have been able to reproduce it quite quickly. But the
code has been like that for ages. Why is that? Are you testing
on an unusual machine?
> As for the second fix from 2.5.32, it can't be applied to 2.4, can it?
> try_to_swap_out() may call add_to_swap_cache() with a PG_lru page.
Ah, good point. In 2.5, add_to_page_cache() does not add the
page to the LRU. Anonymous pages are already there, and we
don't need to do that. The first patch should be OK.
From: Andrew Morton <[email protected]>
Subject: Re: 2.4.18(19) swapcache oops
Date: Thu, 15 Aug 2002 21:40:26 -0700
> You must have been able to reproduce it quite quickly. But the
> code has been like that for ages. Why is that? Are you testing
> on an unusual machine?
Not so unusual: it's an Itanium2 SMP machine (4 to 32 CPUs).
The problem first occurred after I had run a very large multi-threaded
program for days.
To investigate it, I wrote a small program that allocates memory and
then creates several threads to access the shared space in parallel.
It makes shared pages swap in and out at a very high rate.
Run with little free memory, the program could trigger the oops within
a few minutes on a 4-CPU machine.
---------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

#define NPROC 4
#define SIZE  0x20000000

char *ptr;

void *
loop(void *none)
{
	while (1)
		memset(ptr, 0, SIZE);
}

int
main(int argc, char **argv)
{
	int i;
	pthread_t tid[NPROC];

	ptr = malloc(SIZE);
	memset(ptr, 0, SIZE);
	for (i = 0; i < NPROC; i++)
		pthread_create(&tid[i], NULL, loop, NULL);
	for (i = 0; i < NPROC; i++)
		pthread_join(tid[i], NULL);
	return 0;
}
---------------------------------------------------------------------
> > As for the second fix from 2.5.32, it can't be applied to 2.4, can it?
> > try_to_swap_out() may call add_to_swap_cache() with a PG_lru page.
>
> Ah, good point. In 2.5, add_to_page_cache() does not add the
> page to the LRU. Anonymous pages are already there, and we
> don't need to do that. The first patch should be OK.
I hope the patch will be included in the next 2.4 release.
Best regards.
--
NOMURA, Jun'ichi <[email protected], [email protected]>
HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.