2000-12-08 21:00:01

by Rasmus Andersen

[permalink] [raw]
Subject: [PATCH] mm->rss is modified without page_table_lock held

Hi.

The following patch moves the page_table_lock in mm/* to cover the
modification of mm->rss in 240-test12-pre7. It was inspired by a
similar patch from davej(?) which covered too much, AFAIR. The item
is on Tytso's ToDo list.

Please comment.


diff -Naur linux-240-t12-pre7-clean/mm/memory.c linux/mm/memory.c
--- linux-240-t12-pre7-clean/mm/memory.c Fri Dec 8 00:45:04 2000
+++ linux/mm/memory.c Fri Dec 8 00:48:26 2000
@@ -371,7 +371,6 @@
address = (address + PGDIR_SIZE) & PGDIR_MASK;
dir++;
} while (address && (address < end));
- spin_unlock(&mm->page_table_lock);
/*
* Update rss for the mm_struct (not necessarily current->mm)
* Notice that rss is an unsigned long.
@@ -380,6 +379,7 @@
mm->rss -= freed;
else
mm->rss = 0;
+ spin_unlock(&mm->page_table_lock);
}


@@ -1076,7 +1076,9 @@
flush_icache_page(vma, page);
}

+ spin_lock(&mm->page_table_lock);
mm->rss++;
+ spin_unlock(&mm->page_table_lock);

pte = mk_pte(page, vma->vm_page_prot);

@@ -1110,7 +1112,9 @@
return -1;
clear_user_highpage(page, addr);
entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
+ spin_lock(&mm->page_table_lock);
mm->rss++;
+ spin_unlock(&mm->page_table_lock);
flush_page_to_ram(page);
}
set_pte(page_table, entry);
@@ -1149,7 +1153,9 @@
return 0;
if (new_page == NOPAGE_OOM)
return -1;
+ spin_lock(&mm->page_table_lock);
++mm->rss;
+ spin_unlock(&mm->page_table_lock);
/*
* This silly early PAGE_DIRTY setting removes a race
* due to the bad i386 page protection. But it's valid
diff -Naur linux-240-t12-pre7-clean/mm/mmap.c linux/mm/mmap.c
--- linux-240-t12-pre7-clean/mm/mmap.c Wed Nov 22 22:41:45 2000
+++ linux/mm/mmap.c Fri Dec 8 00:48:26 2000
@@ -889,8 +889,8 @@
spin_lock(&mm->page_table_lock);
mpnt = mm->mmap;
mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
- spin_unlock(&mm->page_table_lock);
mm->rss = 0;
+ spin_unlock(&mm->page_table_lock);
mm->total_vm = 0;
mm->locked_vm = 0;
while (mpnt) {
diff -Naur linux-240-t12-pre7-clean/mm/swapfile.c linux/mm/swapfile.c
--- linux-240-t12-pre7-clean/mm/swapfile.c Sat Nov 4 23:27:17 2000
+++ linux/mm/swapfile.c Fri Dec 8 00:48:26 2000
@@ -231,7 +231,9 @@
set_pte(dir, pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
swap_free(entry);
get_page(page);
+ spin_lock(&vma->vm_mm->page_table_lock);
++vma->vm_mm->rss;
+ spin_unlock(&vma->vm_mm->page_table_lock);
}

static inline void unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
diff -Naur linux-240-t12-pre7-clean/mm/vmscan.c linux/mm/vmscan.c
--- linux-240-t12-pre7-clean/mm/vmscan.c Fri Dec 8 00:45:04 2000
+++ linux/mm/vmscan.c Fri Dec 8 00:48:26 2000
@@ -98,7 +98,9 @@
set_pte(page_table, swp_entry_to_pte(entry));
drop_pte:
UnlockPage(page);
+ spin_lock(&mm->page_table_lock);
mm->rss--;
+ spin_unlock(&mm->page_table_lock);
flush_tlb_page(vma, address);
deactivate_page(page);
page_cache_release(page);

--
Regards,
Rasmus([email protected])

You know how dumb the average guy is? Well, by definition, half
of them are even dumber than that.
-- J.R. "Bob" Dobbs


2000-12-08 21:07:21

by Mark Hahn

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

> The following patch moves the page_table_lock in mm/* to cover the
> modification of mm->rss in 240-test12-pre7. It was inspired by a

can't we just change rss to count pages?
or are we worried about rss's over ~16 TB?

2000-12-08 21:44:36

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

Date: Fri, 8 Dec 2000 15:36:35 -0500 (EST)
From: Mark Hahn <[email protected]>

can't we just change rss to count pages?

This is what it does now.

or are we worried about rss's over ~16 TB?

If we weren't, we could just use an atomic_t for this problem.
We can't.

This patch is the only currently available solution to this problem
which is in any way acceptable.

Later,
David S. Miller
[email protected]

2000-12-09 11:01:42

by Roberto Fichera

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

At 21.29 08/12/00 +0100, Rasmus Andersen wrote:

>Hi.
>
>The following patch moves the page_table_lock in mm/* to cover the
>modification of mm->rss in 240-test12-pre7. It was inspired by a
>similar patch from davej(?) which covered too much, AFAIR. The item
>is on Tytso's ToDo list.

[...snip...]

>@@ -1076,7 +1076,9 @@
> flush_icache_page(vma, page);
> }
>
>+ spin_lock(&mm->page_table_lock);
> mm->rss++;
>+ spin_unlock(&mm->page_table_lock);
>

[...snip...]

Why we couldn't use atomic_inc(&mm->rss) here and below, avoiding to wrap
the inc with a spin_lock()/spin_unlock() ?



2000-12-09 12:14:27

by Rasmus Andersen

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

On Sat, Dec 09, 2000 at 11:25:09AM +0100, Roberto Fichera wrote:
[...]
> >+ spin_lock(&mm->page_table_lock);
> > mm->rss++;
> >+ spin_unlock(&mm->page_table_lock);
> >
>
> [...snip...]
>
> Why we couldn't use atomic_inc(&mm->rss) here and below, avoiding to wrap
> the inc with a spin_lock()/spin_unlock() ?
>

AFAIR, because for some architectures we can't rely on mm->rss fitting in
an atomic_t. See davem's (somewhat short) post in this thread. Otherwise
search the archives for the original thread treating this problem.
--
Rasmus([email protected])

Television is called a medium because it is neither rare nor well-done.
-- Anonymous

2000-12-09 13:19:00

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

Date: Sat, 09 Dec 2000 11:25:09 +0100
From: Roberto Fichera <[email protected]>

Why we couldn't use atomic_inc(&mm->rss) here and below, avoiding to wrap
the inc with a spin_lock()/spin_unlock() ?

atomic_t does not guarentee a large enough range necessary for mm->rss

Later,
David S. Miller
[email protected]

2000-12-09 15:24:24

by Roberto Fichera

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

At 13.43 09/12/00 +0100, Rasmus Andersen wrote:

>On Sat, Dec 09, 2000 at 11:25:09AM +0100, Roberto Fichera wrote:
>[...]
> > >+ spin_lock(&mm->page_table_lock);
> > > mm->rss++;
> > >+ spin_unlock(&mm->page_table_lock);
> > >
> >
> > [...snip...]
> >
> > Why we couldn't use atomic_inc(&mm->rss) here and below, avoiding to wrap
> > the inc with a spin_lock()/spin_unlock() ?
> >
>
>AFAIR, because for some architectures we can't rely on mm->rss fitting in
>an atomic_t. See davem's (somewhat short) post in this thread. Otherwise
>search the archives for the original thread treating this problem.

and At 04.32 09/12/00 -0800, David S. Miller wrote:

> Date: Sat, 09 Dec 2000 11:25:09 +0100
> From: Roberto Fichera <[email protected]>
>
> Why we couldn't use atomic_inc(&mm->rss) here and below, avoiding to wrap
> the inc with a spin_lock()/spin_unlock() ?
>
>atomic_t does not guarentee a large enough range necessary for mm->rss

If we haven't some atomic_t that can be negative we could define atomic_t
as unsigned long for all arch,
this is sufficient to fitting all the range for the mm->rss.

Roberto Fichera.

2000-12-09 15:29:27

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

Date: Sat, 09 Dec 2000 15:48:05 +0100
From: Roberto Fichera <[email protected]>

>atomic_t does not guarentee a large enough range necessary for mm->rss

If we haven't some atomic_t that can be negative we could define atomic_t
as unsigned long for all arch,
this is sufficient to fitting all the range for the mm->rss.

32-bit Sparc has unsigned long as 32-bit, and the top 8 bits of the
atomic_t are used for a spinlock, thus a 27-bit atomic_t, there
is not enough precision.

Later,
David S. Miller
[email protected]

2000-12-09 15:44:09

by Roberto Fichera

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

At 06.42 09/12/00 -0800, David S. Miller wrote:

> Date: Sat, 09 Dec 2000 15:48:05 +0100
> From: Roberto Fichera <[email protected]>
>
> >atomic_t does not guarentee a large enough range necessary for mm->rss
>
> If we haven't some atomic_t that can be negative we could define atomic_t
> as unsigned long for all arch,
> this is sufficient to fitting all the range for the mm->rss.
>
>32-bit Sparc has unsigned long as 32-bit, and the top 8 bits of the
>atomic_t are used for a spinlock, thus a 27-bit atomic_t, there
>is not enough precision.

8 bits for a spinlock ? What kind of use we have here ? All arch except Sparc32
don't have this trick.

Roberto Fichera.

2000-12-09 15:46:59

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

Date: Sat, 09 Dec 2000 16:07:03 +0100
From: Roberto Fichera <[email protected]>

8 bits for a spinlock ? What kind of use we have here ?

Sparc32 (like some other older architectures) do not have a
word atomic update instruction, but it does have a byte spinlock.
To conserve space and implement the atomic update properly, we
use a spinlock in the top byte of the word.

All arch except Sparc32 don't have this trick.

This may not be true forever.

Also, this sematic was decided upon many eons ago, changing it a month
before 2.4.0 just to deal with this mm->rss atomicity issue is not
going to happen. The spinlock patch exists, and if nothing better
comes up, we should just use it.

Later,
David S. Miller
[email protected]

2000-12-09 18:48:56

by Roberto Fichera

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

At 07.00 09/12/00 -0800, David S. Miller wrote:

> Date: Sat, 09 Dec 2000 16:07:03 +0100
> From: Roberto Fichera <[email protected]>
>
> 8 bits for a spinlock ? What kind of use we have here ?
>
>Sparc32 (like some other older architectures) do not have a
>word atomic update instruction, but it does have a byte spinlock.
>To conserve space and implement the atomic update properly, we
>use a spinlock in the top byte of the word.

There's any possibility ;-) to define it as

typedef struct { volatile char spinlock, volatile long counter } atomic_t;

>Also, this sematic was decided upon many eons ago, changing it a month
>before 2.4.0 just to deal with this mm->rss atomicity issue is not
>going to happen. The spinlock patch exists, and if nothing better
>comes up, we should just use it.

Indeed! You are right! I was thinking to optimize it, using a
spinlock/unlock we spent
several time for a inc.

Roberto Fichera.

2000-12-09 23:10:20

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: [PATCH] mm->rss is modified without page_table_lock held

> 32-bit Sparc has unsigned long as 32-bit, and the top 8 bits of the
> atomic_t are used for a spinlock, thus a 27-bit atomic_t, there
> is not enough precision.

So the SPARC port is broken. It is just sick to have this "feature"
screw things up for all the ports with a proper atomic_t.