2001-03-23 10:33:00

by Adam J. Richter

[permalink] [raw]
Subject: Patch(?): linux-2.4.3-pre6/mm/vmalloc.c could return with init_mm.page_table_lock held

[Sorry for posting three messages to linux-kernel about this.
Each time I was pretty sure I was done for the night. Anyhow, I
hope this proposed patch makes up for it.]

In linux-2.4.3-pre6, a call to vmalloc can result in a call to
pte_alloc without the appropriate page_table_lock being held. Here is
the call graph, from my post of about half an hour ago:

vmalloc
__vmalloc
vmalloc_area_pages
alloc_area_pmd
pte_alloc ...which assumes (here incorrectly) that
mm->page_table_lock is held, and sometimes releases
and reacquires mm->page_table_lock.

Not only does pte_alloc expect mm->page_table_lock
to be held when it is called, but it also sometimes releases and
reacquires it. vmalloc did not release this lock either, of course.
So, the next attempt to acquire the same mm->page_table_lock spin lock
hangs.

The symptom that I had noticed was the agpgart.o module hanging
at module initialization, but it is a much more general problem, and
could explain all sorts of hangs in 2.4.3-pre6.

Anyhow, with this patch, agpgart.o loads just fine and the
kernel seems to have suffered no negative side effects. I am
not confident in exactly where I chose to put the spin_lock and
spin_unlock calls, so I would recommend a careful examination of
this patch before integrating.

--
Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."


Attachments:
(No filename) (1.61 kB)
vmalloc.patch (1.05 kB)
Download all attachments

2001-03-23 11:20:41

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: Patch(?): linux-2.4.3-pre6/mm/vmalloc.c could return with init_mm.page_table_lock held



On Fri, 23 Mar 2001, Adam J. Richter wrote:

> [Sorry for posting three messages to linux-kernel about this.
> Each time I was pretty sure I was done for the night. Anyhow, I
> hope this proposed patch makes up for it.]
>
> In linux-2.4.3-pre6, a call to vmalloc can result in a call to
> pte_alloc without the appropriate page_table_lock being held. Here is
> the call graph, from my post of about half an hour ago:
>
> vmalloc
> __vmalloc
> vmalloc_area_pages
> alloc_area_pmd
> pte_alloc ...which assumes (here incorrectly) that
> mm->page_table_lock is held, and sometimes releases
> and reacquires mm->page_table_lock.
>
> Not only does pte_alloc expect mm->page_table_lock
> to be held when it is called, but it also sometimes releases and
> reacquires it. vmalloc did not release this lock either, of course.
> So, the next attempt to acquire the same mm->page_table_lock spin lock
> hangs.
>
> The symptom that I had noticed was the agpgart.o module hanging
> at module initialization, but it is a much more general problem, and
> could explain all sorts of hangs in 2.4.3-pre6.
>
> Anyhow, with this patch, agpgart.o loads just fine and the
> kernel seems to have suffered no negative side effects. I am
> not confident in exactly where I chose to put the spin_lock and
> spin_unlock calls, so I would recommend a careful examination of
> this patch before integrating.

There is no need to hold mm->page_table_lock for vmalloced memory.

I guess a better solution is to make the vmalloc codepath use
"pte_alloc_vmalloc" (or something like that) which would be a
spinlock-free version of pte_alloc (like the old one).

2001-03-23 11:37:11

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch(?): linux-2.4.3-pre6/mm/vmalloc.c could return with init_mm.page_table_lock held

Marcelo Tosatti <[email protected]> writes:
>There is no need to hold mm->page_table_lock for vmalloced memory.

I don't know if it makes a difference, but I should clarify
that mm == &init_mm throughout this code, not &current->mm.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."

2001-03-23 19:43:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: Patch(?): linux-2.4.3-pre6/mm/vmalloc.c could return with init_mm.page_table_lock held



On Fri, 23 Mar 2001, Marcelo Tosatti wrote:
>
> There is no need to hold mm->page_table_lock for vmalloced memory.

But there is. You do need _some_ protection to protect the kernel from
inserting two different pmd/pgd entries for two different areas in the
same slot. And that's exactly what page_table_lock does for us.

> I guess a better solution is to make the vmalloc codepath use
> "pte_alloc_vmalloc" (or something like that) which would be a
> spinlock-free version of pte_alloc (like the old one).

The old one avoided the race by using the big kernel lock. Which is
totally non-sensical, but works. It's much better to use the spinlock that
is meant for exactly this thing.

Linus

2001-03-23 19:41:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: Patch(?): linux-2.4.3-pre6/mm/vmalloc.c could return with init_mm.page_table_lock held



On Fri, 23 Mar 2001, Adam J. Richter wrote:
>
> In linux-2.4.3-pre6, a call to vmalloc can result in a call to
> pte_alloc without the appropriate page_table_lock being held. Here is
> the call graph, from my post of about half an hour ago:

Good find.

HOWEVER, the patch is not right.

You cannot get the kernel lock from within a spinlock, so you should
_replace_ the kernel lock with the spinlock (which is definitely the
correct thing to do anyway - the kernel lock is there exactly because we
didn't have any other good synchronization primitive, and that's _exactly_
what the spinlock in question is all about).

Also, you have to drop the spinlock over the actual page allocation inside
alloc_area_pte(). So I'd suggest something along the lines of the attached
(completely untested) patch.

Linus

-----
--- pre6/linux/mm/vmalloc.c Tue Mar 20 23:13:03 2001
+++ linux/mm/vmalloc.c Fri Mar 23 11:38:24 2001
@@ -102,9 +102,11 @@
end = PMD_SIZE;
do {
struct page * page;
+ spin_unlock(&init_mm.page_table_lock);
+ page = alloc_page(gfp_mask);
+ spin_lock(&init_mm.page_table_lock);
if (!pte_none(*pte))
printk(KERN_ERR "alloc_area_pte: page already exists\n");
- page = alloc_page(gfp_mask);
if (!page)
return -ENOMEM;
set_pte(pte, mk_pte(page, prot));
@@ -143,7 +145,7 @@

dir = pgd_offset_k(address);
flush_cache_all();
- lock_kernel();
+ spin_lock(&init_mm.page_table_lock);
do {
pmd_t *pmd;

@@ -161,7 +163,7 @@

ret = 0;
} while (address && (address < end));
- unlock_kernel();
+ spin_unlock(&init_mm.page_table_lock)
flush_tlb_all();
return ret;
}