2000-11-05 05:56:03

by Suparna Bhattacharya

Subject: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?

The vma list lock can nest with i_shared_lock, as per Kanoj Sarcar's
note on mem-mgmt locks (Documentation/vm/locking), and currently the
vma list lock == mm->page_table_lock. But there appears to be some
inconsistency in the hierarchy of these two locks. (By vma list lock
I mean the vmlist_access/modify_lock(s).)

Looking at the mmap code, it appears that the vma list lock,
i.e. the page_table_lock right now, is to be acquired first
(e.g. insert_vm_struct, which acquires i_shared_lock internally,
is called under the page_table_lock/vma list lock).
Elsewhere, in madvise, I see a similar hierarchy.
In the unmap code, care has been taken not to have these locks
acquired simultaneously.
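
For instance, the relevant part of the mmap path looks roughly like
this (paraphrased from the test10-era mm/mmap.c, not an exact quote):

	spin_lock(&mm->page_table_lock);	/* the vma list lock */
	insert_vm_struct(mm, vma);		/* takes i_shared_lock inside */
	merge_segments(mm, vma->vm_start, vma->vm_end);
	spin_unlock(&mm->page_table_lock);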

However, in the vmtruncate code, it looks like the hierarchy is
reversed. There, the i_shared_lock is acquired first, in order to
traverse the i_mmap list, and inside the loop it calls zap_page_range,
which acquires the page_table_lock.

This is odd. Isn't there a possibility of a deadlock if mmap and
truncation for the same file happen simultaneously (on an SMP)?
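
To make the concern concrete, here is a toy userspace illustration
(not kernel code - pthread mutexes stand in for the two spinlocks,
and the two threads only mimic the lock order of the mmap and
vmtruncate paths). Built with -lpthread, it is expected to hang,
which is exactly the ABBA deadlock in question:

/* Illustration only: the usleep() just widens the race window so the
 * deadlock shows up reliably. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page_table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t i_shared_lock   = PTHREAD_MUTEX_INITIALIZER;

static void *mmap_path(void *arg)       /* order used by mmap/madvise */
{
	(void)arg;
	pthread_mutex_lock(&page_table_lock);   /* vma list lock first */
	usleep(100000);
	pthread_mutex_lock(&i_shared_lock);     /* as insert_vm_struct does */
	puts("mmap path got both locks");
	pthread_mutex_unlock(&i_shared_lock);
	pthread_mutex_unlock(&page_table_lock);
	return NULL;
}

static void *truncate_path(void *arg)   /* order used by vmtruncate */
{
	(void)arg;
	pthread_mutex_lock(&i_shared_lock);     /* to walk i_mmap */
	usleep(100000);
	pthread_mutex_lock(&page_table_lock);   /* as zap_page_range does */
	puts("truncate path got both locks");
	pthread_mutex_unlock(&page_table_lock);
	pthread_mutex_unlock(&i_shared_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	pthread_create(&a, NULL, mmap_path, NULL);
	pthread_create(&b, NULL, truncate_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}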

I'm wondering if this could be a side effect of the doubling up of the
page_table_lock as a vma list lock ?

Or have I missed something ?

[I have checked up to 2.4-test10-pre5]

I had put this query up on linux-mm as part of a much larger mail, but
haven't received any response yet, so I thought of putting up a more
focused query this time.

Regards
Suparna

Suparna Bhattacharya
Systems Software Group, IBM Global Services, India
E-mail : [email protected]
Phone : 91-80-5267117, Extn : 2525


>List: linux-mm
>Subject: Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
>From: Kanoj Sarcar <[email protected]>
>Date: 2000-10-13 18:19:06
>[Download message RAW]

>>
>>
>> On Thu, 12 Oct 2000, David S. Miller wrote:
>> >
>> > page_table_lock is supposed to protect normal page table activity (like
>> > what's done in page fault handler) from swapping out.
>> > However, grabbing this lock in swap-out code is completely missing!
>> >
>> > Audrey, vmlist_access_{un,}lock == unlocking/locking page_table_lock.
>>
>> Yeah, it's an easy mistake to make.
>>
>> I've made it myself - grepping for page_table_lock and coming up empty in
>> places where I expected it to be.
>>
>> In fact, if somebody sends me patches to remove the "vmlist_access_lock()"
>> stuff completely, and replace them with explicit page_table_lock things,
>> I'll apply it pretty much immediately. I don't like information hiding,
>> and right now that's the only thing that the vmlist_access_lock() stuff is
>> doing.

>Linus,
>
>I came up with the vmlist_access_lock/vmlist_modify_lock names early in
>2.3. The reasoning behind that was that in most places the "vmlist
>lock" was being taken to protect the vmlist chain, vma_t fields or
>mm_struct fields. The fact that, implementation-wise, this lock could be
>the same as page_table_lock was a good idea that you suggested.
>
>Nevertheless, the name was chosen to indicate what type of things it was
>guarding. For example, in the future, you might actually have a different
>(possibly sleeping) lock to guard the vma chain etc, but still have a
>spin lock for the page_table_lock (No, I don't want to be drawn into a
>discussion of why this might be needed right now). Some of this is
>mentioned in Documentation/vm/locking.
>
>Just thought I would mention, in case you don't recollect some of this
>history. Of course, I understand the "information hiding" part.
>
>Kanoj
>



2000-11-06 03:30:14

by David Miller

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?

From: [email protected]
Date: Sun, 5 Nov 2000 11:21:05 +0530

However, in the vmtruncate code, it looks like the hierarchy is
reversed.

It is a well-known bug amongst gurus :-) I sent a linux24 bug addition
to Ted Ts'o a week or so ago but apparently he dropped it.

Ted, I'm resending this below; please add it to the linux24 list,
thanks.

X-Coding-System: undecided-unix
Date: Fri, 13 Oct 2000 17:36:04 -0700
From: "David S. Miller" <[email protected]>
To: [email protected]
Subject: New BUG for todo list


This bug will essentially hard-hang an SMP system if triggered. Linus
and I have both known about it for some time now; the fix is
straightforward, just nobody has coded it up and tested it yet.

The problem basically is that mm/memory.c:vmtruncate() violates the
lock acquisition ordering rules when both mm->page_table_lock and
mapping->i_shared_lock must be acquired. All other instances (in
the mmap/munmap syscalls, for example) acquire the page_table_lock and
then the i_shared_lock. vmtruncate(), on the other hand, acquires
i_shared_lock first and then page_table_lock.

Essentially I would describe this in the TODO list as:

vmtruncate() violates page_table_lock/i_shared_lock
acquisition ordering rules leading to deadlock

The fix is to actually keep vmtruncate() as it is, and change the
ordering rules for the rest of the kernel to follow vmtruncate()'s
lock ordering. Linus agreed with me on this because the more natural
data inspection flow is to go from the object to mappings of that
object.
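
In other words, the mmap-side paths would end up nesting the locks
the other way around, roughly like this (a sketch of the intended
ordering only; the actual patch, which wraps the outer lock in
lock_vma_mappings()/unlock_vma_mappings() helpers, appears later in
this thread):

	lock_vma_mappings(vma);			/* mapping->i_shared_lock, if any */
	spin_lock(&vma->vm_mm->page_table_lock);
	/* ... vma list and i_mmap updates ... */
	spin_unlock(&vma->vm_mm->page_table_lock);
	unlock_vma_mappings(vma);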

I'm going to try and work on this change this weekend, but I want it
to be in the bug list so that it _is_ accounted for.

Later,
David S. Miller
[email protected]


2000-11-06 13:55:33

by Suparna Bhattacharya

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?


Thanks.
That sounds like a reasonable approach to take.

Regards
- Suparna

Suparna Bhattacharya
Systems Software Group, IBM Global Services, India
E-mail : [email protected]
Phone : 91-80-5267117, Extn : 2525


"David S. Miller" <[email protected]> on 11/06/2000 08:44:29 AM

Please respond to "David S. Miller" <[email protected]>

To: Suparna Bhattacharya/India/IBM@IBMIN
cc: [email protected], [email protected], [email protected]
Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?




From: [email protected]
Date: Sun, 5 Nov 2000 11:21:05 +0530

However, in the vmtruncate code, it looks like the hierarchy is
reversed.

It is a well-known bug amongst gurus :-) I sent a linux24 bug addition
to Ted Ts'o a week or so ago but apparently he dropped it.

Ted, I'm resending this below; please add it to the linux24 list,
thanks.

X-Coding-System: undecided-unix
Date: Fri, 13 Oct 2000 17:36:04 -0700
From: "David S. Miller" <[email protected]>
To: [email protected]
Subject: New BUG for todo list


This bug will essentially hard-hang an SMP system if triggered. Linus
and I have both known about it for some time now; the fix is
straightforward, just nobody has coded it up and tested it yet.

The problem basically is that mm/memory.c:vmtruncate() violates the
lock acquisition ordering rules when both mm->page_table_lock and
mapping->i_shared_lock must be acquired. All other instances (in
the mmap/munmap syscalls, for example) acquire the page_table_lock and
then the i_shared_lock. vmtruncate(), on the other hand, acquires
i_shared_lock first and then page_table_lock.

Essentially I would describe this in the TODO list as:

vmtruncate() violates page_table_lock/i_shared_lock
acquisition ordering rules leading to deadlock

The fix is to actually keep vmtruncate() as it is, and change the
ordering rules for the rest of the kernel to follow vmtruncate()'s
lock ordering. Linus agreed with me on this because the more natural
data inspection flow is to go from the object to mappings of that
object.

I'm going to try and work on this change this weekend, but I want it
to be in the bug list so that it _is_ accounted for.

Later,
David S. Miller
[email protected]





2000-11-06 17:43:36

by Theodore Ts'o

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?

Date: Sun, 5 Nov 2000 19:14:29 -0800
From: "David S. Miller" <[email protected]>

It is a well-known bug amongst gurus :-) I sent a linux24 bug addition
to Ted Ts'o a week or so ago but apparently he dropped it.

I got it, but I thought it was fixed before I had a chance to add it to
the bug list. I got confused by one of Linus's descriptions of fixes to
the test10-pre* series.

Sorry; my bad. I'll get it added to the list.

- Ted

2000-11-09 12:23:54

by Suparna Bhattacharya

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?


> The fix we agreed on back then is easy (make rest of kernel match
> vmtruncate()'s locking order), it just takes time to implement and
> test.

David,

I was looking into the vmm code and trying to work out exactly how to
fix this, and there are some questions in my mind - mainly a few cases
(involving multiple vma updates) that I'm not sure of the cleanest way
to tackle. But before I bother anyone with those, I thought I ought to
go through the earlier discussions you had while coming up with what
the fix should be. Maybe you've already gone over this once.
Could you point me to those? Somehow I haven't been successful in
locating them.

Regards
Suparna


Suparna Bhattacharya
Systems Software Group, IBM Global Services, India
E-mail : [email protected]
Phone : 91-80-5267117, Extn : 2525




2000-11-10 06:59:27

by David Miller

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?

From: [email protected]
Date: Thu, 9 Nov 2000 17:46:53 +0530

I was looking into the vmm code and trying to work out exactly
how to fix this

Let me save you some time; below is the fix I sent to
Linus this evening:

diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/include/linux/mm.h linux/include/linux/mm.h
--- vanilla/linux/include/linux/mm.h Thu Nov 9 21:43:46 2000
+++ linux/include/linux/mm.h Thu Nov 9 20:16:04 2000
@@ -417,8 +417,11 @@
extern void swapin_readahead(swp_entry_t);

/* mmap.c */
+extern void lock_vma_mappings(struct vm_area_struct *);
+extern void unlock_vma_mappings(struct vm_area_struct *);
extern void merge_segments(struct mm_struct *, unsigned long, unsigned long);
extern void insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
+extern void __insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void build_mmap_avl(struct mm_struct *);
extern void exit_mmap(struct mm_struct *);
extern unsigned long get_unmapped_area(unsigned long, unsigned long);
diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/filemap.c linux/mm/filemap.c
--- vanilla/linux/mm/filemap.c Thu Nov 9 21:43:46 2000
+++ linux/mm/filemap.c Thu Nov 9 20:08:24 2000
@@ -1813,11 +1813,13 @@
get_file(n->vm_file);
if (n->vm_ops && n->vm_ops->open)
n->vm_ops->open(n);
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
vma->vm_start = end;
- insert_vm_struct(current->mm, n);
+ __insert_vm_struct(current->mm, n);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

@@ -1837,10 +1839,12 @@
get_file(n->vm_file);
if (n->vm_ops && n->vm_ops->open)
n->vm_ops->open(n);
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_end = start;
- insert_vm_struct(current->mm, n);
+ __insert_vm_struct(current->mm, n);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

@@ -1870,15 +1874,17 @@
vma->vm_ops->open(left);
vma->vm_ops->open(right);
}
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
vma->vm_start = start;
vma->vm_end = end;
setup_read_behavior(vma, behavior);
vma->vm_raend = 0;
- insert_vm_struct(current->mm, left);
- insert_vm_struct(current->mm, right);
+ __insert_vm_struct(current->mm, left);
+ __insert_vm_struct(current->mm, right);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mlock.c linux/mm/mlock.c
--- vanilla/linux/mm/mlock.c Fri Oct 13 12:10:30 2000
+++ linux/mm/mlock.c Thu Nov 9 17:47:00 2000
@@ -36,11 +36,13 @@
get_file(n->vm_file);
if (n->vm_ops && n->vm_ops->open)
n->vm_ops->open(n);
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
vma->vm_start = end;
- insert_vm_struct(current->mm, n);
+ __insert_vm_struct(current->mm, n);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

@@ -61,10 +63,12 @@
get_file(n->vm_file);
if (n->vm_ops && n->vm_ops->open)
n->vm_ops->open(n);
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_end = start;
- insert_vm_struct(current->mm, n);
+ __insert_vm_struct(current->mm, n);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

@@ -96,15 +100,17 @@
vma->vm_ops->open(left);
vma->vm_ops->open(right);
}
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
vma->vm_start = start;
vma->vm_end = end;
vma->vm_flags = newflags;
vma->vm_raend = 0;
- insert_vm_struct(current->mm, left);
- insert_vm_struct(current->mm, right);
+ __insert_vm_struct(current->mm, left);
+ __insert_vm_struct(current->mm, right);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mmap.c linux/mm/mmap.c
--- vanilla/linux/mm/mmap.c Fri Oct 13 12:10:30 2000
+++ linux/mm/mmap.c Thu Nov 9 17:53:02 2000
@@ -67,7 +67,7 @@
}

/* Remove one vm structure from the inode's i_mapping address space. */
-static inline void remove_shared_vm_struct(struct vm_area_struct *vma)
+static inline void __remove_shared_vm_struct(struct vm_area_struct *vma)
{
struct file * file = vma->vm_file;

@@ -75,14 +75,41 @@
struct inode *inode = file->f_dentry->d_inode;
if (vma->vm_flags & VM_DENYWRITE)
atomic_inc(&inode->i_writecount);
- spin_lock(&inode->i_mapping->i_shared_lock);
if(vma->vm_next_share)
vma->vm_next_share->vm_pprev_share = vma->vm_pprev_share;
*vma->vm_pprev_share = vma->vm_next_share;
- spin_unlock(&inode->i_mapping->i_shared_lock);
}
}

+static inline void remove_shared_vm_struct(struct vm_area_struct *vma)
+{
+ lock_vma_mappings(vma);
+ __remove_shared_vm_struct(vma);
+ unlock_vma_mappings(vma);
+}
+
+void lock_vma_mappings(struct vm_area_struct *vma)
+{
+ struct address_space *mapping;
+
+ mapping = NULL;
+ if (vma->vm_file)
+ mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+ if (mapping)
+ spin_lock(&mapping->i_shared_lock);
+}
+
+void unlock_vma_mappings(struct vm_area_struct *vma)
+{
+ struct address_space *mapping;
+
+ mapping = NULL;
+ if (vma->vm_file)
+ mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+ if (mapping)
+ spin_unlock(&mapping->i_shared_lock);
+}
+
/*
* sys_brk() for the most part doesn't need the global kernel
* lock, except when an application is doing something nasty
@@ -316,13 +343,22 @@
* after the call. Save the values we need now ...
*/
flags = vma->vm_flags;
- addr = vma->vm_start; /* can addr have changed?? */
+
+ /* Can addr have changed??
+ *
+ * Answer: Yes, several device drivers can do it in their
+ * f_op->mmap method. -DaveM
+ */
+ addr = vma->vm_start;
+
+ lock_vma_mappings(vma);
spin_lock(&mm->page_table_lock);
- insert_vm_struct(mm, vma);
+ __insert_vm_struct(mm, vma);
if (correct_wcount)
atomic_inc(&file->f_dentry->d_inode->i_writecount);
merge_segments(mm, vma->vm_start, vma->vm_end);
spin_unlock(&mm->page_table_lock);
+ unlock_vma_mappings(vma);

mm->total_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED) {
@@ -534,10 +570,12 @@
/* Work out to one of the ends. */
if (end == area->vm_end) {
area->vm_end = addr;
+ lock_vma_mappings(area);
spin_lock(&mm->page_table_lock);
} else if (addr == area->vm_start) {
area->vm_pgoff += (end - area->vm_start) >> PAGE_SHIFT;
area->vm_start = end;
+ lock_vma_mappings(area);
spin_lock(&mm->page_table_lock);
} else {
/* Unmapping a hole: area->vm_start < addr <= end < area->vm_end */
@@ -560,12 +598,18 @@
if (mpnt->vm_ops && mpnt->vm_ops->open)
mpnt->vm_ops->open(mpnt);
area->vm_end = addr; /* Truncate area */
+
+ /* Because mpnt->vm_file == area->vm_file this locks
+ * things correctly.
+ */
+ lock_vma_mappings(area);
spin_lock(&mm->page_table_lock);
- insert_vm_struct(mm, mpnt);
+ __insert_vm_struct(mm, mpnt);
}

- insert_vm_struct(mm, area);
+ __insert_vm_struct(mm, area);
spin_unlock(&mm->page_table_lock);
+ unlock_vma_mappings(area);
return extra;
}

@@ -811,10 +855,12 @@
flags = vma->vm_flags;
addr = vma->vm_start;

+ lock_vma_mappings(vma);
spin_lock(&mm->page_table_lock);
- insert_vm_struct(mm, vma);
+ __insert_vm_struct(mm, vma);
merge_segments(mm, vma->vm_start, vma->vm_end);
spin_unlock(&mm->page_table_lock);
+ unlock_vma_mappings(vma);

mm->total_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED) {
@@ -877,9 +923,10 @@
}

/* Insert vm structure into process list sorted by address
- * and into the inode's i_mmap ring.
+ * and into the inode's i_mmap ring. If vm_file is non-NULL
+ * then the i_shared_lock must be held here.
*/
-void insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
+void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
{
struct vm_area_struct **pprev;
struct file * file;
@@ -916,15 +963,20 @@
head = &mapping->i_mmap_shared;

/* insert vmp into inode's share list */
- spin_lock(&mapping->i_shared_lock);
if((vmp->vm_next_share = *head) != NULL)
(*head)->vm_pprev_share = &vmp->vm_next_share;
*head = vmp;
vmp->vm_pprev_share = head;
- spin_unlock(&mapping->i_shared_lock);
}
}

+void insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
+{
+ lock_vma_mappings(vmp);
+ __insert_vm_struct(mm, vmp);
+ unlock_vma_mappings(vmp);
+}
+
/* Merge the list of memory segments if possible.
* Redundant vm_area_structs are freed.
* This assumes that the list is ordered by address.
@@ -986,11 +1038,13 @@
mpnt->vm_pgoff += (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
mpnt->vm_start = mpnt->vm_end;
spin_unlock(&mm->page_table_lock);
+ unlock_vma_mappings(mpnt);
mpnt->vm_ops->close(mpnt);
+ lock_vma_mappings(mpnt);
spin_lock(&mm->page_table_lock);
}
mm->map_count--;
- remove_shared_vm_struct(mpnt);
+ __remove_shared_vm_struct(mpnt);
if (mpnt->vm_file)
fput(mpnt->vm_file);
kmem_cache_free(vm_area_cachep, mpnt);
diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mprotect.c linux/mm/mprotect.c
--- vanilla/linux/mm/mprotect.c Wed Oct 18 14:25:46 2000
+++ linux/mm/mprotect.c Thu Nov 9 17:53:43 2000
@@ -118,11 +118,13 @@
get_file(n->vm_file);
if (n->vm_ops && n->vm_ops->open)
n->vm_ops->open(n);
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
vma->vm_start = end;
- insert_vm_struct(current->mm, n);
+ __insert_vm_struct(current->mm, n);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

@@ -145,10 +147,12 @@
get_file(n->vm_file);
if (n->vm_ops && n->vm_ops->open)
n->vm_ops->open(n);
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_end = start;
- insert_vm_struct(current->mm, n);
+ __insert_vm_struct(current->mm, n);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

@@ -179,6 +183,7 @@
vma->vm_ops->open(left);
vma->vm_ops->open(right);
}
+ lock_vma_mappings(vma);
spin_lock(&vma->vm_mm->page_table_lock);
vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
vma->vm_start = start;
@@ -186,9 +191,10 @@
vma->vm_flags = newflags;
vma->vm_raend = 0;
vma->vm_page_prot = prot;
- insert_vm_struct(current->mm, left);
- insert_vm_struct(current->mm, right);
+ __insert_vm_struct(current->mm, left);
+ __insert_vm_struct(current->mm, right);
spin_unlock(&vma->vm_mm->page_table_lock);
+ unlock_vma_mappings(vma);
return 0;
}

diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mremap.c linux/mm/mremap.c
--- vanilla/linux/mm/mremap.c Wed Oct 18 14:25:46 2000
+++ linux/mm/mremap.c Thu Nov 9 17:53:57 2000
@@ -141,10 +141,12 @@
get_file(new_vma->vm_file);
if (new_vma->vm_ops && new_vma->vm_ops->open)
new_vma->vm_ops->open(new_vma);
+ lock_vma_mappings(vma);
spin_lock(&current->mm->page_table_lock);
- insert_vm_struct(current->mm, new_vma);
+ __insert_vm_struct(current->mm, new_vma);
merge_segments(current->mm, new_vma->vm_start, new_vma->vm_end);
spin_unlock(&current->mm->page_table_lock);
+ unlock_vma_mappings(vma);
do_munmap(current->mm, addr, old_len);
current->mm->total_vm += new_len >> PAGE_SHIFT;
if (new_vma->vm_flags & VM_LOCKED) {

2000-11-10 07:41:04

by aprasad

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?

>I was looking into the vmm code and trying to work out exactly how to
>fix this, and there are some questions in my mind - mainly a few cases
>(involving multiple vma updates) that I'm not sure of the cleanest way
>to tackle. But before I bother anyone with those, I thought I ought to
>go through the earlier discussions you had while coming up with what
>the fix should be. Maybe you've already gone over this once.
>Could you point me to those? Somehow I haven't been successful in
>locating them.

This link might be useful:
http://mail.nl.linux.org/linux-mm/2000-07/msg00038.html

Anil Kumar Prasad,
Hardware Design,
IBM Global Services,
Pune,
INDIA
Phone:6349724 , 4007117 extn:1310


2000-11-10 07:49:16

by David Miller

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?

From: [email protected]
Date: Fri, 10 Nov 2000 12:25:24 +0530

this link might be useful
http://mail.nl.linux.org/linux-mm/2000-07/msg00038.html

This talks about a completely different bug.

We are talking here about inconsistent ordering of lock acquisition
when both mapping->i_shared_lock and mm->page_table_lock must be
held simultaneously.

The thread you quoted merely mentions the issue where mm->rss
modifications are not always properly protected by the
page_table_lock.

There were no previous public discussions about the
i_shared_lock/page_table_lock deadlock problems.

Later,
David S. Miller
[email protected]

2000-11-10 09:47:18

by Suparna Bhattacharya

Subject: Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?


>Let me save you some time, below is the fix I sent to
>Linus this evening:

Yes it does :-) Thanks.

I had started making changes along similar lines, and then, as I
realized all the places it'd affect, I got a little diverted trying to
see if there was a way to do it without having to bring the locks out
of the common insert routines - whether it was possible to avoid
holding i_shared_lock and page_table_lock simultaneously (other than
in vmtruncate, of course) through a careful sequencing of steps, since
we really need to serialize only with truncate and the page stealer
(the mm sem takes care of most other cases). Unmap does that now, as
it crosses multiple objects (unlike the other cases, where luckily we
just have one object at a time, which makes it possible to bring the
i_shared_lock to the top).

But it gets kind of complicated, and it's harder to verify the
correctness of such an approach or to generalize it.
So your fix looks like a natural and consistent way to do this.

Now I can get back to what I was originally working on when I
hit this :-)

- Suparna



Suparna Bhattacharya
Systems Software Group, IBM Global Services, India
E-mail : [email protected]
Phone : 91-80-5267117, Extn : 2525