LinuxLists.cc - [BUG] staging: android: ashmem: Deadlock during ashmem_mmap and ashmem

2013-03-20 15:38:06

Subject: [BUG] staging: android: ashmem: Deadlock during ashmem_mmap and ashmem_read

Hi Greg, Dan,

Few days back I posted a patch to fix a dead lock issue in the ashmem
driver that got merged in staging-next branch
https://lkml.org/lkml/2013/2/20/429

I'm seeing that there exists another path in the ashmem driver that
could lead to the similar issue. But this time I'm unable to think of
a way to fix the problem.

- The objects involved in the deadlock are same, mmap->sem and ashmem_mutex
- Assume that there are two threads T1 and T2 that belong to the same
process space, so they?ll share the mm struct, call it ?mm?
- In the context of T1 ashmem_mmap is called
- Now within the kernel this takes the mmap_sem and sleeps
before it acquires the ashmem_mutex
- Thread T2 runs and calls ashmem_read
- It takes the ashmem_mutex and invokes file->f_op->read,
which eventually calls file_read_actor
- The call stack is as below
ashmem_read (acquires ahsmem_mutex) --> f_op->read -->
do_sync_read -->shmem_file_aio_read --> file_read_actor
- There is a copy_to_user call in this function
file_read_actor (mm/filemap.c).
- Now, if there is no physical page allocated for the
address passed from the user space, this copy_to_user would lead to
do_DataAbort --> do_page_fault
- From do_page_fault it tries to get mmap_sem, but its not
available and is with T1, so waits for the mmap_sem holding
ashmem_mutex
- Thread 1 resumes and tries to acquire the ashmem_mutex (from
ashmem_mmap), which is with T2 and waits.
- Now we have entered a situation where T1 holds the mmap_sem and
waits for T2 to release ashmem_mutex, while T2 holds the ashmem_mutex
and waits for T1 to release the mmap_sem

Since I'm not very familiar with ashmem_read code flow, not sure on
how to fix this. When I look at things in isolation it all looks OK.
The mmap_sem acquisition from mm/mmap.c and do_page_fault is obvious
no doubts there
ashmem_mutex is the ashmem driver's lock, so all the driver functions
should hold this to maintain consistency of its data structures
So, i'm not seeing anything wrong in holding this in ashmem_read and
ashmem_mmap. But not sure whether its OK to call the VFS layer call
from ashmem_read.
I'm not aware of the ashmem driver's design goal and what the
developer had in mind while writing this function, so not sure about
any alternate approach.

Really appreciate any suggestions/guidance on how to go about fixing
this problem.

Warm regards,
Shankar

2013-03-20 16:09:08

by Al Viro

[permalink] [raw]

Subject: Re: [BUG] staging: android: ashmem: Deadlock during ashmem_mmap and ashmem_read

On Wed, Mar 20, 2013 at 09:08:03PM +0530, Shankar Brahadeeswaran wrote:
> Hi Greg, Dan,
>
> Few days back I posted a patch to fix a dead lock issue in the ashmem
> driver that got merged in staging-next branch
> https://lkml.org/lkml/2013/2/20/429
>
> I'm seeing that there exists another path in the ashmem driver that
> could lead to the similar issue. But this time I'm unable to think of
> a way to fix the problem.
>
> - The objects involved in the deadlock are same, mmap->sem and ashmem_mutex

Umm... why does it need to hold that mutex past having checked that
asma->file is non-NULL, anyway?

2013-03-20 16:52:28

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [BUG] staging: android: ashmem: Deadlock during ashmem_mmap and ashmem_read

On Wed, Mar 20, 2013 at 09:08:03PM +0530, Shankar Brahadeeswaran wrote:
> Hi Greg, Dan,
>
> Few days back I posted a patch to fix a dead lock issue in the ashmem
> driver that got merged in staging-next branch
> https://lkml.org/lkml/2013/2/20/429
>
> I'm seeing that there exists another path in the ashmem driver that
> could lead to the similar issue. But this time I'm unable to think of
> a way to fix the problem.

Why not copy the android developers who wrote this code on this thread?

greg k-h

2013-03-21 04:03:55

by Shankar Brahadeeswaran

[permalink] [raw]

Subject: Re: [BUG] staging: android: ashmem: Deadlock during ashmem_mmap and ashmem_read

>
> Why not copy the android developers who wrote this code on this thread?
>
> greg k-h

Adding Robert Love & Bjorn Bringert who added most part of the ashmem code.

> Umm... why does it need to hold that mutex past having checked that
> asma->file is non-NULL, anyway?
Not sure. As I mentioned even I don't know the full flow of this
driver so unable to reason out. Better to hear from the developers.

Hi Robert, Bjorn,
Can you please help us resolving the deadlock problem being discussed here.
You can look at the first email in this thread for detailed problem description.

Warm Regards,
Shankar

2013-03-21 14:18:40

by Robert Love

[permalink] [raw]

Subject: Re: [BUG] staging: android: ashmem: Deadlock during ashmem_mmap and ashmem_read

On Thu, Mar 21, 2013 at 10:06 AM, Bjorn Bringert <[email protected]> wrote:
> I did implement ashmem_read, but I had no idea what I was doing. Calling the
> VFS read function seemed like an obvious way to do it, but it might be
> wrong. If that needs fixing, then the similar VFS call in ashmem_llseek
> probably needs fixing too.

You don't want to hold ashmem_mutex across the VFS calls. It is only
needed to protect the ashmem-internal structures.

FWIW is Android now using ashmem_read()? I left it out of the original
ashmem implementation on purpose.

Robert

2013-03-22 14:42:41

by Shankar Brahadeeswaran

[permalink] [raw]

Subject: Re: [BUG] staging: android: ashmem: Deadlock during ashmem_mmap and ashmem_read

> You don't want to hold ashmem_mutex across the VFS calls. It is only
> needed to protect the ashmem-internal structures.

In the ashmem_read function, after the VFS call the asma structure
gets updated again. So one option is to drop the lock before invoking
the VFS call and then take it again once it returns.
But if a given thread invokes ashmem_read and sleeps, is there a
scenario where some other task can come and access asma->file->f_pos ?
If yes, the we'll be returning the older f_pos value. I'm not sure
whether this is expected behavior. Since we are not done with the read
yet, we can report the old f_pos, I think. So is the below pseudo
code OK?

static ssize_t ashmem_read(struct file *file, char __user *buf,
size_t len, loff_t *pos)
{
struct ashmem_area *asma = file->private_data;
int ret = 0;

mutex_lock(&ashmem_mutex);
....
mutex_unlock(&ashmem_mutex);
ret = asma->file->f_op->read(asma->file, buf, len, pos);
if (ret < 0) {
return ret;
}

mutex_lock(&ashmem_mutex);
asma->file->f_pos = *pos;
out:
mutex_unlock(&ashmem_mutex);
return ret;
}

If this is OK, then similar changes can be done for ashmem_lseek as well.

2013-03-22 16:47:05

by Shankar Brahadeeswaran

[permalink] [raw]