2004-10-16 09:33:37

by Nigel Cunningham

[permalink] [raw]
Subject: Best way to find where a lock is taken and not released?

Hi all.

I saw a hang the other day (2.6.8.1) where all other processes except
the suspending to disk one were refrigerated and the process doing the
suspending was stuck trying to take the dcache_lock via
shrink_all_memory. Obviously some path called via shrink_all_memory had
taken the lock and not released it, then tried to retake it _or_ another
process had taken the lock and then not released it when backing out and
entering the refrigator. My question is, what's the best way to find the
path on which this occurs? Grepping, I see dcache_lock all over the
show, so if there's a more efficient method that reading the files, I'd
like to learn it. It occurs to me that I might try wrapping calls to
lock and unlock that lock in printks, but I'm wondering if there's some
better way I don't yet know.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. True tolerance, however, can cope with others
being intolerant.


2004-10-16 16:46:59

by Andreas Dilger

[permalink] [raw]
Subject: Re: Best way to find where a lock is taken and not released?

On Oct 16, 2004 19:30 +1000, Nigel Cunningham wrote:
> I saw a hang the other day (2.6.8.1) where all other processes except
> the suspending to disk one were refrigerated and the process doing the
> suspending was stuck trying to take the dcache_lock via
> shrink_all_memory. Obviously some path called via shrink_all_memory had
> taken the lock and not released it, then tried to retake it _or_ another
> process had taken the lock and then not released it when backing out and
> entering the refrigator. My question is, what's the best way to find the
> path on which this occurs? Grepping, I see dcache_lock all over the
> show, so if there's a more efficient method that reading the files, I'd
> like to learn it. It occurs to me that I might try wrapping calls to
> lock and unlock that lock in printks, but I'm wondering if there's some
> better way I don't yet know.

Probably the easiest would be to add to the lock struct a pointer to
the task_struct and the EIP when it gets the lock, and clear them when
it releases the lock. That way, when you see processes blocking on the
lock you can use crash or other tool to examine the lock and see which
process got the lock and also where.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/


Attachments:
(No filename) (1.32 kB)
(No filename) (189.00 B)
Download all attachments

2004-10-16 21:45:30

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Best way to find where a lock is taken and not released?

Hi.

On Sun, 2004-10-17 at 02:46, Andreas Dilger wrote:
> On Oct 16, 2004 19:30 +1000, Nigel Cunningham wrote:
> > I saw a hang the other day (2.6.8.1) where all other processes except
> > the suspending to disk one were refrigerated and the process doing the
> > suspending was stuck trying to take the dcache_lock via
> > shrink_all_memory. Obviously some path called via shrink_all_memory had
> > taken the lock and not released it, then tried to retake it _or_ another
> > process had taken the lock and then not released it when backing out and
> > entering the refrigator. My question is, what's the best way to find the
> > path on which this occurs? Grepping, I see dcache_lock all over the
> > show, so if there's a more efficient method that reading the files, I'd
> > like to learn it. It occurs to me that I might try wrapping calls to
> > lock and unlock that lock in printks, but I'm wondering if there's some
> > better way I don't yet know.
>
> Probably the easiest would be to add to the lock struct a pointer to
> the task_struct and the EIP when it gets the lock, and clear them when
> it releases the lock. That way, when you see processes blocking on the
> lock you can use crash or other tool to examine the lock and see which
> process got the lock and also where.

Okee doke. I saw that kgdb uses that method. I'll give it a go.

Thanks!

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

Many today claim to be tolerant. True tolerance, however, can cope with others
being intolerant.