2011-04-21 01:43:05

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

From: Steven Rostedt <[email protected]>

Locking order inversion due to interrupts is a subtle problem.
When a locking inversion due to interrupts is discovered by lockdep,
it currently reports something like this:

[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]

And then writes the locks that are involved as well as back traces.
But several developers are confused by what a HARDIRQ->safe to unsafe
issue is all about, and sometimes even blow it off as a bug in lockdep.
As it is not obvious when lockdep describes this about a lock that
is never taken in interrupt context.

After explaining the problems that lockdep is reporting, I decided
to add a description of the problem in visual form. Now the following
is shown:

---
other info that might help us debug this:

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(lockA);
local_irq_disable();
lock(&rq->lock);
lock(lockA);
<Interrupt>
lock(&rq->lock);

*** DEADLOCK ***

---

The above is the case when the unsafe lock is taken while holding
a lock taken in irq context. But when a lock is taken that also
grabs a unsafe lock, the call chain is shown:

---
other info that might help us debug this:

Chain exists of:
&rq->lock --> lockA --> lockC

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(lockC);
local_irq_disable();
lock(&rq->lock);
lock(lockA);
<Interrupt>
lock(&rq->lock);

*** DEADLOCK ***

Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/lockdep.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 0d2058d..bb77c030 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,6 +490,18 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
usage[i] = '\0';
}

+static int __print_lock_name(struct lock_class *class)
+{
+ char str[KSYM_NAME_LEN];
+ const char *name;
+
+ name = class->name;
+ if (!name)
+ name = __get_key_name(class->key, str);
+
+ return printk("%s", name);
+}
+
static void print_lock_name(struct lock_class *class)
{
char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
@@ -1325,6 +1337,62 @@ print_shortest_lock_dependencies(struct lock_list *leaf,
return;
}

+static void
+print_irq_lock_scenario(struct lock_list *safe_entry,
+ struct lock_list *unsafe_entry,
+ struct held_lock *prev,
+ struct held_lock *next)
+{
+ struct lock_class *safe_class = safe_entry->class;
+ struct lock_class *unsafe_class = unsafe_entry->class;
+ struct lock_class *middle_class = hlock_class(prev);
+
+ if (middle_class == safe_class)
+ middle_class = hlock_class(next);
+
+ /*
+ * A direct locking problem where unsafe_class lock is taken
+ * directly by safe_class lock, then all we need to show
+ * is the deadlock scenario, as it is obvious that the
+ * unsafe lock is taken under the safe lock.
+ *
+ * But if there is a chain instead, where the safe lock takes
+ * an intermediate lock (middle_class) where this lock is
+ * not the same as the safe lock, then the lock chain is
+ * used to describe the problem. Otherwise we would need
+ * to show a different CPU case for each link in the chain
+ * from the safe_class lock to the unsafe_class lock.
+ */
+ if (middle_class != unsafe_class) {
+ printk("Chain exists of:\n ");
+ __print_lock_name(safe_class);
+ printk(" --> ");
+ __print_lock_name(middle_class);
+ printk(" --> ");
+ __print_lock_name(unsafe_class);
+ printk("\n\n");
+ }
+
+ printk(" Possible interrupt unsafe locking scenario:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(unsafe_class);
+ printk(");\n");
+ printk(" local_irq_disable();\n");
+ printk(" lock(");
+ __print_lock_name(safe_class);
+ printk(");\n");
+ printk(" lock(");
+ __print_lock_name(middle_class);
+ printk(");\n");
+ printk(" <Interrupt>\n");
+ printk(" lock(");
+ __print_lock_name(safe_class);
+ printk(");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+}
+
static int
print_bad_irq_dependency(struct task_struct *curr,
struct lock_list *prev_root,
@@ -1376,6 +1444,8 @@ print_bad_irq_dependency(struct task_struct *curr,
print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);

printk("\nother info that might help us debug this:\n\n");
+ print_irq_lock_scenario(backwards_entry, forwards_entry, prev, next);
+
lockdep_print_held_locks(curr);

printk("\nthe dependencies between %s-irq-safe lock", irqclass);
--
1.7.2.3


2011-04-21 06:43:24

by Yong Zhang

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Thu, Apr 21, 2011 at 9:41 AM, Steven Rostedt <[email protected]> wrote:
>  ---
>
> The above is the case when the unsafe lock is taken while holding
> a lock taken in irq context. But when a lock is taken that also
> grabs a unsafe lock, the call chain is shown:
>
>  ---
> other info that might help us debug this:
>
> Chain exists of:
>  &rq->lock --> lockA --> lockC
>
>  Possible interrupt unsafe locking scenario:
>
>       CPU0                    CPU1
>       ----                    ----
>  lock(lockC);
>                               local_irq_disable();
>                               lock(&rq->lock);
>                               lock(lockA);
>  <Interrupt>
>    lock(&rq->lock);
>
>  *** DEADLOCK ***

But this is not a real deadlock, right? Or maybe you can teach me :)

I just assume your scenario should show a more real one, But for
a possible irq-dependence deadlock, it seems not easy to find out.

Thanks,
Yong

>
> Acked-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
>  kernel/lockdep.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 70 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 0d2058d..bb77c030 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -490,6 +490,18 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
>        usage[i] = '\0';
>  }
>
> +static int __print_lock_name(struct lock_class *class)
> +{
> +       char str[KSYM_NAME_LEN];
> +       const char *name;
> +
> +       name = class->name;
> +       if (!name)
> +               name = __get_key_name(class->key, str);
> +
> +       return printk("%s", name);
> +}
> +
>  static void print_lock_name(struct lock_class *class)
>  {
>        char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
> @@ -1325,6 +1337,62 @@ print_shortest_lock_dependencies(struct lock_list *leaf,
>        return;
>  }
>
> +static void
> +print_irq_lock_scenario(struct lock_list *safe_entry,
> +                       struct lock_list *unsafe_entry,
> +                       struct held_lock *prev,
> +                       struct held_lock *next)
> +{
> +       struct lock_class *safe_class = safe_entry->class;
> +       struct lock_class *unsafe_class = unsafe_entry->class;
> +       struct lock_class *middle_class = hlock_class(prev);
> +
> +       if (middle_class == safe_class)
> +               middle_class = hlock_class(next);
> +
> +       /*
> +        * A direct locking problem where unsafe_class lock is taken
> +        * directly by safe_class lock, then all we need to show
> +        * is the deadlock scenario, as it is obvious that the
> +        * unsafe lock is taken under the safe lock.
> +        *
> +        * But if there is a chain instead, where the safe lock takes
> +        * an intermediate lock (middle_class) where this lock is
> +        * not the same as the safe lock, then the lock chain is
> +        * used to describe the problem. Otherwise we would need
> +        * to show a different CPU case for each link in the chain
> +        * from the safe_class lock to the unsafe_class lock.
> +        */
> +       if (middle_class != unsafe_class) {
> +               printk("Chain exists of:\n  ");
> +               __print_lock_name(safe_class);
> +               printk(" --> ");
> +               __print_lock_name(middle_class);
> +               printk(" --> ");
> +               __print_lock_name(unsafe_class);
> +               printk("\n\n");
> +       }
> +
> +       printk(" Possible interrupt unsafe locking scenario:\n\n");
> +       printk("       CPU0                    CPU1\n");
> +       printk("       ----                    ----\n");
> +       printk("  lock(");
> +       __print_lock_name(unsafe_class);
> +       printk(");\n");
> +       printk("                               local_irq_disable();\n");
> +       printk("                               lock(");
> +       __print_lock_name(safe_class);
> +       printk(");\n");
> +       printk("                               lock(");
> +       __print_lock_name(middle_class);
> +       printk(");\n");
> +       printk("  <Interrupt>\n");
> +       printk("    lock(");
> +       __print_lock_name(safe_class);
> +       printk(");\n");
> +       printk("\n *** DEADLOCK ***\n\n");
> +}
> +
>  static int
>  print_bad_irq_dependency(struct task_struct *curr,
>                         struct lock_list *prev_root,
> @@ -1376,6 +1444,8 @@ print_bad_irq_dependency(struct task_struct *curr,
>        print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);
>
>        printk("\nother info that might help us debug this:\n\n");
> +       print_irq_lock_scenario(backwards_entry, forwards_entry, prev, next);
> +
>        lockdep_print_held_locks(curr);
>
>        printk("\nthe dependencies between %s-irq-safe lock", irqclass);
> --
> 1.7.2.3
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



--
Only stand for myself

2011-04-21 07:02:08

by Yong Zhang

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Thu, Apr 21, 2011 at 9:41 AM, Steven Rostedt <[email protected]> wrote:
>  ---
>
> The above is the case when the unsafe lock is taken while holding
> a lock taken in irq context. But when a lock is taken that also
> grabs a unsafe lock, the call chain is shown:
>
>  ---
> other info that might help us debug this:
>
> Chain exists of:
>  &rq->lock --> lockA --> lockC
>
>  Possible interrupt unsafe locking scenario:
>
>       CPU0                    CPU1
>       ----                    ----
>  lock(lockC);
>                               local_irq_disable();
>                               lock(&rq->lock);
>                               lock(lockA);
>  <Interrupt>
>    lock(&rq->lock);
>
>  *** DEADLOCK ***

Or we could show this:
Chain exists of:
&rq->lock --> lockA --> lockC

Possible interrupt unsafe locking scenario:

CPU0 CPU1 CPU2
---- ---- ----
lock(lockC);
local_irq_disable();
lock(&rq->lock); lock(lockA);
lock(lockA); lock(lockC);
<Interrupt>
lock(&rq->lock);

*** DEADLOCK ***

Thanks,
Yong

>
> Acked-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
>  kernel/lockdep.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 70 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 0d2058d..bb77c030 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -490,6 +490,18 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
>        usage[i] = '\0';
>  }
>
> +static int __print_lock_name(struct lock_class *class)
> +{
> +       char str[KSYM_NAME_LEN];
> +       const char *name;
> +
> +       name = class->name;
> +       if (!name)
> +               name = __get_key_name(class->key, str);
> +
> +       return printk("%s", name);
> +}
> +
>  static void print_lock_name(struct lock_class *class)
>  {
>        char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
> @@ -1325,6 +1337,62 @@ print_shortest_lock_dependencies(struct lock_list *leaf,
>        return;
>  }
>
> +static void
> +print_irq_lock_scenario(struct lock_list *safe_entry,
> +                       struct lock_list *unsafe_entry,
> +                       struct held_lock *prev,
> +                       struct held_lock *next)
> +{
> +       struct lock_class *safe_class = safe_entry->class;
> +       struct lock_class *unsafe_class = unsafe_entry->class;
> +       struct lock_class *middle_class = hlock_class(prev);
> +
> +       if (middle_class == safe_class)
> +               middle_class = hlock_class(next);
> +
> +       /*
> +        * A direct locking problem where unsafe_class lock is taken
> +        * directly by safe_class lock, then all we need to show
> +        * is the deadlock scenario, as it is obvious that the
> +        * unsafe lock is taken under the safe lock.
> +        *
> +        * But if there is a chain instead, where the safe lock takes
> +        * an intermediate lock (middle_class) where this lock is
> +        * not the same as the safe lock, then the lock chain is
> +        * used to describe the problem. Otherwise we would need
> +        * to show a different CPU case for each link in the chain
> +        * from the safe_class lock to the unsafe_class lock.
> +        */
> +       if (middle_class != unsafe_class) {
> +               printk("Chain exists of:\n  ");
> +               __print_lock_name(safe_class);
> +               printk(" --> ");
> +               __print_lock_name(middle_class);
> +               printk(" --> ");
> +               __print_lock_name(unsafe_class);
> +               printk("\n\n");
> +       }
> +
> +       printk(" Possible interrupt unsafe locking scenario:\n\n");
> +       printk("       CPU0                    CPU1\n");
> +       printk("       ----                    ----\n");
> +       printk("  lock(");
> +       __print_lock_name(unsafe_class);
> +       printk(");\n");
> +       printk("                               local_irq_disable();\n");
> +       printk("                               lock(");
> +       __print_lock_name(safe_class);
> +       printk(");\n");
> +       printk("                               lock(");
> +       __print_lock_name(middle_class);
> +       printk(");\n");
> +       printk("  <Interrupt>\n");
> +       printk("    lock(");
> +       __print_lock_name(safe_class);
> +       printk(");\n");
> +       printk("\n *** DEADLOCK ***\n\n");
> +}
> +
>  static int
>  print_bad_irq_dependency(struct task_struct *curr,
>                         struct lock_list *prev_root,
> @@ -1376,6 +1444,8 @@ print_bad_irq_dependency(struct task_struct *curr,
>        print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);
>
>        printk("\nother info that might help us debug this:\n\n");
> +       print_irq_lock_scenario(backwards_entry, forwards_entry, prev, next);
> +
>        lockdep_print_held_locks(curr);
>
>        printk("\nthe dependencies between %s-irq-safe lock", irqclass);
> --
> 1.7.2.3
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



--
Only stand for myself

2011-04-21 07:08:30

by Yong Zhang

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Thu, Apr 21, 2011 at 3:02 PM, Yong Zhang <[email protected]> wrote:
> On Thu, Apr 21, 2011 at 9:41 AM, Steven Rostedt <[email protected]> wrote:
>>  ---
>>
>> The above is the case when the unsafe lock is taken while holding
>> a lock taken in irq context. But when a lock is taken that also
>> grabs a unsafe lock, the call chain is shown:
>>
>>  ---
>> other info that might help us debug this:
>>
>> Chain exists of:
>>  &rq->lock --> lockA --> lockC
>>
>>  Possible interrupt unsafe locking scenario:
>>
>>       CPU0                    CPU1
>>       ----                    ----
>>  lock(lockC);
>>                               local_irq_disable();
>>                               lock(&rq->lock);
>>                               lock(lockA);
>>  <Interrupt>
>>    lock(&rq->lock);
>>
>>  *** DEADLOCK ***
>
> Or we could show this:
> Chain exists of:
> &rq->lock --> lockA --> lockC
>
>  Possible interrupt unsafe locking scenario:
>
>      CPU0                    CPU1                           CPU2
>      ----                    ----                                     ----
>  lock(lockC);
>                              local_irq_disable();

Forget
local_irq_disable(); here :)

>                              lock(&rq->lock);            lock(lockA);
>                              lock(lockA);                   lock(lockC);
>  <Interrupt>
>   lock(&rq->lock);
>
>  *** DEADLOCK ***
>
> Thanks,
> Yong
>
>>
>> Acked-by: Peter Zijlstra <[email protected]>
>> Signed-off-by: Steven Rostedt <[email protected]>
>> ---
>>  kernel/lockdep.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 70 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
>> index 0d2058d..bb77c030 100644
>> --- a/kernel/lockdep.c
>> +++ b/kernel/lockdep.c
>> @@ -490,6 +490,18 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
>>        usage[i] = '\0';
>>  }
>>
>> +static int __print_lock_name(struct lock_class *class)
>> +{
>> +       char str[KSYM_NAME_LEN];
>> +       const char *name;
>> +
>> +       name = class->name;
>> +       if (!name)
>> +               name = __get_key_name(class->key, str);
>> +
>> +       return printk("%s", name);
>> +}
>> +
>>  static void print_lock_name(struct lock_class *class)
>>  {
>>        char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
>> @@ -1325,6 +1337,62 @@ print_shortest_lock_dependencies(struct lock_list *leaf,
>>        return;
>>  }
>>
>> +static void
>> +print_irq_lock_scenario(struct lock_list *safe_entry,
>> +                       struct lock_list *unsafe_entry,
>> +                       struct held_lock *prev,
>> +                       struct held_lock *next)
>> +{
>> +       struct lock_class *safe_class = safe_entry->class;
>> +       struct lock_class *unsafe_class = unsafe_entry->class;
>> +       struct lock_class *middle_class = hlock_class(prev);
>> +
>> +       if (middle_class == safe_class)
>> +               middle_class = hlock_class(next);
>> +
>> +       /*
>> +        * A direct locking problem where unsafe_class lock is taken
>> +        * directly by safe_class lock, then all we need to show
>> +        * is the deadlock scenario, as it is obvious that the
>> +        * unsafe lock is taken under the safe lock.
>> +        *
>> +        * But if there is a chain instead, where the safe lock takes
>> +        * an intermediate lock (middle_class) where this lock is
>> +        * not the same as the safe lock, then the lock chain is
>> +        * used to describe the problem. Otherwise we would need
>> +        * to show a different CPU case for each link in the chain
>> +        * from the safe_class lock to the unsafe_class lock.
>> +        */
>> +       if (middle_class != unsafe_class) {
>> +               printk("Chain exists of:\n  ");
>> +               __print_lock_name(safe_class);
>> +               printk(" --> ");
>> +               __print_lock_name(middle_class);
>> +               printk(" --> ");
>> +               __print_lock_name(unsafe_class);
>> +               printk("\n\n");
>> +       }
>> +
>> +       printk(" Possible interrupt unsafe locking scenario:\n\n");
>> +       printk("       CPU0                    CPU1\n");
>> +       printk("       ----                    ----\n");
>> +       printk("  lock(");
>> +       __print_lock_name(unsafe_class);
>> +       printk(");\n");
>> +       printk("                               local_irq_disable();\n");
>> +       printk("                               lock(");
>> +       __print_lock_name(safe_class);
>> +       printk(");\n");
>> +       printk("                               lock(");
>> +       __print_lock_name(middle_class);
>> +       printk(");\n");
>> +       printk("  <Interrupt>\n");
>> +       printk("    lock(");
>> +       __print_lock_name(safe_class);
>> +       printk(");\n");
>> +       printk("\n *** DEADLOCK ***\n\n");
>> +}
>> +
>>  static int
>>  print_bad_irq_dependency(struct task_struct *curr,
>>                         struct lock_list *prev_root,
>> @@ -1376,6 +1444,8 @@ print_bad_irq_dependency(struct task_struct *curr,
>>        print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);
>>
>>        printk("\nother info that might help us debug this:\n\n");
>> +       print_irq_lock_scenario(backwards_entry, forwards_entry, prev, next);
>> +
>>        lockdep_print_held_locks(curr);
>>
>>        printk("\nthe dependencies between %s-irq-safe lock", irqclass);
>> --
>> 1.7.2.3
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>
>
>
> --
> Only stand for myself
>



--
Only stand for myself

2011-04-21 11:40:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Thu, 2011-04-21 at 15:02 +0800, Yong Zhang wrote:
> On Thu, Apr 21, 2011 at 9:41 AM, Steven Rostedt <[email protected]> wrote:
> > ---
> >
> > The above is the case when the unsafe lock is taken while holding
> > a lock taken in irq context. But when a lock is taken that also
> > grabs a unsafe lock, the call chain is shown:
> >
> > ---
> > other info that might help us debug this:
> >
> > Chain exists of:
> > &rq->lock --> lockA --> lockC
> >
> > Possible interrupt unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(lockC);
> > local_irq_disable();
> > lock(&rq->lock);
> > lock(lockA);
> > <Interrupt>
> > lock(&rq->lock);
> >
> > *** DEADLOCK ***
>
> Or we could show this:
> Chain exists of:
> &rq->lock --> lockA --> lockC
>
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1 CPU2
> ---- ---- ----
> lock(lockC);
> local_irq_disable();
> lock(&rq->lock); lock(lockA);
> lock(lockA); lock(lockC);
> <Interrupt>
> lock(&rq->lock);
>
> *** DEADLOCK ***


We could but I prefer not to ;) We have some chains that are 8 locks
deep. I really don't want to scatter that entirely across the screen.
Hence my "Chain exists.." statement, following an example that any
kernel developer can (with a little thinking) see is a possible
deadlock.

In fact, this code doesn't even look at the full chain, it only examines
3 locks in the chain, and lets the developer figure out the rest. I
could show the entire chain too.

Thanks,

-- Steve

2011-04-21 13:35:52

by Yong Zhang

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Thu, Apr 21, 2011 at 07:40:29AM -0400, Steven Rostedt wrote:
> On Thu, 2011-04-21 at 15:02 +0800, Yong Zhang wrote:
> > On Thu, Apr 21, 2011 at 9:41 AM, Steven Rostedt <[email protected]> wrote:
> > > ---
> > >
> > > The above is the case when the unsafe lock is taken while holding
> > > a lock taken in irq context. But when a lock is taken that also
> > > grabs a unsafe lock, the call chain is shown:
> > >
> > > ---
> > > other info that might help us debug this:
> > >
> > > Chain exists of:
> > > &rq->lock --> lockA --> lockC
> > >
> > > Possible interrupt unsafe locking scenario:
> > >
> > > CPU0 CPU1
> > > ---- ----
> > > lock(lockC);
> > > local_irq_disable();
> > > lock(&rq->lock);
> > > lock(lockA);
> > > <Interrupt>
> > > lock(&rq->lock);
> > >
> > > *** DEADLOCK ***
> >
> > Or we could show this:
> > Chain exists of:
> > &rq->lock --> lockA --> lockC
> >
> > Possible interrupt unsafe locking scenario:
> >
> > CPU0 CPU1 CPU2
> > ---- ---- ----
> > lock(lockC);
> > local_irq_disable();
> > lock(&rq->lock); lock(lockA);
> > lock(lockA); lock(lockC);
> > <Interrupt>
> > lock(&rq->lock);
> >
> > *** DEADLOCK ***
>
>
> We could but I prefer not to ;) We have some chains that are 8 locks
> deep. I really don't want to scatter that entirely across the screen.
> Hence my "Chain exists.." statement, following an example that any
> kernel developer can (with a little thinking) see is a possible
> deadlock.

Yup :)

>
> In fact, this code doesn't even look at the full chain, it only examines
> 3 locks in the chain, and lets the developer figure out the rest.

OK, fair enough.

> I
> could show the entire chain too.

Sure :)

Thanks,
Yong

2011-04-21 14:24:08

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Thu, 2011-04-21 at 21:35 +0800, Yong Zhang wrote:
> On Thu, Apr 21, 2011 at 07:40:29AM -0400, Steven Rostedt wrote:

> > I
> > could show the entire chain too.
>
> Sure :)

I talked a little about this with Peter Zijlstra on IRC, and we agreed
that we'll keep this as is until we notice that kernel developers are
still confused with the output.

The more advanced I make this output, the more likely I'll make a
mistake and show something that is not what actually happens. I wrote a
module to trigger each of these cases and the problems that lockdep can
detect grows exponentially as we make the output more specific.

I probably should modify the lockdep selftests to add an option to show
the output as well as it triggers errors. This would allow me to make
sure the output still works.

But for now, I have other things to work on ;)

-- Steve

2011-04-22 01:41:42

by Yong Zhang

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Thu, Apr 21, 2011 at 10:24 PM, Steven Rostedt <[email protected]> wrote:
> On Thu, 2011-04-21 at 21:35 +0800, Yong Zhang wrote:
>> On Thu, Apr 21, 2011 at 07:40:29AM -0400, Steven Rostedt wrote:
>
>> > I
>> > could show the entire chain too.
>>
>> Sure :)
>
> I talked a little about this with Peter Zijlstra on IRC, and we agreed
> that we'll keep this as is until we notice that kernel developers are
> still confused with the output.

OK :)

>
> The more advanced I make this output, the more likely I'll make a
> mistake and show something that is not what actually happens. I wrote a
> module to trigger each of these cases and the problems that lockdep can
> detect grows exponentially as we make the output more specific.
>
> I probably should modify the lockdep selftests to add an option to show
> the output as well as it triggers errors. This would allow me to make
> sure the output still works.

This does make sense to keep things consistent if we have it.

>
> But for now, I have other things to work on ;)

If you want, you can show more details on your ideas,
maybe I could get some time to finish it :)

Thanks,
Yong



--
Only stand for myself

2011-04-22 02:34:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue

On Fri, 2011-04-22 at 09:41 +0800, Yong Zhang wrote:

> > I probably should modify the lockdep selftests to add an option to show
> > the output as well as it triggers errors. This would allow me to make
> > sure the output still works.
>
> This does make sense to keep things consistent if we have it.

Actually, Peter informed me that we already do this. Just need to boot
with debug_locks_verbose=1.

>
> >
> > But for now, I have other things to work on ;)
>
> If you want, you can show more details on your ideas,
> maybe I could get some time to finish it :)

Well, I got pretty much what I wanted out for now. I'll probably go and
look later. But by all means, if you find other ways to make lockdep
more readable. Please, don't let me stop you ;)

-- Steve


2011-04-22 06:10:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/7] lockdep: Print a nice description of an irq locking issue


* Steven Rostedt <[email protected]> wrote:

> On Fri, 2011-04-22 at 09:41 +0800, Yong Zhang wrote:
>
> > > I probably should modify the lockdep selftests to add an option to show
> > > the output as well as it triggers errors. This would allow me to make
> > > sure the output still works.
> >
> > This does make sense to keep things consistent if we have it.
>
> Actually, Peter informed me that we already do this. Just need to boot
> with debug_locks_verbose=1.

Yeah, that's been in lockdep since v1, that's how i prettified the original
lockdep output.

Thanks,

Ingo

2011-04-22 12:20:14

by Steven Rostedt

[permalink] [raw]
Subject: [tip:core/locking] lockdep: Print a nicer description for irq lock inversions

Commit-ID: 3003eba313dd0e0502dd71548c36fe7c19801ce5
Gitweb: http://git.kernel.org/tip/3003eba313dd0e0502dd71548c36fe7c19801ce5
Author: Steven Rostedt <[email protected]>
AuthorDate: Wed, 20 Apr 2011 21:41:54 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 22 Apr 2011 11:06:57 +0200

lockdep: Print a nicer description for irq lock inversions

Locking order inversion due to interrupts is a subtle problem.

When an irq lockiinversion discovered by lockdep it currently
reports something like:

[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]

... and then prints out the locks that are involved, as back traces.

Judging by lkml feedback developers were routinely confused by what
a HARDIRQ->safe to unsafe issue is all about, and sometimes even
blew it off as a bug in lockdep.

It is not obvious when lockdep prints this message about a lock that
is never taken in interrupt context.

After explaining the problems that lockdep is reporting, I
decided to add a description of the problem in visual form. Now
the following is shown:

---
other info that might help us debug this:

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(lockA);
local_irq_disable();
lock(&rq->lock);
lock(lockA);
<Interrupt>
lock(&rq->lock);

*** DEADLOCK ***

---

The above is the case when the unsafe lock is taken while
holding a lock taken in irq context. But when a lock is taken
that also grabs a unsafe lock, the call chain is shown:

---
other info that might help us debug this:

Chain exists of:
&rq->lock --> lockA --> lockC

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(lockC);
local_irq_disable();
lock(&rq->lock);
lock(lockA);
<Interrupt>
lock(&rq->lock);

*** DEADLOCK ***

Signed-off-by: Steven Rostedt <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/lockdep.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 53a6895..7b2ffee 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,6 +490,18 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
usage[i] = '\0';
}

+static int __print_lock_name(struct lock_class *class)
+{
+ char str[KSYM_NAME_LEN];
+ const char *name;
+
+ name = class->name;
+ if (!name)
+ name = __get_key_name(class->key, str);
+
+ return printk("%s", name);
+}
+
static void print_lock_name(struct lock_class *class)
{
char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
@@ -1325,6 +1337,62 @@ print_shortest_lock_dependencies(struct lock_list *leaf,
return;
}

+static void
+print_irq_lock_scenario(struct lock_list *safe_entry,
+ struct lock_list *unsafe_entry,
+ struct held_lock *prev,
+ struct held_lock *next)
+{
+ struct lock_class *safe_class = safe_entry->class;
+ struct lock_class *unsafe_class = unsafe_entry->class;
+ struct lock_class *middle_class = hlock_class(prev);
+
+ if (middle_class == safe_class)
+ middle_class = hlock_class(next);
+
+ /*
+ * A direct locking problem where unsafe_class lock is taken
+ * directly by safe_class lock, then all we need to show
+ * is the deadlock scenario, as it is obvious that the
+ * unsafe lock is taken under the safe lock.
+ *
+ * But if there is a chain instead, where the safe lock takes
+ * an intermediate lock (middle_class) where this lock is
+ * not the same as the safe lock, then the lock chain is
+ * used to describe the problem. Otherwise we would need
+ * to show a different CPU case for each link in the chain
+ * from the safe_class lock to the unsafe_class lock.
+ */
+ if (middle_class != unsafe_class) {
+ printk("Chain exists of:\n ");
+ __print_lock_name(safe_class);
+ printk(" --> ");
+ __print_lock_name(middle_class);
+ printk(" --> ");
+ __print_lock_name(unsafe_class);
+ printk("\n\n");
+ }
+
+ printk(" Possible interrupt unsafe locking scenario:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(unsafe_class);
+ printk(");\n");
+ printk(" local_irq_disable();\n");
+ printk(" lock(");
+ __print_lock_name(safe_class);
+ printk(");\n");
+ printk(" lock(");
+ __print_lock_name(middle_class);
+ printk(");\n");
+ printk(" <Interrupt>\n");
+ printk(" lock(");
+ __print_lock_name(safe_class);
+ printk(");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+}
+
static int
print_bad_irq_dependency(struct task_struct *curr,
struct lock_list *prev_root,
@@ -1376,6 +1444,8 @@ print_bad_irq_dependency(struct task_struct *curr,
print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);

printk("\nother info that might help us debug this:\n\n");
+ print_irq_lock_scenario(backwards_entry, forwards_entry, prev, next);
+
lockdep_print_held_locks(curr);

printk("\nthe dependencies between %s-irq-safe lock", irqclass);