2013-04-04 22:23:36

by Arkadiusz Miskiewicz

Subject: Re: 3.8.3 and 3.9git occasional watchdog oops

On Thursday 14 of March 2013, Arkadiusz Miśkiewicz wrote:
> Hi.
>
> Just hit watchdog related oops in 3.8.3 kernel. Unfortunately photos only.
>
> http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8942.JPG
> http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8941.JPG

3.9git from today isn't any better unfortunately:

http://ixion.pld-linux.org/~arekm/watchdog-oops-3.9git.jpg

>
> oops started after I enabled systemd watchdog functionality. Cannot
> reproduce easily.
>
> watchdog here (thinkpad t400) is:
> iTCO_wdt: Found a ICH9M-E TCO device (Version=2, TCOBASE=0x1060)


--
Arkadiusz Miśkiewicz, arekm / maven.pl


2013-04-05 02:00:03

by Guenter Roeck

Subject: Re: 3.8.3 and 3.9git occasional watchdog oops

On Fri, Apr 05, 2013 at 12:23:30AM +0200, Arkadiusz Miskiewicz wrote:
> On Thursday 14 of March 2013, Arkadiusz Miśkiewicz wrote:
> > Hi.
> >
> > Just hit watchdog related oops in 3.8.3 kernel. Unfortunately photos only.
> >
> > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8942.JPG
> > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8941.JPG
>
> 3.9git from today isn't any better unfortunately:
>
> http://ixion.pld-linux.org/~arekm/watchdog-oops-3.9git.jpg
>
> >
> > oops started after I enabled systemd watchdog functionality. Cannot
> > reproduce easily.
> >
> > watchdog here (thinkpad t400) is:
> > iTCO_wdt: Found a ICH9M-E TCO device (Version=2, TCOBASE=0x1060)
>
>
Wonder if there is a race condition in the watchdog driver: The watchdog device
is opened before watchdog_register_device returns. I suspect systemd waits for
a udev event, or by some other means detects that /dev/watchdog was created,
and opens it immediately.

I just have no idea where exactly the race condition, if there is one, is
hiding. Or maybe I am completely off track.

Guenter

2013-04-06 03:47:51

by Guenter Roeck

Subject: Re: 3.8.3 and 3.9git occasional watchdog oops

On Thu, Apr 04, 2013 at 06:59:59PM -0700, Guenter Roeck wrote:
> On Fri, Apr 05, 2013 at 12:23:30AM +0200, Arkadiusz Miskiewicz wrote:
> > On Thursday 14 of March 2013, Arkadiusz Miśkiewicz wrote:
> > > Hi.
> > >
> > > Just hit watchdog related oops in 3.8.3 kernel. Unfortunately photos only.
> > >
> > > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8942.JPG
> > > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8941.JPG
> >
> > 3.9git from today isn't any better unfortunately:
> >
> > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.9git.jpg
> >
> > >
> > > oops started after I enabled systemd watchdog functionality. Cannot
> > > reproduce easily.
> > >
> > > watchdog here (thinkpad t400) is:
> > > iTCO_wdt: Found a ICH9M-E TCO device (Version=2, TCOBASE=0x1060)
> >
> >
> Wonder if there is a race condition in the watchdog driver: The watchdog device
> is opened before watchdog_register_device returns. I suspect systemd waits for
> a udev event, or by some other means detects that /dev/watchdog was created,
> and opens it immediately.
>
> I just have no idea where exactly the race condition, if there is one, is
> hiding. Or maybe I am completely off track.
>
I _think_ I understand the sequence of events.

- The driver is the first watchdog driver to register.
- watchdog_dev_register() gets called and creates the watchdog misc device
  by calling misc_register().
  At that time, the matching character device (/dev/watchdog0) does not yet
  exist, and old_wdd is not set either.
- Userspace gets an event and opens /dev/watchdog.
- watchdog_open() is called and sets wdd = old_wdd, which is still NULL,
  and tries to dereference it. Bang.

If this is the problem, a simple fix would be to set old_wdd before calling
misc_register().
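
To make the ordering concrete, here is a minimal user-space sketch of the
suspected race and of the proposed reordering. It is not the kernel code:
the one-field struct and the functions model_watchdog_open(),
model_misc_register(), register_racy() and register_fixed() are hypothetical
stand-ins. The assumption that userspace opens the device the instant
misc_register() makes it visible is there only to force the worst case.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct watchdog_device. */
struct watchdog_device {
    int id;
};

/* Models the pointer that the /dev/watchdog open path reads. */
static struct watchdog_device *old_wdd;

/* Outcome of the simulated open(): 0 = ok, -1 = would dereference NULL. */
static int open_result;

/* Models watchdog_open() on the misc device: it picks up old_wdd. */
static int model_watchdog_open(void)
{
    struct watchdog_device *wdd = old_wdd;

    if (wdd == NULL)
        return -1;      /* the real code would oops at this point */
    return 0;
}

/*
 * Models misc_register(): the device becomes visible and, by assumption,
 * userspace opens it before the function returns.
 */
static int model_misc_register(void)
{
    open_result = model_watchdog_open();
    return 0;
}

/* Suspected current ordering: register first, set old_wdd afterwards. */
static int register_racy(struct watchdog_device *wdd)
{
    int err;

    old_wdd = NULL;     /* first watchdog driver: nothing registered yet */
    open_result = 0;
    err = model_misc_register();
    if (err)
        return err;
    old_wdd = wdd;
    return 0;
}

/* Proposed ordering: set old_wdd before the device becomes visible. */
static int register_fixed(struct watchdog_device *wdd)
{
    int err;

    open_result = 0;
    old_wdd = wdd;
    err = model_misc_register();
    if (err) {
        old_wdd = NULL; /* undo on failure */
        return err;
    }
    return 0;
}
```

In this model, register_racy() leaves open_result at -1 because the open
sees a NULL old_wdd, while register_fixed() leaves it at 0. If the real fix
follows the same shape, the misc_register() error path would presumably
have to clear old_wdd again, as the sketch does.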

Can you test a patch?

Guenter