On Thu, 25 Sep 2008 17:05:28 -0700
Divy Le Ray <[email protected]> wrote:
> A SGE queue set timer might access registers while in EEH recovery,
> triggering an EEH error loop. Stop all timers early in EEH process.
<looks>

It's deeply weird that t3_reset_qset() does

	memset(&q->tx_reclaim_timer, 0, sizeof(q->tx_reclaim_timer));

There are lots of things in the timer_list which the driver has no
business modifying. For example, this might break the metadata in
Thomas's debugobjects stuff, which attempts to catch things being done
in the wrong order (I don't think it will, but still...).

Rerunning init_timer() should repair the damage, but I suspect a simple

	q->tx_reclaim_timer.function = NULL; /* explanation goes here */

would suffice here.

t3_sge_alloc_qset() could use the newer setup_timer().
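
To illustrate the concern above, here is a minimal userspace sketch, not driver or kernel code. It assumes a hypothetical struct mock_timer standing in for struct timer_list (the real layout differs), with a core_state field standing in for whatever bookkeeping the timer core and debugobjects keep. The mock_* helpers, reset_with_memset(), reset_function_only() and reclaim_cb() are made-up names for illustration only. mock_setup_timer() only mirrors the general idea of setup_timer(): set the callback and its argument, then initialise the timer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct timer_list; NOT the kernel layout. */
struct mock_timer {
	struct mock_timer *next, *prev;   /* linkage owned by the timer core */
	unsigned long expires;
	void (*function)(unsigned long);  /* driver-supplied callback */
	unsigned long data;               /* driver-supplied argument */
	int core_state;                   /* stands in for core/debug metadata */
};

#define MOCK_TIMER_INITIALIZED 0x5a

/* Assumed analogue of init_timer(): sets up the core-owned fields. */
static void mock_init_timer(struct mock_timer *t)
{
	t->next = t->prev = NULL;
	t->core_state = MOCK_TIMER_INITIALIZED;
}

/* Assumed analogue of setup_timer(): callback, argument, then init. */
static void mock_setup_timer(struct mock_timer *t,
			     void (*fn)(unsigned long), unsigned long data)
{
	t->function = fn;
	t->data = data;
	mock_init_timer(t);
}

/* The pattern being questioned: wipes core-owned fields as well. */
static void reset_with_memset(struct mock_timer *t)
{
	memset(t, 0, sizeof(*t));
}

/* The suggested alternative: touch only the driver-owned callback. */
static void reset_function_only(struct mock_timer *t)
{
	t->function = NULL;
}

/* Placeholder callback; a real one would reclaim TX descriptors. */
static void reclaim_cb(unsigned long data)
{
	(void)data;
}

/* Walks through both reset styles and checks what survives each one. */
void demo(void)
{
	struct mock_timer t;

	mock_setup_timer(&t, reclaim_cb, 0UL);
	reset_function_only(&t);
	assert(t.function == NULL);
	assert(t.core_state == MOCK_TIMER_INITIALIZED); /* metadata intact */

	reset_with_memset(&t);
	assert(t.core_state != MOCK_TIMER_INITIALIZED); /* metadata wiped */

	mock_init_timer(&t);                            /* re-init repairs it */
	assert(t.core_state == MOCK_TIMER_INITIALIZED);
}
```

In this mock, clearing only the callback leaves the core-owned state alone, while the memset() forces a re-initialisation before the timer can be treated as valid again. Whether the real timer core or debugobjects is actually upset by the memset() is left open, as in the message above.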
Andrew Morton wrote:
> On Thu, 25 Sep 2008 17:05:28 -0700
> Divy Le Ray <[email protected]> wrote:
>
>
>> A SGE queue set timer might access registers while in EEH recovery,
>> triggering an EEH error loop. Stop all timers early in EEH process.
>>
>
> <looks>
>
> It's deeply weird that t3_reset_qset() does
>
> memset(&q->tx_reclaim_timer, 0, sizeof(q->tx_reclaim_timer));
>
> There are lots of things in the timer_list which the driver has no
> business modifying. For example, this might break the metadata in
> Thomas's debugobjects stuff, which attempts to catch things being done
> in the wrong order (I don't think it will, but still...).
>
> Rerunning init_timer() should repair the damage, but I suspect a simple
>
> q->tx_reclaim_timer.function = NULL; /* explanation goes here */
>
> would suffice here.
>
>
> t3_sge_alloc_qset() could use the newer setup_timer().
>
>
Hi Andrew,
Your suggestion is implemented in the first patch of the series I just
posted.
I apologize for the delayed reply.
Cheers,
Divy