As suggested by Frederic Weisbecker.
Patch against tip/core/softlockup.
---
To avoid holding the tasklist lock too long, hung_task_check_count was used
as an upper bound on the number of tasks that are checked by hung_task.
This can be problematic if hung_task_check_count is set much lower than
the number of tasks in the system. A large number of tasks will not get
checked. This patch removes the hung_task_check_count sysctl.
Instead of checking a limited number of tasks, all tasks are checked. To
avoid holding the tasklist lock too long, the lock is released and the
processor rescheduled (if necessary) every n tasks (currently 1024).
The design was proposed by Frédéric Weisbecker.
Frédéric Weisbecker ([email protected]) wrote:
>
> Instead of having this arbitrary limit of tasks, why not just
> lurk the need_resched() and then schedule if it needs to.
>
> I know that sounds a bit racy, because you will have to release the
> tasklist_lock and
> a lot of things can happen in the task list until you become resched.
> But you can do a get_task_struct() on g and t before your thread is
> going to sleep and then put them
> when it is awaken.
> Perhaps some tasks will disappear or be appended in the list before g
> and t, but that doesn't really matter:
> if they disappear, they didn't lockup, and if they were appended, they
> are not enough cold to be analyzed :-)
>
> This way you can drop the arbitrary limit of task number given by the user....
>
> Frederic.
>
Signed-off-by: Mandeep Singh Baines <[email protected]>
---
include/linux/sched.h | 1 -
kernel/hung_task.c | 25 +++++++++++++++++++++----
kernel/sysctl.c | 9 ---------
3 files changed, 21 insertions(+), 14 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f2f94d5..278121c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,7 +315,6 @@ static inline void touch_all_softlockup_watchdogs(void)
#ifdef CONFIG_DETECT_HUNG_TASK
extern unsigned int sysctl_hung_task_panic;
-extern unsigned long sysctl_hung_task_check_count;
extern unsigned long sysctl_hung_task_timeout_secs;
extern unsigned long sysctl_hung_task_warnings;
extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index ba8ccd4..f9b18e2 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -19,7 +19,7 @@
/*
* Have a reasonable limit on the number of tasks checked:
*/
-unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
+#define HUNG_TASK_CHECK_COUNT 1024
/*
* Zero means infinite timeout - no checking done:
@@ -116,7 +116,7 @@ static void check_hung_task(struct task_struct *t, unsigned long now,
*/
static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
- int max_count = sysctl_hung_task_check_count;
+ int max_count = HUNG_TASK_CHECK_COUNT;
unsigned long now = get_timestamp();
struct task_struct *g, *t;
@@ -129,8 +129,25 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
read_lock(&tasklist_lock);
do_each_thread(g, t) {
- if (!--max_count)
- goto unlock;
+ if (!--max_count) {
+ /*
+ * Drop the lock every once in a while and resched if
+ * necessary. Don't want to hold the lock too long.
+ */
+ get_task_struct(t);
+ read_unlock(&tasklist_lock);
+ max_count = HUNG_TASK_CHECK_COUNT;
+ if (need_resched())
+ schedule();
+ read_lock(&tasklist_lock);
+ put_task_struct(t);
+ /*
+ * t was unlinked from tasklist. Can't continue in this
+ * case. Exit and try again next time.
+ */
+ if (t->state == TASK_DEAD)
+ goto unlock;
+ }
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
if (t->state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, now, timeout);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2481ed3..16526a2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -820,15 +820,6 @@ static struct ctl_table kern_table[] = {
},
{
.ctl_name = CTL_UNNUMBERED,
- .procname = "hung_task_check_count",
- .data = &sysctl_hung_task_check_count,
- .maxlen = sizeof(unsigned long),
- .mode = 0644,
- .proc_handler = &proc_doulongvec_minmax,
- .strategy = &sysctl_intvec,
- },
- {
- .ctl_name = CTL_UNNUMBERED,
.procname = "hung_task_timeout_secs",
.data = &sysctl_hung_task_timeout_secs,
.maxlen = sizeof(unsigned long),
--
1.5.4.5
* Mandeep Singh Baines <[email protected]> wrote:
> read_lock(&tasklist_lock);
> do_each_thread(g, t) {
> - if (!--max_count)
> - goto unlock;
> + if (!--max_count) {
> + /*
> + * Drop the lock every once in a while and resched if
> + * necessary. Don't want to hold the lock too long.
> + */
> + get_task_struct(t);
> + read_unlock(&tasklist_lock);
> + max_count = HUNG_TASK_CHECK_COUNT;
> + if (need_resched())
> + schedule();
> + read_lock(&tasklist_lock);
> + put_task_struct(t);
> + /*
> + * t was unlinked from tasklist. Can't continue in this
> + * case. Exit and try again next time.
> + */
> + if (t->state == TASK_DEAD)
> + goto unlock;
> + }
firstly, this bit should move into a helper function. Also, why don't you
do the need_resched() check first (it's very lightweight) - and thus only
do the heavy ops (get_task_struct & tasklist_lock unlock) if that is set?
But most importantly, isn't the logic above confused? --max_count will
reach zero exactly once, and then we'll loop for a long time.
Ingo
2009/1/21 Ingo Molnar <[email protected]>:
>
> * Mandeep Singh Baines <[email protected]> wrote:
>
>> read_lock(&tasklist_lock);
>> do_each_thread(g, t) {
>> - if (!--max_count)
>> - goto unlock;
>> + if (!--max_count) {
>> + /*
>> + * Drop the lock every once in a while and resched if
>> + * necessary. Don't want to hold the lock too long.
>> + */
>> + get_task_struct(t);
>> + read_unlock(&tasklist_lock);
>> + max_count = HUNG_TASK_CHECK_COUNT;
>> + if (need_resched())
>> + schedule();
>> + read_lock(&tasklist_lock);
>> + put_task_struct(t);
>> + /*
>> + * t was unlinked from tasklist. Can't continue in this
>> + * case. Exit and try again next time.
>> + */
>> + if (t->state == TASK_DEAD)
>> + goto unlock;
>> + }
>
> firstly, this bit should move into a helper function. Also, why dont you
> do the need_resched() check first (it's very lighweight) - and thus only
> do the heavy ops (get-task-struct & tasklist_lock unlock) if that is set?
>
> But most importantly, isnt the logic above confused? --max_count will
> reach zero exactly once, and then we'll loop for a long time.
>
> Ingo
>
Thanks Mandeep for this patch.
Reading Ingo's review, I notice that the checks done by
check_hung_task() are quite lightweight and therefore quick.
I guess your loop is able to go through a good number of tasks before
actually needing to reschedule.
Perhaps max_count can actually be dropped, since it is an
arbitrarily chosen constant.
So you would just need to check need_resched() at each iteration.
i.e. (should be split into helper functions):
bool retry;
do {
	retry = false;	/* reset so we stop once a full pass completes */
	read_lock(&tasklist_lock);
	do_each_thread(g, t) {
		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
		if (t->state == TASK_UNINTERRUPTIBLE)
			check_hung_task(t, now, timeout);
		if (need_resched()) {
			get_task_struct(t);
			read_unlock(&tasklist_lock);
			schedule();
			read_lock(&tasklist_lock);
			put_task_struct(t);
			/* t went away while we slept: restart the walk */
			if (t->state == TASK_DEAD) {
				retry = true;
				goto unlock;
			}
		}
	} while_each_thread(g, t);
unlock:
	read_unlock(&tasklist_lock);
} while (retry);
There is a good chance that your "t" task hasn't died, but if it has, you
can just retry the whole thing.
Perhaps that would require some tests to see how often the
t->state == TASK_DEAD path is actually taken,
but of course, it depends on what the system is doing most.
What do you think?
Ingo Molnar ([email protected]) wrote:
>
> firstly, this bit should move into a helper function. Also, why dont you
Done.
> do the need_resched() check first (it's very lighweight) - and thus only
> do the heavy ops (get-task-struct & tasklist_lock unlock) if that is set?
>
Wanted to upper-bound the amount of time the lock is held. In order to
give others a chance to write_lock the tasklist, released the lock
regardless of whether a reschedule is needed.
> But most importantly, isnt the logic above confused? --max_count will
> reach zero exactly once, and then we'll loop for a long time.
max_count was being reset. I just put it in a strange place. To make
the critical section a little smaller, max_count was being reset after
the read_lock was released. This was probably over-kill. Moved this to
right after the max_count check.
Frédéric Weisbecker ([email protected]) wrote:
>
> Thanks Mandeep for this patch.
>
np. Thanks for proposing it.
> By reading Ingo's review, I notice that the checks done by
> check_hung_task() are quite lightweight and then quick.
> I guess your loop is able to go through a good number of tasks before
> actually needing to be resched.
> Perhaps the max_count can be dropped actually, since it is a kind of
> arbitrary chosen constant.
> So you would just need to check need_resched() at each iteration.
>
Would like to upper-bound the amount of time the read_lock is held. So the
lock is released every max_count iterations. But yeah, it is set to
an arbitrary value. Probably needs some tuning. It's a trade-off between
overhead and latency. The larger max_count, the more you amortize the
locking overhead, but the longer others have to wait to write_lock the tasklist.
>
> There are good chances that your "t" task hasn't died, but if so, you
> can just retry the whole thing.
> Perhaps that should require some tests to see if there is a good
> average of t->stat == TASK_DEAD path not taken,
> but of course, it depends on what the system is doing most.
>
> What do you think?
Wanted to avoid doing an unbounded amount of work in check_hung_*_tasks.
So erring on the side of caution, decided to exit the loop as opposed
to retrying immediately. t->state == TASK_DEAD should be a rare event but
if it does happen the worst case is a delay in reporting hung tasks.
---
To avoid holding the tasklist lock too long, hung_task_check_count was used
as an upper bound on the number of tasks that are checked by hung_task.
This patch removes the hung_task_check_count sysctl.
Instead of checking a limited number of tasks, all tasks are checked. To
avoid holding the tasklist lock too long, the lock is released periodically
and the processor rescheduled if necessary.
The design was proposed by Frédéric Weisbecker.
Frédéric Weisbecker ([email protected]) wrote:
>
> Instead of having this arbitrary limit of tasks, why not just
> lurk the need_resched() and then schedule if it needs to.
>
> I know that sounds a bit racy, because you will have to release the
> tasklist_lock and
> a lot of things can happen in the task list until you become resched.
> But you can do a get_task_struct() on g and t before your thread is
> going to sleep and then put them
> when it is awaken.
> Perhaps some tasks will disappear or be appended in the list before g
> and t, but that doesn't really matter:
> if they disappear, they didn't lockup, and if they were appended, they
> are not enough cold to be analyzed :-)
>
> This way you can drop the arbitrary limit of task number given by the user....
>
> Frederic.
>
Signed-off-by: Mandeep Singh Baines <[email protected]>
---
include/linux/sched.h | 1 -
kernel/hung_task.c | 31 +++++++++++++++++++++++++++----
kernel/sysctl.c | 9 ---------
3 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f2f94d5..278121c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,7 +315,6 @@ static inline void touch_all_softlockup_watchdogs(void)
#ifdef CONFIG_DETECT_HUNG_TASK
extern unsigned int sysctl_hung_task_panic;
-extern unsigned long sysctl_hung_task_check_count;
extern unsigned long sysctl_hung_task_timeout_secs;
extern unsigned long sysctl_hung_task_warnings;
extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index ba8ccd4..2880574 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -19,7 +19,7 @@
/*
* Have a reasonable limit on the number of tasks checked:
*/
-unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
+#define HUNG_TASK_CHECK_COUNT 1024
/*
* Zero means infinite timeout - no checking done:
@@ -110,13 +110,28 @@ static void check_hung_task(struct task_struct *t, unsigned long now,
}
/*
+ * To avoid holding the tasklist read_lock for an unbounded amount of time,
+ * periodically release the lock and reschedule if necessary. This gives other
+ * tasks a chance to run and a chance to grab the write_lock.
+ */
+static void check_hung_reschedule(struct task_struct *t)
+{
+ get_task_struct(t);
+ read_unlock(&tasklist_lock);
+ if (need_resched())
+ schedule();
+ read_lock(&tasklist_lock);
+ put_task_struct(t);
+}
+
+/*
* Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
* a really long time (120 seconds). If that happens, print out
* a warning.
*/
static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
- int max_count = sysctl_hung_task_check_count;
+ int max_count = HUNG_TASK_CHECK_COUNT;
unsigned long now = get_timestamp();
struct task_struct *g, *t;
@@ -129,8 +144,16 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
read_lock(&tasklist_lock);
do_each_thread(g, t) {
- if (!--max_count)
- goto unlock;
+ if (!--max_count) {
+ max_count = HUNG_TASK_CHECK_COUNT;
+ check_hung_reschedule(t);
+ /*
+ * Exit loop if t was unlinked during reschedule.
+ * Try again later.
+ */
+ if (t->state == TASK_DEAD)
+ goto unlock;
+ }
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
if (t->state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, now, timeout);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2481ed3..16526a2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -820,15 +820,6 @@ static struct ctl_table kern_table[] = {
},
{
.ctl_name = CTL_UNNUMBERED,
- .procname = "hung_task_check_count",
- .data = &sysctl_hung_task_check_count,
- .maxlen = sizeof(unsigned long),
- .mode = 0644,
- .proc_handler = &proc_doulongvec_minmax,
- .strategy = &sysctl_intvec,
- },
- {
- .ctl_name = CTL_UNNUMBERED,
.procname = "hung_task_timeout_secs",
.data = &sysctl_hung_task_timeout_secs,
.maxlen = sizeof(unsigned long),
--
1.5.4.5
* Mandeep Singh Baines <[email protected]> wrote:
> > do the need_resched() check first (it's very lighweight) - and thus
> > only do the heavy ops (get-task-struct & tasklist_lock unlock) if that
> > is set?
>
> Wanted to upper-bound the amount of time the lock is held. In order to
> give others a chance to write_lock the tasklist, released the lock
> regardless of whether a re-schedule is need.
but this:
> +static void check_hung_reschedule(struct task_struct *t)
> +{
> + get_task_struct(t);
> + read_unlock(&tasklist_lock);
> + if (need_resched())
> + schedule();
> + read_lock(&tasklist_lock);
> + put_task_struct(t);
> +}
does not actually achieve that. Releasing a lock does not mean that other
CPUs will immediately be able to get it - if the ex-owner quickly
re-acquires it then it will often succeed in doing so. Perhaps adding a
cpu_relax() would increase the chance ... but still, it looks a bit weird.
Ingo
Ingo Molnar ([email protected]) wrote:
>
> * Mandeep Singh Baines <[email protected]> wrote:
> >
> > Wanted to upper-bound the amount of time the lock is held. In order to
> > give others a chance to write_lock the tasklist, released the lock
> > regardless of whether a re-schedule is need.
>
> but this:
>
> > +static void check_hung_reschedule(struct task_struct *t)
> > +{
> > + get_task_struct(t);
> > + read_unlock(&tasklist_lock);
> > + if (need_resched())
> > + schedule();
> > + read_lock(&tasklist_lock);
> > + put_task_struct(t);
> > +}
>
> does not actually achieve that. Releasing a lock does not mean that other
> CPUs will immediately be able to get it - if the ex-owner quickly
> re-acquires it then it will often succeed in doing so. Perhaps adding a
On x86, spinlocks are now FIFO. But yeah, shouldn't be relying on
an architecture-dependent implementation of spinlocks.
> cpu_relax() would increase the chance ... but still, it looks a bit weird.
Done.
The unlock and lock could be removed and only compiled in if PREEMPT.
If the number of tasks isn't bound, the lock might be held too long.
It would be kinda funny if hung_task caused a softlockup.
---
To avoid holding the tasklist lock too long, hung_task_check_count was used
as an upper bound on the number of tasks that are checked by hung_task.
This patch removes the hung_task_check_count sysctl.
Instead of checking a limited number of tasks, all tasks are checked. To
avoid holding the tasklist lock too long, the lock is released periodically
and the processor rescheduled if necessary.
The design was proposed by Frédéric Weisbecker.
Frédéric Weisbecker ([email protected]) wrote:
>
> Instead of having this arbitrary limit of tasks, why not just
> lurk the need_resched() and then schedule if it needs to.
>
> I know that sounds a bit racy, because you will have to release the
> tasklist_lock and
> a lot of things can happen in the task list until you become resched.
> But you can do a get_task_struct() on g and t before your thread is
> going to sleep and then put them
> when it is awaken.
> Perhaps some tasks will disappear or be appended in the list before g
> and t, but that doesn't really matter:
> if they disappear, they didn't lockup, and if they were appended, they
> are not enough cold to be analyzed :-)
>
> This way you can drop the arbitrary limit of task number given by the user....
>
> Frederic.
>
Signed-off-by: Mandeep Singh Baines <[email protected]>
---
include/linux/sched.h | 1 -
kernel/hung_task.c | 33 +++++++++++++++++++++++++++++----
kernel/sysctl.c | 9 ---------
3 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f2f94d5..278121c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,7 +315,6 @@ static inline void touch_all_softlockup_watchdogs(void)
#ifdef CONFIG_DETECT_HUNG_TASK
extern unsigned int sysctl_hung_task_panic;
-extern unsigned long sysctl_hung_task_check_count;
extern unsigned long sysctl_hung_task_timeout_secs;
extern unsigned long sysctl_hung_task_warnings;
extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index ba8ccd4..5669283 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -19,7 +19,7 @@
/*
* Have a reasonable limit on the number of tasks checked:
*/
-unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
+#define HUNG_TASK_CHECK_COUNT 1024
/*
* Zero means infinite timeout - no checking done:
@@ -110,13 +110,30 @@ static void check_hung_task(struct task_struct *t, unsigned long now,
}
/*
+ * To avoid holding the tasklist read_lock for an unbounded amount of time,
+ * periodically release the lock and reschedule if necessary. This gives other
+ * tasks a chance to run and a chance to grab the write_lock.
+ */
+static void check_hung_reschedule(struct task_struct *t)
+{
+ get_task_struct(t);
+ read_unlock(&tasklist_lock);
+ if (need_resched())
+ schedule();
+ else
+ cpu_relax();
+ read_lock(&tasklist_lock);
+ put_task_struct(t);
+}
+
+/*
* Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
* a really long time (120 seconds). If that happens, print out
* a warning.
*/
static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
- int max_count = sysctl_hung_task_check_count;
+ int max_count = HUNG_TASK_CHECK_COUNT;
unsigned long now = get_timestamp();
struct task_struct *g, *t;
@@ -129,8 +146,16 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
read_lock(&tasklist_lock);
do_each_thread(g, t) {
- if (!--max_count)
- goto unlock;
+ if (!--max_count) {
+ max_count = HUNG_TASK_CHECK_COUNT;
+ check_hung_reschedule(t);
+ /*
+ * Exit loop if t was unlinked during reschedule.
+ * Try again later.
+ */
+ if (t->state == TASK_DEAD)
+ goto unlock;
+ }
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
if (t->state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, now, timeout);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2481ed3..16526a2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -820,15 +820,6 @@ static struct ctl_table kern_table[] = {
},
{
.ctl_name = CTL_UNNUMBERED,
- .procname = "hung_task_check_count",
- .data = &sysctl_hung_task_check_count,
- .maxlen = sizeof(unsigned long),
- .mode = 0644,
- .proc_handler = &proc_doulongvec_minmax,
- .strategy = &sysctl_intvec,
- },
- {
- .ctl_name = CTL_UNNUMBERED,
.procname = "hung_task_timeout_secs",
.data = &sysctl_hung_task_timeout_secs,
.maxlen = sizeof(unsigned long),
--
1.5.4.5
On Thu, Jan 22, 2009 at 11:55 AM, Mandeep Singh Baines <[email protected]> wrote:
>
> The unlock and lock could be removed and only compiled in if PREEMPT.
> If the number of tasks isn't bound, the lock might be held too long.
>
This is incorrect. Adding the lock and unlock will not make the
system more pre-emptive. To be more pre-emptive you'd want to check
need_resched() as often as possible.
> It would be kinda funny if hung_task caused a softlockup.
>
Again. This is incorrect. Rescheduling if need_resched() will prevent
softlockup.
Not sure what I was thinking this morning;)
However, I am happy with the patch. To give writers a chance, the lock
should be held for a bounded time. Holding the lock in khungtask (which is
running at low scheduler priority) could potentially be delaying
important work. The longer the lock is held, the bigger the priority
inversion problem.
* Mandeep Baines <[email protected]> wrote:
> On Thu, Jan 22, 2009 at 11:55 AM, Mandeep Singh Baines <[email protected]> wrote:
> >
> > The unlock and lock could be removed and only compiled in if PREEMPT.
> > If the number of tasks isn't bound, the lock might be held too long.
> >
>
> This is incorrect. The adding the lock and unlock will not make the
> system more pre-emptive. To be more pre-emptive you'd want to check
> need_resched() as often as possible.
>
> > It would be kinda funny if hung_task caused a softlockup.
> >
>
> Again. This is incorrect. Rescheduling if need_resched() will prevent
> softlockup.
>
> Not sure what I was thinking this morning;)
>
> However, I am happy with the patch. To give writers a chance, the lock
> should held for bounded time. Holding the lock in khungtask (which is
> running at low scheduler priority) could potentially be delaying
> important work. The longer the lock is held, the bigger the priority
> inversion problem.
not sure i like the whole idea of removing the max iterations check. In
theory if there's a _ton_ of tasks, we could spend a lot of time looping
there. So it always looked prudent to limit it somewhat.
Ingo
2009/1/23 Ingo Molnar <[email protected]>:
>
> * Mandeep Baines <[email protected]> wrote:
>
>> On Thu, Jan 22, 2009 at 11:55 AM, Mandeep Singh Baines <[email protected]> wrote:
>> >
>> > The unlock and lock could be removed and only compiled in if PREEMPT.
>> > If the number of tasks isn't bound, the lock might be held too long.
>> >
>>
>> This is incorrect. The adding the lock and unlock will not make the
>> system more pre-emptive. To be more pre-emptive you'd want to check
>> need_resched() as often as possible.
>>
>> > It would be kinda funny if hung_task caused a softlockup.
>> >
>>
>> Again. This is incorrect. Rescheduling if need_resched() will prevent
>> softlockup.
>>
>> Not sure what I was thinking this morning;)
>>
>> However, I am happy with the patch. To give writers a chance, the lock
>> should held for bounded time. Holding the lock in khungtask (which is
>> running at low scheduler priority) could potentially be delaying
>> important work. The longer the lock is held, the bigger the priority
>> inversion problem.
>
> not sure i like the whole idea of removing the max iterations check. In
> theory if there's a _ton_ of tasks, we could spend a lot of time looping
> there. So it always looked prudent to limit it somewhat.
>
> Ingo
>
Which means we can lose several of them. Would it hurt to iterate as
far as possible along the task list,
taking some care about writer starvation and latency?
BTW I thought about the slow work framework, but I can't retrieve
it.... But this thread already runs at a low priority.
Would it be interesting to provide a way for rwlocks to know if there
is a writer waiting for the lock?
Frédéric Weisbecker ([email protected]) wrote:
> 2009/1/23 Ingo Molnar <[email protected]>:
> >
> > not sure i like the whole idea of removing the max iterations check. In
> > theory if there's a _ton_ of tasks, we could spend a lot of time looping
> > there. So it always looked prudent to limit it somewhat.
> >
>
> Which means we can loose several of them. Would it hurt to iterate as
> much as possible along the task list,
> keeping some care about writers starvation and latency?
> BTW I thought about the slow work framework, but I can't retrieve
> it.... But this thread has already a slow priority.
>
> Would it be interesting to provide a way for rwlocks to know if there
> is writer waiting for the lock?
Would be cool if that API existed. You could release the CPU and/or lock as
soon as either was contended for. You'd have the benefits of fine-grained
locking without the overhead of locking and unlocking multiple times.
Currently, there is no bit that can tell you there is a writer waiting. You'd
probably need to change the write_lock() implementation at a minimum. Maybe
if the first writer left the RW_LOCK_BIAS bit clear and then waited for the
readers to leave instead of re-trying? That would actually make write_lock()
more efficient for the 1-writer case since you'd only need to spin doing
a read in the failure case instead of an atomic_dec and atomic_inc.
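Purely as an illustration of the first point (bail out as soon as either the CPU
or the lock is contended), and assuming a hypothetical writers_pending() helper
that does not exist today, the check loop could end up looking roughly like this:

	read_lock(&tasklist_lock);
	do_each_thread(g, t) {
		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
		if (t->state == TASK_UNINTERRUPTIBLE)
			check_hung_task(t, now, timeout);
		/* drop the lock only when someone actually wants the CPU or the lock */
		if (need_resched() || writers_pending(&tasklist_lock)) {
			check_hung_reschedule(t);
			/* t was unlinked while we slept; try again next round */
			if (t->state == TASK_DEAD)
				goto unlock;
		}
	} while_each_thread(g, t);
unlock:
	read_unlock(&tasklist_lock);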
Ingo Molnar ([email protected]) wrote:
>
> not sure i like the whole idea of removing the max iterations check. In
> theory if there's a _ton_ of tasks, we could spend a lot of time looping
> there. So it always looked prudent to limit it somewhat.
>
We could go back to exporting max iterations to proc, and set the nice
value higher.
Or:
Instead of searching the tasklist from the beginning every time, continue
where you left off. On loaded systems, it will take a while to search the
entire list, but at least all tasks will be checked.
Something like this:
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index ba8ccd4..d220796 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -109,6 +109,15 @@ static void check_hung_task(struct task_struct *t, unsigned long now,
panic("hung_task: blocked tasks");
}
+static void wait_till_next_iteration(struct task_struct *t)
+{
+ get_task_struct(t);
+ read_unlock(&tasklist_lock);
+ schedule_timeout_interruptible(hung_task_poll_jiffies);
+ read_lock(&tasklist_lock);
+ put_task_struct(t);
+}
+
/*
* Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
* a really long time (120 seconds). If that happens, print out
@@ -129,8 +138,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
read_lock(&tasklist_lock);
do_each_thread(g, t) {
- if (!--max_count)
- goto unlock;
+ if (!--max_count) {
+ max_count = HUNG_TASK_CHECK_COUNT;
+ wait_till_next_iteration(t);
+ timeout = sysctl_hung_task_timeout_secs;
+ /* Exit loop if t was unlinked or timeout set to 0. */
+ if (!timeout || t->state == TASK_DEAD)
+ goto unlock;
+ }
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
if (t->state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, now, timeout);
On Fri, Jan 23, 2009 at 05:55:14PM -0800, Mandeep Singh Baines wrote:
> Frédéric Weisbecker ([email protected]) wrote:
> > 2009/1/23 Ingo Molnar <[email protected]>:
> > >
> > > not sure i like the whole idea of removing the max iterations check. In
> > > theory if there's a _ton_ of tasks, we could spend a lot of time looping
> > > there. So it always looked prudent to limit it somewhat.
> > >
> >
> > Which means we can loose several of them. Would it hurt to iterate as
> > much as possible along the task list,
> > keeping some care about writers starvation and latency?
> > BTW I thought about the slow work framework, but I can't retrieve
> > it.... But this thread has already a slow priority.
> >
> > Would it be interesting to provide a way for rwlocks to know if there
> > is writer waiting for the lock?
>
> Would be cool if that API existed. You could release the CPU and/or lock as
> soon as either was contended for. You'd have the benefits of fine-grained
> locking without the overhead of locking and unlocking multiple time.
>
> Currently, there is no bit that can tell you there is a writer waiting. You'd
> probably need to change the write_lock() implementation at a minimum. Maybe
> if the first writer left the RW_LOCK_BIAS bit clear and then waited for the
> readers to leave instead of re-trying? That would actually make write_lock()
> more efficient for the 1-writer case since you'd only need to spin doing
> a read in the failure case instead of an atomic_dec and atomic_inc.
>
This is already what is done in the slow path (in x86):
/* rdi: pointer to rwlock_t */
ENTRY(__write_lock_failed)
	CFI_STARTPROC
	LOCK_PREFIX
	addl	$RW_LOCK_BIAS,(%rdi)
1:	rep
	nop
	cmpl	$RW_LOCK_BIAS,(%rdi)
	jne	1b
	LOCK_PREFIX
	subl	$RW_LOCK_BIAS,(%rdi)
	jnz	__write_lock_failed
	ret
	CFI_ENDPROC
END(__write_lock_failed)
It spins watching the lock value, and only when no writers or readers
own the lock does it retry its atomic_sub (and then the atomic_add in the failure case).
And if an implementation of writers_waiting_for_lock() is needed, I guess this
is the perfect place: one atomic_add on a "waiters_count" on entry and an atomic_sub
on it on exit.
Since this is the slow path, I guess that wouldn't really impact performance....
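In rough C terms, the idea might look something like the sketch below (illustrative
only: the real slow path is the asm above, and write_lock_slowpath(),
tasklist_writers_waiting and writers_waiting_for_lock() are invented names):

	/* One extra counter alongside the lock (here just for tasklist_lock). */
	static atomic_t tasklist_writers_waiting = ATOMIC_INIT(0);

	/* Conceptually what the write-lock slow path would do around its spin. */
	static void write_lock_slowpath(rwlock_t *lock)
	{
		atomic_inc(&tasklist_writers_waiting);	/* entry: one more waiter  */
		write_lock(lock);			/* existing slow-path spin */
		atomic_dec(&tasklist_writers_waiting);	/* exit: we own the lock   */
	}

	static int writers_waiting_for_lock(void)
	{
		return atomic_read(&tasklist_writers_waiting) != 0;
	}

The hung_task read-side loop could then drop the tasklist_lock as soon as
writers_waiting_for_lock() returns non-zero.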
Hi Frédéric,
Yeah, that might work. I think you can even embed waiters_count inside
the lock. I think this can be
done with minimal changes and with no overhead added to the lock
functions. Maybe use bits 7-0 for waiters_count and move the readers
count up 8 bits.
To make room for writers_count I'd change read_lock from:
static inline void __raw_read_lock(raw_rwlock_t *rw)
{
	asm volatile(LOCK_PREFIX " subl $1,(%0)\n\t"
		     "jns 1f\n"
		     "call __read_lock_failed\n\t"
		     "1:\n"
		     ::LOCK_PTR_REG (rw) : "memory");
}
To:
static inline void __raw_read_lock(raw_rwlock_t *rw)
{
	asm volatile(LOCK_PREFIX " subl $0x100,(%0)\n\t"
		     "jns 1f\n"
		     "call __read_lock_failed\n\t"
		     "1:\n"
		     ::LOCK_PTR_REG (rw) : "memory");
}
I think that's all you'd need to change for read_lock. For write_lock
you'd have to subtract (RW_LOCK_BIAS - 1) instead of just RW_LOCK_BIAS
during lock and add it back for unlock. You'd also have to change the
tests a little to ignore bits 7-0.
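Spelled out as a sketch (the macro and helper names below are invented; only the
bit layout comes from the description above):

	/*
	 * Proposed lock-word layout (sketch only):
	 *   bits  7-0  - number of writers waiting for the lock
	 *   bits 31-8  - the existing reader/writer count, i.e. readers are
	 *                counted in units of 0x100 instead of 1
	 */
	#define RW_WAITERS_MASK		0x000000ff
	#define RW_READER_UNIT		0x00000100

	/* Given the raw 32-bit lock word, pending writers are just the low byte. */
	static inline int writers_waiting(unsigned int lockval)
	{
		return lockval & RW_WAITERS_MASK;
	}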
Regards,
Mandeep
On Sat, Jan 24, 2009 at 7:52 AM, Frederic Weisbecker <[email protected]> wrote:
> On Fri, Jan 23, 2009 at 05:55:14PM -0800, Mandeep Singh Baines wrote:
>> Frédéric Weisbecker ([email protected]) wrote:
>> > 2009/1/23 Ingo Molnar <[email protected]>:
>> > >
>> > > not sure i like the whole idea of removing the max iterations check. In
>> > > theory if there's a _ton_ of tasks, we could spend a lot of time looping
>> > > there. So it always looked prudent to limit it somewhat.
>> > >
>> >
>> > Which means we can loose several of them. Would it hurt to iterate as
>> > much as possible along the task list,
>> > keeping some care about writers starvation and latency?
>> > BTW I thought about the slow work framework, but I can't retrieve
>> > it.... But this thread has already a slow priority.
>> >
>> > Would it be interesting to provide a way for rwlocks to know if there
>> > is writer waiting for the lock?
>>
>> Would be cool if that API existed. You could release the CPU and/or lock as
>> soon as either was contended for. You'd have the benefits of fine-grained
>> locking without the overhead of locking and unlocking multiple time.
>>
>> Currently, there is no bit that can tell you there is a writer waiting. You'd
>> probably need to change the write_lock() implementation at a minimum. Maybe
>> if the first writer left the RW_LOCK_BIAS bit clear and then waited for the
>> readers to leave instead of re-trying? That would actually make write_lock()
>> more efficient for the 1-writer case since you'd only need to spin doing
>> a read in the failure case instead of an atomic_dec and atomic_inc.
>>
>
>
> This is already what is done in the slow path (in x86):
>
> /* rdi: pointer to rwlock_t */
> ENTRY(__write_lock_failed)
> CFI_STARTPROC
> LOCK_PREFIX
> addl $RW_LOCK_BIAS,(%rdi)
> 1: rep
> nop
> cmpl $RW_LOCK_BIAS,(%rdi)
> jne 1b
> LOCK_PREFIX
> subl $RW_LOCK_BIAS,(%rdi)
> jnz __write_lock_failed
> ret
> CFI_ENDPROC
> END(__write_lock_failed)
>
> It spins lurking at the lock value and only if there are no writers nor readers that
> own the lock, it restarts its atomic_sub (and then atomic_add in fail case).
>
> And if an implementation of writers_waiting_for_lock() is needed, I guess this
> is the perfect place. One atomic_add on a "waiters_count" on entry and an atomic_sub
> on it on exit.
>
> Since this is the slow_path, I guess that wouldn't really impact the performances....
>
>