2007-09-02 02:43:59

by Matthew Wilcox

Subject: [PATCH] TASK_KILLABLE version 2


Here's the second version of TASK_KILLABLE. A few changes since version 1:

- Don't split up TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE.
TASK_WAKESIGNAL and TASK_LOADAVG were pretty much equivalent, and since
we had to keep __TASK_{UN,}INTERRUPTIBLE anyway, splitting them made
little sense.
- Instead, I've added some is_task_*() predicates (see the snippet just
after this list). I think they achieve what Linus wanted without
consuming flag bits and without checking a metric tonne of code to see
whether it wanted TASK_UNINTERRUPTIBLE or __TASK_UNINTERRUPTIBLE.
- Renamed try_lock_page() to lock_page_killable() and changed the sense.
I had envisioned uses being like spin_trylock(), but that turned out
not to make sense.
- Fix the bug Trond noticed in wait_on_retry_sync_kiocb() by having the
callers handle its -EINTR return.
- Fix the bug I noticed in sync_page_killable() by adding the new
function fatal_signal_pending().
- Audit a lot of uses of TASK_*. A few seemed dubious and were fixed.
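
The predicates themselves are one-line bitmask tests; quoting patch 2/5
below:

        #define is_task_stopped_or_traced(task) \
                ((task->state & (TASK_STOPPED | TASK_TRACED)) != 0)

The point is that when patch 3/5 rearranges the flag definitions, only
the predicate has to change (to test __TASK_STOPPED | __TASK_TRACED);
the call sites stay untouched.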

I'd like to see this spend a bit of time in the -mm tree. I've wasted
a couple of days trying to track down why my machine no longer does
much after starting init, so clearly I'm tampering with stuff I don't
quite understand.

Having said that, I ditched that version of the code and started again
with a much more minimalist approach. I've also split the patch into
five independent pieces, each of which accomplishes a logical step,
for ease of bisection.

I think patch 1/5 is clearly Right and should be merged ASAP. Patches
2/5 and 3/5 carry some risk. Patches 4/5 and 5/5 seem less risky to me
(and are independent of each other; both rely on 1-3 being applied
first).

I obviously haven't covered every place that can result in a process
sleeping uninterruptibly while attempting an operation. But sync_page
(patch 4/5) covers about 90% of the times I've attempted to kill cat,
and I hope that by providing the two examples, I can help other people
to fix the cases that they find interesting.
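
For reference, the shape of a conversion is small. Here's a minimal
sketch, distilled from patches 4/5 and 5/5 (not a drop-in for any
particular call site; `done' stands for whatever condition the wakeup
side sets):

        /* before: sleeps until woken, no matter what */
        set_current_state(TASK_UNINTERRUPTIBLE);
        if (!done)
                schedule();
        __set_current_state(TASK_RUNNING);

        /* after: a fatal signal can also end the sleep */
        set_current_state(TASK_KILLABLE);
        if (!done)
                schedule();
        __set_current_state(TASK_RUNNING);
        if (fatal_signal_pending(current))
                return -EINTR;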

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."


2007-09-02 02:47:14

by Matthew Wilcox

Subject: [PATCH 1/5] Use wake_up_locked() in eventpoll

Replace the uses of __wake_up_locked with wake_up_locked
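
(wake_up_locked() already expands to exactly the form being removed;
from include/linux/wait.h before this series:

        #define wake_up_locked(x) __wake_up_locked((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE)

so this is a no-behaviour-change cleanup, and it lets patch 2/5 adjust
the wakeup mask in the macro rather than at every call site.)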

Signed-off-by: Matthew Wilcox <[email protected]>
---
fs/eventpoll.c | 11 ++++-------
1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 77b9953..72e4cb4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -657,8 +657,7 @@ is_linked:
* wait list.
*/
if (waitqueue_active(&ep->wq))
- __wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
- TASK_INTERRUPTIBLE);
+ wake_up_locked(&ep->wq);
if (waitqueue_active(&ep->poll_wait))
pwake++;

@@ -781,7 +780,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
- __wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE);
+ wake_up_locked(&ep->wq);
if (waitqueue_active(&ep->poll_wait))
pwake++;
}
@@ -855,8 +854,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
- __wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
- TASK_INTERRUPTIBLE);
+ wake_up_locked(&ep->wq);
if (waitqueue_active(&ep->poll_wait))
pwake++;
}
@@ -979,8 +977,7 @@ errxit:
* wait list (delayed after we release the lock).
*/
if (waitqueue_active(&ep->wq))
- __wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
- TASK_INTERRUPTIBLE);
+ wake_up_locked(&ep->wq);
if (waitqueue_active(&ep->poll_wait))
pwake++;
}
--
1.4.4.2

2007-09-02 02:47:31

by Matthew Wilcox

Subject: [PATCH 4/5] Add lock_page_killable

and associated infrastructure such as sync_page_killable and
fatal_signal_pending. Use lock_page_killable in do_generic_mapping_read()
to allow us to kill `cat' of a file on an NFS-mounted filesystem.
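
A caller that can propagate an error uses it like this (a minimal
sketch; the do_generic_mapping_read() hunk below is the real in-tree
example):

        error = lock_page_killable(page);
        if (error)
                return error;   /* -EINTR: fatal signal while waiting */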

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/linux/pagemap.h | 14 ++++++++++++++
include/linux/sched.h | 9 ++++++++-
kernel/signal.c | 5 +++++
mm/filemap.c | 25 +++++++++++++++++++++----
4 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8a83537..8b4f533 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -154,6 +154,7 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
}

extern void FASTCALL(__lock_page(struct page *page));
+extern int FASTCALL(__lock_page_killable(struct page *page));
extern void FASTCALL(__lock_page_nosync(struct page *page));
extern void FASTCALL(unlock_page(struct page *page));

@@ -168,6 +169,19 @@ static inline void lock_page(struct page *page)
}

/*
+ * lock_page_killable is like lock_page but can be interrupted by fatal
+ * signals. It returns 0 if it locked the page and -EINTR if it was
+ * killed while waiting.
+ */
+static inline int lock_page_killable(struct page *page)
+{
+ might_sleep();
+ if (TestSetPageLocked(page))
+ return __lock_page_killable(page);
+ return 0;
+}
+
+/*
* lock_page_nosync should only be used if we can't pin the page's inode.
* Doesn't play quite so well with block device plugging.
*/
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6769179..e6f20da 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1769,7 +1769,14 @@ static inline int signal_pending(struct task_struct *p)
{
return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
}
-
+
+extern int FASTCALL(__fatal_signal_pending(struct task_struct *p));
+
+static inline int fatal_signal_pending(struct task_struct *p)
+{
+ return signal_pending(p) && __fatal_signal_pending(p);
+}
+
static inline int need_resched(void)
{
return unlikely(test_thread_flag(TIF_NEED_RESCHED));
diff --git a/kernel/signal.c b/kernel/signal.c
index 986ba10..2b4fe29 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,6 +1003,11 @@ void zap_other_threads(struct task_struct *p)
}
}

+int fastcall __fatal_signal_pending(struct task_struct *tsk)
+{
+ return sigismember(&tsk->pending.signal, SIGKILL);
+}
+
/*
* Must be called under rcu_read_lock() or with tasklist_lock read-held.
*/
diff --git a/mm/filemap.c b/mm/filemap.c
index 90b657b..235f092 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -170,6 +170,12 @@ static int sync_page(void *word)
return 0;
}

+static int sync_page_killable(void *word)
+{
+ sync_page(word);
+ return fatal_signal_pending(current) ? -EINTR : 0;
+}
+
/**
* __filemap_fdatawrite_range - start writeback on mapping dirty pages in range
* @mapping: address space structure to write
@@ -574,6 +580,14 @@ void fastcall __lock_page(struct page *page)
}
EXPORT_SYMBOL(__lock_page);

+int fastcall __lock_page_killable(struct page *page)
+{
+ DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+
+ return __wait_on_bit_lock(page_waitqueue(page), &wait,
+ sync_page_killable, TASK_KILLABLE);
+}
+
/*
* Variant of lock_page that does not require the caller to hold a reference
* on the page's mapping.
@@ -975,7 +989,8 @@ page_ok:

page_not_up_to_date:
/* Get exclusive access to the page ... */
- lock_page(page);
+ if (lock_page_killable(page))
+ goto readpage_eio;

/* Did it get truncated before we got the lock? */
if (!page->mapping) {
@@ -1003,7 +1018,8 @@ readpage:
}

if (!PageUptodate(page)) {
- lock_page(page);
+ if (lock_page_killable(page))
+ goto readpage_eio;
if (!PageUptodate(page)) {
if (page->mapping == NULL) {
/*
@@ -1014,15 +1030,16 @@ readpage:
goto find_page;
}
unlock_page(page);
- error = -EIO;
shrink_readahead_size_eio(filp, &ra);
- goto readpage_error;
+ goto readpage_eio;
}
unlock_page(page);
}

goto page_ok;

+readpage_eio:
+ error = -EIO;
readpage_error:
/* UHHUH! A synchronous read error occurred. Report it */
desc->error = error;
--
1.4.4.2

2007-09-02 02:47:53

by Matthew Wilcox

Subject: [PATCH 3/5] Add TASK_WAKEKILL

Set TASK_WAKEKILL for TASK_STOPPED and TASK_TRACED, add TASK_KILLABLE and
use TASK_WAKEKILL in signal_wake_up()
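
The trick is that TASK_KILLABLE shares the new TASK_WAKEKILL bit with
the stopped and traced states:

        #define TASK_WAKEKILL 256
        #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

try_to_wake_up() tests the wakeup mask with a bitwise AND, so the
(TASK_INTERRUPTIBLE | TASK_WAKEKILL) mask used by signal_wake_up() for
a fatal signal wakes interruptible, stopped, traced and killable
sleepers, while an ordinary wake_up() (TASK_NORMAL) still wakes a
TASK_KILLABLE sleeper through its TASK_UNINTERRUPTIBLE bit.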

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/linux/sched.h | 22 ++++++++++++++--------
kernel/signal.c | 8 ++++----
2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ea509e9..6769179 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -167,28 +167,34 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
-#define TASK_STOPPED 4
-#define TASK_TRACED 8
+#define __TASK_STOPPED 4
+#define __TASK_TRACED 8
/* in tsk->exit_state */
#define EXIT_ZOMBIE 16
#define EXIT_DEAD 32
/* in tsk->state again */
#define TASK_NONINTERACTIVE 64
#define TASK_DEAD 128
+#define TASK_WAKEKILL 256
+
+/* Convenience macros for the sake of set_task_state */
+#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
+#define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED)
+#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)

/* Convenience macros for the sake of wake_up */
#define TASK_NORMAL (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
-#define TASK_ALL (TASK_NORMAL | TASK_STOPPED | TASK_TRACED)
+#define TASK_ALL (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)

/* get_task_state() */
#define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE | \
- TASK_UNINTERRUPTIBLE | TASK_STOPPED | \
- TASK_TRACED)
+ TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
+ __TASK_TRACED)

-#define is_task_traced(task) ((task->state & TASK_TRACED) != 0)
-#define is_task_stopped(task) ((task->state & TASK_STOPPED) != 0)
+#define is_task_traced(task) ((task->state & __TASK_TRACED) != 0)
+#define is_task_stopped(task) ((task->state & __TASK_STOPPED) != 0)
#define is_task_stopped_or_traced(task) \
- ((task->state & (TASK_STOPPED | TASK_TRACED)) != 0)
+ ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
#define is_task_loadavg(task) ((task->state & TASK_UNINTERRUPTIBLE) != 0)

#define __set_task_state(tsk, state_value) \
diff --git a/kernel/signal.c b/kernel/signal.c
index 53cbac4..986ba10 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -459,15 +459,15 @@ void signal_wake_up(struct task_struct *t, int resume)
set_tsk_thread_flag(t, TIF_SIGPENDING);

/*
- * For SIGKILL, we want to wake it up in the stopped/traced case.
- * We don't check t->state here because there is a race with it
+ * For SIGKILL, we want to wake it up in the stopped/traced/killable
+ * case. We don't check t->state here because there is a race with it
* executing another processor and just now entering stopped state.
* By using wake_up_state, we ensure the process will wake up and
* handle its death signal.
*/
mask = TASK_INTERRUPTIBLE;
if (resume)
- mask |= TASK_STOPPED | TASK_TRACED;
+ mask |= TASK_WAKEKILL;
if (!wake_up_state(t, mask))
kick_process(t);
}
@@ -623,7 +623,7 @@ static void handle_stop_signal(int sig, struct task_struct *p)
* Wake up the stopped thread _after_ setting
* TIF_SIGPENDING
*/
- state = TASK_STOPPED;
+ state = __TASK_STOPPED;
if (sig_user_defined(t, SIGCONT) && !sigismember(&t->blocked, SIGCONT)) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
state |= TASK_INTERRUPTIBLE;
--
1.4.4.2

2007-09-02 02:48:30

by Matthew Wilcox

Subject: [PATCH 2/5] Use macros instead of TASK_ flags

Abstracting away direct uses of TASK_ flags allows us to change the
definitions of the task flags more easily.

Also restructure do_wait() a little

Signed-off-by: Matthew Wilcox <[email protected]>
---
arch/ia64/kernel/perfmon.c | 4 +-
fs/proc/array.c | 9 +---
fs/proc/base.c | 2 +-
include/linux/sched.h | 15 +++++++
include/linux/wait.h | 11 +++--
kernel/exit.c | 90 +++++++++++++++++++------------------------
kernel/power/process.c | 7 +--
kernel/ptrace.c | 8 ++--
kernel/sched.c | 15 +++----
kernel/signal.c | 6 +-
kernel/wait.c | 2 +-
11 files changed, 83 insertions(+), 86 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 14b8e5a..75a99ee 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2637,7 +2637,7 @@ pfm_task_incompatible(pfm_context_t *ctx, struct task_struct *task)
*/
if (task == current) return 0;

- if ((task->state != TASK_STOPPED) && (task->state != TASK_TRACED)) {
+ if (!is_task_stopped_or_traced(task->state)) {
DPRINT(("cannot attach to non-stopped task [%d] state=%ld\n", task->pid, task->state));
return -EBUSY;
}
@@ -4797,7 +4797,7 @@ recheck:
* the task must be stopped.
*/
if (PFM_CMD_STOPPED(cmd)) {
- if ((task->state != TASK_STOPPED) && (task->state != TASK_TRACED)) {
+ if (!is_task_stopped_or_traced(task)) {
DPRINT(("[%d] task not in stopped state\n", task->pid));
return -EBUSY;
}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index ee4814d..6a3c876 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -140,13 +140,8 @@ static const char *task_state_array[] = {

static inline const char *get_task_state(struct task_struct *tsk)
{
- unsigned int state = (tsk->state & (TASK_RUNNING |
- TASK_INTERRUPTIBLE |
- TASK_UNINTERRUPTIBLE |
- TASK_STOPPED |
- TASK_TRACED)) |
- (tsk->exit_state & (EXIT_ZOMBIE |
- EXIT_DEAD));
+ unsigned int state = (tsk->state & TASK_REPORT) |
+ (tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD));
const char **p = &task_state_array[0];

while (state) {
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 19489b0..3155ef1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -196,7 +196,7 @@ static int proc_root_link(struct inode *inode, struct dentry **dentry, struct vf
(task == current || \
(task->parent == current && \
(task->ptrace & PT_PTRACED) && \
- (task->state == TASK_STOPPED || task->state == TASK_TRACED) && \
+ (is_task_stopped_or_traced(task)) && \
security_ptrace(current,task) == 0))

static int proc_pid_environ(struct task_struct *task, char * buffer)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f4e324e..ea509e9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -176,6 +176,21 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
#define TASK_NONINTERACTIVE 64
#define TASK_DEAD 128

+/* Convenience macros for the sake of wake_up */
+#define TASK_NORMAL (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
+#define TASK_ALL (TASK_NORMAL | TASK_STOPPED | TASK_TRACED)
+
+/* get_task_state() */
+#define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE | \
+ TASK_UNINTERRUPTIBLE | TASK_STOPPED | \
+ TASK_TRACED)
+
+#define is_task_traced(task) ((task->state & TASK_TRACED) != 0)
+#define is_task_stopped(task) ((task->state & TASK_STOPPED) != 0)
+#define is_task_stopped_or_traced(task) \
+ ((task->state & (TASK_STOPPED | TASK_TRACED)) != 0)
+#define is_task_loadavg(task) ((task->state & TASK_UNINTERRUPTIBLE) != 0)
+
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value) \
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..0a410a4 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -152,14 +152,15 @@ int FASTCALL(out_of_line_wait_on_bit(void *, int, int (*)(void *), unsigned));
int FASTCALL(out_of_line_wait_on_bit_lock(void *, int, int (*)(void *), unsigned));
wait_queue_head_t *FASTCALL(bit_waitqueue(void *, int));

-#define wake_up(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, NULL)
-#define wake_up_nr(x, nr) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, nr, NULL)
-#define wake_up_all(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 0, NULL)
+#define wake_up(x) __wake_up(x, TASK_NORMAL, 1, NULL)
+#define wake_up_nr(x, nr) __wake_up(x, TASK_NORMAL, nr, NULL)
+#define wake_up_all(x) __wake_up(x, TASK_NORMAL, 0, NULL)
+#define wake_up_locked(x) __wake_up_locked((x), TASK_NORMAL)
+
#define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
#define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
#define wake_up_interruptible_all(x) __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)
-#define wake_up_locked(x) __wake_up_locked((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE)
-#define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
+#define wake_up_interruptible_sync(x) __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)

#define __wait_event(wq, condition) \
do { \
diff --git a/kernel/exit.c b/kernel/exit.c
index 06b24b3..3abc703 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -259,7 +259,7 @@ static int has_stopped_jobs(struct pid *pgrp)
struct task_struct *p;

do_each_pid_task(pgrp, PIDTYPE_PGID, p) {
- if (p->state != TASK_STOPPED)
+ if (is_task_stopped(p))
continue;
retval = 1;
break;
@@ -634,7 +634,7 @@ reparent_thread(struct task_struct *p, struct task_struct *father, int traced)
p->parent = p->real_parent;
add_parent(p);

- if (p->state == TASK_TRACED) {
+ if (is_task_traced(p)) {
/*
* If it was at a trace stop, turn it into
* a normal stop since it's no longer being
@@ -1372,7 +1372,7 @@ static int wait_task_stopped(struct task_struct *p, int delayed_group_leader,

exit_code = p->exit_code;
if (unlikely(!exit_code) ||
- unlikely(p->state & TASK_TRACED))
+ unlikely(is_task_traced(p)))
goto bail_ref;
return wait_noreap_copyout(p, pid, uid,
why, (exit_code << 8) | 0x7f,
@@ -1554,60 +1554,51 @@ repeat:
}
allowed = 1;

- switch (p->state) {
- case TASK_TRACED:
- /*
- * When we hit the race with PTRACE_ATTACH,
- * we will not report this child. But the
- * race means it has not yet been moved to
- * our ptrace_children list, so we need to
- * set the flag here to avoid a spurious ECHILD
- * when the race happens with the only child.
- */
- flag = 1;
- if (!my_ptrace_child(p))
- continue;
- /*FALLTHROUGH*/
- case TASK_STOPPED:
+ if (is_task_stopped_or_traced(p)) {
/*
* It's stopped now, so it might later
* continue, exit, or stop again.
+ *
+ * When we hit the race with PTRACE_ATTACH, we
+ * will not report this child. But the race
+ * means it has not yet been moved to our
+ * ptrace_children list, so we need to set the
+ * flag here to avoid a spurious ECHILD when
+ * the race happens with the only child.
*/
flag = 1;
- if (!(options & WUNTRACED) &&
- !my_ptrace_child(p))
- continue;
+
+ if (!my_ptrace_child(p)) {
+ if (is_task_traced(p))
+ continue;
+ if (!(options & WUNTRACED))
+ continue;
+ }
+
retval = wait_task_stopped(p, ret == 2,
- (options & WNOWAIT),
- infop,
- stat_addr, ru);
+ (options & WNOWAIT), infop,
+ stat_addr, ru);
if (retval == -EAGAIN)
goto repeat;
if (retval != 0) /* He released the lock. */
goto end;
- break;
- default:
- // case EXIT_DEAD:
- if (p->exit_state == EXIT_DEAD)
+ } else if (p->exit_state == EXIT_DEAD) {
+ continue;
+ } else if (p->exit_state == EXIT_ZOMBIE) {
+ /*
+ * Eligible but we cannot release it yet:
+ */
+ if (ret == 2)
+ goto check_continued;
+ if (!likely(options & WEXITED))
continue;
- // case EXIT_ZOMBIE:
- if (p->exit_state == EXIT_ZOMBIE) {
- /*
- * Eligible but we cannot release
- * it yet:
- */
- if (ret == 2)
- goto check_continued;
- if (!likely(options & WEXITED))
- continue;
- retval = wait_task_zombie(
- p, (options & WNOWAIT),
- infop, stat_addr, ru);
- /* He released the lock. */
- if (retval != 0)
- goto end;
- break;
- }
+ retval = wait_task_zombie(p,
+ (options & WNOWAIT), infop,
+ stat_addr, ru);
+ /* He released the lock. */
+ if (retval != 0)
+ goto end;
+ } else {
check_continued:
/*
* It's running now, so it might later
@@ -1616,12 +1607,11 @@ check_continued:
flag = 1;
if (!unlikely(options & WCONTINUED))
continue;
- retval = wait_task_continued(
- p, (options & WNOWAIT),
- infop, stat_addr, ru);
+ retval = wait_task_continued(p,
+ (options & WNOWAIT), infop,
+ stat_addr, ru);
if (retval != 0) /* He released the lock. */
goto end;
- break;
}
}
if (!flag) {
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 3434940..ac0c27a 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -83,10 +83,10 @@ static void freeze_task(struct task_struct *p)
rmb();
if (!frozen(p)) {
set_freeze_flag(p);
- if (p->state == TASK_STOPPED)
+ if (is_task_stopped(p))
force_sig_specific(SIGSTOP, p);
spin_lock_irqsave(&p->sighand->siglock, flags);
- signal_wake_up(p, p->state == TASK_STOPPED);
+ signal_wake_up(p, is_task_stopped(p));
spin_unlock_irqrestore(&p->sighand->siglock, flags);
}
}
@@ -120,8 +120,7 @@ static int try_to_freeze_tasks(int freeze_user_space)
continue;

if (freeze_user_space) {
- if (p->state == TASK_TRACED &&
- frozen(p->parent)) {
+ if (is_task_traced(p) && frozen(p->parent)) {
cancel_freezing(p);
continue;
}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 82a558b..92a2283 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -50,7 +50,7 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
void ptrace_untrace(struct task_struct *child)
{
spin_lock(&child->sighand->siglock);
- if (child->state == TASK_TRACED) {
+ if (is_task_traced(child)) {
if (child->signal->flags & SIGNAL_STOP_STOPPED) {
child->state = TASK_STOPPED;
} else {
@@ -78,7 +78,7 @@ void __ptrace_unlink(struct task_struct *child)
add_parent(child);
}

- if (child->state == TASK_TRACED)
+ if (is_task_traced(child))
ptrace_untrace(child);
}

@@ -102,9 +102,9 @@ int ptrace_check_attach(struct task_struct *child, int kill)
&& child->signal != NULL) {
ret = 0;
spin_lock_irq(&child->sighand->siglock);
- if (child->state == TASK_STOPPED) {
+ if (is_task_stopped(child)) {
child->state = TASK_TRACED;
- } else if (child->state != TASK_TRACED && !kill) {
+ } else if (!is_task_traced(child) && !kill) {
ret = -ESRCH;
}
spin_unlock_irq(&child->sighand->siglock);
diff --git a/kernel/sched.c b/kernel/sched.c
index b533d6d..e3be352 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -944,7 +944,7 @@ static int effective_prio(struct task_struct *p)
*/
static void activate_task(struct rq *rq, struct task_struct *p, int wakeup)
{
- if (p->state == TASK_UNINTERRUPTIBLE)
+ if (is_task_loadavg(p))
rq->nr_uninterruptible--;

enqueue_task(rq, p, wakeup);
@@ -958,7 +958,7 @@ static inline void activate_idle_task(struct task_struct *p, struct rq *rq)
{
update_rq_clock(rq);

- if (p->state == TASK_UNINTERRUPTIBLE)
+ if (is_task_loadavg(p))
rq->nr_uninterruptible--;

enqueue_task(rq, p, 0);
@@ -970,7 +970,7 @@ static inline void activate_idle_task(struct task_struct *p, struct rq *rq)
*/
static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep)
{
- if (p->state == TASK_UNINTERRUPTIBLE)
+ if (is_task_loadavg(p))
rq->nr_uninterruptible++;

dequeue_task(rq, p, sleep);
@@ -1566,8 +1566,7 @@ out:

int fastcall wake_up_process(struct task_struct *p)
{
- return try_to_wake_up(p, TASK_STOPPED | TASK_TRACED |
- TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE, 0);
+ return try_to_wake_up(p, TASK_ALL, 0);
}
EXPORT_SYMBOL(wake_up_process);

@@ -3708,8 +3707,7 @@ void fastcall complete(struct completion *x)

spin_lock_irqsave(&x->wait.lock, flags);
x->done++;
- __wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE,
- 1, 0, NULL);
+ __wake_up_common(&x->wait, TASK_NORMAL, 1, 0, NULL);
spin_unlock_irqrestore(&x->wait.lock, flags);
}
EXPORT_SYMBOL(complete);
@@ -3720,8 +3718,7 @@ void fastcall complete_all(struct completion *x)

spin_lock_irqsave(&x->wait.lock, flags);
x->done += UINT_MAX/2;
- __wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE,
- 0, 0, NULL);
+ __wake_up_common(&x->wait, TASK_NORMAL, 0, 0, NULL);
spin_unlock_irqrestore(&x->wait.lock, flags);
}
EXPORT_SYMBOL(complete_all);
diff --git a/kernel/signal.c b/kernel/signal.c
index 3169bed..53cbac4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -841,7 +841,7 @@ static inline int wants_signal(int sig, struct task_struct *p)
return 0;
if (sig == SIGKILL)
return 1;
- if (p->state & (TASK_STOPPED | TASK_TRACED))
+ if (is_task_stopped_or_traced(p))
return 0;
return task_curr(p) || !signal_pending(p);
}
@@ -1445,7 +1445,7 @@ void do_notify_parent(struct task_struct *tsk, int sig)
BUG_ON(sig == -1);

/* do_notify_parent_cldstop should have been called instead. */
- BUG_ON(tsk->state & (TASK_STOPPED|TASK_TRACED));
+ BUG_ON(is_task_stopped_or_traced(tsk));

BUG_ON(!tsk->ptrace &&
(tsk->group_leader != tsk || !thread_group_empty(tsk)));
@@ -1712,7 +1712,7 @@ static int do_signal_stop(int signr)
* so this check has no races.
*/
if (!t->exit_state &&
- !(t->state & (TASK_STOPPED|TASK_TRACED))) {
+ !is_task_stopped_or_traced(t)) {
stop_count++;
signal_wake_up(t, 0);
}
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..f987688 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -215,7 +215,7 @@ void fastcall __wake_up_bit(wait_queue_head_t *wq, void *word, int bit)
{
struct wait_bit_key key = __WAIT_BIT_KEY_INITIALIZER(word, bit);
if (waitqueue_active(wq))
- __wake_up(wq, TASK_INTERRUPTIBLE|TASK_UNINTERRUPTIBLE, 1, &key);
+ __wake_up(wq, TASK_NORMAL, 1, &key);
}
EXPORT_SYMBOL(__wake_up_bit);

--
1.4.4.2

2007-09-02 02:48:49

by Matthew Wilcox

Subject: [PATCH 5/5] Make wait_on_retry_sync_kiocb killable

Use TASK_KILLABLE to allow wait_on_retry_sync_kiocb to return -EINTR.
All callers then check the return value and break out of their loops.
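
Because only a fatal signal can end a TASK_KILLABLE sleep, the -EINTR
should never actually be seen by userspace; the task is about to die,
and the error code exists purely so the kernel can unwind. Each
caller's retry loop ends up shaped like this (as in do_sync_read()
below):

        for (;;) {
                ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
                if (ret != -EIOCBRETRY)
                        break;
                ret = wait_on_retry_sync_kiocb(&kiocb);
                if (ret)
                        break;  /* fatal signal: abandon the retry */
        }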

Signed-off-by: Matthew Wilcox <[email protected]>
---
fs/read_write.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 507ddff..b1a52d5 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -218,14 +218,15 @@ Einval:
return -EINVAL;
}

-static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
+static int wait_on_retry_sync_kiocb(struct kiocb *iocb)
{
- set_current_state(TASK_UNINTERRUPTIBLE);
+ set_current_state(TASK_KILLABLE);
if (!kiocbIsKicked(iocb))
schedule();
else
kiocbClearKicked(iocb);
__set_current_state(TASK_RUNNING);
+ return fatal_signal_pending(current) ? -EINTR : 0;
}

ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
@@ -242,7 +243,9 @@ ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *pp
ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
if (ret != -EIOCBRETRY)
break;
- wait_on_retry_sync_kiocb(&kiocb);
+ ret = wait_on_retry_sync_kiocb(&kiocb);
+ if (ret)
+ break;
}

if (-EIOCBQUEUED == ret)
@@ -300,7 +303,9 @@ ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, lof
ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);
if (ret != -EIOCBRETRY)
break;
- wait_on_retry_sync_kiocb(&kiocb);
+ ret = wait_on_retry_sync_kiocb(&kiocb);
+ if (ret)
+ break;
}

if (-EIOCBQUEUED == ret)
@@ -466,7 +471,9 @@ ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
ret = fn(&kiocb, iov, nr_segs, kiocb.ki_pos);
if (ret != -EIOCBRETRY)
break;
- wait_on_retry_sync_kiocb(&kiocb);
+ ret = wait_on_retry_sync_kiocb(&kiocb);
+ if (ret)
+ break;
}

if (ret == -EIOCBQUEUED)
--
1.4.4.2

2007-09-02 02:54:21

by Matthew Wilcox

Subject: Re: [PATCH 2/5] Use macros instead of TASK_ flags

On Sat, Sep 01, 2007 at 10:46:51PM -0400, Matthew Wilcox wrote:
> Abstracting away direct uses of TASK_ flags allows us to change the
> definitions of the task flags more easily.
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -259,7 +259,7 @@ static int has_stopped_jobs(struct pid *pgrp)
> struct task_struct *p;
>
> do_each_pid_task(pgrp, PIDTYPE_PGID, p) {
> - if (p->state != TASK_STOPPED)
> + if (is_task_stopped(p))

Funny how I can't spot these things before I send them.

Fixed locally; I'll see if anyone else spots a problem before I send out
a v3.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-09-02 03:47:08

by Daniel Walker

Subject: Re: [PATCH 2/5] Use macros instead of TASK_ flags

On Sat, 2007-09-01 at 22:46 -0400, Matthew Wilcox wrote:
> */
> if (task == current) return 0;
>
> - if ((task->state != TASK_STOPPED) && (task->state != TASK_TRACED)) {
> + if (!is_task_stopped_or_traced(task->state)) {
> DPRINT(("cannot attach to non-stopped task [%d] state=%ld\n", task->pid, task->state));
>
> - if ((task->state != TASK_STOPPED) && (task->state != TASK_TRACED)) {
> + if (!is_task_stopped_or_traced(task)) {
> DPRINT(("[%d] task not in stopped state\n", task->pid));
> return -EBUSY;
> }

Does it take task->state or task ?

Daniel

2007-09-02 04:05:41

by Matthew Wilcox

Subject: Re: [PATCH 2/5] Use macros instead of TASK_ flags

On Sat, Sep 01, 2007 at 08:35:06PM -0700, Daniel Walker wrote:
> Does it take task->state or task ?

task. Clearly I didn't test on ia64. (There was an iteration where it
took task->state, and I guess I missed one). Thanks for pointing out
this oops, I'll fix it for round three.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-09-03 21:03:53

by Matthew Wilcox

Subject: Re: [PATCH 2/5] Use macros instead of TASK_ flags

On Sat, Sep 01, 2007 at 10:46:51PM -0400, Matthew Wilcox wrote:
> Abstracting away direct uses of TASK_ flags allows us to change the
> definitions of the task flags more easily.
>
> Also restructure do_wait() a little
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---
> arch/ia64/kernel/perfmon.c | 4 +-
> fs/proc/array.c | 9 +---
> fs/proc/base.c | 2 +-
> include/linux/sched.h | 15 +++++++
> include/linux/wait.h | 11 +++--
> kernel/exit.c | 90 +++++++++++++++++++------------------------
> kernel/power/process.c | 7 +--
> kernel/ptrace.c | 8 ++--
> kernel/sched.c | 15 +++----
> kernel/signal.c | 6 +-
> kernel/wait.c | 2 +-
> 11 files changed, 83 insertions(+), 86 deletions(-)

Here's a replacement patch 2/5, including fixes for the problems spotted
by Daniel and myself. A machine running a kernel including these patches
has been up for more than 24 hours.

commit 10287ac5d615a2ccca6611426572259ebc69dc92
Author: Matthew Wilcox <[email protected]>
Date: Sat Sep 1 09:31:22 2007 -0400

Use macros instead of TASK_ flags

Abstracting away direct uses of TASK_ flags allows us to change the
definitions of the task flags more easily.

Also restructure do_wait() a little

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 14b8e5a..3690c46 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2637,7 +2637,7 @@ pfm_task_incompatible(pfm_context_t *ctx, struct task_struct *task)
*/
if (task == current) return 0;

- if ((task->state != TASK_STOPPED) && (task->state != TASK_TRACED)) {
+ if (!is_task_stopped_or_traced(task)) {
DPRINT(("cannot attach to non-stopped task [%d] state=%ld\n", task->pid, task->state));
return -EBUSY;
}
@@ -4797,7 +4797,7 @@ recheck:
* the task must be stopped.
*/
if (PFM_CMD_STOPPED(cmd)) {
- if ((task->state != TASK_STOPPED) && (task->state != TASK_TRACED)) {
+ if (!is_task_stopped_or_traced(task)) {
DPRINT(("[%d] task not in stopped state\n", task->pid));
return -EBUSY;
}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index ee4814d..6a3c876 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -140,13 +140,8 @@ static const char *task_state_array[] = {

static inline const char *get_task_state(struct task_struct *tsk)
{
- unsigned int state = (tsk->state & (TASK_RUNNING |
- TASK_INTERRUPTIBLE |
- TASK_UNINTERRUPTIBLE |
- TASK_STOPPED |
- TASK_TRACED)) |
- (tsk->exit_state & (EXIT_ZOMBIE |
- EXIT_DEAD));
+ unsigned int state = (tsk->state & TASK_REPORT) |
+ (tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD));
const char **p = &task_state_array[0];

while (state) {
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 19489b0..3155ef1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -196,7 +196,7 @@ static int proc_root_link(struct inode *inode, struct dentry **dentry, struct vf
(task == current || \
(task->parent == current && \
(task->ptrace & PT_PTRACED) && \
- (task->state == TASK_STOPPED || task->state == TASK_TRACED) && \
+ (is_task_stopped_or_traced(task)) && \
security_ptrace(current,task) == 0))

static int proc_pid_environ(struct task_struct *task, char * buffer)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f4e324e..ea509e9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -176,6 +176,21 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
#define TASK_NONINTERACTIVE 64
#define TASK_DEAD 128

+/* Convenience macros for the sake of wake_up */
+#define TASK_NORMAL (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
+#define TASK_ALL (TASK_NORMAL | TASK_STOPPED | TASK_TRACED)
+
+/* get_task_state() */
+#define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE | \
+ TASK_UNINTERRUPTIBLE | TASK_STOPPED | \
+ TASK_TRACED)
+
+#define is_task_traced(task) ((task->state & TASK_TRACED) != 0)
+#define is_task_stopped(task) ((task->state & TASK_STOPPED) != 0)
+#define is_task_stopped_or_traced(task) \
+ ((task->state & (TASK_STOPPED | TASK_TRACED)) != 0)
+#define is_task_loadavg(task) ((task->state & TASK_UNINTERRUPTIBLE) != 0)
+
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value) \
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..0a410a4 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -152,14 +152,15 @@ int FASTCALL(out_of_line_wait_on_bit(void *, int, int (*)(void *), unsigned));
int FASTCALL(out_of_line_wait_on_bit_lock(void *, int, int (*)(void *), unsigned));
wait_queue_head_t *FASTCALL(bit_waitqueue(void *, int));

-#define wake_up(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, NULL)
-#define wake_up_nr(x, nr) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, nr, NULL)
-#define wake_up_all(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 0, NULL)
+#define wake_up(x) __wake_up(x, TASK_NORMAL, 1, NULL)
+#define wake_up_nr(x, nr) __wake_up(x, TASK_NORMAL, nr, NULL)
+#define wake_up_all(x) __wake_up(x, TASK_NORMAL, 0, NULL)
+#define wake_up_locked(x) __wake_up_locked((x), TASK_NORMAL)
+
#define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
#define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
#define wake_up_interruptible_all(x) __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)
-#define wake_up_locked(x) __wake_up_locked((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE)
-#define wake_up_interruptible_sync(x) __wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
+#define wake_up_interruptible_sync(x) __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)

#define __wait_event(wq, condition) \
do { \
diff --git a/kernel/exit.c b/kernel/exit.c
index 06b24b3..325106a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -259,7 +259,7 @@ static int has_stopped_jobs(struct pid *pgrp)
struct task_struct *p;

do_each_pid_task(pgrp, PIDTYPE_PGID, p) {
- if (p->state != TASK_STOPPED)
+ if (!is_task_stopped(p))
continue;
retval = 1;
break;
@@ -634,7 +634,7 @@ reparent_thread(struct task_struct *p, struct task_struct *father, int traced)
p->parent = p->real_parent;
add_parent(p);

- if (p->state == TASK_TRACED) {
+ if (is_task_traced(p)) {
/*
* If it was at a trace stop, turn it into
* a normal stop since it's no longer being
@@ -1372,7 +1372,7 @@ static int wait_task_stopped(struct task_struct *p, int delayed_group_leader,

exit_code = p->exit_code;
if (unlikely(!exit_code) ||
- unlikely(p->state & TASK_TRACED))
+ unlikely(is_task_traced(p)))
goto bail_ref;
return wait_noreap_copyout(p, pid, uid,
why, (exit_code << 8) | 0x7f,
@@ -1554,60 +1554,51 @@ repeat:
}
allowed = 1;

- switch (p->state) {
- case TASK_TRACED:
- /*
- * When we hit the race with PTRACE_ATTACH,
- * we will not report this child. But the
- * race means it has not yet been moved to
- * our ptrace_children list, so we need to
- * set the flag here to avoid a spurious ECHILD
- * when the race happens with the only child.
- */
- flag = 1;
- if (!my_ptrace_child(p))
- continue;
- /*FALLTHROUGH*/
- case TASK_STOPPED:
+ if (is_task_stopped_or_traced(p)) {
/*
* It's stopped now, so it might later
* continue, exit, or stop again.
+ *
+ * When we hit the race with PTRACE_ATTACH, we
+ * will not report this child. But the race
+ * means it has not yet been moved to our
+ * ptrace_children list, so we need to set the
+ * flag here to avoid a spurious ECHILD when
+ * the race happens with the only child.
*/
flag = 1;
- if (!(options & WUNTRACED) &&
- !my_ptrace_child(p))
- continue;
+
+ if (!my_ptrace_child(p)) {
+ if (is_task_traced(p))
+ continue;
+ if (!(options & WUNTRACED))
+ continue;
+ }
+
retval = wait_task_stopped(p, ret == 2,
- (options & WNOWAIT),
- infop,
- stat_addr, ru);
+ (options & WNOWAIT), infop,
+ stat_addr, ru);
if (retval == -EAGAIN)
goto repeat;
if (retval != 0) /* He released the lock. */
goto end;
- break;
- default:
- // case EXIT_DEAD:
- if (p->exit_state == EXIT_DEAD)
+ } else if (p->exit_state == EXIT_DEAD) {
+ continue;
+ } else if (p->exit_state == EXIT_ZOMBIE) {
+ /*
+ * Eligible but we cannot release it yet:
+ */
+ if (ret == 2)
+ goto check_continued;
+ if (!likely(options & WEXITED))
continue;
- // case EXIT_ZOMBIE:
- if (p->exit_state == EXIT_ZOMBIE) {
- /*
- * Eligible but we cannot release
- * it yet:
- */
- if (ret == 2)
- goto check_continued;
- if (!likely(options & WEXITED))
- continue;
- retval = wait_task_zombie(
- p, (options & WNOWAIT),
- infop, stat_addr, ru);
- /* He released the lock. */
- if (retval != 0)
- goto end;
- break;
- }
+ retval = wait_task_zombie(p,
+ (options & WNOWAIT), infop,
+ stat_addr, ru);
+ /* He released the lock. */
+ if (retval != 0)
+ goto end;
+ } else {
check_continued:
/*
* It's running now, so it might later
@@ -1616,12 +1607,11 @@ check_continued:
flag = 1;
if (!unlikely(options & WCONTINUED))
continue;
- retval = wait_task_continued(
- p, (options & WNOWAIT),
- infop, stat_addr, ru);
+ retval = wait_task_continued(p,
+ (options & WNOWAIT), infop,
+ stat_addr, ru);
if (retval != 0) /* He released the lock. */
goto end;
- break;
}
}
if (!flag) {
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 3434940..ac0c27a 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -83,10 +83,10 @@ static void freeze_task(struct task_struct *p)
rmb();
if (!frozen(p)) {
set_freeze_flag(p);
- if (p->state == TASK_STOPPED)
+ if (is_task_stopped(p))
force_sig_specific(SIGSTOP, p);
spin_lock_irqsave(&p->sighand->siglock, flags);
- signal_wake_up(p, p->state == TASK_STOPPED);
+ signal_wake_up(p, is_task_stopped(p));
spin_unlock_irqrestore(&p->sighand->siglock, flags);
}
}
@@ -120,8 +120,7 @@ static int try_to_freeze_tasks(int freeze_user_space)
continue;

if (freeze_user_space) {
- if (p->state == TASK_TRACED &&
- frozen(p->parent)) {
+ if (is_task_traced(p) && frozen(p->parent)) {
cancel_freezing(p);
continue;
}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 82a558b..92a2283 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -50,7 +50,7 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
void ptrace_untrace(struct task_struct *child)
{
spin_lock(&child->sighand->siglock);
- if (child->state == TASK_TRACED) {
+ if (is_task_traced(child)) {
if (child->signal->flags & SIGNAL_STOP_STOPPED) {
child->state = TASK_STOPPED;
} else {
@@ -78,7 +78,7 @@ void __ptrace_unlink(struct task_struct *child)
add_parent(child);
}

- if (child->state == TASK_TRACED)
+ if (is_task_traced(child))
ptrace_untrace(child);
}

@@ -102,9 +102,9 @@ int ptrace_check_attach(struct task_struct *child, int kill)
&& child->signal != NULL) {
ret = 0;
spin_lock_irq(&child->sighand->siglock);
- if (child->state == TASK_STOPPED) {
+ if (is_task_stopped(child)) {
child->state = TASK_TRACED;
- } else if (child->state != TASK_TRACED && !kill) {
+ } else if (!is_task_traced(child) && !kill) {
ret = -ESRCH;
}
spin_unlock_irq(&child->sighand->siglock);
diff --git a/kernel/sched.c b/kernel/sched.c
index b533d6d..e3be352 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -944,7 +944,7 @@ static int effective_prio(struct task_struct *p)
*/
static void activate_task(struct rq *rq, struct task_struct *p, int wakeup)
{
- if (p->state == TASK_UNINTERRUPTIBLE)
+ if (is_task_loadavg(p))
rq->nr_uninterruptible--;

enqueue_task(rq, p, wakeup);
@@ -958,7 +958,7 @@ static inline void activate_idle_task(struct task_struct *p, struct rq *rq)
{
update_rq_clock(rq);

- if (p->state == TASK_UNINTERRUPTIBLE)
+ if (is_task_loadavg(p))
rq->nr_uninterruptible--;

enqueue_task(rq, p, 0);
@@ -970,7 +970,7 @@ static inline void activate_idle_task(struct task_struct *p, struct rq *rq)
*/
static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep)
{
- if (p->state == TASK_UNINTERRUPTIBLE)
+ if (is_task_loadavg(p))
rq->nr_uninterruptible++;

dequeue_task(rq, p, sleep);
@@ -1566,8 +1566,7 @@ out:

int fastcall wake_up_process(struct task_struct *p)
{
- return try_to_wake_up(p, TASK_STOPPED | TASK_TRACED |
- TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE, 0);
+ return try_to_wake_up(p, TASK_ALL, 0);
}
EXPORT_SYMBOL(wake_up_process);

@@ -3708,8 +3707,7 @@ void fastcall complete(struct completion *x)

spin_lock_irqsave(&x->wait.lock, flags);
x->done++;
- __wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE,
- 1, 0, NULL);
+ __wake_up_common(&x->wait, TASK_NORMAL, 1, 0, NULL);
spin_unlock_irqrestore(&x->wait.lock, flags);
}
EXPORT_SYMBOL(complete);
@@ -3720,8 +3718,7 @@ void fastcall complete_all(struct completion *x)

spin_lock_irqsave(&x->wait.lock, flags);
x->done += UINT_MAX/2;
- __wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE,
- 0, 0, NULL);
+ __wake_up_common(&x->wait, TASK_NORMAL, 0, 0, NULL);
spin_unlock_irqrestore(&x->wait.lock, flags);
}
EXPORT_SYMBOL(complete_all);
diff --git a/kernel/signal.c b/kernel/signal.c
index 3169bed..53cbac4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -841,7 +841,7 @@ static inline int wants_signal(int sig, struct task_struct *p)
return 0;
if (sig == SIGKILL)
return 1;
- if (p->state & (TASK_STOPPED | TASK_TRACED))
+ if (is_task_stopped_or_traced(p))
return 0;
return task_curr(p) || !signal_pending(p);
}
@@ -1445,7 +1445,7 @@ void do_notify_parent(struct task_struct *tsk, int sig)
BUG_ON(sig == -1);

/* do_notify_parent_cldstop should have been called instead. */
- BUG_ON(tsk->state & (TASK_STOPPED|TASK_TRACED));
+ BUG_ON(is_task_stopped_or_traced(tsk));

BUG_ON(!tsk->ptrace &&
(tsk->group_leader != tsk || !thread_group_empty(tsk)));
@@ -1712,7 +1712,7 @@ static int do_signal_stop(int signr)
* so this check has no races.
*/
if (!t->exit_state &&
- !(t->state & (TASK_STOPPED|TASK_TRACED))) {
+ !is_task_stopped_or_traced(t)) {
stop_count++;
signal_wake_up(t, 0);
}
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..f987688 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -215,7 +215,7 @@ void fastcall __wake_up_bit(wait_queue_head_t *wq, void *word, int bit)
{
struct wait_bit_key key = __WAIT_BIT_KEY_INITIALIZER(word, bit);
if (waitqueue_active(wq))
- __wake_up(wq, TASK_INTERRUPTIBLE|TASK_UNINTERRUPTIBLE, 1, &key);
+ __wake_up(wq, TASK_NORMAL, 1, &key);
}
EXPORT_SYMBOL(__wake_up_bit);


--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-09-24 20:26:17

by Bob Bell

Subject: Re: [PATCH] TASK_KILLABLE version 2

On Sat, Sep 01, 2007 at 08:43:49PM -0600, Matthew Wilcox wrote:
> Here's the second version of TASK_KILLABLE. A few changes since version 1:
<snip>
> I obviously haven't covered every place that can result in a process
> sleeping uninterruptibly while attempting an operation. But sync_page
> (patch 4/5) covers about 90% of the times I've attempted to kill cat,
> and I hope that by providing the two examples, I can help other people
> to fix the cases that they find interesting.

I've been testing this patch on my systems. It's working for me when
I read() a file. Asynchronous write()s seem okay, too. However,
synchronous writes (caused by either calling fsync() or fcntl() to
release a lock) prevent the process from being killed when the NFS
server goes down.

When the process is sent SIGKILL, it's waiting with the following call
tree:
do_fsync
nfs_fsync
nfs_wb_all
nfs_sync_mapping_wait
nfs_wait_on_requests_locked (I believe)
nfs_wait_on_request
out_of_line_wait_on_bit
__wait_on_bit
nfs_wait_bit_interruptible
schedule

When the process is later viewed after being deemed "stuck", it's
waiting with the following call tree:
do_fsync
filemap_fdatawait
wait_on_page_writeback_range
wait_on_page_writeback
wait_on_page_bit
__wait_on_bit
sync_page
io_schedule
schedule

If I hazard a guess as to what might be wrong here, I believe that when
the process catches SIGKILL, nfs_wait_bit_interruptible is returning
-ERESTARTSYS. That error bubbles back up to nfs_fsync. However,
nfs_fsync returns ctx->error, not -ERESTARTSYS, and ctx->error is 0.
do_fsync proceeds to call filemap_fdatawait. I question whether
nfs_fsync should return an error, and whether do_fsync should skip
filemap_fdatawait if the fsync op returned an error.
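
For what it's worth, that check would look something like this (a rough
sketch against the 2.6.23-era do_fsync(); the label name is
illustrative, and this is untested):

        err = file->f_op->fsync(file, file->f_path.dentry, datasync);
        if (err)
                goto out_put;   /* don't wait on writeback we know failed */
        err = filemap_fdatawait(mapping);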

I did try replacing the call to sync_page in __wait_on_bit with
sync_page_killable and replacing TASK_UNINTERRUPTIBLE with
TASK_KILLABLE. That seemed to work once, but then really screwed things
up on subsequent attempts.

--
Bob Bell

2007-09-26 12:01:49

by Ric Wheeler

Subject: Re: [PATCH] TASK_KILLABLE version 2

Bob Bell wrote:
> On Sat, Sep 01, 2007 at 08:43:49PM -0600, Matthew Wilcox wrote:
>> Here's the second version of TASK_KILLABLE. A few changes since
>> version 1:
> <snip>
>> I obviously haven't covered every place that can result in a process
>> sleeping uninterruptibly while attempting an operation. But sync_page
>> (patch 4/5) covers about 90% of the times I've attempted to kill cat,
>> and I hope that by providing the two examples, I can help other people
>> to fix the cases that they find interesting.
>
> I've been testing this patch on my systems. It's working for me when
> I read() a file. Asynchronous write()s seem okay, too. However,
> synchronous writes (caused by either calling fsync() or fcntl() to
> release a lock) prevent the process from being killed when the NFS
> server goes down.

After hearing again last month about how few people actually read every
lkml thread, I wanted to point you all at this thread explicitly since
it seems that we are getting somewhat close to having a forced unmount
that actually is usable by real applications, something that we seem to
have been talking about for many years ;-)

With Matthew's original TASK_KILLABLE patch, we have a solution for a
task stuck in read(), but still have some holes (fsync & fcntl,
others?) that need to be fixed as well for NFS clients.

Is this patch going in the right direction?

ric



2007-09-28 14:18:37

by Nick Piggin

Subject: Re: [PATCH] TASK_KILLABLE version 2

On Wednesday 26 September 2007 21:57, Ric Wheeler wrote:
> Bob Bell wrote:
> > On Sat, Sep 01, 2007 at 08:43:49PM -0600, Matthew Wilcox wrote:
> >> Here's the second version of TASK_KILLABLE. A few changes since
> >> version 1:
> >
> > <snip>
> >
> >> I obviously haven't covered every place that can result in a process
> >> sleeping uninterruptibly while attempting an operation. But sync_page
> >> (patch 4/5) covers about 90% of the times I've attempted to kill cat,
> >> and I hope that by providing the two examples, I can help other people
> >> to fix the cases that they find interesting.
> >
> > I've been testing this patch on my systems. It's working for me when
> > I read() a file. Asynchronous write()s seem okay, too. However,
> > synchronous writes (caused by either calling fsync() or fcntl() to
> > release a lock) prevent the process from being killed when the NFS
> > server goes down.
>
> After hearing again last month about how few people actually read every
> lkml thread, I wanted to point you all at this thread explicitly since
> it seems that we are getting somewhat close to having a forced unmount
> that actually is usable by real applications, something that we seem to
> have been talking about for many years ;-)
>
> With Matthew's original TASK_KILLABLE patch, we have a solution for a
> task stuck in read(), but still have some holes (fsync & fcntl,
> others?) that need to be fixed as well for NFS clients.
>
> Is this patch going in the right direction?

FWIW, I do think it seems like a good idea to work towards better
interruptibility in various potentially long-blocking paths like these.
I think Andrea's recent work to solve some oom killer deadlocks
probably has some requirements in common with this patch.

2007-12-07 01:49:30

by Matthew Wilcox

Subject: Re: [PATCH] TASK_KILLABLE version 2

On Mon, Sep 24, 2007 at 04:16:48PM -0400, Bob Bell wrote:
> I've been testing this patch on my systems. It's working for me when
> I read() a file. Asynchronous write()s seem okay, too. However,
> synchronous writes (caused by either calling fsync() or fcntl() to
> release a lock) prevent the process from being killed when the NFS
> server goes down.
>
> When the process is sent SIGKILL, it's waiting with the following call
> tree:
> do_fsync
> nfs_fsync
> nfs_wb_all
> nfs_sync_mapping_wait
> nfs_wait_on_requests_locked (I believe)
> nfs_wait_on_request
> out_of_line_wait_on_bit
> __wait_on_bit
> nfs_wait_bit_interruptible
> schedule
>
> When the process is later viewed after being deemed "stuck", it's
> waiting with the following call tree:
> do_fsync
> filemap_fdatawait
> wait_on_page_writeback_range
> wait_on_page_writeback
> wait_on_page_bit
> __wait_on_bit
> sync_page
> io_schedule
> schedule
>
> If I hazard a guess as to what might be wrong here, I believe that when
> the process catches SIGKILL, nfs_wait_bit_interruptible is returning
> -ERESTARTSYS. That error bubbles back up to nfs_fsync. However,
> nfs_fsync returns ctx->error, not -ERESTARTSYS, and ctx->error is 0.
> do_fsync proceeds to call filemap_fdatawait. I question whether
> nfs_fsync should return an error, and whether do_fsync should skip
> filemap_fdatawait if the fsync op returned an error.
>
> I did try replacing the call to sync_page in __wait_on_bit with
> sync_page_killable and replacing TASK_UNINTERRUPTIBLE with
> TASK_KILLABLE. That seemed to work once, but then really screwed things
> up on subsequent attempts.

Hi Bob,

The patch I posted earlier today (a somewhat cleaned-up version of a
patch originally from Liam Howlett) converts NFS to use TASK_KILLABLE.
Could you have another shot at breaking it? You can find it here:

http://marc.info/?l=linux-fsdevel&m=119698206729969&w=2

(It has some prerequisites that are in the git tree, so probably the
easiest thing is just to pull that git tree).

Liam notes that there are still problems with programs that call *stat*(),
but I haven't looked into that issue yet.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."