Date: Sun, 28 Nov 2010 21:51:34 +0100
From: Jan Kratochvil <jan.kratochvil@redhat.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>, roland@redhat.com, linux-kernel@vger.kernel.org,
        torvalds@linux-foundation.org, akpm@linux-foundation.org
Subject: Re: [PATCH 09/14] ptrace: clean transitions between TASK_STOPPED and
 TRACED
Message-ID: <20101128205134.GA16858@host0.dyn.jankratochvil.net>
References: <1290768569-16224-1-git-send-email-tj@kernel.org>
 <1290768569-16224-10-git-send-email-tj@kernel.org>
 <20101128202535.GC12896@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20101128202535.GC12896@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3622
Lines: 95

On Sun, 28 Nov 2010 21:25:35 +0100, Oleg Nesterov wrote:
> On 11/26, Tejun Heo wrote:
> > +	/*
> > +	 * If the task is already STOPPED, set GROUP_STOP_PENDING and
> > +	 * kick it so that it transits to TRACED.  This is safe as
> > +	 * both transitions in and out of STOPPED are protected by
> > +	 * siglock.
> > +	 */
> > +	spin_lock(&task->sighand->siglock);
> > +	if (task_is_stopped(task)) {
> > +		task->group_stop |= GROUP_STOP_PENDING;
> > +		signal_wake_up(task, 1);
> 
> OK. Now we have a window if the tracer attaches to the stopped task.
> 
> Say,
> 
> 	child = fork()
> 
> 	if (!child)
> 		return child_do_something();
> 
> 	kill(child, SIGSTOP);
> 	wait();			// <--- ensures it is stopped
> 
> 	ptrace(PTRACE_ATTACH, child);
> 
> 	assert(ptrace(PTRACE_WHATEVER, child) == 0);
> 
> Currently this code is correct. With this patch the assertion above
> can fail, the child may be running, changing its state from STOPPED
> to TRACED.

GDB now has code (as rewritten by Daniel Jacobowitz):

ptrace (PTRACE_ATTACH) has been done and:
linux_nat_post_attach_wait:
  if (pid_is_stopped (pid)) ### it means `State' is `T (stopped)'
    {
      if (debug_linux_nat)
        fprintf_unfiltered (gdb_stdlog,
                            "LNPAW: Attaching to a stopped process\n");

      /* The process is definitely stopped.  It is in a job control
         stop, unless the kernel predates the TASK_STOPPED /
         TASK_TRACED distinction, in which case it might be in a
         ptrace stop.  Make sure it is in a ptrace stop; from there we
         can kill it, signal it, et cetera.

         First make sure there is a pending SIGSTOP.  Since we are
         already attached, the process can not transition from stopped
         to running without a PTRACE_CONT; so we know this signal will
         go into the queue.  The SIGSTOP generated by PTRACE_ATTACH is
         probably already in the queue (unless this kernel is old
         enough to use TASK_STOPPED for ptrace stops); but since SIGSTOP
         is not an RT signal, it can only be queued once.  */
      kill_lwp (pid, SIGSTOP);

      /* Finally, resume the stopped process.  This will deliver the SIGSTOP
         (or a higher priority signal, just like normal PTRACE_ATTACH).  */
      ptrace (PTRACE_CONT, pid, 0, 0); ### line B
    }

### point A

  /* Make sure the initial process is stopped.  The user-level threads
     layer might want to poke around in the inferior, and that won't
     work if things haven't stabilized yet.  */
  new_pid = my_waitpid (pid, &status, 0); ### this is in fact waitpid()


The problem is this code is already racy.  At `point A' someone may
`kill (tracee, SIGSTOP)', `waitpid (tracee)' -- eats SIGSTOP, and then
my_waitpid() will hang because the <pid_is_stopped (pid)> block was not
executed to prevent it.

So if we are already discussing the ptrace race safety I would prefer some
kernel ptrace API suggestion how to safely race-less attach to a task, with
the task being in any state - unstopped, T (stopped) with pending SIGSTOP and
T (stopped) with already eaten SIGSTOP.

Specifically as a reply to your mail I guess the `line B' maybe could fail, if
the tracee was very freshly `kill (tracee, SIGSTOP)' and `waitpid (tracee)',
and thus the new `kill_lwp (pid, SIGSTOP)' delivery would not get into effect
for the `my_waitpid()' line.


Thanks,
Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/