Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp3649665ybf; Tue, 3 Mar 2020 09:46:48 -0800 (PST) X-Google-Smtp-Source: ADFU+vu6C7uFeB6JYt707ceKGEyN8AcPzkQzfr/jlubGNqNswLdsb41Is7YQq5sm1o4NKDp2KPOJ X-Received: by 2002:aca:f10:: with SMTP id 16mr3125855oip.117.1583257608397; Tue, 03 Mar 2020 09:46:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583257608; cv=none; d=google.com; s=arc-20160816; b=soRciiXrJUlxLSyTFNyIXF8s6tjiaAQOaOSARaXDz+8DVsin+kBWdfYKSm6Cpb16WD XOoePToXe3FBP09TyCfsJiBOCo2WnxZ6b/2kjwPLdTvblT/V+2zP5WNkh7V2Wq7ksPHm k4Nndj68H1c9iKTp0TpbIYVSFLUQz8kXwPP50CbiqTMHAGsIL9Tk+6QrLSj9mqBKmFDj OOaxDnmb01aeDwCGQk9jQSReyml/dmD14eKP7F4ynDkDXGRDAy4XamOyqCHcsr1Ep7Xq 18y+QlLfIfkcSRuafH7wvIrGqHnycNnoZM5BxjPwPrcvaqULwSYTO/TlJPIzj2Ip7Gr4 OZAg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=y/S6CJmUg34p8vkA5DhXrlcDT+4joRR13Yywgw9wR4U=; b=NcGnd40P979RpToRP2SuJ5LWAo0T7ERMwHVw7M5+YDwf1Ar2kSNf7+fPqy4LcfId2y EJId127VImNTLoXh6AEp7Bds/HY0+ehl7y+ZwFGxDWPrnGj7FqC8X25NrrSCCbHimDgG WwzkUY7m9W3Fjx/Fx5vKi8z654Se96AFKxJOx/UvgWPbLUjqCzZBkj9Cl2FhrDSQFiez vAAqT3zFQJyyigkHfl9ZAjljMlTTrDK9YtVvNYrYmjkdldMEoIV440AsQGCL1vRgGI13 zuLkSlvsn94ZHl8/PEkOWVTI2vrDd61Hc4/nX16E/nhzXgwqqN2jWU8q9imaQHJQhU5O /FLg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x10si7785617oic.136.2020.03.03.09.46.35; Tue, 03 Mar 2020 09:46:48 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730055AbgCCRV3 (ORCPT + 99 others); Tue, 3 Mar 2020 12:21:29 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:50095 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729786AbgCCRV2 (ORCPT ); Tue, 3 Mar 2020 12:21:28 -0500 Received: from ip5f5bf7ec.dynamic.kabel-deutschland.de ([95.91.247.236] helo=wittgenstein) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1j9BDx-0006cR-KJ; Tue, 03 Mar 2020 17:20:53 +0000 Date: Tue, 3 Mar 2020 18:20:52 +0100 From: Christian Brauner To: Bernd Edlinger Cc: "Eric W. Biederman" , Kees Cook , Jann Horn , Jonathan Corbet , Alexander Viro , Andrew Morton , Alexey Dobriyan , Thomas Gleixner , Oleg Nesterov , Frederic Weisbecker , Andrei Vagin , Ingo Molnar , "Peter Zijlstra (Intel)" , Yuyang Du , David Hildenbrand , Sebastian Andrzej Siewior , Anshuman Khandual , David Howells , James Morris , Greg Kroah-Hartman , Shakeel Butt , Jason Gunthorpe , Christian Kellner , Andrea Arcangeli , Aleksa Sarai , "Dmitry V. Levin" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-fsdevel@vger.kernel.org" , "linux-mm@kvack.org" , "stable@vger.kernel.org" , "linux-api@vger.kernel.org" Subject: Re: [PATCHv5] exec: Fix a deadlock in ptrace Message-ID: <20200303172052.fvqk4r7vbtxwo3ig@wittgenstein> References: <875zfmloir.fsf@x220.int.ebiederm.org> <87v9nmjulm.fsf@x220.int.ebiederm.org> <202003021531.C77EF10@keescook> <20200303085802.eqn6jbhwxtmz4j2x@wittgenstein> <87v9nlii0b.fsf@x220.int.ebiederm.org> <20200303170109.y6q2acgydyzuh3mp@wittgenstein> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200303170109.y6q2acgydyzuh3mp@wittgenstein> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 03, 2020 at 06:01:11PM +0100, Christian Brauner wrote: > On Tue, Mar 03, 2020 at 04:48:01PM +0000, Bernd Edlinger wrote: > > On 3/3/20 4:18 PM, Eric W. Biederman wrote: > > > Bernd Edlinger writes: > > > > > >> This fixes a deadlock in the tracer when tracing a multi-threaded > > >> application that calls execve while more than one thread are running. > > >> > > >> I observed that when running strace on the gcc test suite, it always > > >> blocks after a while, when expect calls execve, because other threads > > >> have to be terminated. They send ptrace events, but the strace is no > > >> longer able to respond, since it is blocked in vm_access. > > >> > > >> The deadlock is always happening when strace needs to access the > > >> tracees process mmap, while another thread in the tracee starts to > > >> execve a child process, but that cannot continue until the > > >> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received: > > > > > > A couple of things. > > > > > > Why do we think it is safe to change the behavior exposed to userspace? > > > Not the deadlock but all of the times the current code would not > > > deadlock? > > > > > > Especially given that this is a small window it might be hard for people > > > to track down and report so we need a strong argument that this won't > > > break existing userspace before we just change things. > > > > > > > Hmm, I tend to agree. > > > > > Usually surveying all of the users of a system call that we can find > > > and checking to see if they might be affected by the change in behavior > > > is difficult enough that we usually opt for not being lazy and > > > preserving the behavior. > > > > > > This patch is up to two changes in behavior now, that could potentially > > > affect a whole array of programs. Adding linux-api so that this change > > > in behavior can be documented if/when this change goes through. > > > > > > > One is PTRACE_ACCESS possibly returning EAGAIN, yes. > > > > We could try to restrict that behavior change to when any > > thread is ptraced when execve starts, can't be too complicated. > > > > > > But the other is only SYS_seccomp returning EAGAIN, when a different > > thread of the current process is calling execve at the same time. > > > > I would consider it completely impossible to have any user-visual effect, > > since de_thread is just terminating all threads, including the thread > > where the -EAGAIN was returned, so we will never know what happened. > > I think if we risk a user-space facing change we should try the simple > thing first before making the fix more convoluted? But it's a tough > call... Actually, to get a _rough_ estimate of the possible impact I would recommend you run the criu test suite (and possible the strace test-suite) on a kernel with and without your fix. That's what I tend to do when I touch code I fear will have impact on APIs that very deeply touch core kernel. Criu's test-suite makes heavy use of ptrace and usually runs into a bunch of interesting (exec) races too, and does have tests for handling zombies processes etc. pp. Should be relatively simple: create a vm and then criu build-dependencies, git clone criu; cd criu; make; cd test; ./zdtm.py run -a --keep-going If your system doesn't support Selinux properly, you need to disable it when running the tests and you also need to make sure that you're using python3 or change the shebang in zdtm.py to python3. Just a recommendation. Christian