Date: Mon, 21 Jun 2021 23:05:23 +0000
From: Al Viro
To: "Eric W.
 Biederman"
Cc: Linus Torvalds, Michael Schmitz, linux-arch, Jens Axboe,
    Oleg Nesterov, Linux Kernel Mailing List, Richard Henderson,
    Ivan Kokshaysky, Matt Turner, alpha, Geert Uytterhoeven, linux-m68k,
    Arnd Bergmann, Ley Foon Tan, Tejun Heo, Kees Cook
Subject: Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads
In-Reply-To: <87czsfi2kv.fsf@disp2133>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 21, 2021 at 11:50:56AM -0500, Eric W. Biederman wrote:
> Al Viro writes:
>
> > On Mon, Jun 21, 2021 at 01:54:56PM +0000, Al Viro wrote:
> >> On Tue, Jun 15, 2021 at 02:58:12PM -0700, Linus Torvalds wrote:
> >>
> >> > And I think our horrible "kernel threads return to user space when
> >> > done" is absolutely horrifically nasty. Maybe of the clever sort, but
> >> > mostly of the historical horror sort.
> >>
> >> How would you prefer to handle that, then? Separate magical path from
> >> kernel_execve() to switch to userland? We used to have something of
> >> that sort, and that had been a real horror...
> >>
> >> As it is, it's "kernel thread is spawned at the point similar to
> >> ret_from_fork(), runs the payload (which almost never returns) and
> >> then proceeds out to userland, same way fork(2) would've done."
> >> That way kernel_execve() doesn't have to do anything magical.
> >>
> >> Al, digging through the old notes and current call graph...
> >
> > FWIW, the major assumption back then had been that get_signal(),
> > signal_delivered() and all associated machinery (including coredumps)
> > runs *only* from SIGPENDING/NOTIFY_SIGNAL handling.
> >
> > And "has complete registers on stack" is only a part of that;
> > there was other fun stuff in the area ;-/ Do we want coredumps for
> > those, and if we do, will the de_thread stuff work there?
>
> Do we want coredumps from processes that use io_uring? yes
> Exactly what we want from io_uring threads is less clear. We can't
> really give much that is meaningful beyond the thread ids of the
> io_uring threads.
>
> What problems do are you seeing beyond the missing registers on the
> stack for kernel threads?
>
> I don't immediately see the connection between coredumps and de_thread.
>
> The function de_thread arranges for the fatal_signal_pending to be true,
> and that should work just fine for io_uring threads. The io_uring
> threads process the fatal_signal with get_signal and then proceed to
> exit eventually calling do_exit.

I would like to see the testing in cases when the io_uring thread is
the one getting hit by the initial signal and when it's the normal one
with associated io_uring ones. The thread-collecting logic at least
used to depend upon fairly subtle assumptions, and "kernel threads
obviously can't show up as candidates" used to narrow the analysis
down...

In any case, WTF would we allow reads or writes to *any* registers of
such threads? It's not as simple as "just return zeroes", BTW - the
values allowed in special registers might have non-trivial constraints
on them. The same goes for coredump - we don't _have_ registers to
dump for those, period.

Looks like the first things to do would be
    * prohibit ptrace accessing any regsets of worker threads
    * make coredump skip all register notes for those

Note, BTW, that kernel_thread() and kernel_execve() do *NOT* step into
ptrace_notify() - explicit CLONE_UNTRACED for the former and zero
current->ptrace in the caller of the latter. So the fork and exec side
has ptrace_event() crap limited to real syscalls. It's seccomp[1] and
exit-related stuff that are messy...

[1] "never trust somebody who introduces himself as Honest Joe and keeps
carping on that all the time"; cf. __secure_computing(), CONFIG_INTEGRITY,
etc.
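
For illustration only, here is a minimal userspace model of the two "first
things to do" listed above: refuse regset access for worker threads, and
skip their register notes when dumping. It is a sketch, not kernel code
and not a proposed patch. Every name in it (struct model_task,
MODEL_TF_WORKER, model_regset_access_ok(), model_count_register_notes())
is hypothetical and exists only for this example; how the kernel would
actually identify such threads is not settled in this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag for this model only; not a real kernel flag value. */
#define MODEL_TF_WORKER 0x1u	/* in-kernel worker, e.g. an io_uring thread */

/* Hypothetical stand-in for a task; only what the model needs. */
struct model_task {
	int tid;
	unsigned int flags;
};

/*
 * First item: a tracer may not read or write any regset of a worker
 * thread, since such a thread has no user register frame to expose.
 */
static bool model_regset_access_ok(const struct model_task *t)
{
	return (t->flags & MODEL_TF_WORKER) == 0;
}

/*
 * Second item: when walking a thread group for a dump, emit register
 * notes only for threads that pass the same check. Returns how many
 * threads would get register notes.
 */
static size_t model_count_register_notes(const struct model_task *tasks,
					 size_t n)
{
	size_t notes = 0;

	for (size_t i = 0; i < n; i++)
		if (model_regset_access_ok(&tasks[i]))
			notes++;
	return notes;
}
```

Both paths go through one predicate in this model, so the ptrace side and
the coredump side cannot disagree about which threads have registers.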