Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp797476pxf; Thu, 25 Mar 2021 14:22:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzkmolmCN7duUn7AzZCXUmjxp+ete9C4UnREYg5Zd4YvJsLwefd+odOUpz05jNFEM/LMZ1q X-Received: by 2002:a17:906:5acd:: with SMTP id x13mr11676958ejs.211.1616707349160; Thu, 25 Mar 2021 14:22:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616707349; cv=none; d=google.com; s=arc-20160816; b=VB9y9Osb5ydr5mRbOkXT8qM+lBGYBGrwJgWPU5QFXXh0xqBsezWSTIFAlZi974/QE+ Yps4A4LKHJMcnvLGJY2ziChVSD8JBZHEUYzYB9h3NDjlUGF6xtuKfUDa/21fpRtvEg21 Ah6JBjdbNaSHdQQDGZaa5TnXJXeeKCO4HYvuvF4OhBISZ49SQgM3UftfbZQwDJu3Ek2N LorreVJ8YGFflIFR+6lXiF/eOIg5QwtOXR2HwQ0RwUMi98cAdh1Z5iN8qSgfo3BauVZD wtVfp9dKXJ1DAWT2BiP7a31exLfaFixoet7ONiBCtwS7NPgpCgNfxIba3W+8pFpXNNI6 Z33A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:subject:from :references:cc:to:dkim-signature; bh=OQZA0jbq1xbHQCEY8TUlGEMjzZL7qZb9xcn3Kc+aiKs=; b=vO8cSQ8sjTSpCcIFaxeJvdRt28TRDU72YI/OmsDr34bKdvVbaAzknHp+jS7GpldscJ 7Rev3ARrgOQSxOr2EGP86u3ee3fPOmTfud/yLQgv06uPMSkktulH/YrkzBku7hGx4bDy 4VxXdh9zy7FTxsyRek5TqgC7aHybCfy40wIN1uR3PXCVRH3OYTXP03rWDYNhglWrhYGy oj3Felz3UYR+M9I7iSsX8+1t2KnJ7fAjzxhZ9iGNluw1mLKVLEGbga0wkkpk0XqKPFKt Qz1TF/VUuHsgbdYcUuAl6E7Bqk+VGvvJ/+nOs6XJCWOgU7osLOEWl2GO2WxLH310NXiK +4hQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@samba.org header.s=42 header.b=IEcX+TJ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=samba.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id le26si5130964ejc.659.2021.03.25.14.22.06; Thu, 25 Mar 2021 14:22:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@samba.org header.s=42 header.b=IEcX+TJ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=samba.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230366AbhCYVU7 (ORCPT + 99 others); Thu, 25 Mar 2021 17:20:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229581AbhCYVUc (ORCPT ); Thu, 25 Mar 2021 17:20:32 -0400 Received: from hr2.samba.org (hr2.samba.org [IPv6:2a01:4f8:192:486::2:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CD5DC06174A; Thu, 25 Mar 2021 14:20:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=samba.org; s=42; h=Date:Message-ID:From:Cc:To; bh=OQZA0jbq1xbHQCEY8TUlGEMjzZL7qZb9xcn3Kc+aiKs=; b=IEcX+TJ4DVViDj3tjQpKZypFLF Ab9NpeluhoG74Xlf6pso8kRjaglOxb+cT+GOp/qHs+Rb1tmFTFvIGhs82LdWAOtepfPKtRAM2LXV5 QDkDeN467HKsZV2jzhXpRttGSNLIXbwFZbaOAb1W5jJaGfYRyKguLIfD8RNXS9PidrHSVoDYbIIRu Kcq1Gakgi9miP7Cbd4u6g+8UktT7NdrPQzxwXPagRs050JUc7rwYw3xuZsx1kLDZnFxWYqEPEiuPt j6JdmFKX2OnVDgoxckR4hvZwnEULi40WSgRw0VWxMFYoqF1xauZjpeeUdfjIXRnfXj6azY0f78/vk ejxawll8SChMOpgrj67rZcy0YX86RCYs3UG7/25BdtdlWjM8s7CgSS/B67QWXOlNpU5zO+4YOtVZ6 heJ/pV5hfQMlSwsx7kwmZOnhUEcAPNmQq8G+nban+uofjRSsWZKabfAioSIN0kuOQwi84k3v+g5EU J8SvhJjffz3lDnIFhSthA0MU; Received: from [127.0.0.2] (localhost [127.0.0.1]) by hr2.samba.org with esmtpsa (TLS1.3:ECDHE_RSA_CHACHA20_POLY1305:256) (Exim) id 1lPXOx-0003up-Qv; Thu, 25 Mar 2021 21:20:23 +0000 To: "Eric W. Biederman" , Oleg Nesterov Cc: Linus Torvalds , Jens Axboe , io-uring , Linux Kernel Mailing List References: <20210325164343.807498-1-axboe@kernel.dk> <20210325204430.GE28349@redhat.com> From: Stefan Metzmacher Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc//task/ Message-ID: Date: Thu, 25 Mar 2021 22:20:21 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Am 25.03.21 um 21:55 schrieb Eric W. Biederman: > Oleg Nesterov writes: > >> On 03/25, Linus Torvalds wrote: >>> >>> The whole "signals are very special for IO threads" thing has caused >>> so many problems, that maybe the solution is simply to _not_ make them >>> special? >> >> Or may be IO threads should not abuse CLONE_THREAD? >> >> Why does create_io_thread() abuse CLONE_THREAD ? >> >> One reason (I think) is that this implies SIGKILL when the process exits/execs, >> anything else? > > A lot. > > The io workers perform work on behave of the ordinary userspace threads. > Some of that work is opening files. For things like rlimits to work > properly you need to share the signal_struct. But odds are if you find > anything in signal_struct (not counting signals) there will be an > io_uring code path that can exercise it as io_uring can traverse the > filesystem, open files and read/write files. So io_uring can exercise > all of proc. > > Using create_io_thread with CLONE_THREAD is the least problematic way > (including all of the signal and ptrace problems we are looking at right > now) to implement the io worker threads. > > They _really_ are threads of the process that just never execute any > code in userspace. So they should look like a userspace thread sitting in something like epoll_pwait() with all signals blocked, which will never return to userspace again? I think that would be useful, but I also think that userspace should see: - /proc/$tidofiothread/cmdline as empty (in order to let ps and top use [iou-wrk-$tidofuserspacethread]) - /proc/$tidofiothread/exe as symlink to that not exists - all of /proc/$tidofiothread/ shows root.root as owner and group and things which still allow write access to /proc/$tidofiothread/comm similar things with rw permissions should still disallow modifications: For the other kernel threads e.g. "[cryptd]" I see the following: LANG=C ls -l /proc/653 | grep rw ls: cannot read symbolic link '/proc/653/exe': No such file or directory -rw-r--r-- 1 root root 0 Mar 25 22:09 autogroup -rw-r--r-- 1 root root 0 Mar 25 22:09 comm -rw-r--r-- 1 root root 0 Mar 25 22:09 coredump_filter lrwxrwxrwx 1 root root 0 Mar 25 22:09 cwd -> / lrwxrwxrwx 1 root root 0 Mar 25 22:09 exe -rw-r--r-- 1 root root 0 Mar 25 22:09 gid_map -rw-r--r-- 1 root root 0 Mar 25 22:09 loginuid -rw------- 1 root root 0 Mar 25 22:09 mem -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_adj -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_score_adj -rw-r--r-- 1 root root 0 Mar 25 22:09 projid_map lrwxrwxrwx 1 root root 0 Mar 25 22:09 root -> / -rw-r--r-- 1 root root 0 Mar 25 22:09 sched -rw-r--r-- 1 root root 0 Mar 25 22:09 setgroups -rw-r--r-- 1 root root 0 Mar 25 22:09 timens_offsets -rw-rw-rw- 1 root root 0 Mar 25 22:09 timerslack_ns -rw-r--r-- 1 root root 0 Mar 25 22:09 uid_map And this: LANG=C echo "bla" > /proc/653/comm -bash: echo: write error: Invalid argument LANG=C echo "bla" > /proc/653/gid_map -bash: echo: write error: Operation not permitted Can't we do the same for iothreads regarding /proc? Just make things read only there and empty "cmdline"/"exe"? Maybe I'm too naive, but that what I'd assume as a userspace developer/admin. Does at least parts of it make any sense? metze