Date: Mon, 9 May 2022 12:00:58 +0200
From: Christian Brauner
To: Arnd Bergmann
Cc: Huacai Chen, Huacai Chen, Andy Lutomirski, Thomas Gleixner,
    Peter Zijlstra, Andrew Morton, David Airlie, Jonathan Corbet,
    Linus Torvalds, linux-arch, "open list:DOCUMENTATION",
    Linux Kernel Mailing List, Xuefeng Li, Yanteng Si, Guo Ren,
    Xuerui Wang, Jiaxun Yang, Linux API
Subject: Re: [PATCH V9 13/24] LoongArch: Add system call support
Message-ID: <20220509100058.vmrgn5fkk3ayt63v@wittgenstein>
In-Reply-To: <20220507121104.7soocpgoqkvwv3gc@wittgenstein>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, May 07, 2022 at 02:11:04PM +0200, Christian Brauner wrote:
> On Sat, Apr 30, 2022 at 12:34:52PM +0200, Arnd Bergmann wrote:
> > On Sat, Apr 30, 2022 at 12:05 PM Huacai Chen wrote:
> > > On Sat, Apr 30, 2022 at 5:45 PM Arnd Bergmann wrote:
> > > > On Sat, Apr 30, 2022 at 11:05 AM Huacai Chen wrote:
> > > > >
> > > > > This patch adds system call support and related uaccess.h for
> > > > > LoongArch.
> > > > >
> > > > > Q: Why keep __ARCH_WANT_NEW_STAT definition while there is statx:
> > > > > A: Until the latest glibc release (2.34), statx is only used for 32-bit
> > > > > platforms, or 64-bit platforms with 32-bit timestamp. I.e., Most
> > > > > 64-bit platforms still use newstat now.
> > > > >
> > > > > Q: Why keep _ARCH_WANT_SYS_CLONE definition while there is clone3:
> > > > > A: The latest glibc release (2.34) has some basic support for clone3 but
> > > > > it isn't complete. E.g., pthread_create() and spawni() have converted
> > > > > to use clone3 but fork() will still use clone. Moreover, some seccomp
> > > > > related applications can still not work perfectly with clone3. E.g.,
> > > > > Chromium sandbox cannot work at all and there is no solution for it,
> > > > > which is more terrible than the fork() story [1].
> > > > >
> > > > > [1] https://chromium-review.googlesource.com/c/chromium/src/+/2936184
> > > >
> > > > I still think these have to be removed. There is no mainline glibc or musl
> > > > port yet, and neither of them should actually be required. Please remove
> > > > them here, and modify your libc patches accordingly when you send those
> > > > upstream.
> > >
> > > If this is just a problem that can be resolved by upgrading
> > > glibc/musl, I will remove them. But the Chromium problem (or sandbox
> > > problem in general) seems to have no solution now.
> >
> > I added Christian Brauner to Cc now, maybe he has come across the
> > sandbox problem before and has an idea for a solution.
>
> (I just got back from LSFMM so I'll reply in more detail next week. I'm
> still pretty jet-lagged.)

Right, I forgot about the EPERM/ENOSYS sandbox thread.

Kees and I gave a talk about this problem at LPC 2019 (see [2]). The
proposed solution back then was to add basic deep argument inspection
for first-level pointers to seccomp.
There are problems with this approach, such as not being usable on
second-level pointers (although we concluded that's ok). Also, if the
input args are very large, copying stuff from within seccomp becomes
rather costly. In general the various approaches seemed handwavy at the
time. If seccomp were to be made to support some basic form of eBPF such
that it can still be safely called by unprivileged users then this would
likely be easier to do (famous last words), but given that the stance
has traditionally been to not port seccomp to eBPF it remains a tricky
patch.

Some time after that I talked to Mathieu Desnoyers about this issue, who
took another angle of attack. The idea seems less complicated to me.
Instead of argument inspection we introduce basic syscall argument
checksumming for seccomp. It would only be done when seccomp is
interested in syscall input args, and checksumming would be per syscall
argument. The checksum would be validated within the syscall when it
actually reads the arguments; again, only if seccomp is used. If the
checksums mismatch, an error is returned or the calling process is
terminated.

There's one case that deserves mentioning: since we introduced the
seccomp notifier we do allow advanced syscall interception and we do use
it extensively in various projects. Roughly, it works by allowing a
userspace process (the "supervisor") to listen on a seccomp fd. The
seccomp fd is an fd referring to the filter of a target task (the
"supervisee"). When the supervisee performs a syscall listed in the
seccomp notify filter the supervisor will receive a notification on the
seccomp fd for the filter.

I mention this because it is possible for the supervisor to e.g.
intercept a bpf() system call, modify/create/attach a bpf program for
the supervisee, and then update fields in the supervisee's bpf struct
that was passed to the bpf() syscall by it. So the supervisor might
rewrite syscall args and continue the syscall. (In general, it's not
recommended because of TOCTOU. But it is still doable in certain
scenarios where we can guarantee that this is safe even if syscall args
are rewritten to something else by a MITM attack.)

Arguably, the checksumming approach could even be made to work with this
if the seccomp fd learns a new ioctl() or similar to safely update the
checksum.

I can try and move a poc for this up the todo list.

Without an approach like this certain sandboxes will fall back to
ENOSYSing system calls they can't filter. This is a generic problem
though, with clone3() being one prominent example.

[2]: https://www.youtube.com/watch?v=PnOSPsRzVYM&list=PLVsQ_xZBEyN2Ol7y8axxhbTsG47Va3Se2
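
To make the argument checksumming idea a bit more concrete, here is a
userspace toy model in Python. It is only a sketch under assumptions,
not kernel code and not an existing seccomp interface: the function
names (pack_clone_args, filter_check, syscall_read_args) are
hypothetical, CRC32 is picked purely for convenience, only the first
five u64 fields of struct clone_args are modelled, and the whole struct
behind one pointer argument is checksummed in one go. The filter step
inspects the pointed-to args and records a checksum; the syscall step
re-reads the args and refuses to continue if they changed in between.

```python
import struct
import zlib

# Assumption: model only the first five u64 fields of struct clone_args
# (flags, pidfd, child_tid, parent_tid, exit_signal).
CLONE_ARGS_FMT = "<5Q"
CLONE_NEWUSER = 0x10000000


def pack_clone_args(flags, pidfd=0, child_tid=0, parent_tid=0, exit_signal=17):
    """Return a mutable buffer standing in for the caller's memory."""
    return bytearray(
        struct.pack(CLONE_ARGS_FMT, flags, pidfd, child_tid, parent_tid, exit_signal)
    )


def filter_check(user_mem):
    """Hypothetical filter step: inspect the args and checksum them.

    Returns (allowed, checksum). The policy here (deny CLONE_NEWUSER)
    is just an example.
    """
    snapshot = bytes(user_mem)
    flags = struct.unpack_from("<Q", snapshot, 0)[0]
    allowed = not (flags & CLONE_NEWUSER)
    return allowed, zlib.crc32(snapshot)


def syscall_read_args(user_mem, expected_checksum):
    """Hypothetical syscall step: re-read the args and validate them.

    Raises PermissionError if the memory changed after filter_check().
    """
    snapshot = bytes(user_mem)
    if zlib.crc32(snapshot) != expected_checksum:
        raise PermissionError("syscall argument checksum mismatch")
    return struct.unpack(CLONE_ARGS_FMT, snapshot)
```

If another thread rewrites the flags between filter_check() and
syscall_read_args(), e.g. via struct.pack_into("<Q", mem, 0,
CLONE_NEWUSER), the second step raises instead of acting on args the
filter never saw. A supervisor that legitimately rewrites args would
need some way to refresh the recorded checksum, which is where the new
ioctl() mentioned above would come in.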