From: Andy Lutomirski
Date: Thu, 16 Jul 2020 21:48:50 -0700
Subject: Re: [PATCH v4 1/2] kernel: Implement selective syscall userspace redirection
To: Gabriel Krisman Bertazi
Cc: Andy Lutomirski, Thomas Gleixner, LKML, kernel@collabora.com, Matthew Wilcox, Paul Gofman, Kees Cook, "open list:KERNEL SELFTEST FRAMEWORK", Shuah Khan
In-Reply-To: <87wo32j394.fsf@collabora.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 16, 2020 at 7:15 PM Gabriel Krisman Bertazi wrote:
>
> Andy Lutomirski writes:
>
> > On Thu, Jul 16, 2020 at 12:31 PM Gabriel Krisman Bertazi
> > wrote:
> >>
> >
> > This is quite nice. I have a few comments, though:
> >
> > You mentioned rt_sigreturn(). Should this automatically exempt the
> > kernel-provided signal restorer on architectures (e.g. x86_32) that
> > provide one?
>
> That seems reasonable. Not sure how easy it is to do it, though.

For better or for worse, it's currently straightforward because the
code is:

__kernel_sigreturn:
.LSTART_sigreturn:
        popl %eax               /* XXX does this mean it needs unwind info? */
        movl $__NR_sigreturn, %eax
        SYSCALL_ENTER_KERNEL

and SYSCALL_ENTER_KERNEL is hardwired as int $0x80. (The latter is
probably my fault, for better or for worse.)

So this would change to:

__vdso32_sigreturn_syscall:
        SYSCALL_ENTER_KERNEL

and vdso2c would wire up __vdso32_sigreturn_syscall. Then there would
be something like:

bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs);

and that would be that.

Does anyone have an opinion as to whether this is a good idea? Modern
glibc shouldn't be using this mechanism, I think, but I won't swear to
it.

>
> > The amount of syscall entry wiring that arches need to do is IMO
> > already a bit out of hand. Should we instead rename TIF_SECCOMP to
> > TIF_SYSCALL_INTERCEPTION and have one generic callback that handles
> > seccomp and this new thing?
>
> Considering the previous suggestion from Kees to hide it inside the
> tracehook and Thomas rework of this path, I'm not sure what is the best
> solution here, but some rework of these flags is due. Thomas suggested
> expanding these flags to 64 bits and having some arch specific and
> arch-agnostic flags. With the storage expansion and arch-agnostic flags,
> would this still be desirable?

I think it would be desirable to consolidate this to avoid having
multiple arches need to separately wire up all of these mechanisms.
I'm not sure that the initial upstream implementation needs this, but
it might be nice to support this out of the box on all arches with
seccomp support.

>
> >> +int do_syscall_user_dispatch(struct pt_regs *regs)
> >> +{
> >> +        struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> >> +        unsigned long ip = instruction_pointer(regs);
> >> +        char state;
> >> +
> >> +        if (likely(ip >= sd->dispatcher_start && ip <= sd->dispatcher_end))
> >> +                return 0;
> >> +
> >> +        if (likely(sd->selector)) {
> >> +                if (unlikely(__get_user(state, sd->selector)))
> >> +                        do_exit(SIGSEGV);
> >> +
> >> +                if (likely(state == 0))
> >> +                        return 0;
> >> +
> >> +                if (state != 1)
> >> +                        do_exit(SIGSEGV);
> >
> > This seems a bit extreme and hard to debug if it ever happens.
>
> Makes sense, but I don't see a better way to return the error here.
> Maybe a SIGSYS with a different si_errno? Alternatively, we could
> revert to the previous behavior of allowing syscalls on state != 0, that
> existed in v1. What do you think?
>

I don't have a strong opinion. SIGSYS with different si_errno is
probably reasonable.

--Andy
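The arch_syscall_is_vdso_sigreturn() hook proposed for x86_32 can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct model_regs, vdso_base and SIGRETURN_SYSCALL_OFFSET are hypothetical stand-ins for struct pt_regs, the per-process vDSO base and the symbol offset vdso2c would export. The one hardware fact it relies on is that after a trapping int $0x80 (encoded as the two bytes CD 80), the saved instruction pointer points at the following instruction.

```c
#include <stdbool.h>

/*
 * Userspace model of the proposed x86_32 check. struct model_regs,
 * vdso_base and SIGRETURN_SYSCALL_OFFSET are hypothetical stand-ins,
 * not kernel definitions.
 */

/* "int $0x80" is encoded as the two bytes CD 80. */
#define INT80_INSN_LEN 2UL

/* Hypothetical offset of __vdso32_sigreturn_syscall in the vDSO image,
 * standing in for the value vdso2c would export. */
#define SIGRETURN_SYSCALL_OFFSET 0x400UL

/* Stand-in for struct pt_regs: only the saved user ip matters here. */
struct model_regs {
	unsigned long ip;
};

/* Stand-in for the per-process vDSO base; 0 means no vDSO is mapped. */
static unsigned long vdso_base;

/*
 * On syscall entry through int $0x80, the saved ip points just past the
 * trapping instruction, so the syscall came from the sigreturn stub
 * exactly when ip == stub address + instruction length.
 */
static bool arch_syscall_is_vdso_sigreturn(const struct model_regs *regs)
{
	if (!vdso_base)
		return false;
	return regs->ip == vdso_base + SIGRETURN_SYSCALL_OFFSET + INT80_INSN_LEN;
}
```

Because vdso2c would export a single label, the exemption reduces to one comparison on the saved ip; a real version would presumably need the same treatment for the rt_sigreturn stub as well.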
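The suggestion to rename TIF_SECCOMP to TIF_SYSCALL_INTERCEPTION can be sketched as a userspace model in which arch entry code tests one flag and calls one generic callback, instead of wiring up each mechanism separately. Everything here is hypothetical: the bit number, struct model_task, DENIED_NR and the two stub mechanisms. Running user dispatch before seccomp is also an assumption of this sketch, not something settled in the thread.

```c
#include <stdbool.h>

/* Hypothetical flag bit; real TIF_* numbers are arch-specific. */
#define TIF_SYSCALL_INTERCEPTION 8
#define _TIF_SYSCALL_INTERCEPTION (1UL << TIF_SYSCALL_INTERCEPTION)

/* Hypothetical syscall number that the stub seccomp filter rejects. */
#define DENIED_NR 1000L

struct model_task {
	unsigned long ti_flags;
	bool seccomp_enabled;       /* stand-in for an active seccomp mode */
	bool user_dispatch_enabled; /* stand-in for syscall user dispatch */
};

/* Stub: with dispatch on, every syscall is redirected in this model. */
static bool model_user_dispatch(struct model_task *t, long nr)
{
	(void)t;
	(void)nr;
	return true;
}

/* Stub: a filter that rejects only DENIED_NR. */
static bool model_seccomp(struct model_task *t, long nr)
{
	(void)t;
	return nr == DENIED_NR;
}

/* Keep the single flag in sync with the set of active mechanisms. */
static void update_intercept_flag(struct model_task *t)
{
	if (t->seccomp_enabled || t->user_dispatch_enabled)
		t->ti_flags |= _TIF_SYSCALL_INTERCEPTION;
	else
		t->ti_flags &= ~_TIF_SYSCALL_INTERCEPTION;
}

/*
 * The one generic callback arch entry code would call.
 * Returns true if the syscall should be skipped.
 */
static bool syscall_intercept(struct model_task *t, long nr)
{
	if (!(t->ti_flags & _TIF_SYSCALL_INTERCEPTION))
		return false; /* fast path: nothing to do */
	if (t->user_dispatch_enabled && model_user_dispatch(t, nr))
		return true;
	if (t->seccomp_enabled && model_seccomp(t, nr))
		return true;
	return false;
}
```

The point of the design is that an architecture only has to test one bit and make one call. New interception mechanisms then land in generic code rather than in every arch's entry path.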
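One way to act on the "SIGSYS with a different si_errno" option for do_syscall_user_dispatch() is to have the selector check classify the syscall and return the outcome, rather than calling do_exit(SIGSEGV) itself. The caller can then raise SIGSYS with a distinguishing si_errno for a bad selector value. This is a userspace model under stated assumptions: the enum, struct model_dispatch and classify_syscall() are hypothetical. The selector is read with a plain load instead of __get_user(), so the fault path is not modelled. Treating a NULL selector as "always dispatch" is an assumption about the part of the patch not quoted here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes the entry code would act on. */
enum dispatch_action {
	DISPATCH_ALLOW,        /* run the syscall normally */
	DISPATCH_SIGSYS,       /* redirect: deliver SIGSYS to the handler */
	DISPATCH_BAD_SELECTOR, /* SIGSYS with a distinct si_errno */
};

/* Selector values as in the quoted patch: 0 allows, 1 redirects. */
#define SELECTOR_ALLOW 0
#define SELECTOR_BLOCK 1

struct model_dispatch {
	unsigned long dispatcher_start; /* inclusive, as in the patch */
	unsigned long dispatcher_end;   /* inclusive, as in the patch */
	const char *selector;           /* NULL: no selector configured */
};

static enum dispatch_action classify_syscall(const struct model_dispatch *sd,
					     unsigned long ip)
{
	/* Syscalls issued from the dispatcher region always run. */
	if (ip >= sd->dispatcher_start && ip <= sd->dispatcher_end)
		return DISPATCH_ALLOW;

	if (sd->selector) {
		/* Kernel code would use __get_user() and handle a fault. */
		char state = *sd->selector;

		if (state == SELECTOR_ALLOW)
			return DISPATCH_ALLOW;
		if (state != SELECTOR_BLOCK)
			return DISPATCH_BAD_SELECTOR; /* instead of do_exit(SIGSEGV) */
	}
	return DISPATCH_SIGSYS;
}
```

Returning an action keeps the policy decision in one place. It also gives the buggy-selector case its own observable signal instead of an unexplained SIGSEGV exit; which si_errno value to use is left open here.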