From: Andy Lutomirski
Subject: Re: extending ucontext (Re: [PATCH v26 25/30] x86/cet/shstk: Handle signals for shadow stack)
Date: Mon, 3 May 2021 08:29:11 -0700
Message-Id: <2D8926E4-F1B6-433A-96EA-995A66F3F42D@amacapital.net>
References: <782ffe96-b830-d13b-db80-5b60f41ccdbf@intel.com>
Cc: Andy Lutomirski, linux-arch, X86 ML, "H. Peter Anvin", Thomas Gleixner,
 Ingo Molnar, LKML, "open list:DOCUMENTATION", Linux-MM, Linux API,
 Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H.J. Lu", Jann Horn, Jonathan Corbet,
 Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek,
 Peter Zijlstra, Randy Dunlap, "Ravi V. Shankar", Vedvyas Shanbhogue,
 Dave Martin, Weijiang Yang, Pengfei Xu, Haitao Huang
In-Reply-To: <782ffe96-b830-d13b-db80-5b60f41ccdbf@intel.com>
To: "Yu, Yu-cheng"
X-Mailing-List: linux-kernel@vger.kernel.org

> On May 3, 2021, at 8:14 AM, Yu, Yu-cheng wrote:
>
> On 5/2/2021 4:23 PM, Andy Lutomirski wrote:
>>> On Fri, Apr 30, 2021 at 10:47 AM Andy Lutomirski wrote:
>>>
>>> On Fri, Apr 30, 2021 at 10:00 AM Yu, Yu-cheng wrote:
>>>>
>>>> On 4/28/2021 4:03 PM, Andy Lutomirski wrote:
>>>>> On Tue, Apr 27, 2021 at 1:44 PM Yu-cheng Yu wrote:
>>>>>>
>>>>>> When shadow stack is enabled, a task's shadow stack states must be saved
>>>>>> along with the signal context and later restored in sigreturn. However,
>>>>>> currently there is no systematic facility for extending a signal context.
>>>>>> There is some space left in the ucontext, but changing ucontext is likely
>>>>>> to create compatibility issues and there is not enough space for further
>>>>>> extensions.
>>>>>>
>>>>>> Introduce a signal context extension struct 'sc_ext', which is used to save
>>>>>> shadow stack restore token address. The extension is located above the fpu
>>>>>> states, plus alignment. The struct can be extended (such as the ibt's
>>>>>> wait_endbr status to be introduced later), and sc_ext.total_size field
>>>>>> keeps track of total size.
>>>>>
>>>>> I still don't like this.
>>>>>
>>>>> Here's how the signal layout works, for better or for worse:
>>>>>
>
> [...]
>
>>>>>
>>>>> That's where we are right now upstream. The kernel has a parser for
>>>>> the FPU state that is bugs piled upon bugs and is going to have to be
>>>>> rewritten sometime soon. On top of all this, we have two upcoming
>>>>> features, both of which require different kinds of extensions:
>>>>>
>>>>> 1. AVX-512. (Yeah, you thought this story was over a few years ago,
>>>>> but no.
>>>>> And AMX makes it worse.) To make a long story short, we
>>>>> promised user code many years ago that a signal frame fit in 2048
>>>>> bytes with some room to spare. With AVX-512 this is false. With AMX
>>>>> it's so wrong it's not even funny. The only way out of the mess
>>>>> anyone has come up with involves making the length of the FPU state
>>>>> vary depending on which features are INIT, i.e. making it more compact
>>>>> than "compact" mode is. This has a side effect: it's no longer
>>>>> possible to modify the state in place, because enabling a feature with
>>>>> no space allocated will make the structure bigger, and the stack won't
>>>>> have room. Fortunately, one can relocate the entire FPU state, update
>>>>> the pointer in mcontext, and the kernel will happily follow the
>>>>> pointer. So new code on a new kernel using a super-compact state
>>>>> could expand the state by allocating new memory (on the heap? very
>>>>> awkwardly on the stack?) and changing the pointer. For all we know,
>>>>> some code already fiddles with the pointer. This is great, except
>>>>> that your patch sticks more data at the end of the FPU block that no
>>>>> one is expecting, and your sigreturn code follows that pointer, and
>>>>> will read off into lala land.
>>>>>
>>>>
>>>> Then, what about we don't do that at all. Is it possible from now on we
>>>> don't stick more data at the end, and take the relocating-fpu approach?
>>>>
>>>>> 2. CET. CET wants us to find a few more bytes somewhere, and those
>>>>> bytes logically belong in ucontext, and here we are.
>>>>>
>>>>
>>>> Fortunately, we can spare CET the need of ucontext extension. When the
>>>> kernel handles sigreturn, the user-mode shadow stack pointer is right at
>>>> the restore token. There is no need to put that in ucontext.
>>>
>>> That seems entirely reasonable. This might also avoid needing to
>>> teach CRIU about CET at all.
>> Wait, what's the actual shadow stack token format?
>> And is the token
>> on the new stack or the old stack when sigaltstack is in use? For
>> that matter, is there any support for an alternate shadow stack for
>> signals?
>
> The restore token is a pointer pointing directly above itself and bit[0] indicates 64-bit mode.
>
> Because the shadow stack stores only return addresses, there is no alternate shadow stack. However, the application can allocate and switch to a new shadow stack.

I think we should make the ABI support an alternate shadow stack even if we don’t implement it initially. After all, some day someone might want to register a handler for shadow stack overflow.

>
> Yu-cheng
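
For readers following the token discussion, here is a rough sketch of the restore token format as described in the quoted reply ("a pointer pointing directly above itself and bit[0] indicates 64-bit mode"). It is not code from the patch series. The function and macro names are hypothetical, the 8-byte slot size is an assumption, and only bit 0 is modelled; any other flag bits are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name, not taken from the patch series. */
#define SHSTK_TOKEN_MODE64 UINT64_C(1)   /* bit 0: 64-bit mode */

/*
 * Compute the value of a restore token stored at token_addr.
 * Assumption: tokens occupy 8-byte-aligned, 8-byte slots, so
 * "directly above itself" is token_addr + 8.
 */
static uint64_t shstk_make_token(uint64_t token_addr, bool mode64)
{
    assert((token_addr & 7) == 0);   /* slot must be 8-byte aligned */
    return (token_addr + 8) | (mode64 ? SHSTK_TOKEN_MODE64 : 0);
}

/*
 * Check that a token read from token_addr points directly above
 * its own slot once the mode bit is masked off.
 */
static bool shstk_token_matches(uint64_t token_addr, uint64_t token)
{
    uint64_t target = token & ~SHSTK_TOKEN_MODE64;
    return target == token_addr + 8;
}
```

Under these assumptions, a 64-bit token stored at 0x1000 would hold 0x1009, and a sigreturn-style check would reject any value at that slot whose masked pointer is not 0x1008.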