From: Andy Lutomirski
Subject: Re: [PATCH V3.1] entry: Pass irqentry_state_t by reference
Date: Thu, 17 Dec 2020 07:35:18 -0800
Message-Id: <24F5DC49-1FB3-42CF-8323-B0B39D936F7F@amacapital.net>
References: <20201217131924.GW3040@hirez.programming.kicks-ass.net>
In-Reply-To: <20201217131924.GW3040@hirez.programming.kicks-ass.net>
To: Peter Zijlstra
Cc: Thomas Gleixner, Andy Lutomirski, Weiny Ira, Ingo Molnar,
    Borislav Petkov, Dave Hansen, X86 ML, LKML, Andrew Morton, Fenghua Yu,
    "open list:DOCUMENTATION", linux-nvdimm, Linux-MM,
    "open list:KERNEL SELFTEST FRAMEWORK", Dan Williams, Greg KH
X-Mailing-List: linux-kernel@vger.kernel.org

> On Dec 17, 2020, at 5:19 AM, Peter Zijlstra wrote:
>
> On Thu, Dec 17, 2020 at 02:07:01PM
> +0100, Thomas Gleixner wrote:
>>> On Fri, Dec 11 2020 at 14:14, Andy Lutomirski wrote:
>>>> On Mon, Nov 23, 2020 at 10:10 PM wrote:
>>> After contemplating this for a bit, I think this isn't really the
>>> right approach. It *works*, but we've mostly just created a bit of an
>>> unfortunate situation. Our stack, on a (possibly nested) entry looks
>>> like:
>>>
>>> previous frame (or empty if we came from usermode)
>>> ---
>>> SS
>>> RSP
>>> FLAGS
>>> CS
>>> RIP
>>> rest of pt_regs
>>>
>>> C frame
>>>
>>> irqentry_state_t (maybe -- the compiler is within its rights to play
>>> almost arbitrary games here)
>>>
>>> more C stuff
>>>
>>> So what we've accomplished is having two distinct arch register
>>> regions, one called pt_regs and the other stuck in irqentry_state_t.
>>> This is annoying because it means that, if we want to access this
>>> thing without passing a pointer around or access it at all from outer
>>> frames, we need to do something terrible with the unwinder, and we
>>> don't want to go there.
>>>
>>> So I propose a somewhat different solution: lay out the stack like this.
>>>
>>> SS
>>> RSP
>>> FLAGS
>>> CS
>>> RIP
>>> rest of pt_regs
>>> PKS
>>> ^^^^^^^^ extended_pt_regs points here
>>>
>>> C frame
>>> more C stuff
>>> ...
>>>
>>> IOW we have:
>>>
>>> struct extended_pt_regs {
>>>     bool rcu_whatever;
>>>     other generic fields here;
>>>     struct arch_extended_pt_regs arch_regs;
>>>     struct pt_regs regs;
>>> };
>>>
>>> and arch_extended_pt_regs has unsigned long pks;
>>>
>>> and instead of passing a pointer to irqentry_state_t to the generic
>>> entry/exit code, we just pass a pt_regs pointer.
>>
>> While I agree vs. PKS which is architecture specific state and needed in
>> other places e.g. #PF, I'm not convinced that sticking the existing
>> state into the same area buys us anything more than an indirect access.
>>
>> Peter?
>
> Agreed; that immediately solves the confusion Ira had as well.
> While extending pt_regs sounds scary, I think we've isolated our pt_regs
> implementation from actual ABI pretty well, but of course, that would
> need an audit. We don't want to leak this into signals for example.
>

I'm okay with this.

My suggestion for having an extended pt_regs that contains pt_regs is to
keep extensions like this invisible to unsuspecting parts of the kernel.
In particular, BPF seems to pass around struct pt_regs *, and I don't know
what the implications of effectively offsetting all the registers relative
to the pointer would be.

Anything that actually broke the signal regs ABI should be noticed by the
x86 selftests; the tests read and write registers through ucontext.
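
For illustration only, here is a minimal userspace sketch of the layout proposed in the quoted text: the extension fields sit just below an unmodified struct pt_regs, callers keep passing a plain pt_regs pointer, and code that needs PKS steps back to the enclosing structure. It is not kernel code. The struct pt_regs here is a cut-down stand-in, and the helpers to_extended_regs() and saved_pks() are hypothetical names invented for this sketch (in the kernel, container_of() would do the pointer arithmetic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for the x86 struct pt_regs (assumption: only the
 * hardware-pushed tail is mocked; the real struct has more fields). */
struct pt_regs {
    unsigned long ip;
    unsigned long cs;
    unsigned long flags;
    unsigned long sp;
    unsigned long ss;
};

/* Arch-specific extension, as named in the thread. */
struct arch_extended_pt_regs {
    unsigned long pks;
};

/* Generic wrapper from the thread. regs comes last, so the extension
 * fields sit at lower addresses, directly below pt_regs on the stack. */
struct extended_pt_regs {
    bool rcu_whatever;                      /* placeholder name from the thread */
    struct arch_extended_pt_regs arch_regs;
    struct pt_regs regs;
};

/* The layout above relies on the extension preceding pt_regs. */
static_assert(offsetof(struct extended_pt_regs, arch_regs) <
              offsetof(struct extended_pt_regs, regs),
              "extension must sit below pt_regs");

/* Hypothetical helper: recover the wrapper from a plain pt_regs pointer. */
static struct extended_pt_regs *to_extended_regs(struct pt_regs *regs)
{
    return (struct extended_pt_regs *)
        ((char *)regs - offsetof(struct extended_pt_regs, regs));
}

/* Hypothetical consumer that is only ever handed a pt_regs pointer. */
static unsigned long saved_pks(struct pt_regs *regs)
{
    return to_extended_regs(regs)->arch_regs.pks;
}
```

Code that only knows about struct pt_regs (the BPF concern above) sees the same pointer and the same field offsets as before; only code that calls something like saved_pks() depends on the extension being there. That holds only if every pt_regs pointer reaching such code really is embedded in an extended_pt_regs, which is the part that would need the audit mentioned in the thread.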