Received: by 2002:a25:ca44:0:0:0:0:0 with SMTP id a65csp843401ybg; Tue, 28 Jul 2020 22:10:49 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwIPI1KMDI3+5DmnIC0tmuxcQ5aE16NIrFqT4XlKMuaNpaNfCvUk2lk8oJuLN53j7Hw6yk6 X-Received: by 2002:aa7:dacb:: with SMTP id x11mr27804156eds.280.1595999449706; Tue, 28 Jul 2020 22:10:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1595999449; cv=none; d=google.com; s=arc-20160816; b=QnhTnamayTAsm6TLeEDsXyDfNvRXwiAqt7XDxTdfc/HPDmRB4IEPOl8eWJcV4fxqeE A4cR47Nq2I1t3D2RD2FtfVcXUbYQhHvfoG23n4t9eWag7SWzGZ2zQmqxV9oDcU5f5smI HD3VogOfm8ieHzCE0ECekIYGr8HhSpGHDGukB4hAYuBpfgHgPzvDBL8XZLdj7BPXyMGU q6CwUi8Uk4n+fvgJAOlcJ6ifPWDh92+ydi2T7hNgz46ZSMtom3PUJR81ihTK32NTVK82 0FSwDvB2fnXmu+L8VAgls9ECefuu9KaOaAitu9YaiitwxRRRB9us6wXQXApr0n7VA0Du Dynw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=MrRHg3eeibZEDZKW1OMGc2WL1v5riPnlx5zI57W9juI=; b=RMJOUO5jYTXXVHArC37Tflt48QVaolitoVLp9XHpIqUiT+m9m8F0ULBjBeZiEspPGX hOmdkNQbaCUIJFNEMoPLPSsvx1nfYG/GOXVDoK1uFbUxK44VA4yvZOO1Xh0i/q4xGWki 2q6ngsGJU3EWxQet3kk6YCnUGx+rT8i/8+D7nLv1NhrP/DkjmohNoqf9zm/RFJQKJ0KM PQOnoyqarMFLqAy4JbWraW+mOa8T8+04eP+Po3UIK6l/1llVIO1Oq0dx9bc1sJnAzKu2 JWXRqVFqm3eI+4L/r5Bix7i6+bDR4UK8tVKIvM3x3VERgRMDDg31CcG6/W7FhIyJYocP VKnA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=GAwiXcoN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w13si401629ejk.175.2020.07.28.22.10.27; Tue, 28 Jul 2020 22:10:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=GAwiXcoN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726353AbgG2FJl (ORCPT + 99 others); Wed, 29 Jul 2020 01:09:41 -0400 Received: from mail.kernel.org ([198.145.29.99]:45732 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726054AbgG2FJk (ORCPT ); Wed, 29 Jul 2020 01:09:40 -0400 Received: from mail-wr1-f48.google.com (mail-wr1-f48.google.com [209.85.221.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B677E21744 for ; Wed, 29 Jul 2020 05:09:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1595999380; bh=eYb5MXNDios1YM3ylYsZfmcDeaiH+jiIZ1lgRFwy64s=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=GAwiXcoNz08y7TTbyLlCoUYpO5MeI+RNkj6Z/fJ/FknMkcrDelYDcCyRdSMznPhBk 1NMp1Qiq0B6/53VJfm2dmBzzTxgiRDJdqtC9dmMNp7Uejt+Y7Y9kF9GbTpSLjFNT8D RKD3FO1mRp/u2PP+0LAaMMO+flgn1PDTrIs+qSnk= Received: by mail-wr1-f48.google.com with SMTP id r12so20250235wrj.13 for ; Tue, 28 Jul 2020 22:09:39 -0700 (PDT) X-Gm-Message-State: AOAM532Y0d/O5GsMAK7aRiBHJymSotd0Sg/1zBK2sXw5DJ6360K2kTme GKcwsUFOiT6F+8JO3KPwBQZJFVhriCWatMJfkLsl3Q== X-Received: by 2002:a5d:5273:: with SMTP id l19mr27308359wrc.257.1595999378162; Tue, 28 Jul 2020 22:09:38 -0700 (PDT) MIME-Version: 1.0 References: <20200728202229.1195682-1-krisman@collabora.com> <20200728202229.1195682-3-krisman@collabora.com> <878sf3j5cl.fsf@collabora.com> In-Reply-To: <878sf3j5cl.fsf@collabora.com> From: Andy Lutomirski Date: Tue, 28 Jul 2020 22:09:26 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 2/6] arch: x86: Wrap TIF_IA32 checks To: Gabriel Krisman Bertazi Cc: Andy Lutomirski , Peter Zijlstra , Christoph Hellwig , Thomas Gleixner , Kees Cook , X86 ML , LKML , kernel@collabora.com Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 28, 2020 at 9:46 PM Gabriel Krisman Bertazi wrote: > > Andy Lutomirski writes: > > > On Tue, Jul 28, 2020 at 1:22 PM Gabriel Krisman Bertazi > > wrote: > >> > >> In preparation to remove TIF_IA32, add wrapper that check the process > >> has IA32 ABI without using the flag directly. > > > > Thank you for doing this, but let's please do it right. There is, > > fundamentally, no such thing as a "process with IA32 ABI". > > Hi Andy, > > Thanks a lot for your review. > > As you can see, I'm learning my way here. Can you clarify "there is no > such a thing as a 'process with IA32 ABI'"? I'm not sure if I confused > the terminology or if (more worrisome for me) I got the concepts wrong. > > My understanding is that TIF_IA32 marks a thread that is running under > the 32-bit compat mode, which would be running a 32-bit process (as in > compiled with -m32, for instance), while TIF_X32 marks a process running > under the X32 ABI. Each process would have only one of these > "personalities". This is what I meant by a process with IA32 ABI (which > is wrong in any case). Is there more to it, or is the problem the > terminology I used? There's more to it. On sane architectures like Linux on ARM64, a process can be 32-bit or 64-bit. If you are a 64-bit ARM process, you run in 64-bit mode, period. Anything other than execve() that lets you become 32-bit is a bug. Linux on x86 is nutty, though. (IIRC Sparc is too.) In particular, it's entirely legal for a Linux program to switch between 32-bit and 64-bit mode all by itself. This can be done with LJMP, LRET, IRET, or a few other hacks. It can also be done using any of the sigreturn syscalls. (SYSENTER comes to mind, but anyone who uses SYSENTER to switch modes is *nuts*.) Certain evil people named Andy Lutomirski write kernel selftests that switch modes using sigreturn just to abuse the kernel. But some real programs do this too. For example, I think that CRIU restores 32-bit programs by starting out as a 64-bit program, restoring everything, and then switching to 32-bit mode. Keep in mind that most of these mode-switching tricks do not enter the kernel at all. The upshot is that kernel does not actually know what the running program's ABI is. For all the kernel knows, a process uses the Linux/psABI 64-bit ABI, the ia32 ABI, the x32 ABI, the Windows 32-bit ABI, the Windows 64-bit ABI, and some gnarly ABI from DOS or OS/2, all in the same process. So we really shouldn't be trying to keep track of what we think the user program should be doing, and instead we should pay attention to what the user program actually did in its interactions with the kernel. If we want to know whether a user task is running in 64-bit mode right now, we can use user_64bit_mode(regs). (And keep in mind that syscalls like rt_sigreturn() and anything that gets ptraced can *change* the outcome of user_64bit_mode(regs) mid-syscall.) If we're delivering a signal, we instead want to know the ABI of the signal's handler, so we use is_ia32_frame(ksig) and is_x32_frame(ksig). If we want to know whether mmap() should return values above 2^32-1, we currently use some hackery, but we really ought to be looking at whether the call was 64-bit mmap() or compat mmap(). (I had some really half-baked patches to fix this at one point, but the core code was a real mess and I didn't finish it.) If we still supported MPX in the kernel, we'd have a real mess, because MPX malfunctions if user programs switch bitness on us. Good riddance! > > I don't have any comments on the other things you mentioned, except that > I need to go through them and better understand your suggestions. Would > you prefer me to rework this patch series with what you suggested or is > this something you want to take over and do yourself? Both ways are > fine by me. Please rework it :) I have seriously limited bandwidth right now.