From: Arnd Bergmann
Date: Wed, 21 Nov 2018 18:14:23 +0100
Subject: Re: Cleaning up numbering for new x86 syscalls?
To: Andy Lutomirski
Cc: "the arch/x86 maintainers", Linux Kernel Mailing List, Borislav Petkov, Peter Zijlstra, Tycho Andersen, Daniel Colascione, Florian Weimer, carlos@redhat.com, Rich Felker
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 20, 2018 at 1:25 AM Andy Lutomirski wrote:
>
> Hi all-
>
> We currently have some giant turds in the way that syscalls are
> numbered. We have the x86_32 table, which is totally sane other than
> some legacy multiplexers. Then we have the x86_64 table, which is,
> um, demented:
>
> - The numbers don't match x86_32. I have no idea why.

I think it was an early attempt at cleaning up the table, adding only
the calls that were still in use. Back in the day, each architecture
had its own table, and of course i386 and x86_64 started out as
separate top-level architectures.

> - We use bit 30, which triggers in_x32_syscall(). It should have
> been bit 31, but I digress.
>
> - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))? The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.

The comment in the table says it's purely for keeping the calls in
separate cache lines. I don't know whether the cache lines make a
difference in the end, but once we start running into the x32 syscall
numbers, I think we just treat them like any others; we simply choose
never to call them from a 64-bit glibc.

> I propose we consider some subset of the following:
>
> 1. Introduce restart_syscall_2(). Make its number be 1024. Maybe
> someday we could start using it instead of restart_syscall(). The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
>
> 2. Introduce an outright ban on new syscalls with nr < 1024.

This would leave a hole of several hundred numbers if we do it for all
architectures. Wasting multiple kilobytes of syscall table space on a
cosmetic cleanup might be considered excessive.

> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls. If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.

I would definitely want to keep anything regarding x32 out of the
common syscall implementation. If you want to add to that pile, please
do it in arch/x86, not in kernel/ or fs/.

If we decide that x32 is a failed experiment and we won't keep it
working in the future, let's just kill it off right away. I'm fairly
sure nobody depends on it for anything real; the only users I could
find use it either to show off benchmark results or to play around
with it for fun. Most of the fun apparently ended many years ago, but
some work is still going into debian/x32. We probably need to
coordinate with them and see whether they know of actual users before
removing it. Popcon lists 5 active users [1] and a sharp downward trend.

> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.

No objection here, but what would that help with?

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once. They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
>
> Thoughts?

I would definitely welcome assigning the same syscall numbers across
all architectures. It is a needless burden for libc developers to
figure out, for each syscall, which kernel version supports it on which
architecture. When a call gets added, they typically add logic to check
for it at runtime, but for older syscalls it helps to know the kernel
version by which all architectures support it, so that the runtime
check can be dropped once the libc's minimum kernel version has been
raised beyond that.
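As a rough illustration of the kind of runtime check a libc adds for a
newly wired-up syscall, here is a minimal C sketch, assuming Linux and
glibc's syscall(2) wrapper. The helper name syscall_supported() and its
three-argument form are hypothetical, not an existing libc API: it
treats ENOSYS as "not implemented" and any other outcome as "present".

```c
/* Hypothetical helper, not a real libc API: probe whether the running
 * kernel implements a syscall. Assumes Linux + glibc (syscall(2) and
 * the SYS_* constants from <sys/syscall.h>). */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if syscall number nr is recognized, 0 if the call fails
 * with ENOSYS. A supported syscall is really executed, so the caller
 * must pass arguments that make it harmless. */
static int syscall_supported(long nr, long a0, long a1, long a2)
{
    errno = 0;
    long ret = syscall(nr, a0, a1, a2);
    if (ret == -1) {
        assert(errno != 0);      /* syscall(2) sets errno on failure */
        return errno != ENOSYS;  /* any other error: the call exists */
    }
    return 1;                    /* succeeded, so it is implemented */
}
```

For example, syscall_supported(SYS_getpid, 0, 0, 0) returns 1 on any
Linux system. Keep in mind that a seccomp filter can also make a call
fail with ENOSYS (or EPERM), so a probe like this approximates kernel
support rather than proving it.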
Please also see the work Firoz Khan has been posting to generalize the
syscall tables on all architectures, converting them to the format we
already use on x86, arm and s390. I hope we can merge it all for 4.21
and then build further generalization and cleanups on top of that.

     Arnd

[1] https://popcon.debian.org/stat/sub-x32.png