Date: Thu, 21 Apr 2022 11:57:16 +0200
From: Cyril Hrubis
To: Spencer Baugh
Cc: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
    marcin@juszkiewicz.com.pl, torvalds@linux-foundation.org, arnd@arndb.de
Subject: Re: Explicitly defining the userspace API
References: <874k2nhgtg.fsf@catern.com>
In-Reply-To: <874k2nhgtg.fsf@catern.com>

Hi!
> Linux guarantees the stability of its userspace API, but the API
> itself is only informally described, primarily with English prose. I
> want to add an explicit, authoritative machine-readable definition of
> the Linux userspace API.

My background is in kernel testing; I have maintained the Linux Test
Project for more than a decade now. Over the years we have created many
"unit tests" for kernel syscalls that watch over the syscall API and
make sure that we get the right results for both valid and invalid
inputs. These tests can also be considered a form of documentation. The
same goes for some of the selftests that have been added to the kernel
repo in recent years. In a sense these are the most detailed
descriptions of the interfaces we have.

The main problem is that the kernel-userspace boundary is large: we have
thousands of tests and I'm pretty sure that we don't cover even half of
it.

Also, some of the interfaces are too complex to be described in any
formal system, mostly the modern stuff such as io_uring or BPF. I have
had a hard time even understanding how to use these, and I doubt I would
be able to build a formal system to describe them. Especially since
io_uring is mostly syscall-less: we talk to the kernel through shared
buffers and atomic data updates.

> As background, in a conventional libc like glibc, read(2) calls the
> Linux system call read, passing arguments in an architecture-specific
> way according to the specific details of read.
>
> The details of these syscalls are at best documented in manpages, and
> often defined only by the implementation. Anyone else who wants to
> work with a syscall, in any way, needs to duplicate all those details.
>
> So the most basic definition of the API would just represent the
> information already present in SYSCALL_DEFINE macros: the C types of
> arguments and return values.
> More usefully, it would describe the
> formats of those arguments and return values: that the first argument
> to read is a file descriptor rather than an arbitrary integer, and
> what flags are valid in the flags argument of openat, and that open
> returns a file descriptor. A step beyond that would be describing, in
> some limited way, the effects of syscalls; for example, that read
> writes into the passed buffer the number of bytes that it returned.

Having this would be awesome; it is just one step from actually
generating automated tests for the syscalls.

However, my estimate is that even if you started working on this now, it
would take a decade to get somewhere, but maybe I'm too pessimistic.

Still, fingers crossed.

-- 
Cyril Hrubis
chrubis@suse.cz
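
As a rough illustration of the kind of syscall test discussed in this
message, here is a minimal sketch that checks read(2) for one valid and
one invalid input. It is not taken from LTP or the kernel selftests; it
goes through Python's os.read() wrapper, and the helper names
read_from_pipe and read_errno_on_closed_fd are made up for this example.

```python
import errno
import os


def read_from_pipe(payload: bytes) -> bytes:
    """Valid input: read back what was just written to a pipe."""
    rfd, wfd = os.pipe()
    try:
        os.write(wfd, payload)
        # os.read() wraps read(2); for a small pipe write it returns
        # the bytes that are currently buffered.
        return os.read(rfd, len(payload))
    finally:
        os.close(rfd)
        os.close(wfd)


def read_errno_on_closed_fd():
    """Invalid input: read(2) on a closed descriptor should give EBADF."""
    rfd, wfd = os.pipe()
    os.close(wfd)
    os.close(rfd)
    try:
        os.read(rfd, 1)
    except OSError as err:
        return err.errno
    # Hypothetical failure path: the read unexpectedly succeeded.
    return None


if __name__ == "__main__":
    assert read_from_pipe(b"hello") == b"hello"
    assert read_errno_on_closed_fd() == errno.EBADF
    print("read(2) checks passed")
```

A machine-readable API description might, in principle, let checks like
these be generated, for example from the statement that the first
argument of read must be a file descriptor rather than an arbitrary
integer.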