References: <20220805154231.31257-13-ojeda@kernel.org>
From: Linus Torvalds
Date: Mon, 19 Sep 2022 09:09:40 -0700
Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate
To: Wedson Almeida Filho
Cc: Matthew Wilcox, Kees Cook, Miguel Ojeda, Konstantin Shelekhin,
    ojeda@kernel.org, alex.gaynor@gmail.com, ark.email@gmail.com,
    bjorn3_gh@protonmail.com, bobo1239@web.de, bonifaido@gmail.com,
    boqun.feng@gmail.com, davidgow@google.com, dev@niklasmohrin.de,
    dsosnowski@dsosnowski.pl, foxhlchen@gmail.com, gary@garyguo.net,
    geofft@ldpreload.com, gregkh@linuxfoundation.org, jarkko@kernel.org,
    john.m.baublitz@gmail.com, leseulartichaut@gmail.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    m.falkowski@samsung.com, me@kloenk.de, milan@mdaverde.com,
    mjmouse9999@gmail.com, patches@lists.linux.dev,
    rust-for-linux@vger.kernel.org, thesven73@gmail.com, viktor@v-gar.de,
    Andreas Hindborg
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 19, 2022 at 7:07 AM Wedson Almeida Filho wrote:
>
> For GFP_ATOMIC, we could use preempt_count except that it isn't always
> enabled. Conveniently, it is already separated out into its own config.
> How do people feel about removing CONFIG_PREEMPT_COUNT and having the
> count always enabled?
>
> We would then have a way to reliably detect when we are in atomic
> context [..]

No.

First off, it's not true. There are non-preempt atomic contexts too,
like interrupts disabled etc. Can you enumerate all those? Possibly.

But more importantly, doing "depending on context, I silently and
automatically do different things" is simply WRONG. Don't do it. It's
a disaster.

Doing that for *debugging* is fine. So having a

    WARN_ON_ONCE(in_atomic_context());

is a perfectly fine debug check to find people who do bad bad things,
and we have lots of variations of that theme (ie might_sleep(), but
also things like lockdep_assert_held() and friends that assert other
constraints entirely).

But having *behavior changes* depending on context is a total
disaster. And that's invariably why people want this disgusting thing.

They want to do broken things like "I want to allocate memory, and I
don't want to care where I am, so I want the memory allocator to just
do the whole GFP_ATOMIC for me".

And that is FUNDAMENTALLY BROKEN.

If you want to allocate memory, and you don't want to care about what
context you are in, or whether you are holding spinlocks etc, then you
damn well shouldn't be doing kernel programming. Not in C, and not in
Rust.

It really is that simple. Contexts like this ("I am in a critical
region, I must not do memory allocation or use sleeping locks") are
*fundamental* to kernel programming. It has nothing to do with the
language, and everything to do with the problem space.

So don't go down this "let's have the allocator just know if you're in
an atomic context automatically" path. It's wrong. It's complete
garbage. It may generate kernel code that superficially "works", but
one that is fundamentally broken, and will fail and become unreliable
under memory pressure.

The thing is, when you do kernel programming, and you're holding a
spinlock and need to allocate memory, you generally shouldn't allocate
memory at all, you should go "Oh, maybe I need to do the allocation
*before* getting the lock".

And it's not just locks. It's also "I'm in a hardware interrupt", but
now the solution is fundamentally different. Maybe you still want to
do pre-allocation, but now you're a driver interrupt and the
pre-allocation has to happen in another code sequence entirely,
because obviously the interrupt itself is asynchronous.

But more commonly, you just want to use GFP_ATOMIC, and go "ok, I know
the VM layer tries to keep a _limited_ set of pre-allocated buffers
around".

But it needs to be explicit, because that GFP_ATOMIC pool of
allocations really is very limited, and you as the allocator need to
make it *explicit* that yeah, now you're not just doing a random
allocation, you are doing one of these *special* allocations that will
eat into that very limited global pool of allocations.

So no, you cannot and MUST NOT have an allocator that silently just
dips into that special pool, without the user being aware or
requesting it.

That really is very very fundamental.
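
[Editorial sketch, not part of the original message.] The two patterns
described above (allocate *before* taking the lock, and make any dip
into the limited atomic reserve an explicit request by the caller) can
be illustrated roughly in userspace C. Everything here is an
assumption made for illustration: `ctx_alloc`, `ALLOC_MAY_SLEEP`,
`ALLOC_ATOMIC`, `fake_lock`, `fake_unlock` and the list helpers are
hypothetical names, not kernel APIs, and the "reserve" is only a
counter that is never refilled.

```c
/* Userspace illustration only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the GFP_KERNEL / GFP_ATOMIC distinction. */
enum alloc_ctx {
    ALLOC_MAY_SLEEP,  /* caller states that it is allowed to block */
    ALLOC_ATOMIC,     /* caller explicitly asks for the small reserve */
};

#define ATOMIC_RESERVE_SLOTS 2  /* deliberately tiny; never refilled here */

static int atomic_slots_left = ATOMIC_RESERVE_SLOTS;
static bool lock_held;                /* stand-in for "holding a spinlock" */
static int sleep_in_atomic_warnings;  /* what a debug check would report */

static void fake_lock(void)   { lock_held = true; }
static void fake_unlock(void) { lock_held = false; }

/* The caller always says which kind of allocation it wants. */
static void *ctx_alloc(size_t size, enum alloc_ctx ctx)
{
    if (ctx == ALLOC_ATOMIC) {
        if (atomic_slots_left == 0)
            return NULL;  /* reserve exhausted: the caller must cope */
        atomic_slots_left--;
        return malloc(size);
    }
    /*
     * Debug check only: record the misuse, but do NOT quietly switch
     * to the atomic reserve on the caller's behalf.
     */
    if (lock_held)
        sleep_in_atomic_warnings++;
    return malloc(size);
}

struct node {
    int value;
    struct node *next;
};

static struct node *list_head;

/* Usual pattern: do the allocation before taking the lock. */
static int list_add_prealloc(int value)
{
    struct node *n = ctx_alloc(sizeof(*n), ALLOC_MAY_SLEEP);

    if (!n)
        return -1;
    n->value = value;

    fake_lock();
    n->next = list_head;
    list_head = n;
    fake_unlock();
    return 0;
}

/* If the allocation really must happen under the lock, say so. */
static int list_add_locked_atomic(int value)
{
    int ret = -1;

    fake_lock();
    struct node *n = ctx_alloc(sizeof(*n), ALLOC_ATOMIC);
    if (n) {
        n->value = value;
        n->next = list_head;
        list_head = n;
        ret = 0;
    }
    fake_unlock();
    return ret;
}
```

In this sketch, calling `list_add_locked_atomic()` more than
`ATOMIC_RESERVE_SLOTS` times starts returning -1, while
`list_add_prealloc()` keeps working and never touches the reserve. The
only point is that the choice is visible at the call site; the real
kernel reserve behaves differently in detail.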

Allocators that "just work" in different contexts are broken garbage
within the context of a kernel. Please just accept this, and really
*internalize* it. Because this isn't actually just about allocators.

Allocators may be one very common special case of this kind of issue,
and they come up quite often as a result, but this whole "your code
needs to *understand* the special restrictions that the kernel is
under" is something that is quite fundamental in general.

It shows up in various other situations too, like "Oh, this code can
run in an interrupt handler" (which is *different* from the atomicity
of just "while holding a lock", because it implies a level of random
nesting that is very relevant for locking).

Or sometimes it's subtler things than just correctness, ie "I'm
running under a critical lock, so I must be very careful because there
are scalability issues". The code may *work* just fine, but things
like tasklist_lock are very very special.

So embrace that kernel requirement. Kernels are special. We do things,
and we have constraints that pretty much no other environment has.

It is often what makes kernel programming interesting - because it's
challenging. We have a very limited stack. We have some very direct
and deep interactions with the CPU, with things like IO and
interrupts. Some constraints are always there, and others are
context-dependent.

The whole "really know what context this code is running within" is
really important. You may want to write very explicit comments about
it. And you definitely should always write your code knowing about it.

             Linus