Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp6602pxk; Wed, 16 Sep 2020 15:24:36 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy22UhS1FN7SVX5MfHnoMM+xi0GxZ3r+iTtHjjfzUk2lULbR3AewPyMzJCDzlC8y5Q4R7Px X-Received: by 2002:a17:906:6005:: with SMTP id o5mr14845816ejj.465.1600295076714; Wed, 16 Sep 2020 15:24:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600295076; cv=none; d=google.com; s=arc-20160816; b=tr3DLN2sMm4gafcEeW2lFNEEJJLKJ2LYaRAFZ8FZvIfZpqIx9anfcNTSyvX81w4A1d CaOhZzPNEzgoQ+HtOc1Gjae5K4Y58HjuPs/uza7rwzyHc/cKSbwp5ZhOW7fhhGm+cmna CrkpOGiw5mdIn6iECDERMfQ3cPYQoj/FakgLj/gff0//Mi8dk0kP5YHC9kKhomTsd2rn 5zAu2s6EzhHnD/agc+mOZ2533o9fZY8gdutlEelVt/p332U7x8BsRjYheSLpXv/14RTW mHoJvrqQgmqmkwc88WcycPPrTtppb9bAIl/EmC2rt8jVYPZwMifcl487YFa3PJtw6FNd 4YuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=0U6lePtHUzuHNkGr/lHdQ2n9vFxpAC0RKTsGOLtWqMU=; b=xCwp4tjkHDPLogesrWfm0qq2CPjblRZBWDqe6XRAsZEQCUdWH+hMU4mOky9SuN4+iR vbi0SAxQnDZfLoRHKSc1GDWb6UzIWvpKJEAFkBpkXywBcT9cBPlRl2VqT7Byeo4vTeNk hjV1tasmWRaSqShal1z7mcnkTcVs9wT0ALgGOmk3Sdk2uENdTUrU6kEJ22mo4snK5La9 gBQNVsArWbqm0PiSwdynvL75XWQkqIccTfUv+XHmxSHJ6rlEdEKfbX0AP/WyqyQ/SfEM J6cmAPPg84ELP8axj6JUhkYMlWGpXh0p3CTIHYDQ7LD8iow4bPUC/fk10nqdcdGPm0Mt 3T7Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=BuOSmxtg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f8si12420658edm.193.2020.09.16.15.24.11; Wed, 16 Sep 2020 15:24:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=BuOSmxtg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726301AbgIPWXD (ORCPT + 99 others); Wed, 16 Sep 2020 18:23:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50360 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726408AbgIPWXB (ORCPT ); Wed, 16 Sep 2020 18:23:01 -0400 Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C6B7C061D7E for ; Wed, 16 Sep 2020 14:43:14 -0700 (PDT) Received: by mail-ot1-x341.google.com with SMTP id n61so8074161ota.10 for ; Wed, 16 Sep 2020 14:43:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=0U6lePtHUzuHNkGr/lHdQ2n9vFxpAC0RKTsGOLtWqMU=; b=BuOSmxtghdYv1Pu3+U3tb8abJ5nf3TWnJBg3uVURF8/e2SpjhWAdyUHWhFgYIFMxvB d5+bwZ5pRd58OgZ4vH/yU+7BPFXtf1oaS9iv8VNgV6jcyzfMIggtf1o50gCrR9J41Dga uJ7taPXVqQy9TZGItqkyL1/OnuZWlkNHLm4+E= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=0U6lePtHUzuHNkGr/lHdQ2n9vFxpAC0RKTsGOLtWqMU=; b=Hq7D2wYPNRf9Uvwd68hohWqsBYb+gmDhJv0r7iBs6Gox5HZO5P6kkXXv0w+/skk3iS i5GY7np8jHBSHnOPGFc3/bfM2ojTZ67i1SWg6u2ExCRhI7KktpVcD7+ifzL2nmGhYsJa rJUmStO544Yd+J0yF+sEeVUOsl2jkqYLbP/oni3K3FjsdP3Tc8omq6ExMxMaAOWDl96c hn7A6dmnOVBRV+0CpknKLlpfm0LennOEpVYD/ySK69tl7nSakLBh6w3yBKoWMvXm721g 18upOhRSSD29A1sfr/hoRsxBP6nKjQjeiRj23pBv9S1ONL/FZOM0shEigOj3FPA5V40n NW6w== X-Gm-Message-State: AOAM532RVnx0k8UKVVMY4M0LeTdSuKuS/0htr0Q9j/JR7FnH4ReIV3dh +Dsh8mcUOx2t2D/HYePFxXb14nXqeOze2JC4+CecnD4sytidcA== X-Received: by 2002:a05:6830:14d9:: with SMTP id t25mr18766603otq.188.1600292593863; Wed, 16 Sep 2020 14:43:13 -0700 (PDT) MIME-Version: 1.0 References: <20200914204209.256266093@linutronix.de> <871rj4owfn.fsf@nanos.tec.linutronix.de> <87bli75t7v.fsf@nanos.tec.linutronix.de> <20200916152956.GV29330@paulmck-ThinkPad-P72> <20200916205840.GD29330@paulmck-ThinkPad-P72> In-Reply-To: <20200916205840.GD29330@paulmck-ThinkPad-P72> From: Daniel Vetter Date: Wed, 16 Sep 2020 23:43:02 +0200 Message-ID: Subject: Re: [patch 00/13] preempt: Make preempt count unconditional To: "Paul E. McKenney" Cc: Linus Torvalds , Thomas Gleixner , Ard Biesheuvel , Herbert Xu , LKML , linux-arch , Sebastian Andrzej Siewior , Valentin Schneider , Richard Henderson , Ivan Kokshaysky , Matt Turner , alpha , Jeff Dike , Richard Weinberger , Anton Ivanov , linux-um , Brian Cain , linux-hexagon@vger.kernel.org, Geert Uytterhoeven , linux-m68k , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Andrew Morton , Linux-MM , Ingo Molnar , Russell King , Linux ARM , Chris Zankel , Max Filippov , linux-xtensa@linux-xtensa.org, Jani Nikula , Joonas Lahtinen , Rodrigo Vivi , David Airlie , intel-gfx , dri-devel , Josh Triplett , Mathieu Desnoyers , Lai Jiangshan , Shuah Khan , rcu@vger.kernel.org, "open list:KERNEL SELFTEST FRAMEWORK" Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 16, 2020 at 10:58 PM Paul E. McKenney wrote: > > On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote: > > On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney wrote: > > > > > > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote: > > > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds > > > > wrote: > > > > > > > > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner wrote: > > > > > > > > > > > > OTOH, having a working 'preemptible()' or maybe better named > > > > > > 'can_schedule()' check makes tons of sense to make decisions about > > > > > > allocation modes or other things. > > > > > > > > > > No. I think that those kinds of decisions about actual behavior are > > > > > always simply fundamentally wrong. > > > > > > > > > > Note that this is very different from having warnings about invalid > > > > > use. THAT is correct. It may not warn in all configurations, but that > > > > > doesn't matter: what matters is that it warns in common enough > > > > > configurations that developers will catch it. > > > > > > > > > > So having a warning in "might_sleep()" that doesn't always trigger, > > > > > because you have a limited configuration that can't even detect the > > > > > situation, that's fine and dandy and intentional. > > > > > > > > > > But having code like > > > > > > > > > > if (can_schedule()) > > > > > .. do something different .. > > > > > > > > > > is fundamentally complete and utter garbage. > > > > > > > > > > It's one thing if you test for "am I in hardware interrupt context". > > > > > Those tests aren't great either, but at least they make sense. > > > > > > > > > > But a driver - or some library routine - making a difference based on > > > > > some nebulous "can I schedule" is fundamentally and basically WRONG. > > > > > > > > > > If some code changes behavior, it needs to be explicit to the *caller* > > > > > of that code. > > > > > > > > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule()) > > > > > do_something_atomic()" is pure shite. > > > > > > > > > > And I am not IN THE LEAST interested in trying to help people doing > > > > > pure shite. We need to fix them. Like the crypto code is getting > > > > > fixed. > > > > > > > > Just figured I'll throw my +1 in from reading too many (gpu) drivers. > > > > Code that tries to cleverly adjust its behaviour depending upon the > > > > context it's running in is harder to understand and blows up in more > > > > interesting ways. We still have drm_can_sleep() and it's mostly just > > > > used for debug code, and I've largely ended up just deleting > > > > everything that used it because when you're driver is blowing up the > > > > last thing you want is to realize your debug code and output can't be > > > > relied upon. Or worse, that the only Oops you have is the one in the > > > > debug code, because the real one scrolled away - the original idea > > > > behind drm_can_sleep was to make all the modeset code work > > > > automagically both in normal ioctl/kworker context and in the panic > > > > handlers or kgdb callbacks. Wishful thinking at best. > > > > > > > > Also at least for me that extends to everything, e.g. I much prefer > > > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for > > > > locks shared with interrupt handlers, since the former two gives me > > > > clear information from which contexts such function can be called. > > > > Other end is the memalloc_no*_save/restore functions, where I recently > > > > made a real big fool of myself because I didn't realize how much that > > > > impacts everything that's run within - suddenly "GFP_KERNEL for small > > > > stuff never fails" is wrong everywhere. > > > > > > > > It's all great for debugging and sanity checks (and we run with all > > > > that stuff enabled in our CI), but really semantic changes depending > > > > upon magic context checks freak my out :-) > > > > > > All fair, but some of us need to write code that must handle being > > > invoked from a wide variety of contexts. Now perhaps you like the idea of > > > call_rcu() for schedulable contexts, call_rcu_nosched() when preemption > > > is disabled, call_rcu_irqs_are_disabled() when interrupts are disabled, > > > call_rcu_raw_atomic() from contexts where (for example) raw spinlocks > > > are held, and so on. However, from what I can see, most people instead > > > consistently prefer that the RCU API instead be consolidated. > > > > > > Some in-flight cache-efficiency work for kvfree_rcu() and call_rcu() > > > needs to be able to allocate memory occasionally. It can do that when > > > invoked from some contexts, but not when invoked from others. Right now, > > > in !PREEMPT kernels, it cannot tell, and must either do things to the > > > memory allocators that some of the MM hate or must unnecessarily invoke > > > workqueues. Thomas's patches would allow the code to just allocate in > > > the common case when these primitives are invoked from contexts where > > > allocation is permitted. > > > > > > If we want to restrict access to the can_schedule() or whatever primitive, > > > fine and good. We can add a check to checkpatch.pl, for example. Maybe > > > we can go back to the old brlock approach of requiring certain people's > > > review for each addition to the kernel. > > > > > > But there really are use cases that it would greatly help. > > > > We can deadlock in random fun places if random stuff we're calling > > suddenly starts allocating. Sometimes. Maybe once in a blue moon, to > > make it extra fun to reproduce. Maybe most driver subsystems are less > > brittle, but gpu drivers definitely need to know about the details for > > exactly this example. And yes gpu drivers use rcu for freeing > > dma_fence structures, and that tends to happen in code that we only > > recently figured out should really not allocate memory. > > > > I think minimally you need to throw in an unconditional > > fs_reclaim_acquire();fs_reclaim_release(); so that everyone who runs > > with full debugging knows what might happen. It's kinda like > > might_sleep, but a lot more specific. might_sleep() alone is not > > enough, because in the specific code paths I'm thinking of (and > > created special lockdep annotations for just recently) sleeping is > > allowed, but any memory allocations with GFP_RECLAIM set are no-go. > > Completely agreed! Any allocation on any free path must be handled > -extremely- carefully. To that end... > > First, there is always a fallback in case the allocation fails. Which > might have performance or corner-case robustness issues, but which will > at least allow forward progress. Second, we consulted with a number of > MM experts to arrive at appropriate GFP_* flags (and their patience is > greatly appreciated). Third, the paths that can allocate will do so about > one time of 500, so any issues should be spotted sooner rather than later. > > So you are quite right to be concerned, but I believe we will be doing the > right things. And based on his previous track record, I am also quite > certain that Mr. Murphy will be on hand to provide me any additional > education that I might require. > > Finally, I have noted down your point about fs_reclaim_acquire() and > fs_reclaim_release(). Whether or not they prove to be needed, I do > appreciate your calling them to my attention. I just realized that since these dma_fence structs are refcounted and userspace can hold references (directly, it can pass them around behind file descriptors) we might never hit such a path until slightly unusual or evil userspace does something interesting. Do you have links to those patches? Some googling didn't turn up anything. I can then figure out whether it's better to risk not spotting issues with call_rcu vs slapping a memalloc_noio_save/restore around all these critical section which force-degrades any allocation to GFP_ATOMIC at most, but has the risk that we run into code that assumes "GFP_KERNEL never fails for small stuff" and has a decidedly less tested fallback path than rcu code. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch