From: Uladzislau Rezki
Date: Wed, 26 Feb 2020 16:53:47 +0100
To: "Paul E. McKenney", Joel Fernandes
Cc: Uladzislau Rezki, Joel Fernandes, "Theodore Y. Ts'o",
    Ext4 Developers List, Suraj Jitindar Singh, LKML
Ts'o" , Ext4 Developers List , Suraj Jitindar Singh , LKML Subject: Re: [PATCH RFC] ext4: fix potential race between online resizing and write operations Message-ID: <20200226155347.GA31097@pc636> References: <20200221131455.GA4904@pc636> <20200221202250.GK2935@paulmck-ThinkPad-P72> <20200222222415.GC191380@google.com> <20200223011018.GB2935@paulmck-ThinkPad-P72> <20200224174030.GA22138@pc636> <20200225020705.GA253171@google.com> <20200225185400.GA27919@pc636> <20200225224745.GX2935@paulmck-ThinkPad-P72> <20200226130440.GA30008@pc636> <20200226150656.GB2935@paulmck-ThinkPad-P72> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200226150656.GB2935@paulmck-ThinkPad-P72> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 26, 2020 at 07:06:56AM -0800, Paul E. McKenney wrote: > On Wed, Feb 26, 2020 at 02:04:40PM +0100, Uladzislau Rezki wrote: > > On Tue, Feb 25, 2020 at 02:47:45PM -0800, Paul E. McKenney wrote: > > > On Tue, Feb 25, 2020 at 07:54:00PM +0100, Uladzislau Rezki wrote: > > > > > > > > I was thinking a 2 fold approach (just thinking out loud..): > > > > > > > > > > > > > > > > If kfree_call_rcu() is called in atomic context or in any rcu reader, then > > > > > > > > use GFP_ATOMIC to grow an rcu_head wrapper on the atomic memory pool and > > > > > > > > queue that. > > > > > > > > > > > > > > I am not sure if that is acceptable, i mean what to do when GFP_ATOMIC > > > > > > gets failed in atomic context? Or we can just consider it as out of > > > > > > memory and another variant is to say that headless object can be called > > > > > > from preemptible context only. > > > > > > > > > > Yes that makes sense, and we can always put disclaimer in the API's comments > > > > > saying if this object is expected to be freed a lot, then don't use the > > > > > headless-API to be extra safe. > > > > > > > > > Agree. > > > > > > > > > BTW, GFP_ATOMIC the documentation says if GFP_ATOMIC reserves are depleted, > > > > > the kernel can even panic some times, so if GFP_ATOMIC allocation fails, then > > > > > there seems to be bigger problems in the system any way. I would say let us > > > > > write a patch to allocate there and see what the -mm guys think. > > > > > > > > > OK. It might be that they can offer something if they do not like our > > > > approach. I will try to compose something and send the patch to see. > > > > The tree.c implementation is almost done, whereas tiny one is on hold. > > > > > > > > I think we should support batching as well as bulk interface there. > > > > Another way is to workaround head-less object, just to attach the head > > > > dynamically using kmalloc() and then call_rcu() but then it will not be > > > > a fair headless support :) > > > > > > > > What is your view? > > > > > > > > > > > > Otherwise, grow an rcu_head on the stack of kfree_call_rcu() and call > > > > > > > > synchronize_rcu() inline with it. > > > > > > > > > > > > > > > > > > > > > > What do you mean here, Joel? "grow an rcu_head on the stack"? > > > > > > > > > > By "grow on the stack", use the compiler-allocated rcu_head on the > > > > > kfree_rcu() caller's stack. 
> > > > >
> > > > > I meant here to say: if we are not in atomic context, then we use a regular
> > > > > GFP_KERNEL allocation, and if that fails, then we just use the stack's
> > > > > rcu_head and call synchronize_rcu(), or even synchronize_rcu_expedited(), since
> > > > > the allocation failure would mean the need for RCU to free some memory is
> > > > > probably great.
> > > >
> > > > Ah, I got it. I thought you meant something like recursion and then
> > > > unwinding the stack back somehow :)
> > > >
> > > > > > > > Use preemptible() and task_struct's rcu_read_lock_nesting to differentiate
> > > > > > > > between the two cases.
> > > > > >
> > > > > > If the current context is preemptible, then we can inline synchronize_rcu()
> > > > > > together with the freeing to handle such a corner case, I mean when we run
> > > > > > out of memory.
> > > > >
> > > > > Ah yes, exactly what I mean.
> > > >
> > > > OK.
> > > >
> > > > > > As for "task_struct's rcu_read_lock_nesting": would it be enough to just
> > > > > > look at the preempt_count of the current process? If we have, for example,
> > > > > > nested rcu_read_locks:
> > > > > >
> > > > > > rcu_read_lock()
> > > > > > rcu_read_lock()
> > > > > > rcu_read_lock()
> > > > > >
> > > > > > the counter would be 3.
> > > > >
> > > > > No, because preempt_count is not incremented during rcu_read_lock(). RCU
> > > > > reader sections can be preempted; they just cannot go to sleep in a reader
> > > > > section (unless the kernel is RT).
> > > >
> > > > So in a CONFIG_PREEMPT kernel we can identify whether we are in atomic
> > > > context by using rcu_preempt_depth() and in_atomic(). When it comes to
> > > > !CONFIG_PREEMPT, we skip that and consider it as atomic. Something like:
> > > >
> > > > static bool is_current_in_atomic()
> > > > {
> > > > #ifdef CONFIG_PREEMPT_RCU
> > >
> > > If possible: if (IS_ENABLED(CONFIG_PREEMPT_RCU))
> > >
> > > Much nicer than #ifdef, and I -think- it should work in this case.
> > >
> > OK. Thank you, Paul!
> >
> > There is one point I would like to highlight: it is about making the caller
> > responsible for the atomic-or-not decision instead. It is similar to how
> > kmalloc() works: it does not really know the context it runs in, so it is
> > up to the caller to inform it.
> >
> > The same way:
> >
> > kvfree_rcu(p, atomic = true/false);
> >
> > in this case we could cover the !CONFIG_PREEMPT case also.
>
> Understood, but couldn't we instead use IS_ENABLED() to work out the
> actual situation at runtime and relieve the caller of this burden?
> Or am I missing a corner case?
>
Yes, we can do it at run time, i.e. detect the context type, atomic or not,
but only for a CONFIG_PREEMPT kernel. In a !CONFIG_PREEMPT configuration I do
not see a straightforward way to detect it, for example when the caller holds
a spinlock. Therefore, for such a configuration we can just consider it as
atomic, even though in reality it might not be.

We need this for the emergency/corner case and head-less objects, when we run
out of memory. In that case we should attach the rcu_head dynamically and
queue the freed object to be processed later on, after a grace period. In
atomic context use the GFP_ATOMIC flag, otherwise use GFP_KERNEL. It is
better to allocate with the GFP_KERNEL flag (if possible) because it has far
fewer restrictions than GFP_ATOMIC, i.e. GFP_KERNEL can sleep and wait until
memory is reclaimed.

But that is a corner case, and I agree that it would be good to avoid such
passing of extra info by the caller.
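To make the above a bit more concrete, here is a rough sketch of how the
runtime check, the dynamic rcu_head attach and the synchronize_rcu() fallback
could fit together. This is only an illustration, not the actual patch; the
names headless_free_rcu(), headless_in_atomic() and struct headless_wrapper
are made up for this example:

    /*
     * Illustrative sketch only. In atomic context (or when we cannot tell,
     * e.g. !CONFIG_PREEMPT), fall back to GFP_ATOMIC; otherwise prefer
     * GFP_KERNEL and, if the allocation fails, wait for a grace period
     * inline and free synchronously.
     */
    #include <linux/kernel.h>
    #include <linux/slab.h>
    #include <linux/rcupdate.h>
    #include <linux/preempt.h>

    struct headless_wrapper {
            struct rcu_head rcu;
            void *ptr;              /* head-less object to free after a GP */
    };

    static bool headless_in_atomic(void)
    {
            /* Runtime detection only works for preemptible RCU. */
            if (IS_ENABLED(CONFIG_PREEMPT_RCU))
                    return rcu_preempt_depth() || in_atomic();

            /* !CONFIG_PREEMPT: no reliable way to detect, assume atomic. */
            return true;
    }

    static void headless_wrapper_free(struct rcu_head *rcu)
    {
            struct headless_wrapper *w =
                    container_of(rcu, struct headless_wrapper, rcu);

            kfree(w->ptr);
            kfree(w);
    }

    static void headless_free_rcu(void *ptr)
    {
            bool atomic = headless_in_atomic();
            struct headless_wrapper *w;

            /* Attach an rcu_head dynamically and queue the object. */
            w = kmalloc(sizeof(*w), atomic ? GFP_ATOMIC : GFP_KERNEL);
            if (w) {
                    w->ptr = ptr;
                    call_rcu(&w->rcu, headless_wrapper_free);
                    return;
            }

            /*
             * Emergency path: allocation failed. If we are preemptible,
             * wait for a grace period inline and free synchronously.
             */
            if (!atomic) {
                    synchronize_rcu();
                    kfree(ptr);
            }
            /* In atomic context with no memory left there is no good option. */
    }

The emergency path above is exactly the corner case I mentioned: it only
exists for a preemptible caller, which is why !CONFIG_PREEMPT ends up being
treated as atomic.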
Anyway, I am just sharing some extra info :)

Thanks.

--
Vlad Rezki