From: paulmck@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, mingo@kernel.org,
  jiangshanlai@gmail.com, akpm@linux-foundation.org,
  mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
  tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
  dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
  oleg@redhat.com, joel@joelfernandes.org, mhocko@kernel.org,
  mgorman@techsingularity.net, torvalds@linux-foundation.org,
  "Uladzislau Rezki (Sony)", "Paul E . McKenney"
Subject: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
Date: Mon, 28 Sep 2020 16:31:01 -0700
Message-Id: <20200928233102.24265-14-paulmck@kernel.org>
X-Mailer: git-send-email 2.9.5
In-Reply-To: <20200928233041.GA23230@paulmck-ThinkPad-P72>
References: <20200928233041.GA23230@paulmck-ThinkPad-P72>

From: "Uladzislau Rezki (Sony)"

The current memory-allocation interface poses the following challenges:

a) In kernels built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep
   complains ("BUG: Invalid wait context"). This complaint is due to
   the memory allocator acquiring non-raw spinlocks while a raw
   spinlock is held. This problem can also arise if kvfree_rcu() is
   invoked while holding a raw spinlock.

b) In -rt kernels built with CONFIG_PREEMPT_RT, the situation
   described in (a) above results in an attempt to acquire a sleeping
   lock while holding a raw spinlock, which is of course forbidden.
   This can lead to "BUG: scheduling while atomic".

c) Note that call_rcu() may be invoked from raw atomic context, so
   kfree_rcu() and kvfree_rcu() are also expected to be callable from
   raw atomic context.

However, given that CONFIG_PREEMPT_COUNT is unconditionally enabled by
the earlier commits in this series, the preemptible() macro now
properly detects preempt-disabled code regions even in kernels built
with CONFIG_PREEMPT_NONE.

This commit therefore uses preemptible() to determine whether
allocation is possible at all for double-argument kvfree_rcu().
If !preemptible(), then allocation is not possible, and kvfree_rcu()
falls back to using the less cache-friendly rcu_head approach. Even
when preemptible(), the caller might be involved in reclaim, so the
GFP_ flags used by double-argument kvfree_rcu() must avoid invoking
reclaim processing.
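The allocation decision described above can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code. The SKETCH_GFP_* bits, struct alloc_plan and plan_bulk_alloc() are hypothetical stand-ins, and the result of preemptible() is passed in as a plain boolean. The flag choice mirrors the `gfp` expression that this patch adds to add_ptr_to_bulk_krc_lock(). Single-argument callers may sleep in the allocator. Double-argument callers get non-sleeping flags, and atomic callers skip the allocator entirely.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for gfp_t bits (NOT the kernel's real values,
 * which live in include/linux/gfp.h).
 */
#define SKETCH_GFP_KERNEL        (1u << 0) /* may sleep and enter reclaim */
#define SKETCH_GFP_RETRY_MAYFAIL (1u << 1) /* retry harder, but may fail */
#define SKETCH_GFP_ATOMIC        (1u << 2) /* never sleeps */
#define SKETCH_GFP_NOWARN        (1u << 3) /* suppress failure warnings */

struct alloc_plan {
	bool try_alloc;		/* false: use a cached page or the fallback path */
	unsigned int gfp;	/* meaningful only when try_alloc is true */
};

/*
 * preemptible: what preemptible() reports at the call site.
 * single_arg:  the kvfree_rcu(ptr) form, which must be called from a
 *              sleepable context.
 */
static struct alloc_plan plan_bulk_alloc(bool preemptible, bool single_arg)
{
	struct alloc_plan plan = { .try_alloc = preemptible, .gfp = 0 };

	if (!plan.try_alloc)
		return plan;	/* atomic context: stay out of the allocator */

	/* Same shape as the patch: (can_sleep ? A | B : C) | NOWARN. */
	plan.gfp = (single_arg ? SKETCH_GFP_KERNEL | SKETCH_GFP_RETRY_MAYFAIL
			       : SKETCH_GFP_ATOMIC) | SKETCH_GFP_NOWARN;

	/* Double-argument callers must never be handed sleeping flags. */
	assert(single_arg || !(plan.gfp & SKETCH_GFP_KERNEL));
	return plan;
}
```

For example, plan_bulk_alloc(false, false) returns a plan with try_alloc == false. In that case the caller falls back to its rcu_head, as the text above describes.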
Note that single-argument kvfree_rcu() must be invoked in sleepable
contexts, and that its fallback is the relatively high-latency
synchronize_rcu(). Single-argument kvfree_rcu() therefore uses
GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
memory allocator.

Link: https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/
Fixes: 3042f83f19be ("rcu: Support reclaim for head-less object")
Reported-by: Sebastian Andrzej Siewior
Signed-off-by: Uladzislau Rezki (Sony)
Signed-off-by: Paul E. McKenney
---
 kernel/rcu/tree.c | 70 ++++++++++++++++----------------------------------------
 1 file changed, 20 insertions(+), 50 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9..cc998d7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3166,7 +3166,7 @@ static void kfree_rcu_work(struct work_struct *work)
 			krc_this_cpu_unlock(krcp, flags);
 
 			if (bkvhead[i])
-				free_page((unsigned long) bkvhead[i]);
+				kfree(bkvhead[i]);
 
 			cond_resched_tasks_rcu_qs();
 		}
@@ -3291,43 +3291,28 @@ static void kfree_rcu_monitor(struct work_struct *work)
 }
 
 static inline bool
-kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
+add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
+	unsigned long *flags, void *ptr, bool can_sleep)
 {
 	struct kvfree_rcu_bulk_data *bnode;
+	bool can_alloc_page = preemptible();
+	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
 	int idx;
 
-	if (unlikely(!krcp->initialized))
+	*krcp = krc_this_cpu_lock(flags);
+	if (unlikely(!(*krcp)->initialized))
 		return false;
 
-	lockdep_assert_held(&krcp->lock);
 	idx = !!is_vmalloc_addr(ptr);
 
 	/* Check if a new block is required. */
-	if (!krcp->bkvhead[idx] ||
-			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
-		bnode = get_cached_bnode(krcp);
-		if (!bnode) {
-			/*
-			 * To keep this path working on raw non-preemptible
-			 * sections, prevent the optional entry into the
-			 * allocator as it uses sleeping locks. In fact, even
-			 * if the caller of kfree_rcu() is preemptible, this
-			 * path still is not, as krcp->lock is a raw spinlock.
-			 * With additional page pre-allocation in the works,
-			 * hitting this return is going to be much less likely.
-			 */
-			if (IS_ENABLED(CONFIG_PREEMPT_RT))
-				return false;
-
-			/*
-			 * NOTE: For one argument of kvfree_rcu() we can
-			 * drop the lock and get the page in sleepable
-			 * context. That would allow to maintain an array
-			 * for the CONFIG_PREEMPT_RT as well if no cached
-			 * pages are available.
-			 */
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+	if (!(*krcp)->bkvhead[idx] ||
+			(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = get_cached_bnode(*krcp);
+		if (!bnode && can_alloc_page) {
+			krc_this_cpu_unlock(*krcp, *flags);
+			bnode = kmalloc(PAGE_SIZE, gfp);
+			*krcp = krc_this_cpu_lock(flags);
 		}
 
 		/* Switch to emergency path. */
@@ -3336,15 +3321,15 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
-		bnode->next = krcp->bkvhead[idx];
+		bnode->next = (*krcp)->bkvhead[idx];
 
 		/* Attach it to the head. */
-		krcp->bkvhead[idx] = bnode;
+		(*krcp)->bkvhead[idx] = bnode;
 	}
 
 	/* Finally insert. */
-	krcp->bkvhead[idx]->records
-		[krcp->bkvhead[idx]->nr_records++] = ptr;
+	(*krcp)->bkvhead[idx]->records
+		[(*krcp)->bkvhead[idx]->nr_records++] = ptr;
 
 	return true;
 }
@@ -3382,24 +3367,20 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 		ptr = (unsigned long *) func;
 	}
 
-	krcp = krc_this_cpu_lock(&flags);
-
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
 
-		// Mark as success and leave.
-		success = true;
-		goto unlock_return;
+		return;
 	}
 
 	/*
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
 	 */
-	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
 	if (!success) {
 		if (head == NULL)
 			// Inline if kvfree_rcu(one_arg) call.
@@ -4394,23 +4375,12 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-		struct kvfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
 		}
 
-		for (i = 0; i < rcu_min_cached_objs; i++) {
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-
-			if (bnode)
-				put_cached_bnode(krcp, bnode);
-			else
-				pr_err("Failed to preallocate for %d CPU!\n", cpu);
-		}
-
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
 		krcp->initialized = true;
 	}
-- 
2.9.5
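The locking pattern in add_ptr_to_bulk_krc_lock() can be sketched in userspace. It takes the lock and tries the page cache. It then drops the lock around the allocator, reacquires it, and falls back if no block was obtained. This is an illustration under assumptions, not the kernel implementation:

- A pthread mutex stands in for the raw krcp->lock.
- malloc() stands in for kmalloc(PAGE_SIZE, gfp).
- struct bulk_state, struct bulk_block, SKETCH_BULK_MAX, take_cached() and add_ptr_locked() are hypothetical names.
- Per-CPU data and CPU migration are ignored. The kernel handles migration by reassigning *krcp from krc_this_cpu_lock() after the allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical capacity; the kernel sizes a block to fill one page. */
#define SKETCH_BULK_MAX 4

/* Hypothetical analogue of struct kvfree_rcu_bulk_data. */
struct bulk_block {
	struct bulk_block *next;
	int nr_records;
	void *records[SKETCH_BULK_MAX];
};

/* Hypothetical analogue of struct kfree_rcu_cpu (a single "CPU" only). */
struct bulk_state {
	pthread_mutex_t lock;		/* stands in for the raw krcp->lock */
	bool initialized;
	struct bulk_block *head;	/* block currently being filled */
	struct bulk_block *cache;	/* pre-allocated spare blocks */
};

/* Analogue of get_cached_bnode(): pop a spare block; caller holds lock. */
static struct bulk_block *take_cached(struct bulk_state *s)
{
	struct bulk_block *b = s->cache;

	if (b)
		s->cache = b->next;
	return b;
}

/*
 * Analogue of add_ptr_to_bulk_krc_lock(): acquires s->lock and returns
 * with it held on every path. A false return tells the caller to take
 * its fallback path (rcu_head or synchronize_rcu() in the kernel).
 */
static bool add_ptr_locked(struct bulk_state *s, void *ptr, bool can_alloc)
{
	struct bulk_block *b;

	pthread_mutex_lock(&s->lock);
	if (!s->initialized)
		return false;

	if (!s->head || s->head->nr_records == SKETCH_BULK_MAX) {
		b = take_cached(s);
		if (!b && can_alloc) {
			/* Keep the allocator out of the locked region. */
			pthread_mutex_unlock(&s->lock);
			b = malloc(sizeof(*b));
			pthread_mutex_lock(&s->lock);
		}
		if (!b)
			return false;	/* "emergency path" */

		b->nr_records = 0;
		b->next = s->head;
		s->head = b;
	}

	assert(s->head->nr_records < SKETCH_BULK_MAX);
	s->head->records[s->head->nr_records++] = ptr;
	return true;
}
```

As in the patch, a block obtained after the lock was dropped is attached even if another thread changed the head in the meantime. At worst that leaves a partially filled block behind in the chain; no pointer is lost. The caller releases the lock after either queuing the pointer or taking its fallback path.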