Subject: Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
To: Matthew Wilcox
Cc: linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com, brouer@redhat.com, linux-mm@kvack.org
References: <1513705948-31072-1-git-send-email-rao.shoaib@oracle.com> <20171219193039.GB6515@bombadil.infradead.org>
From: Rao Shoaib
Message-ID: <24c9f1c0-58d4-5d27-8795-d211693455dd@oracle.com>
Date: Tue, 19 Dec 2017 11:56:30 -0800
In-Reply-To: <20171219193039.GB6515@bombadil.infradead.org>

On 12/19/2017 11:30 AM, Matthew Wilcox wrote:
> On Tue, Dec 19, 2017 at 09:52:27AM -0800, rao.shoaib@oracle.com wrote:
>> @@ -129,6 +130,7 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr,
>>
>>  	for (i = 0; i < nr; i++) {
>>  		void *x = p[i] = kmem_cache_alloc(s, flags);
>> +
>>  		if (!x) {
>>  			__kmem_cache_free_bulk(s, i, p);
>>  			return 0;
> Don't mix whitespace changes with significant patches.
OK.
>
>> +/* Main RCU function that is called to free RCU structures */
>> +static void
>> +__rcu_bulk_free(struct rcu_head *head, rcu_callback_t func, int cpu, bool lazy)
>> +{
>> +	unsigned long offset;
>> +	void *ptr;
>> +	struct rcu_bulk_free *rbf;
>> +	struct rcu_bulk_free_container *rbfc = NULL;
>> +
>> +	rbf = this_cpu_ptr(&cpu_rbf);
>> +
>> +	if (unlikely(!rbf->rbf_init)) {
>> +		spin_lock_init(&rbf->rbf_lock);
>> +		rbf->rbf_cpu = smp_processor_id();
>> +		rbf->rbf_init = true;
>> +	}
>> +
>> +	/* hold lock to protect against other cpu's */
>> +	spin_lock_bh(&rbf->rbf_lock);
> Are you sure we can't call kfree_rcu() from interrupt context?
I thought about it, but the interrupts are off due to acquiring the lock. No?
>
>> +	rbfc = rbf->rbf_container;
>> +	rbfc->rbfc_entries = 0;
>> +
>> +	if (rbf->rbf_list_head != NULL)
>> +		__rcu_bulk_schedule_list(rbf);
> You've broken RCU.  Consider this scenario:
>
> Thread 1        Thread 2            Thread 3
> kfree_rcu(a)
>                 schedule()
> schedule()
>                 gets pointer to b
> kfree_rcu(b)
>                                     processes rcu callbacks
>                 uses b
>
> Thread 3 will free a and also free b, so thread 2 is going to use freed
> memory and go splat.  You can't batch up memory to be freed without
> taking into account the grace periods.
The code does not change the grace period at all. In fact, it adds to the
grace period. The frees are accumulated in an array; when a certain
limit/time is reached, the frees are submitted to RCU for freeing. So the
grace period is maintained, starting from the time of the last free. In
case the memory allocation fails, the code uses a list that is also
submitted to RCU for freeing.
>
> It might make sense for RCU to batch up all the memory it's going to free
> in a single grace period, and hand it all off to slub at once, but that's
> not what you've done here.
I am kind of doing that, but on a per-CPU basis rather than on a
per-grace-period basis.
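
To make the ordering under discussion concrete, here is a minimal userspace
sketch of "accumulate, then submit, then free only after a grace period that
began after the last queued free". It is not the patch code. Everything in it
is a hypothetical stand-in: gp_completed models a count of finished grace
periods, and struct batch, batch_add(), batch_submit() and batch_try_free()
are made-up names. The "+ 2" assumes a grace period may already be in
progress when the batch is submitted, so that one cannot be counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_MAX 4	/* hypothetical batch limit */

/* Hypothetical stand-in for RCU state: number of completed grace periods. */
static unsigned long gp_completed;

struct batch {
	void *ptrs[BATCH_MAX];
	size_t n;
	bool submitted;
	unsigned long submit_gp;	/* gp_completed at submission time */
};

/* Queue a pointer. Fails once the batch is full or already submitted,
 * so no pointer can join a batch whose grace period is being waited on. */
static bool batch_add(struct batch *b, void *p)
{
	if (b->submitted || b->n == BATCH_MAX)
		return false;
	b->ptrs[b->n++] = p;
	return true;
}

/* Hand the batch over: the wait is measured from this point, i.e. from
 * after the last pointer was queued. */
static void batch_submit(struct batch *b)
{
	assert(!b->submitted);
	b->submit_gp = gp_completed;
	b->submitted = true;
}

/* Free the batch only if a full grace period has elapsed since submit.
 * Returns the number of objects freed (0 if it is not yet safe). */
static size_t batch_try_free(struct batch *b)
{
	size_t freed, i;

	if (!b->submitted || gp_completed < b->submit_gp + 2)
		return 0;
	for (i = 0; i < b->n; i++)
		free(b->ptrs[i]);
	freed = b->n;
	b->n = 0;
	b->submitted = false;
	return freed;
}
```

The property that matters in this sketch is that batch_add() refuses new
pointers after batch_submit(); the scenario quoted above corresponds to a
pointer being added to a batch whose wait has already started.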
>
>
> I've been doing a lot of thinking about this because I really want a
> way to kfree_rcu() an object without embedding a struct rcu_head in it.
> But I see no way to do that today; even if we have an external memory
> allocation to point to the object to be freed, we have to keep track of
> the grace periods.
I am not sure I understand. If you had external memory, you could easily do
that. I am doing exactly that; the only reason the RCU structure is needed
is to get the pointer to the object being freed.

Shoaib