Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp566806pxa; Wed, 19 Aug 2020 08:56:03 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz4gRA+mpenyB4ybRSJNeSvdh9iaa9ujMWfg7PQ1WsovDM9fuHqxaH8uXwOQ1l8fdKaENDD X-Received: by 2002:a17:906:b213:: with SMTP id p19mr25442932ejz.39.1597852562922; Wed, 19 Aug 2020 08:56:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1597852562; cv=none; d=google.com; s=arc-20160816; b=txprPAKeLqnjKAM5YQ6oeWTjXigdQ1jrHH0prxM7mKncaA7PRrAzGK/yz2+UKtWR8t r7ZnXEy9dbNFqpmlKGjtVapv1RBCC8jxDMADQ3Crbueu9puLrZAhEx/U1ozlEkNr8Uw1 VGJoKNbYs5YwmoGFYkEdMSSzYXZnYDpJeudWZ+G6V3rPmUpxu25gSub5LypRtyiWaPyX 5rfPJAt3VhV3DHddwtIM+WS3AxMU65Ei8SDC98pzEDmt7dOA43P1pPcOuz3pxvp76E1u QCRgz0371E7waGpzjA0ub5uBQmJPC4Aaq3s/ELL4XJccmvR4sFRFQG364zr3A5ZU+GqJ fpFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Q60onHid8iSpku+qZJQw6fYbSE4YurtiDVuXIPIGP8k=; b=LtAgPahSuyXwFHhKPvS9jMbLfTvEG1ALkGe5Z8lAqE8RFfm8cP2dwYMtAfAPdJv6Mu T0zRSvqu5G1j/tb5XTRcepj6wEOuKajZ8VNveOQ8dWFUoElCzLgPaSr3W2v5x/J8VgU8 it9RHovb1BPYF9yaN7en+OKPR52Oe5nzZE7nhEW3Ylm8Ry1zoWJr6CbMDE6NYExap0xL ecetfSZHf6RonAnRjAnbFCymkyTt83Gt7PC1XofXxp6ffxYuKgVVwM+vAOXEeShI6+B7 kR2xmoK+L3GcDJcL5+eO1I9K4LGSYr9KsQZlOMGhfNfpLg8wkoor2YH5+Wd7QohjQfoi Dy9Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=SlHEI38y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dk14si15694499edb.61.2020.08.19.08.55.37; Wed, 19 Aug 2020 08:56:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=SlHEI38y; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728774AbgHSPyt (ORCPT + 99 others); Wed, 19 Aug 2020 11:54:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728756AbgHSPyp (ORCPT ); Wed, 19 Aug 2020 11:54:45 -0400 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 797A8C061757 for ; Wed, 19 Aug 2020 08:54:45 -0700 (PDT) Received: by mail-qt1-x841.google.com with SMTP id d27so18127302qtg.4 for ; Wed, 19 Aug 2020 08:54:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=Q60onHid8iSpku+qZJQw6fYbSE4YurtiDVuXIPIGP8k=; b=SlHEI38yuJz9o8rfXLjxWgFh5evie2N52TntNYjdNI3QKdNjfV2hrS0H0D9Apq/91I jyU3Yf/xQyrDL1YgD4ykelSilkw501q0jfvs3neauYK+hg/jz9p1m7143P2DBmmybcUp gouVlF0T+Kbyl2phbd72SkVbJa6tPK7daI30w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=Q60onHid8iSpku+qZJQw6fYbSE4YurtiDVuXIPIGP8k=; b=h3L+ihYQulxf81Rns+6SOwRSnaOxGQLZkIG+0Y8T1LH4BayCbpZWdiHOvPAtqNNRZj inn0UMghRgeDEAlXg7IEdA4JFgd3hkH3NXcisdc3HzRCbbuOKyrZOP3kCLbLInXSs+Vy rlmCKai73qjNjxnoBd3H6Y34KfXizG15HFYpB/1tM+wmcdMpMYOgSkTj/FIfyC/oZeH5 fkPfoqZZpyfLeNAw8jke1wyQFy7DqrqQLdeyvMZdywXBKNjlYkFcbuBPklXffzx3uz6p MMxQ1KizQoc3BpuXkGQk249CuBetpjXcnIqatNI7DaUbJG2mRMNtVdPchOys35ursJnL pjXA== X-Gm-Message-State: AOAM532F1VUfMhoJzog7uNkWIIZ+D9pRz8Rq5Yb239XNfgNE5unyyW3I m1/ZgIDnWiQ8sDkRK7ZOWJM7vg== X-Received: by 2002:ac8:24d9:: with SMTP id t25mr23509783qtt.15.1597852481203; Wed, 19 Aug 2020 08:54:41 -0700 (PDT) Received: from localhost ([2620:15c:6:12:cad3:ffff:feb3:bd59]) by smtp.gmail.com with ESMTPSA id f39sm30015550qta.59.2020.08.19.08.54.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Aug 2020 08:54:40 -0700 (PDT) Date: Wed, 19 Aug 2020 11:54:40 -0400 From: Joel Fernandes To: "Paul E. McKenney" , boqun.feng@gmail.com Cc: "Zhang, Qiang" , Uladzislau Rezki , Josh Triplett , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , rcu , LKML Subject: Re: =?utf-8?B?5Zue5aSN?= =?utf-8?Q?=3A?= [PATCH] rcu: shrink each possible cpu krcp Message-ID: <20200819155440.GD3886445@google.com> References: <20200818171807.GI27891@paulmck-ThinkPad-P72> <20200818210355.GM27891@paulmck-ThinkPad-P72> <20200818215511.GA2538@pc636> <20200818220245.GO27891@paulmck-ThinkPad-P72> <20200819135654.GB3875610@google.com> <20200819152159.GX27891@paulmck-ThinkPad-P72> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20200819152159.GX27891@paulmck-ThinkPad-P72> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote: > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote: > > > > > > > > > ________________________________________ > > > 发件人: linux-kernel-owner@vger.kernel.org 代表 Joel Fernandes > > > 发送时间: 2020年8月19日 8:04 > > > 收件人: Paul E. McKenney > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney wrote: > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > > { > > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */ > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > > return 0; > > > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > > > + krcp = this_cpu_ptr(&krc) > > > > > + schedule_delayed_work(&krcp->monitor_work, 0); > > > > > + > > > > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved. > > > > > Maybe be never. > > > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > > > hard time getting too worried about it. ;-) > > > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > > > >thanks, > > > > > > > >- Joel > > > > > > When cpu going offline, the slub or slab only flush free objects in offline > > > cpu cache, put these free objects in node list or return buddy system, > > > for those who are still in use, they still stay offline cpu cache. > > > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be > > > called in "cpuhp/%u" per-cpu thread. > > > > > > > Could you please wrap text properly when you post to mailing list, thanks. I > > fixed it for you above. > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > > unsigned long flags; > > > struct rcu_data *rdp; > > > struct rcu_node *rnp; > > > + struct kfree_rcu_cpu *krcp; > > > > > > rdp = per_cpu_ptr(&rcu_data, cpu); > > > rnp = rdp->mynode; > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > > // nohz_full CPUs need the tick for stop-machine to work quickly > > > tick_dep_set(TICK_DEP_BIT_RCU); > > > + > > > + krcp = per_cpu_ptr(&krc, cpu); > > > + raw_spin_lock_irqsave(&krcp->lock, flags); > > > + schedule_delayed_work(&krcp->monitor_work, 0); > > > + raw_spin_unlock_irqrestore(&krcp->lock, flags); > > > return 0; > > > > I realized the above is not good enough for what this is trying to do. Unlike > > the slab, the new kfree_rcu objects cannot always be drained / submitted to > > RCU because the previous batch may still be waiting for a grace period. So > > the above code could very well return with the yet-to-be-submitted kfree_rcu > > objects still in the cache. > > > > One option is to spin-wait here for monitor_todo to be false and keep calling > > kfree_rcu_drain_unlock() till then. > > > > But then that's not good enough either, because if new objects are queued > > when interrupts are enabled in the CPU offline path, then the cache will get > > new objects after the previous set was drained. Further, spin waiting may > > introduce deadlocks. > > > > Another option is to switch the kfree_rcu() path to non-batching (so new > > objects cannot be cached in the offline path and are submitted directly to > > RCU), wait for a GP and then submit the work. But then not sure if 1-argument > > kfree_rcu() will like that. > > Or spawn a workqueue that does something like this: > > 1. Get any pending kvfree_rcu() requests sent off to RCU. > > 2. Do an rcu_barrier(). > > 3. Do the cleanup actions. One more thing I'n not sure about is if this will even be an issue because the delayed-work to run the monitor could be migrated to another (both timers and wq get migrated). So in that case, the draining of the kfree_rcu objects will happened on another CPU. Adding Boqun as well since I believe he knows a lot about wq and hotplug as well. Basically my question is does delayed_work get migrated to another CPU when one CPU is offline? If so, I am assuming all races between new schedule_delayed_work invocations and CPU offlining are handled as well, such that at some point those new invocations are queued on another CPU. > > Probably Qian's original fix for for_each_possible_cpus() is good enough for > > the shrinker case, and then we can tackle the hotplug one. > > It might take some experimentation to find the best solution. Sure. thanks, - Joel