Subject: Re: [PATCH v3] sched/core: Don't use dying mm as active_mm of kthreads
From: Rik van Riel
To: Waiman Long, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Phil Auld, Michal Hocko
Date: Mon, 29 Jul 2019 17:21:07 -0400
In-Reply-To: <20190729210728.21634-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2019-07-29 at 17:07 -0400, Waiman Long wrote:
> It was found that a dying mm_struct where the
> owning task has exited can stay on as active_mm of kernel threads as
> long as no other user tasks run on those CPUs that use it as
> active_mm. This prolongs the lifetime of the dying mm, holding up some
> resources that cannot be freed on a mostly idle system.

On what kernels does this happen?

Don't we explicitly flush all lazy TLB CPUs at exit time, when we are
about to free page tables?

Does this happen only on the CPU where the task in question is
exiting, or also on other CPUs?

If it is only on the CPU where the task is exiting, would the
TASK_DEAD handling in finish_task_switch() be a better place to
handle this?

> Fix that by forcing the kernel threads to use init_mm as the
> active_mm during a kernel thread to kernel thread transition if the
> previous active_mm is dying (!mm_users). This will allow the freeing
> of resources associated with the dying mm ASAP.
>
> The presence of a kernel-to-kernel thread transition indicates that
> the CPU is probably idling with no higher priority user task to run.
> So the overhead of loading the mm_users cacheline should not really
> matter in this case.
>
> My testing on an x86 system showed that the mm_struct was freed
> within seconds after the task exited instead of staying alive for
> minutes or even longer on a mostly idle system before this patch.
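The "dying (!mm_users)" state above relies on mm_struct carrying two reference counts. Below is a minimal userspace C sketch of that model, not kernel code: the names mm_users, mm_count, mmgrab(), mmdrop() and mmput() follow the kernel, but struct mm_model, mm_init() and lazy_borrow_then_exit() are hypothetical, plain ints stand in for atomic_t, and boolean flags stand in for exit_mmap() and __mmdrop(). It illustrates how a lazy active_mm reference taken with mmgrab() keeps the struct allocated after the last user has called mmput().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the two mm_struct reference counts.  Names follow
 * the kernel, but this is a simplified sketch: plain ints replace
 * atomic_t, and flags stand in for the real teardown work.
 */
struct mm_model {
	int mm_users;        /* users of the address space (mmget/mmput) */
	int mm_count;        /* references to the struct (mmgrab/mmdrop) */
	bool mappings_freed; /* stands in for exit_mmap() having run */
	bool struct_freed;   /* stands in for __mmdrop() having run */
};

static void mm_init(struct mm_model *mm)
{
	/* One user task; all users together share one mm_count reference. */
	mm->mm_users = 1;
	mm->mm_count = 1;
	mm->mappings_freed = false;
	mm->struct_freed = false;
}

static void mmgrab(struct mm_model *mm)
{
	mm->mm_count++;
}

static void mmdrop(struct mm_model *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0)
		mm->struct_freed = true;
}

static void mmput(struct mm_model *mm)
{
	assert(mm->mm_users > 0);
	if (--mm->mm_users == 0) {
		mm->mappings_freed = true; /* address space torn down */
		mmdrop(mm);                /* drop the users' shared reference */
	}
}

/*
 * Hypothetical scenario: some context lazily borrows the task's mm as
 * its active_mm (mmgrab()), then the owning task exits (mmput()).
 * Returns the state observed right after the exit.
 */
static struct mm_model lazy_borrow_then_exit(void)
{
	struct mm_model mm;

	mm_init(&mm);
	mmgrab(&mm); /* lazy user: active_mm = mm */
	mmput(&mm);  /* owning task exits: mm_users drops to 0 */
	return mm;
}
```

Right after lazy_borrow_then_exit() the model has mm_users == 0 and mm_count == 1: the address space is gone, but the struct stays allocated until whoever holds the lazy reference calls mmdrop().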
>
> Signed-off-by: Waiman Long
> ---
>  kernel/sched/core.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 795077af4f1a..41997e676251 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3214,6 +3214,8 @@ static __always_inline struct rq *
>  context_switch(struct rq *rq, struct task_struct *prev,
>  	       struct task_struct *next, struct rq_flags *rf)
>  {
> +	struct mm_struct *next_mm = next->mm;
> +
>  	prepare_task_switch(rq, prev, next);
>  
>  	/*
> @@ -3229,8 +3231,22 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  	 *
>  	 * kernel -> user switch + mmdrop() active
>  	 * user -> user switch
> +	 *
> +	 * kernel -> kernel and !prev->active_mm->mm_users:
> +	 * switch to init_mm + mmgrab() + mmdrop()
>  	 */
> -	if (!next->mm) { // to kernel
> +	if (!next_mm) { // to kernel
> +		/*
> +		 * Checking is only done on kernel -> kernel transition
> +		 * to avoid any performance overhead while user tasks
> +		 * are running.
> +		 */
> +		if (unlikely(!prev->mm &&
> +			     !atomic_read(&prev->active_mm->mm_users))) {
> +			next_mm = next->active_mm = &init_mm;
> +			mmgrab(next_mm);
> +			goto mm_switch;
> +		}
>  		enter_lazy_tlb(prev->active_mm, next);
>  
>  		next->active_mm = prev->active_mm;
> @@ -3239,6 +3255,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  		else
>  			prev->active_mm = NULL;
>  	} else { // to user
> +mm_switch:
>  		/*
>  		 * sys_membarrier() requires an smp_mb() between setting
>  		 * rq->curr and returning to userspace.
> @@ -3248,7 +3265,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  		 * finish_task_switch()'s mmdrop().
>  		 */
>  
> -		switch_mm_irqs_off(prev->active_mm, next->mm, next);
> +		switch_mm_irqs_off(prev->active_mm, next_mm, next);
>  
>  		if (!prev->mm) { // from kernel
>  			/* will mmdrop() in finish_task_switch(). */

-- 
All Rights Reversed.
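The mm handoff in the patched context_switch() can be modelled in userspace roughly as follows. This is a simplified sketch, not kernel code: struct mm, struct task, model_switch() and dying_mm_count_after_kthread_switch() are hypothetical names, enter_lazy_tlb() and switch_mm_irqs_off() are omitted, and the mmdrop() of rq->prev_mm that finish_task_switch() performs is folded into the same function. With patched set to false, the dying mm's reference is simply passed from one kernel thread to the next; with patched set to true, the reference is dropped and init_mm is grabbed instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins; names mirror the kernel, but this is a userspace sketch. */
struct mm {
	int mm_users; /* users of the address space */
	int mm_count; /* references to the struct itself */
};

struct task {
	struct mm *mm;        /* NULL for a kernel thread */
	struct mm *active_mm; /* mm whose page tables are loaded while it runs */
};

static struct mm init_mm = { .mm_users = 1, .mm_count = 1 };

static void mmgrab(struct mm *mm)
{
	mm->mm_count++;
}

static void mmdrop(struct mm *mm)
{
	assert(mm->mm_count > 0);
	mm->mm_count--; /* the real mmdrop() frees the mm when this hits 0 */
}

/*
 * Simplified model of the mm handling in context_switch() plus the
 * mmdrop() of rq->prev_mm done later by finish_task_switch().
 */
static void model_switch(struct task *prev, struct task *next, bool patched)
{
	struct mm *next_mm = next->mm;
	struct mm *prev_mm = NULL; /* stands in for rq->prev_mm */

	if (!next_mm) { /* to kernel */
		if (patched && !prev->mm && prev->active_mm->mm_users == 0) {
			/* the patch: dying mm on kernel -> kernel, use init_mm */
			next_mm = next->active_mm = &init_mm;
			mmgrab(next_mm);
			goto mm_switch;
		}
		/* lazy: borrow prev's active_mm */
		next->active_mm = prev->active_mm;
		if (prev->mm) /* from user */
			mmgrab(prev->active_mm);
		else
			prev->active_mm = NULL; /* reference is transferred */
	} else { /* to user */
mm_switch:
		/* switch_mm_irqs_off(prev->active_mm, next_mm, next) omitted */
		if (!prev->mm) { /* from kernel */
			prev_mm = prev->active_mm;
			prev->active_mm = NULL;
		}
	}

	/* finish_task_switch(): drop the reference handed over via prev_mm */
	if (prev_mm)
		mmdrop(prev_mm);
}

/*
 * Hypothetical scenario: kernel thread k1 lazily holds the only remaining
 * reference to a dying mm (mm_users == 0) and switches to kernel thread
 * k2.  Returns the dying mm's mm_count after the switch.
 */
static int dying_mm_count_after_kthread_switch(bool patched)
{
	struct mm dying = { .mm_users = 0, .mm_count = 1 };
	struct task k1 = { .mm = NULL, .active_mm = &dying };
	struct task k2 = { .mm = NULL, .active_mm = NULL };

	model_switch(&k1, &k2, patched);
	return dying.mm_count;
}
```

In this model, dying_mm_count_after_kthread_switch(false) returns 1 (the dying mm stays pinned for as long as only kernel threads run), while dying_mm_count_after_kthread_switch(true) returns 0 and takes one extra reference on init_mm. The goto into the "to user" branch mirrors the patch's mm_switch label.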