Message-ID: <76dbc397e21d64da75cd07d90b3ca15ca50d6fbb.camel@surriel.com>
Subject: Re: [PATCH v3] sched/core: Don't use dying mm as active_mm of kthreads
From: Rik van Riel
To: Waiman Long, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Phil Auld, Michal Hocko
Date: Wed, 31 Jul 2019 11:07:26 -0400
In-Reply-To: <01125822-c883-18ce-42e4-942a4f28c128@redhat.com>
References: <20190729210728.21634-1-longman@redhat.com> <3e2ff4c9-c51f-8512-5051-5841131f4acb@redhat.com> <8021be4426fdafdce83517194112f43009fb9f6d.camel@surriel.com> <01125822-c883-18ce-42e4-942a4f28c128@redhat.com>

On Wed, 2019-07-31 at 10:15 -0400, Waiman Long wrote:
> On 7/31/19 9:48 AM, Rik van Riel wrote:
> > On Tue, 2019-07-30 at 17:01 -0400, Waiman Long wrote:
> > > On 7/29/19 8:26 PM, Rik van Riel wrote:
> > > > On Mon, 2019-07-29 at 17:42 -0400, Waiman Long wrote:
> > > >
> > > > > What I have found is that a long running process on a mostly
> > > > > idle system with many CPUs is likely to cycle through a lot of
> > > > > the CPUs during its lifetime and leave behind its mm in the
> > > > > active_mm of those CPUs. My 2-socket test system has 96
> > > > > logical CPUs. After running the test program for a minute or
> > > > > so, it leaves behind its mm in about half of the CPUs with an
> > > > > mm_count of 45 after exit. So the dying mm will stay until all
> > > > > those 45 CPUs get new user tasks to run.
> > > > OK. On what kernel are you seeing this?
> > > >
> > > > On current upstream, the code in native_flush_tlb_others()
> > > > will send a TLB flush to every CPU in mm_cpumask() if page
> > > > table pages have been freed.
> > > >
> > > > That should cause the lazy TLB CPUs to switch to init_mm
> > > > when the exit->zap_page_range path gets to the point where
> > > > it frees page tables.
> > > >
> > > I was using the latest upstream 5.3-rc2 kernel. It may be the
> > > case that the mm has been switched, but the mm_count field of the
> > > active_mm of the kthread is not being decremented until a user
> > > task runs on a CPU.
> > Is that something we could fix from the TLB flushing
> > code?
> >
> > When switching to init_mm, drop the refcount on the
> > lazy mm?
> >
> > That way that overhead is not added to the context
> > switching code.
>
> I have thought about that. That will require changing the active_mm
> of the current task to point to init_mm, for example. Since TLB flush
> is done in interrupt context, proper coordination between interrupt
> and process context will require some atomic instruction which will
> defeat the purpose.

Would it be possible to work around that by scheduling
a work item that drops the active_mm?

After all, a work item runs in a kernel thread, so by
the time the work item is run, either the kernel will
still be running the mm you want to get rid of as
active_mm, or it will have already gotten rid of it
earlier.

-- 
All Rights Reversed.
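
To make the work item idea a bit more concrete, here is a toy userspace
model of it. This is not kernel code: every toy_* name is made up for
illustration, init_mm is a plain placeholder object that is not
refcounted, and nothing here calls the real mmgrab()/mmdrop() or the
workqueue API. It only models the bookkeeping: the "interrupt" side
just marks work as pending, and the "work item" side re-checks
active_mm before dropping the reference, which covers both cases above.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 96

/* Toy stand-in for struct mm_struct; only the refcount matters here. */
struct toy_mm {
	int mm_count;	/* references keeping the structure alive */
	bool freed;	/* set once the last reference is dropped */
};

/* Toy per-CPU state for a kernel thread borrowing an mm lazily. */
struct toy_cpu {
	struct toy_mm *active_mm;
	bool drop_work_pending;
};

static struct toy_mm toy_init_mm = { .mm_count = 1 };
static struct toy_cpu toy_cpus[TOY_NR_CPUS];

static void toy_mmdrop(struct toy_mm *mm)
{
	if (--mm->mm_count == 0)
		mm->freed = true;
}

/*
 * The owner holds one reference, nr_lazy CPUs each take a lazy one,
 * then the owner exits. Returns the mm_count left behind.
 */
static int toy_setup(struct toy_mm *mm, int nr_lazy)
{
	int cpu;

	if (nr_lazy > TOY_NR_CPUS)
		nr_lazy = TOY_NR_CPUS;
	mm->mm_count = 1;
	mm->freed = false;
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		toy_cpus[cpu].drop_work_pending = false;
		toy_cpus[cpu].active_mm = NULL;
		if (cpu < nr_lazy) {
			mm->mm_count++;
			toy_cpus[cpu].active_mm = mm;
		}
	}
	toy_mmdrop(mm);		/* the exiting owner drops its reference */
	return mm->mm_count;
}

/* "Interrupt context": only mark work as pending, touch no refcount. */
static void toy_flush_all(struct toy_mm *dying)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpus[cpu].active_mm == dying)
			toy_cpus[cpu].drop_work_pending = true;
}

/* A user task gets scheduled on this CPU; the lazy reference goes away. */
static void toy_switch_to_user(int cpu)
{
	if (toy_cpus[cpu].active_mm)
		toy_mmdrop(toy_cpus[cpu].active_mm);
	toy_cpus[cpu].active_mm = NULL;
}

/*
 * "Work item", run later in process context on each CPU. Returns how
 * many lazy references it dropped.
 */
static int toy_run_work_all(struct toy_mm *dying)
{
	int cpu, dropped = 0;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!toy_cpus[cpu].drop_work_pending)
			continue;
		toy_cpus[cpu].drop_work_pending = false;
		if (toy_cpus[cpu].active_mm != dying)
			continue;	/* already gotten rid of earlier */
		toy_cpus[cpu].active_mm = &toy_init_mm;
		toy_mmdrop(dying);
		dropped++;
	}
	return dropped;
}
```

With 45 lazy CPUs, toy_setup() leaves mm_count at 45, toy_flush_all()
changes no refcount, and toy_run_work_all() drops whatever references
are still held. A CPU that picked up a user task in between is simply
skipped by the re-check. Whether the real thing can get away without
extra atomics is exactly the open question, and this model does not
answer it.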