Subject: Re: mm: pages are not freed from lru_add_pvecs after process termination
From: Dave Hansen
To: "Odzioba, Lukasz", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org"
Cc: "Shutemov, Kirill", "Anaczkowski, Lukasz"
Date: Wed, 27 Apr 2016 10:11:04 -0700
Message-ID: <5720F2A8.6070406@intel.com>

On 04/27/2016 10:01 AM, Odzioba, Lukasz wrote:
> Pieces of the puzzle:
> A) after process termination memory is not getting freed nor
>    accounted as free

I don't think this part is necessarily a bug.  As long as we have
stats *somewhere*, and we really do "reclaim" them, I don't think we
need to call these pages "free".

> I am not sure whether it is expected behavior or a side effect of
> something else not going as it should.  Temporarily I added
> lru_add_drain_all() to try_to_free_pages(), which sort of hammers
> case B, but A is still present.

It's not expected behavior.  It's an unanticipated side effect of
large numbers of cpu threads, large pages on the LRU, and (relatively)
small zones.

> I am not familiar with this code, but I feel like draining the
> lru_add work should be split into smaller pieces and done by kswapd
> to fix A, and try_to_free_pages() should drain only as many pages as
> needed to fix B.
>
> Any comments/ideas/patches for a proper fix are welcome.

Here are my suggestions.  I've passed these along multiple times, but
I guess I'll repeat them again for good measure.

> 1. We need some statistics on the number and total *SIZES* of all
>    pages in the lru pagevecs.  It's too opaque now.
> 2. We need to make darn sure we drain the lru pagevecs before
>    failing any kind of allocation.
> 3. We need some way to drain the lru pagevecs directly.  Maybe the
>    buddy pcp lists too.
> 4. We need to make sure that a zone_reclaim_mode=0 system still
>    drains too.
> 5. The VM stats and their updates are now tied to how often
>    drain_zone_pages() gets run.  That might be interacting here too.

6. Perhaps don't use the LRU pagevecs for large pages at all.  That
   would limit the severity of the problem; a sketch of the idea is
   below.
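Something like this (a minimal, untested sketch against the current
mm/swap.c, assuming the per-cpu pagevec there is still named
"lru_add_pvec") would keep compound pages from ever sitting in the
per-cpu caches:

	static void __lru_cache_add(struct page *page)
	{
		struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

		get_page(page);
		/*
		 * Drain when the pagevec fills up, *or* whenever a
		 * compound page arrives.  A full pagevec of 2MB pages
		 * pins 14*2MB = 28MB per cpu per pagevec, which is
		 * where the "lost" memory in case A hides.
		 */
		if (!pagevec_add(pvec, page) || PageCompound(page))
			__pagevec_lru_add(pvec);
		put_cpu_var(lru_add_pvec);
	}

That only covers the lru_add path, not the deactivate pagevecs, but
base-page behavior is unchanged and the worst of the per-cpu pinning
goes away.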