Date: Thu, 20 Jul 2017 18:15:45 +0200
From: Andrea Arcangeli
To: Xishi Qiu
Cc: Vlastimil Babka, "'Kirill A . Shutemov'", zhong jiang, Hugh Dickins,
    Andrew Morton, Tejun Heo, Michal Hocko, Johannes Weiner, Mel Gorman,
    Michal Hocko, Minchan Kim, David Rientjes, Joonsoo Kim,
    sumeet.keswani@hpe.com, Rik van Riel, Linux MM, LKML
Subject: Re: mm, something wrong in page_lock_anon_vma_read()?
Message-ID: <20170720161545.GD29716@redhat.com>
In-Reply-To: <20170720125835.GC29716@redhat.com>
References: <591FB173.4020409@huawei.com> <5923FF31.5020801@huawei.com>
    <593954BD.9060703@huawei.com> <596DEA07.5000009@huawei.com>
    <24bd80c6-1bb7-c8b8-2acf-b91e5e10dbb1@suse.cz>
    <596F2D65.8020902@huawei.com> <20170720125835.GC29716@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 20, 2017 at 02:58:35PM +0200, Andrea Arcangeli wrote:
> but if zap_pte in a fremap fails to drop the anon page that was under
> memory migration/compaction the exact same thing will happen. Either

... except it is ok to clear a migration entry: it is migration that
will free the new page after migration completes, so zap_pte doesn't
need to wait.

So this fix is good, but I was too optimistic about its ability to
explain the whole problem. It can only explain cosmetic Rss errors, not
an anon page left hanging around after its anon vma has been freed.

As for the theory that this could be THP related: the Rss stats being
off by one, as a symptom of the bug, doesn't seem to point in that
direction. Complex THP operations either don't touch the rss or tend to
act in blocks of 512. Furthermore, the BZ already confirmed it can be
reproduced with THP disabled.

That said, it also was supposedly already fixed by the various patches
you manually backported to your build.

I believe for fairness (mailing list traffic etc.) it'd be preferable
to continue the debugging in the open BZ and not here, because you
haven't reproduced it on an upstream kernel yet, so we cannot be 100%
sure whether upstream (if only -stable) could reproduce it.

Thanks,
Andrea