Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
From: Laurent Dufour
To: Michal Hocko, Andi Kleen
Cc: Naoya Horiguchi, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, Johannes Weiner, Vladimir Davydov
Date: Fri, 28 Apr 2017 11:17:34 +0200
Message-Id: <3eb86373-dafc-6db9-82cd-84eb9e8b0d37@linux.vnet.ibm.com>
In-Reply-To: <20170428073136.GE8143@dhcp22.suse.cz>
References: <1493130472-22843-1-git-send-email-ldufour@linux.vnet.ibm.com>
    <1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com>
    <20170427143721.GK4706@dhcp22.suse.cz>
    <87pofxk20k.fsf@firstfloor.org>
    <20170428060755.GA8143@dhcp22.suse.cz>
    <20170428073136.GE8143@dhcp22.suse.cz>

On 28/04/2017 09:31, Michal Hocko wrote:
> [CC Johannes and Vladimir - the patch is
> http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]
>
> On Fri 28-04-17 08:07:55, Michal Hocko wrote:
>> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
>>> Michal Hocko writes:
>>>
>>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>>>>> When page are poisoned, they should be uncharged from the root memory
>>>>> cgroup.
>>>>>
>>>>> This is required to avoid a BUG raised when the page is onlined back:
>>>>> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
>>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
>>>>> index:0x1
>>>>> flags: 0x3ffff800200000(hwpoison)
>>>>
>>>> My knowledge of memory poisoning is very rudimentary but aren't those
>>>> pages supposed to leak and never come back? In other words isn't the
>>>> hoplug code broken because it should leave them alone?
>>>
>>> Yes that would be the right interpretation. If it was really offlined
>>> due to a hardware error the memory will be poisoned and any access
>>> could cause a machine check.
>>
>> OK, thanks for the clarification. Then I am not sure the patch is
>> correct. Why do we need to uncharge that page at all?
>
> Now, I have realized that we actually want to uncharge that page because
> it will pin the memcg and we do not want to have that memcg and its
> whole hierarchy pinned as well. This used to work before the charge
> rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
> because we used to uncharge on page cache removal.
>
> I do not think the patch is correct, though. memcg_kmem_enabled() will
> check whether kmem accounting is enabled and we are talking about page
> cache pages here. You should be using mem_cgroup_uncharge instead.

Thanks for the review, Michal.

I was not comfortable with this patch either.

I ran some tests that call mem_cgroup_uncharge() only when
isolate_lru_page() succeeds, and skip it when isolate_lru_page() fails.

This seems to work as well, so if everyone agrees, I'll send a new
version soon.

Cheers,
Laurent.
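
The ordering described above (uncharge only after a successful isolation) can be modelled roughly in user space. This is a sketch, not the kernel code: `struct mock_page` and every `mock_*` function are hypothetical stand-ins invented for illustration, and the real isolate_lru_page() and mem_cgroup_uncharge() operate on `struct page` with far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the fields this model needs. */
struct mock_page {
    bool on_lru;   /* page currently sits on an LRU list */
    bool charged;  /* page is still charged to a memcg */
};

/* Models isolate_lru_page(): 0 on success, nonzero if the page is not on the LRU. */
static int mock_isolate_lru_page(struct mock_page *p)
{
    if (!p->on_lru)
        return -1;
    p->on_lru = false;
    return 0;
}

/* Models mem_cgroup_uncharge(): drop the charge so the memcg is no longer pinned. */
static void mock_mem_cgroup_uncharge(struct mock_page *p)
{
    p->charged = false;
}

/* Uncharge only when isolation succeeded; on failure leave the page untouched. */
static int mock_handle_poisoned_page(struct mock_page *p)
{
    assert(p != NULL);
    if (mock_isolate_lru_page(p) != 0)
        return -1;
    mock_mem_cgroup_uncharge(p);
    return 0;
}
```

In this model a page that was never on the LRU keeps its charge, which mirrors the "not calling it if isolate_lru_page() failed" case.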