Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp26648imm; Fri, 25 May 2018 15:18:39 -0700 (PDT) X-Google-Smtp-Source: AB8JxZrjPTgchf3XpT0lC/XpDomCnFbRViSSWDlWkEx9G1/dPMXOJwvxDwNmzquuxwqFl1aluMN8 X-Received: by 2002:a17:902:c81:: with SMTP id 1-v6mr4307084plt.126.1527286719575; Fri, 25 May 2018 15:18:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527286719; cv=none; d=google.com; s=arc-20160816; b=FqIws8+fa2ecVaYasAMjAIDCLi2AlPsRxXrrWFKDnq9sDhidtlmO509KdWm8trbIBq J+klsJMN+1EP0CFQa978NVGF8SFbmanm0hSPDMSIhq1j6GYeoZsjNhkkBpTJMvzNqHUJ zoxJSSHqsiSKLb0QslyLF75KXPXDAx8oy/3ZOYsVF1VSm112MpkyXwk2tucC5mE3zOf8 g90pt0X/Rpe9NBuxXAWKa1Z6BjBwz1kWfEt35dSa4cbcrDf17i7DPNDEyV470vGppouP zFfjQOhO0Dwtk+E5d0ADjvQgYsmGfBWZlbJMOc2LghurFxZzudgjsmQVKSmUQlNfwvkC 0x/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature :arc-authentication-results; bh=l2pn/d2QZBtLmqW3wBe1mcfaPa7RHrw1sBd2C2UtF8c=; b=YdzQrfirzOtn4eWN0PQL/BT0R2icwDoMHnNZmjKdUfNhjKdbGqexw21gslN+Zi4iIT hL+HuDiwC4bXj5aHYckITjXrairm8yoIY1dYn+dFPf2+UucFVg1/3ka5tae5Zt3J0E8a lyP0s/5F/ULSZNU9JN8X80ps0haXp6ShDorkeXuXI6LKQlQA1Q9nTNts4A2Dmm4EzIqH lcZFQC5e84zzZWj7+Eytc4WfdDQ8aORHMWboVEsfb5UHY4LA87UT7Jc6AajW9QUQXZur jSRyCxo6++x5KvXXXlkKrwyfS7x+8xmJIDrzXpZ1TwB0r8cKnCy8Io7KdvTl5NU/5Mar Xk8Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=r2JcKriI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r76-v6si24044684pfb.65.2018.05.25.15.18.22; Fri, 25 May 2018 15:18:39 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=r2JcKriI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030558AbeEYWSO (ORCPT + 99 others); Fri, 25 May 2018 18:18:14 -0400 Received: from mail-pl0-f67.google.com ([209.85.160.67]:44767 "EHLO mail-pl0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1030394AbeEYWSN (ORCPT ); Fri, 25 May 2018 18:18:13 -0400 Received: by mail-pl0-f67.google.com with SMTP id z9-v6so533156plk.11 for ; Fri, 25 May 2018 15:18:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=l2pn/d2QZBtLmqW3wBe1mcfaPa7RHrw1sBd2C2UtF8c=; b=r2JcKriIKefsSr0criPHm62a3AfpmNUcwt0NqZTJC7JIHpanfFY8Zlom74T50QzY4b bjsoKwO7OcrYHRnxPi1oPsWETKII5+/lagdOKUbXngQY8MrPO8y//juNFDLg0wa3Nvt5 cS5R/FJOH8t4/dR1pG3zxJlWjihTCNGpJhKAxxb9s+EeQTON1qv4MCAzbMi/6zxI5g22 ikjE21Loi8UST0dd1qRhNQbUuRj2gk/B2UK79k8ANrCSMd50dq5BiaSlaAUrGVVOVrcJ sru45+kWaFYcvbDSgNir6QipmVN+muzp8Rs5iGXFSArInXD0EkceLUftoiBrlQakNauu ZsRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=l2pn/d2QZBtLmqW3wBe1mcfaPa7RHrw1sBd2C2UtF8c=; b=gDIeDbHKZCEX/darTNl8ukGnq5ugoSx63MSqgBRbH8ma5zLD3ymTQTO5E2DbDQwPJs EspVsG4TAXtIuphXgOrilLg8NOdIeOu9mS3qbKH5m+JK9LwUTfK2cPuqQODKExCeJQyI LSqG1etH5fWz4/4sIs+bETDpvR29CSQFROrcUhM2IxqubAe0DxK3JMRHis9l/MrXsS7y 4AlA6zNeXcWOS2//mFamKThAZa/RQQpL27cvZ4Q+qiFjihnbwffkWKDpyvbndIEj1yp8 Kf8MMmi8TzmqokWFcpS0QSlOceVeYnNgO2lgptZk1RlMuqeNNe1Vj1R4LtJ6YYSBCDNn ThCQ== X-Gm-Message-State: ALKqPwdN/ZWlOk0EPfMQbVhuDffa0AqUZGC0ZzxVtVqkTZ73F+DGJp8L +OaxU7S+xQLa2z9y2RoB9pdGBA== X-Received: by 2002:a17:902:8b85:: with SMTP id ay5-v6mr4193831plb.30.1527286692828; Fri, 25 May 2018 15:18:12 -0700 (PDT) Received: from [2620:15c:17:3:3a5:23a7:5e32:4598] ([2620:15c:17:3:3a5:23a7:5e32:4598]) by smtp.gmail.com with ESMTPSA id q68-v6sm42762982pfb.182.2018.05.25.15.18.11 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 25 May 2018 15:18:12 -0700 (PDT) Date: Fri, 25 May 2018 15:18:11 -0700 (PDT) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: Andrew Morton cc: Mike Kravetz , "Aneesh Kumar K.V" , Naoya Horiguchi , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [patch] mm, hugetlb_cgroup: suppress SIGBUS when hugetlb_cgroup charge fails In-Reply-To: <20180525140940.976ca667f3c6ff83238c3620@linux-foundation.org> Message-ID: References: <20180525134459.5c6f8e06f55307f72b95a901@linux-foundation.org> <20180525140940.976ca667f3c6ff83238c3620@linux-foundation.org> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 25 May 2018, Andrew Morton wrote: > > > > --- a/mm/hugetlb.c > > > > +++ b/mm/hugetlb.c > > > > @@ -2006,8 +2006,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, > > > > * code of zero indicates a reservation exists (no change). > > > > */ > > > > map_chg = gbl_chg = vma_needs_reservation(h, vma, addr); > > > > - if (map_chg < 0) > > > > - return ERR_PTR(-ENOMEM); > > > > + if (map_chg < 0) { > > > > + ret = -ENOMEM; > > > > + goto out; > > > > + } > > > > > > This doesn't change the return value. > > > > > > > This, and the subsequent comments, are referring to the third paragraph of > > the changelog. > > > > The functional part of the change is for the > > hugetlb_cgroup_charge_cgroup() return value that is now actually used. > > > Ah. Missed that bit. > If you'd like this separated into two separate patches, one that fixes the actual issue with the hugetlb_cgroup_charge_cgroup() return value and the other to use a single exit path with ERR_PTR(ret), that might make it easier. I think the latter is why the bug was introduced: it's too easy to force -ENOSPC unintentionally. > Altered changelog: > > : When charging to a hugetlb_cgroup fails, alloc_huge_page() returns > : ERR_PTR(-ENOSPC) which will cause VM_FAULT_SIGBUS to be returned to the > : page fault handler. > : > : This is because the return value from hugetlb_cgroup_charge_cgroup() is > : overwritten with ERR_PTR(-ENOSPC). > : > : Instead, propagate the error code from hugetlb_cgroup_charge_cgroup() > : (ERR_PTR(-ENOMEM)), so VM_FAULT_OOM is handled correctly. This is > : consistent with failing mem cgroup charges in the non-hugetlb fault path. > : > : At the same time, restructure the return paths of alloc_huge_page() so it > : is consistent. > LGTM, thanks. > > > > > It would be nice if you could add a comment over alloc_huge_page() > > > explaining the return values (at least). Why sometimes ENOMEM, other > > > times ENOSPC? > > > > > > > The ENOSPC is used to specifically induce a VM_FAULT_SIGBUS, which > > Documentation/vm/hugetlbfs_reserv.txt specifies is how faults are handled > > if no hugetlb pages are available. > > That wasn't a code comment ;) Nobody will know to go looking in > hugetlbfs_reserv.txt - it isn't even referred to from mm/*.c. > Let's see what Mike and Aneesh say, because they may object to using VM_FAULT_OOM because there's no way to guarantee that we'll come under the limit of hugetlb_cgroup as a result of the oom. My assumption is that we use VM_FAULT_SIGBUS since oom killing will not guarantee that the allocation can succeed. But now a process can get a SIGBUS if its hugetlb pages are not allocatable or its under a limit imposed by hugetlb_cgroup that it's not aware of. Faulting hugetlb pages is certainly risky business these days... Perhaps the optimal solution for reaching hugetlb_cgroup limits is to induce an oom kill from within the hugetlb_cgroup itself? Otherwise the unlucky process to fault their hugetlb pages last gets SIGBUS.