Date: Tue, 26 Feb 2019 14:04:28 -0800
From: Andrew Morton
To: Oscar Salvador
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, hughd@google.com, kirill@shutemov.name, vbabka@suse.cz, joel@joelfernandes.org, jglisse@redhat.com, yang.shi@linux.alibaba.com, mgorman@techsingularity.net
Subject: Re: [PATCH] mm,mremap: Bail out earlier in mremap_to under map pressure
Message-Id: <20190226140428.3e7c8188eda6a54f9da08c43@linux-foundation.org>
In-Reply-To: <20190226091314.18446-1-osalvador@suse.de>

On Tue, 26 Feb 2019 10:13:14 +0100 Oscar Salvador wrote:

> When using mremap() syscall in addition
> to MREMAP_FIXED flag,
> mremap() calls mremap_to() which does the following:
>
> 1) unmaps the destination region where we are going to move the map
> 2) If the new region is going to be smaller, we unmap the last part
>    of the old region
>
> Then, we will eventually call move_vma() to do the actual move.
>
> move_vma() checks whether we are at least 4 maps below max_map_count
> before going further, otherwise it bails out with -ENOMEM.
> The problem is that we might have already unmapped the vma's in steps
> 1) and 2), so it is not possible for userspace to figure out the state
> of the vma's after it gets -ENOMEM, and it gets tricky for userspace
> to clean up properly on error path.
>
> While it is true that we can return -ENOMEM for more reasons
> (e.g: see may_expand_vm() or move_page_tables()), I think that we can
> avoid this scenario in concret if we check early in mremap_to() if the
> operation has high chances to succeed map-wise.
>
> Should not be that the case, we can bail out before we even try to unmap
> anything, so we make sure the vma's are left untouched in case we are likely
> to be short of maps.
>
> The thumb-rule now is to rely on the worst-scenario case we can have.
> That is when both vma's (old region and new region) are going to be split
> in 3, so we get two more maps to the ones we already hold (one per each).
> If current map count + 2 maps still leads us to 4 maps below the threshold,
> we are going to pass the check in move_vma().
>
> Of course, this is not free, as it might generate false positives when it is
> true that we are tight map-wise, but the unmap operation can release several
> vma's leading us to a good state.
>
> Another approach was also investigated [1], but it may be too much hassle
> for what it brings.
>

How is this going to affect existing userspace which is aware of the
current behaviour?

And how does it affect your existing cleanup code, come to that? Does
it work as well or better after this change?
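
For reference, the map-count arithmetic from the quoted description can be modelled in plain userspace C. This is only a sketch of the two checks as the changelog words them, not the patch itself: MAX_MAP_COUNT, move_vma_check() and mremap_to_early_check() are hypothetical names, and 65530 is assumed here as the limit (the usual default of vm.max_map_count).

```c
#include <assert.h>
#include <errno.h>

/* Assumption: stand-in for the kernel's map limit; 65530 is the usual default. */
#define MAX_MAP_COUNT 65530

/*
 * Hypothetical model of the check described for move_vma():
 * succeed only if we are at least 4 maps below the limit.
 */
int move_vma_check(int map_count)
{
	assert(map_count >= 0);
	return (map_count >= MAX_MAP_COUNT - 3) ? -ENOMEM : 0;
}

/*
 * Hypothetical model of the proposed early check in mremap_to():
 * assume the worst case, where the old and the new vma are each
 * split in 3, adding 2 maps in total, and require that the result
 * would still pass the move_vma() check.
 */
int mremap_to_early_check(int map_count)
{
	assert(map_count >= 0);
	return (map_count + 2 >= MAX_MAP_COUNT - 3) ? -ENOMEM : 0;
}
```

With these numbers, a map count of 65525 fails the early check even though it would pass the move_vma() check on its own. That gap is the false-positive window the changelog mentions: the unmaps in steps 1) and 2) might have released enough vma's for the move to succeed.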