From: Daniel Micay
Date: Thu, 8 Feb 2018 13:05:33 -0500
Subject: Re: [RFC] Warn the user when they could overflow mapcount
To: Jann Horn
Cc: Matthew Wilcox, linux-mm@kvack.org, Kernel Hardening, kernel list, "Kirill A. Shutemov"
References: <20180208021112.GB14918@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

>> That seems pretty bad. So here's a patch which adds documentation to the
>> two sysctls that a sysadmin could use to shoot themselves in the foot,
>> and adds a warning if they change either of them to a dangerous value.
>
> I have negative feelings about this patch, mostly because AFAICS:
>
> - It documents an issue instead of fixing it.
> - It likely only addresses a small part of the actual problem.

The default max_map_count and pid_max values are very low, and there are
many situations where either or both need to be raised.

VM fragmentation in long-lived processes is a major issue. There are
allocators like jemalloc designed to minimize VM fragmentation by never
unmapping memory, but they rely on nothing else using mmap regularly so
that all of their ranges can be merged together, unless they decide to
do something like making a 1TB PROT_NONE mapping up front to slowly
consume.
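To make the merging point concrete, here is a minimal sketch of a toy model in Python. It is not the kernel's merging logic, which also compares flags, backing files, offsets and more; the names `count_vmas`, `contiguous` and `interleaved` and the protection strings are made up for illustration. It only shows why an allocator that keeps its ranges back to back can stay at one mapping, while another mmap user landing between them forces one mapping per range.

```python
from typing import List, Tuple

# (start, end, protection) with end exclusive; ranges are assumed not to overlap.
Mapping = Tuple[int, int, str]

PAGE = 4096  # assumed page size, for illustration only


def count_vmas(mappings: List[Mapping]) -> int:
    """Count mappings left after merging touching ranges with equal protection.

    Toy model: two ranges merge only if one starts exactly where the
    previous one ends and both carry the same protection string.
    """
    count = 0
    prev_end = None
    prev_prot = None
    for start, end, prot in sorted(mappings):
        if not (start == prev_end and prot == prev_prot):
            count += 1  # cannot extend the previous mapping, so a new one is needed
        prev_end, prev_prot = end, prot
    return count


# The allocator is the only mmap user: three back-to-back read/write chunks.
contiguous = [
    (0 * PAGE, 4 * PAGE, "rw"),
    (4 * PAGE, 8 * PAGE, "rw"),
    (8 * PAGE, 12 * PAGE, "rw"),
]

# Another mmap user lands between the allocator's chunks.
interleaved = [
    (0 * PAGE, 4 * PAGE, "rw"),
    (4 * PAGE, 5 * PAGE, "r-x"),
    (5 * PAGE, 9 * PAGE, "rw"),
    (9 * PAGE, 10 * PAGE, "r-x"),
    (10 * PAGE, 14 * PAGE, "rw"),
]

print(count_vmas(contiguous))
print(count_vmas(interleaved))
```

In this model the contiguous layout collapses into a single mapping, while the interleaved layout keeps every range separate, which is the situation that eats into max_map_count.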
If you Google this sysctl name, you'll find lots of people running into
the limit. If you're using a debugging / hardened allocator designed to
use a lot of guard pages, the default max_map_count is close to
unusable...

I think the same thing applies to pid_max. There are too many legitimate
reasons to increase it. Process-per-request is quite reasonable if you
care about robustness / security and want to sandbox each request
handler. Look at Chrome / Chromium: it's currently
process-per-site-instance, but they're moving to more processes with
site isolation, putting iframes into their own processes to work towards
enforcing the boundaries between sites at a process level. It's way
worse for fine-grained server-side sandboxing. Using a lot of processes
like this does counter VM fragmentation, especially if long-lived
processes doing a lot of work are mostly avoided... but if your
allocators like using guard pages, you're still going to hit the limit.

I do think the default value in the documentation should be fixed, but
if there's a clear problem with raising these limits, it really needs to
be fixed. Google either of the sysctl names and look at all the people
running into issues and needing to raise them. It's only going to become
more common to raise these as people try to use lots of fine-grained
sandboxing. Process-per-request is back in style.
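As a rough back-of-the-envelope check on the guard-page point above, the sketch below estimates how many guarded allocations fit under the mapping limit. It assumes each guarded allocation costs two mappings (one data range plus one PROT_NONE guard that prevents merging with its neighbour) and uses 65530 as the common default for vm.max_map_count when the sysctl cannot be read. The helper names `read_sysctl` and `max_guarded_allocations` and the `baseline_vmas` parameter are hypothetical, introduced only for this illustration.

```python
from typing import Optional

# Common default for vm.max_map_count; used only as a fallback here.
DEFAULT_MAX_MAP_COUNT = 65530


def read_sysctl(name: str) -> Optional[int]:
    """Read an integer sysctl such as 'vm.max_map_count' from /proc/sys.

    Returns None if the file is missing or unreadable (for example on a
    non-Linux system), so callers can fall back to an assumed default.
    """
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def max_guarded_allocations(max_map_count: int,
                            baseline_vmas: int = 0,
                            vmas_per_allocation: int = 2) -> int:
    """Estimate how many guarded allocations fit under the mapping limit.

    baseline_vmas stands for mappings the process already holds (binary,
    libraries, stacks); vmas_per_allocation is the assumed cost of one
    allocation plus its guard page.
    """
    available = max(max_map_count - baseline_vmas, 0)
    return available // vmas_per_allocation


limit = read_sysctl("vm.max_map_count") or DEFAULT_MAX_MAP_COUNT
print(max_guarded_allocations(limit))
```

Under these assumptions the default limit leaves room for only a few tens of thousands of live guarded allocations per process, which is easy to exceed in a long-running program.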