Subject: Re: [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas
From: David Hildenbrand
To: Michal Hocko
Cc: Mike Rapoport, Andrew Morton, Alexander Viro, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter, Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin", Ingo Molnar, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland, Mike Rapoport, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org
Organization: Red Hat GmbH
Date: Tue, 9 Feb 2021 11:30:53 +0100

On 09.02.21 11:23, David Hildenbrand wrote:
>>>> A lot of unevictable memory is a concern regardless of CMA/ZONE_MOVABLE.
>>>> As I've said it is quite easy to land at the similar situation even with
>>>> tmpfs/MAP_ANON|MAP_SHARED on swapless system.
>>>> Neither of the two is really uncommon. It would be even worse that those
>>>> would be allowed to consume both CMA/ZONE_MOVABLE.
>>>
>>> IIRC, tmpfs/MAP_ANON|MAP_SHARED memory
>>> a) Is movable, can land in ZONE_MOVABLE/CMA
>>> b) Can be limited by sizing tmpfs appropriately
>>>
>>> AFAIK, what you describe is a problem with memory overcommit, not with zone
>>> imbalances (below). Or what am I missing?
>>
>> It can be problem for both. If you have just too much of shm (do not
>> forget about MAP_SHARED|MAP_ANON which is much harder to size from an
>> admin POV) then migrateability doesn't really help because you need a
>> free memory to migrate. Without reclaimability this can easily become a
>> problem. That is why I am saying this is not really a new problem.
>> Swapless systems are not all that uncommon.
>
> I get your point, it's similar but still different. "no memory in the
> system" vs. "plenty of unusable free memory available in the system".
>
> In many setups, memory for user space applications can go to
> ZONE_MOVABLE just fine. ZONE_NORMAL etc. can be used for supporting user
> space memory (e.g., page tables) and other kernel stuff.
>
> Like, have 4GB of ZONE_MOVABLE with 2GB of ZONE_NORMAL. Have an
> application (database) that allocates 4GB of memory. Works just fine.
> The zone ratio ends up being a problem for example with many processes
> (-> many page tables).
>
> Not being able to put user space memory into the movable zone is a
> special case. And we are introducing yet another special case here
> (besides vfio, rdma, unmigratable huge pages like gigantic pages).
>
> With plenty of secretmem, looking at /proc/meminfo Total vs. Free can be
> a big lie of how your system behaves.
>
>>
>>>> One has to be very careful when relying on CMA or movable zones. This is
>>>> definitely worth a comment in the kernel command line parameter
>>>> documentation. But this is not a new problem.
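The 4GB ZONE_MOVABLE / 2GB ZONE_NORMAL example quoted above can be made concrete with a toy model. This is a simplified, hypothetical sketch, not kernel code: it assumes movable allocations prefer ZONE_MOVABLE and spill into ZONE_NORMAL, while unmovable allocations (page tables, kernel memory and, as discussed in this thread, secretmem) can only use ZONE_NORMAL. Watermarks, lowmem reserves, CMA and the DMA zones are ignored, and the `Zones` class is an illustrative name only.

```python
# Toy model of zone placement; a hypothetical sketch, NOT kernel code.
# Assumptions: movable allocations prefer ZONE_MOVABLE and spill into
# ZONE_NORMAL; unmovable allocations (page tables, kernel memory,
# secretmem) may only use ZONE_NORMAL. Watermarks, lowmem reserves,
# CMA and the DMA zones are ignored.
from dataclasses import dataclass

GiB = 1024 ** 3


@dataclass
class Zones:
    normal_free: int   # bytes free in ZONE_NORMAL
    movable_free: int  # bytes free in ZONE_MOVABLE

    def alloc(self, size: int, movable: bool) -> bool:
        """Place `size` bytes; return False (think: OOM) if it cannot fit."""
        if movable:
            # Fill ZONE_MOVABLE first, put any remainder into ZONE_NORMAL.
            from_movable = min(size, self.movable_free)
            rest = size - from_movable
            if rest > self.normal_free:
                return False
            self.movable_free -= from_movable
            self.normal_free -= rest
            return True
        # Unmovable memory can only come from ZONE_NORMAL.
        if size > self.normal_free:
            return False
        self.normal_free -= size
        return True

    @property
    def total_free(self) -> int:
        # What a naive "Total vs. Free" reading would suggest is available.
        return self.normal_free + self.movable_free


# Database case: 4 GiB of movable user memory fits in ZONE_MOVABLE,
# leaving all 2 GiB of ZONE_NORMAL for page tables and kernel data.
db = Zones(normal_free=2 * GiB, movable_free=4 * GiB)
assert db.alloc(4 * GiB, movable=True)
assert db.normal_free == 2 * GiB and db.movable_free == 0

# Unmovable case: 6 GiB is free in total, but a 2.5 GiB unmovable
# request still fails because only ZONE_NORMAL can serve it.
pinned = Zones(normal_free=2 * GiB, movable_free=4 * GiB)
assert pinned.total_free == 6 * GiB
assert not pinned.alloc(5 * GiB // 2, movable=False)
print("unmovable 2.5 GiB request fails with", pinned.total_free // GiB, "GiB free")
```

The second case is the "plenty of unusable free memory" situation described above: summing free memory across zones suggests 6 GiB is available, yet an unmovable request larger than ZONE_NORMAL cannot be satisfied.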
>>>
>>> I see the following thing worth documenting:
>>>
>>> Assume you have a system with 2GB of ZONE_NORMAL/ZONE_DMA and 4GB of
>>> ZONE_MOVABLE/CMA.
>>>
>>> Assume you make use of 1.5GB of secretmem. Your system might run into OOM
>>> any time although you still have plenty of memory on ZONE_MOVABLE (and even
>>> swap!), simply because you are making excessive use of unmovable allocations
>>> (for user space!) in an environment where you should not make excessive use
>>> of unmovable allocations (e.g., where should page tables go?).
>>
>> yes, you are right of course and I am not really disputing this. But I
>> would argue that 2:1 Movable/Normal is something to expect problems
>> already. "Lowmem" allocations can easily trigger OOM even without secret
>> mem in the picture. It all just takes to allocate a lot of GFP_KERNEL or
>> even GFP_{HIGH}USER. Really, it is CMA/MOVABLE that are elephant in the
>> room and one has to be really careful when relying on them.
>
> Right, it's all about what the setup actually needs. Sure, there are
> cases where you need significantly more GFP_KERNEL/GFP_{HIGH}USER such
> that a 2:1 ratio is not feasible. But I claim that these are corner cases.
>
> Secretmem gives user space the option to allocate a lot of
> GFP_{HIGH}USER memory. If I am not wrong, "ulimit -a" tells me that each
> application on F33 can allocate 16 GiB (!) of secretmem.

Got to learn to do my math. It's 16 MiB, so as a default it's less
dangerous than I thought!

-- 
Thanks,

David / dhildenb
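The corrected figure changes the risk picture considerably. A rough sketch of the arithmetic follows, assuming (as the thread implies, but not confirmed here) that each process's secretmem is capped by the locked-memory limit that `ulimit -a` prints as "max locked memory (kbytes, -l)", i.e. in 1024-byte units. The helper names are hypothetical.

```python
# Back-of-the-envelope check of the ulimit arithmetic in this thread.
# Assumption (inferred from the thread, not from kernel sources): each
# process's secretmem is capped by the locked-memory limit that
# `ulimit -a` reports in KiB. Function names are hypothetical.
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def ulimit_kbytes_to_bytes(limit_kbytes: int) -> int:
    """Convert a `ulimit -l` value (reported in KiB) to bytes."""
    return limit_kbytes * KiB


def processes_to_pin(target_bytes: int, per_process_bytes: int) -> int:
    """Minimum number of processes needed to pin target_bytes (ceiling division)."""
    return -(-target_bytes // per_process_bytes)


# 16 MiB, the corrected default, is 16384 in ulimit's KiB units.
per_process = ulimit_kbytes_to_bytes(16384)
assert per_process == 16 * MiB

# Pinning 1.5 GiB (of the 2 GiB ZONE_NORMAL in the example above)
# takes many processes at 16 MiB each.
print(processes_to_pin(3 * GiB // 2, per_process))  # -> 96
# With the misread 16 GiB limit, a single process would be enough.
print(processes_to_pin(3 * GiB // 2, 16 * GiB))  # -> 1
```

The limit is per process, so it bounds what one application can pin, not the system-wide total. On Linux, Python's `resource.getrlimit(resource.RLIMIT_MEMLOCK)` returns the same soft and hard limits, but in bytes rather than KiB.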