Date: Tue, 26 Jan 2021 13:08:23 +0100
From: Michal Hocko
To: David Hildenbrand
Cc: Mike Rapoport, Andrew Morton, Alexander Viro, Andy Lutomirski,
 Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter,
 Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
 Ingo Molnar, James Bottomley, "Kirill A.
 Shutemov", Matthew Wilcox, Mark Rutland, Mike Rapoport, Michael Kerrisk,
 Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Rick Edgecombe,
 Roman Gushchin, Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen,
 Will Deacon, linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer,
 Palmer Dabbelt
Subject: Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
Message-ID: <20210126120823.GM827@dhcp22.suse.cz>
References: <20210121122723.3446-1-rppt@kernel.org>
 <20210121122723.3446-8-rppt@kernel.org>
 <20210126114657.GL827@dhcp22.suse.cz>
 <303f348d-e494-e386-d1f5-14505b5da254@redhat.com>
In-Reply-To: <303f348d-e494-e386-d1f5-14505b5da254@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 26-01-21 12:56:48, David Hildenbrand wrote:
> On 26.01.21 12:46, Michal Hocko wrote:
> > On Thu 21-01-21 14:27:19, Mike Rapoport wrote:
> > > From: Mike Rapoport
> > >
> > > Removing a PAGE_SIZE page from the direct map every time such page is
> > > allocated for a secret memory mapping will cause severe fragmentation of
> > > the direct map. This fragmentation can be reduced by using PMD-size pages
> > > as a pool for small pages for secret memory mappings.
> > >
> > > Add a gen_pool per secretmem inode and lazily populate this pool with
> > > PMD-size pages.
> > >
> > > As pages allocated by secretmem become unmovable, use CMA to back large
> > > page caches so that page allocator won't be surprised by failing attempt to
> > > migrate these pages.
> > >
> > > The CMA area used by secretmem is controlled by the "secretmem=" kernel
> > > parameter.
> > > This allows explicit control over the memory available for
> > > secretmem and provides upper hard limit for secretmem consumption.
> >
> > OK, so I have finally had a look at this closer and this is really not
> > acceptable. I have already mentioned that in a response to other patch
> > but any task is able to deprive access to secret memory to other tasks
> > and cause OOM killer which wouldn't really recover ever and potentially
> > panic the system. Now you could be less drastic and only make SIGBUS on
> > fault but that would be still quite terrible. There is a very good
> > reason why hugetlb implements its non-trivial reservation system to avoid
> > exactly these problems.
> >
> > So unless I am really misreading the code
> > Nacked-by: Michal Hocko
> >
> > That doesn't mean I reject the whole idea. There are some details to
> > sort out as mentioned elsewhere but you cannot really depend on
> > pre-allocated pool which can fail at a fault time like that.
>
> So, to do it similar to hugetlbfs (e.g., with CMA), there would have to be a
> mechanism to actually try pre-reserving (e.g., from the CMA area), at which
> point in time the pages would get moved to the secretmem pool, and a
> mechanism for mmap() etc. to "reserve" from these secretmem pool, such that
> there are guarantees at fault time?

Yes, reserve at mmap time and use during the fault. But this all sounds
like a self-inflicted problem to me. Sure, you can have a pre-allocated or
more dynamic pool to reduce the direct mapping fragmentation, but you can
always fall back to regular allocations. In other words, have the pool as an
optimization rather than a hard requirement. With careful access
control this sounds like a manageable solution to me.
--
Michal Hocko
SUSE Labs
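
The two options discussed above (reserve at mmap time and consume at
fault time, versus treating the pool as an optimization with a fallback
to regular allocations) can be sketched as a toy user-space model in C.
This is only an illustration under stated assumptions: it is not kernel
code and not the secretmem or hugetlb implementation, pages are modelled
as plain counters, and every name in it (toy_pool, toy_reserve,
toy_fault_reserved, toy_fault_optional) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, nothing here is a kernel API. */
struct toy_pool {
	size_t free_pages; /* pages currently held by the pre-allocated pool */
	size_t reserved;   /* pages promised to mappings but not yet faulted in */
};

enum toy_source { TOY_FROM_POOL, TOY_FROM_FALLBACK, TOY_FAILED };

/* "mmap time": promise nr pages up front, or fail early rather than at fault. */
static bool toy_reserve(struct toy_pool *p, size_t nr)
{
	assert(p->reserved <= p->free_pages);
	if (p->free_pages - p->reserved < nr)
		return false;
	p->reserved += nr;
	return true;
}

/* "fault time" for a mapping that reserved earlier: consume one promised page. */
static enum toy_source toy_fault_reserved(struct toy_pool *p)
{
	if (p->reserved == 0 || p->free_pages == 0)
		return TOY_FAILED; /* caller faulted without holding a reservation */
	p->reserved--;
	p->free_pages--;
	return TOY_FROM_POOL;
}

/*
 * Pool as an optimization: use an unreserved pool page when one exists,
 * otherwise report that a regular allocation should be used instead
 * (if the caller allows a fallback at all).
 */
static enum toy_source toy_fault_optional(struct toy_pool *p, bool fallback_ok)
{
	if (p->free_pages > p->reserved) {
		p->free_pages--;
		return TOY_FROM_POOL;
	}
	return fallback_ok ? TOY_FROM_FALLBACK : TOY_FAILED;
}
```

In this model a second task cannot starve a mapping that already holds
a reservation: toy_reserve() fails at "mmap time" once the pool is fully
promised, and toy_fault_optional() never touches reserved pages. With
fallback_ok set, an exhausted pool degrades to a regular allocation
instead of a failure at fault time.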