To: Mike Rapoport, Michal Hocko
Cc: James Bottomley, Andrew Morton, Alexander Viro, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter, Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin", Ingo Molnar, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland, Mike Rapoport, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer, Palmer Dabbelt
References: <20210121122723.3446-8-rppt@kernel.org> <20210126114657.GL827@dhcp22.suse.cz> <303f348d-e494-e386-d1f5-14505b5da254@redhat.com> <20210126120823.GM827@dhcp22.suse.cz> <20210128092259.GB242749@kernel.org> <73738cda43236b5ac2714e228af362b67a712f5d.camel@linux.ibm.com> <6de6b9f9c2d28eecc494e7db6ffbedc262317e11.camel@linux.ibm.com> <20210202124857.GN242749@kernel.org>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
Message-ID: <6653288a-dd02-f9de-ef6a-e8d567d71d53@redhat.com>
Date: Tue, 2 Feb 2021 14:14:09 +0100
In-Reply-To: <20210202124857.GN242749@kernel.org>
On 02.02.21 13:48, Mike Rapoport wrote:
> On Tue, Feb 02, 2021 at 10:35:05AM +0100, Michal Hocko wrote:
>> On Mon 01-02-21 08:56:19, James Bottomley wrote:
>>
>> I have also proposed potential ways out of this. Either the pool is not
>> fixed-sized and you make it regular unevictable memory (if direct map
>> fragmentation is not considered a major problem)
>
> I think that direct map fragmentation is not a major problem, and the
> data we have confirms it, so I'd be more than happy to entirely drop the
> pool, allocate memory page by page and remove each page from the direct
> map.
>
> Still, we cannot prove a negative, and it could happen that there is a
> workload that would suffer a lot from direct map fragmentation, so
> having a pool of large pages upfront is better than trying to fix it
> afterwards. As we gain confidence that direct map fragmentation is not
> the issue it is commonly believed to be, we may remove the pool
> altogether.
>
> I think that using PMD_ORDER allocations for the pool with a fallback to
> order 0 will do the job, but unfortunately I doubt we'll reach a
> consensus about this because dogmatic beliefs are hard to shake...
>
> A more restrictive possibility is to still use plain PMD_ORDER
> allocations to fill the pool, without relying on CMA. In this case there
> will be no global secretmem-specific pool to exhaust, but then it's
> possible to drain the high-order free blocks in a system, so CMA has the
> advantage of limiting secretmem pools to a certain amount of memory,
> with a somewhat higher probability that high-order allocations succeed.

I am not really concerned about fragmenting/breaking up the direct map
as long as the feature has to be explicitly enabled (similar to
fragmenting the vmemmap).

As already expressed, I dislike allowing user space to consume an
unlimited number of unmovable/unmigratable allocations.
We already have that in some cases with huge pages (when the arch does
not support migration) - but there we can at least manage the
consumption using the whole max/reserved/free/... infrastructure. In
addition, adding arch support for migration shouldn't be too
complicated.

The idea of using CMA is quite good IMHO, because there we can locally
limit the direct map fragmentation and don't have to bother about
migration at all. We own the area, so we can place as many unmovable
allocations in it as we can fit.

But it sounds like we would also need some kind of reservation mechanism
in either scenario (CMA vs. no CMA).

If we don't want to go full circle on max/reserved/free/..., allowing
for migration of secretmem pages would make sense. Then these pages
become "less special". Map source, copy, unmap destination.

The security implications are the ugly part. I wonder if we could
temporarily map the pages somewhere else, avoiding having to touch the
direct map during migration.

-- 
Thanks,

David / dhildenb