Received-SPF: pass (google.com: domain of linux-kernel+bounces-204898-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1;
Date: Thu, 6 Jun 2024 19:38:08 +0100
From: Catalin Marinas <catalin.marinas@arm.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Steven Price <steven.price@arm.com>, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, Will Deacon <will@kernel.org>,
	James Morse <james.morse@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	Joey Gouly <joey.gouly@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Fuad Tabba <tabba@google.com>, linux-coco@lists.linux.dev,
	Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Subject: Re: [PATCH v3 12/14] arm64: realm: Support nonsecure ITS emulation
 shared
Message-ID: <ZmICEN8JvWM7M9Ch@arm.com>
References: <20240605093006.145492-1-steven.price@arm.com>
 <20240605093006.145492-13-steven.price@arm.com>
 <86a5jzld9g.wl-maz@kernel.org>
 <4c363476-e5b5-42ff-9f30-a02a92b6751b@arm.com>
 <867cf2l6in.wl-maz@kernel.org>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <867cf2l6in.wl-maz@kernel.org>

On Thu, Jun 06, 2024 at 11:17:36AM +0100, Marc Zyngier wrote:
> On Wed, 05 Jun 2024 16:08:49 +0100,
> Steven Price <steven.price@arm.com> wrote:
> > 2. Use a special (global) memory allocator that does the
> > set_memory_decrypted() dance on the pages that it allocates but allows
> > packing the allocations. I'm not aware of an existing kernel API for
> > this, so it's potentially quite a bit of code. The benefit is that it
> > reduces memory consumption in a realm guest, although fragmentation
> > still means we're likely to see a (small) growth.
> > 
> > Any thoughts on what you think would be best?
> 
> I would expect that something similar to kmem_cache could be of help,
> only with the ability to deal with variable object sizes (in this
> case: minimum of 256 bytes, in increments defined by the
> implementation, and with a 256 byte alignment).

Hmm, that's doable but not that easy to make generic. We'd need a new
class of kmalloc-* caches (e.g. kmalloc-decrypted-*) which use only
decrypted pages together with a GFP_DECRYPTED flag or something to get
the slab allocator to go for these pages and avoid merging with other
caches. It would actually be the page allocator parsing this gfp flag,
probably in post_alloc_hook() to set the page decrypted and re-encrypt
it in free_pages_prepare(). A slight problem here is that free_pages()
doesn't get the gfp flag, so we'd need to store some bit in the page
flags. Maybe the flag is not that bad, do we have something like for
page_to_phys() to give us the high IPA address for decrypted pages?

Similarly if we go for a kmem_cache (or a few for multiple sizes). One
can specify a constructor which could set the memory decrypted but
there's no destructor (and also the constructor is per object, not per
page, so we'd need some refcounting).

Another approach contained within the driver is to use mempool_create()
with our own _alloc_fn/_free_fn that sets the memory decrypted/encrypted
accordingly, though sub-page allocations need additional tracking. Also
that's fairly similar to kmem_cache, fixed size.

Yet another option would be to wire it somehow in the DMA API but the
minimum allocation is already a page size, so we don't gain anything.

What gets somewhat closer to what we need is gen_pool. It can track
different sizes, we just need to populate the chunks as needed. I don't
think this would work as a generic allocator but may be good enough
within the ITS code.

If there's a need for such generic allocations in other parts of the
kernel, my preference would be something around kmalloc caches and a new
GFP flag (first option; subject to the selling it to the mm folk). But
that's more of a separate prototyping effort that may or may not
succeed.

Anyway, we could do some hacking around gen_pool as a temporary solution
(maybe as a set of patches on top of this series to be easier to revert)
and start investigating a proper decrypted page allocator in parallel.
We just need to find a victim that has the page allocator fresh in mind
(Ryan or Alexandru ;)).

-- 
Catalin