From: Andy Lutomirski
Date: Tue, 19 May 2015 07:36:11 -0700
Subject: Re: Should we automatically generate a module signing key at all?
To: David Howells
Cc: Andy Lutomirski, Linus Torvalds, Michal Marek, David Woodhouse, Abelardo Ricart III, Linux Kernel Mailing List, Sedat Dilek, keyrings@linux-nfs.org, Rusty Russell, LSM List, Borislav Petkov, Jiri Kosina
In-Reply-To: <29742.1432025631@warthog.procyon.org.uk>
References: <31154.1431965087@warthog.procyon.org.uk> <555A88FB.7000809@kernel.org> <29742.1432025631@warthog.procyon.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 19, 2015 at 1:53 AM, David Howells wrote:
> Andy Lutomirski wrote:
>
>> I think we should get rid of the idea of automatically generated signing keys
>> entirely.  Instead I think we should generate, at build time, a list of all
>> the module hashes and link that into vmlinux.
>
> Just in Fedora 21:
>
>     warthog>rpm -ql kernel-modules | grep [.]ko | wc -l
>     3604
>     warthog>rpm -ql kernel-modules-extra | grep [.]ko | wc -l
>     480
>
> So that's >4000 modules, each signed with a SHA256 sum (32 bytes).  That's
> more than 125K of unswappable memory.  And it's uncompressible as Dave pointed
> out.  And that doesn't include any metadata to match a module to a digest, but
> rather assumes we just scan through the entire list comparing against each
> SHA256 sum until we find one that matches.

Let's go through the numbers.
There are two main things that matter, I think: non-swappable memory
and disk space.  For simplicity and because it doesn't really matter,
I'll ignore things like the filesystem block size.  I'll assume that
everyone uses a 256-bit hash.  (This is charitable to the status quo,
since hash size doesn't really matter for public-key signatures, and
the default is SHA-1.)  I'll further assume that there are 4096
modules or so.

The current kernel uses 4096-bit RSA.  The kernel text needed for
verification seems to be around 21kB (9kB asymmetric_keys + 12kB MPI).
The public key is tiny, and the signature is 512 bytes per module.
(Actually, it's probably more because of PKCS garbage.  I'll ignore
that.)  This is a total of ~21kB of non-swappable storage and 2MB of
disk space for all the signatures.

If the goal were to optimize for size, the kernel should use a much
more compact signature scheme, probably some compressed EC signature.
Ed25519 is 64 bytes per signature, which seems to be more or less
optimal.  That would reduce disk space used to 64 bytes per module or
256kB for 4k modules.

With the hash-based scheme I outlined, the kernel text needed is
nearly zero.  The overhead in each .ko file is zero, and
module_hashes.ko is 32 bytes per module or 128kB for 4k modules.  It
wins the disk space competition hands down.  Naively, though, all of
that space is non-swappable.  Note that any sensible implementation
would sort the hash list, making hash checks very fast.

One improvement would be to unload module_hashes.ko when you're done
with it.  That's annoying.

A different approach would be to use a hash tree.  For a basic binary
hash tree, the root (module_hashes.ko, for example) is a single hash,
i.e. 32 bytes.  (For simplicity, we'd store the number of hashes, too.
That would add a couple of bytes.)  Each module needs log2(number of
modules) - 1 hashes stored.
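
As a quick sanity check on the sizes being compared, here is a small
Python sketch of the arithmetic.  The constant and variable names are
mine, the module count and hash size are the assumptions stated above,
and PKCS overhead and filesystem block size are ignored as in the text.

```python
import math

MODULES = 4096              # assumed module count ("4k modules")
HASH_BYTES = 32             # 256-bit hash
RSA_SIG_BYTES = 4096 // 8   # 4096-bit RSA signature, PKCS overhead ignored
ED25519_SIG_BYTES = 64      # Ed25519 signature size

# Per-scheme disk usage across all modules, in bytes.
rsa_disk = MODULES * RSA_SIG_BYTES          # one RSA signature per .ko
ed25519_disk = MODULES * ED25519_SIG_BYTES  # one Ed25519 signature per .ko
hash_list = MODULES * HASH_BYTES            # flat list in module_hashes.ko

# Hash tree: each module carries log2(modules) - 1 hashes, per the estimate above.
proof_hashes = int(math.log2(MODULES)) - 1
tree_per_module = proof_hashes * HASH_BYTES
tree_disk = MODULES * tree_per_module

for name, size in [
    ("RSA-4096 signatures", rsa_disk),
    ("Ed25519 signatures", ed25519_disk),
    ("flat hash list", hash_list),
    ("hash tree proofs", tree_disk),
]:
    print(f"{name:22s} {size // 1024:5d} kB")
```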
(There's no need for a module to store its own hash, and if the
hashes are sorted before the hash tree is generated, then the edge
directions are all implicit.)  For 4k modules, that's 11 hashes or 352
bytes per module, for a total of 1408kB for 4k modules.  The kernel
text required is almost zero (while efficiently generating hash trees
takes some thought, verifying them is a very simple loop over the hash
function).

This already beats the status quo in terms of both non-swappable
memory and disk space.  It still loses to Ed25519 or similar, though.

As David Woodhouse pointed out, if kmod were changed, most of the
overhead could go away.  kmod could generate the proof at module load
time.  That reduces the total overhead to just the list of hashes.

In summary, I think that the hash scheme does quite well for space
efficiency, although the comparison is a bit unfair because the
current code is unnecessarily inefficient.

>
>> Then, if anyone actually wants to use a public key to verify modules, they can
>> build the public key into a module as opposed to dragging all of the public
>> key crud into the main kernel image.
>
> A chunk of the 'public key crud' has to be in the kernel for other reasons
> (the integrity stuff, I think, which has to start before you load any modules)
> and the public key stuff is used for other things too (such as kexec and may
> well be used for firmware validation in future) - though that doesn't preclude
> it being modularised, it does mean that you are likely to load it anyway in
> future.

What integrity stuff?  IIRC dm-verity doesn't use asymmetric crypto at
all.  IMA probably does, though.

For firmware validation, there's no good reason it couldn't work
exactly like module signatures.  Alternatively, firmware validation
could still use loadable public key crypto.  (Again, it could be
unloaded after boot, which is currently impossible.)

For kexec, I think that the main use is for crash dumps, in which case
the hash of the crash kernel could be built in.  Alternatively, if the
crash kernel is identical to the original kernel, it would be
reasonably straightforward to arrange for the kernel to accept itself
as a valid kexec image.

>
>> We autogenerate module_hashes.ko
>
> This just makes things worse.  I suspect all distributions would have to load
> it anyway - and you don't really win as it will just make the initramfs bigger
> instead of the bzImage.

For initramfs use, the hash tree approach works quite well, since the
hash list doesn't need to live in the initramfs.

--Andy
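
To make the hash tree idea concrete, here is a minimal Python sketch
of a textbook binary hash (Merkle) tree with SHA-256.  It is an
illustration under assumptions, not kernel or kmod code: the function
names are hypothetical, the leaf count is assumed to be a power of
two, the leaf index is passed explicitly rather than derived from the
sort order, and a proof in this layout carries log2(n) sibling hashes,
so it does not reproduce the exact per-module count estimated earlier.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (32 bytes)."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every tree level, leaves first and the root level last.

    Assumption for this sketch: len(leaves) is a power of two.
    """
    assert leaves and len(leaves) & (len(leaves) - 1) == 0
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def make_proof(levels, index):
    """Collect the sibling hash at each level on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling shares the parent
        index >>= 1
    return proof


def verify(leaf, index, proof, root):
    """Recompute the root from a leaf and its proof: one hash per level."""
    node = leaf
    for sibling in proof:
        # The low bit of the index says whether we are the right or left child.
        node = h(sibling + node) if index & 1 else h(node + sibling)
        index >>= 1
    return node == root


# Hypothetical module contents, hashed and sorted as suggested in the text.
modules = [f"module-{i}".encode() for i in range(8)]
leaves = sorted(h(m) for m in modules)
levels = build_levels(leaves)
root = levels[-1][0]

idx = leaves.index(h(b"module-3"))
proof = make_proof(levels, idx)
ok = verify(leaves[idx], idx, proof, root)
tampered = verify(h(b"not-a-real-module"), idx, proof, root)
print(ok, tampered, len(proof))
```

In this sketch only verify() and the 32-byte root would need to be
trusted at load time; build_levels() and make_proof() would run at
build time, or in kmod if it generated proofs at module load time.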