For as far back as I can tell, writing to /dev/urandom or /dev/random
will put entropy into the pool, but won't immediately use it, and won't
credit it either. Instead, crediting is controlled by the ioctls
RNDADDTOENTCNT and RNDADDENTROPY. If, however, this happens during early
boot before the input pool is ready, we have a dangerous situation with
seed files as commonly used by shell scripts:
dd if=seedfile of=/dev/urandom # write seed into pool
dd if=/dev/urandom of=seedfile count=1 bs=32 # read new seed for next boot
Since the entropy in seedfile isn't credited there, this won't cause the
RNG to transition from crng_init=0 to crng_init=2, and so when we make a
new seedfile for next boot, we'll still be getting crng_init=0-quality
randomness, which may well regress from the original seedfile.
I fixed a related issue in systemd with
<https://github.com/systemd/systemd/commit/da2862ef06f22fc8d31dafced6d2d6dc14f2ee0b>,
which is a clean way of doing it, but it doesn't really help with
existing shell scripts. And the behavior here remains bad and
surprising.
So this patch fixes the issue by including /dev/urandom writes as part
of the "fast init", but not crediting it as part of the fast init
counter. This is more or less exactly what's already done for
kernel-sourced entropy whose quality we don't know, when we use
add_device_randomness(), which both contributes to the input pool and to
the fast init key.
There is one caveat to consider, which is what happens if the user
additionally calls RNDADDTOENTCNT after having written to /dev/urandom,
expecting to credit that write. That might give way to this pathological
pattern:
- write(urandom_fd, &single_byte, 1);
- ioctl(urandom_fd, RNDADDTOENTCNT, 8);
- attacker brute forces that single byte.
- write(urandom_fd, &single_byte, 1);
- ioctl(urandom_fd, RNDADDTOENTCNT, 8);
- attacker brute forces that single byte.
- write(urandom_fd, &single_byte, 1);
- ioctl(urandom_fd, RNDADDTOENTCNT, 8);
- attacker brute forces that single byte.
- write(urandom_fd, &single_byte, 1);
- ioctl(urandom_fd, RNDADDTOENTCNT, 8);
- attacker brute forces that single byte.
- write(urandom_fd, &single_byte, 1);
- ioctl(urandom_fd, RNDADDTOENTCNT, 8);
- attacker brute forces that single byte.
- ...
After this goes 32 times, the crng has now initialized, except an
attacker was able to bruteforce the whole state, making for a sort of
boot time "premature next" problem.
This is bad, but the case above is pretty pathological. Part of this
stems from the fact that /dev/urandom write + RNDADDTOENTCNT is a poor
interface, because it's unclear whether the crediting will follow the
writing, whereas we know in the add_device_randomness() case that we
_won't_ credit it, so we don't have to worry about that.
The better interface for userspace is RNDADDENTROPY, which takes the
input buffer and the entropy credit all at once, so we can make the
right decision. For the RNDADDENTROPY, we do not take part in fast init
if entropy is being credited.
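For reference, a minimal userspace sketch of the RNDADDENTROPY flow might look
like the following (illustrative only, not part of this patch; the helper name
is made up and error handling is trimmed). The fd is any open /dev/random or
/dev/urandom descriptor, and the ioctl requires CAP_SYS_ADMIN:

#include <linux/random.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

static int seed_with_credit(int fd, const void *seed, size_t len)
{
	struct rand_pool_info *info;
	int ret;

	/* Data and credit travel together, so the kernel knows the intent. */
	info = malloc(sizeof(*info) + len);
	if (!info)
		return -1;
	info->entropy_count = len * 8;	/* bits of credit claimed for this data */
	info->buf_size = len;
	memcpy(info->buf, seed, len);
	ret = ioctl(fd, RNDADDENTROPY, info);
	free(info);
	return ret;
}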
And perhaps we might consider attempting to deprecate RNDADDTOENTCNT at
some point in the future.
Cc: Linus Torvalds <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
This isn't a "perfect" solution, and the of entropy accounting sort of
points both to problems there and to how the "fast init" phase fits in
to the overall design. So I'll think this over a bit for a few days, and
maybe it might lead to more improvements in fast init handling later.
drivers/char/random.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 706f08edf0dc..7b7f5e6596c2 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1493,7 +1493,7 @@ static __poll_t random_poll(struct file *file, poll_table *wait)
return mask;
}
-static int write_pool(const char __user *ubuf, size_t count)
+static int write_pool(const char __user *ubuf, size_t count, bool will_credit)
{
size_t len;
int ret = 0;
@@ -1507,6 +1507,8 @@ static int write_pool(const char __user *ubuf, size_t count)
}
count -= len;
ubuf += len;
+ if (unlikely(crng_init == 0 && !will_credit))
+ crng_pre_init_inject(block, len, false);
mix_pool_bytes(block, len);
cond_resched();
}
@@ -1521,7 +1523,13 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
{
int ret;
- ret = write_pool(buffer, count);
+ /*
+ * We set "will_credit" to false here, which might be wrong if the
+ * user subsequently calls RNDADDTOENTCNT. But it accounts for the
+ * more common case of shell scripts writing to /dev/urandom and
+ * then immediately reading back a seed file.
+ */
+ ret = write_pool(buffer, count, false);
if (ret)
return ret;
@@ -1584,7 +1592,7 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
return -EINVAL;
if (get_user(size, p++))
return -EFAULT;
- retval = write_pool((const char __user *)p, size);
+ retval = write_pool((const char __user *)p, size, ent_count > 0);
if (retval < 0)
return retval;
credit_entropy_bits(ent_count);
--
2.35.1
On Tue, Mar 22, 2022 at 12:15 PM Jason A. Donenfeld <[email protected]> wrote:
>
> @@ -1507,6 +1507,8 @@ static int write_pool(const char __user *ubuf, size_t count)
> }
> count -= len;
> ubuf += len;
> + if (unlikely(crng_init == 0 && !will_credit))
> + crng_pre_init_inject(block, len, false);
> mix_pool_bytes(block, len);
> cond_resched();
> }
Ugh. I hate that whole crng_pre_init_inject() dance.
We already mix the data into the input_pool with that 'mix_pool_bytes()' call.
So what I think the real fix is, is to just make urandom_read() use
the input_pool data directly for initializing the state.
IOW, why isn't the patch along the lines of just making
crng_make_state() take the data from the input pool instead, when
crng_ready() isn't set?
As a broken example patch, something like the appended (except that
doesn't build, because 'input_pool' is declared later)?
So take this purely as a conceptual patch, not a real patch.
(Yeah, I think this also means that code that currently does that
crng_pre_init_inject(pool, sizeof(pool), true);
mix_pool_bytes(pool, sizeof(pool));
should do those two operations in the reverse order, so that the input
pool is always updated before that crng_pre_init_inject() dance).
Maybe I'm missing something. But it seems kind of silly to use
base_crng AT ALL before crng_ready(). Why not use the pool we have
that *is* actually updated (that 'input_pool')?
Linus
@@ -374,19 +374,14 @@ static void crng_make_state(u32
chacha_state[CHACHA_STATE_WORDS],
/*
* For the fast path, we check whether we're ready, unlocked first, and
* then re-check once locked later. In the case where we're really not
- * ready, we do fast key erasure with the base_crng directly, because
- * this is what crng_pre_init_inject() mutates during early init.
+ * ready, we do fast key erasure with the input pool directly.
*/
if (!crng_ready()) {
- bool ready;
-
- spin_lock_irqsave(&base_crng.lock, flags);
- ready = crng_ready();
- if (!ready)
- crng_fast_key_erasure(base_crng.key, chacha_state,
- random_data, random_data_len);
- spin_unlock_irqrestore(&base_crng.lock, flags);
- if (!ready)
+ spin_lock_irqsave(&input_pool.lock, flags);
+ crng_fast_key_erasure(input_pool.key, chacha_state,
+ random_data, random_data_len);
+ spin_unlock_irqrestore(&input_pool.lock, flags);
+ if (!crng_ready())
return;
}
Hey Alex,
Thanks a bunch for doing that research.
On Tue, Mar 22, 2022 at 10:30 PM Alex Xu (Hello71) <[email protected]> wrote:
> Several programs use it for testing purposes, without writing any
> entropy to /dev/random or /dev/urandom, including rauc, wireguard
The WireGuard use case is sitting in my tree but unpushed, and indeed
it's done as a hack. I have no qualms about changing that before
pushing to net or net-next.
> - kata-containers is a lightweight VM implementation. Its guest-side
> agent offers a gRPC endpoint which will write the provided data to
> /dev/random, then call RNDADDTOENTCNT with the length of the data,
> then call RNDRESEEDRNG.
Sounds like this usage is safe, and that this patch wouldn't really
fix much there.
> - aws-nitro-enclaves-sdk-c is an SDK for building lightweight VMs to be
> used with AWS Nitro Enclaves. kmstool-enclave is a sample application
> provided, which writes "up to 256 bytes" (from where?) to /dev/random,
> then calls RNDADDTOENTCNT, then repeats the process until it reaches
> 1024 bytes.
Looks like another safe use case, which the patch here doesn't help.
Actually this patch might _hurt_ that use case if some of those writes
are short (less than 7 bytes or so). So that might be a data point
that indicates we shouldn't merge this patch, and instead should go
with the "do nothing" route.
> - sandy-harris/maxwell is a "jitter entropy" daemon, similar to haveged.
> It writes 4 bytes of "generated entropy" to /dev/random, then calls
> RNDADDTOENTCNT, then repeats.
Okay bingo. The existence of this means that this patch will
definitely introduce a new vulnerability. It means that an attacker
can brute force all of Sandy's (CC'd now) inputs 32 bits at a time. So
I don't think I can merge this patch as-is.
A potential fix for this would be to change:
+ if (unlikely(crng_init == 0 && !will_credit))
+ crng_pre_init_inject(block, len, false);
into
+ if (unlikely(crng_init == 0 && !will_credit && count >= 16 && capable(CAP_SYS_ADMIN)))
+ crng_pre_init_inject(block, len, false);
Or something like that. But that doesn't account for the case where
what's written to /dev/urandom is 300 bytes long but only has 3 bits
of unknown data. So it's better as far as heuristics go, but it
doesn't definitively solve the problem, which stems from splitting entropy
writing and entropy crediting into two separate calls.
(Plus as I already mentioned to David, the patch still wouldn't help
the crng_init=1 case.)
> - guix is, among other things, a "GNU/"Linux distribution. The provided
> base services write the seed file to /dev/urandom, then call
> RNDADDTOENTCNT, then write 512 bytes from /dev/hwrng to /dev/urandom,
> then call RNDADDTOENTCNT, then "immediately" read 512 bytes from
> /dev/urandom and write it to the seed file. On shutdown, 512 bytes are
> read from /dev/urandom and written to the seed file.
That also sounds like a safe usage of RNDADDTOENTCNT.
> I don't have any particular expertise with the random subsystem or
> conclusions to make from this data, but I hope this helps inform the
> discussion.
Very much so, thanks again. What I take away from your results is:
- RNDADDTOENTCNT is in active use in a safe way. Sure, RNDADDENTROPY
is still much better, but RNDADDTOENTCNT isn't entirely broken in the
above configurations either.
- This patch would make RNDADDTOENTCNT unsafe for some of the above
configurations in a way that it currently isn't unsafe.
- Plenty of things are seeding the RNG correctly, and buildroot's
shell script is just "doing it wrong".
On that last point, I should reiterate that buildroot's shell script
still isn't actually initializing the RNG, despite what it says in its
echo; there's never been a way to initialize the RNG from a shell
script, without calling out to various special purpose ioctl-aware
binaries.
Jason
Hi David (and Linus),
On Tue, Mar 22, 2022 at 8:16 PM David Laight <[email protected]> wrote:
> Never mind scripts that try to immediately save a new seedfile [1].
>
> What about code run by later startup scripts that wants random numbers.
> They really do want the seedfile data to be used.
> If it isn't used then they are likely to get very weak random numbers.
>
> You can't really expect startup scripts to be issuing ioctl requests.
To be clear, this "expect[ation]" of yours is very much a new
expectation. Crediting bits has required an ioctl since forever. Those
shell scripts have been broken forever. The proposal here is to add new
behavior to support those old broken shell scripts.
Fortunately, it seems sort of fixable. But only sort of. There are also
a lot of complications, as detailed above. Part of it is that some
people write to /dev/urandom expecting the data not to be credited, while
others write and then later manually credit it with the ioctl. Neither of
those is a very good interface for seeding the RNG. The
correct interface to use is RNDADDENTROPY, which takes both the data and
whether it should be credited, since then the kernel knows what the
intentions are and can do something smart with it. Barring that
knowledge, we're in this vague middle ground where it's unclear what the
user intends to do.
"But I want my shell scripts to work, even if they have never worked,"
you fairly protest. I'll do my best here to figure something out, but
note that introducing sane behavior to /dev/urandom writes might imply
breaking compatibility with the behavior /dev/urandom writes have always
had in the past. So the "perfect" solution for a shell script seeding
interface might prove elusive, and we'll be left with something merely
"not terrible."
However, if you do think you have a "perfect" solution that takes into
account all these concerns and doesn't break any prior contracts, please
do pipe up.
Two more thoughts, also for Linus' consideration:
- Had we not needed to revert the /dev/urandom + /dev/random unification
patch, we wouldn't be facing this problem. So a last minute creative
solution to save that effort would be quite welcome. I'm not
optimistic about us finding that right away, but if a lightbulb goes
off, I'd be quite happy.
- Since these seeding shell scripts have always been broken, because
this is how the rng has always been, rather than trying to bolt on a
very imperfect fix in the kernel for something that never worked
right, we could suggest shell scripts take the path that I implemented
for systemd:
https://github.com/systemd/systemd/commit/da2862ef06f22fc8d31dafced6d2d6dc14f2ee0b
In shell, this would look like:
#!/bin/bash
cat seedfile > /dev/urandom
{ cat seedfile; head -c 32 /dev/urandom; } | sha256sum | cut -d ' ' -f 1 > seedfile
This seems better in nearly every way to trying to retroactively fix it
in the kernel by changing the semantics of an extremely old interface.
The more I think about it, the more I'm inclined to go with this "do
nothing" approach. Open to hearing your thoughts though.
Jason
I searched for users of RNDADDTOENTCNT using
(?s:ioctl.{1,500}RNDADDTOENTCNT) on Debian Code Search and
"/(?s)ioctl.{1,40},\s*RNDADDTOENTCNT/ -path:incfs_test.c" on GitHub Code
Search (beta).
Several programs use it for testing purposes, without writing any
entropy to /dev/random or /dev/urandom, including rauc, wireguard, and
openSUSE kdump. Several programs use it as intended, after writing
entropy to /dev/random or /dev/urandom. Of the latter group,
- kata-containers is a lightweight VM implementation. Its guest-side
agent offers a gRPC endpoint which will write the provided data to
/dev/random, then call RNDADDTOENTCNT with the length of the data,
then call RNDRESEEDRNG. As far as I can tell, this endpoint is
made available to users on the host, but is not used by
kata-containers itself.
- aws-nitro-enclaves-sdk-c is an SDK for building lightweight VMs to be
used with AWS Nitro Enclaves. kmstool-enclave is a sample application
provided, which writes "up to 256 bytes" (from where?) to /dev/random,
then calls RNDADDTOENTCNT, then repeats the process until it reaches
1024 bytes.
- sandy-harris/maxwell is a "jitter entropy" daemon, similar to haveged.
It writes 4 bytes of "generated entropy" to /dev/random, then calls
RNDADDTOENTCNT, then repeats.
- guix is, among other things, a "GNU/"Linux distribution. The provided
base services write the seed file to /dev/urandom, then call
RNDADDTOENTCNT, then write 512 bytes from /dev/hwrng to /dev/urandom,
then call RNDADDTOENTCNT, then "immediately" read 512 bytes from
/dev/urandom and write it to the seed file. On shutdown, 512 bytes are
read from /dev/urandom and written to the seed file.
I was unable to locate any other public non-archived usages of
RNDADDTOENTCNT on Debian or GitHub Code Search.
I don't have any particular expertise with the random subsystem or
conclusions to make from this data, but I hope this helps inform the
discussion.
Cheers,
Alex.
Hey Linus,
On Tue, Mar 22, 2022 at 2:42 PM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Mar 22, 2022 at 12:15 PM Jason A. Donenfeld <[email protected]> wrote:
> >
> > @@ -1507,6 +1507,8 @@ static int write_pool(const char __user *ubuf, size_t count)
> > }
> > count -= len;
> > ubuf += len;
> > + if (unlikely(crng_init == 0 && !will_credit))
> > + crng_pre_init_inject(block, len, false);
> > mix_pool_bytes(block, len);
> > cond_resched();
> > }
>
> Ugh. I hate that whole crng_pre_init_inject() dance.
Yea it's not great, but it's a helluva lot better than what was there
before 5.18, when there were two different ways of mixing into the
crng key, both of which were insecure. At least now it's a hash
function. As I mentioned in that pull request, I tried to shore up the
existing design as much as I could without departing from it in any
large fundamental ways. It may well be time to revisit the pre init
situation though.
Maybe it'd help if I described what the goals are of the current
approach and how it attempts to accomplish that.
The chacha key that's used for random bytes is separate from the input
pool, which is where entropy gets dumped. The reason for that
separation is because it's important that we decide when and under
what conditions to extract from the input pool into a new chacha key.
If we do it too often or prematurely, then we risk updating the chacha
key with very few new bits of entropy. If there aren't enough new
bits, and an attacker knows the old state (say, because the system
just booted), then it's trivial to bruteforce those new bits, since
read access to /dev/urandom is unprivileged. That's very bad.
For that reason, we only reseed when we've collected 256 bits of
entropy, and at a schedule of uptime/2 for the first 10 minutes, and
then every 5 minutes after that.
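To make that schedule concrete, the arithmetic is roughly the following (an
illustrative sketch only, not the kernel's actual helper or names):

/* Stale once older than uptime/2 during the first 10 minutes, then 5 min. */
static int crng_key_is_stale(unsigned long uptime_sec, unsigned long key_age_sec)
{
	unsigned long interval = 5 * 60;

	if (uptime_sec < 10 * 60)
		interval = uptime_sec / 2;
	return key_age_sec >= interval;
}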
The question, here, is what to do before we've collected 256 bits of
entropy. 80 bits is still better than 0, for example. The existing
scheme (from Ted) is that we maintain a state variable, crng_init.
When crng_init=0, we direct all entropy into the chacha key directly.
If that entropy is creditable, then we account for up to 64 bytes of
input total, regardless of how many "bits" of entropy we would
otherwise be crediting were it not for the pre-init phase. That's kinda weird,
but the motivating mantra has always been, "fast init is garbage, but
it's better than nothing." Then, once we hit 64 bytes of fast init
accounting, we transition to crng_init=1. At this stage, we're putting
bytes into the normal input pool and crediting it as usual, not
updating the crng key directly anymore with injection, and the crng is
still blocked. When we hit 256 bits of accounted entropy from the
crng_init=1 stage, then we're ready to go, and we transition to
crng_init=2.
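In toy form, the accounting described above is roughly the following (again
just an illustration of the prose, not the real random.c logic; only the
counting is modeled, not the mixing itself):

enum crng_state { CRNG_INIT_0, CRNG_INIT_1, CRNG_INIT_2 /* crng_ready() */ };

static enum crng_state state = CRNG_INIT_0;
static unsigned int fast_init_bytes;	/* raw bytes injected into the key */
static unsigned int credited_bits;	/* entropy credited to the input pool */

static void account_entropy(unsigned int nbytes, unsigned int nbits, int creditable)
{
	if (state == CRNG_INIT_0) {
		if (!creditable)
			return;			/* mixed in, but never counted */
		fast_init_bytes += nbytes;	/* claimed bits are ignored here */
		if (fast_init_bytes >= 64)
			state = CRNG_INIT_1;
	} else if (state == CRNG_INIT_1) {
		credited_bits += nbits;
		if (credited_bits >= 256)
			state = CRNG_INIT_2;
	}
}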
So this patch helps with the crng_init=0 case. It doesn't address the
crng_init=1 case, when the bytes written to /dev/urandom won't be
directly injected as they are for crng_init=0. In any case, the one
thing we're trying to preserve in all of this is that when we do
transition to crng_init=2, it's with a pool that contains 256 bits of
entropy that haven't previously been exposed anywhere, so that they
can't be brute forced.
With that as background, to answer your question:
> Maybe I'm missing something. But it seems kind of silly to use
> base_crng AT ALL before crng_ready(). Why not use the pool we have
> that *is* actually updated (that 'input_pool')?
In this case, base_crng.key is really just doubling as a separate
pool. The "pre init pool", the "fast init pool", the "super terrible
but at least not zero pool", the "all bets are off pool", ... whatever
you want to call it. Why a separate pool for pre init? Because the
real input pool needs to accumulate 256 bits of entropy before it's
safe to use.
Your suggestion is to instead not have a separate pool, but perhaps
just do separate accounting. That might work to some degree, but the
devil is in the details, and that sounds a lot harder and messier to
code. For example, you wrote:
> + crng_fast_key_erasure(input_pool.key, chacha_state,
> + random_data, random_data_len);
Except there is no input_pool.key, because the input_pool's state is
actually an un-finalized hash function. So what you wind up with
instead is calling extract_entropy() on every read. But then you have
to decide on a point at which to stop doing that, so the pool can
accumulate 256 bits prior to exposure, and things quickly get
extremely messy. So I don't think your suggestion improves much.
The long term solution to this stuff might be doing away with the
entropy accounting entirely, as I suggested in the other thread. The
shorter term solution, though, might be reimagining how the
crng_init=1/2 states work to begin with. (And the shortest-term "fix"
is this patch, which while not perfect is at least something.)
Just sort of spitballing what the shorter term solution might be, we
could do something exponential, where we get rid of the
pre_init_inject stuff entirely, but call `crng_reseed(force=true)` at
1 bit, 2 bits, 4 bits, 8 bits, 16 bits, 32 bits, ... 256 bits, the
same way we do now with the time. (This is somewhat similar to what NT
does, fwiw.) A downside is that the gap between, say, 64 bits and 128
bits is much "longer" than what we have now with pre_init_inject.
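Roughly, the trigger for that would look something like the check below
(illustrative only; the real plumbing would presumably live somewhere around
credit_entropy_bits()):

/* True when the credited count has just crossed the next power of two
 * (1, 2, 4, ... 256 bits), i.e. when the forced crng_reseed() would fire. */
static int crossed_reseed_threshold(unsigned int old_bits, unsigned int new_bits)
{
	unsigned int t;

	for (t = 1; t <= 256; t *= 2)
		if (old_bits < t && new_bits >= t)
			return 1;
	return 0;
}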
While the above might be an improvement, that doesn't help with our
/dev/urandom problem, though. Perhaps for that we could have some
messy heuristic, like `if (capable(CAP_SYS_ADMIN) && !crng_ready())
crng_reseed(force=true);`. That's sort of ugly too, but it would help
with the /dev/urandom shellscript seeders.
I'll think on it some more, but these two spitball ideas together
might result in a nice simplification by eliminating the fast pool
entirely. Happy to hear more ideas from you too if the above inspires
anything.
Jason
Hi Ted,
On Tue, Mar 22, 2022 at 9:35 PM Theodore Ts'o <[email protected]> wrote:
> One of the big issues with /dev/urandom writes is that *anyone*,
> including malicious user space, can force specific bytes to be mixed
> in. That's the source of the reluctance to immediately use inputs from
> writes into /dev/[u]random until there is a chance for it to be mixed
> in with other entropy which is hopefully not under the control of
> malicious userspace.
Right, sort of. Since we now always use a cryptographic hash function,
we can haphazardly mix whatever any user wants, without too much
concern. The issue is whether we _credit_ those bits. Were we to credit
those bits, a malicious unpriv'd user could credit 248 bits of known
input, and then bruteforce 8 bits of unknown input, and repeat, and in
that way destroy the security of the thing. So, yea, the current
reluctance does make sense.
> Now, I recognize that things are a bit special in early boot, and if
> we have a malicious script running in a systemd unit script, we might
> as well go home. But something to consider is whether we want to do
> something special if the process writing to /dev/[u]random has
> CAP_SYS_ADMIN, or some such.
Exactly. So one way of approaching this would be to simply credit writes
to /dev/urandom if it's CAP_SYS_ADMIN and maybe if also !crng_ready(),
and just skip the crng_pre_init_inject() part that this current patch
adds. I'll attach a sample patch of what this might look like at the end
of this email.
The problem with that approach, though, is that various userspaces might
already write garbage into /dev/urandom, not expecting it to be
credited -- for example, some userspace hardware configuration component
that writes some serial number there. So I'm sort of hesitant to
_change_ the behavior that we've had for so long.
Another variation on that would be to do what this current patch does,
but only crng_pre_init_inject() on CAP_SYS_ADMIN. But this has the same
pitfall of only working as intended at crng_init=0 but not crng_init=1.
That's better than nothing, but it's not perfect, and it introduces that
problem with RNDADDTOENTCNT.
Also, to echo the point I made in the email I just sent to David, this
has _always_ been broken, and those shell scripts have _always_ been
vulnerable. Maybe the kernel should fix that, but due to the ambiguity
of the /dev/urandom write interface, maybe the best fix is actually in
userspace itself, which means it'd work on old kernels too (which are
rather common for the embedded devices that tend to have those types of
shell scripts). Specifically, I'm talking about this fix I did for
systemd:
https://github.com/systemd/systemd/commit/da2862ef06f22fc8d31dafced6d2d6dc14f2ee0b
And then the same fix that I just submitted this evening for Buildroot:
https://lists.buildroot.org/pipermail/buildroot/2022-March/639359.html
It seems like hashing the old seed together with the new one ought to be
a good standard practice to guard against all sorts of bugs.
> Yeah, no one should ever, ever, ever be using RNDADDTOENTCNT. It's an
> ioctl which requires root privileges, and if it breaks, you get to keep
> both pieces.
>
> > And perhaps we might consider attempting to deprecate RNDADDTOENTCNT at
> > some point in the future.
>
> That would be a good idea. :-)
Oh cool, I'm glad you agree. Let's do that then. Have a preferred path?
Maybe just a pr_once() saying not to use it?
Jason
--------8<-------------------------------------------
The CAP_SYS_ADMIN idea, maybe not so good, but here for illustration:
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 706f08edf0dc..d1dc46366647 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1493,12 +1493,16 @@ static __poll_t random_poll(struct file *file, poll_table *wait)
return mask;
}
-static int write_pool(const char __user *ubuf, size_t count)
+static int write_pool(const char __user *ubuf, size_t count, bool credit_at_boot)
{
- size_t len;
+ size_t len, bits;
int ret = 0;
u8 block[BLAKE2S_BLOCK_SIZE];
+ if (count > SIZE_MAX / 8)
+ return -EINVAL;
+ bits = count * 8;
+
while (count) {
len = min(count, sizeof(block));
if (copy_from_user(block, ubuf, len)) {
@@ -1511,6 +1515,9 @@ static int write_pool(const char __user *ubuf, size_t count)
cond_resched();
}
+ if (credit_at_boot && !crng_ready() && capable(CAP_SYS_ADMIN))
+ credit_entropy_bits(bits);
+
out:
memzero_explicit(block, sizeof(block));
return ret;
@@ -1521,7 +1528,7 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
{
int ret;
- ret = write_pool(buffer, count);
+ ret = write_pool(buffer, count, true);
if (ret)
return ret;
@@ -1584,7 +1591,7 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
return -EINVAL;
if (get_user(size, p++))
return -EFAULT;
- retval = write_pool((const char __user *)p, size);
+ retval = write_pool((const char __user *)p, size, false);
if (retval < 0)
return retval;
credit_entropy_bits(ent_count);
From: Jason A. Donenfeld
> Sent: 23 March 2022 02:51
>
> On Tue, Mar 22, 2022 at 8:16 PM David Laight <[email protected]> wrote:
> > Never mind scripts that try to immediately save a new seedfile [1].
> >
> > What about code run by later startup scripts that wants random numbers.
> > They really do want the seedfile data to be used.
> > If it isn't used then they are likely to get very weak random numbers.
> >
> > You can't really expect startup scripts to be issuing ioctl requests.
>
> To be clear, this "expect[ation]" of yours is very much a new
> expectation. Crediting bits has required an ioctl since forever. Those
> shell scripts have been broken forever. The proposal here is to add new
> behavior to support those old broken shell scripts.
...
I personally won't have expected the behaviour for long!
I was only trying to get a buildroot system to initialise the
random number generator last year.
But I'm sure I read some of the documentation as well as looking
at the scripts and the kernel sources.
The buildroot scripts actually need fixing so that they
add entropy on older kernels.
I do remember looking at one of the kernel entropy stores
(probably the Linux one) a few years back and thinking
that it was over-complex and probably didn't actually work
that well in reality.
IIRC it saved 'entropy bytes' and the number of bits of entropy
they represented - and then read out enough bytes to get
the required entropy to reseed the PRNG.
Now, if your PRNG has N bits of state, then in principle at least,
after you've output N bits someone can solve the simultaneous
equations and determine the PRNG state.
But as soon as you use a cryptographic hash function that
is not really possible in any reasonable timeframe.
(Is even MD5 that broken?)
But the 'entropy store' can just stir in new bytes and
count the number of entropy bits.
Then it doesn't really care how random the bytes are.
(Apart from an estimation of how 'full' it is.)
Copy bits to the PRNG and you reduce the number of
bits in the entropy store - but continue just stirring
in new data.
I've often wondered whether the RC5 algorithm would
make a good entropy store.
Just cycle the algorithm with each entropy byte as
is done when setting each key byte.
Clearly you don't want to use the RC5 output as random data.
But it ought to be plenty good enough to keep entropy.
The only real problem is that RC5 is pretty horrid on
the data cache, and probably a bit big for per-cpu data.
David
From: Jason A. Donenfeld
> Sent: 23 March 2022 04:48
...
> - Plenty of things are seeding the RNG correctly, and buildroot's
> shell script is just "doing it wrong".
>
> On that last point, I should reiterate that buildroot's shell script
> still isn't actually initializing the RNG, despite what it says in its
> echo; there's never been a way to initialize the RNG from a shell
> script, without calling out to various special purpose ioctl-aware
> binaries.
Perhaps the very first write after boot could be assumed to
be valid initialisation data?
(On top of a few other tests.)
Then I need a patch against 5.10 :-)
David
Hi all,
On Tue, Mar 22, 2022 at 10:47 PM Jason A. Donenfeld <[email protected]> wrote:
> Very much so, thanks again. What I take away from your results is:
>
> - RNDADDTOENTCNT is in active use in a safe way. Sure, RNDADDENTROPY
> is still much better, but RNDADDTOENTCNT isn't entirely broken in the
> above configurations either.
> - This patch would make RNDADDTOENTCNT unsafe for some of the above
> configurations in a way that it currently isn't unsafe.
> - Plenty of things are seeding the RNG correctly, and buildroot's
> shell script is just "doing it wrong".
>
> On that last point, I should reiterate that buildroot's shell script
> still isn't actually initializing the RNG, despite what it says in its
> echo; there's never been a way to initialize the RNG from a shell
> script, without calling out to various special purpose ioctl-aware
> binaries.
Based on this, the fact that shell scripts cannot seed the RNG anyway,
and due to the hazards in trying to retrofit some heuristics onto an
interface that was never designed to work like this, I'm convinced at
this point that the right course of action here is to leave this
alone. There's no combination of /dev/urandom write hacks/heuristics
that does the right thing without creating some big problem elsewhere.
It just does not have the right semantics for it, and changing the
existing semantics will break existing users.
In light of that conclusion, I'm going to work with every userspace
downstream I can find to help them fix their file-based seeding, if it
has bugs. I've started talking with the buildroot folks, and then I'll
speak with the OpenRC people (being a Gentoo dev, that should be easy
going). Systemd does the right thing already.
I wrote a little utility for potential inclusion in
busybox/util-linux/whatever when it matures beyond its current age of
being half hour old:
- https://git.zx2c4.com/seedrng/about/
- https://git.zx2c4.com/seedrng/tree/seedrng.c
So I'll see what the buildroot people think of this and take it from there.
The plus side of doing all this is that, if the efforts pan out, it
means there'll actually be proper seeding on devices that don't
currently do that, which then might lead to a better ecosystem and
less boot time blocking and all that jazz.
Jason
Hi Rasmus,
On Wed, Mar 23, 2022 at 2:43 AM Rasmus Villemoes
<[email protected]> wrote:
>
> On 23/03/2022 03.50, Jason A. Donenfeld wrote:
>
> > - Since these seeding shell scripts have always been broken, because
> > this is how the rng has always been, rather than trying to bolt on a
> > very imperfect fix in the kernel for something that never worked
> > right, we could suggest shell scripts take the path that I implemented
> > for systemd:
> > https://github.com/systemd/systemd/commit/da2862ef06f22fc8d31dafced6d2d6dc14f2ee0b
> > In shell, this would look like:
> >
> > #!/bin/bash
> > cat seedfile > /dev/urandom
> > { cat seedfile; head -c 32 /dev/urandom; } | sha256sum | cut -d ' ' -f 1 > seedfile
>
> Maybe stating the obvious, but in the interest of preventing
> proliferation of more broken shell scripts: The tail of the above should
> be spelled
>
> ... > seedfile.tmp && mv seedfile.tmp seedfile
>
> or seedfile would be truncated before cat had a chance to read it.
You're not wrong. The actual thing that got committed is:
https://git.buildroot.net/buildroot/commit/?id=f0986de551f46e72268857fd817986e9be697cd0
which thankfully doesn't have this issue.
Jason
On Wed, Mar 23, 2022 at 01:53:03PM -0600, Jason A. Donenfeld wrote:
> Hi David,
>
> On Wed, Mar 23, 2022 at 8:01 AM David Laight <[email protected]> wrote:
> >
> > From: Jason A. Donenfeld
> > > Sent: 23 March 2022 04:48
> > ...
> > > - Plenty of things are seeding the RNG correctly, and buildroot's
> > > shell script is just "doing it wrong".
> > >
> > > On that last point, I should reiterate that buildroot's shell script
> > > still isn't actually initializing the RNG, despite what it says in its
> > > echo; there's never been a way to initialize the RNG from a shell
> > > script, without calling out to various special purpose ioctl-aware
> > > binaries.
> >
> > Perhaps the very first write after boot could be assumed to
> > be valid initialisation data?
> > (On top of a few other tests.)
>
> I addressed this already earlier. That approach does not work. Too
> many things already pass in garbage, not expecting for it to be
> credited, but just contributory. /dev/urandom writes simply has never
> had the semantics one would want for credited seeding. Adding a
> heuristic like this will break users.
>
Just to give an example of this, one of the first things that Android does at
boot time is copy the kernel command line into /dev/urandom:
https://android.googlesource.com/platform/system/core/+/refs/heads/android12-release/rootdir/init.rc#122
It would certainly be undesirable if that started crediting entropy, given that
the command line might not contain much entropy, or any at all.
(And yes, copying the kernel command line into /dev/urandom is redundant in
kernels v4.14 and later due to https://git.kernel.org/linus/33d72f3822d7ff8a.
But this is going to be kept around for a while longer due to older kernels.)
- Eric
Excerpts from Jason A. Donenfeld's message of March 24, 2022 1:20 pm:
> Hi Alex,
>
> On Thu, Mar 24, 2022 at 10:29 AM Alex Xu (Hello71) <[email protected]> wrote:
>> The issue, in systemd developers' opinion, is that counting seed file
>> towards entropy initialization potentially causes repeated RNG output if
>> a system is cloned without resetting the seed file. This is discussed at
>> length in https://github.com/systemd/systemd/pull/4513. A few years ago,
>> I wrote most of a program to check machine ID, disk ID, DMI ID, and some
>> other things in order to avoid this issue. Since then, systemd decided
>> to store the random seed in EFI variables, I assume on the basis that
>> machine cloning typically does not clone the EFI variables? In my
>> opinion, since the same argument applies to machine ID, ssh keys, and
>> any other persistent cryptographic (or even non-cryptographic) material,
>> this falls outside the scope of random seeding and into a general
>> machine cloning "sysprep"-like utility.
>
> systemd's seed utility will credit a seed file if the seed file was
> generated properly (e.g. after the RNG was initialized). For that they
> use the user.random-seed-creditable xattr, which is a reasonable way of
> deciding. If that attribute is present, it's credited; if it's not, it's
> not. Here's their source:
>
> /* If we got this random seed data from getrandom() the data is suitable for crediting
> * entropy later on. Let's keep that in mind by setting an extended attribute on the file */
> if (getrandom_worked)
> if (fsetxattr(seed_fd, "user.random-seed-creditable", "1", 1, 0) < 0)
> log_full_errno(ERRNO_IS_NOT_SUPPORTED(errno) ? LOG_DEBUG : LOG_WARNING, errno,
> "Failed to mark seed file as creditable, ignoring: %m");
>
> Since my seedrng.c is designed for more minimal systems (running
> buildroot or openrc or whatever), which might not have xattrs available,
> it distinguishes just based on the filename:
>
> if (new_seed_creditable && rename(NON_CREDITABLE_SEED, CREDITABLE_SEED) < 0) {
> fprintf(stderr, "ERROR: Unable to make new seed creditable: %s\n", strerror(errno));
> program_ret |= 1 << 6;
> }
>
> It's no surprise that these are very similar; I've read systemd's
> seeding logic and contributed a fix to it.
>
> By the way, if you think something is different or wrong or whatever in
> seedrng.c, please feel free to send me a patch for it. It already
> received its first contribution this morning (from the buildroot
> maintainer). Hopefully the code will reach a good settling point soon,
> and then various projects that want it can just copy and paste it
> verbatim into their environment, and tweak idiomatic things as needed.
>
> Jason
>
Right, but I'm saying that even if the seed file is good when you wrote
it, it becomes bad if you copy it to multiple machines (e.g. by VM
cloning). For various reasons, I think this is outside the scope of the
random seed services to handle, but I think it's good to keep in mind.
Cheers,
Alex.
On Tue, Mar 22, 2022 at 10:00:49PM -0600, Jason A. Donenfeld wrote:
>
> Another variation on that would be to do what this current patch does,
> but only crng_pre_init_inject() on CAP_SYS_ADMIN. But this has the same
> pitfall of only working as intended at crng_init=0 but not crng_init=1.
> That's better than nothing, but it's not perfect, and it introduces that
> problem with RNDADDTOENTCNT.
Well, one could argue that "RNDADDTOENTCNT" is a problem that has
always been there, and it already requires CAP_SYS_ADMIN. So I'm not
sure it makes it any worse.
> > > And perhaps we might consider attempting to deprecate RNDADDTOENTCNT at
> > > some point in the future.
> >
> > That would be a good idea. :-)
>
> Oh cool, I'm glad you agree. Let's do that then. Have a preferred path?
> Maybe just a pr_once() saying not to use it?
Probably. We could get more aggressive (e.g., WARN), but the first
Google search on RNDADDTOENTCNT returned:
https://github.com/jumpnow/rndaddtoentcnt
So I'm now regretting not silently making it vanish a decade or more ago...
- Ted
Hi Eric,
On 3/24/22, Eric Biggers <[email protected]> wrote:
> On Tue, Mar 22, 2022 at 01:14:36PM -0600, Jason A. Donenfeld wrote:
>> For as far back as I can tell, writing to /dev/urandom or /dev/random
>> will put entropy into the pool, but won't immediately use it, and won't
>> credit it either.
>
> Did you check kernels v4.7 and earlier? It looks like this actually changed
> in v4.8 when the ChaCha20 CRNG was introduced. v4.7 would mix the data
> written to /dev/{u,}random into {non,}blocking_pool, which would immediately
> be reflected in reads from /dev/{u,}random, sys_getrandom(), and
> get_random_bytes(). Writes to /dev/{u,}random didn't affect the input_pool,
> which was separate.
Oh, I suppose you might be right, actually, that v4.7 and below would
hash the non blocking pool, and let /dev/urandom write directly into
it, as something distinct from the input pool. This changed with v4.8,
6 years ago, and now there are no LTS kernels that old, with most
small devices even having vendor kernels v4.9+. v4.8 apparently did
this while fixing a more extreme vulnerability of allowing
unprivileged users to bruteforce input bytes (in addition to allowing
unbounded unprivileged lock contention). Of those who have been
seeding via /dev/random, the ones who additionally issued the ioctl to
credit those bits haven't been affected since the crediting solved the
issue by invoking a reseeding. And those who didn't issue the ioctl
never had their RNG initialize in the first place, causing getrandom()
to block until entropy was collected from elsewhere, until it was
safe, so the harm there was minimal. So it's not great, but it's not
horrific either, and I still think the cons strongly outweigh the pros
in trying to change the behavior from what it is now.
Jason
Hi Eric,
On Thu, Mar 24, 2022 at 12:27 PM Eric Biggers <[email protected]> wrote:
> The example in random(4) needs to be fixed too, right?
>
> https://man7.org/linux/man-pages/man4/random.4.html
Yes, that's always been wrong, at least for all of 2.6.... There are a
number of issues with the man page that I plan on addressing all at
once with a series.
Jason
Hi!
> > One of the big issues with /dev/urandom writes is that *anyone*,
> > including malicious user space, can force specific bytes to be mixed
> > in. That's the source of the reluctance to immediately use inputs from
> > writes into /dev/[u]random until there is a chance for it to be mixed
> > in with other entropy which is hopefully not under the control of
> > malicious userspace.
>
> Right, sort of. Since we now always use a cryptographic hash function,
> we can haphazardly mix whatever any user wants, without too much
> concern. The issue is whether we _credit_ those bits. Were we to credit
> those bits, a malicious unpriv'd user could credit 248 bits of known
> input, and then bruteforce 8 bits of unknown input, and repeat, and in
> that way destroy the security of the thing. So, yea, the current
> reluctance does make sense.
>
> > Now, I recognize that things are a bit special in early boot, and if
> > we have a malicious script running in a systemd unit script, we might
> > as well go home. But something to consider is whether we want to do
> > something special if the process writing to /dev/[u]random has
> > CAP_SYS_ADMIN, or some such.
>
> Exactly. So one way of approaching this would be to simply credit writes
> to /dev/urandom if it's CAP_SYS_ADMIN and maybe if also !crng_ready(),
> and just skip the crng_pre_init_inject() part that this current patch
> adds. I'll attach a sample patch of what this might look like at the end
> of this email.
CAP_* should not really work like that. They should control whether a
process can do the operation, but should not really silently change what
the syscall does based on the CAP_*...
(And yes, I'm a bit late).
Best regards,
Pavel
--
People of Russia, stop Putin before his war on Ukraine escalates.
Hi!
> > Very much so, thanks again. What I take away from your results is:
> >
> > - RNDADDTOENTCNT is in active use in a safe way. Sure, RNDADDENTROPY
> > is still much better, but RNDADDTOENTCNT isn't entirely broken in the
> > above configurations either.
> > - This patch would make RNDADDTOENTCNT unsafe for some of the above
> > configurations in a way that it currently isn't unsafe.
> > - Plenty of things are seeding the RNG correctly, and buildroot's
> > shell script is just "doing it wrong".
> >
> > On that last point, I should reiterate that buildroot's shell script
> > still isn't actually initializing the RNG, despite what it says in its
> > echo; there's never been a way to initialize the RNG from a shell
> > script, without calling out to various special purpose ioctl-aware
> > binaries.
>
> Based on this, the fact that shell scripts cannot seed the RNG anyway,
> and due to the hazards in trying to retrofit some heuristics onto an
> interface that was never designed to work like this, I'm convinced at
> this point that the right course of action here is to leave this
> alone. There's no combination of /dev/urandom write hacks/heuristics
> that does the right thing without creating some big problem elsewhere.
> It just does not have the right semantics for it, and changing the
> existing semantics will break existing users.
>
> In light of that conclusion, I'm going to work with every userspace
> downstream I can find to help them fix their file-based seeding, if it
> has bugs. I've started talking with the buildroot folks, and then I'll
> speak with the OpenRC people (being a Gentoo dev, that should be easy
> going). Systemd does the right thing already.
>
> I wrote a little utility for potential inclusion in
> busybox/util-linux/whatever when it matures beyond its current age of
> being half hour old:
> - https://git.zx2c4.com/seedrng/about/
> - https://git.zx2c4.com/seedrng/tree/seedrng.c
> So I'll see what the buildroot people think of this and take it from there.
You could put it into the kernel tree, in the tools/ directory...
Best regards,
Pavel
Hi!
> > On Tue, Mar 22, 2022 at 01:14:36PM -0600, Jason A. Donenfeld wrote:
> >> For as far back as I can tell, writing to /dev/urandom or /dev/random
> >> will put entropy into the pool, but won't immediately use it, and won't
> >> credit it either.
> >
> > Did you check kernels v4.7 and earlier? It looks like this actually changed
> > in v4.8 when the ChaCha20 CRNG was introduced. v4.7 would mix the data
> > written to /dev/{u,}random into {non,}blocking_pool, which would immediately
> > be reflected in reads from /dev/{u,}random, sys_getrandom(), and
> > get_random_bytes(). Writes to /dev/{u,}random didn't affect the input_pool,
> > which was separate.
>
> Oh, I suppose you might be right, actually, that v4.7 and below would
> hash the non blocking pool, and let /dev/urandom write directly into
> it, as something distinct from the input pool. This changed with v4.8,
> 6 years ago, and now there are no LTS kernels that old, with most
> small devices even having vendor kernels v4.9+. v4.8 apparently did
We are still maintaining 4.4 for the -cip project, and people running android probably still
maintain that, too.
> this while fixing a more extreme vulnerability of allowing unprivileged users to
> bruteforce input bytes (in addition to allowing unbounded unprivileged lock contention).
I assume this got fixed during the 4.4-stable series?
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html