Hi,
with the previous release of the CPU Jitter RNG ([1]), concerns were raised
about whether the CPU execution timing contains entropy. This new version of
the CPU Jitter RNG adds a noise source based on memory access timings, and
the earlier concerns are addressed with the additional analyses given in [2]
section 6.1.
This additional noise source is again covered by extensive testing
documented in [2] section 6.2. The test results make it possible to explain
the basics of that memory access noise source.
To analyze the two noise sources, a bare metal testing program is used as
documented in [2] section 6.3. That bare metal testing allows the analysis of
the noise sources without interference from an OS or interrupts.
Furthermore, for the existing noise source of the CPU execution
timing, more analysis of the behavior of the CPU is provided in [2] section
6.1. The analysis, however, showed CPU behavior that cannot easily be
explained. The testing shows that there is a possibility to eliminate the CPU
execution timing jitter for one particular measurement using a serialization
instruction. That elimination of timing jitter, however, was not visible when
the individual rounds of the RNG were tested. That means that the elimination
of timing jitter in one special case did not show any effects on the behavior
of the RNG.
The following set of patches integrates the CPU Jitter RNG as a fallback noise
source into /dev/random. The reason for using it as a fallback only is the
conceptual difference between the CPU Jitter RNG and the other noise sources:
all other noise sources use a push mechanism, whereas the CPU Jitter RNG works
by pulling bits on demand. Due to its speed, the Jitter RNG is capable of
monopolizing all other noise sources. This is prevented by invoking it only
when the lower entropy threshold of the Linux RNG is reached.
Ciao
Stephan
[1] http://thread.gmane.org/gmane.linux.kernel/1577419/focus=1586212
[2] http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
--
| Cui bono? |
The two added sysctls are readable and writable to allow administrators to
tweak the behavior of the CPU Jitter RNG. Normally, no tweaking is
necessary. However, overly cautious users may raise the values above
the defaults.
The sysctls are found under /proc/sys/kernel/random with the following
files:
jent_memaccessloops -- number of accesses per timing measurement (the
more memory accesses, the higher the timing variations and thus the
entropy per measurement)
jent_osr -- the oversampling rate when generating a random number that
is injected as noise into the Linux RNG
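To illustrate the effect of jent_osr, the following user-space sketch
computes how many measurement rounds go into one random number. It mirrors
the loop bound of jent_gen_entropy in jitterentropy-base.c. The helper name
jent_rounds is hypothetical, and the 64-bit output size is an assumption
taken from the comments in that file.

```c
/* Assumption: one RNG output word is 64 bits wide. */
#define DATA_SIZE_BITS 64

/*
 * Hypothetical helper: number of measurement rounds needed for one
 * random number, given the number of folded bits per measurement and
 * the oversampling rate.
 */
static unsigned int jent_rounds(unsigned int folded_bits, unsigned int osr)
{
	if (folded_bits == 0 || folded_bits > DATA_SIZE_BITS)
		return 0;	/* invalid configuration */
	if (osr == 0)
		osr = 1;	/* minimum oversampling rate is 1 */

	/* cover every bit of the output word at least once, times osr */
	return (((DATA_SIZE_BITS - 1) / folded_bits) + 1) * osr;
}
```

With one folded bit per measurement, raising jent_osr from 1 to 3 would
raise the work per random number from 64 to 192 measurements.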
Signed-off-by: Stephan Mueller <[email protected]>
---
drivers/char/random.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 4b2267b..e689956 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1717,6 +1717,20 @@ struct ctl_table random_table[] = {
.proc_handler = proc_dointvec,
.data = &input_pool.jent_ec.memblocks,
},
+ {
+ .procname = "jent_memaccessloops",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .data = &input_pool.jent_ec.memaccessloops,
+ },
+ {
+ .procname = "jent_osr",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .data = &input_pool.jent_ec.osr,
+ },
{ }
};
#endif /* CONFIG_SYSCTL */
--
1.8.5.3
After successful initialization of the CPU Jitter RNG as part of the
Linux RNG, the two variables defining the memory size of the memory
chunk used for measuring memory access times are set. In case the Jitter
RNG does not successfully initialize, these variables are set to zero.
These two variables can be exported to user space to allow user space to
check whether the CPU Jitter RNG is operational and which memory values
are used. Note, according to tests, the size of the memory chunk has a
direct impact on the execution timing variations.
The exported variables are all read-only and can be found under
/proc/sys/kernel/random. The files are:
jent_memblocksize -- size of one memory block in bytes
jent_memblocks -- number of memory blocks used
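A user-space check of these two files could look like the following sketch.
The helper names read_int_file and jent_operational are hypothetical. The
directory is passed in as a parameter; on a kernel with this patch applied
it would be /proc/sys/kernel/random.

```c
#include <stdio.h>

/* Read a single integer from a sysctl-style file; returns 0 on success. */
static int read_int_file(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if the CPU Jitter RNG reports itself
 * as operational (both values non-zero), 0 if it is not initialized,
 * and -1 if the files cannot be read. On success, *total_bytes holds
 * the size of the memory chunk used for the memory access timing.
 */
static int jent_operational(const char *dir, int *total_bytes)
{
	char path[512];
	int blocksize = 0;
	int blocks = 0;

	snprintf(path, sizeof(path), "%s/jent_memblocksize", dir);
	if (read_int_file(path, &blocksize))
		return -1;
	snprintf(path, sizeof(path), "%s/jent_memblocks", dir);
	if (read_int_file(path, &blocks))
		return -1;

	if (blocksize == 0 || blocks == 0)
		return 0;
	*total_bytes = blocksize * blocks;
	return 1;
}
```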
Signed-off-by: Stephan Mueller <[email protected]>
---
drivers/char/random.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index eb4fe99..4b2267b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1703,6 +1703,20 @@ struct ctl_table random_table[] = {
.mode = 0444,
.proc_handler = proc_do_uuid,
},
+ {
+ .procname = "jent_memblocksize",
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = proc_dointvec,
+ .data = &input_pool.jent_ec.memblocksize,
+ },
+ {
+ .procname = "jent_memblocks",
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = proc_dointvec,
+ .data = &input_pool.jent_ec.memblocks,
+ },
{ }
};
#endif /* CONFIG_SYSCTL */
--
1.8.5.3
The CPU Jitter RNG is included into random.c as a new noise source.
The noise source, however, works differently than all other noise
sources. The CPU Jitter RNG provides entropy on demand and thus, the
callback to obtain new data is implemented as a pull operation.
The pull operation is only executed for the input_pool when its entropy
estimator falls below the threshold that causes /dev/random to block. In
this case, the CPU Jitter RNG is queried to provide as many random
bytes as requested by the caller, but no more than 64 bytes at a
time.
These restrictions ensure that the CPU Jitter RNG does not
monopolize all other Linux RNG noise sources, as it provides entropy far
faster than any other noise source.
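The sizing of one pull request can be sketched in user space as follows.
This is an illustration only, not the kernel code: the helper name
jent_request_size is hypothetical, and the 8-byte block size assumes that
the CPU Jitter RNG produces 64 bits per round.

```c
#include <stddef.h>

#define JENT_BUFFER 64		/* upper limit per pull, as in the patch */
#define JENT_BLOCK_BYTES 8	/* assumption: one 64-bit RNG output block */

/*
 * Hypothetical helper: number of bytes pulled from the CPU Jitter RNG
 * for a caller request of `bytes`. The request is capped at JENT_BUFFER
 * and then rounded up to the next multiple of the RNG block size.
 */
static size_t jent_request_size(int bytes)
{
	size_t n;

	if (bytes <= 0)
		return 0;	/* invalid request, nothing is pulled */

	n = (size_t)bytes;
	if (n > JENT_BUFFER)
		n = JENT_BUFFER;

	/* round up; works because the block size is a power of two */
	return (n + JENT_BLOCK_BYTES - 1) & ~(size_t)(JENT_BLOCK_BYTES - 1);
}
```

Because JENT_BUFFER is itself a multiple of the block size, the rounding
never pushes the result above the 64-byte limit.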
When pulling data from /dev/random, no user-noticeable blocking occurs
anymore. Also, the speed of /dev/urandom is not lowered by the
patch.
When the first random number is pulled from the Linux RNG, the code
invokes the CPU Jitter RNG self test, which verifies that the underlying
hardware is capable of supporting the RNG. If the hardware is
insufficient (i.e. it does not provide a high-resolution timer), the CPU
Jitter RNG noise source is completely disabled. Otherwise, it provides
entropy right from the first invocations of the Linux RNG.
Signed-off-by: Stephan Mueller <[email protected]>
---
drivers/char/random.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 85 insertions(+), 1 deletion(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 429b75b..eb4fe99 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -267,6 +267,8 @@
#define CREATE_TRACE_POINTS
#include <trace/events/random.h>
+#include <linux/jitterentropy.h>
+
/*
* Configuration information
*/
@@ -430,19 +432,23 @@ struct entropy_store {
unsigned int limit:1;
unsigned int last_data_init:1;
__u8 last_data[EXTRACT_SIZE];
+ struct rand_data jent_ec;
};
static void push_to_pool(struct work_struct *work);
static __u32 input_pool_data[INPUT_POOL_WORDS];
static __u32 blocking_pool_data[OUTPUT_POOL_WORDS];
static __u32 nonblocking_pool_data[OUTPUT_POOL_WORDS];
+static unsigned char input_jentmem[JENT_MEMORY_SIZE];
static struct entropy_store input_pool = {
.poolinfo = &poolinfo_table[0],
.name = "input",
.limit = 1,
.lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
- .pool = input_pool_data
+ .pool = input_pool_data,
+ .jent_ec.mem = input_jentmem,
+ .jent_ec.memblocks = 0,
};
static struct entropy_store blocking_pool = {
@@ -454,6 +460,7 @@ static struct entropy_store blocking_pool = {
.pool = blocking_pool_data,
.push_work = __WORK_INITIALIZER(blocking_pool.push_work,
push_to_pool),
+ .jent_ec.mem = NULL,
};
static struct entropy_store nonblocking_pool = {
@@ -464,6 +471,7 @@ static struct entropy_store nonblocking_pool = {
.pool = nonblocking_pool_data,
.push_work = __WORK_INITIALIZER(nonblocking_pool.push_work,
push_to_pool),
+ .jent_ec.mem = NULL,
};
static __u32 const twist_table[8] = {
@@ -715,6 +723,81 @@ static void credit_entropy_bits_safe(struct entropy_store *r, int nbits)
*
*********************************************************************/
+/* On some architectures without random_get_entropy, the clocksource
+ * drivers may provide a high resolution timer. This, however, prevents
+ * this function from being called from init_std_data to fill the
+ * entropy pools with entropy at the time of creation. The clocksource drivers
+ * are loaded during module_init() time, just as init_std_data. Thus, there
+ * is no guarantee that the clocksource drivers are available here.
+ */
+static void add_jent_randomness(struct entropy_store *r, int bytes)
+{
+#define JENT_BUFFER 64 /* ensure that JENT_BUFFER is a multiple of
+ * the CPU Jitter RNG block size */
+ char rand[JENT_BUFFER];
+ int ret = 0;
+ int entropy_count = 0;
+ unsigned long flags;
+
+ /* the initialization process determined that we cannot use the
+ * CPU Jitter RNG or the caller provided wrong input */
+ if(NULL == r->jent_ec.mem || 0 >= bytes)
+ return;
+
+ /* only use the Jitter RNG if we fall to the low threshold as
+ * otherwise the Jitter RNG monopolizes the noise sources */
+ entropy_count = ACCESS_ONCE(r->entropy_count);
+ entropy_count = entropy_count >> (ENTROPY_SHIFT);
+ if (entropy_count > random_read_wakeup_thresh)
+ return;
+
+ memset(rand, 0, JENT_BUFFER);
+ spin_lock_irqsave(&r->lock, flags);
+ if(0 == r->jent_ec.memblocks)
+ {
+ /* we are uninitialized, try to initialize */
+ if(jent_entropy_init())
+ {
+ /* there is no CPU Jitter, disable the collector */
+ r->jent_ec.mem = NULL;
+ spin_unlock_irqrestore(&r->lock, flags);
+ return;
+ }
+ r->jent_ec.data = 0;
+ r->jent_ec.prev_time = 0;
+ r->jent_ec.old_data = 0;
+ r->jent_ec.fips_fail = 0;
+ r->jent_ec.stir = 0;
+ r->jent_ec.disable_unbias = 0;
+ r->jent_ec.osr = 1;
+ /* r->jent_ec.mem does not need to be zeroized */
+ r->jent_ec.memblocksize = JENT_MEMORY_BLOCKSIZE;
+ r->jent_ec.memblocks = JENT_MEMORY_BLOCKS;
+ r->jent_ec.memaccessloops = JENT_MEMORY_ACCESSLOOPS;
+ /* fill the entropy collector and init the FIPS test
+ * by pulling one round from the RNG */
+ jent_read_entropy(&r->jent_ec, rand, 8);
+ }
+
+ /* never pull more bytes than available in temp variable */
+ ret = min_t(int, bytes, JENT_BUFFER);
+#define JENT_WRAP (DATA_SIZE_BITS / 8 - 1)
+ /* round up number of bytes to be pulled to next multiple of
+ * CPU Jitter RNG block size */
+ ret = (ret + JENT_WRAP) &~ JENT_WRAP;
+
+ ret = jent_read_entropy(&r->jent_ec, rand, ret);
+ spin_unlock_irqrestore(&r->lock, flags);
+ if(0 < ret)
+ {
+ /* we do not need to worry about trickle threshold as we are
+ * called when we are low on entropy */
+ mix_pool_bytes(r, rand, ret, NULL);
+ credit_entropy_bits(r, ret * 8);
+ }
+ memset(rand, 0, JENT_BUFFER);
+}
+
/* There is one of these per entropy source */
struct timer_rand_state {
cycles_t last_time;
@@ -935,6 +1018,7 @@ static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
trace_xfer_secondary_pool(r->name, bytes * 8, nbytes * 8,
ENTROPY_BITS(r), ENTROPY_BITS(r->pull));
+ add_jent_randomness(r->pull, bytes);
bytes = extract_entropy(r->pull, tmp, bytes,
random_read_wakeup_thresh / 8, rsvd);
mix_pool_bytes(r, tmp, bytes, NULL);
--
1.8.5.3
Amend Makefile to allow the CPU Jitter RNG code to be statically
compiled.
Signed-off-by: Stephan Mueller <[email protected]>
---
drivers/char/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index 290fe5b..480c8f6 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -2,7 +2,8 @@
# Makefile for the kernel character device drivers.
#
-obj-y += mem.o random.o
+obj-y += mem.o random.o jitterentropy-base.o
+CFLAGS_jitterentropy-base.o = -O0
obj-$(CONFIG_TTY_PRINTK) += ttyprintk.o
obj-y += misc.o
obj-$(CONFIG_ATARI_DSP56K) += dsp56k.o
--
1.8.5.3
The jitterentropy-base.c file implements the CPU Jitter RNG as
documented at http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html.
The associated header file makes the RNG available to the remainder
of the kernel.
The CPU Jitter RNG delivers entropy on demand. Therefore, it only
causes system overhead when entropy is requested. The RNG delivers
entropy even at early boot time, sufficient to satisfy the earliest
users of random numbers.
The RNG is based on a high-resolution timer like the x86 RDTSC
instruction. The generated random bit stream does not show any
statistically significant patterns. The output of the random number
generator passes all standard statistical tools that analyze the quality
of a random data stream.
The documentation discusses the noise sources, including quantitative
analyses in chapter 6 of the aforementioned document.
Testing of the code on a large number of CPUs and operating systems
is performed as outlined in chapter 5 and the appendix of the mentioned
document.
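As a rough illustration, the two bit-level steps that jitterentropy-base.c
applies to each measurement, folding and Von Neumann unbiasing, can be
sketched in user space as follows. The sketch only shows the arithmetic; in
the real code, the execution time of the folding loop is itself the quantity
being measured. All names here (fold_time, bit_source, next_bit,
von_neumann_bit) are hypothetical, and the array-backed bit source merely
stands in for the noise source.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_SIZE_BITS 64	/* assumption: 64-bit time stamps */

/*
 * Fold a time value into `bits` bits by XORing all complete
 * `bits`-wide chunks of the value, starting with the lowest chunk.
 * Upper bits that do not fill a complete chunk are discarded.
 */
static uint64_t fold_time(uint64_t time, unsigned int bits)
{
	uint64_t folded = 0;
	unsigned int i;

	if (bits == 0 || bits > DATA_SIZE_BITS)
		return 0;
	for (i = 1; i <= DATA_SIZE_BITS / bits; i++) {
		/* move chunk i to the top, then down to the bottom */
		uint64_t tmp = time << (DATA_SIZE_BITS - bits * i);

		tmp >>= (DATA_SIZE_BITS - bits);
		folded ^= tmp;
	}
	return folded;
}

/* Illustrative bit source backed by an array of 0/1 values. */
struct bit_source {
	const unsigned char *bits;
	size_t len;
	size_t pos;
};

/* Returns the next bit, or -1 when the source is exhausted. */
static int next_bit(struct bit_source *s)
{
	if (s->pos >= s->len)
		return -1;
	return s->bits[s->pos++] & 1;
}

/*
 * Von Neumann unbiasing: read bits in pairs, drop equal pairs, and
 * output the first bit of the first unequal pair. Returns -1 when the
 * source runs out before an unequal pair is found.
 */
static int von_neumann_bit(struct bit_source *s)
{
	for (;;) {
		int a = next_bit(s);
		int b = next_bit(s);

		if (a < 0 || b < 0)
			return -1;
		if (a != b)
			return a;
	}
}
```

With a width of one bit, fold_time reduces to the parity of the time value.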
Signed-off-by: Stephan Mueller <[email protected]>
---
drivers/char/jitterentropy-base.c | 665 ++++++++++++++++++++++++++++++++++++++
include/linux/jitterentropy.h | 164 ++++++++++
2 files changed, 829 insertions(+)
create mode 100644 drivers/char/jitterentropy-base.c
create mode 100644 include/linux/jitterentropy.h
diff --git a/drivers/char/jitterentropy-base.c b/drivers/char/jitterentropy-base.c
new file mode 100644
index 0000000..950877c
--- /dev/null
+++ b/drivers/char/jitterentropy-base.c
@@ -0,0 +1,665 @@
+/*
+ * Non-physical true random number generator based on timing jitter.
+ *
+ * Copyright Stephan Mueller <[email protected]>, 2014
+ *
+ * Design
+ * ======
+ *
+ * See documentation in http://www.chronox.de.
+ *
+ * License
+ * =======
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, and the entire permission notice in its entirety,
+ * including the disclaimer of warranties.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The name of the author may not be used to endorse or promote
+ * products derived from this software without specific prior
+ * written permission.
+ *
+ * ALTERNATIVELY, this product may be distributed under the terms of
+ * the GNU General Public License, in which case the provisions of the GPL are
+ * required INSTEAD OF the above restrictions. (This clause is
+ * necessary due to a potential bad interaction between the GPL and
+ * the restrictions contained in a BSD-style copyright.)
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF
+ * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+ * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ */
+
+#include <linux/jitterentropy.h>
+
+#ifdef __OPTIMIZE__
+ #error "The CPU Jitter random number generator must not be compiled with optimizations. See documentation. Use the compiler switch -O0 for compiling jitterentropy-base.c."
+#endif
+
+/*
+ * Update of the loop count used for the next round of
+ * an entropy collection.
+ *
+ * Input:
+ * @bits is the number of low bits of the timer to consider
+ * @min is the number of bits we shift the timer value to the right at
+ * the end to make sure we have a guaranteed minimum value
+ *
+ * Return:
+ * Newly calculated loop counter
+ */
+static unsigned int jent_loop_shuffle(unsigned int bits, unsigned int min)
+{
+ __u64 time = 0;
+ jent_get_nstime(&time);
+
+ /* we take the low bits of the timer, which yields any
+ * value between 0 and 2^bits - 1 */
+ time = time << (64 - bits);
+ time = time >> (64 - bits);
+
+ /* We add a lower boundary value to ensure we have a minimum
+ * RNG loop count. */
+ return (time + (1<<min));
+}
+
+static unsigned int jent_loop_fold_shuffle(void)
+{
+/* Number of low bits of timer that are used to determine the next
+ * folding loop counter */
+#define MAX_FOLD_LOOP_BIT 4
+#define MIN_FOLD_LOOP_BIT 0
+ return jent_loop_shuffle(MAX_FOLD_LOOP_BIT, MIN_FOLD_LOOP_BIT);
+}
+
+/***************************************************************************
+ * Noise sources
+ ***************************************************************************/
+
+/*
+ * CPU Jitter noise source -- this is the noise source based on the CPU
+ * execution time jitter
+ *
+ * This function folds the time into TIME_ENTROPY_BITS bits by iterating
+ * through the DATA_SIZE_BITS bit time value as follows: assume our time value
+ * is 0xaaabbbcccddd, TIME_ENTROPY_BITS is 3
+ * 1st loop, 1st shift generates 0xddd000000000
+ * 1st loop, 2nd shift generates 0x000000000ddd
+ * 2nd loop, 1st shift generates 0xcccddd000000
+ * 2nd loop, 2nd shift generates 0x000000000ccc
+ * 3rd loop, 1st shift generates 0xbbbcccddd000
+ * 3rd loop, 2nd shift generates 0x000000000bbb
+ * 4th loop, 1st shift generates 0xaaabbbcccddd
+ * 4th loop, 2nd shift generates 0x000000000aaa
+ * Now, the values at the end of the 2nd shifts are XORed together.
+ * Note, the loop only performs (DATA_SIZE_BITS / TIME_ENTROPY_BITS) iterations.
+ * If the division leaves a remainder, the result is rounded down (e.g. 64 / 3
+ * yields 21). Thus, the uppermost bits that are fewer than TIME_ENTROPY_BITS in
+ * number (which are assumed to have no entropy to begin with) are discarded.
+ *
+ * The code is deliberately inefficient and shall stay that way. This function
+ * is the root cause why the code shall be compiled without optimization. This
+ * function not only acts as folding operation, but this function's execution
+ * is used to measure the CPU execution time jitter. Any change to the loop in
+ * this function implies that careful retesting must be done.
+ *
+ * Input:
+ * @time time stamp to be folded
+ * @loop_cnt if a value not equal to 0 is set, use the given value as number of
+ * loops to perform the folding
+ *
+ * Output:
+ * @folded result of folding operation
+ *
+ * Return:
+ * Number of loops the folding operation is performed
+ */
+static unsigned int jent_fold_time(__u64 time, __u64 *folded,
+ unsigned int loop_cnt)
+{
+ int i, j;
+ __u64 new = 0;
+ unsigned int fold_loop_cnt = jent_loop_fold_shuffle();
+
+ /* testing purposes -- allow test app to set the counter, not
+ * needed during runtime */
+ if(loop_cnt)
+ fold_loop_cnt = loop_cnt;
+ for(j = 0; j < fold_loop_cnt; j++)
+ {
+ new = 0;
+ for(i = 1; (DATA_SIZE_BITS / TIME_ENTROPY_BITS) >= i; i++)
+ {
+ __u64 tmp = time << (DATA_SIZE_BITS - (TIME_ENTROPY_BITS * i));
+ tmp = tmp >> (DATA_SIZE_BITS - TIME_ENTROPY_BITS);
+ new ^= tmp;
+ }
+ }
+ *folded = new;
+ return fold_loop_cnt;
+}
+
+/*
+ * Memory Access noise source -- this is a noise source based on variations in
+ * memory access times
+ *
+ * This function performs memory accesses which will add to the timing
+ * variations due to an unknown amount of CPU wait states that need to be
+ * added when accessing memory. The memory size should be larger than the L1
+ * caches as outlined in the documentation and the associated testing.
+ *
+ * The L1 cache has a very high bandwidth, albeit its access rate is usually
+ * slower than accessing CPU registers. Therefore, L1 accesses only add minimal
+ * variations as the CPU hardly has to wait. Starting with L2, significant
+ * variations are added because L2 typically does not belong to the CPU any more
+ * and therefore a wider range of CPU wait states is necessary for accesses.
+ * L3 and real memory accesses have an even wider range of wait states. However,
+ * to reliably access either L3 or memory, the ec->mem memory must be quite large
+ * which is usually not desirable.
+ *
+ * Input:
+ * @ec Reference to the entropy collector with the memory access data -- if
+ * the reference to the memory block to be accessed is NULL, this noise
+ * source is disabled
+ *
+ * Output:
+ * nothing -- the state of the memory access data in @ec is updated
+ *
+ * Return:
+ * Number of memory access operations
+ */
+static unsigned int jent_memaccess(struct rand_data *ec)
+{
+ unsigned char *tmpval = NULL;
+ unsigned long wrap = 0;
+ unsigned int i = 0;
+
+ if(NULL == ec || NULL == ec->mem)
+ return 0;
+
+ wrap = ec->memblocksize * ec->memblocks - 1;
+
+ for(i = 0; i < ec->memaccessloops; i++)
+ {
+ tmpval = ec->mem + ec->memlocation;
+ /* memory access: just add 1 to one byte,
+ * wrap at 255 -- memory access implies read
+ * from and write to memory location */
+ *tmpval = (*tmpval + 1) & 0xff;
+ /* Addition of memblocksize - 1 to pointer
+ * with wrap around logic to ensure that every
+ * memory location is hit evenly
+ */
+ ec->memlocation = ec->memlocation + ec->memblocksize - 1;
+ if(ec->memlocation > wrap)
+ ec->memlocation -= wrap;
+ }
+
+ return i;
+}
+
+/***************************************************************************
+ * Start of entropy processing logic
+ ***************************************************************************/
+
+/*
+ * This is the heart of the entropy generation: calculate time deltas and
+ * use the CPU jitter in the time deltas. The jitter is folded into one
+ * bit. You can call this function the "random bit generator" as it
+ * produces one random bit per invocation.
+ *
+ * WARNING: ensure that ->prev_time is primed before using the output
+ * of this function! This can be done by calling this function
+ * and not using its result.
+ *
+ * Input:
+ * @entropy_collector Reference to entropy collector
+ *
+ * Return:
+ * One random bit
+ *
+ */
+static __u64 jent_measure_jitter(struct rand_data *entropy_collector)
+{
+ __u64 time = 0;
+ __u64 delta = 0;
+ __u64 data = 0;
+
+ /* Invoke one noise source before time measurement to add variations */
+ jent_memaccess(entropy_collector);
+
+ /* Get time stamp and calculate time delta to previous invocation
+ * to measure the timing variations with the previous invocation */
+ jent_get_nstime(&time);
+ delta = time - entropy_collector->prev_time;
+ entropy_collector->prev_time = time;
+
+ /* Now call the next noise sources which also folds the data */
+ jent_fold_time(delta, &data, 0);
+
+ return data;
+}
+
+/*
+ * Von Neumann unbias as explained in RFC 4086 section 4.2. As shown in the
+ * documentation of that RNG, the bits from jent_measure_jitter are considered
+ * independent which implies that the Von Neumann unbias operation is applicable.
+ * A proof of the Von-Neumann unbias operation to remove skews is given in the
+ * document "A proposal for: Functionality classes for random number
+ * generators", version 2.0 by Werner Schindler, section 5.4.1.
+ *
+ * Input:
+ * @entropy_collector Reference to entropy collector
+ *
+ * Return:
+ * One random bit
+ */
+static __u64 jent_unbiased_bit(struct rand_data *entropy_collector)
+{
+ if(1 == entropy_collector->disable_unbias)
+ return (jent_measure_jitter(entropy_collector));
+ do
+ {
+ __u64 a = jent_measure_jitter(entropy_collector);
+ __u64 b = jent_measure_jitter(entropy_collector);
+ if(a == b)
+ continue;
+ if(1 == a)
+ return 1;
+ else
+ return 0;
+ } while(1);
+}
+
+/*
+ * Shuffle the pool a bit by mixing some value with a bijective function (XOR)
+ * into the pool.
+ *
+ * The function generates a mixer value that depends on the bits set and the
+ * location of the set bits in the random number generated by the entropy
+ * source. Therefore, based on the generated random number, this mixer value
+ * can have 2**64 different values. That mixer value is initialized with the
+ * first two SHA-1 constants. After obtaining the mixer value, it is XORed into
+ * the random number.
+ *
+ * The mixer value is not assumed to contain any entropy. But due to the XOR
+ * operation, it can also not destroy any entropy present in the entropy pool.
+ *
+ * Input:
+ * @entropy_collector Reference to entropy collector
+ *
+ * Output:
+ * nothing
+ */
+static void jent_stir_pool(struct rand_data *entropy_collector)
+{
+ /* to shut up GCC on 32 bit, we have to initialize the 64 variable
+ * with two 32 bit variables */
+ union c {
+ __u64 u64;
+ __u32 u32[2];
+ };
+ /* This constant is derived from the first two 32 bit initialization
+ * vectors of SHA-1 as defined in FIPS 180-4 section 5.3.1 */
+ union c constant;
+ /* The start value of the mixer variable is derived from the third
+ * and fourth 32 bit initialization vector of SHA-1 as defined in
+ * FIPS 180-4 section 5.3.1 */
+ union c mixer;
+ int i = 0;
+
+ /* Store the SHA-1 constants in reverse order to make up the 64 bit
+ * value -- this applies to a little endian system, on a big endian
+ * system, it reverses as expected. But this really does not matter
+ * as we do not rely on the specific numbers. We just pick the SHA-1
+ * constants as they have a good mix of bit set and unset. */
+ constant.u32[1] = 0x67452301;
+ constant.u32[0] = 0xefcdab89;
+ mixer.u32[1] = 0x98badcfe;
+ mixer.u32[0] = 0x10325476;
+
+ for(i = 0; i < DATA_SIZE_BITS; i++)
+ {
+ /* get the i-th bit of the input random number and only XOR
+ * the constant into the mixer value when that bit is set */
+ if((entropy_collector->data >> i) & 0x0000000000000001)
+ mixer.u64 ^= constant.u64;
+ mixer.u64 = rol64(mixer.u64, 1);
+ }
+ entropy_collector->data ^= mixer.u64;
+}
+
+/*
+ * Generator of one 64 bit random number
+ * Function fills rand_data->data
+ *
+ * Input:
+ * @entropy_collector Reference to entropy collector
+ *
+ * Return:
+ * nothing -- the result is stored in rand_data->data
+ */
+static void jent_gen_entropy(struct rand_data *entropy_collector)
+{
+ unsigned int k;
+
+ /* number of loops for the entropy collection depends on the size of
+ * the random number and the size of the folded value. We want to
+ * ensure that we pass over each bit of the random value once with the
+ * folded value. E.g. if we have a random value of 64 bits and 2 bits
+ * of folded size, we need 32 entropy collection loops. If the random
+ * value size is not divisible by the folded value size, we have as
+ * many loops to cover each random number value bit at least once. E.g.
+ * 64 bits random value size and the folded value is 3 bits, we need 22
+ * loops to cover the 64 bits at least once. */
+ /* We multiply the loop value with ->osr to obtain the oversampling
+ * rate requested by the caller */
+ for (k = 0;
+ k < ((((DATA_SIZE_BITS - 1) / TIME_ENTROPY_BITS) + 1) *
+ entropy_collector->osr);
+ k++)
+ {
+ __u64 data = 0;
+ /* priming of the ->prev_time value in first loop iteration */
+ if(!k)
+ jent_measure_jitter(entropy_collector);
+
+ data = jent_unbiased_bit(entropy_collector);
+ entropy_collector->data ^= data;
+ entropy_collector->data = rol64(entropy_collector->data,
+ TIME_ENTROPY_BITS);
+ }
+ if(entropy_collector->stir)
+ jent_stir_pool(entropy_collector);
+}
+
+/* the continuous test required by FIPS 140-2 -- the function automatically
+ * primes the test if needed.
+ *
+ * Return:
+ * 0 if FIPS test passed
+ * < 0 if FIPS test failed
+ */
+static int jent_fips_test(struct rand_data *entropy_collector)
+{
+ if(!jent_fips_enabled())
+ return 0;
+
+ /* shall we somehow allow the caller to reset that? Probably
+ * not, because the caller can de-allocate the entropy collector
+ * instance and set up a new one. */
+ if(entropy_collector->fips_fail)
+ return -1;
+
+ /* prime the FIPS test */
+ if(!entropy_collector->old_data)
+ {
+ entropy_collector->old_data = entropy_collector->data;
+ jent_gen_entropy(entropy_collector);
+ }
+
+ if(entropy_collector->data == entropy_collector->old_data)
+ {
+ entropy_collector->fips_fail = 1;
+ return -1;
+ }
+
+ entropy_collector->old_data = entropy_collector->data;
+
+ return 0;
+}
+
+/*
+ * Entry function: Obtain entropy for the caller.
+ *
+ * This function invokes the entropy gathering logic as often as needed to
+ * generate the number of bytes requested by the caller. The entropy gathering
+ * logic creates 64 bits per invocation.
+ *
+ * This function truncates the last 64 bit entropy value output to the exact
+ * size specified by the caller.
+ *
+ * @data: pointer to buffer for storing random data -- buffer must already
+ * exist
+ * @len: size of the buffer, specifying also the requested number of random
+ * bytes
+ *
+ * return: number of bytes returned when request is fulfilled or an error
+ *
+ * The following error codes can occur:
+ * -1 FIPS 140-2 continuous self test failed
+ * -2 entropy_collector is NULL
+ */
+int jent_read_entropy(struct rand_data *entropy_collector,
+ char *data, size_t len)
+{
+ char *p = data;
+ int ret = 0;
+ size_t orig_len = len;
+
+ if(NULL == entropy_collector)
+ return -2;
+
+ while (0 < len)
+ {
+ size_t tocopy;
+ jent_gen_entropy(entropy_collector);
+ ret = jent_fips_test(entropy_collector);
+ if(0 > ret)
+ return ret;
+
+ if((DATA_SIZE_BITS / 8) < len)
+ tocopy = (DATA_SIZE_BITS / 8);
+ else
+ tocopy = len;
+ memcpy(p, &entropy_collector->data, tocopy);
+
+ len -= tocopy;
+ p += tocopy;
+ }
+
+ /* To be on the safe side, we generate one more round of entropy
+ * which we do not give out to the caller. That round shall ensure
+ * that in case the calling application crashes, memory dumps, pages
+ * out, or due to the CPU Jitter RNG lingering in memory for long
+ * time without being moved and an attacker cracks the application,
+ * all he reads in the entropy pool is a value that is NEVER EVER
+ * being used for anything. Thus, he does NOT see the previous value
+ * that was returned to the caller for cryptographic purposes.
+ */
+ /* If we use secured memory, do not use that precaution as the secure
+ * memory protects the entropy pool. Moreover, note that using this
+ * call reduces the speed of the RNG by up to half */
+#ifndef CONFIG_CRYPTO_CPU_JITTERENTROPY_SECURE_MEMORY
+ jent_gen_entropy(entropy_collector);
+#endif
+
+ return orig_len;
+}
+
+/***************************************************************************
+ * Initialization logic
+ ***************************************************************************/
+
+struct rand_data *jent_entropy_collector_alloc(unsigned int osr,
+ unsigned int flags)
+{
+ struct rand_data *entropy_collector;
+
+ entropy_collector = jent_zalloc(sizeof(struct rand_data));
+ if(NULL == entropy_collector)
+ return NULL;
+
+ if(!(flags & JENT_DISABLE_MEMORY_ACCESS))
+ {
+ /* Allocate memory for adding variations based on memory
+ * access
+ */
+ entropy_collector->mem =
+ (unsigned char *)jent_zalloc(JENT_MEMORY_SIZE);
+ if(NULL == entropy_collector->mem)
+ {
+ jent_zfree(entropy_collector, sizeof(struct rand_data));
+ return NULL;
+ }
+ entropy_collector->memblocksize = JENT_MEMORY_BLOCKSIZE;
+ entropy_collector->memblocks = JENT_MEMORY_BLOCKS;
+ entropy_collector->memaccessloops = JENT_MEMORY_ACCESSLOOPS;
+ }
+
+ /* verify and set the oversampling rate */
+ if(0 == osr)
+ osr = 1; /* minimum sampling rate is 1 */
+ entropy_collector->osr = osr;
+
+ entropy_collector->stir = 1;
+ if(flags & JENT_DISABLE_STIR)
+ entropy_collector->stir = 0;
+ if(flags & JENT_DISABLE_UNBIAS)
+ entropy_collector->disable_unbias = 1;
+
+ /* fill the data pad with non-zero values */
+ jent_gen_entropy(entropy_collector);
+
+ /* initialize the FIPS 140-2 continuous test if needed */
+ jent_fips_test(entropy_collector);
+
+ return entropy_collector;
+}
+
+void jent_entropy_collector_free(struct rand_data *entropy_collector)
+{
+
+ jent_zfree(entropy_collector->mem, JENT_MEMORY_SIZE);
+ jent_zfree(entropy_collector, sizeof(struct rand_data));
+ entropy_collector = NULL;
+}
+
+int jent_entropy_init(void)
+{
+ int i;
+ __u64 delta_sum = 0;
+ __u64 old_delta = 0;
+ int time_backwards = 0;
+ int count_var = 0;
+ int count_mod = 0;
+
+ /* We could perform statistical tests here, but the problem is
+ * that we only have a few loop counts to do testing. These
+ * loop counts may show some slight skew and we produce
+ * false positives.
+ *
+ * Moreover, only old systems show potentially problematic
+ * jitter entropy that could potentially be caught here. But
+ * the RNG is intended for hardware that is available or widely
+ * used, but not old systems that are long out of favor. Thus,
+ * no statistical tests.
+ */
+
+ /* We could add a check for system capabilities such as clock_getres or
+ * check for CONFIG_X86_TSC, but it does not make much sense as the
+ * following sanity checks verify that we have a high-resolution
+ * timer. */
+ /* TESTLOOPCOUNT needs some loops to identify edge systems. 100 is
+ * definitely too little. */
+#define TESTLOOPCOUNT 300
+#define CLEARCACHE 100
+ for(i = 0; (TESTLOOPCOUNT + CLEARCACHE) > i; i++)
+ {
+ __u64 time = 0;
+ __u64 time2 = 0;
+ __u64 folded = 0;
+ __u64 delta = 0;
+
+ jent_get_nstime(&time);
+ jent_fold_time(time, &folded, 1<<MIN_FOLD_LOOP_BIT);
+ jent_get_nstime(&time2);
+
+ /* test whether timer works */
+ if(!time || !time2)
+ return ENOTIME;
+ delta = time2 - time;
+ /* test whether timer is fine grained enough to provide
+ * delta even when called shortly after each other -- this
+ * implies that we also have a high resolution timer */
+ if(!delta)
+ return ECOARSETIME;
+ /* TIME_ENTROPY_BITS states the absolute minimum entropy we
+ * assume the time variances have. As we also check for
+ * delta of deltas, we ensure that there is a varying delta
+ * value, preventing identical time spans */
+ if(TIME_ENTROPY_BITS > delta)
+ return EMINVARIATION;
+
+ /* up to here we did not modify any variable that will be
+ * evaluated later, but we already performed some work. Thus we
+ * already have had an impact on the caches, branch prediction,
+ * etc. with the goal to clear it to get the worst case
+ * measurements. */
+ if(CLEARCACHE > i)
+ continue;
+
+ /* test whether we have an increasing timer */
+ if(!(time2 > time))
+ time_backwards++;
+
+ if(!(delta % 100))
+ count_mod++;
+
+ /* ensure that we have a varying delta timer which is necessary
+ * for the calculation of entropy -- perform this check
+ * only after the first loop is executed as we need to prime
+ * the old_delta value */
+ if(i)
+ {
+ if(delta != old_delta)
+ count_var++;
+ if(delta > old_delta)
+ delta_sum += (delta - old_delta);
+ else
+ delta_sum += (old_delta - delta);
+ }
+ old_delta = delta;
+
+ }
+
+ /* we allow up to three times the time running backwards.
+ * CLOCK_REALTIME is affected by adjtime and NTP operations. Thus,
+ * if such an operation just happens to interfere with our test, it
+ * should not fail. The value of 3 should cover the NTP case being
+ * performed during our test run. */
+ if(3 < time_backwards)
+ return ENOMONOTONIC;
+ /* Error if the time variances are always identical */
+ if(!delta_sum)
+ return EVARVAR;
+
+ /* Variations of deltas of time must on average be larger
+ * than TIME_ENTROPY_BITS to ensure the entropy estimation
+ * implied with TIME_ENTROPY_BITS is preserved */
+ if(!((delta_sum / TESTLOOPCOUNT) > TIME_ENTROPY_BITS))
+ return EMINVARVAR;
+
+ /* Ensure that we have variations in the time stamp below 100 for at
+ * least 10% of all checks -- on some platforms, the counter increments
+ * in multiples of 100, but not always */
+ if((TESTLOOPCOUNT/10 * 9) < count_mod)
+ return ECOARSETIME;
+
+ return 0;
+}
+
diff --git a/include/linux/jitterentropy.h b/include/linux/jitterentropy.h
new file mode 100644
index 0000000..5953e2f
--- /dev/null
+++ b/include/linux/jitterentropy.h
@@ -0,0 +1,164 @@
+/*
+ * Non-physical true random number generator based on timing jitter.
+ *
+ * Copyright Stephan Mueller <[email protected]>, 2014
+ *
+ * License
+ * =======
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, and the entire permission notice in its entirety,
+ * including the disclaimer of warranties.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The name of the author may not be used to endorse or promote
+ * products derived from this software without specific prior
+ * written permission.
+ *
+ * ALTERNATIVELY, this product may be distributed under the terms of
+ * the GNU General Public License, in which case the provisions of the GPL are
+ * required INSTEAD OF the above restrictions. (This clause is
+ * necessary due to a potential bad interaction between the GPL and
+ * the restrictions contained in a BSD-style copyright.)
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF
+ * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+ * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ */
+
+#ifndef _JITTERENTROPY_H
+#define _JITTERENTROPY_H
+
+#include <linux/slab.h> /* needed for kzalloc */
+#include <linux/module.h> /* needed for random_get_entropy */
+#include <linux/fips.h> /* needed for fips_enabled */
+#include <linux/time.h> /* needed for __getnstimeofday */
+
+static inline void jent_get_nstime(__u64 *out)
+{
+ struct timespec ts;
+ __u64 tmp = 0;
+
+ tmp = random_get_entropy();
+
+ /* If random_get_entropy does not return a value invoke __getnstimeofday
+ * hoping that there are timers we can work with.
+ *
+ * The list of available timers can be obtained from
+ * /sys/devices/system/clocksource/clocksource0/available_clocksource
+ * and are registered with clocksource_register()
+ */
+ if((0 == tmp) &&
+ (0 == timekeeping_valid_for_hres()) &&
+ (0 == __getnstimeofday(&ts)))
+ {
+ tmp = ts.tv_sec;
+ tmp = tmp << 32;
+ tmp = tmp | ts.tv_nsec;
+ }
+
+ *out = tmp;
+}
+
+static inline void *jent_zalloc(size_t len)
+{
+ /* We consider kernel memory as secure -- if somebody breaks it,
+ * the user has much more pressing problems than the state of our
+ * RNG */
+#define CONFIG_CRYPTO_CPU_JITTERENTROPY_SECURE_MEMORY
+ return kzalloc(len, GFP_KERNEL);
+}
+static inline void jent_zfree(void *ptr, unsigned int len)
+{
+ kzfree(ptr);
+}
+
+static inline int jent_fips_enabled(void)
+{
+ return fips_enabled;
+}
+
+/* The entropy pool */
+struct rand_data
+{
+ /* all data values that are vital to maintain the security
+ * of the RNG are marked as SENSITIVE. A user must not
+ * access that information while the RNG executes its loops to
+ * calculate the next random value. */
+ __u64 data; /* SENSITIVE Actual random number */
+ __u64 prev_time; /* SENSITIVE Previous time stamp */
+#define DATA_SIZE_BITS ((sizeof(__u64)) * 8)
+ __u64 old_data; /* SENSITIVE FIPS continuous test */
+ unsigned int osr; /* Oversample rate */
+ unsigned int fips_fail:1; /* FIPS status */
+ unsigned int stir:1; /* Post-processing stirring */
+ unsigned int disable_unbias:1; /* Deactivate Von Neumann unbias */
+#define JENT_MEMORY_BLOCKS 64
+#define JENT_MEMORY_BLOCKSIZE 32
+#define JENT_MEMORY_ACCESSLOOPS 128
+#define JENT_MEMORY_SIZE (JENT_MEMORY_BLOCKS*JENT_MEMORY_BLOCKSIZE)
+ unsigned char *mem; /* Memory access location with size of
+ * memblocks * memblocksize */
+ unsigned int memlocation; /* Pointer to byte in *mem */
+ unsigned int memblocks; /* Number of memory blocks in *mem */
+ unsigned int memblocksize; /* Size of one memory block in bytes */
+ unsigned int memaccessloops; /* Number of memory accesses per random
+ * bit generation */
+};
+
+/* Flags that can be used to initialize the RNG */
+#define JENT_DISABLE_STIR (1<<0) /* Disable stirring the entropy pool */
+#define JENT_DISABLE_UNBIAS (1<<1) /* Disable Von Neumann unbias */
+#define JENT_DISABLE_MEMORY_ACCESS (1<<2) /* Disable memory access for more
+ entropy, saves MEMORY_SIZE RAM for
+ entropy collector */
+
+/* Number of low bits of the time value that we want to consider */
+#define TIME_ENTROPY_BITS 1
+
+#define DRIVER_NAME "jitterentropy"
+
+/* -- BEGIN Main interface functions -- */
+
+/* Number of low bits of the time value that we want to consider */
+/* get raw entropy */
+int jent_read_entropy(struct rand_data *entropy_collector,
+ char *data, size_t len);
+/* initialize an instance of the entropy collector */
+struct rand_data *jent_entropy_collector_alloc(unsigned int osr,
+ unsigned int flags);
+/* clearing of entropy collector */
+void jent_entropy_collector_free(struct rand_data *entropy_collector);
+
+/* initialization of entropy collector */
+int jent_entropy_init(void);
+
+/* -- END of Main interface functions -- */
+
+/* -- BEGIN error codes for init function -- */
+#define ENOTIME 1 /* Timer service not available */
+#define ECOARSETIME 2 /* Timer too coarse for RNG */
+#define ENOMONOTONIC 3 /* Timer is not monotonically increasing */
+#define EMINVARIATION 4 /* Timer variations too small for RNG */
+#define EVARVAR 5 /* Timer does not produce variations of variations
+ (2nd derivative of time is zero) */
+#define EMINVARVAR 6 /* Timer variations of variations is too small */
+#define EPROGERR 7 /* Programming error */
+
+/* -- END error codes for init function -- */
+
+#endif /* _JITTERENTROPY_H */
+
--
1.8.5.3
On Tue, Feb 4, 2014 at 1:40 PM, Stephan Mueller <[email protected]> wrote:
> +CFLAGS_jitterentropy-base.o = -O0
Why? If really needed, this deserves a comment.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Tuesday, 4 February 2014, 14:39:54, Geert Uytterhoeven wrote:
Hi Geert,
>On Tue, Feb 4, 2014 at 1:40 PM, Stephan Mueller <[email protected]> wrote:
>> +CFLAGS_jitterentropy-base.o = -O0
>
>Why? if really needed, this deserves a comment.
Sorry for not explaining all the details in the email.
Please consider the following rationale found in jitterentropy-base.c,
given for jent_fold_time:
* The code is deliberately inefficient and shall stay that way. This function
* is the root cause why the code shall be compiled without optimization. This
* function not only acts as folding operation, but this function's execution
* is used to measure the CPU execution time jitter. Any change to the loop in
* this function implies that careful retesting must be done.
The idea of the RNG is to measure the execution timing of a set of
instructions. The set of instructions that is measured as part of the
execution timing jitter measurement is exactly the jent_fold_time
function. When the compiler applies optimizations, one really does not
know how this deliberately inefficient loop is made more efficient. When
it is made more efficient by the compiler, it is unclear how many
instructions are really executed. And the fewer instructions, the fewer
timing variations the RNG gets, and the less entropy the RNG picks up.
Please note that all testing that was executed shows that optimizations
do not really matter. But there are some very old systems (AMD
Semprons, very old Pentiums) that show timing variations which border on
the lowest allowed variations when enabling optimizations. When
disabling optimizations, all the expected timing variations are present.
And for an RNG, it is always important to have "leeway" in the amount of
entropy sampled from the raw noise, i.e. it is better to be too
conservative and underestimate the entropy by a significant amount.
As the entire RNG is intended to be based on timing variations, I felt
that the entire C file can be compiled without optimizations. In this
case, even the post-processing of the data while collecting entropy
adds more entropy, although this impact was not subject to testing or
analysis -- at least it will not diminish the measured timing
variations.
Also, I consider that the execution speed of the entropy collection is
not really an issue because the RNG delivers random numbers at a
comparatively high rate. Any other noise source feeding into random.c
delivers data with far less speed.
For more details, please see [1] section 5.1 below the presented graphs.
[1] http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
Ciao
Stephan
On Tue, Feb 04, 2014 at 05:19:52PM +0100, Stephan Mueller wrote:
> Also, I consider the execution speed of the entropy collection is not
> really an issue because the RNG delivers random numbers at a
> comparatively high rate. Any other noise source feeding into random.c
> delivers data with far less speed.
Compiling the kernel with -O0 could add some other problems, like
e.g. not doing enough constant folding which could result in linking
errors. I guess it is not a problem currently though, but some of the
compile time checks depend on this (compiletime_assert and such).
Have you looked into adding compiler barriers into relevant places in the
loops to stop the compiler from optimizing and spill out the values from the
registers to their memory locations?
Greetings,
Hannes
On Tue, Feb 04, 2014 at 05:39:57PM +0100, Hannes Frederic Sowa wrote:
> On Tue, Feb 04, 2014 at 05:19:52PM +0100, Stephan Mueller wrote:
> > Also, I consider the execution speed of the entropy collection is not
> > really an issue because the RNG delivers random numbers at a
> > comparatively high rate. Any other noise source feeding into random.c
> > delivers data with far less speed.
>
> Compiling the kernel with -O0 could add some other problems, like
> e.g. not doing enough constant folding which could result in linking
> errors. I guess it is not a problem currently though, but some of the
> compile time checks depend on this (compiletime_assert and such).
>
> Have you looked into adding compiler barriers into relevant places in the
> loops to stop the compiler from optimizing and spill out the values from the
> registers to their memory locations?
Quick follow-up:
Maybe you can get some ideas on how to stop the compiler from optimizing code
from commit fe8c8a126806fe ("crypto: more robust crypto_memneq"). Maybe
volatile could also be helpful, and OPTIMIZER_HIDE_VAR seems to be a good
candidate to use here, too.
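As a rough illustration of that suggestion, the sketch below hides a loop variable from the optimizer with an empty inline-asm statement. It assumes GCC/Clang extended asm syntax. SKETCH_HIDE_VAR and sketch_fold_with_barrier are hypothetical names modeled on the idea behind OPTIMIZER_HIDE_VAR, and the loop is a simplified fold, not the code from the patch.

```c
/* Assumption: GCC/Clang extended asm. The empty asm statement takes
 * `var` as an in/out register operand, so the compiler must assume the
 * value may have changed and cannot reason about it across this point. */
#define SKETCH_HIDE_VAR(var) __asm__ __volatile__("" : "+r" (var))

/* XOR-fold all bits of `value` into a single bit, one bit per iteration. */
static unsigned long sketch_fold_with_barrier(unsigned long value)
{
	unsigned long folded = 0;
	unsigned int i;

	for (i = 0; i < sizeof(value) * 8; i++) {
		folded ^= (value >> i) & 1;
		/* Without this, an optimizing compiler is free to replace
		 * the whole loop with a parity computation. */
		SKETCH_HIDE_VAR(folded);
	}
	return folded;
}
```

The barrier keeps the compiler from collapsing the loop into a shorter computation, but it does not forbid unrolling or other scheduling changes, so the generated instruction sequence can still differ between optimization levels and would need the same retesting.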
Greetings,
Hannes
On Tuesday, 4 February 2014, 17:39:57, Hannes Frederic Sowa wrote:
Hi Hannes,
>On Tue, Feb 04, 2014 at 05:19:52PM +0100, Stephan Mueller wrote:
>> Also, I consider the execution speed of the entropy collection is not
>> really an issue because the RNG delivers random numbers at a
>> comparatively high rate. Any other noise source feeding into random.c
>> delivers data with far less speed.
>
>Compiling the kernel with -O0 could add some other problems, like
I thought with the given flag, I only compile the respective C file
without optimizations, but not the entire kernel. Am I wrong here?
>e.g. not doing enough constant folding which could result in linking
>errors. I guess it is not a problem currently though, but some of the
>compile time checks depend on this (compiletime_assert and such).
How do you think that my folding code can cause linking errors?
>
>Have you looked into adding compiler barriers into relevant places in
>the loops to stop the compiler from optimizing and spill out the
>values from the registers to their memory locations?
I did not look into that one, let me have a look.
>
>Greetings,
>
> Hannes
Ciao
Stephan
I really wish we could get someone inside Intel who has deep knowledge
about CPU internals to render an opinion about this. My reaction to
"I can't explain where the entropy is coming from" is that it seems very
similar to my home-grown attempts to create an encryption algorithm when I
was much younger and much more foolish --- "it must be secure because
I can't break it".
I will note that there are parts of
> [2] http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
which don't really add much to the discussion, but instead just simply
make an expert question how deep the analysis has gone. Running
statistical tests on the entropy pool is a complete waste of time ---
and in general, using things like "dieharder" doesn't do anything to
increase one's confidence (and could decrease one's confidence if it
makes it appear too much like a snake oil sales document). Sure,
passing dieharder is necessary, but it isn't even vaguely close to
sufficient.
Modulo questions of how much CPU overhead it has, I wouldn't have an
objection to adding additional sources into the entropy pool, such as
what Joern has suggested. It's when there is a proposal to give such
output entropy credit that I start to get queasy. (But then again,
since most applications use /dev/urandom, the question of entropy
credit isn't that important for many use cases.)
- Ted
Hi!
On Tue, Feb 04, 2014 at 05:53:39PM +0100, Stephan Mueller wrote:
> On Tuesday, 4 February 2014, 17:39:57, Hannes Frederic Sowa wrote:
> >On Tue, Feb 04, 2014 at 05:19:52PM +0100, Stephan Mueller wrote:
> >> Also, I consider the execution speed of the entropy collection is not
> >> really an issue because the RNG delivers random numbers at a
> >> comparatively high rate. Any other noise source feeding into random.c
> >> delivers data with far less speed.
> >
> >Compiling the kernel with -O0 could add some other problems, like
>
> I thought with the given flag, I only compile the respective C file
> without optimizations, but not the entire kernel. Am I wrong here?
That's correct.
But your file is pretty long. Suppose someone now tries to use a static inline
function from one of the included headers which itself does some tests with
compiletime_assert. In some cases compiletime_assert generates a call to
an external function which does not exist in the kernel but will be folded away
if the compiler can check that the condition in the assert is always true. It
is expected that the linking process fails in case the assert could not be
shown valid.
So in case you add -O0 and thereby disable some optimizations, the compiler may
not eliminate those calls to the undefined functions and thus you could always
get a linking error.
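The mechanism described above can be sketched roughly as follows. This is a simplified illustration, not the kernel's compiletime_assert implementation: SKETCH_COMPILETIME_ASSERT, sketch_assert_failed, sketch_checked_shift and sketch_use are hypothetical names. In the kernel pattern the failure function is only declared and never defined; here it is defined and merely counts calls, so that the example links at any optimization level.

```c
/* In the kernel pattern this function would be declared but deliberately
 * left undefined, so that any call surviving into the object file breaks
 * the link. Assumption for this sketch: define it and count calls. */
static int sketch_assert_calls;

static void sketch_assert_failed(void)
{
	sketch_assert_calls++;
}

#define SKETCH_COMPILETIME_ASSERT(cond) \
	do { if (!(cond)) sketch_assert_failed(); } while (0)

/* A header-style static inline helper that checks its argument. */
static inline unsigned int sketch_checked_shift(unsigned int bits)
{
	/* Provable only after inlining and constant folding. */
	SKETCH_COMPILETIME_ASSERT(bits < 32);
	return 1U << bits;
}

static unsigned int sketch_use(void)
{
	/* With optimization, the call is inlined, `3 < 32` is folded and
	 * the branch with the call to sketch_assert_failed disappears.
	 * At -O0 the helper is not inlined, so the call instruction stays
	 * in the object file even though it is never executed. */
	return sketch_checked_shift(3);
}
```

At run time the assert function is never called in either case. The difference is only whether a reference to it remains in the object file, which is what would turn into a link error if the function were left undefined.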
If you need to compile something with -O0, try to just put the non-optimized
code into a separate file and try to eliminate as many dependencies and
headers as possible.
> >e.g. not doing enough constant folding which could result in linking
> >errors. I guess it is not a problem currently though, but some of the
> >compile time checks depend on this (compiletime_assert and such).
>
> How do you think that my folding code can cause linking errors?
I was thinking of the compiler not doing enough constant folding, not
you. ;) The kernel is known to not always compile with optimization
levels below -O2.
Greetings,
Hannes
On 02/04/2014 09:08 AM, Theodore Ts'o wrote:
> I really wish we could get someone inside Intel who has deep knowledge
> about CPU internals to render an opinion about this. My reaction to
> "I can't explain where the entropy is coming from" seems very similar
> to what my home grown attempts to create an encryption algorithm when I
> was much younger and much more foolish --- "it must be secure because
> I can't break it".
I think I have deep enough knowledge about CPU architectures in general
(as opposed to specific Intel designs, which I wouldn't be able to
comment on anyway) to comment. The more modern and high performance a
design you have the more sources of unpredictability there are.
However, there are very few, if any, (unintentional) sources of actual
quantum noise in a synchronous CPU, which means that this is at its core
a PRNG albeit with a large and rather obfuscated state space.
The quantum noise sources there are in a system are generally two
independent clocks running against each other. However, independent
clocks are rare; instead, most clocks are in fact slaved against each
other using PLLs and similar structures. When mixing spread spectrum
clocks and non-spread-spectrum clocks that relationship can be very
complex, but at least for some designs it is still at its core predictable.
The most damning thing in my view is that the CPUs that need it the most
-- small, embedded machines without high resolution clocks and with few
sources of I/O noise -- are also the simplest designs and therefore are
the least likely ones to have any actual entropy coming out of this.
As mentioned, I definitely have no objection to these sorts of things as
zero-credit entropy sources -- they cannot, by definition, do harm,
unless they somehow cancel other inputs out -- but the notion of making
them creditable sources makes me skeptical in the extreme.
> I will note that there are parts of
>
>> [2] http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
>
> which don't really add much to the discussion, but instead just simply
> make an expert question how deep the analysis has gone. Measuring the
> statistical tests of the entropy pool is a complete waste of time ---
> and in general, using things like "dieharder" don't do anything to
> increase one's confidence (and could decrease one's confidence if it
> makes it appear too much like a snake oil sales document). Sure,
> passing dieharder is necessary, but it isn't even vaguely close to
> sufficient.
This definitely doesn't help one iota.
> Modulo questions of how much CPU overhead it has, I wouldn't have an
> objection to adding additional sources into the entropy pool, such as
> what Joern has suggested. It's when there is a proposal to give such
> output entropy credit that I start to get queasy. (But then again,
> since most applications uses /dev/urandom, the question of entropy
> credit isn't that important for many use cases.)
Entropy credit mostly matters for rngd backpressure, although I do
believe that for things like generating passwords or persistent keys at
least one should use /dev/random.
-hpa
On Tue, Feb 04, 2014 at 11:06:04AM -0800, H. Peter Anvin wrote:
>
> The quantum noise sources there are in a system are generally two
> independent clocks running against each other. However, independent
> clocks are rare; instead, most clocks are in fact slaved against each
> other using PLLs and similar structures.
One of the things that would be useful for us to understand is in
general, where in a system we have independent clocks. For example, I
think (correct me if I'm wrong), a 2.5" or 3.5" HDD has its own clock
which is separate from the CPU/chipset. That is actually how and
where we get any entropy; I am not at all convinced that we are
getting any variation from "chaotic air turbulence in the HDD" ---
that paper was published in 1994, and hard drive technologies have
changed quite a bit since then, with extra layers of caching, track
buffers, etc.
However, where a decade ago the ethernet card probably had its own
independent clock crystal/oscillator, I'm going to guess that these
days with SOC's and even on laptops, with ethernet device part of the
chipset, it is probably being driven off the same master oscillator.
I wonder if there's any way we can either figure out manually, or
preferably, automatically at boot time, which devices actually have
independent clock oscillators.
- Ted
On Tue, Feb 4, 2014 at 8:23 PM, <[email protected]> wrote:
> However, where a decade ago the ethernet card probably had its own
> independent clock crystal/oscillator, I'm going to guess that these
> days with SOC's and even on laptops, with ethernet device part of the
> chipset, it is probably being driven off the same master oscillator.
USB typically still has its own crystal.
> I wonder if there's anyway we can either figure out manually, or
> preferably, automatically at boot time, which devices actually have
> independent clock oscillators.
You may find this information in the DT on some platforms (if you're
lucky).
Gr{oetje,eeting}s,
Geert
On Tuesday, 4 February 2014, 12:08:23, Theodore Ts'o wrote:
Hi Theodore,
>
>> [2] http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
>
>which don't really add much to the discussion, but instead just simply
>make an expert question how deep the analysis has gone. Measuring the
>statistical tests of the entropy pool is a complete waste of time ---
>and in general, using things like "dieharder" don't do anything to
>increase one's confidence (and could decrease one's confidence if it
>makes it appear too much like a snake oil sales document). Sure,
>passing dieharder is necessary, but it isn't even vaguely close to
>sufficient.
>
I am a bit surprised by this statement because I use statistical tests
only as a necessary baseline. After that, the hard work started on the
actual noise sources by measuring the actual noise coming out of them.
None of these tests has anything to do with the statistical tests of
dieharder & Co.
That is the sole reason why I looked into testing the timing variations
on bare metal, independently for each noise source. Also, the
measurements of the basic noise on a large array of different CPUs,
which all show similar behavior, help my case.
So, when only looking at the statistical tests, the majority of the work
is not considered.
Ciao
Stephan
On Tuesday, 4 February 2014, 11:06:04, H. Peter Anvin wrote:
Hi Peter,
>On 02/04/2014 09:08 AM, Theodore Ts'o wrote:
>> I really wish we could get someone inside Intel who has deep
>> knowledge
>> about CPU internals to render an opinion about this. My reaction to
>> "I can't explain where the entropy is coming from" seems very similar
>> to what my home grown attempts to create an encryption algorithm when
>> I
>> was much younger and much more foolish --- "it must be secure because
>> I can't break it".
>
>I think I have deep enough knowledge about CPU architectures in general
>(as opposed to specific Intel designs, which I wouldn't be able to
>comment on anyway) to comment. The more modern and high performance a
>design you have the more sources of unpredictability there are.
>However, there are very few, if any, (unintentional) sources of actual
>quantum noise in a synchronous CPU, which means that this is at its
>core a PRNG albeit with a large and rather obfuscated state space.
>
>The quantum noise sources there are in a system are generally two
>independent clocks running against each other. However, independent
>clocks are rare; instead, most clocks are in fact slaved against each
>other using PLLs and similar structures. When mixing spread spectrum
>clocks and non-spread-spectrum clocks that relationship can be very
>complex, but at least for some designs it is still at its core
>predictable.
But isn't there an additional clock? The clock used to drive the cache
and memory bus? When measuring memory access timings, larger
variations in the execution time are evident. This also applies when
hitting the caches (the variations are smaller for L1 than for L2, and
smaller for L2 than for L3). The variations in access timings would come
from the CPU wait states and their duration, would they not?
Ciao
Stephan
On 02/04/2014 11:39 AM, Geert Uytterhoeven wrote:
> On Tue, Feb 4, 2014 at 8:23 PM, <[email protected]> wrote:
>> However, where a decade ago the ethernet card probably had its own
>> independent clock crystal/oscillator, I'm going to guess that these
>> days with SOC's and even on laptops, with ethernet device part of the
>> chipset, it is probably being driven off the same master oscillator.
>
> USB typically still has its own crystal.
USB and the Ethernet PHY frequently do still have their own crystals,
for reasons not entirely clear to me. However, what all of these have
in common is that they are way out in the periphery.
>> I wonder if there's anyway we can either figure out manually, or
>> preferably, automatically at boot time, which devices actually have
>> independent clock oscillators.
>
> You may find this information in the DT on some platforms (if you're
> lucky).
On most systems today, all the high speed clocks (CPU, memory, etc.) are
all fed from a single oscillator. On PCs there used to be a separate
14.31818 MHz oscillator for the PIT, PMTIMER and HPET, but that is
increasingly handled by a frequency converter from the main bus clock.
Oscillators are expensive, and true asynchronous domains cause problems
with metastability.
-hpa
On 02/04/2014 12:31 PM, Stephan Mueller wrote:
>>
>> The quantum noise sources there are in a system are generally two
>> independent clocks running against each other. However, independent
>> clocks are rare; instead, most clocks are in fact slaved against each
>> other using PLLs and similar structures. When mixing spread spectrum
>> clocks and non-spread-spectrum clocks that relationship can be very
>> complex, but at least for some designs it is still at its core
>> predictable.
>
> But isn't there an additional clock? The clock used to drive the cache
> and memory bus? When measuring memory accesses timings, larger
> variations in the execution time are evident. This also applies when
> hitting the caches (for L1, the variations are less than for L2 than for
> L3). The variations in access timings would come from the CPU wait
> states and their duration, would it not?
>
Variations don't mean quantum unpredictable noise. All the clocks you
are referring to are derived from the same BCLK and thus predictable.
What you have here is a PRNG with a large and obscure state space.
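A toy calculation may make the distinction between variation and unpredictability concrete. The sketch below is purely illustrative and not a model of any real hardware: it assumes two clocks derived from one oscillator with a fixed rational frequency ratio num/den, and sketch_ticks_in_interval is a hypothetical helper that counts how many fast-clock ticks fall into the k-th slow-clock interval.

```c
#include <stdint.h>

/* Toy model (assumption): the fast clock runs at exactly num/den times
 * the rate of the slow clock, and both start in phase. Returns the
 * number of fast ticks that fall into slow-clock interval k. */
static uint64_t sketch_ticks_in_interval(uint64_t k, uint64_t num,
					 uint64_t den)
{
	return ((k + 1) * num) / den - (k * num) / den;
}
```

With num = 10 and den = 3, the counts for k = 0, 1, 2, 3, ... are 3, 3, 4, 3, 3, 4, ...: the measured values vary from one interval to the next, yet the sequence repeats with period den and is fully determined by the ratio and the starting phase.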
-hpa
On Tue, Feb 4, 2014 at 9:31 PM, Stephan Mueller <[email protected]> wrote:
>>On 02/04/2014 09:08 AM, Theodore Ts'o wrote:
>>> I really wish we could get someone inside Intel who has deep
>>> knowledge
>>> about CPU internals to render an opinion about this. My reaction to
>>> "I can't explain where the entropy is coming from" seems very similar
>>> to what my home grown attempts to create an encryption algorithm when
>>> I
>>> was much younger and much more foolish --- "it must be secure because
>>> I can't break it".
>>
>>I think I have deep enough knowledge about CPU architectures in general
>>(as opposed to specific Intel designs, which I wouldn't be able to
>>comment on anyway) to comment. The more modern and high performance a
>>design you have the more sources of unpredictability there are.
>>However, there are very few, if any, (unintentional) sources of actual
>>quantum noise in a synchronous CPU, which means that this is at its
>>core a PRNG albeit with a large and rather obfuscated state space.
>>
>>The quantum noise sources there are in a system are generally two
>>independent clocks running against each other. However, independent
>>clocks are rare; instead, most clocks are in fact slaved against each
>>other using PLLs and similar structures. When mixing spread spectrum
>>clocks and non-spread-spectrum clocks that relationship can be very
>>complex, but at least for some designs it is still at its core
>>predictable.
>
> But isn't there an additional clock? The clock used to drive the cache
> and memory bus? When measuring memory access timings, larger
> variations in the execution time are evident. This also applies when
> hitting the caches (the variations are smaller for L1 than for L2, and
> smaller for L2 than for L3). The variations in access timings would
> come from the CPU wait states and their duration, would it not?
CPU, cache, and memory bus clocks are usually derived from the same
crystal. Hence they're not independent.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Tue, Feb 4, 2014 at 9:39 PM, H. Peter Anvin <[email protected]> wrote:
> On 02/04/2014 11:39 AM, Geert Uytterhoeven wrote:
>> On Tue, Feb 4, 2014 at 8:23 PM, <[email protected]> wrote:
>>> However, where a decade ago the Ethernet card probably had its own
>>> independent clock crystal/oscillator, I'm going to guess that these
>>> days with SoCs and even on laptops, with the Ethernet device part of
>>> the chipset, it is probably being driven off the same master oscillator.
>>
>> USB typically still has its own crystal.
>
> USB and the Ethernet PHY frequently do still have their own crystals,
> for reasons not entirely clear to me. However, what all of these have
> in common is that they are way out in the periphery.
Because they're fixed frequency, and used for communication with other
devices, so accuracy matters?
Other clocks can be tuned for performance or power reasons, but clocks
for communication must be fixed and stable. You can run e.g. your CPU
or memory a bit slower or faster, but not your Ethernet.
>>> I wonder if there's any way we can either figure out manually, or
>>> preferably, automatically at boot time, which devices actually have
>>> independent clock oscillators.
>>
>> You may find this information in the DT on some platforms (if you're
>> lucky).
>
> On most systems today, the high-speed clocks (CPU, memory, etc.) are
> all fed from a single oscillator.
Indeed.
Gr{oetje,eeting}s,
Geert
On 02/04/2014 01:46 PM, Geert Uytterhoeven wrote:
>
> Because they're fixed frequency, and used for communication with other
> devices, so accuracy matters?
>
> Other clocks can be tuned for performance or power reasons, but clocks
> for communication must be fixed and stable. You can run e.g. your CPU
> or memory a bit slower or faster, but not your Ethernet.
>
But modern frequency synthesizers can do that easily.
-hpa
On Tue, 4 February 2014 12:39:28 -0800, H. Peter Anvin wrote:
>
> USB and the Ethernet PHY frequently do still have their own crystals,
> for reasons not entirely clear to me. However, what all of these have
> in common is that they are way out in the periphery.
Storage might be another source. We have had add_disk_randomness()
forever. Flash also takes quite variable timings for writes and
erases. Even if the timings are not random, they certainly change
from block to block and depending on wear.
I am less certain about reads. But one can run a few experiments and
see how consistent the timings are.
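
A minimal sketch of such an experiment (not part of the original message;
the function names, block size, and read count are arbitrary choices) could
time a series of reads and summarize the spread. Note the assumption that
matters most: reads of a freshly written file are normally served from the
page cache, so measuring the storage device itself would need a cold cache
or direct I/O, which this sketch does not attempt.

```python
import os
import statistics
import tempfile
import time


def time_reads(path, block_size=4096, count=256):
    """Time `count` sequential reads of `block_size` bytes; return ns per read."""
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(count):
            start = time.perf_counter_ns()
            data = os.read(fd, block_size)
            latencies.append(time.perf_counter_ns() - start)
            if len(data) < block_size:
                os.lseek(fd, 0, os.SEEK_SET)  # wrap around at end of file
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Reduce raw latencies to a few statistics that show their consistency."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
        "stdev": statistics.pstdev(latencies),
    }


def demo():
    """Run the measurement against a 1 MiB scratch file (likely page-cached)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 20))
        path = tmp.name
    try:
        return summarize(time_reads(path))
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

A wide spread in these numbers would show only that the timings vary, not
that they are unpredictable; that distinction is the one debated earlier in
this thread.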
Jörn
--
Linux is more the core point of a concept that surrounds "open source"
which, in turn, is based on a false concept. This concept is that
people actually want to look at source code.
-- Rob Enderle