iPXE discussion forum
iPXE initializing devices - takes 30 seconds

+--- Thread: iPXE initializing devices - takes 30 seconds (/showthread.php?tid=31540)



iPXE initializing devices - takes 30 seconds - frr - 2020-10-01 20:31

Dear everybody,

I've been using iPXE for a couple of months to PXE-boot various UEFI-only PCs on the LAN in our HW service workshop. iPXE is pretty cool for that job.

There's a quirk that I've been watching for some time: on most motherboards that managed to boot, the boot process would stall at "iPXE initializing devices" for about half a minute. Today I've found some time to take a dive under the hood, and owing to iPXE's excellent debugging capabilities, I've been able to narrow down the problem to the following place:

In file src/crypto/drbg.c, in function drbg_instantiate(), there's a call to get_entropy_input(), and it's this call that takes 25+ seconds to return.

This can be traced further to src/include/ipxe/entropy.h, a header file that defines the inline function get_entropy_input(). I haven't inspected all the guts yet, but chances are that this ultimately calls a function named get_entropy_input_tmp(), defined in src/crypto/entropy.c.

The point of this whole show appears to be: to seed a random number generator with enough initial entropy. And for some reason, it takes a hideous amount of time.

I wonder how many of the people who complain about the boot process freezing at "iPXE initializing devices" are simply not patient enough to wait 30 seconds and see whether the machine finally boots. Without debugging compiled in, the last message on the screen refers to "devices", so you would naturally take it to mean network devices and blame the NIC on your motherboard.

That 30-second pause is a disgrace.
What's this crypto stuff useful for, anyway? UEFI? Or some networking? I don't need HTTPS. Does NFS need crypto? Would the problem be solved if I just left out the config options that pull crypto support in?

For the record, for people coming after me, I'd like to mention the debugging options that led me to this conclusion. As I also need to embed a tiny script in the bootable iPXE binary (to load the menu script from a separate file), I have created the following script in /tftpboot/ipxe/embed_script.sh :

Code:
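#!/bin/sh
# Rebuild ipxe.efi with the embedded script and debug messages enabled,
# then copy the result back to the directory this script was started from.
set -e

CWDIR=$(pwd)

cd /usr/src/ipxe/src
rm -f bin-x86_64-efi/ipxe.efi
make bin-x86_64-efi/ipxe.efi EMBED="$CWDIR/embed.ipxe" DEBUG=init:3,rbg,drbg
cp bin-x86_64-efi/ipxe.efi "$CWDIR/"

cd "$CWDIR"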

And my embed.ipxe looks like this:

Code:
#!ipxe

dhcp
chain tftp://192.168.100.200/ipxe/menu.ipxe

I've actually added a few more DBGC debug messages to crypto/drbg.c to see which call takes time to return.
The crypto/entropy.c does not seem to contain any DBGC messages, so I did not bother to investigate further... but I sure can try if desired. The DBGC macro declared in compiler.h is likely available even in entropy.c.

Admittedly, most of the machines that I've ever tried to boot via iPXE+UEFI were ATOM based. ATOM is just such a popular platform for industrial PC applications, and the latest generations are not nearly as sluggish as their 45nm/32nm predecessors. ATOM from Bay Trail onwards is my favourite CPU family, and apparently I'm not alone :-)

Thanks for providing iPXE free of charge. If there's something I can do to debug this problem, please let me know.

Frank


RE: iPXE initializing devices - takes 30 seconds - frr - 2020-10-09 13:19

For the record: I've had the chance to boot iPXE on a Coffee Lake machine and the delay didn't happen. It just ran straight through without pausing for breath.

Back to ATOM-based hardware:
After some further thought and observations, I've noticed that I can only see debug messages from drbg_instantiate() and drbg_uninstantiate(), but none from drbg_generate() nor drbg_reseed(). I've tried grepping drbg_generate from the whole source code of ipxe, found some relationship to hmac_drbg, but apparently no further links? Just that indirect bootstrap via struct startup_fn.

Thus, although I feel like I'm committing a horrible sin, I went on to search for the easiest way to zap that crypto thing. And the lowest hanging fruit appears to be:
In src/crypto/drbg.c::drbg_instantiate(), at line 446, comment out the call to get_entropy_input() (two lines, actually)
and replace it with
Code:
len = min_len; /* pretend the minimum amount of entropy was gathered */
( void ) entropy_bits; /* muffle -Werror=unused-variable */

After that, the delay during iPXE boot on an ATOM is gone. I feel bad, but apparently this "solved" the problem for my use case, and I can see no ill effects.

My hack is ugly, but for me it's easier than finding the precise borderline between the "dead crypto tissue" and the source code that is actually needed, followed by a clean surgical removal. I haven't found a config option that would do this either. I wish someone would come and set me straight on all my misguided observations and judgements.


RE: iPXE initializing devices - takes 30 seconds - simon - 2020-10-13 14:08

This is the slightly less horrible solution I've been using for a while now:

Code:
diff --git a/src/interface/efi/efi_entropy.c b/src/interface/efi/efi_entropy.c
index dca0b6f1..f6e82e2c 100644
--- a/src/interface/efi/efi_entropy.c
+++ b/src/interface/efi/efi_entropy.c
@@ -30,6 +30,11 @@ FILE_LICENCE ( GPL2_OR_LATER_OR_UBDL );
#include <ipxe/efi/efi.h>
#include <ipxe/efi/Protocol/Rng.h>

+#if defined(__i386) || defined(__x86_64)
+#include <ipxe/cpuid.h>
+static char have_hwrnd = 0;
+#endif
+
/** @file
  *
  * EFI entropy source
@@ -141,6 +146,44 @@ static int efi_entropy_tick ( void ) {
        return low;
}

+/**
+ * Get noise sample from rdrand
+ *
+ * @ret noise          Noise sample
+ * @ret rc             Return status code
+ */
+static int efi_get_noise_rdrand ( noise_sample_t *noise ) {
+#if defined(__i386) || defined(__x86_64)
+       if ( have_hwrnd == 0 ) {
+               struct x86_features features;
+               x86_features( &features );
+               if ( features.intel.ecx & ( 1 << 30 ) ) {
+                       have_hwrnd = 1;
+               } else {
+                       have_hwrnd = 2;
+               }
+               DBGC( &tick, "Have RDRAND: %s\n", ( have_hwrnd == 1 ? "YES!" : "NO :-(" ) );
+       }
+       if ( have_hwrnd == 1 ) {
+               int ret, retries = 10;
+               char ok;
+               while ( --retries > 0 ) {
+                       __asm__ volatile ( "rdrand %0; setc %1" : "=r" ( ret ), "=qm" ( ok ) );
+                       if ( ok ) {
+                               if ( ret == -1 ) {
+                                       /* Assume this is a broken AMD CPU, fall back to TSC */
+                                       ret = profile_timestamp();
+                               }
+                               *noise = ret;
+                               return 0;
+                       }
+               }
+               return -EBUSY;
+       }
+#endif
+       return -ENOTSUP;
+}
+
/**
  * Get noise sample from timer ticks
  *
@@ -229,6 +272,7 @@ static int efi_get_noise ( noise_sample_t *noise ) {

        /* Try RNG first, falling back to timer ticks */
        if ( ( ( rc = efi_get_noise_rng ( noise ) ) != 0 ) &&
+            ( ( rc = efi_get_noise_rdrand ( noise ) ) != 0 ) &&
             ( ( rc = efi_get_noise_ticks ( noise ) ) != 0 ) )
                return rc;