iPXE discussion forum

Full Version: iPXE EFI - no network after the DHCP sequence

We are experiencing multiple issues while booting with iPXE for UEFI.

We have compiled it as "snp.efi" (app) and "snp.efidrv" (driver).

When booting with iPXE included as an EFI driver in our firmware, the Intel NIC is detected, the DHCP sequence completes, and the client receives its IP parameters. However, it then fails to continue (the HTTP download times out). At this point, we noticed that the client and the server cannot ping each other (even though the client has received the DHCPACK), and the client sends many ARP who-has requests to resolve the server's MAC address.

Here are the corresponding traces:

iPXE 1.0.0+ (827dd) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP TFTP EFI Menu

net0: 08:00:38:b0:90:ad using NII on NII-0000:02:00.0 (open)
[Link:up, TX:0 TXE:0 RX:2993 RXE:6]
[RXE: 4 x "Network unreachable (http://ipxe.org/28056090)"]
[RXE: 2 x "Operation not supported (http://ipxe.org/3c086083)"]
DHCP 0xa7e02308 entering discovery state
Configuring (net0 08:00:38:b0:90:ad)...DHCP 0xa7e02308 DHCPDISCOVER
DHCP 0xa7e02308 DHCPOFFER from for
..DHCP 0xa7e02308 entering request state
DHCP 0xa7e02308 DHCPREQUEST to for
DHCP 0xa7e02308 DHCPACK from for
net0: gw
Next server:
Filename: Operation canceled (http://ipxe.org/0b072095)

iPXE> ifstat
net0: 08:00:38:b0:90:ad using NII on NII-0000:02:00.0 (open)
[Link:up, TX:7 TXE:4 RX:4992 RXE:137]
[TXE: 4 x "Error 0x2a376089 (http://ipxe.org/2a376089)"]
[RXE: 56 x "Network unreachable (http://ipxe.org/28056090)"]
[RXE: 25 x "Operation not supported (http://ipxe.org/3c086083)"]
[RXE: 56 x "Invalid argument (http://ipxe.org/1c056082)"]
net1: 00:a0:c9:00:00:01 using NII on NII-0000:02:00.1 (closed)
[Link:down, TX:0 TXE:0 RX:0 RXE:0]
[Link status: Unknown (http://ipxe.org/1a086194)]
iPXE> ping
0 bytes from <none>: seq=1: Connection timed out (http://ipxe.org/4c1b2092)
0 bytes from <none>: seq=2: Connection timed out (http://ipxe.org/4c1b2092)
0 bytes from <none>: seq=3: Connection timed out (http://ipxe.org/4c1b2092)
Finished: Operation canceled (http://ipxe.org/0b072095)

As you can see, there are a few network errors, but they don't seem to be the root cause of this issue: the same setup works well with the snp.efi application when it is launched from an EFI shell!
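To compare the error counters between a failing run (snp.efidrv) and a working one (snp.efi), the ifstat output can be tallied with a short script. This is only a sketch: the helper name `tally_errors` is made up for illustration, and the regular expression assumes the error lines always look like the ones in the trace above.

```python
import re
from collections import Counter

# Matches lines such as:
#   [RXE: 56 x "Network unreachable (http://ipxe.org/28056090)"]
# (format assumed from the ifstat trace in this thread)
ERROR_LINE = re.compile(
    r'\[(?P<kind>TXE|RXE): (?P<count>\d+) x '
    r'"(?P<msg>.*) \(http://ipxe\.org/(?P<code>[0-9a-f]+)\)"\]'
)


def tally_errors(ifstat_text):
    """Return a Counter keyed by (kind, error code, message)."""
    totals = Counter()
    for line in ifstat_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            key = (match["kind"], match["code"], match["msg"])
            totals[key] += int(match["count"])
    return totals


# net0 section of the ifstat output posted above
SAMPLE = '''net0: 08:00:38:b0:90:ad using NII on NII-0000:02:00.0 (open)
[Link:up, TX:7 TXE:4 RX:4992 RXE:137]
[TXE: 4 x "Error 0x2a376089 (http://ipxe.org/2a376089)"]
[RXE: 56 x "Network unreachable (http://ipxe.org/28056090)"]
[RXE: 25 x "Operation not supported (http://ipxe.org/3c086083)"]
[RXE: 56 x "Invalid argument (http://ipxe.org/1c056082)"]'''

if __name__ == "__main__":
    for (kind, code, msg), count in sorted(tally_errors(SAMPLE).items()):
        print(f"{kind} {code} {count:4d}  {msg}")
```

For the trace above, the three RXE entries add up to 137, which matches the RXE counter on the summary line, so every receive error is accounted for by one of the listed error codes.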

Does anyone have an idea about this issue (or about what we are doing wrong)?

We have a second issue, which is much less important at this stage because we have a workaround. Even with the snp.efi application, we need to add a 10-second 'sleep' after retrieving the kernel and the initrd to avoid a kernel crash within the first second of boot (while enumerating the cores [smpboot]).

Many thanks for any help. Please let me know if more traces might help.

If you must include iPXE in your firmware, then you should build the EFI image that is specific to your NIC, and not use snp.
Let me try to explain:

snponly.efi has support for the SNP/NII driver framework and relies on the existing firmware driver to work. If you add it to your firmware, you would normally replace the existing NIC firmware driver. If you include both, things will get weird.

Instead, you will want to use the iPXE native driver for your NIC (if there is one). To help further, we would need more information about your hardware, starting with the PCI device ID (8086:xxxx). You can get it in Linux with lspci -nn.

Otherwise, my suggestion is to use the normal firmware PXE support and chain into snponly.efi or ipxe.efi.

Many thanks for your reply.

Until recently, we were able to boot our nodes by chainloading "snp.efi", but now the devices show up as unusable (link reported as down). We also tried snponly.efi and ipxe.efi, but no interfaces were detected.

I tried chainloading a new build, "80861523.efi", corresponding to the Intel i350 NIC, and that works fine!
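For reference, the image name 80861523.efi is the PCI vendor ID (8086) followed by the device ID (1523), which is how iPXE names its per-device build targets. A rough Python sketch of deriving that name from `lspci -nn` output follows. The helper name `ipxe_image_name` and the sample line are illustrative assumptions, not output captured from this machine.

```python
import re

# `lspci -nn` prints numeric IDs in brackets, e.g. "[8086:1523]".
# The class code (e.g. "[0200]") has no colon, so it does not match.
PCI_ID = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def ipxe_image_name(lspci_line, suffix="efi"):
    """Build '<vendor><device>.<suffix>' from one line of `lspci -nn` output."""
    ids = PCI_ID.findall(lspci_line)
    if not ids:
        raise ValueError("no [vendor:device] pair found in line")
    vendor, device = ids[-1]  # the vendor:device pair comes last on the line
    return f"{vendor}{device}.{suffix}".lower()


# Hypothetical sample line, for illustration only
SAMPLE_LINE = (
    "02:00.0 Ethernet controller [0200]: Intel Corporation "
    "I350 Gigabit Backplane Connection [8086:1523] (rev 01)"
)

if __name__ == "__main__":
    print(ipxe_image_name(SAMPLE_LINE))            # application image name
    print(ipxe_image_name(SAMPLE_LINE, "efidrv"))  # driver image name
```

The resulting name would then be used as the build target under the appropriate EFI platform directory of the iPXE source tree.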

At that point, we had not yet removed the native Intel EFI driver included in our firmware. After unloading it (through the EFI shell), that works too! We do need to integrate iPXE, for several reasons:
- we would like to properly avoid a second DHCP sequence
- we want to play with the crypto features (for authenticated clients with HTTPS)
and we love the features provided by iPXE, which is a good reason too!

Thank you

Finally, after several tests, we noticed that the EFI driver 80861523.efidrv (for the i350 backplane) works only 50% of the time. We followed your advice and removed the PowerVille/I350 GbE EFI PXE driver to avoid conflicts.

We noticed various errors when it doesn't work:
- sometimes we observe the same behavior as with snp.efidrv: DHCP seems to work fine, but the network stack remains unusable (20%).
- the firmware freezes while loading the iPXE driver, which is loaded last, just before entering the EFI shell (30%).
- the firmware freezes when iPXE tries to "initialize the devices" (40%).
- sometimes we get an iPXE error, http://ipxe.org/err/2a6540, which points to drivers/net/intel.c (10%).

The EFI application works fine when we launch it from the ESP, and also when we chainload it from PXE. Once (in 30 tests), our firmware froze just after iPXE was downloaded over PXE (DXE_ASSERT...).

Please let me know what traces might help to debug.
Thank you in advance for any help.

We had a similar problem. The solution was to enable PortFast on the switch port our servers were connected to.

We lost a lot of time on that. I suspect the root cause is the Intel driver for the i350, but enabling PortFast on the switch was the easy fix after a few days of debugging...

(2016-12-16 21:43) methodmath Wrote:
We had a similar problem. The solution was to enable PortFast on the switch port our servers were connected to.

We lost a lot of time on that. I suspect the root cause is the Intel driver for the i350, but enabling PortFast on the switch was the easy fix after a few days of debugging...

I'm gonna assume you are using DHCP...
Yes, missing PortFast (i.e. STP enabled on the port) is a common reason for not being able to use iPXE, because the DHCP requests in iPXE time out (the timeout in iPXE is lower than the standard one).

In this case, the DHCP request worked just fine for cbn38 and only then did things die, so it does not seem to be the same issue.

I also guess that you got an error message when the DHCP request failed, and that the message contained a URL like http://ipxe.org/err/4c1060, which points to a page where PortFast is mentioned.