2018-07-02, 12:49
I have the following situation:
- a brand new computer with 4 ethernet ports all the same
- a router that adds relay agent information (DHCP option 82)
- a server that is set up to give me the right image depending on which router port the computer is connected to (this is where DHCP option 82 is used)
The problem is that iPXE sends a DHCP discover whose Ethernet frame has 13 extra bytes of zero padding after the IPv4 packet containing the DHCP discover. The router adds its relay agent information after the padding and adjusts the IPv4 header to accommodate this by adding the number of bytes it appended to the total length in the IPv4 header. This results in truncation of the relay agent information in the Ethernet frame, which was invalid anyway because the router placed it after the DHCP END option in the DHCP discover request. As a result, no IP address is assigned by the server and booting fails.
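To illustrate why relay agent information placed after the END option is useless to the server, here is a minimal sketch of a DHCP options walker in Python. It models only the ordering problem (not the truncation), and the option values, the 13 zero bytes and the circuit ID `p3` are made-up illustration data, not bytes from my capture.

```python
PAD, END, RELAY_AGENT_INFO = 0, 255, 82

def parse_dhcp_options(data: bytes) -> dict[int, bytes]:
    """Walk a DHCP options field (the part after the magic cookie) until END."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == END:
            break          # everything after END is ignored
        if code == PAD:
            i += 1         # PAD is a single byte with no length field
            continue
        if i + 1 >= len(data):
            break          # truncated: no length byte available
        length = data[i + 1]
        options[code] = data[i + 2 : i + 2 + length]
        i += 2 + length
    return options

# Option 53 (message type) = 1 (DISCOVER), then END, then 13 zero bytes
# standing in for the trailing padding (illustrative values only).
discover = bytes([53, 1, 1, END]) + bytes(13)

# Hypothetical option 82 with sub-option 1 (circuit ID) = b"p3",
# appended after the padding as the router does in my case.
relayed = discover + bytes([RELAY_AGENT_INFO, 4, 1, 2]) + b"p3"

seen = parse_dhcp_options(relayed)
print(RELAY_AGENT_INFO in seen)  # False: the parser stopped at END
```

A server parsing the options this way never sees option 82, so it cannot tell which router port the request came from.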
I have traced the fault back to:
DHCP packet length: 347 bytes
IPv4 packet length: 375 bytes (347 + 28 bytes of IPv4 and UDP headers: OK)
Ethernet frame length: 403 bytes (expected 375 + 14 = 389: NOK)
And from enabling and adding some debugging statements in iPXE:
DHCP packet length: 347: OK
IPv4 length: 375: OK
NETDEV length: 185 (hex) = 389 (decimal): OK, assuming that netdevice.c should add the link layer header.
NII: cpb.mediaheaderlength = 14, cpb.datalength = 389 (14+389=403: NOK!)
It appears that the NII driver, whose implementation is in the UEFI BIOS, adds the length of the link-layer header by itself.
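The length accounting above can be written out as a small Python sketch. The `wire_length` function is a hypothetical model of what I think the firmware does (transmit media header length plus data length); it is not iPXE or UNDI code, and the "expected" case assumes the data length is meant to exclude the media header.

```python
ETH_HEADER_LEN = 14        # cpb.mediaheaderlength from the debug output
IP_UDP_HEADER_LEN = 28     # 20 bytes IPv4 + 8 bytes UDP

dhcp_len = 347
ipv4_len = dhcp_len + IP_UDP_HEADER_LEN     # 375
netdev_len = ipv4_len + ETH_HEADER_LEN      # 389 == 0x185

def wire_length(media_header_len: int, data_len: int) -> int:
    """Hypothetical model: the transmitter sends header + data bytes."""
    return media_header_len + data_len

# What I observe: data length already includes the link-layer header.
observed = wire_length(ETH_HEADER_LEN, netdev_len)

# What I would expect if data length excluded the link-layer header.
expected = wire_length(ETH_HEADER_LEN, netdev_len - ETH_HEADER_LEN)

print(hex(netdev_len), observed, expected, observed - expected)
```

Under this model the surplus is exactly one link-layer header (14 bytes), which matches the difference between the 403-byte frame on the wire and the 389 bytes iPXE hands to the driver.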
Unfortunately, I am kinda forced to use iPXE on a USB boot stick for booting:
- the UEFI BIOS does not allow me to single out a specific network port to do network booting from; the other three network ports are not allowed to send out DHCP requests.
- there is a strict requirement that the machine should keep attempting to boot indefinitely until the router, the server and the image to apply are all available, and the BIOS PXE downloader gives up after a minute.
An embedded iPXE script works wonders to get this functionality going (at least it did on other computers using legacy PXE booting).
How and where do I fix this without mucking up the later DHCP cycles and TFTP transfers?