I found iPXE because I needed a PXE client that supported VLAN tagging. I've been using it successfully for months, but just recently it stopped completing DHCP exchanges. I did a Wireshark capture of the DHCP exchange between the client and the DHCP server (which also serves pxelinux.0 later), and it shows a full DHCP exchange, but with a ton of repeated packets from the diskless client. He sends around 10 DHCP discovers before he realizes that he's been getting DHCP offers, and then he sends another 10 DHCP requests, receiving DHCP ACKs all the while. The problem is that the DHCP request in iPXE eventually times out. I guess the client is just ignoring the ACKs that I'm seeing it receive (I'm mirroring its port on its switch, which is how I'm able to monitor the exchange).
I'm not a complete expert on DHCP. Is there some sort of timeout for each request, where if he doesn't receive an OFFER after his DISCOVER he sends another DISCOVER and ignores any OFFERs that eventually come in?
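For reference on the retry question: RFC 2131 (section 4.1) does describe a per-message retransmission timer with randomized exponential backoff, suggesting roughly 4 seconds before the first retransmission, doubling up to a cap of 64 seconds, each randomized by up to one second either way. The sketch below only illustrates that suggested schedule. The function name and parameters are made up for illustration, and iPXE uses its own timer values, which are not assumed here.

```python
import random


def retransmit_delays(attempts, first=4.0, cap=64.0, jitter=1.0, rng=None):
    """Return the delay before each retransmission, RFC 2131 style.

    attempts: how many retransmission delays to compute.
    first:    base delay before the first retransmission (seconds).
    cap:      maximum base delay (seconds).
    jitter:   each delay is randomized by +/- this many seconds.
    rng:      optional random.Random; if None, no jitter is applied.
    """
    delays = []
    base = first
    for _ in range(attempts):
        offset = rng.uniform(-jitter, jitter) if rng is not None else 0.0
        delays.append(base + offset)
        base = min(base * 2, cap)  # double the base delay up to the cap
    return delays


if __name__ == "__main__":
    # Deterministic schedule (no jitter), then one randomized schedule.
    print(retransmit_delays(6))
    print(retransmit_delays(6, rng=random.Random()))
```

A client with much shorter delays than these would show up in a capture as a burst of repeated DISCOVERs or REQUESTs, which is at least consistent with what the trace shows.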
I'm just really trying to figure out why none of the ACKs I clearly see him receive are taking, and I'm thinking maybe the ACKs are taking too long? The DHCP server works just fine for some other hardware I have (not using iPXE, just built-in Intel PXE).
Is there some way I can do a DHCP in verbose mode or something to get more information about how iPXE is processing the DHCP it gets?
I did a Wireshark capture of an Intel PXE boot loader that supports VLAN tagging and compared the packets with iPXE, and I noticed that different DHCP options were set, such as the Broadcast flag being set to Unicast on iPXE.
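For anyone comparing captures the same way: the Broadcast flag is the top bit (0x8000) of the 16-bit flags field, which sits at byte offset 10 of the BOOTP/DHCP message (after op, htype, hlen, hops, xid and secs). When it is set, the client is asking the server to broadcast its replies; when it is clear, unicast replies are acceptable. Here is a minimal sketch for checking or setting that bit on a raw DHCP message (the UDP payload, not the whole Ethernet frame). The helper names are made up for illustration and are not part of iPXE.

```python
import struct

BOOTP_FL_BROADCAST = 0x8000  # same value as the define in iPXE's dhcp.h
FLAGS_OFFSET = 10            # op(1) htype(1) hlen(1) hops(1) xid(4) secs(2)


def get_flags(payload: bytes) -> int:
    """Return the 16-bit BOOTP flags field from a raw DHCP message."""
    if len(payload) < FLAGS_OFFSET + 2:
        raise ValueError("payload too short to contain a BOOTP flags field")
    (flags,) = struct.unpack_from("!H", payload, FLAGS_OFFSET)
    return flags


def wants_broadcast(payload: bytes) -> bool:
    """True if the client asked for broadcast replies."""
    return bool(get_flags(payload) & BOOTP_FL_BROADCAST)


def with_broadcast_flag(payload: bytes) -> bytes:
    """Return a copy of the message with the broadcast bit set."""
    buf = bytearray(payload)
    struct.pack_into("!H", buf, FLAGS_OFFSET, get_flags(payload) | BOOTP_FL_BROADCAST)
    return bytes(buf)


if __name__ == "__main__":
    # First 12 bytes of a made-up BOOTREQUEST with flags = 0 (unicast).
    header = struct.pack("!BBBBIHH", 1, 1, 6, 0, 0x12345678, 0, 0)
    print(wants_broadcast(header))
    print(wants_broadcast(with_broadcast_flag(header)))
```

In Wireshark the same field appears as "Bootp flags", so this is only useful if you are post-processing exported payloads.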
I'm not sure what is going on here, since I don't use VLAN features at all, but if you provide the packet trace (create an attachment) someone else might be able to help you out.
After you have done the DHCP command and seen it time out, you should provide the full output from the ifstat command, to verify how many packets have been sent and received. It is also useful to specify how you load iPXE (chainloading, rom-burned, usb/cd/floppy-loaded) and which network card you're using (VENDOR+PCI ID). If you have multiple network cards in the machine, that could also be relevant.
Almost forgot; which version/commit of iPXE are you using?
(2011-05-12 09:25) robinsmidsrod wrote: I'm not sure what is going on here, since I don't use VLAN features at all, but if you provide the packet trace (create an attachment) someone else might be able to help you out.
Thanks for replying. I'll work on exporting the trace, though my company is a bit paranoid about this stuff so it could take some time. The most frustrating thing is that this exchange USED to work, so it might be some change in our network, but for the life of me I can't find any changes.
The first thing I did was disable VLAN tagging to make sure it wasn't messing anything up. I've used both chainloading and a boot CD in the past, and both have worked. As a test for hardware failure, I used the trusty boot CD that has always worked on another machine with identical hardware; it doesn't work there now either. I'm seeing tons of repeated DHCP packets from the client, and he's ignoring the ACKs I'm seeing him receive. He must eventually receive a DHCP OFFER from the server that he likes, because he does eventually send (10) REQUESTs. And once in a blue moon, like once every 25 reboots, he takes the IP. So once in a while, either he likes the ACK he's receiving or it comes fast enough (like I said, I have no idea whether DHCP requests time out, or how often).
The worst part is that the Intel PXE bootloader works just fine, does a perfect DHCP request (since I disabled VLAN tagging, I was able to test using the default Intel program).
The network card in question is an Intel 8086:105e. I was using a version from a couple of months ago (ipxe-HEAD-72d387e), but now I'm using the latest version of iPXE; I pulled the source and built it yesterday.
It's just so frustrating when something used to work, and then just doesn't, you know?
Is there some way I can control the details of how iPXE sends DHCP requests? The options it uses?
I think you might have been bitten by a bug. Everything seems to indicate it. I would suggest you try on the mailing list for more technical answers, or join us on channel #ipxe on irc.freenode.net to try and debug the issue in real-time.
Followup: Just did another test. I removed all VLANs and attempted DHCP with the Intel PXE loader that came with the hardware. It takes quite a long time but succeeds. I then restarted and did an identical test with iPXE from my boot CD. It got nothing, and iPXE timed out before the point at which the Intel loader started to succeed. I'm starting to think it has to do with how long iPXE waits for DHCP. Any idea how to tweak this?
The default DHCP timeout is 10 seconds. You can change it in src/include/ipxe/dhcp.h, line 613 and recompile.
(2011-05-16 13:20) robinsmidsrod wrote: The default DHCP timeout is 10 seconds. You can change it in src/include/ipxe/dhcp.h, line 613 and recompile.
Thanks for the response. I got the machine to complete a DHCP exchange by directly connecting it to the DHCP server. This can't be my end configuration though. I don't think the 10 seconds for the complete DHCP exchange is the issue. My current thought about why I'm having trouble is that the DHCPDISCOVER is timing out and retrying before the DHCPOFFER from the server has a chance to get there. So the offer always comes too late. I have no idea how to change how long the client waits for an offer before timing out and resending in a Linux distro, let alone in iPXE.
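To make the "offer always comes too late" theory concrete, here is a toy model of it. It assumes the server answers every DISCOVER after a fixed latency, and it compares two hypothetical client policies: one that keeps the same transaction ID (xid) across retries, so a late reply still matches, and one that picks a new xid on each retry, so a reply to an older packet gets dropped. The send times, the latency and the policy switch are all made up for illustration; this says nothing about what iPXE actually does.

```python
def first_usable_reply(send_times, reply_latency, deadline, new_xid_per_retry):
    """Return the arrival time of the first reply the client would accept.

    send_times:        times (seconds) at which the client transmits.
    reply_latency:     fixed delay between a transmit and its reply arriving.
    deadline:          overall time at which the client gives up.
    new_xid_per_retry: if True, a reply only counts if it arrives before
                       the next transmission replaces the xid.
    Returns None if no reply is accepted before the deadline.
    """
    for i, sent in enumerate(send_times):
        arrival = sent + reply_latency
        if arrival > deadline:
            continue  # client has already given up
        if new_xid_per_retry:
            has_next = i + 1 < len(send_times)
            next_send = send_times[i + 1] if has_next else deadline
            if has_next and arrival >= next_send:
                continue  # xid changed before this reply arrived
        return arrival
    return None


if __name__ == "__main__":
    sends = [0, 1, 2, 4, 8]  # hypothetical retry times, 10 s overall timeout
    print(first_usable_reply(sends, 5, 10, new_xid_per_retry=False))
    print(first_usable_reply(sends, 5, 10, new_xid_per_retry=True))
```

With a 5 second server latency, the first policy accepts the reply at t=5 while the second never accepts one. So whether slow offers matter depends on how the client matches replies to retries, not just on the overall timeout.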
I'm trying to understand dhcp.h, and I'm not sure how to make it so that iPXE uses #define BOOTP_FL_BROADCAST 0x8000 in dhcp.h.
So how can I ensure that iPXE uses #define BOOTP_FL_BROADCAST 0x8000 when doing DHCP?