second attempt DHCP failure
2017-09-01, 20:05 (This post was last modified: 2017-09-02 21:48 by jov0.)
Post: #1
second attempt DHCP failure
Hello,
I have a problem specific to at least the i219lm-2 NIC. My iPXE image uses a trivial script consisting of a loop around the autoboot command. On the first attempt, assuming the network is available, everything works without any issue. But if autoboot or media detection times out and autoboot is restarted, the DHCP discover packets never hit the wire: with DHCP debugging enabled I can see that iPXE believes it is sending the packets, but tcpdump on the server it is connected to sees no traffic.

I tried removing the INTEL_NO_PHY_RST option based on another post here, but did not observe any change. The same image, built for USB, works without any issue on other NICs such as the 82567lm-2: they loop until I reconnect the media and then instantly pull the expected DHCP configuration. Thoughts? As a workaround I might just put the reboot command after autoboot.




These are the only changes I normally make when I build:

Build command (embeds the script):
make bin/ipxe.usb EMBED=autobootloopscript.ipxe

Script
#!ipxe
:retry_boot
autoboot || goto retry_boot



./src/config/general.h (enabled ELF, NBI, PXE, script, and bzImage support)
/*
* Image types
*
* Etherboot supports various image formats. Select whichever ones
* you want to use.
*
*/
#define IMAGE_NBI /* NBI image support */
#define IMAGE_ELF /* ELF image support */
//#define IMAGE_MULTIBOOT /* MultiBoot image support */
#define IMAGE_PXE /* PXE image support */
#define IMAGE_SCRIPT /* iPXE script image support */
#define IMAGE_BZIMAGE /* Linux bzImage image support */
//#define IMAGE_COMBOOT /* SYSLINUX COMBOOT image support */
//#define IMAGE_EFI /* EFI image support */
//#define IMAGE_SDI /* SDI image support */
//#define IMAGE_PNM /* PNM image support */
//#define IMAGE_PNG /* PNG image support */



./src/config/dhcp.h (changed the DHCP timeouts)
//#define DHCP_DISC_START_TIMEOUT_SEC 1
//#define DHCP_DISC_END_TIMEOUT_SEC 10
#define DHCP_DISC_START_TIMEOUT_SEC 4 /* as per PXE spec */
#define DHCP_DISC_END_TIMEOUT_SEC 32 /* as per PXE spec */


Thanks,
[Screenshot: output with DHCP and Intel debugging enabled]
2017-09-05, 23:58
Post: #2
RE: second attempt DHCP failure
(2017-09-01 20:05)jov0 Wrote:  screenshot of debug dhcp and intel enabled output

Your screenshot shows at least two interesting messages:

Code:
INTEL 0xdd1e8 unexpected ICR 00040000
...
INTEL 0xdd1e8 unexpected IXR 00000106

Assuming that the bits in this register are the same for the i210 and i219, the first of these (00040000 = bit 18) is "manageability event detected". This suggests that something has upset the Intel management controller (which is not part of the NIC, but does attempt to use the NIC via various sideband mechanisms).

You could try disabling the Intel Management Engine (or whatever it happens to be called on your system) within the BIOS setup screen.

Michael
2017-09-06, 02:02
Post: #3
RE: second attempt DHCP failure
Thank you for the insight. I will give that a whirl tomorrow.
2017-09-06, 20:18
Post: #4
RE: second attempt DHCP failure
I disabled all of the Intel Management features I could find. Unfortunately there was no discernible change. Note that the suspicious messages are present even in a successful boot.

Successful Boot

Failed Boot - media delayed boot
2017-09-14, 16:52
Post: #5
RE: second attempt DHCP failure
Is there any other debugging I could enable that might hint at what is happening? I'm seeing this issue on both HP and Lenovo hardware with Intel 200 series chipsets. :(
2017-09-14, 17:01
Post: #6
RE: second attempt DHCP failure
(2017-09-14 16:52)jov0 Wrote:  Any other debug I could run to hint at what may be happening? I'm seeing this issue on both HP and Lenovo hardware with Intel 200 series chip sets. :(

You could build with DEBUG=netdevice:3. That will show you a message for every transmit, transmit completion, and receive: you should be able to figure out from this which parts (if any) of the datapath are still working. For example, you may see receives (from broadcast traffic on the network) even if iPXE is failing to transmit anything.

If you use DEBUG=netdevice:3,intel:3 then you will also see the status of the transmit and receive head/tail pointers as read back from the hardware. This may also give some clue as to the problem.

Michael
2017-09-18, 20:47
Post: #7
RE: second attempt DHCP failure
I started it up with an image built with your recommended debug options: DEBUG=netdevice:3,intel:3. It took me back to the Matrix movie when it was loading an image!

When I delayed the media connection to force a loop, it consistently failed as expected. Google did not help me interpret messages such as the link status. Thoughts?

delayed media connection
media detected start config
dhcp requests failing
2017-09-19, 11:25
Post: #8
RE: second attempt DHCP failure
(2017-09-18 20:47)jov0 Wrote:  I started it up with an image using your recommended debug options: DEBUG=netdevice:3,intel:3. Took me back to Matrix movie when it was loading an image!

The third screenshot shows that you are not getting transmit completions. The first screenshot may show that the receive datapath is still working (which would in turn show that the link is up correctly), but there's no way to know since you haven't said what was actually happening in the first screenshot (i.e. was this in the middle of a DHCP failure?).

Michael
2017-09-20, 03:22
Post: #9
RE: second attempt DHCP failure
I apologize for the lack of clarity.

The first screenshot was taken right after I plugged in the network cable while iPXE was waiting for link-up, so it shows the link coming up as expected. I took it because of one interesting point: if I delay plugging in the network cable and see the 'waiting for link up' message, and then connect the cable so that autoboot/DHCP starts, it never boots, as shown in the second and third screenshots.

I included the second screenshot in case any of the INTEL debug messages, such as the link status reported right before autoboot starts, were significant.

The third simply shows autoboot timing out and failing.

As a control, I booted the computer with the media connected and it booted without issue. The problem only occurs when there is a delay, such as a delayed media link-up, or when the first DHCP attempt fails and the loop tries a second time.
2017-09-27, 14:41
Post: #10
RE: second attempt DHCP failure
Still stumped - open to any ideas :(
2017-09-27, 14:56
Post: #11
RE: second attempt DHCP failure
(2017-09-27 14:41)jov0 Wrote:  Still stumped - open to any ideas :(

Are you able to reproduce this failure on a PCI or PCIe plug-in NIC that I can easily buy online? If you can find me a direct Amazon (or similar) link then I'll get one and debug it locally.

Michael
2017-09-27, 20:25 (This post was last modified: 2017-09-27 20:25 by jov0.)
Post: #12
RE: second attempt DHCP failure
Unfortunately they are tiny form factor machines from Lenovo and HP, such as the Lenovo M700 and M710 Tiny and the HP M800 Elite. But if you are willing to take a look, I will gladly ship you one that you can keep.
2017-10-06, 18:53
Post: #13
RE: second attempt DHCP failure
Let me know if you are interested :D
2018-02-03, 20:53
Post: #14
RE: second attempt DHCP failure
Hopefully now fixed in http://git.ipxe.org/ipxe.git/commitdiff/546dd51.

Michael