iPXE discussion forum

Full Version: ipxe timeout
Hi,

While wrestling with PXE I came across gPXE, and not much later iPXE, so I was relieved and enthusiastic that something better than PXE had finally shown up!

It had been a struggle to get everything finally working, but with ipxe I'm still not getting through.

With gPXE I managed to get several scripts loading until it wanted to load a menu. At that point it just hangs on a real PC, and VirtualBox crashes with an error message (the application, not the virtual machine).

With iPXE I can't get past the stage where it loads the chained script from HTTP or NFS.
It keeps throwing the same error, 040ee119, which I believe is a TFTP time-out?
Even when I enter the command prompt and run ifconf or dhcp, it returns the same error.

What could be wrong ?
(What config files do I post ?)
Have you actually read and followed the information on http://ipxe.org/err/040ee1 ? You should try to perform the tests at http://ipxe.org/dev/driver to ensure that your network card driver works consistently. Some drivers have changed quite a lot in the 2-3 years since gPXE development was abandoned, so it's not so strange that some hardware might work with the old version but not with the newer one. My guess is that you have Intel or Realtek network hardware, both of which have completely rewritten drivers. What is the PCI vendor/device ID of your device? And after your failed ifconf/dhcp attempt, what is the output from the 'ifstat' and 'route' commands?
(2014-02-14 09:56)robinsmidsrod Wrote: Have you actually read and followed the information on http://ipxe.org/err/040ee1 ? You should try to perform the tests at http://ipxe.org/dev/driver to ensure that your network card driver works consistently. Some drivers have changed quite a lot in the 2-3 years since gPXE development was abandoned, so it's not so strange that some hardware might work with the old version but not with the newer one. My guess is that you have Intel or Realtek network hardware, both of which have completely rewritten drivers. What is the PCI vendor/device ID of your device? And after your failed ifconf/dhcp attempt, what is the output from the 'ifstat' and 'route' commands?

I'm sorry, I indeed haven't... But I'm testing this in VirtualBox.
The ifstat and route commands are giving me the expected output.
To help you we'll need the actual output of the ifstat/route commands. There are multiple pieces of information in there that can shed some light on the problem.
Okay, I will post it later today...
I think I have an idea what the problem is, but I still don't know how to solve it.

I believe it's due to the time-out from the network bridge in VirtualBox.
I've found some info on using the VBoxManage command to set the delay to zero, but I can't figure out the exact way to do it.
While I realize this thread is a little outdated, I'm getting the same types of issues, on multiple systems.

I am writing on behalf of the FOG (Free Opensource Ghost) community, as we've just made a release and are using iPXE for our PXE boot environment, for its HTTP options and the dynamic capabilities we get from PHP.

All of that said, I can't possibly provide all of the information for every person having these issues. However, I, too, am receiving the 040ee119 error message from VirtualBox. (It's all I have to test with and I'm sorry.)

I have been able to create an ipxe.kpxe and everything works fine in VirtualBox using this. (Without needing to set delay)

Creating and using undionly.kpxe, however, fails in a specific sequence of steps (unless I follow the guidelines to set the bridged adapter DELAY=0).
o As I know how to reproduce it, I'm not trying to fix my error specifically, but to narrow down where the issue actually lies.

My current version number is: (f747a) from the git pull from (15-MAY-2014).

The settings of the VirtualBox guest that gets the timeout (indefinitely, if in the retry loop) are as follows, in case you are able to reproduce it:

Operating System of Host:
Virtual Box 4.3.10 (93012) Hosted Headless from CentOS 6.5 (Final) with extension pack: Oracle_VM_VirtualBox_Extension_Pack-4.3.10-93012.vbox-extpack
No host-only or NAT networks with system.

GENERAL
----BASIC----
Type: Microsoft Windows
Version: Windows 7 (64 bit)
----ADVANCED----
Removable Media: Checked Remember Runtime Changes

SYSTEM
----Motherboard----
Base Memory: 2048 MB
Boot Order:
o Network
o CD/DVD-ROM
o Hard Disk
Chipset: PIIX3 (ICH9 Doesn't even boot the system.)
Extended Features: Enable IO APIC
----Processor----
Processor(s): 1
Execution Cap: 100%
----Acceleration----
Hardware Virtualization: Enable VT-x/AMD-V

DISPLAY
----Video----
Video Memory: 16MB
----Remote Display----
Enable Server Checked
Server Port: XXXX
Authentication Method: None
Authentication Timeout: 5000

STORAGE
Controller: SATA Type: AHCI Port Count: 2
guestname.vdi
empty cdrom.

NO AUDIO

NETWORK
----Adapter 1----
Enable Network Adapter
Attached to: Bridged Adapter
Name: br4
Advanced:
o Adapter Type: Intel PRO/1000 MT Server (82545EM) (all Intel types work fine for PXE boot, but the PC and virtio-net types do not)
o MAC Address: XXXXXXXXXXXX
Promiscuous Mode: Deny
Cable connected.
----No Other adapters----

NO SERIAL PORTS

USB
Enable USB Controller
Enable USB 2.0 (EHCI) Controller

NO SHARED FOLDERS

That is just one of the systems, I have multiple, all using Intel NICs.

iPXE Configuration:
For the FOG setup, ipxescript contains:
#!ipxe
:retry
sync --timeout 500
dhcp || goto retry
chain default.ipxe

Changes to src/config/general.h are from fresh git clone:
Download Protocols enabled:
o DOWNLOAD_PROTO_HTTPS
o DOWNLOAD_PROTO_NFS
Image Types:
o IMAGE_BZIMAGE
o IMAGE_PNG
Command-line commands:
Commented IWMGMT_CMD
Commented FCMGMT_CMD
Un-commented PXE_CMD
Un-commented REBOOT_CMD
Un-commented POWEROFF_CMD
Un-commented PARAM_CMD
Un-commented CONSOLE_CMD

Changes to src/config/settings.h are from fresh git clone:
Un-commented VMWARE_SETTINGS

Changes to src/config/console.h are from fresh git clone:
Un-commented CONSOLE_VESAFB

To build undionly.kpxe I'm running:
make bin/undionly.kpxe EMBED=ipxescript

ifstat after the dhcp fails:
net0: XX:XX:XX:XX:XX:XX using undionly on UNDI-PCI00:11.0 (closed)
o [Link:up, TX:4 TXE:0 RX:0 RXE:0]

route after the dhcp fails:
net0: 10.0.8.1/255.0.0.0 gw 10.10.10.1 (inaccessible)

Hopefully this helps.
You haven't posted the contents of your embedded ipxe script, so it's a bit hard to be sure what's actually wrong. My guess is that you don't have either an ifopen or dhcp/ifconf command in your embedded script, which is why the interface stays in a closed state. I'm a regular user of VirtualBox, and I generally use ipxe.pxe which is compatible with all of the emulated NICs in VirtualBox. I would recommend using one of the intel vnics for best performance.

You should also upgrade to include commit ca93505 (3 days ago), as it contains a bugfix to the NFS download protocol, which I notice you have enabled.
I had the same error number. Rebuilding from source fixed it.
(2014-05-19 08:47)robinsmidsrod Wrote: You haven't posted the contents of your embedded ipxe script, so it's a bit hard to be sure what's actually wrong.
I'm very sorry about that.

iPXE Embedded Script
Code:
#!ipxe
:retry
sync --timeout 500
dhcp || goto retry
chain default.ipxe

That is the script. The sync --timeout 500 was an initial attempt, on the assumption that this was a timing issue. I tried 5000, 300000, 120000, and other values, but the problem still occurs. I've also tried sleep commands in place of sync, as I understand sync --timeout only waits up to the specified period for background jobs to complete; if they complete before the timeout, it returns control to iPXE straight away.

While waiting for my post to become "posted", I looked at everything that could have changed, since this file seems to work very consistently:
http://sourceforge.net/p/freeghost/code/...format=raw (rev 1312)

The only two files that I think could be causing the issue, and that changed between the date of this file and the error first appearing, are drivers/net/intel.c and drivers/net/intel.h.

While this file consistently works, as it doesn't have the drivers built into the undionly, my VirtualBox still gives issues. For that I don't mind using the ipxe.pxe style file.

(2014-05-19 08:47)robinsmidsrod Wrote: My guess is that you don't have either an ifopen or dhcp/ifconf command in your embedded script, which is why the interface stays in a closed state. I'm a regular user of VirtualBox, and I generally use ipxe.pxe which is compatible with all of the emulated NICs in VirtualBox. I would recommend using one of the intel vnics for best performance.

You should also upgrade to include commit ca93505 (3 days ago), as it contains a bugfix to the NFS download protocol, which I notice you have enabled.

At the time of writing that post I was on the latest commit. I've no issue with trying again, but although I compile NFS support in (possibly for future use), we don't currently boot through it. It's more of a "better to have it and not need it" situation.

Am I correct in my understanding of the difference between undionly.kpxe and ipxe.kpxe? ipxe.kpxe contains drivers, whereas undionly.kpxe only has a minimal set?

Also, I tried compiling a new git pull using the "old" intel.c and intel.h drivers, but this doesn't seem to have helped though I'm still awaiting response.

Thank you for your time.
(2014-05-19 12:55)TheUltimateUnltd Wrote: I had the same error number. Rebuilding from source fixed it.

I only ever build from source. I did try the files from rom-o-matic.eu, but received the same errors from both files.
I think I've found out what and/or possibly why 040ee119 is occurring.

While we initially suspected it was drivers, today I spent most of the day building and testing undionly.kpxe (the problem still occurs on VBox, but is easily fixed with ipxe.{,k,kk}pxe).

I ran through git bisect and it reports to me:
Code:
69313edad85f8958acc8a47272b3c3da494835ec is the first bad commit
commit 69313edad85f8958acc8a47272b3c3da494835ec
Author: Michael Brown <mcb30@ipxe.org>
Date:   Sat May 3 12:53:20 2014 +0100

    [undi] Place an upper limit on the number of PXENV_UNDI_ISR calls per poll
    
    PXENV_UNDI_ISR calls may implicitly refill the underlying receive
    ring, and so could continue to retrieve packets indefinitely.  Place
    an upper limit on the number of calls to PXENV_UNDI_ISR per call to
    undinet_poll().
    
    Signed-off-by: Michael Brown <mcb30@ipxe.org>

:040000 040000 2333a1b71a6fe8fb3d1d9f9698c28eef7c28679c 758241dde56deccc902e12cc4990c985d0bc9d36 M      src

Looking at the commit reveals that only one file was edited.

The exact changes were:
Code:
index d7a632d..82dd8d2 100644 (file)
--- a/src/arch/i386/drivers/net/undinet.c
+++ b/src/arch/i386/drivers/net/undinet.c
@@ -72,6 +72,9 @@ struct undi_nic {
/** Delay between retries of PXENV_UNDI_INITIALIZE */
#define UNDI_INITIALIZE_RETRY_DELAY_MS 200

+/** Maximum number of calls to PXENV_UNDI_ISR per poll */
+#define UNDI_POLL_QUOTA 4
+
/** Alignment of received frame payload */
#define UNDI_RX_ALIGN 16

@@ -328,6 +331,7 @@ static void undinet_poll ( struct net_device *netdev ) {
        struct undi_nic *undinic = netdev->priv;
        struct s_PXENV_UNDI_ISR undi_isr;
        struct io_buffer *iobuf = NULL;
+       unsigned int quota = UNDI_POLL_QUOTA;
        size_t len;
        size_t reserve_len;
        size_t frag_len;
@@ -366,7 +370,7 @@ static void undinet_poll ( struct net_device *netdev ) {
        }

        /* Run through the ISR loop */
-       while ( 1 ) {
+       while ( quota-- ) {
                profile_start ( &undinet_isr_call_profiler );
                if ( ( rc = pxeparent_call ( undinet_entry, PXENV_UNDI_ISR,
                                             &undi_isr,
While most of the people using this aren't having problems, the problems seem to occur most consistently on Intel/Broadcom physical NICs and, of course, VBox, as I described above.

This leads me to think the two lines aren't working properly on these particular NICs. Maybe they don't like the way ISRs are now being handled?

I'm going to test more tomorrow with these lines removed, and hopefully the cause will be more easily identifiable.
I'm running through 3 tests on f3d42 revision (latest as of today 30MAY2014)

First test is with UNDI_RX_QUOTA set to 10.

Second test is with UNDI_RX_QUOTA set to 20.

Third test is with UNDI_RX_QUOTA commented out, while ( quota ) changed back to while ( 1 ) as it was in f473, and the quota-- line at line 427 commented out.

All the changes are specific to src/arch/i386/drivers/net/undinet.c.

I'll report back with more information later on.

Thank you,
RX_QUOTA at 10: does not work.

RX_QUOTA at 20: works.

RX_QUOTA at 15: works, with one failure on VMware.

RX_QUOTA at 12: works on VMware but fails on
NIC Intel 825xx Gigabit Ethernet (Intel® Boot Agent GE v1.3.52.1)
and
Intel 82566DM (Intel® Boot Agent PXE-2.1 build 086)

RX_QUOTA at 13: works on all hardware but fails on VMware once.

Mind you these numbers are successive attempts to find viable working files.

The files we're testing with are located at:
http://mastacontrola.com/ipxe

The Document for recording our Success/Fails is at:
https://www.dropbox.com/s/p8noizwb1wcaeu...files.xlsx
Nice amount of debugging you've done. I've mentioned it to one of the core developers, let's see what he says about what you've discovered.
(2014-05-30 12:18)mastacontrola Wrote: I'm running through 3 tests on f3d42 revision (latest as of today 30MAY2014)

Could you try with the current master, which includes a fix for an issue known to result in DHCP failures (the 040ee119 error which you are seeing) on some networks.

Thanks,

Michael
(2014-06-02 10:49)mcb30 Wrote:

Could you try with the current master, which includes a fix for an issue known to result in DHCP failures (the 040ee119 error which you are seeing) on some networks.

Thanks,

Michael

Already saw, built, and am awaiting the test results. Thank you very much.
(2014-06-02 12:24)mastacontrola Wrote: Already saw, built, and am awaiting the test results. Thank you very much.

So, from what I can tell, with both today's latest pull (d6300) and yesterday's pull (9f0b7), all seems to be functional again.

Thank you very much for the help.