iPXE discussion forum

Windows 7 iSCSI variable boot times - could iPXE be causing it?
Hi,

I'm opening this thread to ask for help and input on a very strange issue I've been experiencing with Windows 7 iSCSI boot via iPXE.

At this point I'm not sure the issue is with iPXE, but I would like to gather your thoughts on it and possibly rule iPXE out.


So, I have the whole shebang set up: tftp + dhcpd + iPXE (undionly.kpxe) + tgtd (iSCSI) + Windows 7 Enterprise.

Everything works just fine, with one weird exception:

When I first boot a PC (from a powered-off state), Windows takes about 30s to boot. The 30s are measured from the moment sanboot succeeds until the Windows login screen appears (where one enters login credentials).
At this point I can log in to Windows, and everything is stable and performs well.

Now comes the weird part: when I reboot the PC (via Windows restart), Windows takes 2 minutes to boot (measured between the same start and end points described above).
Subsequent reboots (via Windows restart) always take about 2m.
If instead of restarting I tell Windows to shut down the PC, the next boot takes 30s.

In short, the first boot after the PC is powered on takes 30s, but subsequent reboots (without powering off the PC) always take 2m. If the PC is powered off and on again, the first boot after that will once again take 30s and subsequent ones 2m (until another power off & on).


PC specs:
MB: Asus Gryphon Z87
Net: Intel Ethernet i217-V
CPU: Intel i5 4670 3.4GHz
MEM: 2x4GB Kingston
VGA: Nvidia GTX 770

Also, the PCs are diskless.


On the server side we're running CentOS 6.2 on a Dell R420 server with 8 SSDs in hardware RAID10 for storage and bonded NICs.
tftp and dhcpd are the CentOS default versions.
tgtd is v1.0.46 and iPXE is git master from April 19th.


Strange things:

a. Boot times are very consistent: either 30s (give or take a second) or 2m (give or take a second).
It's never 40s or 1m or 1m20s.

b. When total boot time is 2m, it takes 7s for "Starting Windows" to appear on screen (after sanboot succeeds), then 1m30s for the Windows logo to appear, then 30s for the Windows login screen to show up. When boot time is 30s, "Starting Windows" and the Windows logo appear on screen in less than 2s (after sanboot succeeds).


Things tried without any success:

1. After booting Windows once, restart the PC, boot into a Linux live CD and then boot back into Windows (without ever powering off the PC); no difference in Windows boot times, still 2m.
2. Play around with BIOS settings on the PC (SATA config (AHCI, IDE), S.M.A.R.T., fast boot, CPU states, disabling UEFI, power saving settings)
3. Try a few different PCs (though with exactly the same hardware)
4. Upgrade tgtd (initially running an older version)
5. Try another iSCSI target software (tried LIO)
6. Upgrade iPXE (initially running a version from the end of 2013)
7. Use ipxe.pxe instead of undionly.kpxe
8. Check the traffic in Wireshark: it looks normal (at a quick glance), without any TCP checksum errors or retransmits
9. Break the server NIC bonding
10. Try a different filesystem on the server storage (running ZFS with zvols; tried a disk image file on ext4 storage; no difference)
11. Unbind IPv6 from the Windows network adapter and disable all offload settings

I've also found - http://forum.ipxe.org/showthread.php?tid=7216 - and disabled the Intel equivalent setting - Reduce Speed On Power Down - plus Receive Side Scaling. Still no difference.


The only success I've had so far was with a different computer (a prebuilt Alienware desktop) with different hardware. That PC boots consistently in 30s every time.
However, that PC was just for testing and cannot be used for our purposes.

I really don't think the cause is server side, because powering a PC off and on improves the boot time for the first boot, and the server knows nothing about the client's power state.
This leads me to believe it's either iPXE interacting badly with the hardware, or faulty hardware/BIOS, or something along those lines.

So, after sanboot succeeds, iPXE has presented a LUN to the BIOS and the PC boots directly off it. iPXE keeps control of the iSCSI session while the kernel and the necessary drivers are loaded, until the OS is ready to take over the iSCSI session, which is when iPXE will unload.
Thinking back to what I mentioned before, during the subsequent Windows boots where the boot time is 2m, "Starting Windows" takes 7s to show up and the Windows logo only appears at the 1m30s mark.

Is there a way to tell exactly when iPXE unloads? Could iPXE be delaying Windows from booting by taking much longer to relinquish control of the iSCSI session to the OS? Since the PC had a regular reboot, maybe something didn't unload correctly from the first boot and causes some sort of timeout during the next reboots? A hard power cycle brings the boot time back to 30s, which is why I wonder whether something is not unloading right or is causing a timeout.
Is there any debug I can add to iPXE to help troubleshoot?


Lastly, apologies for the lengthy post. I tried to include as much info as I could while staying concise. I hope you guys have some good insights on what might be happening here.
I'm open to suggestions and have some leeway to run troubleshooting (this system is not live yet).


Thanks for reading all this. Any help appreciated.

Cheers.
(2014-04-23 20:57) bunoc wrote: In short, the first boot after the PC is powered on takes 30s, but subsequent reboots (without powering off the PC) always take 2m. If the PC is powered off and on again, the first boot after that will once again take 30s and subsequent ones 2m (until another power off & on).

Is the link speed (as reported by e.g. the switch LEDs) dropping to 100M for the 2m boots?

Quote: Is there a way to tell exactly when iPXE unloads? Could iPXE be delaying Windows from booting by taking much longer to relinquish control of the iSCSI session to the OS?

With an iSCSI boot, there is no explicit "unload" step. The OS bootloader (bootmgr.exe in the case of Windows 7) simply stops making calls to INT 13.

Michael
I have a similar "problem": the "Windows flag" screen takes exactly 1:30 from the point it shows "Starting Windows" until the screen disappears and the Windows login screen shows.
But this applies to BOTH cold and warm boots, i.e. no boot is faster for me.

Three key points to make:
The screen starts off without any Windows flag logo, only the text "Starting Windows". (T=0:00)
At (T=1:00) the Windows flag appears and starts animating (the 4 dots that join together into a Windows 7 flag).
At (T=1:30) the screen goes black. A few seconds later the Windows login screen appears.

The link speed for me is 1000Mbit/s (gigabit).

My conclusion: I think this is because Windows first boots from a virtual local device created by iPXE (since the Windows boot loader does not support iSCSI at all), i.e. iPXE hands the boot over to the Windows boot loader, and Windows thinks it's booting from a local drive, while all disk I/O goes through iPXE, which in turn passes it to the iSCSI target. Once Windows proper takes over, the rest of the boot is done with the Windows 7 native iSCSI client, which talks directly to the iSCSI target without going through iPXE, thus gaining an order of magnitude in speed. That's why the actual inside-Windows experience is pretty fast, and disk access has a Windows Experience Index of 6.5 for me (7.9 is the max).

The reason you get a fast first boot might have to do with network card configuration. As you can see in my thread, some motherboards do not do a complete chip reset of the network card's NVRAM configuration space when asked to: http://forum.ipxe.org/showthread.php?tid=7216
It might have to do with the motherboard chipset type/architecture (Z87).

My solution to the "problem":
I downloaded Windows 7 Boot Logo Changer:
http://www.coderforlife.com/projects/win7boot/#Download

Then I changed the boot logo and text so it explicitly said: "The computer is downloading files from our server." "This may take up to 5 minutes." "Please be very patient!"
Note that the boot logo changer only supports 2 lines of text, but more text can easily be added by telling it to use "raw" mode, where you set a background picture containing the text instead.
Note also that I told users the computer may take up to 5 minutes instead of 1.5 minutes. If you specify a longer time than it really takes, users will actually be more patient: they are prepared to wait 5 minutes, but it finishes much faster.

This means it does not matter that the computer is slow at boot, and users will not think it has hung.
Michael,

You seem to be correct. I missed that.
Looking at the switch, after restarting Windows the link speed is set to 10Mbps. It stays that way until the 1m30s mark, which is when it gets reset to 1Gbps and the Windows logo shows up.
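
As a rough sanity check on those numbers, here is a back-of-the-envelope sketch of how much data a 10Mbps link can move during the 1m30s stall, and how long the same amount would take at 1Gbps. It assumes a fully saturated link and ignores all iSCSI/TCP overhead, so it is only an upper bound; the function names are made up for illustration.

```python
def megabytes_transferred(link_mbps: float, seconds: float) -> float:
    """Upper bound on data moved over a link, in megabytes (10^6 bytes).

    Assumes the link is saturated the whole time and ignores protocol overhead.
    """
    return link_mbps * seconds / 8.0


def seconds_needed(megabytes: float, link_mbps: float) -> float:
    """Time to move the given amount of data at the given link speed."""
    return megabytes * 8.0 / link_mbps


stall_seconds = 90  # the 1m30s between "Starting Windows" and the logo
data_mb = megabytes_transferred(10, stall_seconds)
print(data_mb)                        # 112.5 (MB at 10Mbps)
print(seconds_needed(data_mb, 1000))  # 0.9 (seconds at 1Gbps)
```

Under those assumptions, roughly 110MB of early boot data would be enough to account for the whole stall, and the same transfer at gigabit speed would fit inside the "less than 2s" seen on a cold boot.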

I've disabled - Reduce link speed on power down - in the driver, plus every other energy saving option, but the link still drops to 10Mbps after a Windows reboot.

I wonder, is it possible to have iPXE force a 1Gbps link speed, or a card reset, or something along those lines?

Thanks.
Hi,

It seems I have found the fix for this issue: a BIOS upgrade to the latest version (1802 as of now).

Since none of the driver options (disabling reduce speed on power down and friends) worked, I tried booting Linux a few times for comparison and saw the exact same behavior. This showed the issue was not Windows specific.
I had done a BIOS upgrade before and it didn't make any difference, but when I tried again a few days ago it worked, and it has worked for a dozen or so computers (something must have gone wrong with the first BIOS upgrade).

The NIC still drops to 10Mbps when I reboot the PCs, but now it gets set to 1000Mbps when PXE kicks in (whereas before it would stay at 10Mbps until the OS was half loaded).
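
For anyone wanting to repeat the Linux comparison without watching the switch LEDs, the negotiated speed can usually be read from sysfs at /sys/class/net/<iface>/speed (in Mbit/s). The sketch below is a minimal example, not what was used in this thread: the interface name "eth0", the 1000Mbps expectation and the helper names are all assumptions, and some drivers report -1 or fail the read while the link is down.

```python
from pathlib import Path
from typing import Optional


def read_link_speed(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mbit/s, or None if it is unknown."""
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        value = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        # Missing interface, link down, or a driver that does not report speed.
        return None
    # Some drivers report -1 (or 0) when the speed is not known.
    return value if value > 0 else None


def is_degraded(speed_mbps: Optional[int], expected_mbps: int = 1000) -> bool:
    """True when a speed was reported and it is below the expected value."""
    return speed_mbps is not None and speed_mbps < expected_mbps


if __name__ == "__main__":
    speed = read_link_speed("eth0")  # "eth0" is an assumed interface name
    if speed is None:
        print("link speed unknown")
    else:
        print(f"{speed} Mbit/s, degraded: {is_degraded(speed)}")
```

Running this right after a warm reboot and again after a cold boot should show whether the link came up at 10Mbps or at gigabit speed.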

Feel free to close this thread.

Thanks.
(2014-04-28 18:21) bunoc wrote: It seems I have found the fix for this issue: a BIOS upgrade to the latest version (1802 as of now).

Glad you managed to get it fixed! Thanks for letting us know.

Michael