iPXE discussion forum

could ipxe be causing slow windows boot speeds?
Hi,

I'm opening this thread to ask for help and input on a very strange issue I've been experiencing when booting Windows 7 over iSCSI with iPXE.

At this point, I'm not sure the issue is with iPXE, but I would like to gather your thoughts on it and possibly rule iPXE out.


So, I have the whole shebang set up: tftp + dhcpd + iPXE (undionly.kpxe) + tgtd (iSCSI) + Windows 7 Enterprise.

Windows 7 clients boot off iSCSI and everything works just fine.
However there is one weird exception:

When I first boot a PC (from a powered-off state), Windows takes about 30s to boot. These 30s are measured from right after sanboot succeeds until the Windows login screen (where one inputs login credentials) appears.
At this point I can log into Windows, and everything is stable and performs well.

Now comes the weird part: I reboot the PC (via Windows restart), and this time around Windows takes 2 minutes to boot (using the same start and end points described above).
Subsequent reboots (via Windows restart) always take about 2m.
If, instead of doing a restart, I tell Windows to shut down the PC, the next boot will take 30s.

In short, the first boot after the PC is powered on takes 30s, but subsequent reboots (without powering off the PC) always take 2m. If the PC is powered off and on again, the first boot after that will once again take 30s and subsequent ones 2m (until another power off & on).


PC specs:
MB: Asus Gryphon Z87
Net: Intel Ethernet i217-V
CPU: Intel i5 4670 3.4Ghz
MEM: 2x4G KINGSTON
VGA: Nvidia 770GTX

Also, the PCs are diskless.


On the server side we're running CentOS 6.2 on a Dell R420 server with 8 SSDs in HW RAID10 for storage and bonded NICs.
tftp and dhcpd are the CentOS default versions.
tgtd is v1.0.46 and iPXE is git master from April 19th.


Strange things:

a. boot times are very consistent; it's either 30s (give or take a sec) or 2m (give or take a sec).
It's never 40s or 1m or 1m20s; times are always consistent.

b. When total boot time is 2m, it takes 7s for "Starting Windows" to appear on screen (after sanboot succeeds); then 1m30s for the Windows logo to appear on screen; then 30s for the Windows login screen to show up. When boot time is 30s, "Starting Windows" and the Windows logo appear on screen in less than 2s (after sanboot succeeds).


Things tried without any success:

1. After booting windows once, restart pc, boot into linux live cd and boot back into windows (without ever powering off the pc); no difference in windows boot times, still 2m.
2. Play around with BIOS settings on the PC (SATA config (AHCI, IDE), S.M.A.R.T., fast boot, CPU states, disable UEFI, power saving settings)
3. Try a few different PCs (though exactly the same hardware)
4. Upgrade tgtd (initially running an older version)
5. Try another iscsi target software (tried LIO)
6. Upgrade ipxe (initially running a version from end 2013)
7. Use ipxe.pxe instead of undionly.kpxe
8. wireshark traffic looks normal (quick glance) without any tcp checksum errors or retransmits
9. break server NIC bonding
10. try a different filesystem on server storage (running ZFS with zvols; tried with a disk image file on ext4 storage; no difference)
11. unbind ipv6 from windows network adapter and disable all offload settings

I've also found - http://forum.ipxe.org/showthread.php?tid=7216 - and disabled the Intel equivalent setting - Reduce Speed On Power Down - plus Receive Side Scaling. Still no difference.


The only success I've had so far was with a different computer (a prebuilt Alienware desktop) with different hardware. That PC boots consistently in 30s every time.
However, that PC was just for testing and it cannot be used for our purposes.

I really don't think the cause of this is server side, because powering the PC off and on makes the boot time improve for the first boot. That is completely abstracted from the server.
This leads me to believe it's either iPXE having a bad interaction with the hardware, or just faulty hardware/BIOS, or something along those lines.

So, after sanboot succeeds, iPXE has presented a LUN to the BIOS and boots directly off it. It will keep control of the iSCSI session and boot the kernel with the necessary drivers until the OS is ready to take over the iSCSI session, which is when iPXE will unload.
Thinking back to what I mentioned before, during the subsequent Windows boots where the boot time is 2m, "Starting Windows" takes 7s to show up and the Windows logo only appears at the 1m30s mark.

Is there a way to tell exactly when iPXE unloads? Could iPXE be delaying Windows from booting by taking much longer to relinquish control of the iSCSI session to the OS? Since the PC had a regular reboot, maybe something didn't unload correctly from the first boot and causes some sort of timeout during the next reboots? Since a hard power cycle brings the boot time back to 30s, I wonder if something is not unloading right or is causing some sort of timeout to occur.
Is there any debugging I can enable in iPXE to help troubleshoot?


Lastly, apologies for the lengthy post. I tried to add as much info as I could and be concise at the same time. I hope you guys have some good insights on what might be happening here.
I'm open to suggestions and have some leeway to run troubleshooting (this system is not live yet).


Thanks for reading all this. Any help appreciated.

Cheers.
You can enable syslog logging in iPXE to get more exact information on what is happening without having access to the console. You can then build with DEBUG=int13,scsi,iscsi to see the timestamp at which iPXE stops communicating with the iSCSI server. That might give you some ideas.
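
If no syslog daemon is handy on the receiving side, something like the following Python sketch might do for a quick test. It is not part of iPXE: it simply listens for UDP datagrams on port 514 (the standard syslog port) and prefixes each message with its arrival time, so the last line printed before Windows takes over shows roughly when iPXE went quiet. It assumes the iPXE build has the syslog console enabled (CONSOLE_SYSLOG in config/console.h) and that the syslog setting points at this host; the names format_entry and serve are made up for this example, and it will fail to bind if rsyslog or similar already owns port 514.

```python
import socket
import time


def format_entry(received_at, start, sender, payload):
    """Return one log line: seconds since listener start, sender IP, message text."""
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{received_at - start:10.3f}s {sender} {text}"


def serve(host="0.0.0.0", port=514):
    """Print every UDP syslog datagram with a receive timestamp until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))  # binding to port 514 normally requires root
    start = time.monotonic()
    try:
        while True:
            payload, (sender, _port) = sock.recvfrom(4096)
            print(format_entry(time.monotonic(), start, sender, payload), flush=True)
    finally:
        sock.close()
```

Calling serve() before powering on the client, once for a 30s boot and once for a 2m boot, should give two timelines that can be compared side by side.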

I would capture packets on the iSCSI server once when the machine boots quickly and once when it boots slowly, and carefully compare them to see if there is any difference in the iSCSI conversation between the two traces. I'd also try to increase the log level in Windows and carefully look through the log files it generates about boot-time initialization (not sure how to do this, but you'll probably need to modify the BCD used).
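
One way to start comparing the two traces, sketched here under some assumptions: export the relative timestamp of every packet from each capture (for example with `tshark -r slow.pcap -T fields -e frame.time_relative > slow.txt`; the file names are placeholders) and look for the longest silences. A 1m30s stall should stand out as a single large gap in the slow trace that has no counterpart in the fast one. The function names below are made up for this example.

```python
def largest_gaps(times, top=3):
    """Return the `top` longest silences as (gap_seconds, gap_start, gap_end), longest first."""
    ordered = sorted(times)
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(ordered, ordered[1:])]
    return sorted(gaps, reverse=True)[:top]


def load_times(path):
    """Read one relative timestamp (in seconds) per line from a text file."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]
```

For instance, `largest_gaps(load_times("slow.txt"))` lists where the slow boot sat idle; the packets just before and after each gap are the ones worth reading closely in Wireshark.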