HP BL460 Gen9 http download hangs
2016-07-11, 15:40
Post: #1
I've encountered an issue with inconsistent booting of HP BL460c Gen9 Blades.

I'm using RHEL 7.0 x86_64 and the blades are using UEFI to boot. I have an xCAT master server which handles all the boot configuration, but the problem seems to be with iPXE.

What happens is that, when booting as few as 12 nodes, iPXE will hang while downloading the initrd via HTTP from the boot server. Which nodes are affected is completely random, and it happens across a variety of the nodes and chassis that I have available to test with.

Upon a reboot, the affected node will generally boot just fine.

I initially thought this was a throughput problem with my boot server (HP DL380 Gen9), but I can have an entire cluster (250+ nodes) execute a simultaneous download of the exact same file via wget and have the transfers complete successfully. I've even verified the file sizes on each node to ensure there weren't any transfer errors.
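For anyone who wants to repeat that kind of check from a single machine, here is a minimal sketch in Python rather than wget. It is not the exact test I ran: the function names and the URL in the usage note are made up for illustration, and the SHA-256 comparison is an extra on top of the file-size check described above.

```python
import hashlib
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30.0):
    """Download url completely and return (byte_count, sha256_hex)."""
    digest = hashlib.sha256()
    size = 0
    # The timeout applies to each socket operation, so a stalled
    # transfer raises an exception instead of hanging forever.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(64 * 1024)
            if not chunk:
                break
            size += len(chunk)
            digest.update(chunk)
    return size, digest.hexdigest()


def parallel_fetch(url, clients=12, timeout=30.0):
    """Run `clients` simultaneous downloads; return the set of distinct results."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(lambda _: fetch(url, timeout), range(clients)))
    # One element means every client received identical data.
    return set(results)
```

Calling something like `parallel_fetch("http://bootserver.example/initrd.img", clients=12)` (hypothetical URL) should return a set with exactly one `(size, hash)` pair if every transfer completed intact. A stalled transfer would surface as a timeout exception. Note that this only exercises the server with a normal OS TCP stack, so a clean result here does not rule out a problem on the iPXE side.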

What's more, later in the boot process, once the actual kernel is loaded, the rest of the stateless image (a significantly larger file) is transferred from the same server via HTTP, and there is never a problem despite having over 200 simultaneous connections/downloads.

I've attached a screenshot of where the process hangs each time. Just for additional info:

Boot server is running RHEL 7.0 x86_64
BL460c is using a 1Gb NIC to boot from
Boot server is using two 10Gb NICs in an LACP bond to serve data. However, the problem persists in exactly the same way when using a single 10Gb NIC, or even a 1Gb NIC, on the boot server.

I'd appreciate any information on what I can do to avoid having these random hangups.

Thanks!