iPXE discussion forum

Full Version: ESXi 5.5 Install (via iPXE using Razor) - Failed to free base memory
Greetings,

I'm having some challenges with Razor's default vmware_esxi task when migrating off DAS and onto an iSCSI boot volume provided via ISC-DHCP. I'd like to use Razor for bare-metal provisioning and then hand the final configuration over to an ESM like Chef.

The x86 compute node (Supermicro X8DTT-H motherboard, Intel X540 NIC) connects to a couple of 10G 802.1Q trunks. The native VLAN on these trunks is 2003. A Razor VM is installed on 10.136.0.10, VLAN 136. A DHCP relay between VLAN 2003 and VLAN 136 is in place. ISC-DHCP also runs on 10.136.0.10 and provides a scope for 10.200.3.0/24.

When the node PXE boots, it is given an IP of 10.200.3.10 via DHCP/BOOTP, along with an iSCSI target, an IQN, a filename (bootstrap.ipxe) and a "next-server" of 10.136.0.10. bootstrap.ipxe is loaded from 10.136.0.10. The IQN provided by DHCP is successfully used to register a SAN device at 0x80.
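
For readers less familiar with how the iSCSI target reaches the node: iPXE reads it from the DHCP root-path option, in the RFC 4173 form iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>. Below is a minimal Ruby sketch of how such a string breaks down. The address and IQN are made-up placeholders, not the values from this setup, and the helper name is hypothetical.

```ruby
# Split an iSCSI root-path string of the form
#   iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>
# Empty fields fall back to the usual defaults. Example values are hypothetical.
# Note: this simple split does not handle IPv6 server addresses.
IscsiRootPath = Struct.new(:server, :protocol, :port, :lun, :target)

def parse_iscsi_root_path(root_path)
  # Limit the split to 6 parts because the IQN itself may contain colons.
  scheme, server, protocol, port, lun, target = root_path.split(':', 6)
  unless scheme == 'iscsi' && target
    raise ArgumentError, "not an iSCSI root-path: #{root_path}"
  end

  IscsiRootPath.new(
    server,
    protocol.empty? ? '6' : protocol, # default protocol: TCP (6)
    port.empty? ? '3260' : port,      # default iSCSI port
    lun.empty? ? '0' : lun,           # default LUN
    target
  )
end

example = parse_iscsi_root_path('iscsi:192.0.2.20::::iqn.2014-08.com.example:esxi-boot')
puts example.to_a.inspect
```

The defaults (TCP, port 3260, LUN 0) are why most of the middle fields are usually left empty in practice.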

On _first boot_, Razor pushes a Fedora 19-based image to the new node. The image discovers "facts" about the new node (what type of hardware it is: CPU, RAM, manufacturer, UUID, etc.) and passes them back to Razor prior to a reboot.

Facts registered about the new node are used to create tags, and environment-specific metadata can be associated with tags. Tags are used to match policies, and policies are associated with installer operations (tasks). After the node's facts have been collected, Razor applies a tag, some metadata and a policy, and a task is installed. For this node, I'm using the default vmware_esxi.task, which comes with the latest version of Razor.
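
To make the fact -> tag -> policy -> task chain concrete, here is a deliberately simplified Ruby sketch. It is not Razor's real API or rule syntax; the struct names, the lambda-based rules, the fact keys and the policy names are all assumptions made for illustration.

```ruby
# Hypothetical, simplified model of the fact -> tag -> policy -> task chain.
# None of these names come from Razor itself.
Tag    = Struct.new(:name, :rule)                 # rule: lambda taking a facts hash
Policy = Struct.new(:name, :required_tags, :task)

# A node gets every tag whose rule matches its facts.
def tags_for(facts, tags)
  tags.select { |tag| tag.rule.call(facts) }.map(&:name)
end

# The first policy whose required tags are all present on the node wins.
def policy_for(node_tags, policies)
  policies.find { |policy| (policy.required_tags - node_tags).empty? }
end

facts = { 'manufacturer' => 'Supermicro', 'is_virtual' => false }

tags = [
  Tag.new('supermicro', ->(f) { f['manufacturer'] == 'Supermicro' }),
  Tag.new('virtual',    ->(f) { f['is_virtual'] })
]

policies = [
  Policy.new('vm-policy',   ['virtual'],    'some_other_task'),
  Policy.new('esxi-policy', ['supermicro'], 'vmware_esxi')
]

node_tags = tags_for(facts, tags)
policy    = policy_for(node_tags, policies)
puts "tags=#{node_tags.inspect} policy=#{policy.name} task=#{policy.task}"
```

The point is only that the task (here vmware_esxi) is chosen indirectly, through whichever policy the node's tags match.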

I'm getting the following error message near the end of the kickstart process, which registers as completing, but obviously doesn't!

"Failed to free base memory error 0015-995f0943 (570/584K)" at the top of a black screen.

The installer hangs.

Subsequent boots, since the kickstart section has been completed (from Razor's point of view), result in the node attempting to boot from the SAN volume, which fails.

I have screenshots and a video of the boot process if anyone is interested.

Happy to share configs...

Prior to trying the SAN boot option, I was installing to DAS (LSI controller, 2-way mirror) and all was well.

I'm sure someone has seen this error message before and hopefully has a simple fix for it.

Thanks!
From what I know that error message is not coming from iPXE, but probably some part of ESXi. Could you paste the iPXE script that is used to boot up the ESXi installer? That might make it possible for us to understand what's going on.
(2014-08-08 08:53) robinsmidsrod wrote: From what I know that error message is not coming from iPXE, but probably some part of ESXi. Could you paste the iPXE script that is used to boot up the ESXi installer? That might make it possible for us to understand what's going on.

As requested...

<snip>

accepteula
install --firstdisk=local --overwritevmfs
rootpw vmware123
reboot

%include /tmp/networkconfig


%pre --interpreter=busybox


wget <%= stage_done_url("kickstart") %>
</snip>

Also, prior to the above.

root@BM1-QVSL-RAZR-03-001:~/razor-server/tasks/vmware_esxi.task# more pxelinux_esxi.cfg.erb
DEFAULT esxi
LABEL esxi
KERNEL <%= repo_url('/mboot.c32') %>
APPEND -c <%= file_url('boot.cfg') %>
IPAPPEND 2
root@BM1-QVSL-RAZR-03-001:~/razor-server/tasks/vmware_esxi.task# more boot.cfg.erb
<%=
# This code finds the `boot.cfg` file in the repository, parses it, and
# makes a set of modifications intended to make it net-boot from Razor.
#
# I chose this path because the boot.cfg file varies between releases of
# ESXi (such as 5.0 to 5.5), and could quite legitimately vary between
# smaller jumps than that -- after all, it is inside their boot loader,
# and they can change the construction of the installer CD without
# breaking anything...
#
# The alternative was to copy in the boot.cfg, modify it the way this code
# will do on the fly, and then use that. That is actually the exact
# process documented for setting up ESXi PXE boot -- statically make this
# change -- too.
#
# Given the changes are purely mechanical and well understood, this will
# do for the majority of cases without the risk / cost that a slight
# variant in the CD breaks everything, I think. --daniel 2013-10-14
(File.read(repo_file('/boot.cfg')) rescue '').split("\n").map do |line|
  case line
  when /^kernel=/
    'kernel=' + repo_url(line.sub('kernel=', ''))
  when /^modules=/
    line.gsub('/', repo_url('/'))
  when /^kernelopt=/
    "kernelopt=ks=#{file_url('ks.cfg')}"
  else
    line
  end
end.join("\n")
%>
root@BM1-QVSL-RAZR-03-001:~/razor-server/tasks/vmware_esxi.task#
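
For anyone who wants to see what that template does without a Razor server, here is a standalone Ruby sketch of the same per-line rewrite. The repo_url and file_url stubs and their base URLs are assumptions standing in for Razor's helpers, and the sample boot.cfg is an abbreviated illustration rather than the real file from the ESXi 5.5 ISO.

```ruby
# Stand-ins for Razor's template helpers; the base URLs are hypothetical.
REPO_BASE = 'http://razor.example/repo/esxi'
FILE_BASE = 'http://razor.example/task-files'

def repo_url(path)
  REPO_BASE + path
end

def file_url(name)
  "#{FILE_BASE}/#{name}"
end

# Same per-line rewrite as the template, applied to a string instead of a file.
def rewrite_boot_cfg(text)
  text.split("\n").map do |line|
    case line
    when /^kernel=/
      # Point the kernel at the repo URL instead of a CD-relative path.
      'kernel=' + repo_url(line.sub('kernel=', ''))
    when /^modules=/
      # Every leading "/" in the module list becomes the repo base URL.
      line.gsub('/', repo_url('/'))
    when /^kernelopt=/
      # Replace the kernel options with a pointer to the kickstart file.
      "kernelopt=ks=#{file_url('ks.cfg')}"
    else
      line
    end
  end.join("\n")
end

# Abbreviated, illustrative boot.cfg content.
sample = <<~CFG
  title=Loading ESXi installer
  kernel=/tboot.b00
  kernelopt=runweasel
  modules=/b.b00 --- /useropts.gz
CFG

puts rewrite_boot_cfg(sample)
```

So the net effect is that the kernel, every module and the kickstart file are fetched over the network rather than from the install media.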


I have some screenshots (PNG), but they don't like being uploaded for some reason. The ESXi installer's yellow progress bar gets almost to the end before it fails.

For anyone wishing further topology data, I have a network diagram that shows L2/L3 details and such.

Thanks for any feedback on this!
For anyone following this thread, this appears to be a known issue with PXE if frames larger than 1500 bytes are received?

http://www.linuxquestions.org/questions/...xe-623714/

... tends to suggest a PXE issue rather than an ESXi installer issue. That seems odd, since 9k frames, if supported by the network, should be enabled by default... and for NexentaStor, it's a per-VIP setting. I don't want to set up a new VIP in NexentaStor for serving boot volumes and associate a new pool just for this use case.
(2014-08-11 14:14) onxis wrote: For anyone following this thread, this appears to be a known issue with PXE if frames larger than 1500 bytes are received?

http://www.linuxquestions.org/questions/...xe-623714/

... tends to suggest a PXE issue rather than an ESXi installer issue. That seems odd, since 9k frames, if supported by the network, should be enabled by default... and for NexentaStor, it's a per-VIP setting. I don't want to set up a new VIP in NexentaStor for serving boot volumes and associate a new pool just for this use case.

You're probably looking at a hardware-level or UNDI stack limitation. I have jumbo frames enabled on my network, and in use, and don't get these errors. Are you doing boot from SAN or something else?