iPXE discussion forum
Howto: Boot from ipxe/iscsi target using ibft with debian and dracut based Linux


Howto: Boot from ipxe/iscsi target using ibft with debian and dracut based Linux - nomizfs - 2016-01-16 18:56

This guide is intended to be a short writeup of what I learned about booting from iscsi using ipxe.

Some time has passed since I implemented these settings on my systems, so a few details might be incorrect or missing. Always check the documentation yourself. I'll try to include as many reference links as I can remember.

The goal of my setup was to implement remote booting of diskless hypervisor nodes running dracut or debian based distros, with the specific requirement that no host/node specific settings (ip addresses, paths, iscsi initiators or targets) need to be configured on the diskless node.

This enables an infrastructure where physical baremetal nodes can be added to and removed from a cluster with no configuration other than setting the node bios to boot from nic via pxe/ipxe, and adding the MAC address of the booting nic to your DHCP server. The names of the iscsi initiator and target and all ip settings are configured only once in the ipxe menu, after which they follow through the whole boot process automatically thanks to ibft. A standard iscsi boot process requires 4 logins to the iscsi target hosting the root fs.

After that the administrator has full control of the node remotely. Using ssh, wake-on-lan magic packets, ipxe MAC/hostname based boot menu defaults, and iscsi target filtering/LUN management, the administrator can instruct the node to boot from any type of block device that can be exposed via iscsi, and set different boot defaults based on various conditions using shell scripts and/or ipxe scripting, never needing to physically visit the node.

After hundreds of reboots I can say that these methods, when working correctly, are 100% predictable from the viewpoint of the remote admin: every single boot has been successful and no non-reproducible booting errors or failures have occurred yet, barring hardware failures.


IMPORTANT NOTE:

--------------
Using nics other than intel brand can mess with nic interface ordering, especially in debian. If you have nic brands other than intel and need to boot debian, you must read the important notice at the end of the guide in addition to the debian section.
--------------


STEP 1: Set up robinsmidsrod's ipxe scripts at https://gist.github.com/robinsmidsrod/2234639. Ask in #ipxe @ freenode or this thread if you have issues. You need to define the variables and defaults, and have the /boot directory for adding node defaults. Don't worry, that's actually the hard part. Setting up debian and dracut is the easy part :)

Debian Jessie and Wheezy based distros:

Jessie:

1. Install a standard debian distro, it doesn't matter how: bootstrap, install to iscsi target via debian installer, install to iscsi backed KVM volume, install to raw file, install to disk. As long as you can export the resulting block device via iscsi you're good.

The easiest way is to start the debian netinstaller in a KVM/qemu VM with your working iscsi LUN target directly attached as kvm block storage.
(You can also run the installer with no local VM disk configured. The only difference is that after the install you need to manually log in to the iscsi target, mount it, and chroot into it to activate ibft booting.)

2. After installation reboot into system and do as root:

Code:
apt-get install open-iscsi -y
echo "ISCSI_AUTO=true" > /etc/iscsi/iscsi.initramfs
nano /etc/initramfs-tools/initramfs.conf

scroll down and set DEVICE=eth0

^^^ IMPORTANT ^^^
A: as far as I know this setting is only needed when you have more than one nic in the system, but I would set it anyway, as the system would otherwise become unbootable if more nics are added later.
B: this setting is CRITICAL if you run more than one nic AND/OR ESPECIALLY if you run a nic other than intel brand, in which case you must read the important notice at the end of this guide.

^^^ IMPORTANT ^^^

3.
Code:
update-initramfs -u

Done, your system will now boot from iscsi with static or dhcp based ip settings, which are defined only once serverside in ipxe boot scripts.
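
If you'd rather not edit initramfs.conf by hand in nano (for example when preparing many images), the DEVICE= change from step 2 can be scripted. This is only a minimal sketch: the function name set_initramfs_device is made up for this example, and it assumes the file uses a plain DEVICE=... line at the start of a line, as the stock debian initramfs.conf does.

```shell
# Hypothetical helper (not part of any package): set or add the DEVICE= line
# in an initramfs-tools config file.
# Usage: set_initramfs_device IFACE [CONF]
set_initramfs_device() {
    dev_iface=$1
    dev_conf=${2:-/etc/initramfs-tools/initramfs.conf}
    if grep -q '^DEVICE=' "$dev_conf"; then
        # Rewrite the existing DEVICE= line via a temp file, then copy the
        # result back so the original file keeps its owner and permissions
        dev_tmp=$(mktemp) || return 1
        sed "s/^DEVICE=.*/DEVICE=$dev_iface/" "$dev_conf" > "$dev_tmp"
        cat "$dev_tmp" > "$dev_conf"
        rm -f "$dev_tmp"
    else
        # No DEVICE= line yet: append one
        printf 'DEVICE=%s\n' "$dev_iface" >> "$dev_conf"
    fi
}
```

Run as root, something like set_initramfs_device eth0 followed by update-initramfs -u should be equivalent to steps 2 and 3 above.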

Wheezy:

The procedure is exactly the same as above, with the only exception that changes are needed in the halt and reboot scripts to prevent them from hanging while waiting for the iscsi initiator to disconnect before unmounting root, which of course is impossible since root is on the remote target it wants to disconnect.

This is the part where the reader has to do his or her own googling, as I can't remember exactly the needed changes, and have no wheezy to test at the time of writing. All I remember is that one switch, probably -f (force), needed to be added to /etc/init.d/reboot (the command is 'reboot -d -f -i' in jessie), and that NETDOWN needed to be changed in /etc/init.d/halt. Set it to yes or no, just flip it; I can't remember what the default was.

Dracut based:

1. Install as usual to any block device, as with debian.
2. After installation do (replace yum with dnf for fedora, I think):

Code:
yum update && yum install -y iscsi-initiator-utils
echo 'add_dracutmodules+=" iscsi "' > /etc/dracut.conf.d/ibft.conf
nano /etc/default/grub

add the following 3 entries to GRUB_CMDLINE_LINUX=

Code:
"rd.iscsi.firmware=1 rd.iscsi.ibft=1 netroot=iscsi:ibft"

Code:
update-grub
(update-grub and update-grub2 are debian/ubuntu wrappers; dracut based distros like fedora and centos normally use grub2-mkconfig -o /boot/grub2/grub.cfg instead)
Code:
dracut -f -v    # -f overwrites the existing initramfs image

Done.
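
The /etc/default/grub edit can also be scripted instead of done in nano. Below is a rough sketch, assuming the file contains a line of the form GRUB_CMDLINE_LINUX="..." with double quotes; the function name add_grub_cmdline_args is invented for this example and the argument string must not contain the | character.

```shell
# Hypothetical helper: append kernel arguments to GRUB_CMDLINE_LINUX="..."
# unless they are already there.
# Usage: add_grub_cmdline_args "ARGS" [FILE]
add_grub_cmdline_args() {
    new_args=$1
    grub_file=${2:-/etc/default/grub}
    # Nothing to do if the arguments are already on the GRUB_CMDLINE_LINUX line
    if grep '^GRUB_CMDLINE_LINUX=' "$grub_file" | grep -qF -- "$new_args"; then
        return 0
    fi
    grub_tmp=$(mktemp) || return 1
    if grep -q '^GRUB_CMDLINE_LINUX="' "$grub_file"; then
        # Insert the arguments just before the closing quote
        sed "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"\$|GRUB_CMDLINE_LINUX=\"\1 $new_args\"|" \
            "$grub_file" > "$grub_tmp"
    else
        # No such line yet: copy the file and append a new one
        cat "$grub_file" > "$grub_tmp"
        printf 'GRUB_CMDLINE_LINUX="%s"\n' "$new_args" >> "$grub_tmp"
    fi
    # Copy back so the original file keeps its owner and permissions
    cat "$grub_tmp" > "$grub_file"
    rm -f "$grub_tmp"
}
```

For the three entries above that would be add_grub_cmdline_args "rd.iscsi.firmware=1 rd.iscsi.ibft=1 netroot=iscsi:ibft", run as root before regenerating the grub config. Running it twice does not add the arguments a second time.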

notes:

Note 1: Assigning a host name to the MAC address in the DHCP server makes robinsmidsrod's script use the hostname in the iscsi initiator name (for example iqn.2007-09.jp.ne.peach.istgt:debian). In conjunction with ibft this further improves and beautifies (!) the infrastructure: hostname and MAC are the only variables that need to be configured, and only once, on the backend for each new node added to the network. Without a DHCP assigned hostname the initiator name will contain the MAC address (iqn.2007-09.jp.ne.peach.istgt:XX:XX:XX:XX:XX:XX).
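
How the hostname gets assigned depends on your DHCP server, which this guide doesn't prescribe. As an illustration only, assuming ISC dhcpd, a small shell helper could print the host entry for a new node. The function name dhcpd_host_entry and the example MAC are made up; "hardware ethernet" and "option host-name" are standard dhcpd.conf statements.

```shell
# Hypothetical helper: print an ISC dhcpd host entry that ties a hostname
# to the MAC address of the booting nic.
# Usage: dhcpd_host_entry HOSTNAME MAC
dhcpd_host_entry() {
    printf 'host %s {\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    # option host-name is DHCP option 12, sent to the client as its hostname
    printf '    option host-name "%s";\n' "$1"
    printf '}\n'
}
```

For example, dhcpd_host_entry debian 00:11:22:33:44:55 prints an entry that can be pasted into dhcpd.conf.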

Either way, set ipxe defaults for each node in the ipxe menu /boot dir. You can maintain multiple default boot profiles for all nodes using multiple directories with appropriate default entries, or simply keep defaults static and edit the LUN in menu.ipxe, or do both and add automation with CGI/Perl scripting and publish your scripts.

Code:
:iscsi-node-01
echo Booting iscsi node 01 for ${initiator-iqn}
set base-iscsi iscsi:${iscsi-server}::::${base-iqn}
set root-path ${base-iscsi}:iscsi-LUN-01
sanboot ${root-path} || goto failed
goto start
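
With more than a handful of nodes, typing these entries by hand gets tedious. As a sketch of the kind of automation mentioned above, here is a shell function that prints the same entry for a given node number. The function name gen_iscsi_entry is invented here, and the labels and variable names are copied from the example entry, so adapt them to your own menu.ipxe.

```shell
# Hypothetical helper: print an ipxe menu entry for node number $1,
# mirroring the hand-written iscsi-node-01 entry.
# Usage: gen_iscsi_entry NN
gen_iscsi_entry() {
    # Single quotes keep ${...} literal, so ipxe (not the shell) expands them
    printf ':iscsi-node-%s\n' "$1"
    printf 'echo Booting iscsi node %s for ${initiator-iqn}\n' "$1"
    printf 'set base-iscsi iscsi:${iscsi-server}::::${base-iqn}\n'
    printf 'set root-path ${base-iscsi}:iscsi-LUN-%s\n' "$1"
    printf 'sanboot ${root-path} || goto failed\n'
    printf 'goto start\n'
}
```

A loop such as "for n in 01 02 03; do gen_iscsi_entry "$n"; echo; done" then produces the entries for three nodes, ready to be pasted into menu.ipxe.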

Note 2: if you want to add redundancy to your iscsi booting, you can add a loop in ipxe undionly.kpxe, or in a custom ROM, that keeps querying the tftp and/or http server hosting the ipxe rom and menu.ipxe, like this:

Code:
#!ipxe
goto dhcp_retry
:dhcp_retry
sleep 3
dhcp && goto chain_retry || goto dhcp_retry
:chain_retry
sleep 3
chain http://[your http server]/boot.ipxe && exit || goto chain_retry

In case of a power outage, the DHCP/tftp/http servers might not be ready when the power comes back on; the loop allows the nodes to wait instead of failing to boot with a timeout. The DHCP loop only works if you can control how the BIOS handles DHCP timeouts in case the DHCP server is unreachable. The only drawback is that it adds a few seconds of delay to the script even if all servers are up.

Currently I chainload undionly.kpxe for my integrated nics which can't be flashed, and I have replaced DHCP with ifopen to reduce total DHCP queries for the whole boot process to 1 (the nic PXE rom dhcp query), after which that ip follows up the chain using ibft. If you chainload ipxe and use DHCP like the example above, and no static host ip entry is defined on the DHCP server, this results in 2 DHCP leases even before we get to the ipxe menu.
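
To check from the booted system that the settings really did follow the chain, you can look at what the firmware table handed over. A minimal sketch, assuming the kernel's iscsi_ibft module is loaded and exposes the table under /sys/firmware/ibft; the function name show_ibft is made up for this example.

```shell
# Hypothetical helper: print the main iBFT fields the kernel exposes in sysfs.
# Usage: show_ibft [IBFT_DIR]
show_ibft() {
    ibft_root=${1:-/sys/firmware/ibft}
    if [ ! -d "$ibft_root" ]; then
        echo "no iBFT found under $ibft_root (is the iscsi_ibft module loaded?)" >&2
        return 1
    fi
    # Initiator name, nic MAC/ip and target name/ip, in that order
    for f in "$ibft_root"/initiator/initiator-name \
             "$ibft_root"/ethernet*/mac "$ibft_root"/ethernet*/ip-addr \
             "$ibft_root"/target*/target-name "$ibft_root"/target*/ip-addr; do
        if [ -r "$f" ]; then
            # Strip the directory prefix so only e.g. ethernet0/mac is shown
            printf '%s = %s\n' "${f#"$ibft_root"/}" "$(cat "$f")"
        fi
    done
}
```

The initiator name, MAC and ip printed here should match what was configured in the ipxe menu and what the iscsi target sees at login.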

Note 3: if you chainload undionly.kpxe from tftp, set all clients in the infrastructure to boot this way, and define a default that escapes the ipxe menu and continues BIOS boot for the nodes that are normally not booted from iscsi, you can remotely control the boot process of even the clients booting from local HDDs if needed.

Note 4: For even more redundancy, host the iscsi LUNs on an RBD or DRBD mirrored pool, and set up the tftp/http/iscsi servers for High Availability.

-------------
references:
http://linux.die.net/man/8/dracut
http://pve.proxmox.com/wiki/Proxmox_ISCSI_installation



-----------IMPORTANT NOTICE------------
For example, adding a broadcom based nic to a node with one integrated intel nic will probably flip the order of the nics when the kernel boots from grub, making the broadcom nic eth0 and the intel nic eth1. You need to be aware of this for two reasons. First, you need to change which MAC address is granted the DHCP configured hostname at boot. Second, ipxe will see the intel MAC first, but after the grub phase the linux kernel will switch the interfaces, so the MAC and ip address of the client logging into the iscsi target _after_ the grub phase will be different, meaning 2 entries are needed for each node in the SAN firewall and/or iscsi target LUN access permissions.

During my testing this affected only debian when using other than all intel nics. Luckily you can instruct initramfs which interface to use in netbooting, and thus, combined with MAC settings on the DHCP server, it's possible to mitigate this issue, which can be a real pain otherwise. Remember when setting DEVICE= in /etc/initramfs-tools/initramfs.conf that it is the device used AFTER the grub kernel stage, meaning if you boot via PXE/IPXE using the integrated intel nic, that nic would be eth1 in initramfs.conf, and the additional other-brand nic would be eth0. Thus, setting this to eth1 in a machine with more than one nic of which some are non intel, combined with flipping the MAC address assigned for the hostname on the DHCP server, results in a boot process where only one DHCP lease and one ip address is used, keeping the configuration persistent across infra. Of course, the most bestest solution is to only use intel nics.
-----------^^^^^^^^^^^-----------
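
To find out which ethN the kernel gave to the nic that ipxe booted from, you can compare the MAC recorded in the ibft with the MACs of the kernel's interfaces. This is only a sketch under the assumption that the iscsi_ibft module exposes /sys/firmware/ibft/ethernet*/mac and that no two interfaces share a MAC (no bonding); the function name ibft_boot_iface is made up.

```shell
# Hypothetical helper: print the kernel interface name whose MAC matches
# the nic recorded in the iBFT.
# Usage: ibft_boot_iface [IBFT_DIR] [NET_DIR]
ibft_boot_iface() {
    ibft_dir=${1:-/sys/firmware/ibft}
    net_dir=${2:-/sys/class/net}
    for mac_file in "$ibft_dir"/ethernet*/mac; do
        [ -r "$mac_file" ] || continue
        # Lowercase both sides so the comparison is case-insensitive
        want_mac=$(tr 'A-F' 'a-f' < "$mac_file")
        for addr_file in "$net_dir"/*/address; do
            [ -r "$addr_file" ] || continue
            have_mac=$(tr 'A-F' 'a-f' < "$addr_file")
            if [ "$have_mac" = "$want_mac" ]; then
                # The directory name is the interface name, e.g. eth1
                basename "$(dirname "$addr_file")"
                return 0
            fi
        done
    done
    return 1
}
```

Run in the booted system, the printed name is a reasonable candidate for DEVICE= in initramfs.conf, but verify it with a test reboot before relying on it.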