iPXE discussion forum

Chainloading Linux in AWS EC2 HVM
Hello

Has anyone succeeded in chainloading an operating system on AWS EC2 HVM instances?

According to this post, it is not possible because it requires low-level access to the hypervisor, but the post gives little detail on why. Could a developer comment on the limitations with AWS EC2 PV instances?

What about HVM instances? I ran several tests but can't get the network working *after chainloading the kernel*.

To summarize the tests:
1/ compiled the latest version of iPXE (ipxe.lkrn) with an embedded script to boot from a network server
2/ created an HVM instance
3/ configured GRUB to boot from ipxe.lkrn
4/ created an iPXE script on the network server to control chainloading
5/ rebooted the EC2 instance

What is working:
- iPXE loads and finds the network device (netfront)
- iPXE can connect to my network server and download files (kernel, initrd, other iPXE scripts, etc.)
- imgtrust is compiled in and enforcing (chainloading fails as expected if the signature of the downloaded content doesn't match)
- the kernel loads and finds the initrd
- the kernel finds the disk device
- the kernel *seems* to find a network device, but it doesn't work

I tried CentOS 7.0, 7.1, 7.2 (and RHEL 7.2), CoreOS and RancherOS, but none of them can bring up the network device.

If using DHCP:
- CentOS/RHEL show "RTNETLINK answers: file exists" and fall back to the emergency shell
- CoreOS shows nothing
If using a static IP (with kernel arguments ip=${net0/ip}:...):
- CentOS/RHEL show "network unreachable"

Any idea/workaround?
Thanks for assistance!
(2016-01-14 21:37) plachance Wrote: Has anyone succeeded in chainloading an operating system on AWS EC2 HVM instances?

According to this post, it is not possible because it requires low-level access to the hypervisor, but the post gives little detail on why. Could a developer comment on the limitations with AWS EC2 PV instances?

That post is incorrect. iPXE does (as you discovered) have native support for the Xen netfront NIC as used in EC2 HVM.

There is no support for PV. The netfront NIC is the same, but PV is effectively a different firmware platform: the differences between HVM and PV are similar to the differences between BIOS and UEFI as far as iPXE is concerned. We just don't support PV as a platform.

Quote:To summarize the tests:
1/ compiled the latest version of iPXE (ipxe.lkrn) with an embedded script to boot from a network server
2/ created an HVM instance
3/ configured GRUB to boot from ipxe.lkrn
4/ created an iPXE script on the network server to control chainloading
5/ rebooted the EC2 instance

That's similar to what I've used. The only real difference is that there was no GRUB in my setup (just bin/ipxe.usb written directly to a 1GB EBS root disk).

Also, I usually build bin/ipxe.usb with an embedded script that does:

Code:
#!ipxe
echo Amazon EC2 - iPXE boot via user-data
ifstat ||
dhcp ||
route ||
chain -ar http://169.254.169.254/latest/user-data

This allows me to control the boot process by configuring the EC2 instance "user-data", without needing to rebuild with a new embedded script.

I also tend to enable CONSOLE_SYSLOG and CONSOLE_INT13, both of which are useful for getting hold of output in EC2. (The EC2 "system log" which theoretically shows the serial port output has been unreliable for me.)
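
A minimal sketch of such a build, assuming an iPXE source checkout: the two console options are switched on through config/local/console.h and the image is built with the embedded script passed via EMBED. The function names and the arguments are placeholders chosen for illustration, not part of iPXE.

```shell
#!/bin/sh
# Sketch only: function names and arguments are illustrative assumptions.

# Write the local console configuration into an iPXE src/ directory.
# config/console.h pulls in config/local/console.h, so local settings can
# live there instead of editing the tracked header.
write_console_config() {
    src_dir=$1
    mkdir -p "$src_dir/config/local" || return 1
    cat > "$src_dir/config/local/console.h" <<'EOF'
#define CONSOLE_SYSLOG
#define CONSOLE_INT13
EOF
}

# Build bin/ipxe.usb with an embedded script.
# $2 must be an absolute path, or a path relative to the src/ directory,
# because make runs inside src/.
build_ec2_image() {
    src_dir=$1
    embed_script=$2
    write_console_config "$src_dir" &&
        make -C "$src_dir" bin/ipxe.usb EMBED="$embed_script"
}
```

Something like `build_ec2_image ~/ipxe/src /path/to/ec2.ipxe` would then produce src/bin/ipxe.usb, where ec2.ipxe holds the embedded script shown above.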

Quote:What is working:
- iPXE loads and finds the network device (netfront)
- iPXE can connect to my network server and download files (kernel, initrd, other iPXE scripts, etc.)
- imgtrust is compiled in and enforcing (chainloading fails as expected if the signature of the downloaded content doesn't match)
- the kernel loads and finds the initrd
- the kernel finds the disk device
- the kernel *seems* to find a network device, but it doesn't work

I tried CentOS 7.0, 7.1, 7.2 (and RHEL 7.2), CoreOS and RancherOS, but none of them can bring up the network device.

I did encounter some issues with the booted OS correctly initialising the netfront NIC after it had been used by iPXE. This was with older versions of CentOS, where the netfront driver did not gracefully handle finding the NIC in an unexpected state.

You probably need to start hacking the initrd and/or kernel to print out extra debug information about the state of the netfront NIC, to find out what's going wrong in your setup. Unfortunately it's almost impossible to interact with an EC2 VM until the network is up, so you're limited to adding debug prints. (Alternatively, you could try to reproduce the problem in a local Xen instance, where you would have console access.)

Good luck!

Michael
(2016-01-15 08:24) mcb30 Wrote: That's similar to what I've used. The only real difference is that there was no GRUB in my setup (just bin/ipxe.usb written directly to a 1GB EBS root disk).

Also, I usually build bin/ipxe.usb with an embedded script that does:

Code:
#!ipxe
echo Amazon EC2 - iPXE boot via user-data
ifstat ||
dhcp ||
route ||
chain -ar http://169.254.169.254/latest/user-data

This allows me to control the boot process by configuring the EC2 instance "user-data", without needing to rebuild with a new embedded script.

I tried this exact setup, without success, with the following in the user-data:

Code:
#!ipxe
set base_uri http://stable.release.core-os.net
set repo amd64-usr
set liveos_path current
set kernel coreos_production_pxe.vmlinuz
set initramfs coreos_production_pxe_image.cpio.gz
set kernel_params cloud-config-url=<path_to_coreos.cloudconfig>

kernel ${base_uri}/${repo}/${liveos_path}/${kernel}
initrd ${base_uri}/${repo}/${liveos_path}/${initramfs}
imgargs ${kernel} ${kernel_params}
boot ${kernel}

Still no network after loading the kernel, both on EC2 (ipxe.usb) and on a SoYouStart server (ipxe.lkrn), when I compile the latest iPXE releases myself. It does work on SoYouStart with their "iPXE netboot script" feature, which is based on iPXE/1.0.0+ (ddd1).

Can it be related to my iPXE compile options?
Did you try chainloading on EC2 with newer/newest releases of iPXE?
I wanted to try compiling iPXE/1.0.0+ (ddd1) myself and use it on EC2 but can't find the commit ID.
Code:
[ipxe/src] $ git log | grep ddd1
commit 230f16538f4b0ad9ddd1edd7da24c52c39da0c8d
commit be0cd1cddd18a29264091fabc69bf2ec7a1f2cd2

I also tried an older revision (commit 53d2d9e), but it shows the same problem.
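
A note on the search itself: as far as I understand, the hex digits in the iPXE banner are the start of the abbreviated git commit hash of the build. The grep above matches ddd1 anywhere in the line, and neither of the two hits begins with ddd1. A small sketch that anchors the match at the start of the hash (the helper name hashes_with_prefix is made up for illustration):

```shell
#!/bin/sh
# Sketch only: hashes_with_prefix is an illustrative helper, not a git command.

# Read commit hashes on stdin (one per line) and print only those that
# START with the given prefix. Prints nothing if there is no match.
hashes_with_prefix() {
    grep -i "^$1" || true
}

# Possible use inside an iPXE checkout:
#   git log --all --format=%H | hashes_with_prefix ddd1
```

An empty result would suggest that the build comes from a tree containing commits that are not in the upstream repository.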

Any clue? Do you remember which iPXE release and script worked?
Thanks again for your help!
Patrice
(2016-01-18 19:01) plachance Wrote: Still no network after loading the kernel on EC2

Are you sure that the initrd image includes the Xen netfront driver?
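
One way to check this without booting, sketched under the assumption that dracut's lsinitrd tool is available and can list the image in question: feed the file listing through a filter that looks for the module file. The helper name has_netfront is illustrative.

```shell
#!/bin/sh
# Sketch only: has_netfront is an illustrative helper.

# Succeed if the file listing on stdin contains the Xen netfront module.
# The pattern also matches compressed module files such as
# xen-netfront.ko.xz.
has_netfront() {
    grep -q 'xen-netfront\.ko'
}

# Possible use:
#   lsinitrd initrd.img | has_netfront && echo "netfront module present"
```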

Quote:Can it be related to my iPXE compile options?

Try building as per the instructions in http://git.ipxe.org/ipxe.git/commitdiff/cc25260 (pushed earlier today).

Quote:I wanted to try compiling iPXE/1.0.0+ (ddd1) myself and use it on EC2 but can't find the commit ID.

Ask SoYouStart to provide the source (as required by the GPL).

Michael
Thanks for the quick response!

(2016-01-18 23:00) mcb30 Wrote: Are you sure that the initrd image includes the Xen netfront driver?

Yes, the system log shows that xen_netfront and xen_blkfront are loaded.
Code:
[    5.808330] xen_netfront: Initialising Xen virtual ethernet driver
::::::
[    5.881939] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[    5.884958]  xvda: xvda3 xvda4

Also, the same CoreOS example provided in the previous post works on SoYouStart using their release of iPXE but not using the latest one I compile myself.

Quote:Try building as per the instructions in http://git.ipxe.org/ipxe.git/commitdiff/cc25260 (pushed earlier today).

Edit1: Exactly the same problem. iPXE loads as expected and finds the kernel and initrd from the boot server.

Code:
iPXE LOG

iPXE initialising devices...ok

iPXE 1.0.0+ (3c26) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP AoE ELF MBOOT PXE bzImage Menu PXEXT
Amazon EC2 - iPXE boot via user-data
net0: 0a:dd:a6:cb:d5:b3 using netfront on vif/0 (closed)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Down (http://ipxe.org/38086101)]
Configuring (net0 0a:dd:a6:cb:d5:b3)...... ok
net0: 172.31.47.39/255.255.240.0 gw 172.31.32.1
http://169.254.169.254/latest/user-data... ok
http://mirror.centos.org/centos-7/7/os/x86_64/isolinux/vmlinuz... ok
http://mirror.centos.org/centos-7/7/os/x86_64/isolinux/initrd.img... ok

Server output log available here.

Kernel command line:
Code:
console=hvc0 console=tty0 console=ttyS0 ip=dhcp inst.repo=http://mirror.centos.org/centos-7/7/os/x86_64 inst.vnc inst.vncpassword=XXXXXXXX inst.headless inst.lang=en_US inst.keymap=us

I tried to force-load the netfront driver by adding
Code:
rd.driver.pre=xen_netfront,xennet,netfront
but the problem is the same.
Quote:Ask SoYouStart to provide the source (as required by the GPL).
Edit1: I did, but haven't received it yet. Apparently they only patch the code to avoid replying to ping requests.
(2016-01-18 23:17) plachance Wrote: Server output log available here.

Thanks for the log. The xenbus enumeration is definitely working since we see a "vif/0" message in the kernel output. Unfortunately I don't think the netfront driver provides anything more verbose than the single "xen_netfront: Initialising Xen virtual ethernet driver" message that you're already seeing.

Since you seem to be reliably reaching a controllable userspace, you could try patching the kernel and/or initrd to dump out more diagnostic information. For example, patching xenbus_xs.c to display all xenbus reads and writes would help.

Alternatively, you could try reproducing the problem in a local XenServer instance. That would allow you to examine the state using xenstore-ls from dom0, without having to patch the guest kernel/initrd.
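
A rough sketch of what that inspection could look like from dom0, assuming the xl toolstack, a network backend running in domain 0, and the conventional xenstore layout for vif devices. The function names are illustrative.

```shell
#!/bin/sh
# Sketch only: function names are illustrative. Assumes the vif backend
# lives in dom0 (domain 0) and the usual xenstore layout.

# xenstore path of the guest's (frontend) vif node.
vif_frontend_path() {
    printf '/local/domain/%s/device/vif/%s\n' "$1" "${2:-0}"
}

# xenstore path of the matching backend node in dom0.
vif_backend_path() {
    printf '/local/domain/0/backend/vif/%s/%s\n' "$1" "${2:-0}"
}

# Dump both sides of vif 0 for a named guest. Run as root in dom0.
dump_vif_state() {
    domid=$(xl domid "$1") || return 1
    xenstore-ls "$(vif_frontend_path "$domid")"
    xenstore-ls "$(vif_backend_path "$domid")"
}
```

Running `dump_vif_state <guest-name>` once while iPXE is running and again after the kernel has taken over would show how the state nodes differ between the two.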

Michael
(2016-01-27 14:31) mcb30 Wrote: Alternatively, you could try reproducing the problem in a local XenServer instance. That would allow you to examine the state using xenstore-ls from dom0, without having to patch the guest kernel/initrd.

It works as expected. I successfully ran the following sequence:
1. create the domain using "builder='hvm'"
2. boot it using ipxe.usb with the bootstrap script embedded (after dd if=ipxe.usb of=<path_to_lvm_dev>)
3. chainload an ipxe.pxe image with the boot script embedded from an HTTP server
4. chainload the kernel + initramfs and trigger a kickstart installation
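
Step 2 can be sketched as a small helper that writes the image and then reads it back for comparison. The function name and arguments are placeholders, and the target would be the LVM device (or, on EC2, the root volume) in a real run.

```shell
#!/bin/sh
# Sketch only: write_boot_image is an illustrative helper.

# Write an iPXE disk image to the start of a target device or file, then
# verify that the first bytes of the target match the image.
write_boot_image() {
    image=$1
    target=$2
    size=$(wc -c < "$image" | tr -d ' ') || return 1
    # conv=notrunc leaves anything beyond the image untouched when the
    # target is a regular file; it makes no difference for a block device.
    dd if="$image" of="$target" bs=4096 conv=notrunc 2>/dev/null || return 1
    cmp -n "$size" "$image" "$target"
}

# Possible use (destructive, double-check the target first):
#   write_boot_image bin/ipxe.usb <path_to_lvm_dev>
```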

Xen Platform info
Code:
xen_version            : 4.6.1-1.el7
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64

bootstrap.ipxe
Code:
#!ipxe
dhcp &&
route
chain -ar http://<bootserver>/boot.pxe

boot.ipxe
Code:
#!ipxe
set base http://<bootserver>
set repo mirrors/centos/ftp.centos.org
set ose centos
set ver 7
set arch x86_64

dhcp &&
route

set params ip=dhcp inst.repo=${base}/${repo}/${ver}/os/${arch} inst.ks=${base}/ks/${ose}${ver}  console=hvc0 console=ttyS0 inst.cmdline inst.lang=en_US net.ifnames=0 biosdevname=0 edd=off

kernel ${base}/${repo}/${ver}/os/${arch}/isolinux/vmlinuz
initrd ${base}/${repo}/${ver}/os/${arch}/isolinux/initrd.img
imgargs vmlinuz ${params}
boot -ar

Boot sequence output
Code:
# xl create -c <path_to_domu_cfg_file>
Parsing config from <path_to_domu_cfg_file>
iPXE initialising devices...ok



iPXE 1.0.0+ (e2cf) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP HTTPS iSCSI TFTP SRP VLAN AoE ELF MBOOT NBI PXE SDI bzImage COMBOOT Menu PXEXT
Configuring (net0 00:50:56:04:47:4c)...... ok
net0: <ip>/255.255.255.240 gw <gateway>
http://<bootserver>/boot.pxe... ok
iPXE initialising devices...ok



iPXE 1.0.0+ (e2cf) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP HTTPS iSCSI TFTP SRP VLAN AoE ELF MBOOT NBI PXE SDI bzImage COMBOOT Menu PXEXT
Configuring (net0 00:50:56:04:47:4c)...... ok
net0: <ip>/255.255.255.240 gw <gateway>
http://<bootserver>/mirrors/centos/ftp.centos.org/7/os/x86_64/isolinux/vmlinuz... ok
http://<bootserver>/mirrors/centos/ftp.centos.org/7/os/x86_64/isolinux/initrd.img... ok
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-327.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Thu Nov 19 22:10:57 UTC 2015
:::::::
:::::::

As expected, the iPXE stored on disk loads, fetches and runs a new ipxe.pxe from the boot server, and that in turn loads the kernel and initramfs.

But the same ipxe.usb and boot.pxe files don't work on AWS EC2 (region: eu-west-1, instance type: t2.medium). Here is the output from the AWS console, trimmed slightly to remove control codes. The first line (ok) is presumably the output of ipxe.usb, and the remaining lines come from the boot.pxe file downloaded from the boot server.

Code:
ok
iPXE initialising devices...ok

iPXE 1.0.0+ (e2cf) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP HTTPS iSCSI TFTP SRP VLAN AoE ELF MBOOT NBI PXE SDI bzImage COMBOOT Menu PXEXT
Configuring (net0 06:4c:e3:f2:87:39).................. Error 0x040ee119 (http://ipxe.org/040ee119)

So it seems that iPXE can't chainload itself from the network on AWS.

Other test
A few weeks ago I ran another test. I tried to create a chainload loop using the following iPXE script, expecting the sequence to loop indefinitely. But it didn't!

Code:
#!ipxe
ifstat
dhcp
ifstat
sanboot --no-describe --drive 0x80 ||
sanboot --no-describe --drive 0x81 ||
exit

Console output

Code:
iPXE 1.0.0+ (3c26) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP AoE ELF MBOOT PXE bzImage Menu PXEXT
Amazon EC2 - iPXE boot via user-data
net0: 0a:dd:a6:cb:d5:b3 using netfront on vif/0 (closed)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Down (http://ipxe.org/38086101)]
Configuring (net0 0a:dd:a6:cb:d5:b3)...... ok
net0: 172.31.47.39/255.255.240.0 gw 172.31.32.1
http://169.254.169.254/latest/user-data... ok
net0: 0a:dd:a6:cb:d5:b3 using netfront on vif/0 (open)
  [Link:up, TX:6 TXE:0 RX:10 RXE:3]
  [RXE: 3 x "Operation not supported (http://ipxe.org/3c086003)"]
Configuring (net0 0a:dd:a6:cb:d5:b3)...... ok
net0: 0a:dd:a6:cb:d5:b3 using netfront on vif/0 (open)
  [Link:up, TX:10 TXE:0 RX:14 RXE:4]
  [RXE: 4 x "Operation not supported (http://ipxe.org/3c086003)"]
Booting from SAN device 0x80
iPXE initialising devices...ok



iPXE 1.0.0+ (3c26) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP AoE ELF MBOOT PXE bzImage Menu PXEXT
Amazon EC2 - iPXE boot via user-data
net0: 0a:dd:a6:cb:d5:b3 using netfront on vif/0 (closed)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Down (http://ipxe.org/38086101)]
Configuring (net0 0a:dd:a6:cb:d5:b3).................. Error 0x040ee119 (http://ipxe.org/040ee119)
http://169.254.169.254/latest/user-data... Network unreachable (http://ipxe.org/280a6011)
Boot from SAN device 0x80 failed: Operation canceled (http://ipxe.org/0b8080a0)
Booting from SAN device 0x81

iPXE loads from disk at instance creation, configures the network and reloads itself from 0x80. The second iPXE then can't configure the network and fails to boot, and (I presume) the first iPXE sequence continues and tries to boot from 0x81, which fails because there is only one disk attached to the instance.
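
When comparing runs like these, it can help to pull the distinct error URIs out of saved console output. A small sketch, assuming the output has been saved to a file or is piped in; the helper name ipxe_error_uris is illustrative.

```shell
#!/bin/sh
# Sketch only: ipxe_error_uris is an illustrative helper.

# Print each distinct iPXE error URI (ipxe.org/ followed by eight hex
# digits) found in the console output on stdin, sorted.
ipxe_error_uris() {
    grep -o 'ipxe\.org/[0-9a-f]\{8\}' | LC_ALL=C sort -u
}

# Possible use:
#   ipxe_error_uris < console-output.txt
```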

So the problem doesn't seem to be related to the kernel/initramfs; it looks specific to AWS.
Could it be related to the Xen version (mine is 4.6.1 and AWS is 4.2), or to AWS-specific Xen tuning that prevents chainloading?
Do you have a working example of chainloading iPXE from iPXE on AWS?

Thanks again for your great work and your help on this issue!