Problem with iPXE and AoE
2012-09-17, 16:46
Post: #1
Hello,

I am trying to implement a small LAN of diskless clients using iPXE to boot from an AoE SAN device, and dnsmasq for DHCP and TFTP.

In order to make the image, I placed a hard drive in one of the clients and installed Debian Squeeze with XFCE (using two partitions in ext3, for root and swap), then placed the disk on the server and made an image of it with dd, using a block size of 512 bytes.

I then exported the image using vblade-persist:

Code:
vblade-persist setup 0 1 eth0 /var/storage/disks/d01.ima
vblade-persist auto all
vblade-persist restart all

Now, this is basically the dnsmasq setting I am using (so far I have configured only one client and image, for testing purposes)

Code:
interface=eth0
domain=mydomain.org
bogus-priv
filterwin2k
no-resolv
localise-queries
listen-address=192.168.1.1
server=192.168.0.2
cache-size=300
no-negcache
bogus-nxdomain=64.94.110.11
dhcp-range=set:lan,192.168.1.2,192.168.1.12,255.255.255.0,36h
dhcp-host=00:1A:2B:3C:4D:5E,set:d01,192.168.1.2,diskless01,infinite
cname=d01,diskless01
enable-tftp
tftp-root=/var/storage/ipxe/
dhcp-boot=tag:!ipxe,undionly.kpxe
dhcp-match=ipxe,175
dhcp-option=175,8:1:1
dhcp-option=tag:ipxe,tag:d01,17,"aoe:e0.1"
log-facility=/var/log/dnsmasq/dnsmasq.log
log-dhcp
log-queries
log-async=20
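
For reference, the shelf and slot given to vblade-persist setup (0 and 1 here) have to agree with the numbers in the root path sent through DHCP option 17, since iPXE addresses an AoE target as aoe:e<shelf>.<slot>. A minimal shell sketch of that mapping (the function name aoe_root_path is made up for illustration, it is not part of vblade or iPXE):

```shell
#!/bin/sh
# Build the iPXE AoE root path for a given vblade shelf and slot.
# aoe_root_path is a hypothetical helper name used only for this example.
aoe_root_path() {
    shelf=$1
    slot=$2
    printf 'aoe:e%s.%s\n' "$shelf" "$slot"
}

# Shelf 0, slot 1, as in "vblade-persist setup 0 1 eth0 ..."
aoe_root_path 0 1
```

If the two ever disagree (for example after adding a second image), iPXE would be asking for a target that no vblade instance is exporting.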

The problem is that although the client does begin to boot from the network, iPXE does not detect the SAN device.

However, the odd thing is that if I put the hard disk in the client and run the commands aoe-stat and aoeping, they seem to respond properly, so I assume the image is being served correctly.

I also burned the iPXE ISO to a CD and booted from it. In this case the client does boot and detects the SAN device, but the boot process is interrupted when Linux tries to mount the root partition, because apparently it cannot find it. I modified fstab to reference the partitions the old way instead of by UUID, but the same thing happens.

I am running out of ideas right now, so I would appreciate any help.
2012-09-18, 11:25
Post: #2
RE: Problem with iPXE and AoE
(2012-09-17 16:46)Nestor Wrote:  The problem is that although the client does begin to boot from the network, iPXE does not detect the SAN device.

Could you be more specific? What does iPXE display as the "Next server", "Filename" and "Root path"? Does it attempt to boot from the AoE root path? Is there an error message? What (exactly) is the error message?

Quote:I also burned the iPXE ISO to a CD and booted from it. In this case the client does boot and detects the SAN device, but the boot process is interrupted when Linux tries to mount the root partition, because apparently it cannot find it. I modified fstab to reference the partitions the old way instead of by UUID, but the same thing happens.

Do you have the relevant kernel modules for AoE included in your initrd?

Michael
2012-09-18, 13:50 (This post was last modified: 2012-09-18 14:02 by Nestor.)
Post: #3
RE: Problem with iPXE and AoE
(2012-09-18 11:25)mcb30 Wrote:  
(2012-09-17 16:46)Nestor Wrote:  The problem is that although the client does begin to boot from the network, iPXE does not detect the SAN device.

Could you be more specific? What does iPXE display as the "Next server", "Filename" and "Root path"? Does it attempt to boot from the AoE root path? Is there an error message? What (exactly) is the error message?

All right, here it is:
Code:
iPXE 1.0.0+ (8e4faa) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: HTTP iSCSI DNS TFTP AoE bzImage ELF MBOOT PXE PXEXT Menu

net0: 00:1c:c0:57:7a:40 using undionly on UNDI-PCI00:04.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 00:1c:c0:57:7a:40)...... ok
net0: 192.168.1.2/255.255.255.0 gw 192.168.1.1
Next server: 192.168.1.1
Root path: aoe:e0.1
Could not open SAN device: Connection timed out (http://ipxe.org/4c006035)
No more network devices

mcb30 Wrote:Do you have the relevant kernel modules for AoE included in your initrd?

Hmm... I was under the impression that iPXE already had support for AoE. Both the server and image certainly have all the required kernel modules and packages. Do I need to take extra steps then? If so, please point me to the relevant manual.

P.S. In the original message, I used in dnsmasq settings a fictitious MAC address (00:1A:2B:3C:4D:5E) for illustration purposes only, but the actual configuration file has the correct MAC address, as seen in the log.
2012-09-18, 14:02
Post: #4
RE: Problem with iPXE and AoE
(2012-09-18 13:50)Nestor Wrote:  
Code:
iPXE 1.0.0+ (8e4faa) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: HTTP iSCSI DNS TFTP AoE bzImage ELF MBOOT PXE PXEXT Menu

net0: 00:1c:c0:57:7a:40 using undionly on UNDI-PCI00:04.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 00:1c:c0:57:7a:40)...... ok
net0: 192.168.1.2/255.255.255.0 gw 192.168.1.1
Next server: 192.168.1.1
Root path: aoe:e0.1
Could not open SAN device: Connection timed out (http://ipxe.org/4c006035)
No more network devices

It looks as though your AoE target is simply not responding. You could try capturing a packet trace to see what's really happening.

Quote:
mcb30 Wrote:Do you have the relevant kernel modules for AoE included in your initrd?
Hmm... I was under the impression that iPXE already had support for AoE. Both the server and image certainly have all the required kernel modules and packages. Do I need to take extra steps then? If so, please point me to the relevant manual.

iPXE does already have support for AoE. However, your loaded kernel+initrd also needs to have its own support for AoE, otherwise Linux won't be able to connect to the AoE target in order to mount the root filesystem.

Michael
2012-09-18, 14:15 (This post was last modified: 2012-09-18 14:23 by Nestor.)
Post: #5
RE: Problem with iPXE and AoE
mcb30 Wrote:It looks as though your AoE target is simply not responding. You could try capturing a packet trace to see what's really happening.

I doubt this is the case, because when I booted the client with the hard drive installed and used the commands aoe-stat and aoeping, they both produced the correct reply from the AoE device, as expected. Also, if I boot from a CD with the iPXE ISO, it loads the AoE device; otherwise GRUB would not even appear. Unless by "AoE target" you are referring to something else.

mcb30 Wrote:
Nestor Wrote:Hmm... I was under the impression that iPXE already had support for AoE. Both the server and image certainly have all the required kernel modules and packages. Do I need to take extra steps then? If so, please point me to the relevant manual.

iPXE does already have support for AoE. However, your loaded kernel+initrd also needs to have its own support for AoE, otherwise Linux won't be able to connect to the AoE target in order to mount the root filesystem.

Please define exactly what you mean by "loaded kernel+initrd". I would also appreciate it if you pointed me to the procedure, because I have been using the default unpatched undionly.kpxe file, under the assumption that it already had everything required to boot from an AoE device.
2012-09-18, 19:01
Post: #6
RE: Problem with iPXE and AoE
(2012-09-18 14:15)Nestor Wrote:  
mcb30 Wrote:It looks as though your AoE target is simply not responding. You could try capturing a packet trace to see what's really happening.
I doubt this is the case, because when I booted the client with the hard drive installed and used the commands aoe-stat and aoeping, they both produced the correct reply from the AoE device, as expected. Also, if I boot from a CD with the iPXE ISO, it loads the AoE device; otherwise GRUB would not even appear. Unless by "AoE target" you are referring to something else.

You could try capturing a packet trace to see what's really happening.

Quote:
mcb30 Wrote:iPXE does already have support for AoE. However, your loaded kernel+initrd also needs to have its own support for AoE, otherwise Linux won't be able to connect to the AoE target in order to mount the root filesystem.
Please define exactly what you mean by "loaded kernel+initrd". I would also appreciate it if you pointed me to the procedure, because I have been using the default unpatched undionly.kpxe file, under the assumption that it already had everything required to boot from an AoE device.

The default unpatched undionly.kpxe file is fine. iPXE already has AoE support; you don't need to do anything to the iPXE binary (which is undionly.kpxe).

However, the fact that iPXE is capable of booting from an AoE disk does not magically grant the same ability to the booted operating system (in your case Debian Squeeze). iPXE's AoE support is used to load the kernel and the initrd; after that, the loaded OS (Debian) takes over, and must establish its own connection to the AoE disk.

From a very quick search using terms such as "Debian AoE boot", it looks as though you may need to make some modifications to the base Debian install to make it AoE boot-capable.

Michael
2012-09-18, 22:28 (This post was last modified: 2012-09-18 22:34 by Nestor.)
Post: #7
RE: Problem with iPXE and AoE
mcb30 Wrote:From a very quick search using terms such as "Debian AoE boot", it looks as though you may need to make some modifications to the base Debian install to make it AoE boot-capable.

I did make sure that kernel module aoe was installed and loaded both in the server and in the disk image. Also, package aoetools is installed in both.

Something that puzzles me is that if I boot the diskless client from an iPXE CD (burned from the ISO), it at least detects the AoE device and loads GRUB, but the same thing does not happen when using undionly.kpxe. What do you think about this? Shouldn't undionly.kpxe produce a similar result to the bootable ISO?

Regarding packet capture, I would prefer avoiding Wireshark installation unless there is no other option, because I dislike installing graphical mode in servers. AFAIK wireshark has a GTK+ dependency, and I don't know if using tcpdump with AoE (which does not use TCP/IP) is worthwhile.
2012-09-19, 01:31
Post: #8
RE: Problem with iPXE and AoE
(2012-09-18 22:28)Nestor Wrote:  
mcb30 Wrote:From a very quick search using terms such as "Debian AoE boot", it looks as though you may need to make some modifications to the base Debian install to make it AoE boot-capable.
I did make sure that kernel module aoe was installed and loaded both in the server and in the disk image. Also, package aoetools is installed in both.

That may not be enough. AoE support needs to be included within the initrd, otherwise the system won't be able to connect to its own root filesystem. (Consider what happens once the kernel starts up: it doesn't help that aoe.ko is available in /lib/modules since /lib is not accessible until the root filesystem has been mounted, and the root filesystem is on an AoE disk.)
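
One rough way to check whether the module made it into the image, assuming the initrd is a gzip-compressed cpio archive (as with COMPRESS=gzip in initramfs-tools) and using a made-up helper name initrd_has_aoe:

```shell
#!/bin/sh
# Succeed if the given initramfs image lists an aoe.ko module.
# Assumption: the image is a plain gzip-compressed cpio archive.
# initrd_has_aoe is a hypothetical helper name for this sketch.
initrd_has_aoe() {
    gzip -dc "$1" 2>/dev/null | cpio -it 2>/dev/null | grep -q '/aoe\.ko$'
}

# Check the initrd of the running kernel (path assumed to follow Debian naming).
initrd="/boot/initrd.img-$(uname -r)"
if initrd_has_aoe "$initrd"; then
    echo "aoe.ko found in $initrd"
else
    echo "aoe.ko not found in $initrd (or the image could not be read)"
fi
```

Note that finding the module is only part of the picture: something in the initrd also has to bring up the network interface and load the module before the root filesystem is mounted.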

Quote:Something that puzzles me is that if I boot the diskless client from an iPXE CD (burned from the ISO), it at least detects the AoE device and loads GRUB, but the same thing does not happen when using undionly.kpxe. What do you think about this? Shouldn't undionly.kpxe produce a similar result to the bootable ISO?

Yes, it should produce a similar result.

Quote:Regarding packet capture, I would prefer avoiding Wireshark installation unless there is no other option, because I dislike installing graphical mode in servers. AFAIK wireshark has a GTK+ dependency, and I don't know if using tcpdump with AoE (which does not use TCP/IP) is worthwhile.

tcpdump is fine. Capture the packets using something like

Code:
tcpdump -s 0 -i ethN -w tcpdump.out

You can copy the tcpdump.out file to a desktop machine, and open it with wireshark there.
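
tcpdump captures at the Ethernet level, so it sees AoE even though AoE does not use IP: the frames carry EtherType 0x88a2. A sketch that narrows a capture to AoE traffic only (the function names and the AOE_FILTER variable are illustrative; the interface and file names are whatever you pass in):

```shell
#!/bin/sh
# pcap filter matching AoE frames (EtherType 0x88a2).
AOE_FILTER='ether proto 0x88a2'

# Capture only AoE frames. $1 = interface, $2 = output file. Needs root.
capture_aoe() {
    # $AOE_FILTER is left unquoted on purpose so it splits into filter words.
    tcpdump -s 0 -i "$1" -w "$2" $AOE_FILTER
}

# Print the AoE frames from an existing capture, with link-level headers (-e).
read_aoe() {
    tcpdump -e -n -r "$1" $AOE_FILTER
}

# Example (not run here):
#   capture_aoe ethN aoe.out
#   read_aoe tcpdump.out
```

Capturing without the filter, as in the command above, is just as valid; the filter only makes it easier to see whether the target answers iPXE's requests.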

Michael
2012-09-19, 14:38 (This post was last modified: 2012-09-19 14:42 by Nestor.)
Post: #9
RE: Problem with iPXE and AoE
mcb30 Wrote:That may not be enough. AoE support needs to be included within the initrd, otherwise the system won't be able to connect to its own root filesystem. (Consider what happens once the kernel starts up: it doesn't help that aoe.ko is available in /lib/modules since /lib is not accessible until the root filesystem has been mounted, and the root filesystem is on an AoE disk.)

I see. Well, I will post exactly what I have just done to check (booting the client from the hard disk out of which I made the image)

Code:
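# Copy the initrd (a gzip-compressed cpio archive) to a scratch directory and decompress it
mkdir /var/tmp/initrd && cp /boot/initrd.img-2.6.32-5-686 /var/tmp/initrd
cd /var/tmp/initrd && mv initrd.img-2.6.32-5-686 initrd.img-2.6.32-5-686.gz
gunzip initrd.img-2.6.32-5-686.gz && mkdir unpacked && cd unpacked
# Unpack the archive into ./unpacked
cpio -id < ../initrd.img-2.6.32-5-686
# Look inside the unpacked initrd (relative path), not in /lib of the running system
cd lib/modules/2.6.32-5-686/kernel/drivers/block
ls -laR aoe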

This is the result:

Code:
drwxr-xr-x 1 root root   4096 sep 19 09:18 .
drwxr-xr-x 1 root root   4096 sep 19 09:18 ..
-rw-r--r-- 1 root root  32760 sep 19 09:18 aoe.ko

Doesn't this mean that the required kernel module is in place?

This is the content of /conf/initramfs.conf :

Code:
MODULES=most
BUSYBOX=y
KEYMAP=n
COMPRESS=gzip
BOOT=local
DEVICE=
NFSROOT=auto

This is the content of /config/modules :

Code:
unix

And this is the content of /conf/conf.d/resume :

Code:
RESUME=UUID=21a0e5ba-e084-4937-84d9-bf737462e974

Do you see any problem? What else do I need to check? (I will try to make a packet capture in a while and post the results.)
2012-09-20, 13:56 (This post was last modified: 2012-09-20 13:57 by Nestor.)
Post: #10
RE: Problem with iPXE and AoE
All right, I captured packet traces of booting both from undionly.kpxe and from a CD. There are several packets in each dump, and I am not familiar enough with iPXE and SAN/AoE to interpret them correctly and pinpoint the error. Please tell me how I should provide this information so you can better analyze it.
2012-09-20, 16:33
Post: #11
RE: Problem with iPXE and AoE
Short update: since the inconsistency in behavior between undionly.kpxe and the ISO was suspicious, I ran a test replacing the executable with gpxe-1.0.1-undionly.kpxe from the gPXE project, and this time it booted like the iPXE ISO, at least as far as GRUB. I still need to work out why the root partition cannot be mounted from the AoE device, but at least this confirms my suspicion that there is some problem with the current version of undionly.kpxe.
2012-09-21, 13:55
Post: #12
RE: Problem with iPXE and AoE
The problem you're having with undionly.kpxe (and not with ipxe.iso) is probably that undionly.kpxe has no native drivers: it only uses the UNDI driver provided by your network card's PXE stack. It could be that your network card's PXE stack is not behaving well together with undionly.kpxe. You could try using ipxe.pxe instead of undionly.kpxe (when chainloading) to see if native drivers make any difference.
2012-09-21, 17:14
Post: #13
RE: Problem with iPXE and AoE
(2012-09-21 13:55)robinsmidsrod Wrote:  The problem you're having with undionly.kpxe (and not with ipxe.iso) is probably that undionly.kpxe has no native drivers: it only uses the UNDI driver provided by your network card's PXE stack. It could be that your network card's PXE stack is not behaving well together with undionly.kpxe. You could try using ipxe.pxe instead of undionly.kpxe (when chainloading) to see if native drivers make any difference.

Thanks, I will try. However, I thought that iPXE was somehow a continuation or fork of gPXE, which seems to be working right now. This is why I assumed there was an issue with undionly.kpxe.
2012-09-23, 12:47
Post: #14
RE: Problem with iPXE and AoE
If gPXE works and iPXE doesn't (with the same build target), then you might have found a bug. If you use the procedure described in http://ipxe.org/howto/bisect to figure out which commit made it stop working, you should forward the information it outputs to the mailing list. Maybe the developers can help you figure it out.
2012-09-23, 14:37
Post: #15
RE: Problem with iPXE and AoE
Unfortunately I do not have enough time right now to build and test each commit (maybe in a few days when the workload is lower). I simply needed to have the thing working by this week, which I have already accomplished. In case you are interested, in order for Debian to mount the root partition, I needed to modify the initrd as Michael suggested (in /conf/initramfs.conf I used BOOT=aoe and then added /scripts/aoe, which I found online).
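
A rough sketch of the configuration side of that change, done on the installed system rather than by editing the unpacked image. It assumes the usual initramfs-tools layout under /etc/initramfs-tools and that a boot script named aoe has already been placed in /etc/initramfs-tools/scripts/ (its contents are not reproduced here); set_boot_aoe is a made-up helper name:

```shell
#!/bin/sh
# Switch the BOOT= line of an initramfs.conf to "aoe", keeping a .bak copy.
# set_boot_aoe is a hypothetical helper name used only for this sketch.
set_boot_aoe() {
    conf=$1
    cp "$conf" "$conf.bak" &&
        sed 's/^BOOT=.*/BOOT=aoe/' "$conf.bak" > "$conf"
}

# Intended use as root (assumed paths, not run here):
#   set_boot_aoe /etc/initramfs-tools/initramfs.conf
#   echo aoe >> /etc/initramfs-tools/modules   # force the aoe module into the image
#   update-initramfs -u                        # rebuild the initrd
```

Since the AoE disk image is a copy of the installed disk, the rebuilt initrd then has to be written into that image as well.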
2012-10-01, 02:53
Post: #16
RE: Problem with iPXE and AoE
Hi Nestor,

Would it be possible for you to post your configuration/setup? I haven't gotten this working myself yet. What's out there on Google seems to be quite outdated or incomplete.


(2012-09-23 14:37)Nestor Wrote:  Unfortunately I do not have enough time right now to build and test each commit (maybe in a few days when the workload is lower). I simply needed to have the thing working by this week, which I have already accomplished. In case you are interested, in order for Debian to mount the root partition, I needed to modify the initrd as Michael suggested (in /conf/initramfs.conf I used BOOT=aoe and then added /scripts/aoe, which I found online).