iPXE discussion forum

Multicast support
I've been trying to add multicast support to my iPXE implementation, but this has not been working well for me.

First, I tried to enable MTFTP. At this point I'm a little confused, because when I edit "general.h", add the line "#define DOWNLOAD_PROTO_TFTM", and then run "make bin/undionly.kpxe", I get this message:

ld: bin/undionly.kpxe.tmp: hidden symbol `obj_tftm' isn't defined
ld: final link failed: Bad value
make: ** [bin/undionly.kpxe.tmp] Error 1

And so I can't try MTFTP. Any ideas why?

Another issue is with SLAM. I can't find any information about it at ipxe.org, but I can at http://etherboot.org/wiki/multicast. However, the mini-slamd file isn't in the "contrib/" directory, and I can't find it anywhere in my iPXE directory.

Help please :)
(2012-05-09 00:17) r_b wrote: First, I tried to enable MTFTP. At this point I'm a little confused, because when I edit "general.h", add the line "#define DOWNLOAD_PROTO_TFTM", and then run "make bin/undionly.kpxe", I get this message:

There's no separate enable for the multicast TFTP variants (TFTM and MTFTP). Support for these protocols is part of the general TFTP protocol support, and is enabled by default.

(2012-05-09 00:17) r_b wrote: Another issue is with SLAM. I can't find any information about it at ipxe.org, but I can at http://etherboot.org/wiki/multicast. However, the mini-slamd file isn't in the "contrib/" directory, and I can't find it anywhere in my iPXE directory.

SLAM is supported in iPXE, but I would recommend that you do not use it. You are likely to get equivalent performance from multicast TFTP.

In general, I would recommend using an efficient unicast protocol such as HTTP instead of multicast, unless you are planning to boot thousands of nodes simultaneously.

Michael
Sure. So if I just leave TFTP enabled, then I should be able to download the file specified in "filename" on my DHCP server (for example: filename "x-tftm://undionly.kpxe";). But this doesn't work. TFTP times out ("PXE-E32: TFTP open timeout") when I boot my client machine.

But I do still want to use SLAM, for performance testing. As it says on the Etherboot web site, a SLAM daemon (mini-slamd) is needed, and I don't have this path: contrib/mini-slamd
(2012-05-09 18:17) r_b wrote: Sure. So if I just leave TFTP enabled, then I should be able to download the file specified in "filename" on my DHCP server (for example: filename "x-tftm://undionly.kpxe";). But this doesn't work. TFTP times out ("PXE-E32: TFTP open timeout") when I boot my client machine.

That error isn't coming from iPXE; it's coming from your OEM PXE stack. You need to use plain TFTP to load the iPXE binary (undionly.kpxe). Once you've loaded undionly.kpxe, then you can use any protocols supported by iPXE.

You can't use iPXE's features before you actually load iPXE! :)
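
The two-stage boot described above can be sketched as the choice a DHCP server has to make for each request. This is only a Python illustration of the logic, not DHCP server configuration: the function name `boot_filename`, the server address and the `boot.ipxe` script name are hypothetical. It relies on the fact that iPXE identifies itself with the DHCP user class "iPXE", while the OEM PXE ROM does not.

```python
from typing import Optional

# User class string that iPXE sends in its DHCP requests.
IPXE_USER_CLASS = "iPXE"


def boot_filename(user_class: Optional[str],
                  first_stage: str = "undionly.kpxe",
                  second_stage: str = "tftm://192.168.10.1/boot.ipxe") -> str:
    """Pick the DHCP "filename" for a client.

    first_stage is fetched by the OEM PXE ROM, so it must be a plain TFTP
    path. second_stage (a hypothetical URL) is only handed to clients that
    are already running iPXE, so it may use any protocol iPXE supports.
    """
    if user_class == IPXE_USER_CLASS:
        return second_stage
    return first_stage
```

A client that reports no user class, or any class other than "iPXE", gets the plain TFTP filename; only the second request, made by iPXE itself, gets the multicast URL.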

Quote: But I do still want to use SLAM, for performance testing. As it says on the Etherboot web site, a SLAM daemon (mini-slamd) is needed, and I don't have this path: contrib/mini-slamd

If you really want it, you can find mini-slamd in the original iPXE commit from 2005: https://git.ipxe.org/ipxe.git/snapshot/3...ce.tar.bz2.

Michael
While testing with SLAM I keep getting this error message: Connection timed out (http://ipxe.org/err/4c166035). I opened the link but can't figure out why this is happening.

I configured everything as explained here: http://etherboot.org/wiki/multicast
(2012-05-15 00:16) r_b wrote: While testing with SLAM I keep getting this error message: Connection timed out (http://ipxe.org/err/4c166035). I opened the link but can't figure out why this is happening.

You could try building with DEBUG=slam to see if there are any useful diagnostic messages.

Michael
After building with DEBUG=slam, the following message appears: slam 0x1e104 transmitting initial nack for blocks 0-3

Then I looked at this link: http://www.etherboot.org/api/slam_8c.html. There I found this:

	/* Construct NACK.  We always request only a single packet;
	 * this allows us to force multicast-TFTP-style flow control
	 * on the SLAM server, which will otherwise just blast the
	 * data out as fast as it can.  On a gigabit network, without
	 * RX checksumming, this would inevitably cause packet drops.
	 */
	first_block = bitmap_first_gap ( &slam->bitmap );
	for ( num_blocks = 1 ; ; num_blocks++ ) {
		if ( num_blocks >= SLAM_MAX_BLOCKS_PER_NACK )
			break;
		if ( ( first_block + num_blocks ) >= slam->num_blocks )
			break;
		if ( bitmap_test ( &slam->bitmap,
				   ( first_block + num_blocks ) ) )
			break;
	}
	if ( first_block ) {
		DBGCP ( slam, "SLAM %p transmitting NACK for blocks "
			"%ld-%ld\n", slam, first_block,
			( first_block + num_blocks - 1 ) );
	} else {
		DBGC ( slam, "SLAM %p transmitting initial NACK for blocks "
		       "0-%ld\n", slam, ( num_blocks - 1 ) );
	}
	if ( ( rc = slam_put_value ( slam, iobuf, first_block ) ) != 0 )
		return rc;
	if ( ( rc = slam_put_value ( slam, iobuf, num_blocks ) ) != 0 )
		return rc;
	nul = iob_put ( iobuf, 1 );
	*nul = 0;
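
To see why the debug message reports "blocks 0-3", the block-selection loop in the excerpt can be re-expressed as a small Python sketch. This is not iPXE code: a list of booleans stands in for the bitmap, the helper names are made up for illustration, and the limit of 4 blocks per NACK is an assumption chosen to match the "0-3" in the debug output.

```python
# Assumed value; it is consistent with the "blocks 0-3" debug message.
SLAM_MAX_BLOCKS_PER_NACK = 4


def first_gap(received):
    """Index of the first block not yet received (len(received) if none)."""
    for index, got in enumerate(received):
        if not got:
            return index
    return len(received)


def next_nack(received, max_blocks=SLAM_MAX_BLOCKS_PER_NACK):
    """Return (first_block, num_blocks) as the C loop would compute them."""
    total = len(received)
    first_block = first_gap(received)
    num_blocks = 1
    while True:
        if num_blocks >= max_blocks:
            break  # never ask for more than max_blocks at once
        if first_block + num_blocks >= total:
            break  # ran off the end of the file
        if received[first_block + num_blocks]:
            break  # next block already received, so the run of gaps ends
        num_blocks += 1
    return first_block, num_blocks
```

With nothing received yet, `next_nack([False] * 10)` gives `(0, 4)`, which is the "initial NACK for blocks 0-3" case. So the message only shows that the client sent its first request; the timeout means nothing usable came back.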

So I changed atftpd to run on port 1758, as shown on the Etherboot site: atftpd --daemon --port 1758 --mcast_addr 224.1.1.0-255 --mcast_port 1758 --mcast_ttl=1

But now I can't even download the initial undionly.kpxe file from my TFTP directory. I get: TFTP............
Actually, this has been a problem every time I have tried to use MTFTP, as I said in the first post.
Have you remembered to create a suitable multicast route on your TFTP/SLAM server machine?

Michael
(2012-05-15 09:24) mcb30 wrote: Have you remembered to create a suitable multicast route on your TFTP/SLAM server machine?
I'm curious what you meant by this. I am struggling to learn how to multicast to solve a problem with my test lab. I've added a route for my multicast address and can ping it, and I get responses from the other machines on the line when I ping it. I've got atftpd set up and working normally; its logs even show an attempt to give out the requested file (typed in the iPXE shell: "initrd tftm://192.168.10.1/myfile.rd"). In Wireshark I see the multicast options go to the client, then the server sends packet 0 over and over (to 224.1.1.0) and the client seems to request packet 0 over and over (via the ACK to 192.168.10.1, if I'm understanding the process even remotely correctly), and then they both time out.

I'm trying to send the same 130 MB ISO to 120 computers at once. Using sanboot it downloads in 10 seconds to a single machine, but by 100 machines it's unbearable, and it is quicker to boot them in groups of 10-20 (which is still quite slow). I'm now considering putting a timing function in the boot script to allow only 5 to sanboot at once.
It does seem like you have a problem with your multicast setup. I suggest you look carefully at it again to ensure two machines can indeed do multicast between themselves without involving iPXE.
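
One way to check multicast between two machines without involving iPXE is a pair of small UDP helpers like the ones below. This is a minimal sketch using Python's standard socket module; the group 224.1.1.0 and port 1758 are simply the values mentioned earlier in the thread, and the function names are made up for illustration.

```python
import socket
import struct

GROUP = "224.1.1.0"  # multicast group mentioned in the thread
PORT = 1758          # port mentioned in the thread


def membership_request(group, iface="0.0.0.0"):
    """Build the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def receive_one(group=GROUP, port=PORT, timeout=10.0):
    """Join the group and wait for one datagram (raises on timeout)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        return sock.recvfrom(1500)  # (payload, (sender_ip, sender_port))
    finally:
        sock.close()


def send_one(payload=b"multicast test", group=GROUP, port=PORT, ttl=1):
    """Send one datagram to the group; TTL 1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

Run `receive_one()` on one machine, then `send_one()` on the other. If the receiver times out, the problem is in the multicast routing or the switches rather than in iPXE. If a TFTP daemon is already bound to port 1758 on the receiving machine, stop it first or pick another port.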

But I do think you might have a bottleneck in your network as well. If it takes 10 seconds to download 130MB, that is a transfer rate of 13MB/s, or 104Mbit/s. I'm going to assume that this is not a 100Mbit network but gigabit, as you're not likely to get more than 100Mbit/s on a 100Mbit network (probably closer to 85-90Mbit/s even on a very well tuned network). I'm therefore going to assume that you're using TFTP on a gigabit network. If you replace TFTP with a proper HTTP server and load the image using HTTP instead, you're more than likely to max out your network (especially if you serve the file from a server that has a decent cache and/or ramdisk). If you max out your network, you should get around 100MB/s in download speed, which should download that image in less than 2 seconds per machine. That should boot all of your machines in approx. 200 seconds, which is a little over 3 minutes. If you have your HTTP server connected to the gigabit switch with fiber (but all the clients using gigabit), I'm guessing you should be able to saturate a lot of them quite well. I'd try serving the files using HTTP from a ramdisk first and see what kind of results that gets you.

If you're sanbooting, you might also have luck using AoE and vblade, as it works at a lower level and has less overhead. You just have to figure out where your bottleneck is. If it turns out to be Apache or IIS being slow, you should consider other web server software, e.g. nginx.
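
The arithmetic in the paragraph above can be written out as a short Python sketch. The figures (130MB image, 10 seconds, 100MB/s, 100 machines) come from the thread; the function names are made up for illustration, and "one machine after another at full speed" is a simplifying assumption.

```python
def mbytes_to_mbits(mbytes_per_s):
    """Convert MB/s to Mbit/s (8 bits per byte)."""
    return mbytes_per_s * 8


def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to move size_mb megabytes at rate_mb_per_s."""
    return size_mb / rate_mb_per_s


IMAGE_MB = 130
MACHINES = 100

observed_rate = IMAGE_MB / 10                      # MB/s seen today
observed_mbit = mbytes_to_mbits(observed_rate)     # same rate in Mbit/s
per_machine = transfer_seconds(IMAGE_MB, 100)      # at ~100MB/s over HTTP
total_exact = MACHINES * per_machine               # machines served in turn
total_rounded = MACHINES * 2                       # post rounds up to 2s each
```

Here `observed_mbit` comes to 104Mbit/s, just over what a 100Mbit link can carry. The per-machine time at 100MB/s is 1.3 seconds, so the 200-second total in the post is a deliberately generous upper bound.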
@robinsmidsrod
Thank you for your prompt response; it's got me thinking in the right direction, at least... I am going to reply here, but I feel that I've pulled this thread off topic a little. I'm still working to rethink my multicast setup, and I'm not having much luck. For now I'm sticking with sanboot over HTTP since it is working (so long as I don't boot more than 20 machines at once...). I've created a small test network consisting of my laptop, the server, and one unmanaged gigabit switch in between. I then run ApacheBench to attempt to find weaknesses. ab -kc 100 -t 30 http://192.168.10.1/ibmcd.iso seems to be working, though the first few times I ran it nothing seemed to happen on the server. Now the server shows it sending ~23.5MiB/s (then down to 0 and back up each second, though I don't really understand that...), just as it does at work with the full network connected and 20 machines booting. I know that if I go for more than 20 machines at once my server output drops to 1MiB/s or lower (just like with TFTP)... I also noticed on my last batch of 100 (the size of the network was reduced from 120 for unrelated reasons) that 3 machines in particular would cause server output to drop below 100KiB/s. I got around this by disconnecting those machines until the others had finished downloading the image (the second the problem machine is disconnected, all the others immediately go back to the expected speed).

I'm going to take your suggestion and set up AoE and vblade, though I'm not sure how I can test that from home for 100 concurrent connections. I've been researching how to identify bottlenecks in my network but I'm coming up pretty short. I'm simply connecting 7 Cisco 2950s (24-port 10/100 managed switches) to one central unmanaged gigabit switch; the server is connected to the central switch. For the Cisco switches I load factory defaults, disable STP on vlan1, then enable portfast on all ports. There is nothing else present in the network, and no bridging/forwarding from the server. Its only job is this.

I also notice that if I use WoL to boot them all simultaneously I get a lot of issues with the initial undionly.kpxe download... which still suggests to me that my network is having trouble with all that traffic. A lot of the machines time out on the initial DHCP as well. In addition, if it's a CMOS setting utility that's being run, I also load an UNDI driver for DOS and a packet driver, then use wget to let the server know it completed... this can fail as well if too many machines are going at once (for my last batch, all 100 failed to get an address in DOS when booted one at a time every 3 seconds). I definitely feel like I'm doing something wrong with my network. Should 100 simultaneous DHCP requests be a problem? I would appreciate a pointer in pretty much any direction. Also, thank you for the iPXE menu example you prepared for the world! It helped me quite a bit when converting my dynamic pxelinux menu to work with iPXE's menu system (which rocks!).

Edit: I just realized the cable connecting my server to the gigabit switch wasn't allowing gigabit!!! I swapped it out and obviously I'm getting 10x the speed out of the server now. I have a feeling I'm still too much of a novice at networking to handle such a project efficiently, but hopefully I'll slowly get there. I'm going to rerun a couple of wires at work and make sure I'm getting gigabit to the server. What an obvious oversight. This certainly doesn't explain my congestion issues, though (one connection dragging the whole network down to sub-KiB/s ranges; then, when that connection gets terminated, the rest jump back up to the expected speeds).
If you have access to managed switches, I'd wager you could get much more information from them, as they report transport errors and the like more thoroughly. Bad cables (or network cards) can indeed cause all sorts of problems. I once saw a single NIC take down an entire network segment just because of the extreme amount of bad data it pushed out onto the network. The switches got completely overloaded! Switched it out and everything got back to normal. Pay attention to those error counters.

How many DHCP requests your network can handle at once I'm not sure, but 100 seems to be a reasonable number. The interesting part is how well iPXE can deal with that, considering it's broadcast traffic. Maybe some of the core developers can give some feedback here?

Just be aware that if you're using 10/100 switches out to the clients, routed via a central gigabit switch, you'll still be limited to 100/8 = 12.5MB/s theoretical maximum throughput per client when downloading your boot image. Since you said the image was around 130MB, that should take about 10-15 seconds for each client under optimum conditions. But if you push 24 machines over a single gigabit uplink you're obviously going to get only about half of that, as the gigabit uplink will be saturated, and that is just for one switch (and you said you had 7). That means that if only 24 machines boot you should at best get 6.25MB/s download. If you divide that by 7 (a completely full network) you get around 0.89MB/s, which means about two and a half minutes to boot all of the machines at the same time.
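
The per-client estimate above can be checked with a short Python sketch. It assumes the server's single gigabit link is the only shared bottleneck, that bandwidth is split equally, and that there is no protocol overhead; the function names are made up for illustration. The post rounds the 24-client case to about half of 12.5MB/s, and a strict equal split comes out a little lower.

```python
def fair_share_mb_s(clients, client_link_mbit=100, shared_link_mbit=1000):
    """Best-case MB/s per client behind one shared link.

    Each client is capped by its own port speed and by an equal share
    of the shared link, whichever is smaller.
    """
    client_cap = client_link_mbit / 8
    shared_cap = shared_link_mbit / 8
    return min(client_cap, shared_cap / clients)


def boot_seconds(image_mb, clients):
    """Time for all clients to fetch image_mb at their fair share."""
    return image_mb / fair_share_mb_s(clients)


one_client = fair_share_mb_s(1)        # limited by the 100Mbit port
one_switch = fair_share_mb_s(24)       # 24 clients sharing 1Gbit
full_network = fair_share_mb_s(7 * 24) # 7 switches of 24 ports
full_boot = boot_seconds(130, 7 * 24)  # seconds for the 130MB image
```

Under these assumptions a single client gets 12.5MB/s, 24 clients get about 5.2MB/s each, and 168 clients get about 0.74MB/s each, so the 130MB image takes roughly 175 seconds. That is the same order as the two and a half minutes estimated in the post.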

If those Cisco 10/100 switches support trunking, so that you can have two gigabit cables running at gigabit full-duplex, you'd double your performance. Another option would be to put as many gigabit network cards in the server as it can handle, bridge those network cards together and connect each of them to a 10/100 switch. Don't forget to enable packet forwarding. If you don't have enough PCI slots, you can connect one NIC to the unmanaged gigabit switch, which can take the rest.

My best suggestion would be to buy a fiber module for the gigabit switch and use that between your central switch and the server. That should allow the server to push out considerably more data.
While your time estimations all sound about right, except for the full network estimation of 2.5min, I'm a bit confused on how the speed measurements work. Here are my recent findings (my speeds are recorded from what I observe in the network monitor as I experiment). My project is for 11,000 identical machines from IBM's factories. Brand new SurePOS 4800-743's. Broadcom BCM5755.

I get 23.5MB/s out of the server when I boot a single bench at a time (18 machines) that's wired with a Cisco Catalyst 2950 (takes about 15-20sec for them all). For the same bench with the Cisco switch swapped out for a Symbol ES-3000-PWR (24 x 10/100Mbps and 2 x 10/100/1000Mbps) I see up to 175MB/s out of the server (they boot before I can even get over to them) so obviously having that gigabit link to the bench switch is significant.

My lab at home consists of 1 x Netgear ProSafe 8-port Gigabit Switch, 2 target machines, and the server (Wireshark running on the server), and I can reach 23.5MB/s with 1 machine booting over 10/100 (undionly.kkpxe or ipxe.pxe). Undionly breaks down with gigabit connectivity and only pulls around 1MB/s, whereas if I switch to the full iPXE it transfers in a little over a second. If I connect my laptop and initiate 100 connections to download the file over HTTP I get 234MB/s out of the server. My "anomaly" machines (the ones that drag down the entire network) do not exhibit the same behavior at home (it's worth noting that they drag down every aspect of the network: TFTP is slow, DHCP response times are slow... 10x slower at least...).

Last Friday I had a batch of 100 up with no anomalies present (I replaced them) and did some testing of full simultaneous booting. It took over 10 minutes to deliver the data to all machines (with a few inconsistencies, 10 or so lagged quite a ways behind the rest). The server stayed around 75MB/s the entire time and CPU activity and Disk Activity were about on par with idle.

I'm fairly certain my issues stem from my inexperience with networking. I have 2 "anomalies" from my batch of 100 this morning that I'm going to isolate on another test network for further testing. I'm going to attempt to learn how to set up trunking, which those Cisco 2950s do support (though I have no fiber cable, switches, or experience, so I'm limited to using the 24 x 10/100 ports). I'd also wondered if setting up a VLAN per table may be a good idea, but I'm not sure if that's going to help. I will consider adding more NICs to the server eventually (which is entirely possible, plenty of room) but I don't seem to be stressing the server at all. I've also completely abandoned the idea of using multicast TFTP. I think once I've got these "anomalies" sorted out this network will be plenty sufficient (10min to boot all 100 is perfectly OK).

Edit2: Just finished some more testing. This time I disconnected every Cisco 2950 and let my two benches of Symbol ES-3000s boot up; they immediately saturated the network to 234MB/s at the server's end, so I'm going to attempt to continue adding my anomalies back in. I can only assume it's a complicated Cisco feature that I'm struggling to understand (I've disabled IGMP snooping and STP, and SNMP traps are off, though it says it filters by default and I have not been able to figure that out yet)...

For what it's worth, I absolutely love iPXE, thank you for the continued development and support!
How are you able to get more than 125MB/s on a full-duplex gigabit link? 1000Mbit divided by 8 is 125. That is supposed to be the theoretical maximum on a single full-duplex gigabit link (in one direction). Are you using trunking or some other feature to bundle links? Your network links are obviously your bottlenecks. 130MB of data should easily fit into the disk cache of the server, and since you're serving it without gzip/deflate compression there is almost no CPU load as well. Your limitation is the network speed.

I'm curious how you measure these numbers? Do you use MRTG or something to poll the counters on the switches every few seconds or do you read them from the web server?
https://lh5.googleusercontent.com/-cEpAT...2520PM.png

I read them from Network Monitor as things are happening. I figure the value is maybe doubled or something; I wrote down whatever it read in my notes during my problem for consistency, and also because I didn't understand the math well. Sorry for the confusion. This isn't even an iPXE problem; it seems to be me not knowing enough, mixed with a little Cisco IOS headache.
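
As a quick sanity check on the reading, it can be compared against the 125MB/s ceiling of a single gigabit link. The sketch below is Python with made-up function names; the idea that the monitor shows a doubled value is only a guess, not something confirmed anywhere in the thread.

```python
def line_rate_mb_s(link_mbit=1000):
    """Theoretical one-way maximum of a link in MB/s."""
    return link_mbit / 8


def exceeds_line_rate(reading_mb_s, link_mbit=1000):
    """True if a reading cannot be one-way traffic on a single link."""
    return reading_mb_s > line_rate_mb_s(link_mbit)


reading = 234                      # value noted from Network Monitor
impossible = exceeds_line_rate(reading)
halved = reading / 2               # what it would be if the monitor doubles
plausible_if_halved = not exceeds_line_rate(halved)
```

The raw reading is above what one gigabit link can carry in one direction, while half of it (117MB/s) sits just under the limit, which would fit a saturated gigabit link.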

I think I'm onto something now after my afternoon testing. Normally a problem machine is replaced when found, instead I replaced the switch under that bench with anything that wasn't a cisco 2950. Problem gone (for that machine on that bench anyway). I'm removing the remaining cisco switches in the morning and trying this batch again.

https://lh5.googleusercontent.com/-P0Y16.../drop1.png
UPD: At work this morning I replaced the Cisco switches. The problem persists. Here's a screencap of my systems this morning; you can see it's maxed out like I want it until some problem rolls along and jams it up (there it goes to 8.1MiB/s until I located that machine and removed it from the network, then back to 234 instantly!). I also noticed that if I unplug the bench having the issue and plug it back in, it can operate at full speed. I've gone through every setting on the Symbol switches: I had STP off, then back on and enabled only on my 2 uplink ports, and disabled every other feature the switch has. I think we may be able to just use it like this, though, if I can reliably refresh the connection to that bench to get them booted. Booting the DOS disks isn't much of an issue because the network is only held back for 10 seconds or so. I might try a different server OS too, but I'm feeling pretty lost here. I still don't think it's an iPXE issue or anything like that...