Hyper-V Gen 2
2017-05-25, 17:40
Post: #1
Hyper-V Gen 2
Hello guys, long time no posting.

We have a new bug/issue that needs some pretty strong help.

We've determined the cause was introduced in:
https://github.com/ipxe/ipxe/commit/a0f6...d9663f5269

And made more difficult to patch with:
https://github.com/ipxe/ipxe/commit/276d...d5a93b20d6

By reverting the changes in these two commits, Gen 2 Hyper-V systems are able to boot.

I don't know if it's a big problem to correct for, or if simply reverting will be the best bet.

I suspect the problem is specific to the lack of the return -EBUSY statement as that's all that changed in a0f6e7.
2017-05-26, 10:59
Post: #2
RE: Hyper-V Gen 2
(2017-05-25 17:40)mastacontrola Wrote:  We've determined the cause was introduced in:
https://github.com/ipxe/ipxe/commit/a0f6...d9663f5269

And made more difficult to patch with:
https://github.com/ipxe/ipxe/commit/276d...d5a93b20d6

By reverting the changes in these two commits, Gen 2 Hyper-V systems are able to boot.

I don't know if it's a big problem to correct for, or if simply reverting will be the best bet.

Those patches cannot simply be reverted. hv_map_hypercall() and hv_map_synic() are now used on the hv_unquiesce() path, and we rely on hv_map_hypercall() accepting a situation in which the guest OS ID MSR is already non-zero in order to recover from having our network connection dropped in the middle of an iSCSI boot: http://git.ipxe.org/ipxe.git/commitdiff/b91cc98.

How are you starting iPXE within your Gen 2 VM?

Michael
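
To illustrate the point about the hv_unquiesce() path, here is a minimal, self-contained sketch. It is not iPXE code: the MSR is simulated by a plain variable, and every name in it (fake_guest_os_id_msr, FAKE_ID_IPXE, FAKE_ID_OTHER_OS, map_hypercall_strict(), map_hypercall_tolerant(), recover_strict(), recover_tolerant()) is hypothetical. It only shows why a mapping function that returns -EBUSY on a non-zero guest OS ID cannot be used to reclaim the hypercall page after another owner has written the MSR.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the guest OS ID MSR (real code uses rdmsr/wrmsr) */
static uint64_t fake_guest_os_id_msr;

/* Hypothetical ID values, chosen for illustration only */
#define FAKE_ID_IPXE     0x1ULL
#define FAKE_ID_OTHER_OS 0x2ULL

/* Strict variant: refuse to map if the MSR is already claimed */
static int map_hypercall_strict ( void ) {
	if ( fake_guest_os_id_msr != 0 )
		return -EBUSY;
	fake_guest_os_id_msr = FAKE_ID_IPXE;
	return 0;
}

/* Tolerant variant: report the previous value, then take over anyway */
static void map_hypercall_tolerant ( void ) {
	if ( fake_guest_os_id_msr != 0 ) {
		printf ( "guest OS ID MSR was %#llx\n",
			 ( unsigned long long ) fake_guest_os_id_msr );
	}
	fake_guest_os_id_msr = FAKE_ID_IPXE;
}

/* Simulated recovery with the strict variant: another owner holds the MSR */
int recover_strict ( void ) {
	fake_guest_os_id_msr = FAKE_ID_OTHER_OS;
	return map_hypercall_strict();
}

/* Simulated recovery with the tolerant variant */
int recover_tolerant ( void ) {
	fake_guest_os_id_msr = FAKE_ID_OTHER_OS;
	map_hypercall_tolerant();
	return ( fake_guest_os_id_msr == FAKE_ID_IPXE ) ? 0 : -EBUSY;
}
```

Under these assumptions, recover_strict() fails with -EBUSY because the MSR is already non-zero, while recover_tolerant() succeeds. That is the behaviour the recovery path described above depends on.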
2017-05-26, 12:23 (This post was last modified: 2017-05-26 12:24 by mastacontrola.)
Post: #3
RE: Hyper-V Gen 2
(2017-05-26 10:59)mcb30 Wrote:  
(2017-05-25 17:40)mastacontrola Wrote:  We've determined the cause was introduced in:
https://github.com/ipxe/ipxe/commit/a0f6...d9663f5269

And made more difficult to patch with:
https://github.com/ipxe/ipxe/commit/276d...d5a93b20d6

By reverting the changes in these two commits, Gen 2 Hyper-V systems are able to boot.

I don't know if it's a big problem to correct for, or if simply reverting will be the best bet.

Those patches cannot simply be reverted. hv_map_hypercall() and hv_map_synic() are now used on the hv_unquiesce() path, and we rely on hv_map_hypercall() accepting a situation in which the guest OS ID MSR is already non-zero in order to recover from having our network connection dropped in the middle of an iSCSI boot: http://git.ipxe.org/ipxe.git/commitdiff/b91cc98.

How are you starting iPXE within your Gen 2 VM?

Michael

All we're doing is our normal boot process. It works for Gen 1 Hyper-V (as far as I know) and for UEFI/EFI booting. We build the boot script fairly dynamically via a PHP script.

A task boot script would look like:
Code:
#!ipxe
set fog-ip 10.0.7.1
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=127000 web=10.0.7.1/fog/ consoleblank=0 rootfstype=ext4 sshpasswd=somepasswordhere isdebug=yes ismajordebug=1 storage=10.0.7.1:/images/ storageip=10.0.7.1 mac=00:0c:29:24:4c:3d ftp=10.2.1.5 storage=10.2.1.5:/images/ storageip=10.2.1.5 osid=1 irqpoll hostname=winxptest chkdsk=0 img=winxpleg2 imgType=n imgPartitionType=all imgid=6 imgFormat=0 PIGZ_COMP=-6 adon=1 addomain="mastacontrola" adou="" aduser="HAHAHANOPE" adpass="somestringhere" hostearly=1 type=down isdebug=yes ismajordebug=1
imgfetch init_32.xz
boot
This is the same for either UEFI/EFI or legacy-style boot. With the current iPXE binaries (as is), iPXE itself loads just fine, but the VM never actually boots from there. Reverting these changes, while I understand there can be consequences, allows the VM to boot again. I think it's specifically related to the lost return -EBUSY; the only reason I made the other modifications was so that the return wouldn't break the build.
2017-05-26, 13:38
Post: #4
RE: Hyper-V Gen 2
(2017-05-26 12:23)mastacontrola Wrote:  All we're doing is our normal boot processes. It works for Gen 1 Hyper-V (as far as I know) for UEFI/EFI booting on. We build the boot system fairly dynamically via a php script.

Sorry; I wasn't clear. I meant to ask how you are loading the iPXE binary itself. Are you using a bootable disk image, bootable ISO image, chainloading, ...?

Michael
2017-05-27, 01:35
Post: #5
RE: Hyper-V Gen 2
(2017-05-26 13:38)mcb30 Wrote:  
(2017-05-26 12:23)mastacontrola Wrote:  All we're doing is our normal boot processes. It works for Gen 1 Hyper-V (as far as I know) for UEFI/EFI booting on. We build the boot system fairly dynamically via a php script.

Sorry; I wasn't clear. I meant to ask how you are loading the iPXE binary itself. Are you using a bootable disk image, bootable ISO image, chainloading, ...?

Michael

My bad too; it's normally loaded via PXE boot. So UEFI PXE boot requests ipxe.efi in this particular case. I don't know if you would call that chainloading per se, though I guess I can't say it's "not" chainloading.
2017-05-31, 18:00
Post: #6
RE: Hyper-V Gen 2
So my reversions, from what I can tell, didn't impact any functionality. They just made the functions return int again and used the returned values to do the same things as before. I didn't change large segments of code.

Can you explain, more directly, what repercussions the reversions I made have on the whole process?

Of note: here's the exact diff:

Code:
--- a/src/arch/x86/drivers/hyperv/hyperv.c
+++ b/src/arch/x86/drivers/hyperv/hyperv.c
@@ -225,7 +225,7 @@ static int hv_check_features ( struct hv_hypervisor *hv ) {
  *
  * @v hv               Hyper-V hypervisor
  */
-static void hv_map_hypercall ( struct hv_hypervisor *hv ) {
+static int hv_map_hypercall ( struct hv_hypervisor *hv ) {
        union {
                struct {
                        uint32_t ebx;
@@ -247,6 +247,7 @@ static void hv_map_hypercall ( struct hv_hypervisor *hv ) {
        if ( guest_os_id != 0 ) {
                DBGC ( hv, "HV %p guest OS ID MSR was %#08llx\n",
                       hv, guest_os_id );
+               return -EBUSY;
        }
        guest_os_id = HV_GUEST_OS_ID_IPXE;
        DBGC2 ( hv, "HV %p guest OS ID MSR is %#08llx\n", hv, guest_os_id );
@@ -267,6 +268,8 @@ static void hv_map_hypercall ( struct hv_hypervisor *hv ) {
        hypercall |= ( virt_to_phys ( hv->hypercall ) | HV_HYPERCALL_ENABLE );
        DBGC2 ( hv, "HV %p hypercall MSR is %#08llx\n", hv, hypercall );
        wrmsr ( HV_X64_MSR_HYPERCALL, hypercall );
+
+       return 0;
}

/**
@@ -295,7 +298,7 @@ static void hv_unmap_hypercall ( struct hv_hypervisor *hv ) {
  *
  * @v hv               Hyper-V hypervisor
  */
-static void hv_map_synic ( struct hv_hypervisor *hv ) {
+static int hv_map_synic ( struct hv_hypervisor *hv ) {
        uint64_t simp;
        uint64_t siefp;
        uint64_t scontrol;
@@ -323,6 +326,8 @@ static void hv_map_synic ( struct hv_hypervisor *hv ) {
        scontrol |= HV_SCONTROL_ENABLE;
        DBGC2 ( hv, "HV %p SCONTROL MSR is %#08llx\n", hv, scontrol );
        wrmsr ( HV_X64_MSR_SCONTROL, scontrol );
+
+       return 0;
}

/**
@@ -566,10 +571,14 @@ static int hv_probe ( struct root_device *rootdev ) {
                goto err_alloc_message;

        /* Map hypercall page */
-       hv_map_hypercall ( hv );
+       if ( ( rc = hv_map_hypercall ( hv ) ) != 0 )
+               goto err_map_hypercall;
+       //hv_map_hypercall ( hv );

        /* Map synthetic interrupt controller */
-       hv_map_synic ( hv );
+       if ( ( rc = hv_map_synic ( hv ) ) != 0 )
+               goto err_map_synic;
+       //hv_map_synic ( hv );

        /* Probe Hyper-V devices */
        if ( ( rc = vmbus_probe ( hv, &rootdev->dev ) ) != 0 )
@@ -581,7 +590,9 @@ static int hv_probe ( struct root_device *rootdev ) {
        vmbus_remove ( hv, &rootdev->dev );
  err_vmbus_probe:
        hv_unmap_synic ( hv );
+ err_map_synic:
        hv_unmap_hypercall ( hv );
+ err_map_hypercall:
        hv_free_message ( hv );
  err_alloc_message:
        hv_free_pages ( hv, hv->hypercall, hv->synic.message, hv->synic.event,

I did not do a full-on code reversion; I just updated things to try to get them functional again. To my mind, the -EBUSY return was needed.

I did not lose the functionality of the calls; I just reintegrated what was removed. This seems to work even on top of the current HEAD, whereas without these changes it does not seem to work for Hyper-V Gen 2. I don't have Hyper-V VMs myself, so I'm not a good test subject, but I know people who can get us information as needed.
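
The error handling that the diff above adds to hv_probe() follows the usual goto-based unwinding pattern: each setup step that can fail jumps to a label that undoes only the steps already completed. The sketch below is a simplified stand-in, not the real hv_probe(). The helpers step(), note(), probe_sketch() and probe_sketch_log() are hypothetical, and the cleanup names are only recorded as strings so that the unwinding order can be inspected.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Records which cleanup steps ran, in order (illustration only) */
static char unwind_log[64];

/* Append a cleanup step name to the log, never overflowing the buffer */
static void note ( const char *name ) {
	strncat ( unwind_log, name,
		  sizeof ( unwind_log ) - strlen ( unwind_log ) - 1 );
}

/* Hypothetical setup step: fails with -EBUSY when index == fail_at */
static int step ( int index, int fail_at ) {
	return ( index == fail_at ) ? -EBUSY : 0;
}

/* Expose the log of cleanup steps from the most recent probe_sketch() */
const char * probe_sketch_log ( void ) {
	return unwind_log;
}

/* Simplified probe: three setup steps, unwound in reverse on failure */
int probe_sketch ( int fail_at ) {
	int rc;

	unwind_log[0] = '\0';

	if ( ( rc = step ( 1, fail_at ) ) != 0 ) /* allocate message buffer */
		goto err_alloc_message;
	if ( ( rc = step ( 2, fail_at ) ) != 0 ) /* map hypercall page */
		goto err_map_hypercall;
	if ( ( rc = step ( 3, fail_at ) ) != 0 ) /* map SynIC */
		goto err_map_synic;
	return 0;

 err_map_synic:
	note ( "unmap_hypercall " );
 err_map_hypercall:
	note ( "free_message " );
 err_alloc_message:
	note ( "free_pages" );
	return rc;
}
```

For example, probe_sketch ( 2 ) models a failure while mapping the hypercall page: it returns -EBUSY and only the message buffer and the pages are released, mirroring where the err_map_hypercall label sits in the diff.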
2017-05-31, 18:05
Post: #7
RE: Hyper-V Gen 2
(2017-05-31 18:00)mastacontrola Wrote:  Can you explain, more directly, what the repercussions of the reversions I made had on the whole process?

It introduces a potential unhandled failure case into the recovery path used to restore the VMBus connection when booting Windows Server 2016 via iSCSI.

Michael
2017-05-31, 18:48
Post: #8
RE: Hyper-V Gen 2
(2017-05-31 18:05)mcb30 Wrote:  
(2017-05-31 18:00)mastacontrola Wrote:  Can you explain, more directly, what the repercussions of the reversions I made had on the whole process?

It introduces a potential unhandled failure case into the recovery path used to restore the VMBus connection when booting Windows Server 2016 via iSCSI.

Michael

So while not optimal, our stuff is usually used to boot into an embedded Linux OS for the purposes of imaging (capturing and/or deploying). From FOG's perspective this has limited potential for problems. Those who customize their iPXE menu would need to be made aware of the problem here.

What I can say is that leaving out the "return -EBUSY" seems to have been the breaking point. It doesn't make sense to me why. Do you have any suggestions so I can fix whatever is causing it more appropriately and generically? The only reason I reverted those particular changes is that they were side by side and reverting them was the least effort needed to correct the problem. The only reason I had to do the second reversion is so I could have the return -EBUSY and still build successfully.
2017-06-08, 13:57
Post: #9
RE: Hyper-V Gen 2
Hello there.
I am running into the same problem.
I found a Hyper-V VMM watchdog timeout error in the log.
Judging by printf debugging, the error seems to be caused by efix86_cpu_nap() in src/arch/x86/interface/efi/efix86_nap.c.

Code:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Hyper-V-Chipset" Guid="{DE9BA731-7F33-4F44-98C9-6CAC856B9F83}" />
    <EventID>18600</EventID>
    <Version>0</Version>
    <Level>1</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2017-06-08T12:42:52.954069100Z" />
    <EventRecordID>10122</EventRecordID>
    <Correlation />
    <Execution ProcessID="14284" ThreadID="4964" />
    <Channel>Microsoft-Windows-Hyper-V-Worker-Admin</Channel>
    <Computer>foobar-desktop</Computer>
    <Security UserID="S-1-5-83-1-2864687803-1288913492-2153597110-3744186609" />
  </System>
  <UserData>
    <VmlEventLog xmlns="http://www.microsoft.com/Windows/Virtualization/Events">
      <VmName>testee</VmName>
      <VmId>AABFAABB-4254-4CD3-B648-5D80F1C02BDF</VmId>
    </VmlEventLog>
  </UserData>
</Event>
2017-06-13, 14:45
Post: #10
RE: Hyper-V Gen 2
Is there anything we can do to try to correct the problems? Any other information we need to provide?
2017-07-25, 16:35
Post: #11
RE: Hyper-V Gen 2
It seems that this issue also matches what Clemlar reported on IRC:
on a Gen 2 machine, the current code just caused a reboot almost immediately after the iPXE banner.

Using the parent of a0f6e75, it supposedly works.

Maybe building with DEBUG=hyperv:3 and comparing the output between the working and non-working builds might shed some light on the best way to fix this?

(will update if I get more info on IRC)

2017-07-27, 10:58
Post: #12
RE: Hyper-V Gen 2
For now, the recommendation is to use snponly.efi when using Hyper-V Gen 2.

From mcb30:
What is needed for a proper fix is to find which EFI device represents ownership of the VMbus and have iPXE take ownership instead.

2017-07-28, 21:32
Post: #13
RE: Hyper-V Gen 2
There is now a workaround in place: http://git.ipxe.org/ipxe.git/commitdiff/9366578.

Michael
2017-07-31, 00:25
Post: #14
RE: Hyper-V Gen 2
(2017-07-28 21:32)mcb30 Wrote:  There is now a workaround in place: http://git.ipxe.org/ipxe.git/commitdiff/9366578.

Michael

I'm rebuilding after resetting my changes, which lets me keep up to date. I will have people test and report back. Thank you for getting back to us and letting us know.