2019-04-12, 17:17
First, please forgive my poor English.
Test environment:
Server: Windows Server 2012 with an iSCSI target (wintarget) + tftp32 + latest ipxe.efi / snponly.efi + GPT system image (128 GB, three partitions: ESP, MSR, and the Win7 OS volume)
Client: 1000M PCI-E NIC with UEFI network boot enabled; the local hard disk has been removed.
After the client PC boots into the iPXE shell, I type:
dhcp -> set keep-san 1 -> sanboot iscsi:10.10.10.253::::iqn.2019.com.test:gpt
The client PC then starts loading the OS image, but very slowly: it takes about 3 - 5 minutes before the Windows boot animation logo appears.
At that point I found that the wintarget service on the server had crashed. When I restart the wintarget service manually, it crashes again soon afterwards (because iPXE reconnects to the target and retries the previously failed SCSI command). After 10 - 20 crash/restart cycles, the client PC boots successfully to the Windows desktop.
I tried snponly.efi, and I tried another PC with a different NIC; the result is always the same. So I guess this is not a network or NIC driver problem. Maybe iPXE sends a bad SCSI command to the iSCSI target, and that causes the target to crash.
Finally, I fetched the latest iPXE source, compiled it with DEBUG=iscsi,scsi,efi_block:2, enabled syslog, replaced ipxe.efi on the server, and rebooted the client PC.
Syslog output after the client PC rebooted:
...................
...................
<6>ipxe: EFIBLK 0x80 read LBA 0x00032800 to 0xbf5000+0x00002000
<6>ipxe: EFIBLK 0x80 read LBA 0x00072800 to 0xc04000+0x00004000
<6>ipxe: EFIBLK 0x80 read LBA 0x00672800 to 0xbf5000+0x00000400
<6>ipxe: EFIBLK 0x80 read LBA 0x0067280a to 0xbf5000+0x00000400
<6>ipxe: EFIBLK 0x80 read LBA 0x00072960 to 0xc08000+0x00004000
<6>ipxe: EFIBLK 0x80 read LBA 0x0067ab08 to 0xbf5000+0x00000400
<6>ipxe: EFIBLK 0x80 read LBA 0x0101c830 to 0xbf5000+0x00002000
<6>ipxe: EFIBLK 0x80 read LBA 0x004a5988 to 0x24c5000+0x014c2600
<6>ipxe: iSCSI 0xe9a3008 closed: Connection reset (http://ipxe.org/0f0a6095)
<6>ipxe: SCSI 0xe9a36e8 tag 18ae01e6 closed: Connection reset (http://ipxe.org/0f0a6095)
<6>ipxe: iSCSI 0xe99fb08 initiator iqn.2010-04.org.ipxe:2dfe4d56-efb6-00a3-efcf-7a4242c83c9f
<6>ipxe: iSCSI 0xe99fb08 target 10.10.10.253 iqn.2019.com.test:gpt
<6>ipxe: iSCSI 0xe99fb08 entering security negotiation
<6>ipxe: SCSI 0xe9a12e8 created for LUN 0000-0000-0000-0000
<6>ipxe: iSCSI 0xe99fb08 closed: Connection reset (http://ipxe.org/0f0a6095)
<6>ipxe: iSCSI 0xe99f908 initiator iqn.2010-04.org.ipxe:2dfe4d56-efb6-00a3-efcf-7a4242c83c9f
<6>ipxe: iSCSI 0xe99f908 target 10.10.10.253 iqn.2019.com.test:gpt
<6>ipxe: iSCSI 0xe99f908 entering security negotiation
<6>ipxe: SCSI 0xe99fb28 created for LUN 0000-0000-0000-0000
<6>ipxe: iSCSI 0xe99f908 closed: Connection reset (http://ipxe.org/0f0a6095)
......................................
......................................
As soon as the log line "<6>ipxe: EFIBLK 0x80 read LBA 0x004a5988 to 0x24c5000+0x014c2600" appears, wintarget crashes.
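To find such requests in a longer syslog capture, the EFIBLK read lines can be parsed mechanically. The following is only a sketch: the function names parse_efiblk_read and is_oversized_read are made up for illustration, and the line layout is assumed to be exactly the one shown in the log above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse one syslog line of the form
 *   <6>ipxe: EFIBLK 0x80 read LBA 0x00032800 to 0xbf5000+0x00002000
 * Returns 1 and fills *lba and *len when the line matches, 0 otherwise.
 * (Hypothetical helper; the layout is taken from the log in this post.) */
int parse_efiblk_read(const char *line,
                      unsigned long long *lba,
                      unsigned long long *len)
{
    assert(line != NULL && lba != NULL && len != NULL);
    /* %*x skips the drive number and the buffer address */
    return sscanf(line, "<%*d>ipxe: EFIBLK %*x read LBA %llx to %*x+%llx",
                  lba, len) == 2;
}

/* Report whether a line is an EFIBLK read larger than 'limit' bytes. */
int is_oversized_read(const char *line, unsigned long long limit)
{
    unsigned long long lba, len;
    return parse_efiblk_read(line, &lba, &len) && len > limit;
}
```

Feeding every syslog line through is_oversized_read with a limit such as 0x100000 would single out the 0x014c2600 request while ignoring the iSCSI and SCSI lines.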
From the source, the 0x014c2600 in the log is the size of the EFI block read request. I don't understand why there is an EFI block read with a size of 0x014c2600 bytes (about 21 MB). Usually the size of a single EFI read request is 0x200 - 0x10000 (512 B - 64 KB).
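The size arithmetic can be checked with a few lines of C. This is a sketch only: the 512-byte sector size is an assumption (the log does not state it), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sector size of the SAN disk; not confirmed by the log. */
#define SECTOR_SIZE 512u

/* Number of sectors covered by a read of 'len' bytes. */
uint64_t bytes_to_sectors(uint64_t len)
{
    assert(len % SECTOR_SIZE == 0);   /* block reads are whole sectors */
    return len / SECTOR_SIZE;
}

/* Size in whole KiB, rounded down. */
uint64_t bytes_to_kib(uint64_t len)
{
    return len / 1024u;
}
```

With these helpers, 0x014c2600 is 21,767,680 bytes, which is 42,515 sectors or 21,257 KiB (about 20.8 MiB). The largest of the preceding reads in the log, 0x4000, is 32 sectors (16 KiB).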
I simply added some code to the function efi_block_io_read (efi_block.c):
if (len > 0x1000000)   /* reject reads larger than 16 MiB */
{
    return EFI_BAD_BUFFER_SIZE;
}
With this change the client boots successfully in 30 - 40 seconds with no errors, and the target does not crash.
But this is not a good solution. I think we need to figure out why the len argument passed to efi_block_io_read is so large.
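One alternative to rejecting the request outright would be to split a large read into several smaller ones. The sketch below is not iPXE code and is not a confirmed fix: chunked_read, read_fn, mem_disk, the 64 KiB cap and the 512-byte block size are all hypothetical, and the in-memory disk only stands in for the real iSCSI backend so the loop can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u            /* assumed sector size */
#define MAX_CHUNK  (64u * 1024u)   /* hypothetical per-request cap */

/* Hypothetical backend: read 'len' bytes starting at block 'lba'.
 * Returns 0 on success, non-zero on failure. */
typedef int (*read_fn)(void *ctx, uint64_t lba, void *buf, size_t len);

/* Issue one large read as a series of requests of at most MAX_CHUNK bytes. */
int chunked_read(read_fn rd, void *ctx, uint64_t lba, void *buf, size_t len)
{
    uint8_t *out = buf;

    assert(len % BLOCK_SIZE == 0);
    while (len > 0) {
        size_t chunk = (len > MAX_CHUNK) ? MAX_CHUNK : len;
        int rc = rd(ctx, lba, out, chunk);
        if (rc != 0)
            return rc;             /* stop at the first failed request */
        lba += chunk / BLOCK_SIZE;
        out += chunk;
        len -= chunk;
    }
    return 0;
}

/* In-memory stand-in for the disk; also records what was requested. */
struct mem_disk {
    const uint8_t *data;
    size_t size;
    unsigned requests;   /* number of backend reads issued */
    size_t largest;      /* largest single request seen */
};

int mem_disk_read(void *ctx, uint64_t lba, void *buf, size_t len)
{
    struct mem_disk *d = ctx;
    uint64_t off = lba * BLOCK_SIZE;

    if (off > d->size || len > d->size - off)
        return -1;                 /* out of range */
    memcpy(buf, d->data + off, len);
    d->requests++;
    if (len > d->largest)
        d->largest = len;
    return 0;
}
```

With this approach the caller still gets all of its data, while the target never sees a request above the cap. Whether the cap belongs in iPXE or the target should simply handle large requests is a separate question that this sketch does not answer.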
The GPT disk image on the target and the physical disk of the server are both OK; I checked with HDTUNE and there are no errors or warnings.
I also tested legacy BIOS boot with an MBR disk image: the slow boot and the large read request do not occur there, only under UEFI.
I read the UEFI spec; it does not mention a maximum request size for block_io_read.
Maybe the bug is in the EFI block I/O protocol? Or in wintarget? I'm not sure. (I hadn't worked with UEFI before today.)
Can anyone help? Thanks very much.