Monday, November 3, 2008

Solaris 10 update 6 bug: x86: ata timeout during boot (6586621)

This really weird bug is blocking me from updating Solaris 10 on every Sun Ultra 24 development workstation we have. The installation itself completes normally; I first hit this bug during a test deployment of Solaris 10 update 5 (05/08). During boot, the machine seems to hang and messages similar to these are displayed:

scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
timeout: reset bus, target=0 lun=0
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
timeout: early timeout, target=0 lun=0
gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0 (Disk0):
Error for command 'read sector' Error Level: Informational
gda: [ID 107833 kern.notice] Sense Key: aborted command
gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0 (Disk0):
Error for command 'read sector' Error Level: Informational
gda: [ID 107833 kern.notice] Sense Key: aborted command
gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
timeout: abort request, target=0 lun=0
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
timeout: abort device, target=0 lun=0
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
timeout: reset target, target=0 lun=0
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
timeout: reset bus, target=0 lun=0
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):
timeout: early timeout, target=0 lun=0
gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0 (Disk0):
Error for command 'read sector' Error Level: Informational
gda: [ID 107833 kern.notice] Sense Key: aborted command
gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0 (Disk0):
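
To get a quick idea of how often a machine hits this, the messages above can be tallied with a few lines of Python. This is only a rough sketch: the function name count_ata_timeouts and both regular expressions are my own, written against the message layout shown above (a WARNING header naming the ata instance, followed by a "timeout: ..." line), so they may need adjusting for other log formats.

```python
import re
from collections import Counter

# Header line naming the controller instance, e.g.
# "... WARNING: /pci@0,0/pci-ide@1f,2/ide@0 (ata0):"
# (pattern is an assumption based on the messages shown in this post)
HEADER_RE = re.compile(r"WARNING: (?P<path>\S+) \((?P<inst>ata\d+)\):\s*$")

# Follow-up line describing the timeout, e.g.
# "timeout: reset bus, target=0 lun=0"
TIMEOUT_RE = re.compile(
    r"timeout: (?P<action>[^,]+), target=(?P<target>\d+) lun=(?P<lun>\d+)"
)


def count_ata_timeouts(lines):
    """Count timeout messages per (ata instance, action) pair.

    `lines` is any iterable of log lines. A timeout line is only counted
    when it directly follows a WARNING header for an ata instance.
    """
    counts = Counter()
    controller = None
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            # Remember which controller the next line belongs to.
            controller = header.group("inst")
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout and controller is not None:
            counts[(controller, timeout.group("action"))] += 1
        # Any non-header line ends the current header/detail pair.
        controller = None
    return counts
```

Passing it the lines of a saved console log or of /var/adm/messages, for example count_ata_timeouts(open("/var/adm/messages")), returns a Counter keyed by pairs such as ("ata0", "reset bus"). The gda "read sector" errors are deliberately ignored, since their header names Disk0 rather than an ata instance.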

As documented in the Solaris 10 Update 6 release notes, this bug has five known workarounds (full details for each workaround can be found in the release notes):
  • Workaround 1: Enable AHCI in BIOS if available on the system. Enabling this setting requires a reinstall of the Solaris OS.
  • Workaround 2: Install Solaris on a disk on a controller which does not use the ata driver.
  • Workaround 3: Disable MP in the BIOS setup so that a single processor is active.
  • Workaround 4: Disable MP in Solaris so that a single processor is active. This is done from the Grand Unified Bootloader (GRUB) menu; the release notes list the steps.
  • Workaround 5: Disable microcode update.
Workarounds 3 and 4 have such an impact that they are not realistic. Workaround 1, when possible, implies a new Solaris 10 installation, which in my case wouldn't be much of a nuisance but in most cases isn't viable either. Workaround 5, as documented in the release notes, relies on running a script after boot, which is a deployment detail I don't really like.

Sun is trying to be competitive in the desktop market and is struggling to push Solaris on x86, too. The Sun Ultra series (20, 24, 40) is a wonderful line of machines but... the Ultra 24 is affected by this bug! Fortunately, the Sun Ultra 20 M2 is not, so I think I'll keep buying the Sun Ultra 20 instead of the Sun Ultra 24. Moreover, it has 2 NICs by default and I often need multihomed workstations.

I think I'll wait for the next Solaris 10 release to (hope to) see it fixed.
