Here is the story: my sun4v can boot OBP, but booting Solaris 10 hangs with no error messages. Ok, being there, done that. Let’s start the Solaris kernel with a debugger. I really liked kadb for debugging early boot stuff, but the Solaris 10 image supplied with the OpenSPARC project has only its successor - kmdb. Well, kmdb is indeed more advanced, but it’s also quite bigger than its predecessor. Which might be (or might be not) the reason for it failing to boot:
Sun Fire T2000, No Keyboard
Copyright 2005 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.20.0, 256 MB memory available, Serial #1122867.
[mo23723 obp4.20.0 #0]
Ethernet address 0:80:3:de:ad:3, Host ID: 80112233.
ok boot -kdv
Boot device: /virtual-devices/disk@0 File and args: -kdv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
The boot filesystem is logging.
The ufs log is empty and will not be used.
Size: 0x76e40+0x1c872+0x3123a Bytes
module /platform/sun4v/kernel/sparcv9/unix: text at [0x1000000, 0x1076e3f] data at 0x1800000
module misc/sparcv9/krtld: text at [0x1076e40, 0x108f737] data at 0x184dab0
module /platform/sun4v/kernel/sparcv9/genunix: text at [0x108f738, 0x11dd437] data at 0x18531c0
module /platform/sun4v/kernel/misc/sparcv9/platmod: text at [0x11dd438, 0x11dd43f] data at 0x18a4be0
module /platform/sun4v/kernel/cpu/sparcv9/SUNW,UltraSPARC-T1: text at [0x11dd440, 0x11e06ff] data at 0x18a5300
module /platform/sun4v/kernel/misc/sparcv9/kmdbmod: text at [0x11e0700, 0x124b2bf] data at 0x18b4da0
module /kernel/misc/sparcv9/ctf: text at [0x124b2c0, 0x1252d97] data at 0x18d6ed0
module /kernel/misc/sparcv9/zmod: text at [0x1252d98, 0x1257a67] data at 0x18d7af8
failed to decompress CTF data for unix: File data structure corruption detected
failed to decompress CTF data for genunix: String name offset is corrupt
failed to decompress CTF data for ctf: File data structure corruption detected
failed to decompress CTF data for zmod: File data structure corruption detected
What is the solution? Connect another debugger (gdb) to QEMU and debug the Solaris debugger (kmdb). Sounds reasonable, right? In the next step I found a place where memory is already corrupted. This has been easy: as you see, the Solaris engineers put some sanity checks in the CTF code. Well done, Sun guys!
Finding the place where it gets corrupted is a bit harder: gdb has no watch-points on the physical memory, supporting only virtual memory watch-points. The solution is indeed starting the QEMU process itself in a debugger. At this point it gets slightly insane: