Here is the story: my sun4v can boot OBP, but booting Solaris 10 hangs with no
error messages. OK, been there, done that. Let’s start the Solaris kernel with
a debugger. I really liked kadb for debugging early boot stuff, but the Solaris
10 image supplied with the OpenSPARC project has only its successor, kmdb. Well,
kmdb is indeed more advanced, but it’s also quite a bit bigger than its
predecessor. Which might (or might not) be the reason for it failing to boot:
Sun Fire T2000, No Keyboard
Copyright 2005 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.20.0, 256 MB memory available, Serial #1122867.
[mo23723 obp4.20.0 #0]
Ethernet address 0:80:3:de:ad:3, Host ID: 80112233.
ok boot -kdv
Boot device: /virtual-devices/disk@0 File and args: -kdv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-T2000/ufsboot
Loading: /platform/sun4v/ufsboot
The boot filesystem is logging.
The ufs log is empty and will not be used.
Size: 0x76e40+0x1c872+0x3123a Bytes
module /platform/sun4v/kernel/sparcv9/unix: text at [0x1000000, 0x1076e3f] data at 0x1800000
module misc/sparcv9/krtld: text at [0x1076e40, 0x108f737] data at 0x184dab0
module /platform/sun4v/kernel/sparcv9/genunix: text at [0x108f738, 0x11dd437] data at 0x18531c0
module /platform/sun4v/kernel/misc/sparcv9/platmod: text at [0x11dd438, 0x11dd43f] data at 0x18a4be0
module /platform/sun4v/kernel/cpu/sparcv9/SUNW,UltraSPARC-T1: text at [0x11dd440, 0x11e06ff] data at 0x18a5300
Loading kmdb...
module /platform/sun4v/kernel/misc/sparcv9/kmdbmod: text at [0x11e0700, 0x124b2bf] data at 0x18b4da0
module /kernel/misc/sparcv9/ctf: text at [0x124b2c0, 0x1252d97] data at 0x18d6ed0
module /kernel/misc/sparcv9/zmod: text at [0x1252d98, 0x1257a67] data at 0x18d7af8
failed to decompress CTF data for unix: File data structure corruption detected
failed to decompress CTF data for genunix: String name offset is corrupt
failed to decompress CTF data for ctf: File data structure corruption detected
failed to decompress CTF data for zmod: File data structure corruption detected
What is the solution? Connect another debugger (gdb) to QEMU and debug the
Solaris debugger (kmdb). Sounds reasonable, right? The next step was to find a
place where memory is already corrupted. This was easy: as you can see, the
Solaris engineers put some sanity checks in the CTF code. Well done, Sun guys!
Finding the place where it gets corrupted is a bit harder: gdb connected to
QEMU supports watch-points on virtual memory only, not on physical memory. The
solution is to run the QEMU process itself under a debugger. At this point it
gets slightly insane:
I put a debugger (kmdb) in a debugger (gdb
x86-64) and connected it to a debugger (gdb sparc-v9) so I can debug while I’m debugging
a debugger.
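
To make the watch-point limitation a bit more concrete, here is a toy model in Python. It is not QEMU or gdb code, and every name and address in it is made up for illustration. It only shows one way a write can slip past a watch-point that is keyed on a virtual address (here: the same physical page is reached through a second mapping), while a watch on the physical address still catches it. Whether this is what actually happened in the kmdb case is not something the sketch claims.

```python
# Toy model only: hypothetical names, addresses and page size.
PAGE_SIZE = 0x2000  # 8 KiB pages, an assumption for this sketch


class ToyMachine:
    """A tiny machine with a page table and two kinds of watch-points."""

    def __init__(self):
        self.ram = {}            # physical address -> stored value
        self.page_table = {}     # virtual page number -> physical page number
        self.virt_watch = set()  # watched virtual addresses
        self.phys_watch = set()  # watched physical addresses
        self.hits = []           # (kind, address) for each triggered watch

    def map_page(self, vaddr, paddr):
        """Map the virtual page containing vaddr to the page containing paddr."""
        self.page_table[vaddr // PAGE_SIZE] = paddr // PAGE_SIZE

    def translate(self, vaddr):
        """Translate a virtual address to a physical one via the page table."""
        ppn = self.page_table[vaddr // PAGE_SIZE]
        return ppn * PAGE_SIZE + vaddr % PAGE_SIZE

    def store(self, vaddr, value):
        """Write a value and record which watch-points fired."""
        paddr = self.translate(vaddr)
        if vaddr in self.virt_watch:
            self.hits.append(("virt", vaddr))
        if paddr in self.phys_watch:
            self.hits.append(("phys", paddr))
        self.ram[paddr] = value
        return paddr


def demo():
    """Corrupt a watched byte through an alias mapping and return the hits."""
    m = ToyMachine()
    phys_page = 0x4000000   # hypothetical physical page
    virt_a = 0x18d6000      # mapping the "victim" code uses
    virt_b = 0x70000000     # second mapping of the same physical page
    m.map_page(virt_a, phys_page)
    m.map_page(virt_b, phys_page)

    m.virt_watch.add(virt_a + 0x10)      # what a virtual-only debugger can do
    m.phys_watch.add(phys_page + 0x10)   # what we would really like to have

    m.store(virt_b + 0x10, 0xFF)         # the write arrives through the alias
    return m.hits


if __name__ == "__main__":
    for kind, addr in demo():
        print(kind, hex(addr))
```

In this model only the physical watch fires. Running the emulator itself under a host debugger gives the same kind of view, because guest physical memory is just ordinary memory of the emulator process.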