Saturday, October 9, 2010

Bug in all Solaris versions since 5.7?!?

I tried to clean up some patches for submission upstream, and it turned out that one of the hacks is needed only because of a Solaris bug! I'm really astonished.
The init routine of the network card driver (leinit in the le driver) in Solaris 2.6 contains this piece of code:

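call      ddi_get_parent           ! %o0 = parent dev_info of this device
ld        [%l0 + 0xc], %o0         ! delay slot: load the dip argument first
call      ddi_get_driver_private   ! %o0 = parent's driver-private data
nop                                ! empty delay slot
add       %o0, 0x4, %g2            ! %g2 = private data + 0x4
st        %g2, [%l0 + 0x720]       ! save the pointer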

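And this is how it looks in Solaris 9:

call      ddi_get_parent           ! %o0 = parent dev_info of this device
ld        [%l0 + 0x10], %o0        ! delay slot: load the dip argument first
call      ddi_get_driver_private   ! %o0 = parent's driver-private data
nop                                ! empty delay slot
add       %o0, 0x10, %g2           ! %g2 = private data + 0x10, not + 0x4
st        %g2, [%l0 + 0x728]       ! save the pointer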

Adding 0x10 instead of 0x4 to the base of the DMA registers produces a pointer to nowhere.

Yes, qemu is not precise and doesn't emulate memory aliasing (Blue Swirl had a patch for it), but hey, Solaris works on real sun4m hardware only by coincidence!

So all Solaris versions from 5.7 to 9 can be booted in qemu by hot-patching the driver in kadb (booting with kadb is already described in the how-to).

This is how I patched Solaris 9 to boot under qemu:
kadb[0]: le#leinit:b                  set a deferred breakpoint
kadb[0]: :c                           continue execution
...
kadb[0]: leinit+0x654?i               check that we are at the correct place
               add       %o0, 0x10, %g2
kadb[0]: leinit+0x654/X
leinit+0x654:   84022010
kadb[0]: leinit+0x654/W 84022004
leinit+0x654:   0x84022010    =   0x84022004       patch
kadb[0]: leinit:d                     delete the breakpoint
kadb[0]: :c                           continue

Once again, I can only recommend reading the PANIC! UNIX System Crash Dump Analysis Handbook to learn the basics before patching anything.

5 comments:

Anonymous said...

A fairly distinguished engineer added a cast that turned it into a pointer, so the + 4 became array math: 4 * sizeof(uint32_t). This was introduced by the Solaris 64-bit port. Thanks for catching it, but I wouldn't hold out any hope for a patch at this point.

atar said...

Expected something like this.

Surprised that it was introduced with the 64-bit port, though: didn't 2.6 (and 5.7) already support 64-bit SPARC CPUs?

Anonymous said...

Yes, but not for 64-bit (LP64) code. Solaris 7 added the 64-bit kernel and support for 64-bit apps. That entailed making the code compile as both 32-bit and 64-bit, hence lots of changes throughout.

Jason Stevens said...

Wow, that certainly explains why it would only run without networking!

That's an awesome find!

0x0 0x0 said...

Your post is awesome. It covers your creative aspects. I sincerely appreciate your creativity. This article is truly unique.