Saturday, October 9, 2010

Bug in all Solaris versions after 5.7?!?

Tried to clean up some patches for submitting upstream, and it turned out that one of the hacks is needed because of a Solaris bug! I'm really astonished.
The init routine of the network card driver in Solaris 2.6 has this piece of code:

call      ddi_get_parent
ld        [%l0 + 0xc], %o0
call      ddi_get_driver_private
nop
add       %o0, 0x4, %g2
st        %g2, [%l0 + 0x720]

That's how it looks in Solaris 9:
call      ddi_get_parent
ld        [%l0 + 0x10], %o0
call      ddi_get_driver_private
nop
add       %o0, 0x10, %g2
st        %g2, [%l0 + 0x728]

Adding 0x10 to the base of the DMA registers makes a pointer to nowhere.

Yes, qemu is not precise, and doesn't emulate memory aliasing (Blue Swirl had a patch for it), but hey, Solaris works on sun4m only due to a coincidence!

So, all the Solaris versions from 5.7 to 9 can be booted in qemu by hot patching in kadb (booting kadb is already described in the how-to).

That's how I patched Solaris 9 for booting under qemu:
kadb[0]: le#leinit:b                 set a deferred breakpoint
kadb[0]: :c                          continue execution
...
kadb[0]: leinit+0x654?i              check that we are at the correct place
               add       %o0, 0x10, %g2
kadb[0]: leinit+0x654/X
leinit+0x654:   84022010
kadb[0]: leinit+0x654/W 84022004
leinit+0x654:   0x84022010    =   0x84022004       patch
kadb[0]: leinit:d                    delete the breakpoint
kadb[0]: :c                          continue
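
The two instruction words in the kadb session differ only in the immediate field. Here is a minimal Python sketch that takes them apart, assuming the standard SPARC V8 format-3 layout (op, rd, op3, rs1, i, simm13). The helper names are mine and purely illustrative; they are not part of kadb or qemu.

```python
def decode_format3(word):
    """Split a 32-bit SPARC format-3 instruction word into its fields."""
    return {
        "op": (word >> 30) & 0x3,      # 2 = arithmetic/logical group
        "rd": (word >> 25) & 0x1F,     # destination register number
        "op3": (word >> 19) & 0x3F,    # 0 = add
        "rs1": (word >> 14) & 0x1F,    # source register number
        "i": (word >> 13) & 0x1,       # 1 = second operand is an immediate
        "simm13": word & 0x1FFF,       # 13-bit immediate
    }


def patch_simm13(word, new_imm):
    """Return the instruction word with only the immediate replaced."""
    return (word & 0xFFFFE000) | (new_imm & 0x1FFF)


original = 0x84022010                  # add %o0, 0x10, %g2
patched = patch_simm13(original, 0x4)  # add %o0, 0x4, %g2

print(decode_format3(original))
print(hex(patched))
```

Register number 8 is %o0 and register number 2 is %g2, so the decoded fields match the disassembly shown by `?i`, and the patched word is the value written with `/W`.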

Once again, I can only recommend reading the PANIC! UNIX System Crash Dump Analysis Handbook to understand the basics before patching anything.

Saturday, October 2, 2010

Did java 5+ ever work on 32 bit SPARC machines?

Basically the question should sound "Did java 5_ ever work on 32 bit SPARC machines?", because for the plus part I already know the answer. Up to now it didn't.

Here goes the story:

Sunday, August 15, 2010

Fixed the "Solaris Y2K10" bug

Got back to Solaris/qemu Y2K10 bug. The name turned out to be misleading because a) it's not a Solaris bug and b) it's not a Y2K10 bug.

The reason for the bug was someone mixing hexadecimal and decimal values. Gonna check if there are more such places and send the trivial fix later.

Saturday, August 14, 2010

Submitted SunOS 4.1.4 & Solaris 2.2 fix upstream

Sent upstream the serial port patch. The patch allows booting SunOS 4.1.4 and Solaris 2.2.

Booting Solaris 2.2 is more tricky than SunOS 4.1.4, since it doesn't support SPARCStation-5. It has to be started in SS-20 emulation.

Will update the how-to shortly.

Saturday, July 24, 2010

Going deeper into NetBSD problem

It looks like the documentation on the NCR89C100 chip (aka esp) is wrong or incomplete. Nothing new, actually: earlier I found that the timer is definitely described wrongly. But at least the timer is described correctly in the Sun4m architecture manual, whereas for the scsi part the only source is the NCR document.

The subunits of the real machine are much more integrated than described. DVMA works differently compared to qemu, the expected interrupts are not triggered, and "Select with(out) attention" commands work differently too. At least in the mode NetBSD uses.

Starting to wonder why qemu is capable of booting anything at all. Looking at all these errors makes me feel like I got lost 100 meters away from home. Or even in my own basement. Have to experiment further.

Saturday, July 17, 2010

Bug in NetBSD 1.6 - 3.1 emulation

Was going to write that I couldn't imagine what the NetBSD guys were smoking during the 5 years between versions 1.6 and 3.1 inclusive. The NCR 53c9x SCSI chip (known as "esp" in SPARC and PPC machines) was not so uncommon back then. How could they introduce an instability and not notice it?!?

The code from NetBSD 1.5.3:

NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
//...
NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
NCRDMA_GO(sc);

The code from NetBSD 1.6-3.1:

NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
NCRDMA_GO(sc);

The code from NetBSD 4:

NCRDMA_SETUP(sc, &sc->sc_cmdp, &sc->sc_cmdlen, 0, &dmasize);
//...
NCRCMD(sc, NCRCMD_SELATN | NCRCMD_DMA);
NCRDMA_GO(sc);

See the difference? In the versions 1.6-3.1 the command is executed before the DMA is set up, so the SCSI controller may not get the command using DMA.
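
To make the ordering problem concrete, here is a toy Python model. It assumes an emulated controller with zero latency that fetches the command bytes through DMA the instant the command register is written; `ToyDma` and `ToyEsp` are hypothetical stand-ins and not qemu or NetBSD code. The sample bytes are just a 6-byte INQUIRY-style command.

```python
class ToyDma:
    """Toy DMA engine: hands out a buffer only after setup() was called."""

    def __init__(self):
        self.buffer = None

    def setup(self, data):
        self.buffer = bytes(data)

    def fetch(self):
        if self.buffer is None:
            raise RuntimeError("DMA engine not programmed yet")
        return self.buffer


class ToyEsp:
    """Toy SCSI controller that executes a command the moment it is written."""

    def __init__(self, dma):
        self.dma = dma

    def select_with_atn_dma(self):
        # A zero-latency emulator pulls the command bytes through DMA right here.
        return self.dma.fetch()


SAMPLE_CDB = [0x12, 0x00, 0x00, 0x00, 0x30, 0x00]


def setup_then_command():
    """Ordering of NetBSD 1.5.3 and 4: program DMA, then issue the command."""
    dma = ToyDma()
    esp = ToyEsp(dma)
    dma.setup(SAMPLE_CDB)
    return esp.select_with_atn_dma()


def command_then_setup():
    """Ordering of NetBSD 1.6-3.1: issue the command, then program DMA."""
    dma = ToyDma()
    esp = ToyEsp(dma)
    cdb = esp.select_with_atn_dma()  # raises: nothing to fetch yet
    dma.setup(SAMPLE_CDB)
    return cdb
```

In this model the first ordering returns the command bytes and the second one fails immediately. A real chip with some latency between the register write and the DMA fetch would not necessarily behave like the toy.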

And then I googled for the expected bug reports and found none. Why could it work on the real hardware? Maybe some latency: if the latency of the SCSI controller was larger than that of the DMA controller, it might work? Maybe concurrency: if some other driver (for instance Ethernet) prepared DMA for itself, the SCSI driver could steal it? Or maybe DMA didn't work for these versions at all, and after the first attempt they switched to PIO mode.

Sunday, July 11, 2010

TME strikes back!

Great news for everyone who's interested in 64-bit SPARC emulation (sun4u)!

After a pause of nearly 3 years, Matt Fredette released a new version of TME - The Machine Emulator. Matt has skipped the sun4m emulation, which I think makes sense, because meanwhile qemu emulates sun4m pretty well (and fast!). Therefore the current list of supported platforms is sun2, sun3, sun4c and sun4u. The only sun4u machine which tme can emulate is the Ultra 1, so don't hold your breath for OpenSolaris support just yet.

But at least it can boot NetBSD 5. Also, it uses the original OBP (not OpenBIOS), which implies that the emulation is pretty close to the hardware. It also seems to emulate the cg6 graphics adapter, which is much more powerful than qemu's tcx. It is less powerful than Bob's cg14, but that one is not yet included in the official git master anyway.

As for OpenSolaris: it doesn't support the Ultra-1 and Ultra-2 machines. But! You can give Martux a spin, the hacked OpenSolaris distribution which does have support for the early Ultras.

Saturday, July 10, 2010

Fixed the slavio timer bug properly

Fixed the timer bug found last week more correctly. Will submit the patch later.

Gosh, it is hot over here. Today it is something like 38C (~100F). The cpu in my head is refusing to send any patches or fix any further bugs till the temperature drops.

Saturday, July 3, 2010

Fixed another bug in the slavio timer emulation

Trying to get NeXTStep/sparc to boot without any success, I got back to the old bug which seemed to be related: some versions of OBP hang at boot waiting for the timer interrupts.

Somehow I got poisoned by the motto of the qemu developers: OBP doesn't work because the initial set-up is wrong. On the real hardware no one would expect the BIOS to work if the machine doesn't pass the power-on self test (POST). But then it came to me that exactly this motto prevented other people from getting OBP working under qemu-system-sparc.

So I went on and asked Mitch whether he thought his creation - OBP - was buggy and relied on the [probably missing] POST initialization. Mitch said that he was pretty sure that OBP would do the right thing in this case, so I took another look at the qemu timer code, and fixed the bug.

The bug turned out to be unrelated to the NeXTStep boot problem. On the other hand, the fix provides alternatives to SPARCStation-5 emulation. Now it's possible to get the SPARCStation-10 firmware to work, which gives 512 MiB to the guest.

Here come some boot logs with the OBP from SPARCStation-10 and LX.

Saturday, June 19, 2010

Yet another bug in IRQ emulation

Trying to find out why NetBSD versions 1.6-3.1 do not boot, I found a bug in IRQ processing. After fixing it these versions still don't boot, :) but fail more gracefully.
Looks like there is another couple of bugs to go...

Saturday, June 12, 2010

Solaris 2.2 / sparc

Still in 1993. Moved from November to May: using the cg14 implementation from Bob I was able to boot Solaris 2.2 on an emulated SPARCstation-20. It was not possible before because Solaris 2.2 is not compatible with the SS-5. I wonder what the oldest Solaris/SunOS versions are which can be booted on the SS-20? Solaris 2.0 is not one of them: it supported only sun4c. SunOS 4.1.2 cannot boot either: according to Wikipedia it supports sun4m, but none of the SPARCstations.

So the versions left untested are Solaris 2.1 and SunOS 4.1.{0-1}. Have no idea about SunOS 4.0.

Will post some screenshots later.

Monday, May 31, 2010

Another week another SCSI bug

Fixed the Solaris 2.6+ boot, which I accidentally broke last week. It's not that my Solaris 2.3 dma/irq fix was wrong, but the fix unleashed a counterpart interrupt-handling bug in the esp controller.

Too bad that no one reported it earlier. I wouldn't have to hack till midnight now. ;-) And thanks to VooDoo_UzH_ for reporting it.

Saturday, May 29, 2010

SX framebuffer emulation

Bob Breuer implemented the cgfourteen framebuffer for SS-20. This is great news for those who are waiting for NeXTStep/sparc emulation under qemu!


With one hack that I used last year for booting Solaris 2.5.1, it's possible to boot the early Solaris versions (2.3-2.5.1) with color graphics. There is still a problem with the y2k10 bug with Solaris 2.4-2.5.1 under qemu.

sparc-softmmu/qemu-system-sparc -M SS-20 -bios /path/to/ss20_v2.22.3.bin -hdb /path/to/Solaris23.iso -m 64 -cpu "TI SuperSparc 50"

sparc-softmmu/qemu-system-sparc -M SS-20 -bios /path/to/ss20_v2.22.3.bin -hdb /path/to/Solaris251.iso -m 64 -cpu "Ross RT620" -startdate "2009-09-05"

Right now it sometimes complains about

zs3: ring buffer overflow

when you do something with the mouse. But this issue is not SX/cg14 related.

Also some OBP versions are not happy with the DBRI emulation. Under these versions instead of booting directly, detection of the DBRI has to be switched off:

ok setenv sbus-probe-list f
ok reset

Saturday, May 22, 2010

1993 reached

The time machine is working! Well, I had to fix another bug to do it. This time in DMA again. Solaris 2.3/sparc can be installed under qemu!

Submitted the patch upstream. After it gets accepted, it should be possible to use Solaris 2.5.1- instructions from the how-to to install Solaris 2.3. I wonder if the patch also improves the situation with NetBSD 5.x stability. Feel free to report.

Saturday, May 15, 2010

Trying to reach 1993

Trying to boot a yet older Solaris/SunOS version: Solaris 2.3. According to Wikipedia, it's the first one which supported the SPARCStation-5. You may want to ask: "What about SunOS 4.1.4 (Solaris 1.1.2)? It can be booted, and it must be older." No, it's not. It looks like Solaris 2.x and SunOS 4.x were developed independently: 1.1.2 was released in November 1994, and 2.3 was released in November 1993. The numbering is a bit misleading, but it explains why 2.3 has problems with the esp scsi controller while 1.1.2 doesn't. And both systems are so old that the kadb debugger can't set deferred breakpoints. Anyway, the current status of Solaris 2.3:

ok boot disk1:d -vs
Boot device: /iommu/sbus/espdma@5,8400000/esp@5,8800000/sd@1,0:d  File and args: -vs
Size: 719688+166144+108728 Bytes
SunOS Release 5.3 Version Generic [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1993, Sun Microsystems, Inc.
vac: enabled in write through mode
cpu0: FMI,MB86907 (mid 0 impl 0x0 ver 0x4 clock 1075 MHz)
mem = 65536K (0x4000000)
avail mem = 58142720
Ethernet address = 52:54:0:12:34:56
root nexus = SUNW,SPARCstation-5
iommu0 at root: obio 0x10000000
sbus0 at iommu0: obio 0x10001000
espdma0 at sbus0: SBus slot 5 0x8400000
esp0 at espdma0: SBus slot 5 0x8800000 sparc ipl 4
        polled command timeout
esp:            State=CLEARING Last State=CLEARING
esp:            Latched stat=0x97 intr=0x8 fifo 0x0
esp:            last msg out: NO-OP; last msg in: COMMAND COMPLETE
esp:            DMA csr=0xa4240030
esp:            addr=fc00300a dmacnt=8000 last=fc003008 last_cnt=30
esp:            Cmd dump for Target 1 Lun 0:
esp:            cdblen=6, cdb=[ 0x12 0x0 0x0 0x0 0x30 0x0 ]; Status=0x0
esp:            pkt_state=0x1f pkt_flags=0xb pkt_statistics=0x0
esp:            cmd_flags=0x1422 cmd_timeout=60

Saturday, May 8, 2010

sent the SunOS 4.1.4 patch upstream

Sent the SunOS 4.1.4 patch upstream. If it gets accepted, it should be possible to get up to "The Wh" using Solaris 2.5.1- instructions from the how-to.

Found another bug in qemu which I can't fix. Hoped that others can do it, so posted it to the mailing list, but got no responses. Actually the bug may be related to the SunOS fix I just sent: maybe SunOS tries to access a non-connected address not because it's buggy, but because qemu translates the address wrong.

If no one from the mailing list answers, I'll have to dig into it further.

Saturday, April 24, 2010

SunOS 4.1.4 again

Got back to SunOS 4.1.4 (aka Solaris 1.1.2). Thanks to Carey Schug, I had a chance to test it on a hard drive image. While the serial port driver still has problems, the system is alive in the background, and using qemu port forwarding I was able to log in via telnet:

SunOS UNIX (sol112)

login: root
SunOS Release 4.1.4 (GENERIC) #2: Fri Oct 14 11:09:47 PDT 1994
sol112#

Sunday, April 18, 2010

FPU bugs

Found more bugs. This time in FPU. One was very promptly fixed by Blue Swirl (the mysterious qemu-sparc maintainer). Another one is more tricky. qemu goes astray and doesn't stop at breakpoint, so it's gonna be hard to find out what exactly is going on.

Saturday, April 17, 2010

sent another qemu patch upstream

The -m 256 option should no longer be necessary. The minimal RAM amount for the emulated SPARCStation-5 is now 32 MiB. So OBP should now understand memory sizes of 32, 64, 96, 128, 160, 192, 224 and 256 MiB. When not specified, the default value of 128 MiB is taken.
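
The list of sizes is simply every multiple of 32 MiB up to 256 MiB. A small Python sketch of a sanity check for the -m value, based only on the numbers in this post (the function and constant names are hypothetical and this is not how qemu validates the option):

```python
# Sizes the SS-5 OBP is expected to accept, in MiB: 32, 64, ..., 256.
VALID_RAM_MIB = tuple(range(32, 257, 32))
DEFAULT_RAM_MIB = 128


def check_ram_size(mib=None):
    """Return the RAM size to use, or raise if OBP would not understand it."""
    if mib is None:
        return DEFAULT_RAM_MIB
    if mib not in VALID_RAM_MIB:
        raise ValueError("unsupported RAM size: %d MiB" % mib)
    return mib


print(VALID_RAM_MIB)
print(check_ram_size())    # no -m given
print(check_ram_size(64))  # -m 64
```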

The support of RAM size <256MiB is important for SunOS 4.1.4 (aka Solaris 1.1.2) because its installer crashes under qemu when started with "-m 256" option.

Will update the how-to shortly.

Wednesday, April 7, 2010

a couple of bugs in OpenBIOS

Found two bugs in OpenBIOS. I think I fixed one of them, but it can be checked only when the other one gets fixed. The other one is pretty deep in the Forth engine, and I don't want to mess with it. Once it is fixed, I'll get back to OpenBIOS.

Sunday, April 4, 2010

The ultimate way of starting qemu under Windows

Being on vacation, I took a shot at building and launching qemu under Windows. Of course, I stumbled on the message

chardev: backend "stdio" not found
qemu: could not open serial device 'mon:stdio': No such file or directory


when running with the -nographic option. The option is necessary for launching qemu with the SPARCstation-5's OBP, as it doesn't support qemu's TCX graphics card.

I googled, but found only that many people had stumbled over this problem before with no luck solving it. And then I was so desperate that I read the qemu manual page. And you know what? The solution is described there! Instead of using the console for standard I/O, it's possible to redirect it to a port, and telnet there:

sparc-softmmu/qemu-system-sparc -nographic  -serial mon:telnet::4444,server,nowait -bios ../sparc/ss5.bin -hdd ../sparc/solaris-disk

And then just telnet to localhost:4444.
Now I have Solaris/sparc running on my Windows laptop. ;-)

Moreover, I think the option -serial mon:telnet::4444,server,nowait is pretty useful for Unix hosts too. When started with it, you can get into the qemu console by pressing Ctrl-A c, and hence eject/insert virtual CD-ROMs, plug and unplug disks, and so on.

Update: the above method works if qemu is compiled under cygwin. With plain msys/MinGW, it has to be started with
-monitor file:bob.txt -bios ..\ss5.bin -m 256 -M SS-5 -serial mon:telnet:127.0.0.1:4444,server,nowait
options. Thanks, Neozeed.

Sunday, March 28, 2010

saving energy while working with qemu

As you know, qemu always tends to use 100% of the cpu. The qemu console is not available with the -nographic option, so saving the vm is not possible.

The good news is UNIX (and of course Linux) has a standard mechanism for freezing and unfreezing processes:

$ kill -STOP <pid>
(from another shell)

will stop the process. It will still reside in memory (RAM or swap).
To continue the process, type in the shell where it was started:

$ fg

There will be no echo, but the command will execute.
If there is an echo, the tty is probably messed up. It can be restored again with

$ stty -brkint -icrnl ixoff -imaxbel iutf8 -isig -icanon -iexten -echo
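
The same freeze/unfreeze can be scripted. Below is a minimal Python sketch for POSIX hosts that sends the signals directly; a sleeping Python child stands in for the qemu process, and the function names are mine. Note that SIGCONT only resumes the process, whereas `fg` additionally puts the job back into the foreground of its shell.

```python
import os
import signal
import subprocess
import sys


def freeze(pid):
    """Same effect as `kill -STOP <pid>`: the process stops being scheduled."""
    os.kill(pid, signal.SIGSTOP)


def thaw(pid):
    """Same effect as `kill -CONT <pid>`: the process resumes."""
    os.kill(pid, signal.SIGCONT)


def is_stopped(pid):
    """Wait for a state change of our own child; True if it was stopped."""
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


if __name__ == "__main__":
    # Stand-in for qemu: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    freeze(child.pid)
    print("stopped:", is_stopped(child.pid))
    thaw(child.pid)
    child.terminate()
    child.wait()
```

`is_stopped` works only for children of the calling process; for an unrelated qemu process you would look at its state in ps instead.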

going further with OpenBIOS

No, I was wrong about the 80 characters limit. Found the correct place where it crashed, but the reasoning was wrong. Something was corrupting the stack, so success after patching ufsboot was incidental.

Sunday, March 21, 2010

back to OpenBIOS

Since the SS-5 OBP works pretty well, I think it's a good time to get back to OpenBIOS: it's free (as in speech), and can support more than 256 MiB of RAM. Theoretically Ross's SPARCStation-20 OBP supports up to 2 GiB of RAM, but frankly speaking I think having a more or less hardware-precise SS-5 is enough.

The very first experience has shown that bootblk loads ufsboot successfully, but then it hangs. I feel like I've gone a few months back in time. Back then I was debugging pretty similar routines in the bootblk, but the main job was done by Mitch Bradley. I sort of hoped I could skip the necessity of learning the gory details of the UFS filesystem. Ok, back to school. ;-) My very first long shot:

$ strings sun4m/ufsboot
...
 ['] find-device catch if 2drop true else current-device device-end then swap l!
...

Trying to find out whether this sentence works I found out that OpenBIOS doesn't support input lines longer than 80 characters! It would be interesting to find out whether it's just a console limitation, or the API too. If it's an API limitation it may be exactly the bug that prevents Solaris from booting.

Another observation:

0 > " /openprom" ['] find-device  ok
3 > . . . -29a594 9 -264cec  ok

  • "ok" is not a prompt, but a response. I don't think it's important.
  • It operates with signed hexadecimals. This looks pretty weird. Here comes the same question again: is this just a console representation, or another API bug?

Gonna check it with OpenBIOS developers.
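
For comparison, this is what the two negative values from the session look like when reinterpreted as unsigned 32-bit cells. A minimal Python sketch, assuming 32-bit two's complement cells (the helper name is mine):

```python
def as_unsigned32(value):
    """Reinterpret a signed cell value as an unsigned 32-bit number."""
    return value & 0xFFFFFFFF


for text in ("-29a594", "-264cec"):
    print(text, "->", hex(as_unsigned32(int(text, 16))))
```

Both values end up near the top of the 32-bit address space (0xffd65a6c and 0xffd9b314), which would be consistent with the signs being just a matter of how the console prints numbers. That is only a hint, not a confirmation.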

Saturday, March 20, 2010

SSH ciphers on emulated sparcs

Running X applications on an emulated sparc over two ssh tunnels seemed quite slow, so I experimented a bit with different ciphers. Of course, when you are connected from localhost to the very same localhost, the risk that a third party can sniff your connection is pretty low. So, obviously the best performing cipher would have been "no cipher". Unfortunately the ssh bundled with Solaris 9 doesn't have this feature. I found an article where a few different ciphers were compared and wondered whether an emulated sparc cpu is closer to a real sparc or to the host cpu (in my case x86-64). It seems that the emulated system acts rather like the host: arcfour is just a little bit faster than blowfish:
cipher        throughput
3des          1.1 MiB/s
aes128-cbc    1.89 MiB/s
blowfish      2.15 MiB/s
arcfour       2.63 MiB/s
Theoretical limit (dd if=/dev/zero of=junk bs=1024k count=100): 20.8MiB/s.
Yes, 2.63MiB/s is pretty lame. But hey, on the real hardware you'd get even less.

For now I'm adding

Host 10.0.2.*
  Ciphers arcfour

to my ~/.ssh/config

Saturday, March 13, 2010

Tunneling qemu guest back and forth

When using virtual guests, sooner or later you need a way to transfer data between guest and host. If you just need to transfer files, it's relatively easy. There is a good document describing many possible ways of doing it in the qemu Puppy project.

But sometimes you want to pretend your virtual guest is a real machine: you want to log in to it, have multiple sessions, start GUI programs and so on. The easiest way to achieve this is to use an ssh tunnel. This method is actually neither Solaris nor qemu specific: you can do the same for exposing machines from your intranet to the outer world. How it works:

You start an ssh daemon (sshd) on your guest and on your host (or any other machine which will work as a representative for the guest). Then on your guest you say:

ssh -R 10022:localhost:22 hostuser@10.0.2.2

This opens a tunnel. If someone has access to the host's port 10022, she can also log in to your guest. So beware that your guest is exposed to the outer world after this point.

Now you can log in to your guest from the host using
ssh -X -p10022 guestuser@localhost
or from anywhere where your host is reachable using
ssh -X -p10022 guestuser@<host>

The -X option is used to turn on X11 forwarding. From the sessions started like this you can run GUI applications.

It's also possible to transfer data via sftp:
sftp -o "port=10022" guestuser@localhost

Saturday, March 6, 2010

Testing needed

While I'm busy with rewriting my Solaris 7 hack (which is also needed for Solaris 8 and 9), I'm looking for some help: I'd like to explore the y2k10 problem under qemu. Carey couldn't reproduce the problem on a real SPARCStation, so it may be a qemu nvram bug.
The currently needed /etc/system hack seems to be nvram-related too.

So, what I'm asking for:
boot a Solaris 2.0 - 2.5.1 image with the qemu option -rtc base=<dateTtime>, where dateTtime is a string like "2006-06-22T16:01:21". Try different dates and times, e.g.:
 
./qemu-system-sparc -M SS-5 -m 256 -rtc base=2009-12-22T16:01:21
 -L  -bios ss5.bin -nographic  -hdb Solaris2.5.1.iso
...
ok boot disk1:d -vs 
 
Currently it looks like the system doesn't come up if the day of month is >20. 
At least for years 2009 and 2010.
 
The question is: do year, month, hours and seconds also play a role?
I.e. is it possible to boot when day is 22, with any year?
Is it not possible to boot when day is 22 and year is 2010 with any
month, hour and minutes settings?
 
Waiting for your feedback (you can write in English, German or Russian).
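
If you want to try a whole range of dates, the command lines can be generated. A minimal Python sketch, assuming the same options and file names as in the example above; the -L option is left out because its argument is not shown there, so add it back as your setup requires. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def rtc_bases(start, count, step_days=1):
    """Return `count` date strings in the format expected by -rtc base=."""
    return [
        (start + timedelta(days=i * step_days)).strftime("%Y-%m-%dT%H:%M:%S")
        for i in range(count)
    ]


def qemu_args(base):
    """Build one qemu command line for the given start date."""
    return [
        "./qemu-system-sparc", "-M", "SS-5", "-m", "256",
        "-rtc", "base=" + base,
        "-bios", "ss5.bin", "-nographic", "-hdb", "Solaris2.5.1.iso",
    ]


# Walk across the suspicious boundary around the 20th of December 2009.
for base in rtc_bases(datetime(2009, 12, 18, 16, 1, 21), 6):
    print(" ".join(qemu_args(base)))
```

Each printed line still has to be run by hand, followed by `boot disk1:d -vs` at the ok prompt, noting for every date whether the system comes up.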

Saturday, February 27, 2010

sneak preview screenshots

I haven't posted any images in my blog yet, so it may be the time to fix it:

Solaris 2.6 and Solaris 7 are not very different. Up to 2.6 you can already make such screenshots yourself using the vanilla qemu (git/master). Sneak preview starts here:

For some reason, Solaris 9 looks not that nice:

For all the installations I used a 'PC console' terminal type. Maybe it's not recommended anymore? That's how it looked at the end:

And yes, instead of pressing Return to reboot the qemu machine, I still had to do the following:

!
echo set scsi_options=0x58 >> /a/etc/system
halt

So, I've achieved the goal I set 7 months ago. Solaris 9/SPARC will work under qemu.

Sunday, February 21, 2010

who is here?

How many people are reading my blog? I see ~30 page impressions daily, but I don't know how many of them are unique. Also it's possible that people read only the how-to, and not the rest. How many of you are actually reading this? Just leave a comment. If you do it anonymously, then please sign your message somehow, so I can distinguish you (if there is more than one of you :). It would also be interesting to know why you are reading this.

I see that this week there were suddenly two banner clicks! Who has clicked? I'd like to thank these people. Was there an interesting banner?

The question to the people who didn't click: are you annoyed by the ads here? I think about switching the banners off: after all 2 clicks within half a year is not a top business solution. Actually I'm also considering switching off this blog. Doesn't seem to be the right way for getting rich and famous.

The things I did for qemu/sparc weren't rocket science. When I started half a year ago I knew neither the sparc assembly language, nor the SPARCstation architecture, nor forth, nor qemu, nor git. Which means that I fixed all of these bugs just because I cared to fix them. Any other person who cared would have fixed them too. Nobody else did, so probably no one actually needs to emulate Solaris/sparc? Do any of you miss emulation of some other architecture/OS?

another week, another qemu bug again

Fixed one more CPU bug. Really surprised that it didn't affect Linux and the Solaris versions prior to 2.6. I'd expect that lots of multi-threaded code relying on mutexes should have been affected...

Saturday, February 20, 2010

Lucky fix

It turned out that my last dma fixes had a nice side effect: Solaris 8 and 9 don't complain about spurious timer interrupts anymore. I'm really surprised they don't, because I'm absolutely sure there is at least one bug in the system timer. Funnily enough, fixing the timer bug without the dma fixes covered 98% of the spurious interrupt complaints, whereas fixing dma removes 100% of them. After all, it's possible that both documents, which describe the timer in two mutually exclusive ways, describe it correctly. There are just multiple variants of the timer chip.

With the Solaris 7 hack, Solaris 8 & 9 can boot in single user mode, regardless of the timer fix. Will update the status in the how-to shortly.

Sunday, February 14, 2010

booting from a hdd image

For some reason, during a HDD boot the SCSI disk driver (sd) is loaded before the SCSI controller driver (esp). So the system finds no disks and exits with a kernel panic:

Cannot assemble drivers for root /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@0,0:a
Cannot mount root on /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@0,0:a fstype ufs
panic: vfs_mountroot: cannot mount root


A workaround: add the following line to /etc/system in the HDD image:
set scsi_options=0x58
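
For the curious, here is what the value 0x58 switches on and off. This is a Python sketch built on the scsi_options bit assignments as I remember them from the Solaris SCSI headers; treat the table as an assumption and check it against the documentation of your Solaris release.

```python
# Assumed scsi_options bit assignments (verify against your Solaris release).
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged queuing",
    0x100: "fast SCSI",
    0x200: "wide SCSI",
}


def decode_scsi_options(value):
    """Return (enabled, disabled) option names for a scsi_options value."""
    enabled = [name for bit, name in SCSI_OPTION_BITS.items() if value & bit]
    disabled = [name for bit, name in SCSI_OPTION_BITS.items() if not value & bit]
    return enabled, disabled


enabled, disabled = decode_scsi_options(0x58)
print("enabled: ", ", ".join(enabled))
print("disabled:", ", ".join(disabled))
```

Under that assumed table, 0x58 keeps disconnect/reconnect, linked commands and parity, and turns off synchronous, tagged, fast and wide transfers.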

Updated the Solaris/sparc under qemu how-to.

yet another dma bug

Fixed another bug in qemu sparc32 dma. This one seems to be only relevant for Linux guests though.

Sunday, February 7, 2010

another week, another qemu bug

There are very few qemu/sparc modules out there which I haven't had to touch. Since I started, I have found/fixed bugs in: irq, esp, cpu, scsi-disk, fdd, tcx, mmu, slavio. Today this list is extended with (sparc32_)dma.

Fixed a bug in dma which produced spurious interrupts and incomplete reads/writes. Will submit the patch later this week.

Monday, February 1, 2010

Spurious interrupts

Previously qemu dropped interrupts on disabling them. The real hardware doesn't do that. This means that lots of interrupts were dropped, including the spurious ones. But the real ones were dropped too; that's why the system timer was ticking so slowly.
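
The difference between the two behaviours fits into a toy model. This Python sketch is only an illustration under my own simplifications (one pending set, one mask set, a made-up interrupt number); it is not the qemu slavio code.

```python
class ToyIrqController:
    """Toy interrupt controller with a pending set and a mask set."""

    def __init__(self, drop_on_mask):
        self.pending = set()
        self.masked = set()
        self.drop_on_mask = drop_on_mask

    def raise_irq(self, irq):
        self.pending.add(irq)

    def mask(self, irq):
        self.masked.add(irq)
        if self.drop_on_mask:
            # Old emulator behaviour: disabling an interrupt forgets it.
            self.pending.discard(irq)

    def unmask(self, irq):
        self.masked.discard(irq)

    def deliverable(self):
        """Interrupts the CPU would see right now."""
        return sorted(self.pending - self.masked)


TIMER_IRQ = 10  # hypothetical number, chosen only for the example

for drop in (True, False):
    ctrl = ToyIrqController(drop_on_mask=drop)
    ctrl.raise_irq(TIMER_IRQ)   # a timer tick arrives
    ctrl.mask(TIMER_IRQ)        # the guest briefly disables the interrupt
    ctrl.unmask(TIMER_IRQ)      # ... and enables it again
    print("drop_on_mask=%s -> %s" % (drop, ctrl.deliverable()))
```

With dropping enabled the tick is lost once the guest re-enables the interrupt; with hardware-like behaviour it is still pending and gets delivered.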

The question is where the spurious IRQs are coming from: it's not only the ESP which produces them. Under Solaris 8 & 9 there are lots of complaints about spurious timer interrupts, and both seem to crash due to a buffer overflow during processing of a (possibly spurious) LE interrupt.

NetBSD 1.3.3 boot doesn't crash with my wrong irq patch. The patch makes qemu drop interrupts, including the spurious ones.

All in all, it looked like there was one global problem with interrupt processing. Until now. Now it looks like there is not one source of spurious interrupts, but many.
  • esp doesn't seem to produce spurious interrupts under Solaris while reading.
  • I've found a bug in the slavio timer which produces spurious interrupts.
  • NetBSD may be crashing due to another issue: I don't have a disk which I could boot under OBP. It is possible that OpenBIOS is not compatible with the older NetBSD versions. Update: This is the reason for NetBSD crashing, Michael Kostylev confirmed it.

Sunday, January 31, 2010

The problem of year 2010

Looks like the Solaris versions prior to version 2.6 (SunOS 5.6) have a year 2010 problem.

Surprise, surprise! If the current date is specified, they just don't boot in a single user mode, hanging after detection of serial ports (zs1 is /obio/zs@0,0), when /devices directory should be configured.

At least the version 2.5.1 hangs when the system date is > 2009.12.20.

Weird, huh?

Going to update the howto....

hsfs_putpage:birthday gift

I think I've fixed the problem with the dirty pages. This is my birthday gift to me.
The bug is really simple: if we fail before modifying a RAM page, we don't really get the page dirty.

Submitted the patch upstream.

Sunday, January 24, 2010

Solaris 7 (aka SunOS 5.7)

Actually I didn't tell the truth when I wrote that I didn't have anything up my sleeve. People who read this blog noticed that I claimed I could boot Solaris 7, but the how-to explicitly says it is not possible with the vanilla qemu.

Yes, I have a hack which would allow booting Solaris 7, but re-writing it properly would take some time.

The question is what do you think is more important: enabling Solaris 7 (~ 2 weekends), or fixing existing issues with Solaris 2.{4-6} (no time estimates, research necessary)?

Does Solaris 7 have something useful that 2.x didn't have?

Saturday, January 23, 2010

OpenSolaris sources are beautiful

Trying to find the roots of the "hsfs_putpage: dirty HSFS page" error, I looked in the OpenSolaris source.

High Sierra is pretty old and stable stuff, so it is possible that the code is similar to OpenSolaris.
I looked in the debugger, and the function call hierarchy looks pretty similar.

Now in the OpenSolaris source code there is a nice comment:

/*
* Normally pvn_getdirty() should return 0, which
* impies that it has done the job for us.
* The shouldn't-happen scenario is when it returns 1.
* This means that the page has been modified and
* needs to be put back.
* Since we can't write on a CD, we fake a failed
* I/O and force pvn_write_done() to destroy the page.
*/
if (pvn_getdirty(pp, flags) == 1) {
               cmn_err(CE_NOTE,
                           "hsfs_putpage: dirty HSFS page");

The bright side: I don't know any other open source project which is so nicely documented. The description confirms the suspicion I had: it's a problem with the MMU emulation.

The dark side: it's not just a problem with hsfs. Other file systems will have this bug too, and there it must be even more dramatic: they must be constantly writing cache data back to disk.

The 100% mmu & mxcc emulation in qemu would make the memory access very slow. I still hope we can avoid this, but don't know how.

Saturday, January 16, 2010

MMU & irq fixes

Submitted mmu and irq fixes upstream and updated the solaris/sparc under qemu howto. Now all the fixes I had are in the vanilla qemu (git). Don't have anything else up my sleeve.

So please test everything you can and please send reports. Here and to the qemu-devel mailing list.

Saturday, January 9, 2010

Happy 2010

I'm back from skiing. Happy 2010 everyone!

Updated the Solaris under qemu how-to, added launching instructions for the Solaris versions prior to 2.6.

Meanwhile I'm working on the MMU emulation problems. It's harder than it looked. There is documentation on the SuperSPARC Multi Cache Controller which describes what the MMU does in case of a double fault differently from what is currently implemented in qemu. Unfortunately it looks like either it describes it wrongly, or I don't get what is written there (yes, I've seen it says "Subject to Change Without Notice" in the footer). At least I cannot confirm the described behavior on the real SS-20.

So, there are 3 variants of the MMU behavior: qemu's, described, and the correct one. I'm exploring the last two to fix the first one.