Sunday, January 31, 2010

The problem of year 2010

It looks like the Solaris versions prior to 2.6 (SunOS 5.6) have a year 2010 problem.

Surprise, surprise! If the current date is set, they just don't boot in single user mode: they hang after the detection of the serial ports (zs1 is /obio/zs@0,0), at the point where the /devices directory should be configured.

At least the version 2.5.1 hangs when the system date is > 2009.12.20.

Weird, huh?

Going to update the howto....

hsfs_putpage: birthday gift

I think I've fixed the problem with the dirty pages. This is my birthday gift to myself.
The bug is really simple: if we fail before modifying a RAM page, we don't really get the page dirty.
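
A toy sketch of this kind of ordering bug, in Python. This is not the qemu code: the Page class, the WriteFault exception and both store functions are made up for illustration, assuming only that a "modified" bit is kept per page and that a store can fail before it touches RAM.

```python
class WriteFault(Exception):
    """Raised when a store hits a page that is not writable (hypothetical)."""


class Page:
    """Toy page entry: a writable flag and a 'modified' (dirty) bit."""

    def __init__(self, writable):
        self.writable = writable
        self.modified = False
        self.data = 0


def store_buggy(page, value):
    # Wrong order: the page is marked dirty before we know the store succeeds.
    page.modified = True
    if not page.writable:
        raise WriteFault("store to read-only page")
    page.data = value


def store_fixed(page, value):
    # Right order: fault first, touch the dirty bit only on a real write.
    if not page.writable:
        raise WriteFault("store to read-only page")
    page.data = value
    page.modified = True


def dirty_after_failed_store(store):
    """Try a store to a read-only page and report whether it ended up dirty."""
    page = Page(writable=False)
    try:
        store(page, 42)
    except WriteFault:
        pass
    return page.modified


print("buggy order leaves page dirty:", dirty_after_failed_store(store_buggy))
print("fixed order leaves page dirty:", dirty_after_failed_store(store_fixed))
```

With the buggy order a read-only page (like one backed by a CD) ends up looking modified although nothing was written to it.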

Submitted the patch upstream.

Sunday, January 24, 2010

Solaris 7 (aka SunOS 5.7)

Actually I didn't tell the truth when I wrote that I didn't have anything up my sleeve. People who read this blog noticed that I claimed I could boot Solaris 7, while the how-to explicitly says it is not possible with the vanilla qemu.

Yes, I have a hack which would allow booting Solaris 7, but re-writing it properly would take some time.

The question is what do you think is more important: enabling Solaris 7 (~ 2 weekends), or fixing existing issues with Solaris 2.{4-6} (no time estimates, research necessary)?

Does Solaris 7 have anything useful that 2.x didn't have?

Saturday, January 23, 2010

OpenSolaris sources are beautiful

Trying to find the roots of the "hsfs_putpage: dirty HSFS page" error, I looked in the OpenSolaris source.

High Sierra is pretty old and stable stuff, so it is possible that the old code is similar to the one in OpenSolaris.
I looked in the debugger, and the function call hierarchy looks pretty similar.

Now in the OpenSolaris source code there is a nice comment:

/*
* Normally pvn_getdirty() should return 0, which
* impies that it has done the job for us.
* The shouldn't-happen scenario is when it returns 1.
* This means that the page has been modified and
* needs to be put back.
* Since we can't write on a CD, we fake a failed
* I/O and force pvn_write_done() to destroy the page.
*/
if (pvn_getdirty(pp, flags) == 1) {
               cmn_err(CE_NOTE,
                           "hsfs_putpage: dirty HSFS page");

The bright side: I don't know any other open source project that is so nicely documented. The description confirms the suspicion I had: it's a problem with the MMU emulation.

The dark side: it's not just a problem with hsfs. Other file systems will have this bug too, and there it must be even more dramatic: they must be constantly writing cache data back to disk.

A 100% accurate mmu & mxcc emulation in qemu would make memory access very slow. I still hope we can avoid this, but don't know how.

Saturday, January 16, 2010

MMU & irq fixes

Submitted mmu and irq fixes upstream and updated the solaris/sparc under qemu howto. Now all the fixes I had are in the vanilla qemu (git). Don't have anything else up my sleeve.

So please test everything you can and please send reports. Here and to the qemu-devel mailing list.

Saturday, January 9, 2010

Happy 2010

I'm back from skiing. Happy 2010 everyone!

Updated the Solaris under qemu how-to, added launching instructions for the Solaris versions prior to 2.6.

Meanwhile I'm working on the MMU emulation problems. It's harder than it looked. There is documentation on the SuperSPARC Multi Cache Controller which describes what the MMU does in the case of a double fault, and it differs from what is currently implemented in qemu. Unfortunately it looks like either the documentation describes it wrongly, or I don't get what is written there (yes, I've seen it says "Subject to Change Without Notice" in the footer). At least I can not confirm the described behavior on a real SS-20.

So, there are 3 variants of the MMU behavior: qemu's, described, and the correct one. I'm exploring the last two to fix the first one.

Sunday, December 13, 2009

Solaris/sparc under qemu how-to

This document attempts to answer basic questions on how to set up qemu-system-sparc so that it can boot Solaris. The current version of this how-to is available at http://tyom.blogspot.com/2009/12/solaris-under-qemu-how-to.html. The emulation of sparc systems is still being improved, so this document will probably be updated.

Disclaimer

Reading, understanding and using the Howto is by no means a guarantee for successfully finishing the task, and any mechanical failure, accident, psychological trauma or other cataclysm that may result from using the Howto is entirely your own responsibility and liability.

List of supported Solaris versions

Currently the versions 1.1.2 (SunOS 4.1.4), 2.2 (SunOS 5.2), 2.3 (SunOS 5.3), 2.4 (SunOS 5.4), 2.5.1 (SunOS 5.5.1), 2.6 (SunOS 5.6), 7 (SunOS 5.7), 8 (SunOS 5.8) and 9 (SunOS 5.9) are supported.

Kernel debugger (kadb) can be loaded for the versions 1.1.2 (from a HDD image) and 2.2 - 9 (from a HDD image or an install CD/DVD).

Solaris 10 and OpenSolaris do not support 32 bit SPARC platforms, so they can never be booted under qemu-system-sparc. (Some day they may be bootable under qemu-system-sparc64, though.)
The versions prior to 1.1.1 and the versions 2.0-2.1 do not support SPARCstation-5 or SPARCstation-20, so they can not be booted. The version 2.2 can be booted in the SPARCstation-20 emulation mode only (the exact steps are not yet described in this howto).

The version 1.1.1 is not yet tested. Reports and/or boot disks are welcome.

List of supported Firmware versions

OpenBIOS 1.0+ can boot some Solaris versions. Please try it first, and if it doesn't work for you, send reports to the OpenBIOS mailing list.

The proprietary OpenBoot PROM (OBP) can boot all the Solaris versions available for the sun4m architecture (see the previous chapter). The SPARCstation-5 OBP versions 2.15 and 2.29 are known to work. The SPARCstation-20 revisions 2.15, 2.22 and 2.25 work only for some guest CPU models. If you have tested other OBP versions please let me know.

Compiling qemu-system-sparc

The qemu version 0.13+ is capable of booting some Solaris versions. To run Solaris 2.6+, QEMU 2.5.91+ (April 12th, 2016) is required. Some bugfixes and features are only included in the "bleeding edge", a.k.a. git master. Compiling master is straightforward:

git clone git://git.qemu.org/qemu.git
mkdir -p qemu/build
cd qemu/build
../configure --target-list=sparc-softmmu
make

Launching qemu with OpenBIOS to boot from a cdrom image

As of today (svn.r1246) OpenBIOS can boot the following Solaris versions:

SunOS Release 5.7 Version Generic_106541-02
SunOS Release 5.7 Version Generic_106541-08
SunOS Release 5.8 Version Generic_108528-09 32-bit
SunOS Release 5.8 Version Generic_108528-29 32-bit
SunOS Release 5.9 Version Generic_112233-10 32-bit
SunOS Release 5.9 Version Generic_118558-34 32-bit

Launch command:
sparc-softmmu/qemu-system-sparc -M SS-5  -nographic -prom-env 'auto-boot?=false' -cdrom Solaris8.iso


The option -prom-env 'auto-boot?=false' is optional. It allows specifying Solaris boot options, like -v and/or -s and/or -b. If no boot options are required, the command line option -boot d can be used instead.

The option -nographic is handy, because the emulated default graphic card (TCX) is not compatible with Solaris X-Window system. Nevertheless it can be omitted when booting in text console (e.g. single user mode, or installation without X-Window).

If the option -prom-env 'auto-boot?=false' is used, type
 boot cdrom:d -v
at the "0 >" prompt.

The versions known to boot with OBP, but not with OpenBIOS:

SunOS Release 4.1.4 (MUNIX)
SunOS Release 5.2 Version Generic
SunOS Release 5.3 Version Generic
SunOS Release 5.4 Version Generic
SunOS Release 5.5.1 Version Generic
SunOS Release 5.6 Version Generic

Launching qemu with OBP to boot from a cdrom image

Solaris 2.6 and above:

sparc-softmmu/qemu-system-sparc -M SS-5  -bios /path/to/ss5.bin -nographic -cdrom Solaris2.6.iso

Solaris 2.5.1 and earlier:

sparc-softmmu/qemu-system-sparc -M SS-5 -startdate "2009-12-13" -bios /path/to/ss5.bin -nographic -hdb Solaris2.5.1.iso

The option -startdate "2009-12-13" is necessary for the older QEMU versions, which have the y2010 bug. It's not necessary for QEMU 1.2+.

The option -nographic is handy, because the emulated default graphic card (TCX) is not compatible with Solaris X-Window system. Nevertheless it can be omitted when booting in text console (e.g. single user mode, or installation without X-Window).
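
For scripting the two launch variants above, here is a minimal Python sketch that only assembles the argument list. The function names and the default qemu path are made up for illustration; the options themselves are the ones from this howto. It always adds -startdate for the old versions, which is harmless but unnecessary for QEMU 1.2+.

```python
def version_key(version):
    """Turn a Solaris version string like '2.5.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def obp_cdrom_argv(solaris_version, iso_path, obp_path,
                   qemu="sparc-softmmu/qemu-system-sparc"):
    """Build the argv list for booting an install CD with OBP on SS-5."""
    argv = [qemu, "-M", "SS-5"]
    if version_key(solaris_version) >= version_key("2.6"):
        # Solaris 2.6 and above: attach the image as a CD-ROM.
        drive = ["-cdrom", iso_path]
    else:
        # Solaris 2.5.1 and earlier: pin the date and attach the image as hdb.
        argv += ["-startdate", "2009-12-13"]
        drive = ["-hdb", iso_path]
    return argv + ["-bios", obp_path, "-nographic"] + drive


print(" ".join(obp_cdrom_argv("2.5.1", "Solaris2.5.1.iso", "/path/to/ss5.bin")))
```

The resulting list can be passed to subprocess.run() as is.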

Successfully initialized OBP should print lines like this:

SPARCstation 5, No Keyboard
...
Type help for more information
ok

Booting Solaris in single user mode from a CD-ROM
At the ok prompt:

Solaris 2.6+:

boot disk2:d -vs

Solaris 2.5.1-:

boot disk1:d -vs

Booting the Solaris kernel debugger from a CD-ROM
At the ok prompt:

Solaris 2.6+:

boot disk2:d kadb -kdv

Solaris 2.5.1-:

boot disk1:d kadb -kdv

If you are going to debug the kernel, I recommend reading the PANIC! UNIX System Crash Dump Analysis Handbook. The kernel debugger is a really powerful tool, and the book helped me a lot to learn how to use it and shed a lot of light on Solaris internals.

Booting Solaris from a HDD image
To be able to boot from a HDD image, add the following line to /etc/system on the hard drive:
set scsi_options=0x58

Normally during the Solaris installation process the hard drive is mounted under /a, so it can be done with
# cat >> /a/etc/system
set scsi_options=0x58
^d
right after the installation. Hence it's recommended to switch off the automatic reboot option when the installer asks for it.

If the steps above are not performed, the HDD boot fails with the error message:
cannot mount root on /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@0,0
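
To see what scsi_options=0x58 switches on and off, here is a small Python decoder. The bit names are an assumption: they are written down from memory of the Solaris header sys/scsi/conf/autoconf.h, so please check them against your own headers before relying on them.

```python
# Bit names as recalled from sys/scsi/conf/autoconf.h (an assumption,
# not verified against a particular Solaris release).
SCSI_OPTIONS = {
    0x008: "DR (disconnect/reconnect)",
    0x010: "LINK (linked commands)",
    0x020: "SYNC (synchronous transfers)",
    0x040: "PARITY",
    0x080: "TAG (tagged queuing)",
    0x100: "FAST",
    0x200: "WIDE",
}


def decode_scsi_options(value):
    """Return (enabled, disabled) lists of option names for a scsi_options value."""
    enabled = [name for bit, name in SCSI_OPTIONS.items() if value & bit]
    disabled = [name for bit, name in SCSI_OPTIONS.items() if not value & bit]
    return enabled, disabled


enabled, disabled = decode_scsi_options(0x58)
print("enabled: ", ", ".join(enabled))
print("disabled:", ", ".join(disabled))
```

If the bit names are right, 0x58 keeps the basic features and turns off the ones which the emulated SCSI controller is most likely to have trouble with.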

Comments & reports are welcome. Here and at the qemu-devel mailing list.

Last updated on 11.04.2016.

/Happy hacking

Saturday, December 12, 2009

Submitted the SS-5 OBP patches upstream

Did some clean-ups and submitted a minimal patch set upstream.

I omitted the SparcStation-20 support for now, which made the patches for SparcStation-5 OBP cleaner, so there is a chance they will be accepted (my last patch was silently ignored for a month just because it was badly formatted, that's why I say "a chance", not "a good chance").

This means that if the patches are accepted for qemu 0.12, it will be possible to boot Solaris 2.5.1 and Solaris 2.6 kernels in the vanilla qemu with SS-5 OBP. I'll write a qemu/Solaris/sparc how-to.

No support for SS-20 (and SunOS 4.1.4 / Solaris 1.1.2) yet, as it is buggier and less requested. If someone thinks the support for SunOS 4.1.4 is important, feel free to write to me. If you have ever debugged a SunOS 4.x kernel (or have tools for doing it), please write to me.

Saturday, November 28, 2009

My broken SS-5 is just too fast

A few days ago I wrote that I have the world's fastest broken SS-5. The problem is that it is so fast that this alone makes it broken.

It looks like at least some PromDiag/POST/OBP tests fail just because qemu's CPU emulation is not cycle-exact. It may be that they wait for an irq to happen while they execute something like 100 nops, but a qemu nop is much faster than a real one, so the irq comes too late. "nop" is just an example here, I haven't disassembled the tests yet, but it looks very much like it: the timer test passes if I make the timer tick 256 times faster.

Probably the other tests fail for the same reason. So the OBP timer/irq tests are probably useless.

Sunday, November 22, 2009

Hidden OBP feature found

Debugging the initial Power-On-Self-Test of OBP 2.29, I found a secret level, er, a cool undocumented feature: PromDiag. Whenever I turn it on, instead of getting the usual OBP "OK" prompt I get:

PromDiag
NOK>

I wonder what "NOK" is. Does it mean "Not OK"? Anyway, I played with it a little. It turned out that it can launch single POST tests, and there are some more features which have yet to be discovered. All in all it accepts just a few symbols: numbers, dot, comma, c, h, l, q, r, s.

Saturday, November 21, 2009

IRQ/Timer puzzles

I've got a puzzle (it used to be two) concerning the slavio irq/timer behavior:
  • qemu doesn't seem to behave as specified in the slavio documentation: I get an irq when I expect none. (No, it's ok, my test was just wrong.)
  • a real SS-20 doesn't seem to behave as specified in the slavio documentation: I don't get an irq when I expect one.


I already found some places where the documentation is not precise. For instance, it claims that reserved bits "read as 0, write has no effect", but they don't always read as 0 (maybe they aren't really reserved?).

I miss my oscilloscope and direct access to the hw. If someone has a sun4m machine and an oscilloscope, please get in touch!

Sunday, November 15, 2009

Lucky bug

After submitting the performance/irq fix upstream it turned out that the fix should never have worked! I missed a logical "not" in the expression and did exactly the opposite of what I intended: clearing all the irqs which should not have been cleared, and not clearing the irqs which should have been.

The fact that this wrong code works means that, for some unknown reason, the interrupts are additionally raised and cleared somewhere else. For the timer it's 99.5% of the interrupts: without the improper fix I get ~100 spurious interrupt complaints per second, with the improper fix it is 1 complaint every 2 seconds.
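
A toy illustration of what such a slip looks like, assuming the pending interrupts are kept as a bit mask. The actual qemu expression is not shown in this post, so both function names and the example masks are made up.

```python
def clear_irqs_intended(pending, to_clear):
    """Clear exactly the bits set in to_clear (32-bit mask)."""
    return pending & ~to_clear & 0xFFFFFFFF


def clear_irqs_lucky_bug(pending, to_clear):
    """Missing 'not': keeps only the bits that should have been cleared."""
    return pending & to_clear


pending = 0b1011_0110
to_clear = 0b0000_0110
print(f"intended: {clear_irqs_intended(pending, to_clear):08b}")
print(f"buggy:    {clear_irqs_lucky_bug(pending, to_clear):08b}")
```

The buggy variant drops every interrupt outside the mask and keeps the ones inside it, which is the exact opposite of the intention.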

And the fact that the wrong code improves the emulation (NetBSD 1.3.x-1.5.x is working) means there are some counterpart bugs in the code...

Saturday, November 14, 2009

The World's fastest broken SS-5

Fixed a bug in the IRQ routing and now I have (no, not a machine gun, ho-ho-ho) the World's fastest [broken] SparcStation-5! According to the Solaris 2.6 and Solaris 7 output, it's faster than 1 GHz:

cpu0: FMI,MB86907 (mid 0 impl 0x0 ver 0x4 clock 1083 MHz)


Remember, last week I said that after fixing the performance problems I was going to get back to the XXI century? Well, I lied. I made another quick stop in the past:

WARNING: clock gained 3987 days -- CHECK AND RESET THE DATE!


Guess which OS it is?

Thursday, November 12, 2009

sparc64's name is Legion

Recently I've been getting a lot of questions about sparc64 emulation in qemu. The only answer I can give is the same one "The Zombies" sang in the 1960s: "She's not there".

But there is another Open Source project which targets emulating Sparcs (the project's page claims it is CDDL; in the sources I've seen GPL). Actually, OpenSparc. So, if you are interested in Solaris 10+ emulation, take a look at the Project Kenai's Legion Sparc Simulator.

If you already have a 64 bit Solaris machine, you can download a pre-built all-in-one (including the Solaris 10 image) package here.

The bad news is that there is no network card emulation, and the build currently doesn't work under Linux. It should work under x86 Solaris though, so it is not completely useless. Also it should be possible to port it to Linux, since SunStudio is available there too.

But for now I'll be sticking to 32 bits and qemu.

Saturday, November 7, 2009

Another week - another Solaris version (tm)

I'm still in the 20th century, but making progress.

SunOS Release 5.7 Version Generic_106541-08 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1999, Sun Microsystems, Inc.

# uname -a
SunOS 5.7 Generic_106541-08 sun4m sparc SUNW,SPARCstation-5
# ls -l /
total 122
drwxr-xr-x 2 root sys 512 Oct 15 1999 a

The next stop is going to be the 21st century. But I'm going to look at the performance problems first. Waiting 6 hours for the '#' is a bit boring (and the problem is definitely not the CPU speed).

Thanks to Sergey Dionidis (a.k.a sdio @ LOR) for helping to test it.

Friday, November 6, 2009

Things missing in the vanilla qemu

Things which can be fixed in the vanilla qemu:

For OBP:

- Floppy. Instead of fixing it, I broke it completely, so that OBP doesn't try to initialize it and hang. Actually it may be not the fdc itself, but the irq handling. There are OBP tests which may help to understand what is currently going wrong. I didn't need it; does it actually work with OpenBIOS?

- [SparcStation-5] 0x6e000000 AFX. OBP tries to access it and fails with "unassigned address exception".
- [SparcStation-20] 0xef8010000 DBRI, 0x9000X00X FCode SIMMs. Same problem here.

AFX, DBRI and FCode SIMMs can be implemented as stubs. It would be even better if SBUS probing produced a proper fault. These devices are optional.

Solaris 2.5.1 - 7 have problems with

- interrupt handling. Due to errors in irq handling, the boot takes ~7 hours. Working on it.
- MMU (?). Solaris tries to access memory after the translation failed. Actually Debian/Linux has similar problems, but it ignores the traps, while Solaris doesn't.
- MMU (?). The message "hsfs_putpage: dirty HSFS page" means that a page was modified although it wasn't supposed to be. May have to do with the cacheability tweaking.
- [SparcStation-20] PAC. Solaris hangs where it would normally say that physical address cache is enabled.

Additionally Solaris 8-9 have problems with

- Spurious interrupts.

Nice to haves:
- The ability to send STOP-A to the serial console. Would greatly help to use Solaris kernel debugger (kadb) when the kernel hangs.

- Network boot. Looks like something which can easily be fixed. Currently it fails with the message
Internal loopback test -- Wrong packet length; expected 36, observed 64

Last updated on 15.12.2009.

Sunday, November 1, 2009

Another week - another Solaris version

After re-fixing the bug I fixed before, and fixing a third one in the Sparc CPU emulation, I got Solaris 2.6 going. This version doesn't say how much the clock gained since the release, so I can not estimate how well I'm doing in comparison to the reference 4900 days. Probably it was released on July 18th, 1997.

SunOS Release 5.6 Version Generic [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.

NOTICE: SBus clock frequency out of range.
# ls -ld /a
drwxr-xr-x 2 root sys 512 Jul 18 1997 a

It also complains that

NOTICE: hsfs_putpage: dirty HSFS page

This may mean that the current qemu workaround for not emulating the CPU cache is not good enough for Solaris. On the other hand, who needs the hsfs module :).

Again, thanks Carey for the Solaris 2.6 disk!

Saturday, October 31, 2009

Playground extension

Carey Schug did me another favor. This time he provided access to a SparcServer-20 which he has at home! Now I can compare a virtual SS-20 with a real one. So, the little bugs, beware of me!
Thanks, Carey!

Sunday, October 25, 2009

Another small improvement in SCSI emulation

The message

Error: Inquiry (STANDARD) buffer size 5 is less than 36 (TODO: only 5 required)

was quite annoying, so I attacked it. I used to have a hack which explicitly implemented inquiry with the allocation length == 5.
But it turned out that the clean fix is quite trivial. The specification says "if the allocation length of the command descriptor block (CDB) is too small to transfer all of the parameters, the additional length shall not be adjusted to reflect the truncation", so the clean fix for this problem is not longer than the code telling about "TODO". :) Will send the patch upstream. Now probe-scsi in OBP looks really nice:

ok probe-scsi
Target 0
Unit 0 Disk QEMU QEMU HARDDISK 0.11
Target 1
Unit 0 Disk QEMU QEMU HARDDISK 0.11
Target 2
Unit 0 Removable Read Only device QEMU QEMU CD-ROM 0.11
ok
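
The rule quoted from the specification can be sketched in a few lines of Python. This is not the qemu patch, just a simplified model: the function names are made up, the header byte values are illustrative, and only the 36-byte standard INQUIRY layout (additional length in byte 4, vendor, product and revision strings after byte 8) is assumed.

```python
def standard_inquiry_data(vendor="QEMU", product="QEMU HARDDISK", revision="0.11"):
    """Build a 36-byte standard INQUIRY response (simplified, direct-access disk)."""
    data = bytearray(36)
    data[0] = 0x00           # peripheral device type: direct-access device
    data[2] = 0x02           # claimed SCSI version (illustrative value)
    data[3] = 0x02           # response data format
    data[4] = len(data) - 5  # additional length: number of bytes after byte 4
    data[8:16] = vendor.encode("ascii").ljust(8)[:8]
    data[16:32] = product.encode("ascii").ljust(16)[:16]
    data[32:36] = revision.encode("ascii").ljust(4)[:4]
    return bytes(data)


def inquiry_reply(allocation_length):
    """Truncate the reply to the allocation length without touching byte 4."""
    return standard_inquiry_data()[:allocation_length]


short = inquiry_reply(5)
print(len(short), "bytes returned, additional length still says", short[4])
```

A 5-byte request thus gets just the header, and the untouched additional length tells the initiator how big a buffer to use for the follow-up request.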

Saturday, October 24, 2009

Greetings, Professor Falken

Success! I've managed to boot Solaris 2.5.1/sparc under qemu! It takes a long time. I started it on my machine (E8200@2.66GHz) yesterday at 18:31, and today at 03:24, I finally got the "#":

WARNING: clock gained 4900 days -- CHECK AND RESET THE DATE!
# ls
a devices kernel opt root.proto var
bin etc kvm platform sbin
cdrom export lib proc tmp
dev home mnt reconfigure usr
# uname -a
SunOS 5.5.1 Generic sun4m sparc SUNW,SPARCstation-20

Woo-hoo! Currently I can boot it in single user mode only, as in normal mode it fails on the non-existent SX framebuffer.

Does anyone know how to change the module "exclude" list from an adb session?

P.S. 4900 days - didn't notice first, that the date was so special. Hope to boot Solaris 9 earlier than 4900 days after its release date. :)
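
The "clock gained" figure can be turned back into a calendar date with a couple of lines of Python. This is only a sketch: it assumes that the emulated clock showed the day of this post (the boot actually started the evening before, so the result may be off by a day), and the function name is made up.

```python
from datetime import date, timedelta


def clock_reference(boot_date, days_gained):
    """Date the kernel was comparing against, given the 'clock gained' figure."""
    return boot_date - timedelta(days=days_gained)


print(clock_reference(date(2009, 10, 24), 4900))
```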

Saturday, October 17, 2009

Can't invoke /etc/init, error 14

Solaris boots the kernel now. The next stop is booting /etc/init . With Solaris 2.6 I get

Can't invoke /etc/init, error 14
panic:icode


After searching the Net, I found that the error number is 2 if /sbin/init is missing, and 8 if /sbin/init has an incorrect executable format. But what is 14? Is this the errno 14, aka EFAULT? Is this a message from the init, or from the kernel?
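
A quick way to look up the three numbers is the errno table of the host, here via Python. This assumes that the host numbering matches the Solaris one for these classic low values, and it says nothing about whether the message comes from init or from the kernel. The helper name is made up.

```python
import errno
import os


def describe_errno(number):
    """Return (symbolic name, message) for an errno value on the host system."""
    name = errno.errorcode.get(number, "unknown")
    return name, os.strerror(number)


for number in (2, 8, 14):
    name, message = describe_errno(number)
    print(f"{number:2d} {name:8s} {message}")
```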

Sunday, October 11, 2009

The second bug in the qemu sparc CPU emulation

Mitch Bradley found a bug in the Sparc CPU emulation. I gave him access to my qemu session and he stepped through the code. It's sort of a shame that I hadn't done it myself, as I thought about it 2 weeks ago.

This bug is actually much more serious than the previous one. While the previous one affected only hand-crafted assembly code, this one should hit compiled code as well: the handling of the carry flag in the subxcc instruction is wrong. And, yes, it's a RISC architecture, so this instruction is also used for comparison...

I'm really astonished that Linux/sparc has been working under qemu for years. Of course Linux may be just more robust, but it also may mean that gcc doesn't use some sparcv8 instructions, and is therefore inefficient.
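
For reference, a minimal Python model of what subxcc (subtract with carry, setting the condition codes) is expected to compute. It is not the qemu code and does not show what exactly was wrong there; the function names are made up. The second function spells out the carry formula from the sign bits the way I remember it from the SPARC V8 manual, so treat that part as an assumption to check against the manual.

```python
MASK32 = 0xFFFFFFFF


def subxcc(rs1, rs2, carry_in):
    """Return (result, flags) for rs1 - rs2 - carry_in on 32-bit values."""
    rs1 &= MASK32
    rs2 &= MASK32
    full = rs1 - rs2 - carry_in          # unbounded Python integer
    result = full & MASK32
    a, b, r = rs1 >> 31, rs2 >> 31, result >> 31
    flags = {
        "N": r,
        "Z": int(result == 0),
        # Signed overflow: the operands have different signs and the sign
        # of the result differs from the sign of rs1.
        "V": (a & (b ^ 1) & (r ^ 1)) | ((a ^ 1) & b & r),
        # Carry means "borrow": the unsigned subtraction went below zero.
        "C": int(full < 0),
    }
    return result, flags


def carry_from_sign_bits(rs1, rs2, result):
    """Borrow computed from bit 31 only (formula recalled from the V8 manual)."""
    a, b, r = (rs1 >> 31) & 1, (rs2 >> 31) & 1, (result >> 31) & 1
    return ((a ^ 1) & b) | (r & ((a ^ 1) | b))


result, flags = subxcc(0, 0, 1)
print(f"{result:#010x}", flags)
```

The interesting cases are the ones where the incoming carry alone causes the borrow, like 0 - 0 - 1: a flag computed only from comparing rs1 with rs2 misses them.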

Saturday, October 10, 2009

The OBP author is here and still cares

Mitch Bradley (the OBP author) explained how the OBP space* commands work. While the emulation already handles these commands properly, it is nevertheless great to know that the father still cares about his child.

And, speaking of children, Mitch is also the author of the OLPC firmware and the OLPC Forth tutorial I mentioned before.