Saturday, November 28, 2009

My broken SS-5 is just too fast

Few days ago I wrote I have a world's fastest broken SS-5. The problem is that it is so fast that this alone makes it broken.

It looks like at least some PromDiag/POST/OBP tests fail just because qemu doesn't emulate cpu cycle-exact. It can be they wait that an irq would happen while they execute like 100 nops, but qemu nop is much faster than a real one, so an irq comes too late. "nop" is just an example here, I didn't disassemble the tests yet, but it looks very much like it: the timer test passes if I make the timer tick 256 times faster.

Probably the other tests fail due to the same reason. So the OBP timer/irq tests are probably useless.

Sunday, November 22, 2009

Hidden OBP feature found

debugging the initial Power-On-Self-Test of OBP 2.29 I found a secret level a cool undocumented feature, PromDiag. Whenever I turn it on, instead of getting a usual OBP "OK" prompt I get:

PromDiag
NOK>

I wonder what is "NOK"? Does it mean "Not OK"? Anyway, I played with it a little. It runed out that it can launch single POST tests, and there are some more features, which have to be discovered yet. All in all it accepts just a few symbols: numbers, dot, comma, c, h, l, q, r, s:

Saturday, November 21, 2009

IRQ/Timer puzzles

I've got two puzzles a puzzle concerning slavio irq/timer behavior:
  • qemu doesn't seem to behave as specified in the slavio documentation, I get an irq when I expect none.(no, it's ok, my test was just wrong)
  • a real SS-20 doesn't seem to behave as specified in the slavio documentation, I don't get an irq, when I expect one.


I already found some places where the documentation is not precise, for instance it claims that reserved bits "read as 0, write has no effect", but they don't always read as 0, (may be they aren't really reserved?).

I miss my oscilloscope and direct access to the hw. If someone has a sun4m machine and an oscilloscope, please get in touch!

Sunday, November 15, 2009

Lucky bug

After submitting the performance/irq fix upstream it turned out the fix should have never worked! I missed a logical "not" in the expression, and did exactly the opposite to what I intended, clearing all the irqs which had not to be cleared, and not clearing the irqs which had to be cleared.

The fact that this wrong code is working means that for some unknown reasons, the interrupts are additionally raised and cleared somewhere else. For the timer it's 99.5% of interrupts: without the improper fix I get ~ 100 spurious interrupt complains per second, with the improper fix it is 1 complain every 2 seconds.

And the fact that the wrong code improves the emulation (NetBSD 1.3.x-1.5.x is working) means there are some counterpart bugs in the code...

Saturday, November 14, 2009

The World's fastest broken SS-5

Fixed a bug in the IRQ routing and now I have a machine gun, ho-ho-ho the World's fastest [broken] SparcStation-5! According to the Solaris 2.6 and Solaris7 output, it's faster than 1 GHz:

cpu0: FMI,MB86907 (mid 0 impl 0x0 ver 0x4 clock 1083 MHz)


Remember, last week I told that after fixing the performance problems I'm going to get back in the XXI century? Well, I lied. I did another quick stop in the past:

WARNING: clock gained 3987 days -- CHECK AND RESET THE DATE!


Guess, which OS is it?

Thursday, November 12, 2009

sparc64's name is Legion

Recently I get a lot of questions about sparc64 emulation in qemu. The only answer I can give, is the same one as "The Zombies" sang in 1960s: "She's not there".

But there is another Open Source (the project's page claims it is CDDL, in the sources I've seen GPL) project which targets emulating Sparcs. Actually, OpenSparc. So, if you are interested in the Solaris 10+ emulation, take a look in the Project Kenai's Legion Sparc Simulator.

If you already have a 64 bit Solaris machine, you can download a pre-built all-in-one (including the Solaris 10 image) package here.

The bad news are, there is no network card emulation, and currently build doesn't work under Linux. Should work under the x86 Solaris though, so it is not completely useless. Also it should be possible to port it to linux, since SunStudio is also available there.

But for now I'd be sticking to 32 bits and qemu.

Saturday, November 7, 2009

Another week - another Solaris version (tm)

I'm still in the 20th century, but making progress.

SunOS Release 5.7 Version Generic_106541-08 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1999, Sun Microsystems, Inc.

# uname -a
SunOS 5.7 Generic_106541-08 sun4m sparc SUNW,SPARCstation-5
# ls -l /
total 122
drwxr-xr-x 2 root sys 512 Oct 15 1999 a

The next stop is going to be 21 century. But going to look at the performance problems first. Waiting 6 hours for the '#' is a bit boring (and the problem is definitely not the CPU speed).

Thanks to Sergey Dionidis (a.k.a sdio @ LOR) for helping to test it.

Friday, November 6, 2009

Things missing in the vanilla qemu

Things which can be fixed in the vanilla qemu:

For OBP:

- Floppy. Instead of fixing it, I broke it completely, so OBP doesn't try to initialize it and hang. Actually it maybe not the fdc itself, but the irq handling. There are OBP tests which may help to understand what is currently going wrong. I didn't need it, does it actually work with OpenBIOS?

- [SparcStation-5] 0x6e000000 AFX. OBP tries to access it and fails with "unassigned address exception".
- [SparcStation-20] 0xef8010000 DBRI, 0x9000X00X FCode SIMMs. Same problem here.

AFX, DBRI and FCode SIMMs can be implemented as stubs. Yet better would be if SBUS probing would do a proper fault. This devices are optional.

Solaris 2.5.1 - 7 have problems with

- interrupt handling. Due to errors in irq handling, the boot takes ~7 Hours. Working on it.
- MMU (?). Solaris tries to access memory after translation failed. Actually Debian/linux has similar problems, but it ignores traps, while Solaris doesn't.
- MMU (?). The message "hsfs_putpage: dirty HSFS page" means that a page was modified, although it wasn't supposed to. May have to do with the cacheabilty tweaking.
- [SparcStation-20] PAC. Solaris hangs where it would normally say that physical address cache is enabled.

Additionally Solaris 8-9 have problems with

- Spurious interrupts.

Nice to haves:
- The ability to send STOP-A to the serial console. Would greatly help to use Solaris kernel debugger (kadb) when the kernel hangs.

- Network boot. Looks like something which can easily be fixed. Currently it fails with the message
Internal loopback test -- Wrong packet length; expected 36, observed 64

Last updated on 15.12.2009.

Sunday, November 1, 2009

Another week - another Solaris version

After re-fixing the bug I fixed before, and fixing the third one in the Sparc CPU emulation, I got Solaris 2.6 going. This version doesn't say how much did the clock gain since the release, so I can not estimate, how good am I doing in comparison to the reference 4900 days. Probably it was released in year 1997 on July the 18th.

SunOS Release 5.6 Version Generic [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.

NOTICE: SBus clock frequency out of range.
# ls -ld /a
drwxr-xr-x 2 root sys 512 Jul 18 1997 a

It also complains that

NOTICE: hsfs_putpage: dirty HSFS page

this may mean the current qemu workaround for non-emulating CPU cache is not good for Solaris. On the other side, who needs the hsfs module :).

Again, thanks Carey for the Solaris 2.6 disk!

Saturday, October 31, 2009

Playground extension

Carey Schug did me another favor. This time he provided access to a SparcServer-20 which he has at home! Now I can compare a virtual SS-20 with a real one. So, the little bugs, beware of me!
Thanks, Carey!

Sunday, October 25, 2009

Another small improvement in SCSI emulation

The message

Error: Inquiry (STANDARD) buffer size 5 is less than 36 (TODO: only 5 required)

was quite annoying, so I attacked it. I used to have a hack which explicitly implemented
inquiry with the allocation size length == 5.
But it turned out that the clean fix is quite trivial, the specification says "if the allocation length of the command descriptor block (CDB) is too small to transfer all of the parameters, the additional length shall not be adjusted to reflect the truncation", so the clean fix for this problem is not longer than the code telling about "TODO". :) Will send the patch upstream. Now probe-scsi in OBP looks really nice:

ok probe-scsi
Target 0
Unit 0 Disk QEMU QEMU HARDDISK 0.11
Target 1
Unit 0 Disk QEMU QEMU HARDDISK 0.11
Target 2
Unit 0 Removable Read Only device QEMU QEMU CD-ROM 0.11
ok

Saturday, October 24, 2009

Greetings, Professor Falken

Success! I've managed to boot Solaris 2.5.1/sparc under qemu! It takes long. I started it on my machine (E8200@2.66GHz) yesterday at 18:31, and today at 03:24, I finally got the "#":

WARNING: clock gained 4900 days -- CHECK AND RESET THE DATE!
# ls
a devices kernel opt root.proto var
bin etc kvm platform sbin
cdrom export lib proc tmp
dev home mnt reconfigure usr
# uname -a
SunOS 5.5.1 Generic sun4m sparc SUNW,SPARCstation-20

Woo-hoo! Currently I can boot it in a single user mode only, as in normal mode it fails on non-existing SX framebuffer.

Does anyone know how to change module "exclude" list from the adb session?

P.S. 4900 days - didn't notice first, that the date was so special. Hope to boot Solaris 9 earlier than 4900 days after its release date. :)

Saturday, October 17, 2009

Can't invoke /etc/init, error 14

Solaris boots the kernel now. The next stop is booting /etc/init . With Solaris 2.6 I get

Can't invoke /etc/init, error 14
panic:icode


After searching the Net, I found that the error number is 2 if /sbin/init is missing, and 8 if /sbin/init has an incorrect executable format. But what is 14? Is this the errno 14, aka EFAULT? Is this a message from the init, or from the kernel?

Sunday, October 11, 2009

The second bug in the qemu sparc CPU emulation

Mitch Bradley found a bug in the Sparc CPU emulation. I gave him access to my qemu session and he stepped through the code. Is sort of shame, I haven't done it myself, as I thought about it 2 weeks ago.

This bug is actually much more heavy than the previous one. While the previous one affected only the hand crafted assembly code, this one should hit the compiled code as well: the handling of carry flag in subxcc instruction is wrong. And, yes, it's RISC architecture, so this instruction is also used for comparison...

I'm really astonished that Linux/sparc is working under qemu since years. Of course Linux may be just more robust, but it also may mean that gcc doesn't use some sparcv8 instructions, and is therefore inefficient.

Saturday, October 10, 2009

The OBP author is here and still cares

Mitch Bradley (OBP author) explained how OBP space* commands are working. While the emulation is meanwhile working properly for this commands, it is nevertheless great to know that the father still cares about his child.

And, speaking of children, Mitch is also the author of OLPC firmware and OLPC Forth tutorial I mentioned before.

Sunday, October 4, 2009

A little improvement of SCSI disk emulation

NetBSD complained:

sd3: mode sense (4) returned nonsense; using fictitious geometry

SunOS 4.1.4 complains:

sd3: non-CCS device found at target 0 lun 0 on esp0

I was hoping that this is the same bug, so I tried to investigate. It's always easier to work when the sources are available. It turned out that NetBSD expects a block descriptor for mode pages, which wasn't implemented in qemu, and is probably mandatory for SCSI-2 disks. The specs are unclear: they don't explicitly say "optional" or "mandatory" in chapters 9.1.2 and 9.3.3.

Anyway, I implemented the block descriptor, and the NetBSD bug is gone. But this was in vain: the SunOS bug is still there, the Solaris bootblk problem is also un-affected.

Will send the patch upstream though. A small improvement is still an improvement.

Saturday, September 12, 2009

Solaris 2.5.1 and 2.6 install disks

Carey Schug provided me his spare 2.5.1 and 2.6 Solaris install disks, so I can extend my playground. Solaris 2.6 fails exactly the same way as 9 (bootblk: can't find the boot program). The 2.5.1 version fails differently:
(Can't deduct msgbuf from physical memory list) Program terminated

The problem is known with real SS-10 / SS-20 machines, and solution supposed to be moving SIMMs from one slot to another one. Not sure, that it would work with virtual SIMMs as well...

But, anyway this is a rare case when more bugs/error messages is better. Thanks, Carey!

Sunday, September 6, 2009

The "Wh": SunOS 4.1.4 under qemu-system-sparc

Ha! OBP already has shown one advantage: it can boot SunOS 4.1.4 (aka Solaris 1.1.2)!
Well, almost. It has problems with the serial console after booting. It is possible to give commands, but I see only the first 2 bytes of response:

SPARCstation 20 (1 X 390Z50), No Keyboard
ROM Rev. 2.25, 64 MB memory installed, Serial #0.
Ethernet address 52:54:0:12:34:56, Host ID: 72000000.

ok boot disk2:d
Boot device: /iommu/sbus/espdma@f,400000/esp@f,800000/sd@2,0:d File and args:
root on /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@2,0:d fstype 4.2
Boot: vmunix
Size: 868352+2319136+75288 bytes
SuperSPARC: PAC ENABLED
SunOS Release 4.1.4 (MUNIX) #2: Fri Oct 14 11:09:07 PDT 1994
Copyright (c) 1983-1993, Sun Microsystems, Inc.
cpu = SUNW,SPARCstation-20
mod0 = TI,TMS390Z50 (mid = 8)
mem = 65084K (0x3f8f000)
avail mem = 60309504
Ethernet address = 52:54:0:12:34:56
espdma0 at SBus slot f 0x400000
esp0 at SBus slot f 0x800000 pri 4 (onboard)
sd2: non-CCS device found at target 2 lun 0 on esp0
sd2 at esp0 target 2 lun 0
sd2:
sd2: Vendor 'QEMU', product 'QEMU', (unknown capacity)
sd3: non-CCS device found at target 0 lun 0 on esp0
sd3 at esp0 target 0 lun 0
sd3:
ledma0 at SBus slot f 0x400010
le0 at SBus slot f 0xc00000 pri 6 (onboard)
bpp_attach unit 0: register check failed!
zs0 at obio 0x100000 pri 12 (onboard)
zs1 at obio 0x0 pri 12 (onboard)
fdc: no RQM - stat 0xc0
fdc: no RQM - stat 0xc0
SUNW,fdtwo0 at obio 0x700000 pri 11 (onboard)
rd0: using preloaded munixfs
WARNING: preposterous time in file system -- CHECK AND RESET THE DATE!
root on rd0a fstype 4.2
swap on ns0b fstype spec size 58616K
dump on ns0b fstype spec size 58604K

What would you like to do?
1 - install SunOS mini-root
2 - exit to single user shell
Enter a 1 or 2: 2
you may restart this script by typing
# reboot


The things marked with light gray is what I don't see, but what is there on a real SS-20. Thanks, Carey for spending the time to find this out!
But the things are there: if I answer "2" I obviously get into the miniroot and when I type "reboot" afterwards the machine successfully reboots.

So, it is a success, but with open issues:

  • What is wrong with the serial console? I tested the mode 9600E1 (which is used by SunOS 4.1.4) under NetBSD and it looks totally fine there. Also I tried to switch parity off under SunOS, and it made no difference.
  • Why SCSI devices are detected as non-CCS? They must be compatible with SCSI-2, or even partially with SCSI-3.
Update 15.08.2010: The SunOS 4.1.4 can be successfully booted after my last patch.

Saturday, September 5, 2009

Yes, I did it! OBP is functional under qemu!

Wrote another hack, and now I can use OBP under qemu! Woo-hoo! All in all it took just 7 weekends. :)

The bad news is that.... It doesn't do much better than OpenBIOS. I can boot Linux, and NetBSD (this one is more complex, as OBP checks disklabel, and NetBSD miniroots don't have it), but booting Solaris 9 gives...
...the very error message as under OpenBIOS:

bootblk: can't find the boot program

I still think it is a progress: OBP has a debugger, while OpenBIOS doesn't. If nothing else helps I can step through the boot loader.

And anyway the effort wasn't useless: I found one bug in CPU, and three in scsi layer.

Third bug in SCSI layer (esp) fixed

Up to now "select without attention" was handled the same way as "select with attention". According to NCR53C9X documentation, select without ATN sends the CDB (Command Descriptor Block) directly, whereas select with ATN sends one message phase byte followed by 6, 10, or 12 command phase bytes. This one byte was shifting CDB and producing invalid commands. After fixing this bug scsi probe looks like this:

ok probe-scsi
Target 0
Unit 0 Disk
Target 2
Unit 0 Removable Read Only device
ok

It still doesn't show all the target properties, but it doesn't matter. The next stop is booting.

Sunday, August 30, 2009

second (and third) bugs in SCSI (esp) emulation

It looks like "Message Accepted" shouldn't write a response. At least ESP_RFLAGS must definetely be 0.

After I fixed the bug, OBP got one step further. Now it sees the targets:

ok probe-scsi
Target 0
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command e0
scsi-disk: Unsupported command length, command e0
Target 2
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command 60
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command c0
scsi-disk: Unsupported command length, command e0
scsi-disk: Unsupported command length, command e0
ok

Next stop is inquiring targets parameters.

Saturday, August 29, 2009

got past scsi-controller initialization

got past scsi-controller initialization. The next stop is disks probing:

ok probe-scsi
Extra scsi data. Fatal error.Extra scsi data. Fatal error.
ok

Sunday, August 23, 2009

Sun Studio for free

Currently there are two options to get Sun Studio for free:

- Everyone can have Sun Studio 12 update 1. There are Solaris/sparc, Solaris/intel and Linux/i686 versions. There seems to be compatibility issues with ld on newer linux distributions. The error message reads "libm format not recognized". The half official solution is

rm /opt/sun/sunstudio12/prod/lib/amd64/ld
ln -s /usr/bin/ld /opt/sun/sunstudio12/prod/lib/amd64/ld

Also there are problems with headless install under Linux. But it is possible to extract all the rpms with the --extract-installation-data command line option.

- OpenSolaris developers may get the version 10 here. But only the Solaris versions, not the Linux one. I wonder why would they need to mess with the older version 10, as there is a shiny new 12u1? Are there any known compatibility issues in the 12u1?