Saturday, July 30, 2011

Of Course, It Runs NetBSD!™

NetBSD boot was almost a piece of cake. It tries to detect more things than Solaris and Linux, so I had to implement a couple of device registers more. At the first glance using more registers contradicts with the declared portability. On another hand, it would work on some modified/weird chipsets having non-standard interrupt controllers. Don't know if such chipsets were ever produced though.

NetBSD 4.0.1 (INSTALL) #0: Wed Oct  8 01:13:04 PDT 2008
        builds@wb32:/home/builds/ab/netbsd-4-0-1-RELEASE/sparc64/200810080053Z-obj/home/builds/ab/netbsd-4-0-1-RELEASE/src/sys/arch/sparc64/compile/INSTALL
total memory = 256 MB
avail memory = 234 MB
timecounter: Timecounters tick every 10.000 msec
mainbus0 (root): QEMU,Ultra-3/2: hostid 80000000
cpu0 at mainbus0: SUNW,UltraSPARC @ 100.681 MHz, UPA id 0
cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
...
# ping 10.0.2.2
PING 10.0.2.2 (10.0.2.2): 56 data bytes
64 bytes from 10.0.2.2: icmp_seq=0 ttl=255 time=1.575 ms
64 bytes from 10.0.2.2: icmp_seq=1 ttl=255 time=1.150 ms
^C
----10.0.2.2 PING Statistics----
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.150/1.363/1.575/0.301 ms
#

Haven't found any CPU bugs so far, only interrupt processing in the serial port (not relevant to [Open]Solaris). Surprisingly sparc32 and sparc64 serial drivers diverge quite a lot.

Next stop - FreeBSD/sparc64.

Sunday, July 17, 2011

Hiding the boot device from an OS

Suspecting a bug in SCSI subsystem, I was wondering how to prevent an OS from detecting the device it's booting from. At first it sounds weird, but it's possible for most of OS/distributions in the SPARC (and probably PPC) world, because they
  • boot  init scripts from a RAM disk
  • use IEEE 1275 device tree to find out available devices *
*) with exception of  Linux. At least the Linux uniform IDE driver tries to detect a hardware by probing.

For FreeBSD, NetBSD, OpenBSD and Solaris it's possible to hide the boot device from the OS install CD with the following commands:

ok cd scsi " unknown" name device-end
ok boot /unknown/sd@6,0:f

Use the Forth, Luke!

Sunday, July 10, 2011

User mode emulation for Linux/SPARC64

As you know, qemu has a user-mode emulation. This means binaries for one CPU (for example SPARC) can be executed on another CPU (for example i686) under the same OS. The system calls are executed directly on the host (which means they are executed as fast as for native binaries), and the executable code itself is translated with TCG.
After I fixed ELF loading for SPARC64 binaries, qemu can load not only static Linux/sparc64 binaries, but dynamically linked ones too. To achieve that qemu has to be statically linked (it may sound confusing, for launching statically linked binaries qemu doesn't have to be built statically but for the dynamically one it has to be) and chrooted to the guest OS file system image:
 qemu$ ./configure --target-list=sparc-linux-user,sparc64-linux-user,sparc32plus-linux-user --static && make
 ...
 qemu$  mv -i sparc32plus-linux-user/qemu-sparc32plus ../debian-6-sparc64-initrd/
 qemu$  su
 Password:
 #  /usr/sbin/chroot ../debian-6-sparc64-initrd/ /qemu-sparc32plus -L . /bin/busybox
BusyBox v1.17.1 (Debian 1:1.17.1-8) multi-call binary.
Copyright (C) 1998-2009 Erik Andersen, Rob Landley, Denys Vlasenko
and others. Licensed under GPLv2.
See source distribution for full notice.

Usage: busybox [function] [arguments]...
   or: function [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, [[, ar, ash, basename, blockdev, cat, chmod, chown, chroot, cp, cut, dd, df, dirname, dmesg, dnsdomainname, echo, egrep, env, expr, false,
        find, free, freeramdisk, grep, gunzip, halt, head, hostname, id, init, ip, kill, klogd, ln, logger, ls, md5sum, mkdir, mknod, mkswap, modinfo,
        more, mount, mv, nc, pidof, pivot_root, poweroff, printf, ps, pwd, readlink, realpath, reboot, rm, rmdir, route, sed, sh, sleep, sort,
        swapoff, swapon, sync, syslogd, tail, tar, test, tftp, touch, tr, true, tty, udhcpc, umount, uname, uniq, wc, wget, zcat


This is a great tool to find CPU bugs! One can use existing binaries for example to check that emulated CPU produces the same md5 or sha512 sum for a certain binary as the host does, or pack/unpack using gzip and bzip, or just observe weird unames:

 /usr/sbin/chroot ../debian-6-sparc64-initrd/ /qemu-sparc32plus -L . /bin/uname -a
Linux localhost 2.6.34.9-69.fc13.x86_64 #1 SMP Tue May 3 09:23:03 UTC 2011 sun4 GNU/Linux

Saturday, July 9, 2011

NetBSD vfs and MMU emulation

Since Solaris works perfectly, I turned again to NetBSD for the further test cases.

root on md0a dumps on md0b
root file system type: ffs
WARNING: clock gained 1005 days
WARNING: CHECK AND RESET THE DATE!
exec /sbin/init: error 8
init: trying /sbin/oinit
exec /sbin/oinit: error 2
init: trying /sbin/init.bak
exec /sbin/init.bak: error 2
init: not found panic: no init
cpu0: kdb breakpoint at 1362e60
Stopped in pid 1.1 (init) at 0x1362e64: nop
db>
Hmm what would be a reason for such a behavior? Sounds familiar, right? I hear you saying CPU math bug? Wrong. :)
The previous time I saw the message it was really a math bug. This time it's a MMU problem. Unlike SPARC v8, the v9 has a NFO mode for page mappings, which means No Fault Only. The 'only' part is crucial: all other page loads should fault.

Solaris knows what pages has been marked NFO (after all they can be marked by the OS only) and uses only the appropriate loads.

On the other side NetBSD vfs implementation explicitly provokes such faults to load RAM pages from a file. Had to read the NetBSD documentation to understand how its vfs works. The sources are not that good documented.

After fixing the bug (a sort of a counterpart to the one Tsuneo Saito mentioned on the mailing list), NetBSD gets a bit further:

Kernelized RAIDframe activated md0:
internal 5120 KB image area
root on md0a dumps on md0b
root file system type: ffs
WARNING: clock gained 1005 days
WARNING: CHECK AND RESET THE DATE!
erase ^?, werase ^W, kill ^U, intr ^C

The only part is missing here is the '#' prompt after the message. :) Working on it.

Saturday, July 2, 2011

More math bugs

 Why would one file system work, and another one - not?

Because the fs driver is buggy? Of course not! The ones who read this blog on a regular basis, know that inability to read the file has to do with a bug in a math emulation.
Thanks to Jakub's test case, I fixed two bugs with carry flag handling (yes, again). Now HelenOS/sparc64 boot looks like this:


My impressions of HelenOS - it's neat, the sources are good documented and easy readable. Also I needed just a few minutes to set up cross compiling under Linux/x86_64. So, if you need a small, micro-kernel(!) and multi-arch (amd64, arm32, ia32, ia64, mips32, ppc32, sparc64) OS to play with, HelenOS is definitely worth looking at.

Saturday, May 21, 2011

Now finally something new

Loading: /platform/sun4u/boot_archiveramdisk-root ufs-file-system
Loading: /platform/sun4u/kernel/sparcv9/unix
module /platform/sun4u/kernel/sparcv9/unix: text at [0x1000000, 0x10bf34d] data at 0x1800000
module /platform/sun4u/kernel/sparcv9/genunix: text at [0x10bf350, 0x12b5c7f] data at 0x1865e00
module /platform/sun4u/kernel/misc/sparcv9/platmod: text at [0x12b5c80, 0x12b5c97] data at 0x18bac30
module /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-II: text at [0x12b5cc0, 0x12c2a37] data at 0x18bb2c0
SunOS Release 5.11 Version MilaX_0.3.2 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
os-io Ethernet address = 52:54:0:12:34:56
Using default device instance data
mem = 262144K (0x10000000)
avail mem = 154615808
...
Preparing live image for use
Hostname: milax
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

May 20 07:26:42 su: 'su root' succeeded for root on /dev/console
-bash: /usr/sbin/quota: No such file or directory
Sun Microsystems Inc.   SunOS 5.11      MilaX_0.3.2     October 2008
-bash: /bin/mail: No such file or directory
-bash: id: command not found
-bash: id: command not found
-bash: [: !=: unary operator expected
(root@milax)# uname -X
System = SunOS
Node = milax
Release = 5.11
KernelID = MilaX_0.3.2
Machine = sun4u
BusType =
Serial =
Users =
OEM# = 0
Origin# = 1
NumCPU = 1

(root@milax)# uname -a
SunOS milax 5.11 MilaX_0.3.2 sun4u sparc sun4u
(root@milax)#


The missing files above are caused by the maintenance mode which in turn is caused by the missing network card. So the next steps are obvious:
- plug in a supported NIC
- find a customer for the sun4u emulation
- ???
- PROFIT!

PS. Btw, are there any live OpenSolaris/Illumos based live CDs newer than MilaX 0.3.2? The later MilaX versions seem to be PC-only.

Seen a really broken pipe

Have you seen a broken pipe? Sounds like a stupid question for everyone who is working with *NIX. Everyone who has English as a mother tongue, or is old enough to use a non-localized OS has seen a "broken pipe" message. A younger generation might have seen the message in their native language.

Well, that's not the sort of brokenness  I'm talking about.  I mean it more literary:

# echo "This pipe works fine" |cat
This pipe works fine
# echo "And this pipe is veeeeeeeeeeeeeeeeeeeery broken. This pipe is really very broken. Broken." | cat
Ths pip eisveTkebn

Ops. Had to spend a lot of of time in Solaris ascending from SCSI driver (at first I didn't realize that the bug appears in pipes, it looked like a DMA bug) to streams, pipe and so on and then descending to the memcpy which was the source of trouble. It turned out, memcpy uses different routines for small and large data chunks. The small chunks are copied in a loop word-wise and the large ones are copied using VIS instructions (that's SPARC equivalent for Intel's MMX). The emulation of VIS instructions in qemu is buggy, so the memory gets corrupted when these instructions are used.

Once again I'm very impressed that Solaris 2.5.1 - Solaris 7 boots without problems on such a broken hardware. Let's see if  the newer Solaris versions would work with this emulation bug fixed.

Saturday, April 23, 2011

OpenBIOS strikes back

Mark did it! OpenBIOS svn.r1035 with two patches (escc and lance) can boot a sun4m Solaris. As soon as the patches are reviewed and committed I'll have to update the how-to. Booting Solaris is going to be much simpler. Congratulations the OpenBIOS Team!

Friday, April 22, 2011

Seen a usable #

Well sort of:

Loading: /platform/sun4u/ufsboot
Size: 330820+55556+67364 Bytes
SunOS Release 5.7 Version Generic_106541-08 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1999, Sun Microsystems, Inc.
Ethernet address = 52:54:0:12:34:56
Using default device instance data
# uname -X
System = SunOS
Node =
Release = 5.7
KernelID = Generic_106541-08
Machine = sun4u
BusType = <unknown>
Serial = <unknown>
Users = <unknown>
OEM# = 0
Origin# = 1
NumCPU = 1
# ls
a           dev         kernel      opt         proto        tmp
bin         devices     lib        platform    reconfigure  usr
cdrom       etc         mnt         proc        sbin         var

So, the Solaris 2.6 and 7 kernels are functional in a minimal mode (the 2.5.1 and 8+ hang on device detection). The single user mode is not there yet though:

...
ld.so.1: internal: malloc failed
Killed
FATAL ERROR: / file system type "" is unknown
             Exiting to shell.
#

Nice is that Solaris engineers did a really good job making the OS robust. The message above comes from a branch in a script which has a following comment:

            # "this never happens" :-)

Expect the unexpected, well done!

Now, the bad news: MilaX 0.3.2, and probably other OpenSolaris distributions are eager to play with the E-Cache, which qemu doesn't emulate. If I find the way to tell MilaX that there is no cache, it probably would get it up to the command line too. The old Solaris versions had a '-n' boot parameter to switch the cache off, but MilaX says this boot parameter is invalid.

Gonna take a break now. (Happy Easter, everyone!) The next stop is a working single user mode. Stay tuned!

Saturday, April 16, 2011

I've seen the # but it doesn't count

It turned out that last weekend I didn't specify the contest conditions good enough.

Now I have to tell that I've seen the boot prompt under qemu-system-sparc64, but being fair, it doesn't count yet, because
a) only Solaris 2.5.1 - 7 get it up to there. The goal for 64 bit machine is certainly OpenSolaris (or maybe "just" Solaris 10? What do you think?).
b) the system hangs right after printing the prompt.

The next goal is therefore a usable # prompt, not just the hanging one. :-) But stay tuned, I'm getting closer...

Saturday, April 9, 2011

Whose house?

I occurred to me that I'm sort of competing with Mark Cave-Ayland. While Mark makes a really good job to get a proprietary OS (Solaris 8) working with an Open Source firmware (OpenBIOS) under a 32 bit SPARC qemu machine, I'm trying to get a Open Source OS (OpenSolaris) working with a proprietary firmware (OBP) under a 64 bit SPARC qemu machine. Right now we are at about the same place - he is right after the disk detection, I'm a bit behind.

Who's gonna see the '#' first, what are your bets? What task do you think is easier?

But speaking seriously, if Mark makes it first I'll buy him a beer.

Sunday, March 27, 2011

Round and round she goes, and where she stops nobody knows

After a while I got back to qemu. This time to the sparc64 port. It's interesting to see how the highlights of the sun4m story are repeating. With the difference that this time the way goes faster. The highlights of the last weekends:
  • What? Not even a greeting message? I should give it up. hack, hack, hack
  • Oh, wow, it gives a message "Button Power ON"! Too bad it hangs afterwards. hack, hack, hack
  • Ha! got it up to the Forth bootstrapping, too bad I have no prompt. hack, hack, hack
  • Yes! The Forth is with me, there finally is the "ok " prompt. Too bad it sees no devices. hack, hack, hack
  • Woo-hoo, OBP sees a disk controller! Why doesn't it see the disks? hack, hack, hack
  • Aha, there are my disks. Can I boot Solaris, please? Hrm, no. Can I boot anything? No?!? hack, hack
  • Ok, now it's properly initialized. Well, properly enough to boot SILO, and yes, it's stuck at ufsboot just like the sparc32 version did.
  • Ta-da! Finally I see something new:
SunOS Release 5.11 Version Natamar_0.4__b96 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
  • But wait, didn't I see it with OpenBIOS?!? Yes, I did. In fact all the 64 bit SPARC operating systems I tried so far go a little further with OpenBIOS.
  • So, can OBP on qemu-system-sparc64 do anything the OpenBIOS can't?!? No?!? (because badly documented missing devices, probably necessary just for OBP, are badly hacked by me). I should give it up. :-)
And this is the progress so far. Hats off to the HelenOS team who managed to get their OS running under qemu. No other OS can work under qemu-system-sparc64 yet.

Does anyone have a Martux 0.2 image? It used to be available on the authors site, but seems to be gone, leaving just the check sum files. The file I'm looking for is
CD_sun4u_sparcv9__marTux_0.2__small_naked_demo_cd_bs2048b.iso.bz2 .

Saturday, February 12, 2011

Sent a small patch to OpenBIOS

Now there is a few bytes of my code in the OpenBIOS project as well.
Switching perspectives is an interesting experience: while debugging qemu and seeing something what I don't understand immediately, the first thought is that some hw device is not properly emulated.
Debugging OpenBIOS I'm pretty much sure that hw emulation is fine.

Anyway, fixed one bug. Solaris still crashes on the boot,  but now it should be possible to switch off the mapping of the page 0x0.

In case you are wondering what this would be good for, there are 2 good reasons: a) being compliant to IEEE-1275 SPARC supplement, and more importantly b) ability to easy catch a null pointer dereferences.

The trivial rest of the Solaris boot exercise is left to Mark, Blue and other OpenBIOS developers. :-)