Saturday, September 15, 2018

Back to SPARC


Man proposes, God disposes.

I planned to finish the work on the 40p emulation, but due to some turbulence I have had absolutely no chance to work on QEMU since last December. Now it looks like I may have some time for QEMU again.

Now, while running AIX under QEMU has been a nice brain and finger exercise, my mailbox and the comments on this blog suggest that people are more interested in Solaris than in AIX. So I plan to dig out my Ultra-1 sun4u prototype from 2012, adapt it to the current QEMU object model and make it public (no ETA though).

/ Stay tuned

Saturday, December 2, 2017

AIX 5.1 boots under QEMU

Yoo-hoo! I did it again. This is now the second achievement in the emulation of proprietary (aka real) UNIX systems. Solaris/SPARC first ran in December 2009, and now, 8 years later, AIX 5.1 boots under QEMU. Even the S3 Trio framebuffer works, thanks to Hervé. Looks pretty cool. Once I have X Window running I’ll make a screencast. For now, just a teaser:

AIX5.1 under qemu-system-ppc -M 40p

QEMU PReP, Serial #0, 128 MiB memory installed
Open Firmware , Built  December 01, 2017 16:41:00
Copyright (c) 1995-2000, FirmWorks.
Copyright (c) 2014,2017, Artyom Tarasenko.

Rebooting with command: boot /pci/scsi@1/disk@0,0
Boot device: /pci/scsi@1/disk@0,0  Arguments: 

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups 
 Activating all paging spaces 
0517-075 swapon: Paging device /dev/hd6 is already active.
/dev/rhd1 (/home): ** Unmounted cleanly - Check suppressed
/dev/rhd10opt (/opt): ** Unmounted cleanly - Check suppressed
 Performing all automatic mounts 
Multi-user initialization completed
Checking for srcmstr active...complete
Starting tcpip daemons:
0513-059 The syslogd Subsystem has been started. Subsystem PID is 4408.
0513-059 The sendmail Subsystem has been started. Subsystem PID is 3402.
0513-059 The portmap Subsystem has been started. Subsystem PID is 4646.
0513-059 The inetd Subsystem has been started. Subsystem PID is 5160.
0513-059 The snmpd Subsystem has been started. Subsystem PID is 4904.
0513-059 The hostmibd Subsystem has been started. Subsystem PID is 5936.
Finished starting tcpip daemons.
Starting NFS services:
0513-059 The biod Subsystem has been started. Subsystem PID is 8000.
0513-059 The rpc.lockd Subsystem has been started. Subsystem PID is 7494.
Completed NFS services.
...
AIX Version 5
(C) Copyrights by IBM and by others 1982, 2000.
Console login: root
*******************************************************************************
*                                                                             *
*                                                                             *
*  Welcome to AIX Version 5.1!                                                *
*                                                                             *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*                                                                             *
*******************************************************************************
Last login: Wed Dec 31 18:45:33 CST 1969 on /dev/tty0

#  who -r
   .        run-level 2 Dec 31 18:05       2    0    S                  
# uname -a
AIX localhost 1 5 000000004C00
#

Sunday, September 24, 2017

What's the time?

Previously in this blog: ...two ways from here: either fix the floppy emulation, or make OFW for 40p with no floppy...

... or skip the call. You know, I have an armed debugger here and am not afraid to use it. So just turn the fatal call:


/usr/lib/methods/cfgfda_isa -2 -l fda0

into something harmless, like:

/bin/echo -2 -l fda0

by using

set *(int *) 0x200c11a8 = 0x2f62696e
set *(int *) 0x200c11ac = 0x2f656368
set *(int *) 0x200c11b0 = 0x6f000000
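
The three words are nothing more than the ASCII bytes of "/bin/echo" followed by NUL padding, packed as big-endian 32-bit integers. A minimal Python sketch of how such gdb commands could be generated; the helper name patch_commands is made up for illustration, and only the address 0x200c11a8 comes from the session above:

```python
import struct

def patch_commands(address, text):
    """Return gdb 'set' commands that overwrite memory at `address` with
    `text` plus a terminating NUL, one big-endian 32-bit word at a time."""
    data = text.encode("ascii") + b"\0"
    data += b"\0" * (-len(data) % 4)          # pad to a multiple of 4 bytes
    commands = []
    for offset in range(0, len(data), 4):
        (word,) = struct.unpack(">I", data[offset:offset + 4])
        commands.append("set *(int *) 0x%x = 0x%08x" % (address + offset, word))
    return commands

for line in patch_commands(0x200c11a8, "/bin/echo"):
    print(line)
```

Writing whole words keeps the number of gdb commands small, at the price of clobbering up to three bytes after the terminating NUL, which is harmless when a longer string is being replaced by a shorter one.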

Well, actually it probably should have been "/usr/bin/echo", since there is no "/bin/echo" in the system. But obviously the attempt above was good enough for AIX, as it doesn't really need the floppy disk adapter (nor the mouse and keyboard, which I had to hack in a similar way on the second attempt). This brings AIX here:


Completed method for: fda0, Elapsed time = 0
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/cfgfda_isa:  not found

Method error (/usr/lib/methods/cfgfda_isa -2 -l fda0 ):
        0514-068 Cause not known.
...
exec(/../usr/sbin/lqueryvg,-phdisk0,-L)
exec(/../usr/bin/grep,00000000000000000000000000000000)
exec(/usr/bin/dosread,-S,/preload,/preload)
exec(/usr/lpp/bosinst/datadaemon)
exec(/../usr/bin/sleep,1)

And there it hangs forever. Now the problem is sort of obvious. Yesterday I wrote that the boot log hadn't shown any hint. But it did:


Time: 0 LEDS: 0x539
...
Time: 0 LEDS: 0x78a
...
Completed method for: bus0, Elapsed time = 0
...
Time: 0 LEDS: 0x539
...
Time: 0 LEDS: 0x868
...
Completed method for: scsi0, Elapsed time = 0
...

See? The clock is not ticking. This is probably caused by a QEMU bug: the "loadvm" command sometimes doesn't restore one of the machine timers, and I used that command a lot during yesterday's session.
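
A stuck clock like this is easy to overlook by eye but trivial to detect mechanically. A small Python sketch, assuming only the "Time: N LEDS: 0x..." line format visible in the excerpt above (the function name clock_is_stuck is made up for illustration):

```python
import re

TIME_LINE = re.compile(r"^Time:\s*(\d+)\s+LEDS:\s*(0x[0-9a-fA-F]+)")

def clock_is_stuck(log_text):
    """Return True if the log has several 'Time:' lines that all carry the same value."""
    matches = (TIME_LINE.match(line) for line in log_text.splitlines())
    times = [int(m.group(1)) for m in matches if m]
    return len(times) > 1 and len(set(times)) == 1

sample = """Time: 0 LEDS: 0x539
Time: 0 LEDS: 0x78a
Completed method for: bus0, Elapsed time = 0
Time: 0 LEDS: 0x539
Time: 0 LEDS: 0x868
"""
print(clock_is_stuck(sample))
```

A short boot could legitimately show the same second several times, so a check like this is only a hint, not a proof.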

So basically there are two scenarios:
 - the clock is ticking - in this case AIX doesn't start any methods after spawning the init process
 - the clock is stopped - in this case it starts the methods up to the point where the timeouts become important. If the clock had worked properly, the boot process probably wouldn't have stopped at the floppy detection method.

Which means that the debugging process is getting really complicated. Now I have to debug the kernel scheduler, which is tricky. And it is obviously different from the one in AIX 4.2, which doesn't hang at that point.

The KDB from 5.1 has some features to see the scheduled timers, but I'm not sure it can be used to debug the interrupt handling. At least the Solaris kadb was not good for debugging interrupts, as it had a lot of side effects and usually hung the system right after a breakpoint was set.

So, the good news: most of the devices in QEMU's 40p model are working properly. The bad news: finding a black sheep in a dark room is pretty hard.

Saturday, September 23, 2017

Some experiments with AIX 5.1

Since I could not find the AIX 4.2 install media for Motorola, I gave AIX 5.1 under qemu-system-ppc a shot. The feelings are mixed: on one hand I've got no reference machine to check things against, on the other hand the KDB debugger in AIX 5.1 is much more powerful than the one in 4.2. The initialization process of 5.1 is close to that of 4.2, so I can recognize some structures. Which is good: version 4.2 is quite different from 4.1.4, which I tried first, so I was afraid they had made an equal leap in the 4.x->5.x transition. Well, partially they did. Although the function names are more or less the same, the debugger made a great leap forward.

This stack trace looked like a flashback.

[01DE1BCC]init_pcicfg+000000 (2FF3A910 [??])
[01DE1380]config_pal+000030 (??)
[01DE12F8]config_planar_pal+0001D8 (??, ??)
[004832AC]config_kmod+000184 (??, ??, ??)
[004836E4]sysconfig+000104 (??, ??, ??)
[00003A94].sys_call+000000 ()
[10002668]cfgpal_rspc+0003E8 ()
[100016C0]main+000110 (??, ??, ??)
[10000188]__start+000088 ()

Under 4.2 it was like this:

(gdb) bt
#0  0x018d2b7c in ?? () -- pci_rw
#1  0x00088114 in ?? ()
#2  0x00088114 in ?? ()
#3  0x018d1db0 in ?? () -- init_crashdump 0x018d1d70
#4  0x018cf410 in ?? () -- config_pal  0x018cf35c
#5  0x018cf30c in ?? () -- config_planar_pal 0x018cf100
#6  0x000f9b3c in ?? () -- config_kmod 0x000f9a5c, 2 params, size 0x118
#7  0x000f9eb8 in ?? ()
#8  0x000037a8 in ?? ()

See the formatting differences and gaps? That's because under 4.2 I had to make the trace manually. Did I mention that the 4.2 debugger is eighties style? So, now I'm really enjoying the luxury of having a modern tool.
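
The manual annotation boils down to finding, for each return address, the closest function start below it. A rough Python sketch of that lookup, using only the four function start addresses I noted in the 4.2 trace above (the table is far from complete, and the names SYMBOLS and symbolize are made up for illustration):

```python
import bisect

# Function start addresses taken from the manual annotations in the 4.2 trace.
SYMBOLS = sorted([
    (0x000f9a5c, "config_kmod"),
    (0x018cf100, "config_planar_pal"),
    (0x018cf35c, "config_pal"),
    (0x018d1d70, "init_crashdump"),
])
STARTS = [addr for addr, _ in SYMBOLS]

def symbolize(pc):
    """Map an address to 'name+offset' using the nearest preceding symbol."""
    i = bisect.bisect_right(STARTS, pc) - 1
    if i < 0:
        return "0x%08x" % pc          # below the first known symbol
    start, name = SYMBOLS[i]
    return "%s+%06X" % (name, pc - start)

for pc in (0x018d1db0, 0x018cf410, 0x018cf30c, 0x000f9b3c):
    print("[%08X]%s" % (pc, symbolize(pc)))
```

Without function sizes the lookup happily attributes any address to the last symbol below it, so with a sparse table like this one the result has to be taken with a grain of salt.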

Also there is a possibility to make the output verbose:

KDB(0)> mw enter_dbg
enter_dbg+000000:  00000000  = 42
n_core+000000:  00000032  = .
KDB(0)>

But then there is also some bad news. There are still bugs (or missing features) in QEMU. Even worse, there is at least one Heisenbug. Sometimes it gets to the PCI initialization and sometimes it doesn't. And in the cases where it doesn't get to PCI init it's really unclear why: it just sits in the idle loop, interrupts are enabled, and it receives the interrupts from the timer. It's just that for some reason it thinks there is nothing to do. Debugging such cases is a real nightmare.
So I thought I'd let it go as far as it can in the cases where it does reach PCI init, and look for clues in the log.
No obvious clues, but here goes a pretty long log:

Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'bus0'
 cfgmgr LED{78A}
Time: 0 LEDS: 0x78a
Invoking /usr/lib/methods/cfgbus_pci -1 -l bus0
exec(/bin/sh,-c,/usr/lib/methods/cfgbus_pci -1 -l bus0)
Number of running methods: 1
exec(/usr/lib/methods/cfgbus_pci,-1,-l,bus0)
Breakpoint
.bus_register+000000     mflr    r0                  <01DEADA0>
KDB(0)> g
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -c bus -s pci -t isa -p bus0 -w 88 -L 04-A0 -d)
exec(/usr/lib/methods/define_rspc,-c,bus,-s,pci,-t,isa,-p,bus0,-w,88,-L,04-A0,-d)
exec(/bin/sh,-c,/usr/lib/methods/cfgbus_isa -1 -l bus1)
exec(/usr/lib/methods/cfgbus_isa,-1,-l,bus1)
Breakpoint
.bus_register+000000     mflr    r0                  <01DF8FF0>
KDB(0)> g
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t fda -p bus1 -w PNP0700ffffffff -L 01-B0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,fda,-p,bus1,-w,PNP0700ffffffff,-L,01-B0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t isa_keyboard -p bus1 -w PNP0303ffffffff -L 01-D0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,isa_keyboard,-p,bus1,-w,PNP0303ffffffff,-L,01-D0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t isa_mouse -p bus1 -w PNP0F03ffffffff -L 01-E0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,isa_mouse,-p,bus1,-w,PNP0F03ffffffff,-L,01-E0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t s1a -p bus1 -w PNP05011 -L 01-F0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,s1a,-p,bus1,-w,PNP05011,-L,01-F0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -c adapter -s pci -t ncr810 -p bus0 -w 96 -L 04-B0 -d)
exec(/usr/lib/methods/define_rspc,-c,adapter,-s,pci,-t,ncr810,-p,bus0,-w,96,-L,04-B0,-d)
----------------
Completed method for: bus0, Elapsed time = 0
Return code = 0
***** stdout *****
:devices.isa_sio.IBM000E :devices.isa_sio.PNP0400 :devices.pci.22100020
fda0,sioka0,sioma0,sa0,scsi0

*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'fda0'
Method: /usr/lib/methods/cfgfda_isa not in boot image, configure in phase 2
----------------
Attempting to configure device 'sioka0'
Method: /usr/lib/methods/cfgkm_isa not in boot image, configure in phase 2
----------------
Attempting to configure device 'scsi0'
 cfgmgr LED{868}
Time: 0 LEDS: 0x868
Invoking /usr/lib/methods/cfgncr_scsi -1 -l scsi0
exec(/bin/sh,-c,/usr/lib/methods/cfgncr_scsi -1 -l scsi0)
exec(/usr/lib/methods/cfgncr_scsi,-1,-l,scsi0)
exec(/bin/sh,-c,/etc/methods/define -c disk -s scsi -t osdisk -p scsi0 -w 0,0)
exec(/etc/methods/define,-c,disk,-s,scsi,-t,osdisk,-p,scsi0,-w,0,0)
exec(/bin/sh,-c,/etc/methods/define -c cdrom -s scsi -t oscd -p scsi0 -w 2,0)
exec(/etc/methods/define,-c,cdrom,-s,scsi,-t,oscd,-p,scsi0,-w,2,0)
Number of running methods: 1
----------------
Completed method for: scsi0, Elapsed time = 0
Return code = 0
***** stdout *****
hdisk0 cd0
*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'hdisk0'
Method: /etc/methods/cfgscdisk not in boot image, configure in phase 2
----------------
Attempting to configure device 'cd0'
 cfgmgr LED{723}
Time: 0 LEDS: 0x723
Invoking /etc/methods/cfgsccd -1 -l cd0
exec(/bin/sh,-c,/etc/methods/cfgsccd -1 -l cd0)
exec(/etc/methods/cfgsccd,-1,-l,cd0)
Number of running methods: 1
----------------
Completed method for: cd0, Elapsed time = 0
Return code = 0
*** no stdout ****
*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/deflvm"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/deflvm )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/deflvm:  not found

Method error (/usr/lib/methods/deflvm):
        0514-068 Cause not known.
sh: /usr/lib/methods/deflvm:  not found

----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/fdarcfgrule"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/fdarcfgrule )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/fdarcfgrule:  not found

Method error (/usr/lib/methods/fdarcfgrule):
        0514-068 Cause not known.
sh: /usr/lib/methods/fdarcfgrule:  not found

----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/defssar"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/defssar )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/defssar:  not found

Method error (/usr/lib/methods/defssar):
        0514-068 Cause not known.
sh: /usr/lib/methods/defssar:  not found

 cfgmgr LED{FFF}
Configuration time: 0 seconds
+ 1> /etc/filesystems
+ /usr/lib/methods/showled 0x517
exec(/usr/lib/methods/showled,0x517)
 showled LED{517}
+ bootinfo -b
exec(/usr/sbin/bootinfo,-b)
exec(/usr/lib/boot/bin/bootinfo_rspc,-b)
+ mount -v cdrfs -o ro /dev/cd0 /SPOT
exec(/usr/sbin/mount,-v,cdrfs,-o,ro,/dev/cd0,/SPOT)
exec(/usr/bin/sh,-c,/usr/sbin/wlmcntrl -u -d "" > /dev/null 2>&1)
+ [ 0 -ne 0 ]
+ /usr/lib/methods/showled 0x512
exec(/usr/lib/methods/showled,0x512)
 showled LED{512}
+ /SPOT/usr/bin/rm -r /etc/init /usr/bin /usr/lib/boot /usr/lib/drivers/ataide /usr/lib/drivers/ataidepin /usr/lib/drivers/cfs.ext /usr/lib/drivers/idecdrom /usr/lib/drivers/idecdrompin /usr/lib/drivers/isa /usr/lib/drivers/pci /usr/lib/drivers/planar_pal_rspc /usr/lib/drivers/scdisk /usr/lib/drivers/scdiskpin /usr/lib/methods/cfgataide /usr/lib/methods/cfgbus_isa /usr/lib/methods/cfgbus_pci /usr/lib/methods/cfgidecdrom /usr/lib/methods/cfgncr_scsi /usr/lib/methods/cfgsccd /usr/lib/methods/cfgsys_rspc /usr/lib/methods/chggen /usr/lib/methods/chggen_rspc /usr/lib/methods/define /usr/lib/methods/define_rspc /usr/lib/methods/defsys /usr/lib/methods/showled /usr/lib/methods/ucfgdevice /usr/sbin
exec(/SPOT/usr/bin/rm,-r,/etc/init,/usr/bin,/usr/lib/boot,/usr/lib/drivers/ataide,/usr/lib/drivers/ataidepin,/usr/lib/drivers/cfs.ext,/usr/lib/drivers/idecdrom,/usr/lib/drivers/idecdrompin,/usr/lib/drivers/isa,/usr/lib/drivers/pci,/usr/lib/drivers/planar_pal_rspc,/usr/lib/drivers/scdisk,/usr/lib/drivers/scdiskpin,/usr/lib/methods/cfgataide,/usr/lib/methods/cfgbus_isa,/usr/lib/methods/cfgbus_pci,/usr/lib/methods/cfgidecdrom,/usr/lib/methods/cfgncr_scsi,/usr/lib/methods/cfgsccd,/usr/lib/methods/cfgsys_rspc,/usr/lib/methods/chggen,/usr/lib/methods/chggen_rspc,/usr/lib/methods/define,/usr/lib/methods/define_rspc,/usr/lib/methods/defsys,/usr/lib/methods/showled,/usr/lib/methods/ucfgdevice,/usr/sbin)
...
Attempting to configure device 'fda0'
 cfgmgr LED{828}
Time: 0 LEDS: 0x828
Invoking /usr/lib/methods/cfgfda_isa -2 -l fda0
exec(/bin/sh,-c,/usr/lib/methods/cfgfda_isa -2 -l fda0)
Number of running methods: 1
exec(/usr/lib/methods/cfgfda_isa,-2,-l,fda0)

Now it hangs on the floppy disk adapter init. Looks like there is no timeout. Strange.
There are two ways from here: either fix the floppy emulation, or make OFW for 40p with no floppy...

Saturday, September 16, 2017

AIX under QEMU boots up to NFS

Launching a proprietary OS under QEMU is never boring, because each new problem involves yet another component. A few weeks ago I was mostly writing Forth to make a bootable firmware. Then I fought with the missing residual data, which mostly meant debugging the ODM database in PPC assembly. Then I extracted mock residual data from the live system and spent some time with the NCR/LSI script. And now, after fixing the PCI layout, it gets to the point of starting the NIS and NFS services. It runs much slower than the real machine, and also much slower than Linux/PPC, but still:

MOT PowerStack2 (e0), Serial #0, 62 MiB memory installed
Open Firmware , Built  September 01, 2017 16:11:38
Copyright (c) 1995-2000, FirmWorks.
Copyright (c) 2014,2017, Artyom Tarasenko.

Rebooting with command: boot /pci/scsi@2/disk@0,0
Boot device: /pci/scsi@2/disk@0,0  Arguments:

+ swcons -c

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups
 Activating all paging spaces
swapon: Paging device /dev/hd6 activated.
 Performing all automatic mounts
mount: 1831-010 server axxxxs01 not responding:
RPC: 1832-018 Port mapper failure - RPC: 1832-006 Unable to send
mount: backgrounding
axxxxs01:/home
Multi-user initialization completed
Checking for srcmstr active...complete
Starting tcpip daemons:
0513-056 Timeout waiting for command response.
0513-056 Timeout waiting for command response.
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N        -W
minperm  maxperm  minpgahead maxpgahead  minfree  maxfree  pd_npages maxrandwrt
   2968    11872       2          8        115      123     524288        0

  -M       -w       -k       -c         -b          -B          -u
maxpin   npswarn  npskill  numclust  numfsbufs   hd_pbuf_cnt  lvm_bufcnt
  12692     1536      384        0       93           64           9

number of valid memory pages = 15864    maxperm=74.8% of real memory
maximum pinable=80.0% of real memory    minperm=18.7% of real memory
number of file memory pages = 1443      numperm=9.1% of real memory

vmtune:  new values:
  -p       -P        -r          -R         -f       -F       -N        -W
minperm  maxperm  minpgahead maxpgahead  minfree  maxfree  pd_npages maxrandwrt
   2968    11872       2          8        115      123     524288       64

  -M       -w       -k       -c         -b          -B          -u
maxpin   npswarn  npskill  numclust  numfsbufs   hd_pbuf_cnt  lvm_bufcnt
  12692     1536      384        1       93           64           9

number of valid memory pages = 15864    maxperm=74.8% of real memory
maximum pinable=80.0% of real memory    minperm=18.7% of real memory
number of file memory pages = 1444      numperm=9.1% of real memory

Starting NIS services:
Starting NFS services:

0513-056 Timeout waiting for command response.
NIS: server not responding for domain "axxxxs01"; still trying.
NIS: server not responding for domain "axxxxs01"; still trying.
0513-056 Timeout waiting for command response.
NIS: server not responding for domain "axxxxs01"; still trying.
NIS: server not responding for domain "axxxxs01"; still trying.
NIS: server not responding for domain "axxxxs01"; still trying.

Woo-hoo! I could be proud that I made the second proprietary OS boot under QEMU. Or even the third one, because Solaris/sun4m is quite different from Solaris/sun4v, so probably each one counts for one.

But now I'm puzzled. Initially I planned to build a throw-away prototype for Powerstack II Utah, fix the QEMU bugs preventing AIX from booting on the existing machines, and dispose of the prototype. The Motorola AIX I've got seems to expect the machine to be called "MOT PowerStack2 (e0)". I'm not sure whether pushing such a model upstream would cause copyright/trademark issues.

Now it boots to the same point as on the physical machine, but except for the PCI layout problems, I haven't found any bugs. And the PCI layout cannot be the reason for AIX failing on the 40p machine, because it fails much earlier.

So obviously it's time to change the plan. And here is where things are going to slow down. I don't have the install media for the Motorola AIX 4.2, so I cannot make my disk usable: it waits for the NFS/NIS servers forever. I cannot share the HDD image because it may contain private data, so publishing the Powerstack II Utah target for QEMU makes little sense for now. So either I find the Motorola AIX 4.2 install CD (and probably a physical UW-SCSI drive, if it doesn't work at the first attempt under QEMU), or the work has to be done again for the 40p target. Yes, I now have some know-how about the AIX boot process, but at the moment I don't feel like starting again from scratch with the 40p target.

The good news is that AIX can definitely be booted under qemu-system-ppc.

Saturday, September 9, 2017

From SCSI to PCI

20 years ago SCSI devices ruled the world. Reading the NCR53c810 manual, I see that it was basically a computer in a computer. It can be programmed to transfer data from/to/between the disks without using the CPU at all.

Trying to understand what happens in the NCR/LSI script:

lsi_scsi: Select LUN 0
lsi_scsi: Extended message 0x1 (len 3)
lsi_scsi: SDTR (ignored)
lsi_scsi: SCRIPTS dsp=81c3b924 opcode 80080000 arg 81c3b9a4
lsi_scsi: Jump to 0x81c3b9a4
lsi_scsi: SCRIPTS dsp=81c3b9a4 opcode 870b0000 arg 81c3b9c4
lsi_scsi: Compare phase 2 == 7
lsi_scsi: Control condition failed
lsi_scsi: SCRIPTS dsp=81c3b9ac opcode 860a0000 arg 81c3b94c
lsi_scsi: Compare phase 2 == 6
lsi_scsi: Control condition failed
lsi_scsi: SCRIPTS dsp=81c3b9b4 opcode 98080000 arg 00000022
lsi_scsi: Interrupt 0x00000022
...
lsi_scsi: SCRIPTS dsp=81c3bb6c opcode 0e000002 arg 81c3bdb4
lsi_scsi: MSG out len=2
lsi_scsi: Select LUN 0
lsi_scsi: MSG: ABORT tag=0x0
lsi_scsi: SCRIPTS dsp=81c3bb74 opcode 80080000 arg 81c3bbbc
lsi_scsi: Jump to 0x81c3bbbc
lsi_scsi: SCRIPTS dsp=81c3bbbc opcode 60000008 arg 00000000
lsi_scsi: Clear ATN

Looks like it aborts if the selected SCSI target doesn't change phase to MSG_OUT or MSG_IN. So I implemented a hack for the SDTR reply, and it doesn't abort here anymore. But in fact it's a red herring. AIX can also work with devices which do not support synchronous or wide transfers.
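
To read such traces without the manual at hand, it helps to classify the first dword of each SCRIPTS instruction. Below is a simplified Python sketch based on my reading of the 53C8xx instruction encoding (top two bits select the instruction class); it ignores addressing modes and most condition bits, and all names in it are mine:

```python
PHASES = {0: "DATA_OUT", 1: "DATA_IN", 2: "COMMAND", 3: "STATUS",
          6: "MSG_OUT", 7: "MSG_IN"}
IO_OPS = {0: "select", 1: "wait disconnect", 2: "wait reselect",
          3: "set", 4: "clear"}
REG_OPS = {5: "move from SFBR", 6: "move to SFBR", 7: "read-modify-write"}
TC_OPS = {0: "jump", 1: "call", 2: "return", 3: "interrupt"}

def classify(insn):
    """Return a short description of the first dword of a SCRIPTS instruction."""
    kind = insn >> 30
    if kind == 0:
        return "block move, phase %s, count %d" % (
            PHASES.get((insn >> 24) & 7, "reserved"), insn & 0xffffff)
    if kind == 1:
        op = (insn >> 27) & 7
        if op in IO_OPS:
            return "I/O: " + IO_OPS[op]
        return "%s, reg 0x%02x" % (REG_OPS[op], (insn >> 16) & 0x7f)
    if kind == 2:
        desc = "transfer control: " + TC_OPS.get((insn >> 27) & 7, "reserved")
        if insn & (1 << 17):  # the "compare phase" bit
            desc += ", compare phase with %s" % PHASES.get((insn >> 24) & 7, "reserved")
        return desc
    if insn & (1 << 29):
        return "load/store"
    return "memory move, count %d" % (insn & 0xffffff)

for opcode in (0x80080000, 0x870b0000, 0x860a0000, 0x98080000,
               0x0e000002, 0x60000008, 0xc0000004):
    print("%08x  %s" % (opcode, classify(opcode)))
```

Fed with the opcodes from the log, it shows the two conditional jumps testing for MSG_IN and MSG_OUT before the script falls through to the interrupt instruction.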

The actual problem happens later:

lsi_scsi: SCRIPTS dsp=81c43444 opcode c0000004 arg 010000dc
lsi_scsi: memcpy dest 0x81c435fc src 0x010000dc count 4

Or, with slightly enhanced logging:

lsi_scsi: memcpy dest 0x81c475fc (Mem) src 0x010000dc (IO) count 4
lsi_mem_read, address_space_read status 2
lsi_scsi: the first 4 bytes: 00 00 00 00

It tries to read the port 0x10000dc and save it. QEMU doesn't have anything at the port 0x10000dc, so no wonder the NCR script fails. But what is supposed to be there? The Motorola Ultra 603/Ultra 603e/Ultra 604 Programmer’s Reference Guide suggests it must be the PCI I/O space.

So it looks like I've learned enough of the NCR/LSI script for now. Time to see how PCI bus mastering is supposed to work on this machine.

Saturday, September 2, 2017

More fun with AIX cfgncr_scsi

<in the previous part>... doesn't find SCSI disks. Here it is tricky, it may be a problem with the interrupt routing, or DMA or SCSI host emulation...

... or a bug in the AIX driver itself.

When AIX 4.2 tries to perform a SCSI inquiry, this is what happens in the QEMU log:

(qemu) lsi_scsi: Write reg ??? ac = e4
lsi_scsi: Write reg ??? ad = 38
lsi_scsi: Write reg ??? ae = c4
lsi_scsi: Write reg ??? af = 81

The register at 0xac-0xaf is the DSA Relative Selector (DRS). It is known to QEMU, but doesn't seem to be used in any operation.

The newer LSI53c1010-66 manual says:

"This register supplies AD[63:32] during Table Indirect
Fetches and Load/Store Data Structure Address (DSA)
relative operations"

So, maybe just add support for this register to QEMU and allow 64-bit DMA transfers, right?

Wrong. The write to this register is the last write and it doesn't start any SCSI command. Let's look where it happens:

p8xx_start_chip:
...
   0x018ff854:  stw     r8,-4(r7)
   0x018ff858:  li      r4,44         ; 0x2c
   0x018ff85c:  b       0x18fe348     ; p8xx_write_reg <= write happens here

The register r4 is 0x2c, but the procedure writes to 0xac. Weird.

Let's look at the other registers:

0x018fe368 in ?? ()
(gdb) info registers r3 r5 r4
r3             0x18f5000        26169344
r5             0x31000080       822083712
r4             0x2c     44

What's that 80 at the end of r5? 0x80 + 0x2c is 0xac. Coincidence? Don't think so.

So, what happens here is the driver tries to write 0x2c, but the bus is shifted, so it hits 0xac. After some chasing I found where this shift is coming from:

p8xx_config:
...
   0x018fdb9c:  bl      0x1909938
   0x018fdba0:  lwz     r2,20(r1)
   0x018fdba4:  cmpwi   cr1,r3,19
   0x018fdba8:  beq     cr1,0x18fdbc8
   0x018fdbac:  li      r8,1
   0x018fdbb0:  stb     r8,256(r28)
   0x018fdbb4:  li      r3,0
   0x018fdbb8:  bl      0x1909938
   0x018fdbbc:  lwz     r2,20(r1)
   0x018fdbc0:  lwz     r8,160(r28)
   0x018fdbc4:  b       0x18fdbd0
   0x018fdbc8:  stb     r29,256(r28)
   0x018fdbcc:  lwz     r8,160(r28)
   0x018fdbd0:  lis     r11,4096
   0x018fdbd4:  addic   r10,r8,128    ; this is the 0x80 I'm looking for
   0x018fdbd8:  li      r8,-1
   0x018fdbdc:  rlwinm  r31,r26,1,15,30
   0x018fdbe0:  addic   r23,r28,10580
   0x018fdbe4:  stw     r10,252(r28)
   0x018fdbe8:  li      r25,1
   0x018fdbec:  addi    r30,r28,0
   0x018fdbf0:  stwu    r25,10512(r30)
   0x018fdbf4:  stw     r8,10588(r28)
   0x018fdbf8:  lwz     r8,10500(r28)
   0x018fdbfc:  stw     r11,10520(r28)
   0x018fdc00:  stw     r10,10532(r28) ; and here it is stored
...

It is added and stored unconditionally. If I drop this addic, something different happens:

(qemu) lsi_scsi: Write reg DSP0 2c = e4
lsi_scsi: Write reg DSP1 2d = 58
lsi_scsi: Write reg DSP2 2e = c4
lsi_scsi: Write reg DSP3 2f = 81
lsi_scsi: SCRIPTS dsp=81c458e4 opcode 41000000 arg 81c45a44
lsi_scsi: Selected target 0
lsi_scsi: SCRIPTS dsp=81c458ec opcode 78370000 arg 00000000
lsi_scsi: Read-Modify-Write reg 0x37 MOV data8=0x00 sfbr=0x00
...

Why would it work on the physical hardware? I guess because the addresses are aliased. Pretty similar to the le bug in Solaris.

So, it's not that QEMU has some unimplemented registers. In this case it has too many implemented ones.

On the other hand, it still doesn't detect the SCSI disk, so maybe QEMU has not just too many features, but too few as well...
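
To illustrate the aliasing guess, here is a toy Python model. It rests on an unverified assumption of mine: that the physical chip decodes only the low 7 bits of the register offset, so that 0x80-0xff mirror 0x00-0x7f, while a strict model treats them as separate registers. The names decode and dsp_after_writes are made up for illustration; the byte values come from the second log excerpt:

```python
DSP = 0x2c          # offset of the 32-bit DSP register, as named in the log
DRIVER_BIAS = 0x80  # the constant the driver adds in p8xx_config

def decode(offset, aliased):
    """Return the register offset a device model would act on."""
    return offset & 0x7f if aliased else offset

def dsp_after_writes(writes, aliased):
    """Apply (offset, byte) writes; return the DSP value if all four
    bytes landed in DSP, else None."""
    regs = {}
    for offset, value in writes:
        regs[decode(offset, aliased)] = value
    try:
        # Little-endian: the lowest offset holds the least significant byte.
        return sum(regs[DSP + i] << (8 * i) for i in range(4))
    except KeyError:
        return None

# The driver writes the script address byte by byte at the biased offsets.
writes = [(DRIVER_BIAS + DSP + i, b)
          for i, b in enumerate((0xe4, 0x58, 0xc4, 0x81))]

print(dsp_after_writes(writes, aliased=False))       # strict decode
print(hex(dsp_after_writes(writes, aliased=True)))   # aliased decode
```

With strict decoding nothing ever reaches DSP and no script starts; with the aliased decoding the four bytes assemble into the same address that shows up as dsp= in the log once the addic is dropped.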

/Stay tuned

Saturday, August 26, 2017

Milestone: my OFW boots AIX on Powerstack II Utah

I created residual data for the PCI bus, the serial port and the SCSI adapter. This is the minimal set needed to boot AIX. And AIX is booting using this residual data, not hard-coded values. So now I have a reference firmware which works on a physical machine. Here is the complete boot log (mostly for search engines and digital archaeologists):

MOT PowerStack2 (e0), Serial #0, 62 MiB memory installed
Open Firmware , Built  August 25, 2017 23:09:26
Copyright (c) 1995-2000, FirmWorks.
Copyright (c) 2014,2017, Artyom Tarasenko.

ok boot /scsi/disk@6 -s prompt

Boot device: /scsi/disk@6  Arguments: -s prompt
 0) - DISABLE_PARITY                   16) - ENABLE_END_STOP
 1) - DISABLE_DCACHE                   17) - ENABLE_DEBUG
 2) - DISABLE_ICACHE                   18) + DISABLE_VME
 3) - DISABLE_L2                       19) + ENABLE_BH_IDE_DMA
 4) - DISABLE_SSCALAR                  20) + WINBOND_PATCH
 5) - DISABLE_BHIST                    21) - MANUAL_SCSI_TYPE
 6) - DISABLE_CPU_EMCP                 
 7) - DISABLE_EAGLE_CF_DPARK           
 8) - DISABLE_EAGLE_CF_APARK           
 9) - DISABLE_LEDS                     
10) - EAGLE_ERR_STATUS_RESET           
11) + DISABLE_MASTER_ABORT             
12) - AIX_USES_BUG                     
13) + JUNO_DISCONTIGUOUS               
14) - LLDB_STOP                        
15) - SERVICE_MODE                     31) - DISABLE_HARDSTOPS
Enter bit # to toggle (just <CR> to end): 17
 0) - Top level debug - function names 
 1) - Main line debug messages         
 2) - Subroutine internal messages     
 3) - PCI Bridge settings              
                                       
                                       
                                       
10) - GEV Data debug                   
                                       
12) - IPLCB data                       
                                       
                                       
15) - IPL control block offsets        
Enter bit #(s) to toggle, '*' for ALL enabled, 'C' to clear ALL, 
or just <CR> to return): *
 0) - DISABLE_PARITY                   16) - ENABLE_END_STOP
 1) - DISABLE_DCACHE                   17) + ENABLE_DEBUG
 2) - DISABLE_ICACHE                   18) + DISABLE_VME
 3) - DISABLE_L2                       19) + ENABLE_BH_IDE_DMA
 4) - DISABLE_SSCALAR                  20) + WINBOND_PATCH
 5) - DISABLE_BHIST                    21) - MANUAL_SCSI_TYPE
 6) - DISABLE_CPU_EMCP                 
 7) - DISABLE_EAGLE_CF_DPARK           
 8) - DISABLE_EAGLE_CF_APARK           
 9) - DISABLE_LEDS                     
10) - EAGLE_ERR_STATUS_RESET           
11) + DISABLE_MASTER_ABORT             
12) - AIX_USES_BUG                     
13) + JUNO_DISCONTIGUOUS               
14) - LLDB_STOP                        
15) - SERVICE_MODE                     31) - DISABLE_HARDSTOPS
Enter bit # to toggle (just <CR> to end): 
0x4fd0 Hints relocation
0x5000 SoftROS start() after relocation
0x39728 SoftROS end after relocations
0x3951c SoftROS start of bss
0x39727 SoftROS end of bss
0x39750 Current sbrk(0)
0x596ac Current stack
Space reserved for kernel when IPLCB is being built @ 0x59728
Original bootimage located @ 0x400000
hi->signature = 0x4149584d
hi->resid_data_address = 0x3dd7970
hi->bss_offset = 0x3951c
hi->bss_length = 0x20c
hi->jump_offset = 0x38c
hi->load_exec_address = 0x400430
hi->header_size = 0x400
hi->header_block_size = 0x25e
hi->image_length = 0x4b75b
hi->Spare = 0x3cb419c
hi->res_mem_size = 0x0
hi->mode_control = 0xdead00c0
LED(MOTLED_CHECKING_HARDSTOPS)=0x130c
Magic is 0x01DF0004 
Image size .............. 0x0035A000
Boot image loaded at .... 0x0044BC00
Saved address for jump .. 0x0000038C
LED(MOTLED_INVALID_BOOT_IMAGE)=0x1310
LED(MOTLED_FIRST_KERNEL_MOVE)=0x1308
LED(MOTLED_HARDWARE_INIT)=0x130b
mot_gencmd.c ====> pj_motorola
mot_gencmd.c ====> Machine Check Pin was disabled by F/W

LED(MOTLED_ENABLING_CPU_EMCP)=0x1318
LED(MOTLED_ENABLING_DCACHE)=0x131a
LED(MOTLED_ENABLING_604_SSCALAR)=0x131c
LED(MOTLED_ENABLING_604_BHIST)=0x131d
LED(MOTLED_HARDWARE_INIT_COMPLETE)=0x1320
LED(MOTLED_IPLCB_INIT)=0x1309
iplcb_init.c ====> - iplcb_init()
iplcb_init.c ====> - mem_find()
Mem_addr = 0x3c74000, byte_index = 0x1ef, bit_index = 0x4
Returned from mem_find:
IPLCB addr: ..... 0x03C74000 len = 49152
DMA buffer addr:  0x00FF8000
Memory bitmap addr0x03C7FE10
Serial # from residual data:  4d 4f 54 30 45 32 33 34 41 43 20 20 20 20 20 20
nvram_addr = 0x0074, nvram_data = 0x0077
Name = board-init?  <--> Value = true  Match = FALSE
Name = use-default-vals?  <--> Value = true  Match = FALSE
Name = edo-memory?  <--> Value = false  Match = FALSE
Name = pboot-probe?  <--> Value = false  Match = FALSE
Name = pboot-device-default  <--> Value = fdisk0 hdisk0 enet0  Match = FALSE
Name = fcode-debug?  <--> Value = true<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF>  Match = FALSE
Name = fw-boot-device  <--> Value = /pci@80000000/pci1000,3@2,0/harddisk@6,0  Match = TRUE
The IPLCB previpl_device string is:
        !!^A/pci@80000000/pci1000,3@2,0/harddisk@6,0
processor = 0x00000009
presoftros.c ====> - __mot_eth_addr()
Memory address of IPL Control Block = 0x03C74000
Directory: ......... 0x03C74080    offset: 0x00000080
IPL_info: .......... 0x03C742E0    offset: 0x000002E0
System area: ....... 0x03C74878    offset: 0x00000878
Buc Data area: ..... 0x03C749F4    offset: 0x00000914
Processor data area: 0x03C74A64    offset: 0x00000A64
Network data area: . 0x03C74D7C    offset: 0x00000D7C
Memory data area: .. 0x03C7677C    offset: 0x0000277C
L2_cache data area:  0x03C768B4    offset: 0x000028B4
Residual data area:  0x03C76974    offset: 0x00002974
ros_table area: .... 0x03C7D3E0
NVRAM cache area: .. 0x03C7D4C8    offset: 0x000094C8
Parameters passed to kernel boot image:
0x3c74000, 0xbe10, 0x4000, 0x1f0
LED(MOTLED_IPLCB_DUMP)=0x130a
IPLD.ipl_info_offset = 0x2e0
IPLD.ipl_info_size = 0x598
IPLD.system_info_offset = 0x878
IPLD.system_info_size = 0x9c
IPLD.buc_info_offset = 0x914
IPLD.buc_info_size = 0x150
IPLD.processor_info_offset = 0xa64
IPLD.processor_info_size = 0x318
IPLD.mem_data_offset = 0x277c
IPLD.mem_data_size = 0x138
IPLD.l2_data_offset = 0x28b4
IPLD.l2_data_size = 0xc0
IPLD.bit_map_offset = 0xbe10
IPLD.bit_map_size = 0x1f0
(IPLD.processor_info_size == sizeof(PROCESSOR_DATA)) failed
IPLD.user_struct_offset = 0x9380
IPLD.user_struct_size = 0x10
user_info->user_data_offset = 0x9390
user_info->user_data_len = 0x50
IPLD.nvram_cache_offset = 0x94c8
IPLD.nvram_cache_size = 0x2000
ipl_info->model = 0x80000e0
ipl_info->ram_size = 0x3e00000
ipl_info->bit_map_bytes_per_bit = 0x4000
ipl_info->ros_entry_table_ptr = 0x3c7d3e0
ipl_info->ros_entry_table_size = 0xe8
ipl_info->nvram_section_1_valid = 0x1
ipl_info->vpd_processor_serial_number = "00E20000"
ipl_info->previpl_device[0] = 0x21
ipl_info->previpl_device[1] = 0x21
ipl_info->previpl_device[2] = 0x1
ipl_info->previpl_device[3] = 0x2f
ipl_info->previpl_device[4] = 0x70
ipl_info->previpl_device[5] = 0x63
ipl_info->previpl_device[6] = 0x69
ipl_info->previpl_device[7] = 0x40
ipl_info->previpl_device[8] = 0x38
ipl_info->previpl_device[9] = 0x30
ipl_info->previpl_device[10] = 0x30
ipl_info->previpl_device[11] = 0x30
ipl_info->previpl_device[12] = 0x30
ipl_info->previpl_device[13] = 0x30
ipl_info->previpl_device[14] = 0x30
ipl_info->previpl_device[15] = 0x30
ipl_info->previpl_device[16] = 0x2f
ipl_info->previpl_device[17] = 0x70
ipl_info->previpl_device[18] = 0x63
ipl_info->previpl_device[19] = 0x69
ipl_info->previpl_device[20] = 0x31
ipl_info->previpl_device[21] = 0x30
ipl_info->previpl_device[22] = 0x30
ipl_info->previpl_device[23] = 0x30
ipl_info->previpl_device[24] = 0x2c
ipl_info->previpl_device[25] = 0x33
ipl_info->Power_Status_and_keylock_reg = 0x3
buc_info_ptr->num_of_structs = 0x3
buc_info_ptr->index = 0x1
buc_info_ptr->struct_size = 0x70
buc_info_ptr->bsrr_offset = 0x0
buc_info_ptr->bsrr_mask = 0x0
buc_info_ptr->bscr_value = 0x0
buc_info_ptr->cfg_status = 0x2
buc_info_ptr->device_type = 0x5
buc_info_ptr->num_of_buids = 0x0
buc_info_ptr->buid_data[0].buid_value = 0xffffffff
buc_info_ptr->buid_data[0].buid_Sptr = 0x0
buc_info_ptr->buid_data[1].buid_value = 0xffffffff
buc_info_ptr->buid_data[1].buid_Sptr = 0x0
buc_info_ptr->buid_data[2].buid_value = 0xffffffff
buc_info_ptr->buid_data[2].buid_Sptr = 0x0
buc_info_ptr->buid_data[3].buid_value = 0xffffffff
buc_info_ptr->buid_data[3].buid_Sptr = 0x0
buc_info_ptr->mem_alloc1 = 0x8000
buc_info_ptr->mem_addr1 = 0xff8000
buc_info_ptr->mem_alloc2 = 0x0
buc_info_ptr->mem_addr2 = 0x0
buc_info_ptr->vpd_rom_width = 0xffffffff
buc_info_ptr->cfg_addr_inc = 0x0
buc_info_ptr->device_id_reg = 0x2040
buc_info_ptr->aux_info_offset = 0x0
buc_info_ptr->feature_rom_code = 0x0
buc_info_ptr->IOCC_flag = 0x0
buc_info_ptr->location[0] = 0x30
buc_info_ptr->location[1] = 0x30
buc_info_ptr->location[2] = 0x30
buc_info_ptr->location[3] = 0x30
buc_info_ptr->num_of_structs = 0x3
buc_info_ptr->index = 0x2
buc_info_ptr->struct_size = 0x70
buc_info_ptr->bsrr_offset = 0x0
buc_info_ptr->bsrr_mask = 0x0
buc_info_ptr->bscr_value = 0x0
buc_info_ptr->cfg_status = 0x2
buc_info_ptr->device_type = 0x5
buc_info_ptr->num_of_buids = 0x2
buc_info_ptr->buid_data[0].buid_value = 0x100
buc_info_ptr->buid_data[0].buid_Sptr = 0x80000000
buc_info_ptr->buid_data[1].buid_value = 0x10100
buc_info_ptr->buid_data[1].buid_Sptr = 0xc0000000
buc_info_ptr->buid_data[2].buid_value = 0xffffffff
buc_info_ptr->buid_data[2].buid_Sptr = 0x0
buc_info_ptr->buid_data[3].buid_value = 0xffffffff
buc_info_ptr->buid_data[3].buid_Sptr = 0x0
buc_info_ptr->mem_alloc1 = 0x0
buc_info_ptr->mem_addr1 = 0x0
buc_info_ptr->mem_alloc2 = 0x0
buc_info_ptr->mem_addr2 = 0x0
buc_info_ptr->vpd_rom_width = 0xffffffff
buc_info_ptr->cfg_addr_inc = 0x0
buc_info_ptr->device_id_reg = 0x2020
buc_info_ptr->aux_info_offset = 0x0
buc_info_ptr->feature_rom_code = 0x0
buc_info_ptr->IOCC_flag = 0x1
buc_info_ptr->location[0] = 0x30
buc_info_ptr->location[1] = 0x30
buc_info_ptr->location[2] = 0x31
buc_info_ptr->location[3] = 0x30
sys_info->nvram_size = 0x0
sys_info->nvram_addr = 0x0
sys_info->todr_addr = 0x0
sys_info->architecture = 0x2
sys_info->implementation = 0x3
sys_info->pkg_descriptor="MOT3F00"
proc_info->num_of_structs = 0x1
proc_info->index = 0x0
proc_info->struct_size = 0xc8
proc_info->per_buc_info_offset = 0x3c749f4
proc_info->proc_int_area = 0x0
proc_info->proc_int_area_size = 0x0
proc_info->processor_present = 0x1
proc_info->test_run = 0xd5
proc_info->test_stat = 0x0
proc_info->link = 0x0
proc_info->link_address = 0x0
proc_info->phys_id = 0x0
proc_info->priv_lck_cnt = 0x0
proc_info->prob_lck_cnt = 0x0
proc_info->architecture = 0x2
proc_info->implementation = 0x10
proc_info->width = 0x20
proc_info->cache_attrib = 0x1
proc_info->icache_size = 0x8000
proc_info->dcache_size = 0x8000
proc_info->icache_asc = 0x4
proc_info->dcache_asc = 0x4
proc_info->tlb_attrib = 0x1
proc_info->itlb_size = 0x80
proc_info->dtlb_size = 0x80
proc_info->itlb_asc = 0x2
proc_info->dtlb_asc = 0x2
proc_info->slb_attrib = 0x0
proc_info->islb_size = 0x0
proc_info->dslb_size = 0x0
proc_info->islb_asc = 0x0
proc_info->dslb_asc = 0x0
proc_info->rtc_type = 0x2
proc_info->rtcXint = 0x0
proc_info->rtcXfrac = 0x0
proc_info->tbCfreq_HZ = 0x7f2815
proc_info->busCfreq_HZ = 0x0
proc_info->version = 0x50000
proc_info->L2_cache_size = 0x0
proc_info->L2_cache_asc = 0x0
proc_info->coherency_size = 0x20
proc_info->resv_size = 0x20
proc_info->icache_block = 0x20
proc_info->dcache_block = 0x20
proc_info->icache_line = 0x20
proc_info->dcache_line = 0x20
proc_info->proc_descriptor = "PowerPC_604"
l2_data->num_of_structs = 0x1
l2_data->index = 0x0
l2_data->struct_size = 0xc0
l2_data->shared_L2_cache = 0x0
l2_data->using_resource_offset = 0xa64
l2_data->mode = 0x0
l2_data->installed_size = 0x0
l2_data->configured_size = 0x0
l2_data->size[0] = 0x0
l2_data->type[0] = 0x30
l2_data->type[1] = 0x30
l2_data->adapter_present = 0x0
l2_data->adapter_bad = 0x0
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x3e
mem_data[i].state = 0x1
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x42
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x43
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x44
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x45
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x46
user_info->user_data_len = 0x50
user_info->user_id_offset = 0x9390
user_info->next_offset = 0x0
mot_data->company = "Motorola Computer Group"
mot_data->board_model = 0x6
mot_data->board_revision = 0x42
mot_data->ethernet_na = 45 55 55 55 45 55
LED(MOTLED_RELOCATING_KERNEL)=0x13e0
+ swcons -c

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups
 Activating all paging spaces
swapon: Paging device /dev/hd6 activated.
 Performing all automatic mounts
mount: 1831-010 server a2231s01 not responding: RPC: 1832-018 Port mapper failure - RPC: 1832-006 Unable to send
mount: backgrounding
a2231s01:/home
Multi-user initialization completed
Checking for srcmstr active...complete
Starting tcpip daemons:
0513-059 The syslogd Subsystem has been started. Subsystem PID is 3434.
0513-059 The sendmail Subsystem has been started. Subsystem PID is 5232.
0513-059 The portmap Subsystem has been started. Subsystem PID is 5494.
0513-059 The inetd Subsystem has been started. Subsystem PID is 5756.
0513-059 The snmpd Subsystem has been started. Subsystem PID is 6018.
0513-059 The dpid2 Subsystem has been started. Subsystem PID is 6280.
0513-059 The muxatmd Subsystem has been started. Subsystem PID is 6542.
0513-059 The fibred Subsystem has been started. Subsystem PID is 6804.
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N        -W
minperm  maxperm  minpgahead maxpgahead  minfree  maxfree  pd_npages maxrandwrt
   2968    11872       2          8        115      123     524288        0

  -M       -w       -k       -c         -b          -B          -u
maxpin   npswarn  npskill  numclust  numfsbufs   hd_pbuf_cnt  lvm_bufcnt
  12692     1536      384        0       93           64           9

number of valid memory pages = 15864    maxperm=74.8% of real memory
maximum pinable=80.0% of real memory    minperm=18.7% of real memory
number of file memory pages = 1445      numperm=9.1% of real memory
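
Several fields in the dump above are plain ASCII or simple arithmetic in disguise. The following Python sketch is not part of any AIX or firmware tooling, and the helper names are made up for illustration. It decodes three of the fields and cross-checks the bitmap size, assuming that hi->signature is a big-endian 32-bit word and that each bitmap bit covers bit_map_bytes_per_bit bytes of RAM (an interpretation of the field name, not something the log states).

```python
# Values copied from the boot log above.
SIGNATURE = 0x4149584D          # hi->signature
SERIAL_HEX = "4d 4f 54 30 45 32 33 34 41 43 20 20 20 20 20 20"
PREVIPL = [0x21, 0x21, 0x01, 0x2F, 0x70, 0x63, 0x69, 0x40, 0x38,
           0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x2F, 0x70,
           0x63, 0x69, 0x31, 0x30, 0x30, 0x30, 0x2C, 0x33]
RAM_SIZE = 0x3E00000            # ipl_info->ram_size
BYTES_PER_BIT = 0x4000          # ipl_info->bit_map_bytes_per_bit
BIT_MAP_SIZE = 0x1F0            # IPLD.bit_map_size


def word_to_ascii(word):
    """Interpret a 32-bit value as four big-endian ASCII characters."""
    return word.to_bytes(4, "big").decode("ascii")


def hex_bytes_to_ascii(text):
    """Decode a space-separated hex dump and drop the trailing padding."""
    return bytes.fromhex(text).decode("ascii").rstrip()


def printable(data):
    """Render bytes as text, showing control bytes in caret notation."""
    out = []
    for b in data:
        if b < 0x20:
            out.append("^" + chr(b + 0x40))   # 0x01 becomes ^A
        elif b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return "".join(out)


def bitmap_bytes(ram_size, bytes_per_bit):
    """Bytes needed for a bitmap with one bit per bytes_per_bit of RAM."""
    bits = ram_size // bytes_per_bit
    return (bits + 7) // 8


if __name__ == "__main__":
    print(word_to_ascii(SIGNATURE))
    print(hex_bytes_to_ascii(SERIAL_HEX))
    print(printable(PREVIPL))
    print(hex(bitmap_bytes(RAM_SIZE, BYTES_PER_BIT)), hex(BIT_MAP_SIZE))
```

Under these assumptions the signature reads as "AIXM", the serial number as "MOT0E234AC", the first 26 bytes of previpl_device match the start of the string printed by the loader, and the computed bitmap size agrees with IPLD.bit_map_size.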

The next step would be to get it working under QEMU.
Under QEMU it gets pretty far: it finds the PCI and ISA buses and even the SCSI host.

Unfortunately it doesn't find any SCSI disks. This part is tricky: the problem may be in the interrupt routing, in the DMA, or in the SCSI host emulation.

/Stay tuned

Sunday, July 23, 2017

Wiretapping AIX

Identified a couple of kernel and shared library functions, so I'm not poking in the dark anymore:

First of all I found execv. It gives a lot of insight into the AIX boot process, which is quite different from a Linux or Solaris boot. The kernel is small and is actually already loaded, even under QEMU. Most other operating systems print a greeting once the kernel is loaded; AIX does it all silently. On IBM machines there is an LED panel showing one status byte. On the Motorola there are just two LEDs, each of which can light green or yellow (or, presumably, stay off), which gives just 9 combinations altogether. Not very informative. But even a full byte would not help: I am looking for error messages like "missing property", "unknown PCI chip", "missing residual data", etc.

The initialization of the PCI bus happens long after the kernel spawns the /etc/init process.

Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d08:     "/etc/methods/defsys"
(gdb) c
Continuing.
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x2ff22090:     "/bin/sh"
(gdb) c
Continuing.
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d28:     "/usr/lib/methods/cfgsys_MOT3F00"       <= here is where it can't find the PCI bus

Then I found the printf and sprintf functions. Although AIX doesn't write anything to the screen, it still collects the boot log messages, so wiretapping printf and fprintf helps to see them.

The house is still dark, but now I have a searchlight. So whatever bugs are there, beware: you are going to be seen soon!

Saturday, July 22, 2017

Debugging AIX 4.2 boot

I wonder if it is possible to make the AIX 4.2 boot more verbose.
Various sources say that it should be done via
 
mw enter_dbg

under KDB. The AIX version I have doesn't have it. In fact, it doesn't even have an option to disassemble a piece of code. Just a hardcore hex dump, pretty much like it was in the eighties.

That feeling when you start with retro-computing and end up with steampunk computing.

ok  boot /scsi/disk@6 -s trap
Trap instruction interrupt.
> mw enter_dbg
032-001  You entered a command «mw» that is not valid.
> help
alter   … (a)lter — alter memory
back    … (b)ack — decrement the IAR
ditto   … «» — blank repeats the last command
break   … (br)eak — set a breakpoint
breaks  … (breaks) — list currently set breakpoints
buckets … (bu)ckets — display kmembucket structures
clear   … (c)lear — clear breakpoint(s)
display … (d)isplay — display a specified amount of memory
dmodsw  … (dm)odsw — display Streams dmodsw table
drivers … (dr)ivers — display device driver (devsw) table
find    … (f)ind — find a string in memory
float   … (fl)oat — display floating point registers
fmodsw  … (fm)odsw — display Streams fmodsw table
fs      … fs — display file system data structures
go      … (g)o — start executing the program
help    … (h)elp — display the list of valid commands
loop    … (l)oop — execute until control returns to this point
map     … (m)ap — display the system loadlist
mblk    … (mb)lk — display mblk/kmemstat structures
next    … (n)ext — increment the IAR
origin  … (o)rigin — set the origin
proc    … (p)roc — process table display
quit    … (q)uit — end the debugger session
queue   … (que)ue — display Streams queues
reset   … (r)eset — release a user defined variable
restore … (re)store — restore or do not restore the screen
screen  … (s)creen — display a screen containing registers and memory
set     … (se)t — define an/or set a variable
sregs   … (sr)egs — display segment registers
st      … (st) — store a full word into memory
stack   … (sta)ck — formatted stack trace
stc     … (stc) — store one byte into memory
step    … (ste)p — perform an instruction single-step
sth     … (sth) — store a half word into memory
stream  … (str)eam — display Stream head structures
swap    … (sw)ap — switch from the current display/keyboard to RS-232 port
thread  … (th)read — thread table display
trace   … (tr)ace — print traceback buffer
trb     … (trb) — display formatted timer request block info
tty     … (tt)y — Display tty struct
user    … (u)ser — formatted user area
uthread … (ut)hread — formatted uthread area
vars    … (v)ars — display a listing of the user_defined variables
vmm     … vmm — display virtual memory data structures
xlate   … (x)late — display the real address of a memory location
>

Sunday, July 16, 2017

Booting OFW from OFW

There are two ways to use my new Motorola Powerstack II Utah machine as a reference for improving qemu:

a) make qemu run the Powerstack II firmware
b) make Powerstack II run my firmware

I quickly tried running the Powerstack II firmware under qemu. After all, that was the approach that let me run Solaris/SPARC under qemu 7 years ago. The firmware sort of starts, but drops into a very limited debugger. It looks to me like the debugger is from Motorola and starts before the OFW is launched. Last week I found out that the Powerstack II Utah firmware is no good for anything but one version of AIX, so this particular version is really not worth launching.

So I went for option b) and made a firmware which can be netbooted on the Utah.
It's even bootable on both Powerstack I and II, which was tricky. For some reason the different Powerstacks have slightly different ideas about the layout of the 0x41 partition. For instance, the Solaris floppy image cannot be netbooted. But there is one layout which works for both floppy and network booting on both Powerstacks.

So now I netboot my OFW from the Motorola OFW, and then try booting AIX from the SCSI disk.
And then it hangs when the residual data is present. No wonder: I used the device tree from qemu, so at least some devices are different or wired differently. But if I remove the creation of the residual data, it even boots AIX to the same point as the Motorola OFW does.

The SCSI host on both Powerstacks is different from the one on the 40p machine. The 40p (and qemu) have Vendor ID 0x1000, Device ID 0x0001, which according to the pcidatabase is an LSI53C810 chip. The Powerstacks have Vendor ID 0x1000, Device ID 0x0003, which is supposedly an LSI53C1010-33. The marking on the chip itself reads Symbios Logic 53C825A.
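
The same IDs are visible in the firmware boot path from the earlier log, /pci@80000000/pci1000,3@2,0/harddisk@6,0, because Open Firmware names a PCI node without a more specific name as pciVVVV,DDDD (vendor ID, device ID, both in hex). A minimal Python sketch of pulling them out follows. The function names and the small lookup table are hypothetical; the chip names in the table are simply the ones quoted in this post, not an authoritative PCI ID database.

```python
import re

# Chip names as reported in this post; illustrative only.
KNOWN_CHIPS = {
    (0x1000, 0x0001): "LSI53C810 (40p and qemu)",
    (0x1000, 0x0003): "Symbios Logic 53C825A (marking on the Powerstack chip)",
}

# pci<vendor>,<device> optionally followed by @<dev>[,<func>], all in hex.
NODE_RE = re.compile(
    r"^pci([0-9a-f]+),([0-9a-f]+)(?:@([0-9a-f]+)(?:,([0-9a-f]+))?)?$"
)


def parse_pci_node(name):
    """Return (vendor, device, dev_number, function) for a pciVVVV,DDDD node."""
    m = NODE_RE.match(name.lower())
    if m is None:
        raise ValueError("not a pciVVVV,DDDD node name: %r" % name)
    vendor = int(m.group(1), 16)
    device = int(m.group(2), 16)
    dev_number = int(m.group(3), 16) if m.group(3) else None
    function = int(m.group(4), 16) if m.group(4) else 0
    return vendor, device, dev_number, function


def find_pci_ids(path):
    """Scan a device path and return the first pciVVVV,DDDD component found."""
    for component in path.strip("/").split("/"):
        try:
            return parse_pci_node(component)
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    ids = find_pci_ids("/pci@80000000/pci1000,3@2,0/harddisk@6,0")
    vendor, device, dev_number, function = ids
    print("vendor 0x%04x device 0x%04x at %s,%s" % (vendor, device, dev_number, function))
    print(KNOWN_CHIPS.get((vendor, device), "unknown"))
```

For the Powerstack boot path this yields vendor 0x1000 and device 0x0003 at unit address 2,0, matching the IDs quoted above.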

I may hit the differences between the LSI (formerly known as Symbios and NCR) chips later, but at least the 53C825A is backward compatible with the 810, otherwise my firmware would not be able to load anything.

Sunday, July 9, 2017

My new toys: Motorola PowerStack II

The second gift from Jochen is a Powerstack II mainboard with an AT power supply unit and a SCSI disk. The SCSI disk has "AIX" written on it, which looks promising, but Jochen doesn't remember if it was really installed, or just planned.

The board has Serial/Parallel/Ethernet/SCSI and even a couple of unsoldered IDE connectors.
The boot log shows it has a FirmWorks-based Open Firmware:

WARNING: NVRAM Header Test Failed - Auto Initializing
Starting real time clock...
screen not found.
Can't open input device.
Keyboard not present.  Using com1 for input and output.
, Serial #0, 64 MB memory
Power Firmware(TM) by FirmWorks , Built  Thu Jun 4 10:20:43 MST 1998
Copyright (c) 1995-1996 FirmWorks.  All Rights Reserved.
PowerPC Open Firmware
Version 1.2 RM11   Thu Jun 4 10:20:43 MST 1998
Copyright Motorola 1995-96, All Rights Reserved
Copyright FirmWorks 1995-96, All Rights Reserved

 CPU . . . . . . . . . . . . . . . . . . . . . . . . . . . =PowerPC,604e
 MicroProcessor Internal Clock Speed (MHZ) . . . . . . . . =300
 MicroProcessor External Clock Speed (MHZ) . . . . . . . . =67
 PCI Bus Clock Speed (MHZ) . . . . . . . . . . . . . . . . =33
 Local Memory Size . . . . . . . . . . . . . . . . . . . . =4000000 (64 MB)
 Memory Type . . . . . . . . . . . . . . . . . . . . . . . =EDO
 Memory Error Checking . . . . . . . . . . . . . . . . . . =ECC
 Memory Speed. . . . . . . . . . . . . . . . . . . . . . . =50 NS
 L2 Cache Size . . . . . . . . . . . . . . . . . . . . . . =256KB
 L2 Cache Type . . . . . . . . . . . . . . . . . . . . . . =Asynchronous
 L2 Cache Parity . . . . . . . . . . . . . . . . . . . . . =Disabled
 Configuration Checksum. . . . . . . . . . . . . . . . . . =Failed

Then it gets to a windowed menu interface (which doesn't look like the typical OFW at all), but under "Administrative options" it's possible to choose "Invoke the Command Line Prompt", which gives the famous "ok" prompt.

AIX starts booting from the SCSI disk:

Trying..., fdisk0 Recalibrate failed.  The floppy drive is either missing,
improperly connected, or defective.
Failed
Trying..., hdisk0 Booting
Please wait while the system is booting
Boot device: /pci/scsi@2/disk@6,0  File and args:
   

 ******* Please define the System Console. *******

 Type a 1 and press Enter to use this terminal as the
  system console.

cvga0
+ swcons -c

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups
 Activating all paging spaces
swapon: Paging device /dev/hd6 activated.
 Performing all automatic mounts

And here it hangs. It probably tries to perform an NFS mount, which I don't have.
Anyway, it gets much further than QEMU currently does, so it can definitely be used as a reference.

I don't have a UW-SCSI CD-ROM drive to boot from the Powerstack II media. But it can be netbooted via tftp.

Surprisingly, booting Solaris/PPC did not work out. The floppy is not recognized, and when I tried to netboot SOLARIS.ELF from the CD I got an interesting error:

Rebooting with command: boot /pci/ethernet@4:172.22.0.20,SOLARIS.ELF,172.22.134.1
Boot device: /pci/ethernet@4:172.22.0.250,SOLARIS.ELF,172.22.134.51  File and args:
Trying to get Internet/Ethernet Address ...
       Contact your system administrator to see if
       a Boot Host and network is setup correctly.

So, obviously, after switching to little-endian mode the Motorola network driver doesn't work anymore. It looks like netbooting Solaris and Windows NT was no longer relevant for Motorola in 1998, otherwise it would have been tested.

Overall it looks like Motorola heavily modified the OFW. For instance, there are no hidden words, which is nice. It should be possible to peek at whether it has any quirks in the creation of the residual data. Or rather, it would have been possible: the whole thing is one single quirk, because there is no residual data.

Initially I thought that this would be a perfect firmware which would boot both PReP images and the later OFW-compatible ones. But alas. After poking around, I googled and found a couple of mails on the NetBSD mailing list stating that:
 1. The firmware doesn't provide any residual data
 2. The firmware doesn't have the PCI, DMA and interrupt mapping properties in the device tree.
Looking at the code I see that the first point is clearly caused by the second one. In the OFW the residual data is generated from the device tree. The code was not removed, but Motorola forgot to add the properties. 

Which makes it the worst possible firmware.

But it can still boot AIX from the supplied SCSI disk. This explains at least one reason for a custom AIX: the Motorola version has to be able to live without the residual data.

Probably the developers were in a rush, so instead of fixing the firmware properties, they just added a hack to the OS. Maybe the OS department had more resources than the firmware one, or maybe the developers who were able to do Forth were on vacation or had been fired.

The result is ugly, but I think every software developer has done something similar at least once.

Anyway now I have a sort of reference machine which can sort of boot AIX.

P.S. By the way, in case you wonder why I keep writing "Powerstack II Utah" instead of just "Powerstack II": it turned out that multiple machines called "Powerstack II" were produced, and they are indeed incompatible. More gory details are in the Linux kernel sources.