Sunday, July 23, 2017

Wiretapping AIX

Identified a couple of kernel and shared library functions, so I'm not poking in the dark anymore:

First of all I found execv. It gives a lot of insight into the AIX boot process, which is quite different from a Linux or Solaris boot. The kernel is small, and is actually already loaded, even under QEMU. Most other operating systems would print a greeting once the kernel is loaded. AIX does it all silently. On IBM machines there is an LED panel showing one byte of status. On the Motorola there are just two LEDs which can light green or yellow, which altogether gives just 9 combinations. Not very informative. But even if I had one byte, it still would not help. I am looking for error messages like "missing property", "unknown PCI chip", "missing residual data", etc.
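A quick sketch of the LED arithmetic above, assuming each of the two LEDs has exactly three states (off, green, yellow); the state names are only illustrative:

```python
from itertools import product

# Assumption: each of the two LEDs can be off, green or yellow.
LED_STATES = ("off", "green", "yellow")

# Every possible pattern of the two LEDs taken together.
led_patterns = list(product(LED_STATES, repeat=2))

# A one-byte status panel, for comparison.
status_byte_values = 2 ** 8

print(len(led_patterns))      # number of distinct two-LED patterns
print(status_byte_values)     # number of values a one-byte panel can show
```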

The initialization of the PCI bus happens long after the kernel spawns the /etc/init process.

Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d08:     "/etc/methods/defsys"
(gdb) c
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x2ff22090:     "/bin/sh"
(gdb) c
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d28:     "/usr/lib/methods/cfgsys_MOT3F00"       <= here is where it can't find the PCI bus
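On PowerPC the first function argument is passed in r3, which is why `x/s $r3` at the execv breakpoint shows the path being executed. If a session like the one above is saved to a text file, the sequence of launched programs can be pulled out with a few lines of Python. This is only a sketch: the function name `exec_paths` and the embedded sample are illustrative, and it assumes the transcript has exactly the `x/s` output format shown above.

```python
import re

# Matches the output lines of `x/s $r3`, e.g.
#   0x20051d08:     "/etc/methods/defsys"
XS_LINE = re.compile(r'^(0x[0-9a-fA-F]+):\s+"([^"]*)"')

def exec_paths(transcript):
    """Return (address, path) pairs in the order they appear in the log."""
    found = []
    for line in transcript.splitlines():
        m = XS_LINE.match(line.strip())
        if m:
            found.append((int(m.group(1), 16), m.group(2)))
    return found

# Sample copied from the session above.
sample = '''Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d08:     "/etc/methods/defsys"
(gdb) c
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x2ff22090:     "/bin/sh"
'''

for addr, path in exec_paths(sample):
    print(hex(addr), path)
```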

Then I found the printf and sprintf functions. Although AIX doesn't write anything on the screen, it still collects the boot log messages, so wiretapping these functions helps to see them.

The house is still dark but now I have a search light. So whatever bugs are there, beware, you are going to be seen soon!

Saturday, July 22, 2017

Debugging AIX 4.2 boot

I wonder if it is possible to make the AIX 4.2 boot more verbose.
Various sources say that it should be done via
mw enter_dbg

under KDB. The AIX version I have doesn't have it. In fact it doesn't even have an option to disassemble a piece of code. Just the hardcore hex dump, pretty much like it was in the eighties.

That feeling when you start with retro-computing and end up with steampunk computing.

ok  boot /scsi/disk@6 -s trap
Trap instruction interrupt.
> mw enter_dbg
032-001  You entered a command «mw» that is not valid.
> help
alter   … (a)lter — alter memory
back    … (b)ack — decrement the IAR
ditto   … «» — blank repeats the last command
break   … (br)eak — set a breakpoint
breaks  … (breaks) — list currently set breakpoints
buckets … (bu)ckets — display kmembucket structures
clear   … (c)lear — clear breakpoint(s)
display … (d)isplay — display a specified amount of memory
dmodsw  … (dm)odsw — display Streams dmodsw table
drivers … (dr)ivers — display device driver (devsw) table
find    … (f)ind — find a string in memory
float   … (fl)oat — display floating point registers
fmodsw  … (fm)odsw — display Streams fmodsw table
fs      … fs — display file system data structures
go      … (g)o — start executing the program
help    … (h)elp — display the list of valid commands
loop    … (l)oop — execute until control returns to this point
map     … (m)ap — display the system loadlist
mblk    … (mb)lk — display mblk/kmemstat structures
next    … (n)ext — increment the IAR
origin  … (o)rigin — set the origin
proc    … (p)roc — process table display
quit    … (q)uit — end the debugger session
queue   … (que)ue — display Streams queues
reset   … (r)eset — release a user defined variable
restore … (re)store — restore or do not restore the screen
screen  … (s)creen — display a screen containing registers and memory
set     … (se)t — define an/or set a variable
sregs   … (sr)egs — display segment registers
st      … (st) — store a full word into memory
stack   … (sta)ck — formatted stack trace
stc     … (stc) — store one byte into memory
step    … (ste)p — perform an instruction single-step
sth     … (sth) — store a half word into memory
stream  … (str)eam — display Stream head structures
swap    … (sw)ap — switch from the current display/keyboard to RS-232 port
thread  … (th)read — thread table display
trace   … (tr)ace — print traceback buffer
trb     … (trb) — display formatted timer request block info
tty     … (tt)y — Display tty struct
user    … (u)ser — formatted user area
uthread … (ut)hread — formatted uthread area
vars    … (v)ars — display a listing of the user_defined variables
vmm     … vmm — display virtual memory data structures
xlate   … (x)late — display the real address of a memory location

Sunday, July 16, 2017

Booting OFW from OFW

In order to use my new Motorola Powerstack II Utah machine as a reference for improving qemu there are two ways:

a) make qemu run the Powerstack II firmware
b) make Powerstack II run my firmware

I quickly tried running the Powerstack II firmware under qemu. After all, that was the approach that let me run Solaris/SPARC under qemu 7 years ago. The firmware sort of starts, but drops into some very limited debugger. It looks to me like the debugger is from Motorola and starts before the OFW is launched. Last week I found out that the Powerstack II Utah firmware is no good for anything but one version of AIX, so this particular version is really not worth launching.

So I went for option b) and made a firmware which can be netbooted on the Utah.
It's even bootable on both Powerstack I and II, which was tricky. For some reason the different Powerstacks have slightly different ideas about the layout of the 0x41 partition. For instance, the Solaris floppy image can not be netbooted. But there is one layout which works for both floppy and netbooting on both Powerstacks.
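For readers who have not met the 0x41 partition: PReP boot images carry an MBR-style partition table, and 0x41 is the partition type of the boot partition. The sketch below only shows how to locate such an entry in the first sector of an image. It says nothing about what exactly the two Powerstacks disagree on, which is not spelled out here. The function name and the synthetic sector values are made up for illustration.

```python
import struct

PREP_BOOT_TYPE = 0x41      # partition type mentioned in the text
TABLE_OFFSET = 446         # start of the MBR partition table
ENTRY_SIZE = 16            # four entries of 16 bytes each

def find_prep_partition(sector0):
    """Return (index, start_lba, sector_count) of the first 0x41 entry, or None."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR sector")
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector0[start:start + ENTRY_SIZE]
        ptype = entry[4]                       # partition type byte
        # Start LBA and sector count are little-endian 32-bit values.
        start_lba, count = struct.unpack_from("<II", entry, 8)
        if ptype == PREP_BOOT_TYPE:
            return i, start_lba, count
    return None

# Build a synthetic sector with one 0x41 entry (all values are made up).
sector = bytearray(512)
entry = bytearray(ENTRY_SIZE)
entry[0] = 0x80                                # "active" flag
entry[4] = PREP_BOOT_TYPE
struct.pack_into("<II", entry, 8, 1, 2879)     # start LBA 1, 2879 sectors
sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
sector[510:512] = b"\x55\xaa"                  # MBR signature

print(find_prep_partition(bytes(sector)))
```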

So now I netboot my OFW from the Motorola OFW, and then try booting AIX from the SCSI disk.
And then it hangs with the residual data. No wonder here - I used the device tree from qemu, so at least some devices are different or wired differently. But if I remove the creation of the residual data, it even boots AIX to the same point as the Motorola OFW.

The SCSI host on both Powerstacks is different from the one on the 40p machine. The 40p (and qemu) have Vendor ID 0x1000, Device ID 0x0001, which, according to the pcidatabase, is an LSI53C810 chip. The Powerstacks have Vendor ID 0x1000, Device ID 0x0003, which is supposedly an LSI53C1010-33. The chip itself is marked Symbios Logic 53C825A.

I may hit the difference between the LSI (formerly known as Symbios and NCR) chips later, but at least the 53C825A is backward compatible with the 810, otherwise my firmware would not be able to load anything.
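The vendor and device IDs above come from the first 32-bit word of the PCI configuration space: the vendor ID is the low 16 bits, the device ID the high 16 bits. A minimal sketch of decoding that word follows. The lookup table holds only the two chips mentioned here, with names as reported above (including the pcidatabase mismatch); the function names are made up.

```python
# Names as reported in the text; the 0x0003 entry keeps the mismatch
# between the database name and the marking on the chip.
KNOWN = {
    (0x1000, 0x0001): "LSI53C810 (per pcidatabase)",
    (0x1000, 0x0003): "LSI53C1010-33 per pcidatabase, chip marked 53C825A",
}

def split_id_register(dword):
    """Split the 32-bit value at PCI config offset 0x00 into (vendor, device)."""
    vendor = dword & 0xFFFF
    device = (dword >> 16) & 0xFFFF
    return vendor, device

def describe(dword):
    """Format the IDs and look the pair up in the small table above."""
    vendor, device = split_id_register(dword)
    name = KNOWN.get((vendor, device), "unknown")
    return "%04x:%04x %s" % (vendor, device, name)

print(describe(0x00011000))   # the 40p / qemu SCSI host
print(describe(0x00031000))   # the Powerstack SCSI host
```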

Sunday, July 9, 2017

My new toys: Motorola PowerStack II

The second gift from Jochen is a Powerstack II mainboard with an AT power supply unit and a SCSI disk. The SCSI disk has "AIX" written on it, which looks promising, but Jochen doesn't remember if it was really installed, or just planned.

The board has Serial/Parallel/Ethernet/SCSI and even a couple of unsoldered IDE connectors.
The boot log shows it has a Firmworks based Open Firmware:

WARNING: NVRAM Header Test Failed - Auto Initializing
Starting real time clock...
screen not found.
Can't open input device.
Keyboard not present.  Using com1 for input and output.
, Serial #0, 64 MB memory
Power Firmware(TM) by FirmWorks , Built  Thu Jun 4 10:20:43 MST 1998
Copyright (c) 1995-1996 FirmWorks.  All Rights Reserved.
PowerPC Open Firmware
Version 1.2 RM11   Thu Jun 4 10:20:43 MST 1998
Copyright Motorola 1995-96, All Rights Reserved
Copyright FirmWorks 1995-96, All Rights Reserved

 CPU . . . . . . . . . . . . . . . . . . . . . . . . . . . =PowerPC,604e
 MicroProcessor Internal Clock Speed (MHZ) . . . . . . . . =300
 MicroProcessor External Clock Speed (MHZ) . . . . . . . . =67
 PCI Bus Clock Speed (MHZ) . . . . . . . . . . . . . . . . =33
 Local Memory Size . . . . . . . . . . . . . . . . . . . . =4000000 (64 MB)
 Memory Type . . . . . . . . . . . . . . . . . . . . . . . =EDO
 Memory Error Checking . . . . . . . . . . . . . . . . . . =ECC
 Memory Speed. . . . . . . . . . . . . . . . . . . . . . . =50 NS
 L2 Cache Size . . . . . . . . . . . . . . . . . . . . . . =256KB
 L2 Cache Type . . . . . . . . . . . . . . . . . . . . . . =Asynchronous
 L2 Cache Parity . . . . . . . . . . . . . . . . . . . . . =Disabled
 Configuration Checksum. . . . . . . . . . . . . . . . . . =Failed

Then it gets to a windowed menu interface (which doesn't look like the typical OFW at all), but under "Administrative options" it's possible to choose "Invoke the Command Line Prompt", which gives the famous "ok" prompt.

AIX starts booting from the SCSI disk:

Trying..., fdisk0 Recalibrate failed.  The floppy drive is either missing,
improperly connected, or defective.
Trying..., hdisk0 Booting
Please wait while the system is booting
Boot device: /pci/scsi@2/disk@6,0  File and args:

 ******* Please define the System Console. *******

 Type a 1 and press Enter to use this terminal as the
  system console.

+ swcons -c

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups
 Activating all paging spaces
swapon: Paging device /dev/hd6 activated.
 Performing all automatic mounts

And here it hangs. Probably it tries to perform an NFS mount, which I don't have.
Anyway, it gets much further than QEMU currently does, so it definitely can be used as a reference.

I don't have a UW-SCSI CD-ROM drive to boot from the Powerstack II media. But it can be netbooted via tftp.

Surprisingly, booting Solaris/PPC did not work out. The floppy is not recognized, and an attempt to netboot SOLARIS.ELF from the CD gave an interesting error:

Rebooting with command: boot /pci/ethernet@4:,SOLARIS.ELF,
Boot device: /pci/ethernet@4:,SOLARIS.ELF,  File and args:
Trying to get Internet/Ethernet Address ...
       Contact your system administrator to see if
       a Boot Host and network is setup correctly.

So, obviously, after switching to little-endian mode the Motorola network driver doesn't work anymore. Looks like in 1998 netbooting Solaris and Windows NT was not relevant for Motorola anymore, otherwise it would have been tested.

Overall it looks like Motorola modified the OFW heavily. For instance, there are no hidden words. Which is nice. It should be possible to peek at whether it has any quirks in the creation of the residual data. Or, better to say, it would have been possible. The whole thing is one single quirk: there is no residual data.

Initially I thought that this would be a perfect firmware which would boot both PReP images and the later OFW-compatible ones. But alas. After poking around, I googled and found a couple of mails on the NetBSD mailing list stating that:
 1. The firmware doesn't provide any residual data
 2. The firmware doesn't have the PCI, DMA and interrupt mapping properties in the device tree.
Looking at the code I see that the first point is clearly caused by the second one. In the OFW the residual data is generated from the device tree. The code was not removed, but Motorola forgot to add the properties. 

Which makes it the worst possible firmware.

But it can still boot AIX from the supplied SCSI disk. This explains at least one reason for a custom AIX: the Motorola version has to be able to live without the residual data.

Probably the developers were in a rush, so instead of fixing the firmware properties, they just added a hack to the OS. Maybe the OS department had more resources than the firmware one, or maybe the developers who were able to do Forth were on vacation or had been fired.

The result is ugly, but I think every software developer has done something similar at least once.

Anyway now I have a sort of reference machine which can sort of boot AIX.

P.S. By the way, if you wonder why I keep writing "Powerstack II Utah" instead of just "Powerstack II": it turned out that multiple machines called "Powerstack II" were produced. And indeed they are incompatible. More gory details are in the Linux kernel sources.

Sunday, July 2, 2017

My new toys: Motorola Powerstack I

Jochen Kunz sent me two Motorola PowerStack toys. Thank you very much, Jochen. Now I should have the reference machines which I can use for fixing QEMU.

The first one is a classic PowerStack I machine, which looks pretty cool and has some kind of proprietary firmware (PPC1Bug). To connect it via a serial line I had to use pretty much all the cables and adapters I have: a short 9F-9F cable, a 9M-25F adapter, a 25F-25F cable and a 25M-9F adapter.
On the desktop Linux side I use GNU screen on a serial line. I found this feature just a few days ago. For those who missed it too, this is how the serial line gets attached to a running screen:

screen -X screen /dev/ttyS0  # note 2 screens, that's not a typo

This is what it prints on power-on:

Copyright Motorola Inc. 1988 - 1995, All Rights Reserved

PPC1 Debugger/Diagnostics Release Version 1.8 - 10/04/95
COLD Start

Local Memory Found =02000000 (&33554432)

WARNING: Board Configuration Data Failure

MPU Clock Speed =100Mhz
WARNING: Keyboard not connected

Initializing System Memory (DRAM)...

System Memory: 32MB, Parity Enabled (Parity-Memory Detected)
L2Cache:       NONE, Parity NOT Enabled

SelfTest/Boots about to Begin... Press <BREAK> at anytime to Abort ALL

SelfTest about to Begin... Press <ESC> to Bypass, <SPC> to Continue

RAM      ADR: Addressability......................... Running ---> PASSED
PC16550  REGA: Register Access....................... Running ---> PASSED

PC16550  IRQ: Interrupt.............................. Running ---> PASSED
PC16550  BAUD: Baud Rate............................. Running ---> PASSED

PC16550  LPBK: Internal Loopback..................... Running ---> PASSED
Z8536    CNT: Counter................................ Running ---> PASSED

Z8536    LNK: Linked Counter......................... Running ---> PASSED
Z8536    IRQ: Interrupt.............................. Running ---> PASSED

Z8536    REG: Register............................... Running ---> PASSED
SCC      ACCESS: Device/Register Access.............. Running ---> PASSED

SCC      IRQ: Interrupt Request...................... Running ---> PASSED
PAR87303 REG: PC87303 Parallel Port's Register/Data.. Running ---> PASSED

DEC21040 REGA: PCI Register Access................... Running ---> PASSED
DEC21040 XREGA: Extended PCI Register Access......... Running ---> PASSED

DEC21040 SPACK: Single Packet Xmit/Recv.............. Running ---> PASSED
DEC21040 ILR: Interrupt Line Register Access......... Running ---> PASSED

DEC21040 ERREN: ERREN and SERREN Bit Toggle.......... Running ---> PASSED
DEC21040 IOR: I/O Resource Register Access........... Running ---> PASSED

DEC21040 CINIT: Chip Initialization.................. Running ---> PASSED
NCR      PCI: NCR 53c8xx PCI Access.................. Running ---> PASSED

NCR      ACC1: NCR 53c8xx Device Access.............. Running ---> PASSED
NCR      ACC2: NCR 53c8xx Register Access............ Running ---> PASSED

NCR      SFIFO: NCR 53c8xx SCSI FIFO................. Running ---> PASSED
NCR      DFIFO: NCR 53c8xx DMA FIFO.................. Running ---> PASSED

NCR      IRQ: NCR 53c8xx Interrupts.................. Running ---> PASSED
NCR      SCRIPTS: NCR 53c8xx SCRIPTs Processor....... Running ---> PASSED

I82378   REG: i82378 Register Access................. Running ---> PASSED
I82378   IRQ: Interrupt Request...................... Running ---> PASSED

AutoBoot about to Begin... Press <ESC> to Bypass, <SPC> to Continue

NetBoot about to Begin... Press <ESC> to Bypass, <SPC> to Continue

1) Continue System Start Up
2) Select Alternate Boot Device
3) Go to System Debugger
4) Initiate Service Call
5) Display System Test Errors
6) Dump Memory to Tape
Enter Menu #:

It doesn't have anything on its hard drive, so the only reasonable option here is 3):

I/O Inquiry Status:
  0     0  NCR53C825   0      $00    N   SEAGATE  ST31230W         0456
  0    50  NCR53C825   5      $05    Y   TOSHIBA  CD-ROM XM-4101TA 1084
  1     0  PC8477      0      $00    Y   <None> 

Tried all boot disks I have.

+ Boots the Solaris 2.5.1/PPC floppy, which provides some very limited Open Firmware (not even sure it's based on the Firmworks OFW). After booting the floppy it's possible to boot Solaris from a CD. Nice to have, but not my toy of choice: it works in little-endian mode, which currently doesn't work under QEMU/PReP, and it hardly has any software. But if one day all the other OSes are emulated, I may get back to it.

- Unsurprisingly doesn't boot from any IBM AIX CDs. I had already heard that AIX is quite picky about the hardware, and was just curious whether it gives any error message. It doesn't.

- Surprisingly doesn't boot from the two Motorola AIX CDs I have:
    "AOS1_3__RM02" (aka AIX v4.1.4 for Motorola PowerStack II)
    "AOS1_4__RM03" (aka AIX v4.1.4r4 for Motorola PowerStack II)
So, obviously the PowerStack II AIX is not compatible with PowerStack I.

* Haven't tried booting Windows NT on it. There is a report in Google Groups that NT flashes another firmware which can only boot NT, and that it's not possible to get back to PPC1Bug. On top of that, NT is little-endian, just like Solaris 2.5.1/PPC, so all the considerations from above apply here too.

The good news: it has an i82378 PCI controller and an NCR53C825 SCSI, which is quite close to what the QEMU/PReP/40p target currently emulates.
The bad news: unless I find a boot disk for AIX for Motorola PowerStack I, this machine can not be used for debugging AIX.

Next weekend I'll write about the second toy.

/Stay tuned