Saturday, September 23, 2017

Some experiments with AIX 5.1

Since I could not find the AIX 4.2 install media for Motorola, I gave AIX 5.1 under qemu-system-ppc a shot. My feelings are mixed: on the one hand I've got no reference machine to check things against; on the other hand the KDB debugger in AIX 5.1 is much more powerful than the one in 4.2. The initialization process of 5.1 is close to that of 4.2, so I can recognize some structures. That is good news: version 4.2 is quite different from 4.1.4, which I tried first, so I was afraid they had made an equally big leap in the 4.x -> 5.x transition. Well, partially they did: although the function names are more or less the same, the debugger has made a great leap forward.

This stack trace looked like a flashback.

[01DE1BCC]init_pcicfg+000000 (2FF3A910 [??])
[01DE1380]config_pal+000030 (??)
[01DE12F8]config_planar_pal+0001D8 (??, ??)
[004832AC]config_kmod+000184 (??, ??, ??)
[004836E4]sysconfig+000104 (??, ??, ??)
[00003A94].sys_call+000000 ()
[10002668]cfgpal_rspc+0003E8 ()
[100016C0]main+000110 (??, ??, ??)
[10000188]__start+000088 ()

Under 4.2 it was like this:

(gdb) bt
#0  0x018d2b7c in ?? () -- pci_rw
#1  0x00088114 in ?? ()
#2  0x00088114 in ?? ()
#3  0x018d1db0 in ?? () -- init_crashdump 0x018d1d70
#4  0x018cf410 in ?? () -- config_pal  0x018cf35c
#5  0x018cf30c in ?? () -- config_planar_pal 0x018cf100
#6  0x000f9b3c in ?? () -- config_kmod 0x000f9a5c, 2 params, size 0x118
#7  0x000f9eb8 in ?? ()
#8  0x000037a8 in ?? ()

See the formatting differences and gaps? That's because under 4.2 I had to make the trace manually. Did I mention that the 4.2 debugger is eighties style? So, now I'm really enjoying the luxury of having a modern tool.
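For readers wondering what a manual trace amounts to, here is a minimal sketch. It assumes the 32-bit AIX (PowerOpen) stack layout, where the word at offset 0 of each frame is the back chain to the caller's frame and a callee saves its return address (LR) at offset 8 of the caller's frame. It also assumes a raw big-endian memory dump that starts at a known guest address. The helper names `read_u32` and `walk_back_chain` and the dump format are hypothetical illustrations, not taken from the actual session.

```python
import struct

# Assumed 32-bit AIX/PowerOpen link area: back chain at +0, saved LR at +8.
BACK_CHAIN_OFF = 0
SAVED_LR_OFF = 8


def read_u32(mem, base, addr):
    """Read a big-endian 32-bit word at guest address `addr`.

    `mem` is a bytes-like dump whose first byte lives at guest address `base`.
    """
    off = addr - base
    if off < 0 or off + 4 > len(mem):
        raise ValueError("address 0x%08x is outside the dump" % addr)
    return struct.unpack_from(">I", mem, off)[0]


def walk_back_chain(mem, base, sp, max_frames=32):
    """Follow the back chain from stack pointer `sp`.

    Returns a list of (caller_frame_address, return_address) pairs.
    The walk stops at a zero or non-increasing back chain (the stack grows
    down, so callers must live at higher addresses), when an address falls
    outside the dump, or after `max_frames` frames.
    """
    frames = []
    while sp and len(frames) < max_frames:
        try:
            caller_sp = read_u32(mem, base, sp + BACK_CHAIN_OFF)
            if caller_sp <= sp:
                break
            ret = read_u32(mem, base, caller_sp + SAVED_LR_OFF)
        except ValueError:
            break
        frames.append((caller_sp, ret))
        sp = caller_sp
    return frames
```

Each return address still has to be matched to a function by hand, which is where the annotations after `--` in the 4.2 trace come from. Note that if the innermost function has not saved LR yet (a leaf function, or a breakpoint on the first instruction), the first return address has to be taken from the LR register rather than from the stack.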

It is also possible to make the output verbose:

KDB(0)> mw enter_dbg
enter_dbg+000000:  00000000  = 42
n_core+000000:  00000032  = .
KDB(0)>

But there is also some bad news. There are still bugs (or missing features) in qemu. Even worse, there is at least one Heisenbug: sometimes the boot gets to the PCI initialization and sometimes it does not. In the cases where it doesn't get to PCI init, it's really unclear why: it just sits in the idle loop, interrupts are enabled, and it receives the timer interrupts. For some reason it just thinks there is nothing to do. Debugging such cases is a real nightmare.
So I thought I'd let it go as far as it can in the cases where it does reach PCI init, and look for clues in the log.
No obvious clues, but here is a pretty long log:

Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'bus0'
 cfgmgr LED{78A}
Time: 0 LEDS: 0x78a
Invoking /usr/lib/methods/cfgbus_pci -1 -l bus0
exec(/bin/sh,-c,/usr/lib/methods/cfgbus_pci -1 -l bus0)
Number of running methods: 1
exec(/usr/lib/methods/cfgbus_pci,-1,-l,bus0)
Breakpoint
.bus_register+000000     mflr    r0                  <01DEADA0>
KDB(0)> g
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -c bus -s pci -t isa -p bus0 -w 88 -L 04-A0 -d)
exec(/usr/lib/methods/define_rspc,-c,bus,-s,pci,-t,isa,-p,bus0,-w,88,-L,04-A0,-d)
exec(/bin/sh,-c,/usr/lib/methods/cfgbus_isa -1 -l bus1)
exec(/usr/lib/methods/cfgbus_isa,-1,-l,bus1)
Breakpoint
.bus_register+000000     mflr    r0                  <01DF8FF0>
KDB(0)> g
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t fda -p bus1 -w PNP0700ffffffff -L 01-B0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,fda,-p,bus1,-w,PNP0700ffffffff,-L,01-B0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t isa_keyboard -p bus1 -w PNP0303ffffffff -L 01-D0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,isa_keyboard,-p,bus1,-w,PNP0303ffffffff,-L,01-D0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t isa_mouse -p bus1 -w PNP0F03ffffffff -L 01-E0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,isa_mouse,-p,bus1,-w,PNP0F03ffffffff,-L,01-E0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t s1a -p bus1 -w PNP05011 -L 01-F0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,s1a,-p,bus1,-w,PNP05011,-L,01-F0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -c adapter -s pci -t ncr810 -p bus0 -w 96 -L 04-B0 -d)
exec(/usr/lib/methods/define_rspc,-c,adapter,-s,pci,-t,ncr810,-p,bus0,-w,96,-L,04-B0,-d)
----------------
Completed method for: bus0, Elapsed time = 0
Return code = 0
***** stdout *****
:devices.isa_sio.IBM000E :devices.isa_sio.PNP0400 :devices.pci.22100020
fda0,sioka0,sioma0,sa0,scsi0

*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'fda0'
Method: /usr/lib/methods/cfgfda_isa not in boot image, configure in phase 2
----------------
Attempting to configure device 'sioka0'
Method: /usr/lib/methods/cfgkm_isa not in boot image, configure in phase 2
----------------
Attempting to configure device 'scsi0'
 cfgmgr LED{868}
Time: 0 LEDS: 0x868
Invoking /usr/lib/methods/cfgncr_scsi -1 -l scsi0
exec(/bin/sh,-c,/usr/lib/methods/cfgncr_scsi -1 -l scsi0)
exec(/usr/lib/methods/cfgncr_scsi,-1,-l,scsi0)
exec(/bin/sh,-c,/etc/methods/define -c disk -s scsi -t osdisk -p scsi0 -w 0,0)
exec(/etc/methods/define,-c,disk,-s,scsi,-t,osdisk,-p,scsi0,-w,0,0)
exec(/bin/sh,-c,/etc/methods/define -c cdrom -s scsi -t oscd -p scsi0 -w 2,0)
exec(/etc/methods/define,-c,cdrom,-s,scsi,-t,oscd,-p,scsi0,-w,2,0)
Number of running methods: 1
----------------
Completed method for: scsi0, Elapsed time = 0
Return code = 0
***** stdout *****
hdisk0 cd0
*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'hdisk0'
Method: /etc/methods/cfgscdisk not in boot image, configure in phase 2
----------------
Attempting to configure device 'cd0'
 cfgmgr LED{723}
Time: 0 LEDS: 0x723
Invoking /etc/methods/cfgsccd -1 -l cd0
exec(/bin/sh,-c,/etc/methods/cfgsccd -1 -l cd0)
exec(/etc/methods/cfgsccd,-1,-l,cd0)
Number of running methods: 1
----------------
Completed method for: cd0, Elapsed time = 0
Return code = 0
*** no stdout ****
*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/deflvm"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/deflvm )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/deflvm:  not found

Method error (/usr/lib/methods/deflvm):
        0514-068 Cause not known.
sh: /usr/lib/methods/deflvm:  not found

----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/fdarcfgrule"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/fdarcfgrule )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/fdarcfgrule:  not found

Method error (/usr/lib/methods/fdarcfgrule):
        0514-068 Cause not known.
sh: /usr/lib/methods/fdarcfgrule:  not found

----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/defssar"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/defssar )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/defssar:  not found

Method error (/usr/lib/methods/defssar):
        0514-068 Cause not known.
sh: /usr/lib/methods/defssar:  not found

 cfgmgr LED{FFF}
Configuration time: 0 seconds
+ 1> /etc/filesystems
+ /usr/lib/methods/showled 0x517
exec(/usr/lib/methods/showled,0x517)
 showled LED{517}
+ bootinfo -b
exec(/usr/sbin/bootinfo,-b)
exec(/usr/lib/boot/bin/bootinfo_rspc,-b)
+ mount -v cdrfs -o ro /dev/cd0 /SPOT
exec(/usr/sbin/mount,-v,cdrfs,-o,ro,/dev/cd0,/SPOT)
exec(/usr/bin/sh,-c,/usr/sbin/wlmcntrl -u -d "" > /dev/null 2>&1)
+ [ 0 -ne 0 ]
+ /usr/lib/methods/showled 0x512
exec(/usr/lib/methods/showled,0x512)
 showled LED{512}
+ /SPOT/usr/bin/rm -r /etc/init /usr/bin /usr/lib/boot /usr/lib/drivers/ataide /usr/lib/drivers/ataidepin /usr/lib/drivers/cfs.ext /usr/lib/drivers/idecdrom /usr/lib/drivers/idecdrompin /usr/lib/drivers/isa /usr/lib/drivers/pci /usr/lib/drivers/planar_pal_rspc /usr/lib/drivers/scdisk /usr/lib/drivers/scdiskpin /usr/lib/methods/cfgataide /usr/lib/methods/cfgbus_isa /usr/lib/methods/cfgbus_pci /usr/lib/methods/cfgidecdrom /usr/lib/methods/cfgncr_scsi /usr/lib/methods/cfgsccd /usr/lib/methods/cfgsys_rspc /usr/lib/methods/chggen /usr/lib/methods/chggen_rspc /usr/lib/methods/define /usr/lib/methods/define_rspc /usr/lib/methods/defsys /usr/lib/methods/showled /usr/lib/methods/ucfgdevice /usr/sbin
exec(/SPOT/usr/bin/rm,-r,/etc/init,/usr/bin,/usr/lib/boot,/usr/lib/drivers/ataide,/usr/lib/drivers/ataidepin,/usr/lib/drivers/cfs.ext,/usr/lib/drivers/idecdrom,/usr/lib/drivers/idecdrompin,/usr/lib/drivers/isa,/usr/lib/drivers/pci,/usr/lib/drivers/planar_pal_rspc,/usr/lib/drivers/scdisk,/usr/lib/drivers/scdiskpin,/usr/lib/methods/cfgataide,/usr/lib/methods/cfgbus_isa,/usr/lib/methods/cfgbus_pci,/usr/lib/methods/cfgidecdrom,/usr/lib/methods/cfgncr_scsi,/usr/lib/methods/cfgsccd,/usr/lib/methods/cfgsys_rspc,/usr/lib/methods/chggen,/usr/lib/methods/chggen_rspc,/usr/lib/methods/define,/usr/lib/methods/define_rspc,/usr/lib/methods/defsys,/usr/lib/methods/showled,/usr/lib/methods/ucfgdevice,/usr/sbin)
...
Attempting to configure device 'fda0'
 cfgmgr LED{828}
Time: 0 LEDS: 0x828
Invoking /usr/lib/methods/cfgfda_isa -2 -l fda0
exec(/bin/sh,-c,/usr/lib/methods/cfgfda_isa -2 -l fda0)
Number of running methods: 1
exec(/usr/lib/methods/cfgfda_isa,-2,-l,fda0)

Now it hangs on the floppy disk adapter init. Looks like there is no timeout. Strange.
There are two ways from here: either fix the floppy emulation, or make OFW for 40p with no floppy...
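
Since the boot does not always take the same path, one way to compare runs is to reduce each captured console log to its sequence of LED codes and to the last program that was exec'ed. The following is only a sketch: it assumes the log has been saved as text in the format shown above (`LED{...}` markers and `exec(...)` lines), and the function names are hypothetical.

```python
import re

LED_RE = re.compile(r"LED\{([0-9A-Fa-f]+)\}")
EXEC_RE = re.compile(r"^exec\((.*)\)\s*$")


def led_sequence(log_text):
    """Return the LED codes in order of appearance, collapsing immediate repeats."""
    seq = []
    for match in LED_RE.finditer(log_text):
        code = match.group(1).upper()
        if not seq or seq[-1] != code:
            seq.append(code)
    return seq


def last_exec(log_text):
    """Return the program path from the last exec(...) line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = EXEC_RE.match(line.strip())
        if match:
            # The first comma-separated field is the program being executed.
            last = match.group(1).split(",", 1)[0]
    return last


def first_divergence(seq_a, seq_b):
    """Return the index where two LED sequences first differ, or None if they are equal."""
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            return i
    if len(seq_a) == len(seq_b):
        return None
    return min(len(seq_a), len(seq_b))
```

For the log above, `last_exec` would point at `/usr/lib/methods/cfgfda_isa`, and `first_divergence` applied to the LED sequences of a good and a bad run would show how far the two runs agree before they part ways.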

5 comments:

Unknown said...

Can't qemu emulate a floppy drive?

atar said...

It does pretty well for the i686 target (except for some versions of OS/2, which require exact timings), and well enough for the proprietary IBM firmware to boot from floppy on the 40p. But obviously not well enough for AIX. Also, I'm not sure whether the IBM PPS 6015 uses the exact same floppy chip as the IBM PC.

What makes me wonder is why there is no timeout and why AIX doesn't throw a XXX LED code indicating that it's not happy with the floppy. This implies that something is wrong with the timer, IRQ or DMA processing, causing the fda test to destroy something crucial in the kernel.

Unknown said...

Hi, if you need it, I have a lot of copies of AIX for RS/6000.
In my archive I have all the original install disks of 4.3.3, 5.x, 6.x.
If these can help you, I can make ISOs and send them to you.
bye,
Marco.

atar said...

Thanks! If you've got AIX 4.2 or 5.x for Motorola (or Bull Estrella), this would indeed help. I've got the 4.2/5.1 disks for the IBM machines, but they don't work on my reference Motorola machine.

Unknown said...

Hmm, I probably only have an mksysb tape taken from a Bull with AIX 4.3.3; if I remember well, it has a couple of 603 CPUs.
I will try to find install media, but I probably only have it on DDS tape.