Saturday, May 8, 2010

sent the SunOS 4.1.4 patch upstream

Sent the SunOS 4.1.4 patch upstream. If it gets accepted, it should be possible to get up to "The Wh" using the Solaris 2.5.1 instructions from the how-to.

Found another bug in qemu which I can't fix. I hoped that others could, so I posted it to the mailing list, but got no responses. Actually, the bug may be related to the SunOS fix I just sent: maybe SunOS tries to access a non-connected address not because it's buggy, but because qemu translates the address incorrectly.

If no one from the mailing list answers, I'll have to dig into it further.

9 comments:

Jason Stevens said...

cool, I hope they approve them...!

atar said...

Up to now they have not. The patches add some pseudo-devices at addresses where, according to the documentation, nothing is supposed to be. I added them cause I see that on the real machines there is something at these addresses (due to aliasing effects). But the maintainer (Blue Swirl) would like a better explanation of why the patch is needed.

Unfortunately I have no way to test whether the real hardware tries to access these addresses. My patches don't make the emulation worse, but may be unnecessary if the other bug gets fixed.
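
For readers unfamiliar with aliasing: one common cause is incomplete address decoding, where a device looks at only a few low address bits, so its registers show up repeatedly throughout a larger slot. The sketch below is only an illustration of that general effect. It is not qemu code, the base address and sizes are made up, and whether this is the actual mechanism on the Sun hardware is an assumption.

```python
# Illustration of aliasing through incomplete address decoding.
# All addresses and sizes are hypothetical, chosen for readability.

SLOT_BASE = 0x1000   # assumed start of the bus slot
SLOT_SIZE = 0x100    # the slot covers 256 bytes of address space
REG_COUNT = 0x10     # the device has only 16 byte-wide registers

registers = bytearray(REG_COUNT)


def decode(addr):
    """Return the register index selected by addr, or None if addr
    lies outside the slot. Only the low 4 bits of the offset are
    decoded, so every 16-byte window in the slot maps to the same
    16 registers."""
    if not SLOT_BASE <= addr < SLOT_BASE + SLOT_SIZE:
        return None
    return (addr - SLOT_BASE) & (REG_COUNT - 1)


def bus_read(addr):
    """Read one byte from the device; raise if nothing is mapped."""
    index = decode(addr)
    if index is None:
        raise ValueError("no device at address 0x%x" % addr)
    return registers[index]
```

With this model, a value stored in register 3 can be read back at 0x1003, 0x1013, 0x1023 and so on, even though the documentation would list only 0x1000-0x100f. An emulator that maps only the documented range would fault on the aliased addresses, while real hardware would answer.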

Ted Lemon said...

Do you have a set of patches lying around anywhere that represent the difference between the qemu code you're running and what is in git?

atar said...

No, cause I do some wild experiments.

The particular 4.1.4 patch is available at http://patchwork.ozlabs.org/patch/51965/raw/ .

Ted Lemon said...

Okay, thanks. I'm running into a persistent problem with NetBSD 5.0.2 where if I try to do a full install, or any substantial disk I/O, it crashes because of a repeated unexpected interrupt. I thought it might have to do with the interrupt mask bug you found in mid-November, but it looks like that patch is in the git tree.

I tried your suggestion of running in user mode, but it looks like that code isn't widely used - it dumps core on Linux after smashing the stack, and fails to mmap the stack on Mac OS X for reasons I haven't been able to determine (the mmap syscall doesn't give helpful error results).

How did you climb the learning curve on this code? It seems like it ought to be possible to get user mode working, but I don't know enough about how the whole system works to make an educated guess as to what's broken about it.

atar said...

Yes, all my good patches are in git. Except maybe the one mentioned above (but it really seems to affect SunOS 4.1.x only). Not sure yet if I have to fight for it.

And no, I didn't succeed in climbing the learning curve. I've been looking at phys_page_find for the last two evenings and don't understand how it can be working. At all. Looks absolutely wrong to me... It must produce many more error cases than I spotted. But it's used with all architectures, so bugs there are unlikely...

How do you start your NetBSD? Did you try to switch between using OBP and OpenBIOS? If you see the bugs when started with OpenBIOS, can you please submit them to the qemu mailing list?

Are those timer or disk interrupts?

Ted Lemon said...

I get the same results with either firmware. The crash starts like this:

# stray interrupt ipl 0x4 pc=0xf02bb5f8 npc=0xf02bb5fc psr=44010c3
[...repeats about 10 times...]
stray interrupt ipl 0x4 pc=0xf02bb5f8 npc=0xf02bb5fc psr=44010c3
panic: crazy interrupts

I'll submit a bug report.

atar said...

Ted, can you try the fresh git? My Solaris 2.3 fix was accepted, and it solves some problems with the spurious interrupts.

Ted Lemon said...

Interestingly enough, when I went to submit a bug report, I was unable to reproduce the problem. I have no idea what changed, or whether I just got lucky. The code I ran to do the test install did not include the patch you're talking about.

In any case, I pulled down the latest patch you did and am currently doing a fairly intensive data transfer, which has not yet crashed the machine, despite running for about an hour so far. When I'm done, I'll try doing a build to see if I can get it to crash.