Crash Mode -> Startup Crash




List

My tests of the EMC use steppermod because I've found it to be the least
stable of the motion modules that I can easily run.  And I do know about
the thought that Period needs to be commented out.   

It looks like the NIST folk are way ahead of me and have succeeded in
ruining my "crash mode" construct.  

I downloaded and installed from the SourceForge repository on the 16th.
I found that there is code in the new generic.run that watches for
just the kind of faulty termination that led me to hypothesize that there
was a crash mode.  What this code does is rmmod motion modules out of the
kernel and sets or resets SHMEM and semaphores to initial conditions
whenever the EMC shuts down or starts up again.

I tested this new code with the BDI for more than 100 runs that were
interrupted by things like power failures, X Window deaths, <control c's>,
<control d's> and such, and never saw two crashes in a row.  I also tested
the new code on 2.2.14 and rt 2.2a and found similar results.  (I got tired
before I got to 100.)

The good news is that I believe that I can say with reasonable assurance
that "Crash Mode" is dead.   We should have a virtual party!

The bad news is that "Startup Crash" is not dead.  During the 100 BDI runs
EMC failed to run correctly four times.  The system appeared to start, and I
was able to turn off e-stop and turn the machine on, but the instant I asked
for a motion command (<home X>) something in the PC died.

Once, just the EMC went away.  Twice the PC did a warm boot.  Once it just
froze up and I had to reboot.  One of these deaths left the file system in
such disarray that I had to manually fsck and answer yes to a bunch of
questions that I knew nothing about.  (I really like the BDI at these
moments because I am only 10 minutes away from a new system.)  

My rather primitive test has been to cat some terminal reports into a file
using a terminal command like ./emc.run cat >> runlog.txt.  With debug set
to max in the ini file, this startup produces a list of NML messages and
some aux stuff concerning the startup.  The interesting thing that I found
is that cat stops working long before the crash.  The last line in the run
entry of runlog.txt would be something like

starting EMC MOTION PROGRAM -- steppermod.o...

Notice that it did not even report that it was done starting steppermod, but
the run script kept going and put up the gui.  Then I took EMC out of
e-stop and turned the machine on, but no message about either reached the
file.

Other times the file said

starting EMC IO PROGRAM --  minimillio...done
Version:  1.5.s
Machine:  Ray's steppermod minimill

Notice that it finished loading minimillio but did not cat the report of
the loading of minimilltask or tkemc or any NML messages.  But there had to
be NML messages or tkemc would not show changes to e-stop and machine on.
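
One way to tighten this test is to rule out lost buffered output.  Text
appended through cat can still be sitting in memory when the PC reboots or
freezes, so a short log does not by itself prove that cat died early.  The
following is only a sketch, not part of the EMC: the function name log_lines
and the script name synclog.py are made up for illustration.  It timestamps
each line and forces it to disk before reading the next one.

```python
import os
import time


def log_lines(source, log_path):
    """Append each line from source to log_path and force it to disk.

    source is any iterable of text lines (for example sys.stdin).
    Returns the number of lines written.
    """
    count = 0
    with open(log_path, "a") as log:
        for line in source:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {line.rstrip()}\n")
            log.flush()             # hand Python's buffer to the kernel
            os.fsync(log.fileno())  # ask the kernel to write it to disk
            count += 1
    return count


# Hypothetical use from a script called synclog.py:
#   import sys
#   log_lines(sys.stdin, "runlog.txt")
# fed by the startup output, e.g.  ./emc.run 2>&1 | python synclog.py
```

If the log written this way still stops at the same place, buffering is not
the explanation and the timestamps show how long before the crash the output
stopped.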

My rather tentative conclusion is that while the real time and EMC modules
are being connected into the kernel or to each other, some problem causes
the death of running processes like cat.  The failure that causes cat to go
away must also kill other running Linux or Real-Time Linux processes
essential to the proper operation of the motion modules within the EMC.

My last thought is that 96% reliability on startup is a lot better than I
have ever seen before with my runs of the EMC.  And I should remind the
reader that I consistently attempt a worst-case test.
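
The 96% figure comes from the four failures in the 100 BDI runs.  As a rough
check on how much weight 100 runs can carry, the sketch below works out the
success rate and a 95% Wilson score interval for it.  The interval is a
standard textbook formula brought in here for illustration, not something
taken from the tests themselves, and the function names are made up.

```python
import math


def success_rate(runs, failures):
    """Fraction of runs that started correctly."""
    return (runs - failures) / runs


def wilson_interval(runs, failures, z=1.96):
    """Approximate 95% Wilson score interval for the success rate."""
    p = success_rate(runs, failures)
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return center - half, center + half


# Figures from the BDI test: 100 runs, 4 startup failures
# (one EMC exit, two warm boots, one freeze).
rate = success_rate(100, 4)
low, high = wilson_interval(100, 4)
print(f"success rate {rate:.0%}, rough 95% interval {low:.1%} to {high:.1%}")
```

With only four failures the interval is fairly wide, so a longer series of
runs would be needed before small changes in the rate mean much.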

Does anyone have a suggestion that would improve my testing so that we get
closer to the reasons for startup crash?

Ray



