This document brings to light some DRAMA documentation issues that are not addressed elsewhere, or that are not easily found elsewhere.
For tasks on two different machines to communicate, the DRAMA networking tasks must be running on both machines. On UNIX, they are started using the "dits_netstart" command, which starts the "IMP_Master", "IMP_Transmitter" and "IMP_Receiver" tasks. "IMP_Master" is used for loading other tasks locally; the actual network communications are handled by "IMP_Transmitter" and "IMP_Receiver".
Internet TCP ports are used for connections between machines. The receiver program must use a well-defined port for DRAMA systems to communicate. By default, the port number is derived from the first two characters of the user name. This works well in development but should be reconsidered before deployment: it causes problems if two users on the same machine have user names whose first two characters are the same (compared case-insensitively).
The default port number can be determined (on UNIX-based systems) by running the command "$IMP_DEV/impuserport".
To change the port used, use an IMP Start up file.
If you fail to connect and get this error message, there are several possible causes.
One way of checking whether a connection to the other machine is possible is simply to telnet to the correct port, e.g.
telnet <remote-host> <port>
If you get a message like this, then the port at the other end is not open - e.g. the DRAMA network tasks are using a different port or are not running.
telnet: connect to address ...: Connection refused
telnet: Unable to connect to remote host
If you see something like
Trying ...
Connected to ....
Escape character is '^]'.
then it was possible to connect to the specified port number, and DRAMA should be able to connect if it is using the same host and port number. To exit, press Ctrl and "]" together, then type "quit" at the telnet> prompt.
If you only get the "Trying" line, a firewall is likely to be preventing access.
If, when you try to connect, the message seems to be sent OK but you get no reply, the problem is likely to be in the connection from the target machine back to your machine.
To check this, use "telnet" as above, but in the other direction (from the target machine to your machine).
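Where telnet is not installed, the same check can be scripted. The following is a minimal POSIX C sketch, not part of DRAMA, that makes one TCP connection attempt with a timeout and maps the result onto the telnet cases described above. The names probe_tcp_port and probe_result are hypothetical, and treating a timeout as "possibly a firewall" is the same heuristic as the "Trying" case, not a certainty.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* The outcomes telnet distinguishes (hypothetical names). */
typedef enum {
    PROBE_CONNECTED, /* "Connected to ...": something is listening on the port */
    PROBE_REFUSED,   /* "Connection refused": nothing is listening there        */
    PROBE_TIMEOUT,   /* only "Trying ...": no answer, possibly a firewall       */
    PROBE_ERROR      /* name lookup failure or other local error                */
} probe_result;

/* Make one TCP connection attempt to host:port, waiting at most timeout_ms. */
probe_result probe_tcp_port(const char *host, int port, int timeout_ms)
{
    char service[16];
    snprintf(service, sizeof service, "%d", port);

    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    if (getaddrinfo(host, service, &hints, &res) != 0 || res == NULL)
        return PROBE_ERROR;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) {
        freeaddrinfo(res);
        return PROBE_ERROR;
    }
    /* Non-blocking connect, so that we control how long to wait. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    probe_result result = PROBE_ERROR;
    if (connect(fd, res->ai_addr, res->ai_addrlen) == 0) {
        result = PROBE_CONNECTED;
    } else if (errno == ECONNREFUSED) {
        result = PROBE_REFUSED;
    } else if (errno == EINPROGRESS) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int n = poll(&pfd, 1, timeout_ms);
        if (n == 0) {
            result = PROBE_TIMEOUT;          /* no answer at all within timeout */
        } else if (n > 0) {
            int err = 0;
            socklen_t len = sizeof err;
            /* SO_ERROR holds the final outcome of the pending connect. */
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0) {
                if (err == 0)
                    result = PROBE_CONNECTED;
                else if (err == ECONNREFUSED)
                    result = PROBE_REFUSED;
                else if (err == ETIMEDOUT)
                    result = PROBE_TIMEOUT;
            }
        }
    }
    close(fd);
    freeaddrinfo(res);
    return result;
}
```

Call it as probe_tcp_port(host, port, 5000) with the same host and port you would give telnet, and run it on each machine in turn, pointing at the other, to check both directions.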
When debugging such issues, I find it easier to just run the "ticker" ("$DITS_DEV/ticker") program as the target task and "ditscmd" as the message sender.
The "ticker" task's "TICK" action can be used.
IMP Start up files are probably a necessity when communicating with machines which are, for whatever reason, likely to crash, be switched off or be rebooted; VxWorks machines in particular. In addition, if you intend to use a predefined port number for the network tasks, a start up file is needed on every machine. These files are named
IMP_Startup.node
where "node" is the node name of the machine which is running the start up file. On a UNIX machine, "node" should be whatever the command "uname -n" returns (it must be the same case). On a VxWorks machine, use the node name shown for the machine by the command "hostShow". From version 1.5 of DRAMA, if a file of this name does not exist, IMP will look for one named "IMP_Startup".
These files must reside either in the default directory (at the time that the IMP master program is started) or in the same directory as the IMP master program. For VxWorks and VMS, it should always be the default directory at the time IMP master is started. For the UNIX version, if you start IMP master by specifying a full directory path (or an alias to a full directory path), then you can store the files in the same directory as IMP master.
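The naming and fallback rules above can be expressed as a short lookup. This is a minimal POSIX C sketch written for illustration, not IMP's own code: find_imp_startup is a hypothetical helper that checks a single directory, so a caller would try the default directory and then, on UNIX, the directory holding IMP master. It uses uname(), whose nodename field is what "uname -n" prints.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Hypothetical helper (not IMP's code) applying the naming rules to one
   directory. Returns 2 if IMP_Startup.<node> exists, 1 if only the plain
   IMP_Startup fallback exists, 0 if neither does, -1 on error.
   On success the chosen path is left in `out`. */
int find_imp_startup(const char *dir, char *out, size_t outlen)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;

    /* First choice: IMP_Startup.<node>, with <node> exactly as printed by
       "uname -n" (same case). */
    int n = snprintf(out, outlen, "%s/IMP_Startup.%s", dir, u.nodename);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    if (access(out, R_OK) == 0)
        return 2;

    /* DRAMA 1.5 onwards: fall back to a file named plain IMP_Startup. */
    n = snprintf(out, outlen, "%s/IMP_Startup", dir);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return access(out, R_OK) == 0 ? 1 : 0;
}
```

Because the node name is used exactly as given, on a case-sensitive file system a file named IMP_Startup.MyHost will not be picked up on a machine whose "uname -n" output is myhost.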
Start up file entries of interest are:
Log startup sequence
Use port 50000
Pulse connections to node every 10 seconds
forward registrations to machine1 machine2
The first line tells the tasks to inform you when they read the start up file. This line is probably only necessary if you want to confirm that you are picking up the right file.
The second line tells IMP to always use port 50000 for network connections. This means that inter-machine communications are independent of the user names tasks are running under. Beware: it also means that different users can interfere with each other (although only once they start talking to other machines). Note that only one user will be able to run on a given machine using the specified port number.
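Because only one user at a time can use a fixed port on a given machine, it can help to check whether the port is already taken before starting the network tasks. The following is a minimal POSIX C sketch, assuming IPv4 and not part of DRAMA (tcp_port_is_free is a hypothetical name), which simply tries to bind the port and reports EADDRINUSE.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical pre-flight check, not part of DRAMA. Tries to bind an IPv4
   TCP socket to `port` on all local interfaces.
   Returns 1 if the bind succeeds (port free), 0 if it fails with EADDRINUSE
   (port already in use), -1 for any other failure (e.g. EACCES for ports
   below 1024 without privilege). */
int tcp_port_is_free(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int rc = bind(fd, (struct sockaddr *)&addr, sizeof addr);
    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);

    if (rc == 0)
        return 1;
    return saved_errno == EADDRINUSE ? 0 : -1;
}
```

A result of 0 for port 50000 would suggest that another user's network tasks (or some other program) already hold it. The check is only a snapshot: the port can still be taken between the check and the start of the network tasks.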
The third line tells IMP to pulse connections to the given node every ten seconds. If it gets no response in that time, it notifies all tasks communicating with the given node that the node has gone down. Every machine which may communicate with a VxWorks machine will need a line like this for each VxWorks machine. In addition, each VxWorks machine should have a line like this for each node that may communicate with it. Each such line will cause a copy of the ImpPulse program to be started when DRAMA starts up, so each one adds a new DRAMA task.
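The pulse behaviour can be modelled in a few lines. The following C sketch is a simplified, hypothetical model of what is described above (pulse_monitor and its functions are not ImpPulse's API): a node is reported down once it has been silent for longer than one pulse interval. Times are passed in as plain seconds so that the logic can be exercised without a network.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of pulse monitoring (not ImpPulse's code). */
typedef struct {
    double interval_s;    /* pulse period, e.g. 10 seconds          */
    double last_reply_s;  /* time at which the last reply arrived   */
} pulse_monitor;

void pulse_init(pulse_monitor *m, double interval_s, double now_s)
{
    m->interval_s = interval_s;
    m->last_reply_s = now_s;
}

/* Record a reply to a pulse from the monitored node. */
void pulse_reply(pulse_monitor *m, double now_s)
{
    m->last_reply_s = now_s;
}

/* True once the node has been silent for longer than one interval. */
bool pulse_node_down(const pulse_monitor *m, double now_s)
{
    return now_s - m->last_reply_s > m->interval_s;
}
```

With a 10 second interval, a reply at t = 5 s keeps the node up at t = 14 s, but if the remote machine is then too busy to answer until t = 30 s, it is reported down even though it never crashed. This is the false "machine has crashed" situation discussed below for hard CPU loops.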
The fourth line tells IMP to forward registration messages to the specified machines. Each Registrar task on those machines will receive messages when any task starts up on the local machine. Note that the VxWorks version currently requires the node names to be specified as names, not numeric IP addresses. This is a known bug. You can add IP names to VxWorks in your VxWorks start up script using hostAdd().
When using "pulse connections" to or from VxWorks machines, there is an issue which occurs if the DRAMA networking tasks (ImpPulse, Master, Transmitter and Receiver) are running at the same or lower priority than a program which has a hard CPU loop. Under VxWorks, the networking tasks won't get a chance to run whilst the hard CPU loop is running, and DRAMA concludes that the machine has crashed. Since all DRAMA tasks are, by default, run at the same priority, this problem tends to happen whenever any DRAMA task uses a lot of CPU (e.g. takes more than about 20 to 30 seconds to process a particular DRAMA message event).
To solve this problem you should manipulate priorities appropriately. For example, in a couple of my applications, CPU-intensive programs automatically lower their own priority whilst in such CPU loops and then restore it. Alternatively, run such programs at a lower priority than the DRAMA networking tasks.
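The first approach is sketched below for a POSIX system, using the process nice value via getpriority() and setpriority() rather than VxWorks task priorities. It is an illustration under that assumption, not the code used in the applications mentioned above; run_at_lower_priority, struct sum_job and sum_job_run are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper (POSIX nice values, not VxWorks task priorities and not
   part of DRAMA): run a CPU-bound job at a lower priority, then try to restore
   the original priority. Returns 0 if restored, -1 otherwise. */
int run_at_lower_priority(void (*work)(void *), void *arg, int delta)
{
    errno = 0;
    int old = getpriority(PRIO_PROCESS, 0);  /* -1 is a legal nice value, so check errno */
    if (old == -1 && errno != 0) {
        work(arg);                           /* cannot read the priority: just run the job */
        return -1;
    }

    /* A higher nice value means a lower priority; raising it normally needs
       no privilege. */
    setpriority(PRIO_PROCESS, 0, old + delta);
    work(arg);

    /* Lowering the nice value again can fail for an unprivileged process
       (EACCES/EPERM), e.g. on Linux without CAP_SYS_NICE or RLIMIT_NICE. */
    return setpriority(PRIO_PROCESS, 0, old);
}

/* Example CPU-bound job (hypothetical): sum the integers 1..n. */
struct sum_job {
    unsigned long n;
    unsigned long long total;
};

void sum_job_run(void *arg)
{
    struct sum_job *job = arg;
    job->total = 0;
    for (unsigned long i = 1; i <= job->n; i++)
        job->total += i;
}
```

On Linux an unprivileged process may not be allowed to lower its nice value again, in which case the restore fails, the function returns -1 and the program simply stays at the lower priority, which amounts to the second approach above.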
This problem can also occur on other machines; it has been seen at least once under Solaris. Sometimes the solution is to ensure the pulse interval is long enough to cover such events.
IMP Start up files should be used to set the port number. The best approach is normally for a given system to use the same port number throughout, set with the "Use port n" specification.
It should be noted that IMP also allows the following specification type:
use port
This appears to allow you to use different ports on different machines, but it requires that communications start in an appropriate order, and can be unreliable as a result. If you feel you need to use it, please talk to Tony.
Note that in some cases it may be desirable to know the default port number. In later versions of DRAMA (V1.5 onwards) this can be determined by running the program
$IMP_DEV/impuserport
Several complexities with the use of the file system by IMP have been noticed.
When the deletion occurs, the DRAMA programs using these files will continue to run (since open files continue to exist), but other newly started DRAMA programs will not be able to communicate with them (this includes "cleanup").
Possible workarounds are:
Any application which is leaking SDS IDs will slow down. An application should always release SDS IDs it has finished with. It normally takes considerable time for this problem to show in a significant way (say 5 to 8 hours, depending on the application, available CPU and memory resources, etc.).
This effect can be very noticeable with user interface applications written in Tcl, probably because it is easy to forget about releasing resources in Tcl. The effect seen is that responses to X events and DRAMA messages become ever slower.
The DRAMA control message SDSLEAKCHK can help monitor this. For example:
% ditscmd -c TICKER SDSLEAKCHK
TICKER:Received registration message from DITSCMD_1440@aaolxp.aao.gov.au
------ SDS Leak check ----------------------------
Highest number of SDS ID's allocated so far: 5
Current number of SDS ID's allocated : 4
Highest number before more memory allocated: 2000
DRAMA allocates at least 3 ID's on start up. 4 if parameter system is in use.
If the number of allocated ID's increases over time then you are likely to be leaking SDS ID's.
--------------------------------------------------
Send this command multiple times during the operation of the program. If the current number of SDS IDs allocated keeps going up, you are likely to be leaking SDS IDs. Normally what has happened is that the task is allocating an SDS ID without freeing it, and repeating this many times. These IDs are small, so you often don't see much of a memory leak, but over time they cause the program to slow down dramatically. Any of the SDS calls which allocate IDs may be the problem; you normally have to match each of these calls with a call to SdsFreeId().
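The pattern being described can be illustrated without the SDS library itself. In the following C sketch, id_alloc() and id_free() are hypothetical stand-ins (not SDS calls) that maintain the same two counters SDSLEAKCHK reports; a handler that allocates an ID without freeing it makes the current count grow by one per event, which is the trend to look for when sending SDSLEAKCHK repeatedly.

```c
#include <stdio.h>

/* Hypothetical stand-ins for an ID-allocating API such as SDS. These are NOT
   SDS calls; they only keep the two counters SDSLEAKCHK reports. */
static long ids_current = 0;   /* like "Current number of ... ID's allocated"       */
static long ids_highest = 0;   /* like "Highest number of ... ID's allocated so far" */

static int id_alloc(void)
{
    if (++ids_current > ids_highest)
        ids_highest = ids_current;
    return (int)ids_current;
}

static void id_free(int id)
{
    (void)id;
    --ids_current;
}

/* A message handler that forgets to free the ID it allocates:
   it leaks one ID per call. */
void handler_leaky(void)
{
    int id = id_alloc();
    (void)id;            /* ... use the ID, but never free it ... */
}

/* The fix: every allocation is matched by a free before returning. */
void handler_fixed(void)
{
    int id = id_alloc();
    /* ... use the ID ... */
    id_free(id);
}

/* Sample the counter before and after a batch of events, as you would by
   sending SDSLEAKCHK repeatedly; returns how much the current count grew. */
long leak_growth(void (*handler)(void), int events)
{
    long before = ids_current;
    for (int i = 0; i < events; i++)
        handler();
    long grown = ids_current - before;
    printf("after %d events: current %ld, highest %ld, growth %ld\n",
           events, ids_current, ids_highest, grown);
    return grown;
}
```

Running 1000 events through handler_fixed leaves the current count unchanged, while handler_leaky adds 1000 to it. In a real task the fix has the same shape: pair each allocating SDS call with SdsFreeId().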
DitsSetFixFlags(DITS_M_NO_SDS_CHECK); (C/C++ code)
or
DitsSetFixFlags NO_SDS_CHECK (Tcl)
within the event/action reschedule handler.
Setting debug levels is done by the DitsSetDebug(3) routine or one of various interfaces to this, e.g.
setenv DITS_DEBUG 32768 (before a program starts up)
ditscmd -c <task> DEBUG 32768 (command line to running task)
DitsSetDebug(DITS_M_LOG_SDSCHK) (C call within task)
DitsSetDebug SDSCHK (Tcl command)
#define SDS_CHECK_FREE
before your first include of any DRAMA-related include file. If you do this, then any call to SdsFreeId(3) is redefined to use SdsFreeIdCheck(), and a message will be printed to stderr indicating the problem and which line in your code the call came from. (Since this is misleading for C++ objects, where it is triggered by the constructors, a different message is output giving some object details.)
#define DRAMA_ALLOW_CPP_STDLIB
before including sds.h. You can then add the following line to your functions or methods:
SDS_CHECK_IDS("function")
where function is the function or method name. This creates an object which wraps up the SdsCheckInit()/SdsCheck() functions. When the destructor of this object is run (when control leaves the enclosing block), the check is run and you are told if any SDS ID's have leaked or been released since the constructor was run. Note that if you use the SdsId class for SDS access in C++ code, you are far less likely to get into trouble, since it does most of the work of tidying up SDS itself.
If you get a failure to connect to a remote task (DITS-F-NO_NET_ON_REMOTE) but are convinced that the remote networking tasks are running, it could indicate that the remote networking tasks are not using the port number you expect. Check the IMP Start up file usage on both machines in question.
For more information, contact tony.farrell@mq.edu.au