DRAMA Tidbits

This document brings to light some issues that are not addressed, or not easily found, elsewhere in the DRAMA documentation. Topics are:

  • IMP (DRAMA) Network connections.
  • IMP Start up files
  • IMP File usage
  • SDS leaks slowing down applications
  • Selecting buffer sizes
  • Failure to connect to remote tasks
  • ImpRegister fail - No space left on device

  • IMP (DRAMA) Network connections.

    For tasks on two different machines to communicate, the networking tasks must be running on both machines. On unix, they are started using the "dits_netstart" command. This command starts the "IMP_Master", "IMP_Transmitter" and "IMP_Receiver" tasks. "IMP_Master" is used for loading other tasks locally. The actual network communications are handled by "IMP_Transmitter" and "IMP_Receiver".

    IMP Port numbers

    Internet TCP ports are used for connections between machines. The receiver program must use a well defined port for DRAMA systems to communicate. The default port number used is based on the first two characters of the username. This works well in development but should be considered further before deployment. It can of course trigger problems if two users on the one machine have the same first two characters in their user names (case insensitive).
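
    The collision rule described above can be checked ahead of time. The following is a minimal sketch, not part of DRAMA: it does not compute the port number (that is what "impuserport" is for), it only reports which user names share their first two characters, ignoring case. The function names are hypothetical.

```python
def same_default_port_prefix(user_a, user_b):
    """True if two user names share their first two characters, ignoring case."""
    return user_a[:2].lower() == user_b[:2].lower()


def colliding_users(usernames):
    """Group user names by two-character prefix and keep groups of two or more."""
    groups = {}
    for name in usernames:
        groups.setdefault(name[:2].lower(), []).append(name)
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}
```

    Any group returned by colliding_users() is a set of users who would need an IMP Start up file with an explicit port to run on the same machine.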

    The default port number used can be determined (on unix based systems) by running the command "$IMP_DEV/impuserport".

    To change the port used, use an IMP Start up file.

    Connection problems - "%DITS-F-NO_NET_ON_REMOTE"

    If you fail to connect and get this error message, there are several possible causes:

  • The networking tasks are not running on the other machine.
  • The IMP Start up files are not compatible.
  • You are not using IMP Start up files, and your user name is different.
  • A firewall is preventing connections - consider the firewall on the target machine.
  • A router or a virtual private network configuration is preventing connection.

    One way of checking whether a connection to the other machine is possible is simply to telnet to the correct port. E.g.

     
    telnet <host> <port>
    

    If you get a message like this, then the port at the other end is not open - e.g. the DRAMA network tasks are using a different port or are not running.

    telnet: connect to address ...: Connection refused
    telnet: Unable to connect to remote host
    

    If you see something like

     
    Trying ...
    Connected to ....
    Escape character is '^]'.
    
    then it was possible to connect to the specified port number, and DRAMA should be able to connect if it is using the same host and port number. Press Ctrl and "]" together to escape to the telnet prompt, then enter "quit" to exit.

    If in the above, you only get the "Trying" line, then a firewall is likely preventing access.
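
    Where telnet is not installed, the same three outcomes can be distinguished with a short script. This is a sketch only, using the standard Python socket module rather than anything from DRAMA; the function name and the result strings are made up for illustration.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Try a TCP connection and classify the result.

    "open"        - something is listening (telnet would show "Connected to").
    "refused"     - nothing is listening on that port ("Connection refused").
    "no-response" - no answer within the timeout (telnet stuck at "Trying"),
                    which often points at a firewall.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "no-response"
    except OSError as exc:
        # Anything else, e.g. an unknown host name or no route to the host.
        return f"error: {exc}"
```

    Call it with the remote host name and the port the remote IMP tasks are expected to use, for example the value given in a "Use port" line of the Start up file.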

    Connection problems - silent lack of connection/timeout

    If the message seems to be sent OK when you try to connect, but you get no reply, then the problem is likely to be in the connection from the target machine back to your machine.

    To resolve this, use "telnet" as above but in the other direction, i.e. from the target machine back to your machine.

    Quick testing of network problems

    When debugging such issues, I find it easier to just run the "ticker" ("$DITS_DEV/ticker") program as the target task and "ditscmd" as the message sender.

    The "ticker" task's "TICK" action can be used.

    IMP Start up files.

    IMP Start up files are probably a necessity when communicating with machines which are likely, for whatever reason, to crash, be switched off or rebooted, in particular VxWorks machines. In addition, if you intend to use a predefined port number for the network tasks, a start up file is needed for every machine. These files are named

        
            IMP_Startup.node
    

    where "node" is the node name of the machine which is running the start up file. On a UNIX machine, "node" should be whatever the command "uname -n" returns (it must be the same case). On a VxWorks machine, use the node name shown for the machine by the command "hostShow". From version 1.5 of DRAMA, if a file of this name does not exist, IMP will look for one named "IMP_Startup".

    These files must reside in either the default directory (at the time that the IMP master program is started) or in the same directory as the IMP master program. For VxWorks and VMS, it should always be the default directory at the time IMP master is started. For the UNIX version, if you start master by specifying a full directory path (or an alias to a full directory path), then you can store the files in the same directory as IMP master.
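
    To confirm which file would be picked up on a UNIX machine, the naming rules above can be sketched as follows. This is an illustration only, not IMP's own code. It assumes the node-specific name is tried before the plain "IMP_Startup" name within each directory, and that directories are tried in the order given; the text above does not state the order between directories. The function name is hypothetical.

```python
import os


def find_startup_file(directories, node=None):
    """Return the path of the first IMP start up file found, or None."""
    if node is None:
        # Same value as "uname -n", including case (UNIX only).
        node = os.uname().nodename
    for directory in directories:
        for name in (f"IMP_Startup.{node}", "IMP_Startup"):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                return path
    return None
```

    Pass the default directory first and, if master is started by full path, the directory holding the IMP master program second.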

    Start up file entries of interest are

            Log startup sequence
            Use port 50000
            Pulse connections to node every 10 seconds
            forward registrations to machine1 machine2 
    

    The first line tells the tasks to inform you when they read the start up file. This line is probably only necessary if you want to confirm you are picking up the right file.

    The second line tells IMP to always use port 50000 for network connections. This means that inter machine communications are independent of the user names tasks are running under. Beware, it also means that different users can interfere with each other (although only once they start talking to other machines). Note that only one user will be able to run on a given machine using the specified port number.

    The third line tells IMP to pulse connections to the given node every ten seconds. If it gets no response in that time, it notifies all tasks communicating with the given node that it has gone down. Every machine which may communicate with a VxWorks machine will need a line like this for each VxWorks machine. In addition, each VxWorks machine should have a line like this for each node that may communicate with it. Each such line will cause a copy of the ImpPulse program to be started when DRAMA starts up - so each one adds a new DRAMA task.

    The fourth line tells IMP to forward registration messages to the specified machines. Each Registrar task on those machines will receive messages when any task starts up on the local machine.

    Note that the VxWorks version currently requires the node names be specified as Names, not numbers. This is a known bug. You can add IP names to VxWorks in your VxWorks start up script using hostAdd().
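
    When the same port and node lists have to be kept consistent across many machines, it can help to generate the start up file text rather than edit each copy by hand. The sketch below only reproduces the four entry types shown above; it assumes that the node name and the interval are the only variable parts of the "Pulse connections" line. The function name and parameters are hypothetical.

```python
def build_startup_lines(port, pulse_nodes=(), forward_nodes=(),
                        pulse_seconds=10, log=True):
    """Build the lines of an IMP start up file from the entries described above."""
    lines = []
    if log:
        lines.append("Log startup sequence")
    lines.append(f"Use port {port}")
    # One pulse line per node; each one starts an extra ImpPulse task.
    for node in pulse_nodes:
        lines.append(f"Pulse connections to {node} every {pulse_seconds} seconds")
    if forward_nodes:
        lines.append("forward registrations to " + " ".join(forward_nodes))
    return lines
```

    Write the result, one entry per line, to IMP_Startup.node for each machine. For VxWorks targets, remember that node names must be given as names, not numbers.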

    WARNING

    When using "pulse connections" to/from VxWorks machines, an issue occurs if the DRAMA networking tasks (ImpPulse, Master, Transmitter and Receiver) are running at the same or lower priority than a program which has a hard CPU loop. Under VxWorks, the networking tasks won't get a chance to run while the hard CPU loop is running, and DRAMA thinks the machine has crashed. Since all DRAMA tasks are, by default, run at the same priority, this problem tends to happen when any DRAMA task uses a lot of CPU (e.g. takes more than about 20 to 30 seconds to process a particular DRAMA message event).

    To solve this problem you should manipulate priorities appropriately. For example - in a couple of my applications, CPU intensive programs automatically lower their own priority whilst in such CPU loops and then restore it. Alternatively, run them at a lower priority than the DRAMA tasks.

    It is not impossible for this problem to also occur on other machines - it has been seen at least once under Solaris. Sometimes the solution is to ensure the pulse rate is sufficient to cover such events.

    IMP Port numbers

    IMP Start up files should be used to set the port number. The best approach is normally for a given system to use the same port number throughout, set with the "Use port n" specification.

    It should be noted that IMP also allows the following specification type

    use port  for 
    
    This appears to allow you to use different ports on different machines. But it requires that communications start in an appropriate order and can be unreliable as a result. If you feel the need to use it, please talk to Tony.

    Note in some cases it may be desirable to know the default port number. In later versions of DRAMA (V1.5 onwards) this can be determined by running the program

     $IMP_DEV/impuserport 
    


    IMP file usage.

    Several complexities with the use of the file system by IMP have been noticed.


    SDS leaks slow down applications.

    Any application which is leaking SDS IDs will slow down. An application should always release SDS IDs which it is finished with. It normally takes considerable time for this problem to show in a significant way (say 5 to 8 hours, depending on the application, available CPU and memory resources etc.)

    This effect can be very noticeable with user interface applications written in Tcl, probably because it is easy to forget about releasing resources in Tcl. The effect seen is that responses to X events and DRAMA messages become ever slower.

    The DRAMA control message SDSLEAKCHK can help monitor this. For example
    % ditscmd -c TICKER SDSLEAKCHK
    TICKER:Received registration message from DITSCMD_1440@aaolxp.aao.gov.au
    ------ SDS Leak check ----------------------------
    Highest number of SDS ID's allocated so far: 5
    Current number of SDS ID's allocated       : 4
    Highest number before more memory allocated: 2000
      DRAMA allocates at least 3 ID's on start up.
        4 if parameter system is in use.
    If the number of allocated ID's increases over time
      then you are likely to be leaking SDS ID's.
    --------------------------------------------------
    
    Send this command multiple times during the operation of the program. If the current number of SDS ID's allocated is going up all the time then you are likely to be leaking SDS IDs.
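
    If you save the report text from each run, the comparison can be automated. The following sketch is not a DRAMA tool: it pulls the "Current number" figure out of a report in the format shown above and applies one simple rule of thumb for "going up all the time" (the count never drops and ends higher than it started). Both function names are hypothetical, and how you capture the report text is left to you.

```python
import re

_CURRENT = re.compile(r"Current number of SDS ID's allocated\s*:\s*(\d+)")


def current_sds_ids(report):
    """Extract the current SDS ID count from one SDSLEAKCHK report."""
    match = _CURRENT.search(report)
    if match is None:
        raise ValueError("no 'Current number of SDS ID's allocated' line found")
    return int(match.group(1))


def looks_like_leak(counts):
    """True if the counts never decrease and end higher than they started."""
    counts = list(counts)
    if len(counts) < 2:
        return False
    never_decreases = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_decreases and counts[-1] > counts[0]
```

    A task which legitimately allocates IDs during start up will also show an early rise, so take the first sample only after the task has settled.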

    Normally what has happened is that the task is allocating an SDS ID without freeing it, and repeating this a lot. These ID's are small, so you often don't see much of a memory leak, but over time they cause the program to slow down dramatically. Any of the SDS calls which allocate ID's may be the problem. You normally have to match each of these calls with calls to SdsFreeId().

    Chasing SDS ID Leaks in DRAMA Action or UFACE event handler Implementations

    DRAMA now (from DITS version 3.53) provides internal features to help chase SDS leaks. If you set the new DITS debugging flag DITS_M_LOG_SDSCHK (value 32768) then DRAMA will check for SDS leaks around your obey handlers (action routines) and UFACE handlers.

    When this is enabled and if your action handler changes the number of outstanding SDS id's, then an error report will be made to stderr (and the log file, if any). (In TCL, the error is reported using TCL error reports)

    Note - some handlers will be allocating new IDs or releasing old ones intentionally. You can turn off the warning output for these by calling DitsSetFixFlags(3) (C/C++) or DitsSetFixFlags(n) (Tcl). For example:
    DitsSetFixFlags(DITS_M_NO_SDS_CHECK);   (C/C++ code)
    
    or
    DitsSetFixFlags NO_SDS_CHECK		(Tcl)
    
    Make this call within the event/action reschedule handler.

    Setting debug levels is done by the DitsSetDebug(3) routine or one of various interfaces to this, e.g.
    setenv DITS_DEBUG  32768         (before a program starts up)
    
    ditscmd -c <task> DEBUG 32768    (command line to running task)
     
    DitsSetDebug(DITS_M_LOG_SDSCHK)  (C call within task)
    
    DitsSetDebug SDSCHK	         (Tcl command)
    

    Chasing SDS ID Leaks in C code

    There is a C function - SdsSetWatch() - which allows you to watch events on an SDS ID, in particular SDS Free ID calls, to help you see which ID's are being freed.

    You can now bracket a set of SDS calls with SdsCheckInit(3) and SdsCheck(3). These are available from SDS version 3.40. When you call the routine SdsCheck() it will return bad status if there has been any change in the number of outstanding SDS ID's. (Note - both increases and decreases. Decreases can indicate that you released an SDS ID you didn't mean to).

    There is another new routine to consider - SdsFreeIdCheck(3). This function checks that the Sds ID you are freeing does not refer to data that should have been tidied up using SdsDelete(3) or SdsReadFree(3). This is experimental at the moment - it is hoped it will eventually replace SdsFreeId(3). The easiest way to use this at the moment is to insert the line
    #define SDS_CHECK_FREE
    
    before your first include of any DRAMA related include file. If you do this then any call to SdsFreeId(3) is redefined to use SdsFreeIdCheck(), and a message will be printed to stderr indicating the problem and which line in your code the call came from. For C++ objects this would be misleading, since the call is triggered by the constructors, so a different message is output indicating some object details.

    Chasing SDS ID Leaks in C++ code

    Of course any of the features available in C code are available in C++ code, but there is an extra feature which makes things easier. This only works if you have done:
    #define DRAMA_ALLOW_CPP_STDLIB 
    
    before including sds.h. You can then add the following line to your functions or methods:
    SDS_CHECK_IDS("function")
    
    Where function is the function or method name. This creates an object which wraps up the SdsCheckInit()/SdsCheck() functions. When the destructor of this object is run (when control leaves the enclosing block) then the check is run and you are told if any SDS ID's have leaked or been released since the constructor was run. It should be noted that if you use the SdsId class for SDS access in C++ code, then you are far less likely to get into trouble - it does most of the work of tidying up SDS itself.

    Chasing SDS ID Leaks in Tcl code

    First note that the DRAMA Event level SDS Leak checking enabled by the logging flag DITS_M_LOG_SDSCHK does work in TCL code. But in addition, you can use SdsEvalAndCheck(n). This DRAMA TCL command works like the TCL "eval" command, in that it evaluates its arguments as a TCL command. But additionally, it will complain if you have leaked or released SDS ID's during the command. This is an easy way to check a given TCL command. Additionally, there is a TCL version of SdsFreeIdCheck.

    Chasing SDS ID Leaks in JAVA code

    First note that the DRAMA Event level SDS Leak checking enabled by the logging flag DITS_M_LOG_SDSCHK does work in JAVA code. Additionally, in DJAVA's SdsID class the methods SetDebugging(), EnableFreeIDWatch() and ClearFreeIDWatch() are useful when you need to debug SDS ID leaks.

    Older DRAMA Versions

    For old DRAMA versions (SDS prior to 3.40, DITS prior to 3.53) some of the above features are not available. It is still relatively easy to check if an application is leaking SDS IDs. At any point when an application is given a new SDS ID, it can print it out as a long integer. If the values for new SDS IDs continue to go up, it indicates a leak.

    The function SdsGetIdInfo() (available from DRAMA V1.5) can also be used to help here.

    SdsWatch

    It may also be worth considering the use of the SdsSetWatch(3) routine to monitor operations on existing SDS ids.

    SdsListInUse

    The SdsListInUse(3) routine will list to stderr the set of SDS IDs which are currently in use. This is a very simple debugging routine, but can be useful.

    Connection failures

    If you get a failure to connect to a remote task (DITS-F-NO_NET_ON_REMOTE) but are convinced that the remote networking tasks are running, it could indicate that the remote networking tasks are not using the port number you expect. Check the IMP Start up file usage on both machines in question.


    ImpRegister failure - No space left on device

    This occurs under Unix, generally with an error like the following:

    ##TICKER: Failed to create message semaphore # 16950 # TICKER: semget returned errno value 28, No space left on device # TICKER:ImpRegister failed

    This indicates you have run out of System V semaphores. The confusing "No space left on device" is a straight translation of the error number value DRAMA gets after the failure. This error message will be changed in the future.

    You can run out of semaphores for one of two reasons. First, you may be trying to run more tasks than the system allows. The number of semaphores is a fixed system-wide limit. See the installation page for details on how to monitor and increase this limit.

    The second reason is that some DRAMA tasks may have died without cleaning up correctly after themselves. This may happen if they are killed using "kill -9" or if they die in a debugger or in some cases after a core dump (although I believe this last case should be ok). In this case, you can use the cleanup command to remove the dead tasks. As cleanup cannot always tell the difference between dead and living tasks, its default mode removes all tasks being run by the user who invokes it on that machine. There are other options available. Use cleanup -help to list these options.
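
    Before running cleanup it can be useful to see who is holding the semaphores. The sketch below is not part of DRAMA. It shows where the confusing message comes from (errno value 28 is ENOSPC) and counts semaphore sets per owner from the text printed by the system command "ipcs -s". The column layout assumed here (key, semid, owner, perms, nsems, with keys starting "0x") is the usual Linux one and may differ on other systems; the function names are hypothetical.

```python
import collections
import errno
import os


def explain_errno(value):
    """Return the symbolic name and system message for an errno value."""
    return errno.errorcode.get(value, "unknown"), os.strerror(value)


def semaphore_sets_by_owner(ipcs_output):
    """Count semaphore sets per owner in the output of "ipcs -s"."""
    counts = collections.Counter()
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x00000000;
        # the third column is assumed to be the owning user.
        if len(fields) >= 5 and fields[0].startswith("0x"):
            counts[fields[2]] += 1
    return dict(counts)
```

    Feed it the standard output of "ipcs -s", for example captured with subprocess.run(). A large count for your own user name while no DRAMA tasks are running suggests leftovers from dead tasks.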


     For more information, contact tony.farrell@mq.edu.au