X and signals

Dan Heller writes:

* The problem is that Xlib has no layer for signal handling.  The
* reason is dubious, but it's in many people's opinion that it's
* due to DEC's strong involvement with X and since DEC really likes
* VMS (brought to the US on one of Columbus's ships circa 1492) and
* VMS does not support signals....

As the person who did Xlib, as well as the "strong Digital involvement",
I care to set the record straight.  I have been using Unix off and on
since 1976; while I have VMS experience, it entirely predates Digital,
and was for 3 of those 14 years; my Unix usage is more like 9 of those
years, and the entire time I've been at Digital.  Don't impugn someone's
motives without good cause. 

And speaking of Columbus, Unix is older than VMS. :-)  VMS also deals
with asynchronous stuff much better than Unix.  The reason why Xlib
ducks is that Unix hasn't been up to it.  (Don't take me for a VMS lover
though; I'm emphatically not fond of it personally.)

1) Xlib cannot AFFORD to mess with signals in the default case (i.e.
protect the X buffer and/or event queue by masking signals).  The system
call overhead is much too high.  Other approaches are possible
given locking of Xlib data structures.

2) Non-BSD based systems have not had "safe signals" (This seems finally
to be changing).  To presume a given signal model would have reduced
Xlib's portability greatly.  On most System V based systems, it would
not have been possible at all.

3) anything Xlib does would almost certainly get in the way of what a
toolkit would do in this area.  Ergo, Xlib stays out of the way, and
doesn't do anything to you.

4) the only issue with Xlib and signals is that you had better not call
back into Xlib ON THE SAME CONNECTION to a server from a signal handler.
On systems that allow system call restart, the implementation should be
doing the right thing.

If you've ever looked at Xlib sources, you would have seen that in fact
there has been some provision for locking of Xlib data structures.  Some
people's Xlib implementations on some operating systems may in fact work
correctly (i.e. allow calling into Xlib from a signal handler).  At the
time I was working on the X11 Xlib, I had a 5 processor SMP workstation
in my office (DECSRC Firefly running Topaz,
a research system capable of running Unix binaries, but with threads and
very fast RPC), so don't accuse me of ignorance.  There was also some
care put into the interface design for multi-processing; contrast this
to Unix's use of errno.

Signals (or VMS AST's) are not the right way to deal with asynchrony, in
any case; the right solution is threads.  Maybe someday Unix will be a
decent operating system.  Closest things to decent right now are
DECSRC's Topaz and CMU's Mach.  In the meanwhile, Xlib will be silent on
the issue, rather than doing what could be at best a half-assed job
which would just get us into trouble when POSIX threads finally become
a reality.  Even then, we've got mucho trouble just with the Unix
system call interface, for example.
				- Jim Gettys

As has often been pointed out before (comp.lang.c, comp.std.c,
comp.std.unix) about the only thing one can do safely in a signal
handler is set a flag and return, or clean up and exit.  (Even the
latter may not be quite safe, but heck, you're exiting...) 

Calling any procedure or touching global data is perilous -- parts of
the C library are not reentrant (malloc for sure) and calling anything
that might call these parts might put your application into a very
confused state.  I don't know how much of Xlib or Xt is re-entrant, I
suspect there are non-re-entrant parts.  (Perhaps a scenario where
Xlib gets interrupted when packing data into a buffer, the signal
handler tries to pack more data into that buffer resulting in an
interleaved mess?)

---------------------------------------------------------------------

The problem with signals and X is rather simple.  X is asynchronous
and relies on a communication between the client (application) and
the server (the X server).  This "protocol" is defined and implemented
using Xlib.  If you are putting together a packet to send over the wire
to the server and a signal "interrupts" your client side application,
then you have interrupted the protocol, which *may* result in a protocol
error with the server.  You may even get a core dump.

This is not guaranteed to happen, but the case becomes much more
likely if your signal handler attempts to make any Xlib calls (which
includes many, but not all, Xt calls).  Perhaps the most common
case of this is the SIGALRM signal, which is delivered as a result
of such things as "setitimer".

The "fundamental problem" still exists: if you interrupt the
protocol between the client and server, then you must be sure
to return to the place in the application that was interrupted
before making further attempts to communicate with the X server.
Therefore, your signal handler should not display dialog boxes,
make Xlib calls, or do much of anything until it is guaranteed
that you are not going to interfere with a protocol exchange.

So, what do you do?  Well, your signal handler could be written
to set a flag to note the delivery of the signal and then return
so that the code being executed may continue with whatever it's
doing till you know it's safe.  *That* is the problem -- you 
never know when it's safe.  And when it *is* safe, you are not
in any application-specific code.  There may not be any more
events in the queue so that your next event handling function
won't be called for quite some time.

My proposed solution is to implement a routine:
    XtSigRet
    XtAddSignalHandler(signal, handler)
	int signal;
	XtSigRet (*handler)();

XtSigRet is either void or int depending on what your UNIX OS
uses for signal().  Once you register your handler for the
specified signal, *Xt* catches that signal when it is delivered
and notes the signal context, resources, etc..
Then, when all is well with the events and protocol, etc,
your handler is called.  Your handler is written _exactly the same_
as it would be had you called "signal(sig, handler)" because
the same parameters are passed back into the signal handler.
(XtAddSignalHandler returns the previously set handler just
like signal() does.)

Now, you could implement this yourself via "work procs"
(see XtAddWorkProc()), but the granularity isn't fine enough.
That is, work procs are only called when there are no events
pending.  Implemented properly, the internals to Xt would
check in between event delivery to event handlers.
**** BUT ****

Are you suggesting that you do an XtAddWorkProc from the signal
handler?  You really can't even do that safely because the client code
being interrupted by the signal might also be in the middle of a call
to XtAddWorkProc and the workproc queue (or whatever database is used
to keep track of them) might be in some inconsistent state.  Just about
the ONLY thing you can do safely from a signal handler is set a flag.

One solution to this problem, as you mention in your message, is to set
a flag and use a modified XtMainLoop that checks the flag upon return
from XtNextEvent before calling XtDispatchEvent.  I've implemented
something like this and the one problem I have is that the arrival of
the signal will not kick the client out of XtNextEvent so your signal
handler will not get executed until some other event comes along.
Usually, this is no problem, but, if the signal you're catching is
SIGQUIT, there probably will not be any more events coming from the
user.

The real root of this problem, at least on Unix systems, is that the
select in the Xlib tries to recover from being interrupted by a signal
by checking errno for EINTR (which is the value returned when a system
call is interrupted by a signal).  If it finds EINTR, it just dives
back into the select call.  On the surface, it looks as though you
could hack an Xlib to respond to signals by changing this behaviour so
that XtNextEvent would return with a null event (or something) if it
was interrupted and your special XtMainLoop could examine your signal
flag.

I envision a signal handler like:

	SignalHandler(sig)
	    int sig;
	{
	    sigFlag = TRUE;
	}

and an XtMainLoop like:

	MyMainLoop()
	{
	    XEvent event;

	    for (;;)
	    {
		XtNextEvent( &event );
		if (event.type != NULL_EVENT)	/* or something like this. */
		    XtDispatchEvent( &event );
		if (sigFlag)
		{
		    process signal.
		    Inhibit signals.
		    sigFlag = FALSE;
		    Enable signals.
		}
	    }
	}
             
            ******* event loop + processing *******

* Before starting any event loop, call pipe(2) to get the two ends of a
* pipe as file descriptors.  [...] [W]e're going to use it to talk to
* ourself!

Be very careful here.  Is it guaranteed a pipe is buffered?  I don't
think the specified semantics forbid an implementation where a write
blocks until all data have been read.

* The solve() procedure can fork a separate process (and possibly nice
* it) if you want; you could then have a STOP button that killed the
* child to stop solving.

Well, dubious at least.  You will probably get away with it just fine,
but wait for the day when you're writing that one byte too many, or
(potentially) the system is short of mbufs, and you deadlock with
yourself....
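
One way to take the deadlock out of the talk-to-yourself pipe is to
make the pipe non-blocking, so that a full pipe makes the write fail
instead of hanging.  A sketch, assuming POSIX fcntl() and O_NONBLOCK;
the function names are invented for this example, and the token is
treated purely as a wake-up (the request itself would have to be
recorded elsewhere, since a write to a full pipe is dropped).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create the pipe and make both ends non-blocking.  fds[0] is the
 * read end, fds[1] the write end.  Returns 0 on success, -1 on error. */
int open_self_pipe(int fds[2])
{
    int i, flags;

    assert(fds != NULL);
    if (pipe(fds) == -1)
        return -1;
    for (i = 0; i < 2; i++) {
        flags = fcntl(fds[i], F_GETFL);
        if (flags == -1 ||
            fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}

/* Post one wake-up token.  Returns 1 if it was written, 0 if the
 * pipe was full (plenty of wake-ups are already queued), -1 on any
 * other error.  It never blocks. */
int post_token(int write_fd)
{
    char token = 'S';
    ssize_t n;

    do {
        n = write(write_fd, &token, 1);
    } while (n == -1 && errno == EINTR);

    if (n == 1)
        return 1;
    if (n == -1 && errno == EAGAIN)
        return 0;
    return -1;
}

/* Read and discard every queued token.  Returns how many were read. */
int drain_tokens(int read_fd)
{
    char buf[256];
    int total = 0;
    ssize_t n;

    for (;;) {
        n = read(read_fd, buf, sizeof buf);
        if (n > 0) {
            total += (int)n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        break;                  /* empty (EAGAIN), EOF, or error */
    }
    return total;
}
```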
---------------------------------------------------------------------

* I've been thinking a little about this problem [races in signal
* delivery] and had the following idea:

*	  Boolean sigSet[MAX_SIGNALS];
*	  static struct timeval pollTimeout = { 0, 0 };
*	  static struct timeval *timeoutPtr;
*	  static fd_set *inputSetPtr;
*	
*	  void SetSignalFlag (sig)
*	  {
*	    sigSet[sig] = TRUE;
*	    timeoutPtr = &pollTimeout;
*	  }
*	
*	  SuperLoop()
*	  {
*	    while (whatever)
*	    {
*	      timeoutPtr = NextTimeoutTime();
*	(1)   UnblockSignals();
*	      nready = select (NOFILE, inputSetPtr, 0, 0, timeoutPtr);
*	(2)   BlockSignals();
*	      ExecuteRunnableThreads();
*	    }
*	  }

* So the question is, will the select always return with EINTR or after
* a polling call?  Of course it doesn't deliver true async execution
* because it assumes that all you do is set the flag in the interrupt
* handler, but does it eliminate the race condition (does it *always*
* prevent the select call from hanging indefinitely given no scheduled
* timeouts)?  Do I *always* get what I want no matter what point the
* signal or signals hit between (1) and (2) ?

Alas, no.  If a signal arrives after the last argument to select has
been pushed on the stack (I'm assuming a stack-based argument passing
scheme, which is common enough to be reasonable) but before select has
been called, you'll lose.  (Or, if possible, after select has been
called but before it's entered the kernel.)

Suggested fix:

  Boolean sigSet[MAX_SIGNALS];
  static struct timeval pollTimeout = { 0, 0 };
  static struct timeval timeoutValue;
  static fd_set *inputSetPtr;

  void SetSignalFlag (sig)
  {
    sigSet[sig] = TRUE;
    timeoutValue = pollTimeout;
  }

  SuperLoop()
  {
    while (whatever)
    {
      timeoutValue = NextTimeoutTime();
      UnblockSignals();
      nready = select (NOFILE, inputSetPtr, 0, 0, &timeoutValue);
      BlockSignals();
      ExecuteRunnableThreads();
    }
  }

That is, instead of changing the pointer, change the value pointed to.
(I have assumed your C supports structure assignment and I've also
changed the return value of NextTimeoutTime from struct timeval * to
plain struct timeval.)  This works because the pointer is not followed
until select() is inside the kernel and signals have therefore been
(implicitly) blocked.

This doesn't work if you have a (hypothetical, as far as I know) select
that returns the remaining time in the timeout time structure, and you
actually use that information, because a signal can arrive after the
select returns for some other reason and bash timeoutValue.  But since
as far as I know nobody's select actually does that yet, and therefore
no code depends on it, you should be OK.

Select really should take a sixth argument indicating a signal mask and
also set the signal mask to that value atomically with the waiting for
some file descriptor to come ready or the timeout to expire, and reset
it to its previous value upon return, something like sigpause() and the
current select() rolled into one.

-----------------------------------------------------------------------
Berkeley sockets and AT&T Streams
both have a facility to deliver a signal on receipt of data on a network
connection.  Under SYSV it's SIGPOLL, under BSD it's SIGIO.

I don't use XtMainLoop at all.  The init routine does this:
	
	for (fd=3; fd<100; fd++)
		if fd is a network socket
			make fd send the input-available signal

	This is, admittedly, grody, but Xt won't find the fd's for me.

The signal handler for this sig sets a flag and exits.

Periodically you check that flag and run this routine:
	
	XtInputMask mask;

	while ((mask = XtAppPending(applicationcontext)) != 0)
		XtAppProcessEvent(applicationcontext, mask);

Under SYSV, this pair stops sigs and then handles them:
	sighold(SIGPOLL);	/* single-thread this routine */
	sigrelse(SIGPOLL);	/* take more signals now */

Using polling when interrupts are available is extremely goony.
If you follow the other suggestions in this thread, your 7-hour calculation
will become 8 or 9 or 10 hours, because of all the polling involved.

The only problem with this scheme under SYSV is that signals always make
system calls return with EINTR, so I had to guard all file I/O with 
sighold/sigrelse pairs and play games with the command line I/O handler.
Also, there are sighold/sigrelse pairs around all X driver routines.

I have a client written solely in Xlib routines.  The user can instruct the
program either to display another drawing or to exit, by means of key presses
or button presses.  The user can also resize the window with a window manager,
in which case the bit gravity of the window (set to ForgetGravity) clears the
window, and the program redraws the drawing in the new size.

Unfortunately, it takes a fair bit of time for the client to compute and for
the server to draw this window.  During this time, users type keys, press
buttons, or resize the window, and they don't get what they expect.  A key
press which tells the program to go to the next picture is not seen by the
client until it finishes drawing the current picture.  A resize causes the
window to be cleared, but the client keeps on drawing the rest of its picture
in a size appropriate to the old size window, into the newly cleared and
resized window.  It then gets back to the main event loop, sees expose events,
and redraws a correctly sized picture on top of this partial picture.

Part of the solution is fairly easy.  I can just clear the window on the first
Expose event following a ConfigureNotify, and the user will never end up with
two pictures on top of one another.

However, the user will still have to wait for the code to finish drawing a
picture he doesn't want to see, before the code recognizes that he has typed
a key.  In the non-event driven paradigm for C code under Unix, problems like
this are handled by having the tty driver convert a few special keystrokes into
signals.  No analogous mechanism appears to be available in X11.

I am willing to have the image-drawing code occasionally examine the event
queue, and return to the main loop if it finds an appropriate event, but I
can't find an Xlib call which does what I need.  I want to peek into the queue
to decide whether I should stop drawing and return, but if there is no event
which I interpret as meaning stop drawing, I want to continue drawing.
Unfortunately, XPeekEvent() and XPeekIfEvent() cause the code to block until
an event is available.  This would prevent the code from ever finishing the
drawing.  All the XCheck...Event() functions pull the event off the queue.
This means that the code would not know where in the queue the event was
found.  When somebody presses "n", I want to be able to go immediately to
the next drawing.  This means that I must flush all expose events which
occurred before the key was pressed, but process expose events which occurred
after the key was pressed.  Once the event has been pulled off of the queue,
I believe I've lost any indication of the event's position in the queue.  If so,
none of the XCheck...Event() functions would be useful.

The best idea I've come up with is to call XCheckIfEvent(), write my predicate
function so that it always returns False and store information in a global
variable about whether the drawing routine should continue drawing.  That would
allow the client to prevent the event from being pulled from the queue, and
still not block.  Unfortunately, it would require the entire event queue to
be searched every time.  This seems like it could slow down the drawing
routine substantially.

-- 

* For instance, what if I
* had a model solver that was going to take 7 hours to run a solve on a
* model. The user clicks a SOLVE widget and it's out of the event loop
* for 7 hours. During that time all kinds of events could be received

Don't bother with signals.  Use this totally different approach, which
sounds bizarre at first but works wonderfully, allowing events to be
processed normally while your long processing goes on.  It makes use
of the XView Notifier - for you Motif fans, remember you can use the
Notifier separately from the rest of XView, avoiding OpenLook.

Caveat: I've only actually implemented the following scheme under
SunView, but I believe it will work with XView.

Before starting any event loop, call pipe(2) to get the two ends of a
pipe as file descriptors.  Usually you do this before forking to
establish a means of talking to a forked child, but we're going to use
it to talk to ourself!

Interpose an event handler using notify_interpose_event_func(), then
start an asynchronous event loop with notify_do_dispatch(), then enter
your main loop, which is now a loop blocking on a select(2) with one
end of the pipe in its readmask.

When an event occurs, your handler looks at it.  If it's not the SOLVE
click you've been waiting for, you call notify_next_event_func(),
making happen whatever would normally happen.  But if it is that long-
awaited SOLVE click, just write a token byte to the other end of the
pipe and return.

Data in the pipe causes your main-loop select(2) to return; your main
loop wakes up, reads the token, discovers it's supposed to SOLVE, and
calls solve(), which goes off and solves for your 7 hours.  In the
meantime, the asynchronous event loop continues normal processing of
other X events.

You've satisfied the dictum that event handlers shouldn't do much,
events continue to get normal processing while solving, and you can
initiate solving with a click.  The solve() procedure can fork a
separate process (and possibly nice it) if you want; you could then
have a STOP button that killed the child to stop solving.
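
The fork-and-kill variant might look roughly like this.  It is a
sketch assuming POSIX fork(), nice(), kill() and waitpid();
start_solver(), stop_solver() and long_solve() are hypothetical
names.  The child inherits the X connection's file descriptor, so
the solver must not make any X calls of its own.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the real seven-hour computation. */
void long_solve(void)
{
    for (;;)
        pause();
}

/* Run solve() in a child process at lowered priority.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t start_solver(void (*solve)(void))
{
    pid_t pid;

    assert(solve != NULL);
    pid = fork();
    if (pid == 0) {
        /* Child: be nice, do the work, then leave with _exit() so
         * the parent's stdio buffers and atexit handlers are not
         * run a second time. */
        if (nice(10) == -1) {
            /* could not lower priority; carry on anyway */
        }
        solve();
        _exit(0);
    }
    return pid;
}

/* What a STOP button would call: kill the solver and reap it.
 * Returns 1 if the child died from the signal, 0 if it had already
 * finished by itself, -1 on error. */
int stop_solver(pid_t pid)
{
    int status;

    if (kill(pid, SIGTERM) == -1)
        return -1;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFSIGNALED(status) ? 1 : 0;
}
```

Results have to come back by some other route (a pipe or a file),
since the child's memory is not shared with the parent.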

I know, I know, programs that talk to themselves are sick; I laughed
at the friend who first came up with this sort of thing for a long
time.  Then I tried it.  Sooooo many things suddenly got easier.

--
Dick St.Peters, GE Corporate R&D, Schenectady, NY
stpeters@dawn.crd.ge.com	uunet!dawn.crd.ge.com!stpeters

Bill Daniels writes:
*Caveat:  I am a very green X programmer.
*
*I am working on an application in which the client application needs to
*receive input from a file/socket as well as normal events.  In my environment,
*which is DECwindows, I have access to a call, XtAppAddInput, that will allow
*this to happen.  I cannot of course find this documented in any generic
*X document which leads me to believe that it is not a portable construct.
*My question is how to accomplish this maneuver in a portable fashion.  I hope
*that I am not alone in needing this capability and that someone more versed
*in X than I will be able to offer me some guidance.

The X User's group (XUG) posts a "frequently asked questions & answers"
message to comp.windows.x once a month and they are thinking of doing so more
frequently. Your problem hasn't yet made it onto the list; but I think this is
more to do with the fact that the "frequently asked questions" posting
has just gotten off the ground rather than it being an uncommon question.
So, I'd advise you to do several things


	1. Keep an eye out for the regular posting since it may prove
	   useful for other problems.
	2. Look at what I've given below --- it may appear on a future
	   "frequently asked questions" posting.
	3. Learn the difference between "Xlib" based programs and
	   "Xt" based programs. Basically the "X Toolkit Intrinsics",
	   commonly abbreviated to Xt, is a library built on top of Xlib.
	   It is designed to take some of the burden off programmers.
	   Unfortunately for you, you can't just mix and match Xlib code
	   with Xt code at will so you won't be able to use XtAppAddInput()
	   (which is an Xt routine) in a program that doesn't otherwise
	   follow the Xt style of programming.

	   While it can be useful to be able to program in Xlib, you
	   might find that learning how to use a toolkit will save you
	   time in the long run.


---------------------------------- CUT HERE ---------------------------------
Question:
How can my Xlib based application communicate with more than one X
server and/or some other input source simultaneously?

Answer:
If you're programming on a BSD UNIX type system then the following code is
directly relevant. If not, then you'll have to consult your
system's manual to find an equivalent of the select() system call.
To help you, the following is a brief overview of the select() call.

----------------------- start of overview on select() ---------------------
	int select(int nfds, int *readfds, int *writefds, int *exceptfds,
		   struct timeval *timeout);
	
	The select() call is passed 3 bitmasks - "readfds", "writefds"
	and "exceptfds" - and "nfds", the number of file descriptors to
	examine (the highest descriptor of interest plus one).
	The bitmasks represent the file descriptors that select() is
	to pay attention to. It will block until there is something
	waiting to be read on any of the "readfds" file descriptors,
	or it is possible to write to any of the "writefds" file
	descriptors or an exception (such as "end of file") is
	encountered on any of the "exceptfds" file descriptors. File
	descriptor N is indicated by "1 << N" in the appropriate
	bitmask. The "timeout" parameter is used to determine the
	maximum amount of time that select() should block for. If
	a NULL pointer is passed in then select() will block
	indefinitely. Upon return, the select() call will overwrite
	the bitmasks to indicate which file descriptors are ready.
	Also the return value is -1 on error, or the number of
	ready file descriptors on success.
------------------------ end of overview on select() -----------------------
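
On current systems the bitmasks are normally handled through the
fd_set type and the FD_ZERO/FD_SET/FD_ISSET macros rather than by
shifting bits into an int.  A small sketch, assuming a POSIX
<sys/select.h>; wait_readable() is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until one of fds[0..count-1] has something to read.
 * timeout_ms < 0 blocks indefinitely, 0 just polls.
 * Returns the index (into fds) of the first readable descriptor,
 * -1 on timeout, -2 on error. */
int wait_readable(const int *fds, int count, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    struct timeval *tvp = NULL;
    int i, n, maxfd = -1;

    assert(fds != NULL && count > 0);

    FD_ZERO(&readfds);
    for (i = 0; i < count; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    /* First argument is the highest descriptor plus one. */
    n = select(maxfd + 1, &readfds, NULL, NULL, tvp);
    if (n == 0)
        return -1;
    if (n < 0)
        return -2;

    /* select() has overwritten readfds with the ready descriptors. */
    for (i = 0; i < count; i++)
        if (FD_ISSET(fds[i], &readfds))
            return i;
    return -2;
}
```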


You can use the select() system call to wait until there is something
waiting to be read on one of several file descriptors - it's more efficient
than waiting via busy looping. So to manage 2 display connections and
a socket/pipe connection simultaneously you might use a main loop of
the form ...
(disclaimer: I haven't tested this code)

	Display *display_1 = XOpenDisplay(...);
	Display *display_2 = XOpenDisplay(...);
	int other_fd = ... /* code to open a socket or pipe etc */
	struct timeval zero_time;

	zero_time.tv_sec = 0;
	zero_time.tv_usec = 0;

	while (1) {
		int	fd_1, fd_2;
		long	read_mask; /* A 32 bit bitmask - HOPEFULLY big enough */
		int	num_fds, result;
		int	got_event = FALSE;	/* TRUE/FALSE assumed to be 1/0 */
		XEvent	event;		/* filled in by XNextEvent() below */

		/*
		** check if there are any X events waiting to be processed.
		** But don't block if there isn't.
		*/

		if (XPending(display_1)) {
			got_event = TRUE;
			XNextEvent(display_1, &event);
			... /* process it */
		}
		if (XPending(display_2)) {
			got_event = TRUE;
			XNextEvent(display_2, &event);
			... /* process it */
		}
		/* ditto for display_3, display_4 ... */

		/*
		** check if there is any data to be read on "other_fd".
		** But don't block if there isn't.
		*/
		read_mask = 1 << other_fd;
		num_fds = other_fd + 1;
		result = select(num_fds, &read_mask,	/* read mask */
					 (int *)0,	/* no write mask */
					 (int *)0,	/* no exception mask */
					 &zero_time);	/* don't block */
		if (read_mask & (1 << other_fd)) {
			got_event = TRUE;
			/* code to read and process data */
			...
		}
		/* ditto for any other socket/pipe connections */

		if (got_event) {
			/* back to top of while loop */
			continue;
		}
		
		/*
		** There are no events to be processed so wait until there are.
		** The select system call is used for this.
		*/

		/* get the file descriptor of the Display connections */
		fd_1 = ConnectionNumber(display_1);
		fd_2 = ConnectionNumber(display_2);

		/* set up the read mask for select() */
		read_mask = (1 << fd_1) | (1 << fd_2) | (1 << other_fd);
		num_fds = max(fd_1, max(fd_2, other_fd)) + 1;

		result = select(num_fds, &read_mask,	/* read mask */
					 (int *)0,	/* no write mask */
					 (int *)0,	/* no exception mask */
					 (struct timeval *)0);	/* no timeout */

		/*
		** The value of "read_mask" could be checked to see
		** which of the file descriptors have data waiting to
		** be read. However, falling back to the top of the while
		** loop will suffice.
		*/
	} /*of while*/


If you modify the above loop style then beware of the following:

For various reasons (including network efficiency, I suppose) some X events
which are yet to be seen are stored in a queue in main memory. Hence, even
though there may be nothing to read on ConnectionNumber(display), there might
be X events held locally by the Xlib data structures. So make sure to use
XPending(display) before doing a blocking select call. Otherwise your program
is likely to hang.

The chapter on event processing in "Introduction to the X Window System",
by Oliver Jones, Prentice-Hall, 1988 (ISBN 0-13-499997-5) also discusses this.
---------------------------------- CUT HERE ---------------------------------
Tim Love (tpl@eng.cam.ac.uk)