published in , July, 1994
Copyright © 1994 Kenton Lee. All rights reserved.
In this paper, I will examine several different aspects of X application performance. I will start by discussing performance metrics and general techniques for designing performance into your applications. Unfortunately, you can't always simultaneously optimize all types of performance, so the bulk of this paper will discuss possible software engineering tradeoffs that you should consider. To keep the paper to a reasonable length, I won't go into a lot of detail on particular issues; I will leave it to you to apply them to your particular applications.
Hopefully, this material will help you improve your new designs and/or solve problems in existing designs.
For most X applications, you cannot easily give a single performance rating. Performance measurements can vary widely depending on the user's particular task. When studying the performance of an application, you must study it in a manner similar to the way the user uses the application.
If your application is well designed, you will probably have generated a set of typical user tasks or use cases for which your application will be used. This task analysis is critical to designing an easy-to-use X-based user interface. It is also very important for optimizing the performance characteristics of your application. In the analysis phase, you can create performance requirements, using the metrics discussed in the last section, for the most common user tasks. In the design phase, you should create designs that avoid general performance problems in the associated functional areas (including interactions between functional areas).
Remember, of course, to consider all relevant variables in your performance analysis, including data set sizes, client and server hardware, networks, etc. For example, if your users are always using large data sets, performance tests with small sets are not very useful.
Hopefully, the performance problem will be caused by an isolated bottleneck, rather than a major design flaw. If it is a bottleneck, you can usually work backward from the problem's symptoms to identify the problem. From there, you may be able to apply some of the guidelines and tradeoffs given below to improve the performance of a low level module.
If the problem is a major design flaw, you may be better off reworking everything from the design on down. As with any other bug fix, a poorly designed solution usually leads to more problems in the long run.
Note that most of these items may have a very small effect if applied only once, but the benefits add up quickly if you are performing an operation hundreds or thousands of times. Look for the latter cases, which are often very common in graphics and interactive applications.
I can't list all possible flawed designs, but the three most popular are probably these:
As with most tradeoff decisions, the best answer depends on your application. In most cases, however, if the performance differences are small to moderate and the simpler algorithm can be implemented much more quickly, I will choose the simpler one. The simpler algorithm will generally lead to more robust and more easily maintained code. It will also help get the application or prototype running much sooner. If I later find that I need more performance in this module, I can then switch to the more complex algorithm.
Many programmers create all the secondary windows at start up time, but leave them unmapped. In the X Toolkit and Motif, this is similar to realizing but not managing widgets. When you need the windows, they can be mapped (or, for widgets, managed) very quickly. Unfortunately, creating them at start up time slows down your application when it may be trying to do a lot of other initializations, thus delaying the display of your primary windows. Also, the unmapped windows can use a lot of memory (both client and server), especially if they contain pixmaps.
An alternative implementation is to create the secondary windows when they need to be displayed. This avoids the above performance and memory problems, but causes the secondary windows to display more slowly.
A third choice is to use the X Toolkit work procedure mechanism (XtAppAddWorkProc). Your application would create all its primary windows at start up time, then create all the secondary windows via work procedures. Work procedures are only executed when the X Toolkit event loop would otherwise be waiting for input. This technique speeds up display times for both your primary and your secondary windows. It does not help with the memory use problem mentioned above, though.
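The following sketch models the third technique in plain C so that it compiles without the X Toolkit. build_next_dialog has the shape of an XtWorkProc (it returns true when it should be removed and false when it wants to be called again), and run_idle_passes stands in for the idle part of the Xt event loop. The Dialog type and all function names are hypothetical; in a real program you would register the procedure with XtAppAddWorkProc(app, proc, client_data) and create widgets instead of setting a flag. The sketch also falls back to the second technique when the user asks for a window the background work has not reached yet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a secondary window such as a dialog. */
typedef struct {
    const char *name;
    bool created;
} Dialog;

/* Hypothetical state for building secondary windows in the background. */
typedef struct {
    Dialog *dialogs;
    size_t count;
    size_t next;   /* index of the next dialog to build when idle */
} BackgroundBuilder;

/* A real Xt program would create the popup shell and its children here. */
static void create_dialog(Dialog *d) {
    d->created = true;
}

/* Shaped like an XtWorkProc: builds at most one dialog per call, returns
   true ("remove me") when nothing is left, false ("call me again") otherwise. */
static bool build_next_dialog(void *client_data) {
    BackgroundBuilder *b = client_data;
    while (b->next < b->count && b->dialogs[b->next].created)
        b->next++;   /* skip dialogs that were already built on demand */
    if (b->next < b->count)
        create_dialog(&b->dialogs[b->next++]);
    return b->next >= b->count;
}

/* Called when the user asks for a dialog: if the background work has not
   reached it yet, create it immediately (the second technique). */
static Dialog *get_dialog(BackgroundBuilder *b, size_t i) {
    if (!b->dialogs[i].created)
        create_dialog(&b->dialogs[i]);
    return &b->dialogs[i];
}

/* Minimal model of the event loop's idle time: call the work procedure
   once per idle pass until it asks to be removed. Returns passes used. */
static int run_idle_passes(bool (*proc)(void *), void *data, int max_passes) {
    int passes = 0;
    while (passes < max_passes) {
        passes++;
        if (proc(data))
            break;
    }
    return passes;
}
```

Because each call builds only one window, the event loop can still process user input between windows, and get_dialog keeps every window available even if the user reaches it first.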
To choose between these techniques, you should examine the performance needs of your application. Many people use different techniques for different parts of the same application. The first technique is probably inappropriate for memory intensive windows that are rarely, if ever, used by your application. The second technique is possibly inappropriate for windows that must be quickly displayed, such as popup menus. The third technique is a good performance compromise, but is a little more complex to implement and requires the X Toolkit.
If delayed displays are not possible, you should provide some sort of indication that the application is processing the input. Some applications change the cursor to a stop watch or hour glass shape. Others popup a dialog message window. The immediate feedback informs the user that the input was accepted and gives the illusion that your application is responsive, even though the user still has to wait.
If your application frequently performs round trip requests, you should consider caching schemes or alternative architectures to minimize them.
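As a sketch of one such caching scheme, the code below keeps a window's size on the client side instead of calling XGetGeometry, which blocks waiting for a reply, every time the size is needed. The cache is refreshed from ConfigureNotify events, which carry the new width and height when StructureNotifyMask is selected on the window. FakeServer and all function names are hypothetical stand-ins so the example compiles without Xlib; round_trips counts the simulated queries.

```c
#include <stdbool.h>

/* Hypothetical model of the server's view of the window. */
typedef struct { int width, height; int round_trips; } FakeServer;

/* Stand-in for XGetGeometry: every call costs one round trip. */
static void server_get_geometry(FakeServer *s, int *w, int *h) {
    s->round_trips++;
    *w = s->width;
    *h = s->height;
}

/* Client-side copy of the window size. */
typedef struct { int width, height; bool valid; } GeometryCache;

/* Return the window size, asking the server only on a cache miss. */
static void cached_get_geometry(GeometryCache *c, FakeServer *s, int *w, int *h) {
    if (!c->valid) {
        server_get_geometry(s, &c->width, &c->height);
        c->valid = true;
    }
    *w = c->width;
    *h = c->height;
}

/* Update the cache from a ConfigureNotify event's width and height,
   so a resize never forces another query. */
static void on_configure_notify(GeometryCache *c, int new_width, int new_height) {
    c->width = new_width;
    c->height = new_height;
    c->valid = true;
}
```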
Note that Xlib and the X Toolkit have been designed to minimize the need for round trip requests. For example, when an X client initially connects to the X server it is allocated a collection of resource identifiers. Functions like XCreateWindow simply retrieve an ID from this collection rather than force a round trip. Also, the X Toolkit caches information like the widget sizes and positions so that the client does not have to perform an XGetGeometry round trip request. You should always take advantage of these built in optimizations when possible. Several others are mentioned in the following sections.
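The sketch below models how a client can hand out resource IDs locally. At connection setup the server supplies a resource-id-base and a resource-id-mask (the contiguous bits the client may vary), and each new ID is the base combined with a shifted counter. Xlib works along these lines, but the struct and function names here are hypothetical and the code is a model, not Xlib's source.

```c
#include <stdint.h>

typedef struct {
    uint32_t base;   /* resource-id-base from connection setup */
    uint32_t mask;   /* resource-id-mask: contiguous bits the client may set */
    int shift;       /* position of the lowest set bit of mask */
    uint32_t next;   /* next counter value to hand out */
} IdAllocator;

static void id_allocator_init(IdAllocator *a, uint32_t base, uint32_t mask) {
    a->base = base;
    a->mask = mask;
    a->shift = 0;
    a->next = 0;
    while (mask != 0 && (mask & 1u) == 0) {   /* find the lowest set bit */
        mask >>= 1;
        a->shift++;
    }
}

/* Hand out a new ID without contacting the server. Returns 0 (None) once
   the range is used up; this sketch assumes a nonzero base so that 0 can
   never be a valid ID. */
static uint32_t alloc_id(IdAllocator *a) {
    if (a->next > (a->mask >> a->shift))
        return 0;
    return a->base | (a->next++ << a->shift);
}
```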
The functions that change graphics context attributes are an exception to this rule. Changing several attributes at once with XChangeGC is no more efficient than calling the XSet* GC functions, because both write to the Xlib GC cache.
Most events are requested via XSelectInput, window event mask attributes, widget translation tables, and/or widget event handlers. You should choose these carefully to avoid receiving unneeded events.
In addition, GraphicsExpose or NoExpose events are automatically generated when you call XCopyArea or XCopyPlane and the graphics_exposures flag in the graphics context is set. The default is to send the events, so you should disable this (set the flag to False, for example with XSetGraphicsExposures) unless you need the events.
Events may also be generated in response to certain arguments to Xlib functions. For example, if the last argument to XClearArea is True, Expose events are generated. You should, of course, only set these arguments if you need the events.
Sometimes, you are not interested in all events of a particular type. For example, you may be interested in EnterNotify events (reported when the pointer cursor enters a window), but only if the user is actually going to use the window. The detail and mode fields of the event structure give you additional information about the event. Most applications can safely ignore EnterNotify events with "virtual" details (NotifyVirtual or NotifyNonlinearVirtual) or with modes other than NotifyNormal.
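A minimal filter for this rule might look like the following. The constants are copied from <X11/X.h> so the sketch compiles on its own, and enter_worth_handling is a hypothetical name; in a real handler you would pass event->xcrossing.mode and event->xcrossing.detail from the XEvent.

```c
#include <stdbool.h>

/* Crossing-event values from <X11/X.h>. */
enum { NotifyNormal = 0, NotifyGrab = 1, NotifyUngrab = 2 };
enum { NotifyAncestor = 0, NotifyVirtual = 1, NotifyInferior = 2,
       NotifyNonlinear = 3, NotifyNonlinearVirtual = 4 };

/* Decide whether an EnterNotify event is worth handling. Virtual details
   mean the pointer only passed through this window on its way into one of
   its descendants; non-normal modes mean the crossing was caused by a
   grab or ungrab rather than by the user moving the pointer. */
static bool enter_worth_handling(int mode, int detail) {
    if (mode != NotifyNormal)
        return false;
    return detail != NotifyVirtual && detail != NotifyNonlinearVirtual;
}
```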
Finally, Expose events have count fields that can help you decide if you need to process the event. See the next section for some Expose event handling ideas.
First, avoid performing unnecessary work in non-expose event handlers. Most applications redraw their displays in response to Expose events. Beginning programmers often also redraw in response to ConfigureNotify (resize) events. Since you will usually get an Expose event whenever your window resizes (especially if the window's bit_gravity attribute is set to ForgetGravity), automatically drawing in your ConfigureNotify event handler may be unnecessary.
Also, remember that the Expose event structure includes count and region (x, y, width, height) fields. By using one or both of these, you may be able to avoid unnecessary redrawing. Two popular schemes are:
The two schemes are both implemented by copying window contents to off screen memory and both have the same drawback as the off-screen pixmap copy technique described in the previous section: heavy memory use. Neither backing store nor save unders is guaranteed (either to exist on a server or to be maintained for the life of your window), so your application must provide exposure handling code as well.
You should consider using backing store in place of the pixmap copy technique described above, but remember that you must provide an exposure handler in case the backing store fails.
Save unders were intended for very transient windows, such as popup menus. They can work well and are the default for X Toolkit transient shell widgets. Unfortunately, some X servers do not handle save unders very efficiently, so you may want to avoid using them or provide an option so that users can enable or disable them.
Note that Sun's olwm window manager has an option that causes save unders to automatically be set for all client transient windows even if the client did not request it. The option is called TransientsSaveUnder and its default is True. Unfortunately, the save unders implementation in Sun's xnews server is poor, so you probably want to disable this.
If you must track the pointer within a window, X provides five main techniques for doing so:
Note that many X tutorial books recommend the PointerMotionHints technique, but in practice it usually gives no better performance than any of the other techniques because an XQueryPointer round trip request is required after each event. It is also the least well defined (in terms of X server functionality), so it may behave differently on different systems.[Gajewska]
For certain graphical effects, you can get similar results either by drawing a simple object (e.g., a line or a circle) with a non-zero line width or by drawing one or more filled objects (e.g., a filled rectangle or two concentric filled circles) with a zero line width.
Which technique is faster? If you're only drawing a few objects, the difference is not significant. If you're drawing hundreds of objects, you should try both cases on your target X server and hardware. You may find that filled zero-width drawing is somewhat faster.
In general, there are three basic parts to X drawing: reading the source pixels, reading the destination pixels, and writing the destination pixels. You can usually improve performance if you eliminate the need to perform some of those steps. For example, the GXset and GXclear functions just write 1 or 0 into the destination drawable, eliminating the first two steps. On the other hand, GXcopy (the default function) requires reading the source and writing the destination. GXor and GXxor require all three steps. The potential performance differences depend on your graphics hardware and X server, but can be significant.
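The step counting can be made concrete. Each GC function is a 4-bit truth table over one source bit and one destination bit, so you can test whether a function actually depends on the source or on the destination. The GX values below are those defined in <X11/X.h>; the helper names are hypothetical, and the model deliberately ignores plane masks, tiles, stipples, and clip masks.

```c
#include <stdbool.h>

/* GC function codes from <X11/X.h>. */
enum { GXclear = 0x0, GXand = 0x1, GXcopy = 0x3, GXxor = 0x6,
       GXor = 0x7, GXinvert = 0xa, GXset = 0xf };

/* Result bit of function f for source bit s and destination bit d.
   In the X encoding, bit 3 of f is the result for (s=0, d=0), bit 2 for
   (0, 1), bit 1 for (1, 0), and bit 0 for (1, 1). */
static int rop_result(int f, int s, int d) {
    int index = 3 - ((s << 1) | d);
    return (f >> index) & 1;
}

/* True if the result changes when the source bit changes. */
static bool rop_reads_source(int f) {
    for (int d = 0; d <= 1; d++)
        if (rop_result(f, 0, d) != rop_result(f, 1, d))
            return true;
    return false;
}

/* True if the result changes when the destination bit changes. */
static bool rop_reads_dest(int f) {
    for (int s = 0; s <= 1; s++)
        if (rop_result(f, s, 0) != rop_result(f, s, 1))
            return true;
    return false;
}
```

Running GXset, GXclear, GXcopy, GXor, and GXxor through these helpers reproduces the step counts given above.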
You can also add more steps to drawing by specifying stipples, tiles, clipmasks, plane masks, etc. The specific performance of these operations varies greatly between X servers and hardware. In general, however, stipples and tiles will slow down your performance. On the other hand, clipmasks and plane masks can improve performance by reducing the number of pixels involved.
You probably shouldn't spend a lot of time handling the general case, however, since most X servers aren't too picky. Most X servers simply return the power of 2 closest to your desired size, possibly restricted to the maximum sizes supported by your hardware.
Since objects with simpler shapes will reduce the number of rectangles or spans, you can improve graphics performance by using these, if possible. The "fastest" shapes are composed of a few large rectangles, but any convex shape will be better than a similar non-convex shape. Of course, a tradeoff like this is only possible if you are willing to sacrifice some functionality for potentially small amounts of additional drawing speed.
While shaped windows can be useful, they can cause performance problems in many situations. Most importantly, Expose events always return rectangular exposure regions, so if a shaped window is moved or unmapped, the X server must send many Expose events to each exposed window to describe the shaped region. If the client owning the exposed window processes each event, its performance will be significantly affected. In general, you should avoid repeatedly resizing, moving, or unmapping shaped windows owned by your application. Also, if you expect many Expose events caused by shaped windows, you should write your Expose event handlers to compress the events as much as possible.
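One simple way to compress Expose events, sketched below, is to fold each event's rectangle into a bounding box and redraw only when count reaches zero, which marks the last event of the current series. ExposeInfo mirrors the x, y, width, height, and count fields of XExposeEvent so the sketch compiles without Xlib; the type and function names are hypothetical.

```c
#include <stdbool.h>

/* The XExposeEvent fields this sketch uses (hypothetical mirror struct). */
typedef struct { int x, y, width, height, count; } ExposeInfo;

/* Accumulated damage for one window: bounding box of exposed areas. */
typedef struct { int x1, y1, x2, y2; bool dirty; } Damage;

/* Fold one Expose rectangle into the damage box. Returns true when
   count == 0, meaning the caller should now redraw the box once and
   then clear the dirty flag. */
static bool accumulate_expose(Damage *d, const ExposeInfo *e) {
    int x2 = e->x + e->width, y2 = e->y + e->height;
    if (!d->dirty) {
        d->x1 = e->x; d->y1 = e->y; d->x2 = x2; d->y2 = y2;
        d->dirty = true;
    } else {
        if (e->x < d->x1) d->x1 = e->x;
        if (e->y < d->y1) d->y1 = e->y;
        if (x2 > d->x2) d->x2 = x2;
        if (y2 > d->y2) d->y2 = y2;
    }
    return e->count == 0;
}
```

Keep one Damage record per window, because count describes only the Expose events still to come for the same window. A bounding box can cover more area than the exposed rectangles themselves; if that matters, Xlib's XUnionRectWithRegion can accumulate an exact Region instead.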
A previous paper [Lee92] detailed these techniques so I won't go into them here. The main concept, however, is to segment your color planes and manipulate your colormap so that different color plane permutations have different, pre-defined meanings.
You can also use one GC for different windows as long as the windows have the same root, depth, and visual type.
These techniques are most useful if you are sharing GCs across dozens or hundreds of different windows (as is often the case with X Toolkit widgets). If you only have a few windows, the benefits will be small and the extra complexity of managing GC sharing is probably not worthwhile.
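The sketch below shows one way to share GCs: look up an existing GC with the same attributes before creating a new one, and count references so it can be freed when no one uses it. The X Toolkit's XtGetGC and XtReleaseGC provide this kind of sharing for widgets. The GcKey fields, the fixed-size pool, and the function names are hypothetical; depth stands in for the full root, depth, and visual check, and gc_id stands in for a GC returned by XCreateGC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of GC attributes used as the sharing key. */
typedef struct {
    int depth;
    unsigned long foreground;
    int line_width;
    int function;
} GcKey;

typedef struct { GcKey key; int gc_id; int ref_count; } SharedGc;

typedef struct {
    SharedGc entries[32];
    size_t used;
    int gcs_created;   /* how many GCs had to be created */
} GcPool;

static bool same_key(const GcKey *a, const GcKey *b) {
    return a->depth == b->depth && a->foreground == b->foreground &&
           a->line_width == b->line_width && a->function == b->function;
}

/* Return a GC matching key, creating one only if none exists yet.
   Callers must treat shared GCs as read-only, since changing one would
   affect every window that shares it. Returns -1 if the pool is full. */
static int pool_get_gc(GcPool *p, GcKey key) {
    for (size_t i = 0; i < p->used; i++) {
        if (same_key(&p->entries[i].key, &key)) {
            p->entries[i].ref_count++;
            return p->entries[i].gc_id;
        }
    }
    if (p->used == sizeof p->entries / sizeof p->entries[0])
        return -1;
    SharedGc *e = &p->entries[p->used++];
    e->key = key;
    e->gc_id = ++p->gcs_created;   /* stand-in for XCreateGC */
    e->ref_count = 1;
    return e->gc_id;
}
```

A matching release function (omitted here) would decrement ref_count and free the GC when it reaches zero.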
Note that there is no advantage to using one call to XChangeGC rather than several calls to GC convenience functions. Both interface to the Xlib GC cache, which caches the changes and does not send the updated GC to the X server until they are used in a drawing request.
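To see why the two styles cost the same, here is a minimal model of such a cache. The setters only record values and mark them dirty (skipping values that did not change), and flush_gc, called before a drawing request, sends at most one change request no matter how the changes were made. The names are hypothetical; this models the behavior described above and is not Xlib's source.

```c
#include <stdbool.h>

/* One dirty bit per cached attribute (a hypothetical subset). */
enum { DIRTY_FOREGROUND = 1 << 0, DIRTY_LINE_WIDTH = 1 << 1, DIRTY_FUNCTION = 1 << 2 };

typedef struct {
    unsigned long foreground;
    int line_width;
    int function;
    unsigned dirty;        /* attributes changed since the last flush */
    int change_requests;   /* change requests "sent" to the server */
} CachedGc;

/* Setters in the style of XSetForeground: record locally, mark dirty,
   and skip values that have not actually changed. */
static void set_foreground(CachedGc *gc, unsigned long pixel) {
    if (gc->foreground != pixel) { gc->foreground = pixel; gc->dirty |= DIRTY_FOREGROUND; }
}
static void set_line_width(CachedGc *gc, int width) {
    if (gc->line_width != width) { gc->line_width = width; gc->dirty |= DIRTY_LINE_WIDTH; }
}
static void set_function(CachedGc *gc, int function) {
    if (gc->function != function) { gc->function = function; gc->dirty |= DIRTY_FUNCTION; }
}

/* In the style of XChangeGC: several attributes in one call, same cache. */
static void change_gc(CachedGc *gc, unsigned long pixel, int width, int function) {
    set_foreground(gc, pixel);
    set_line_width(gc, width);
    set_function(gc, function);
}

/* Called before a drawing request that uses the GC: all pending changes
   travel in one request, and nothing is sent if nothing changed. */
static void flush_gc(CachedGc *gc) {
    if (gc->dirty != 0) {
        gc->change_requests++;
        gc->dirty = 0;
    }
}
```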
In some cases, however, you have a choice. You can draw or refresh a window using either XCopyArea and a pixmap or XPutImage and an image. Which should you use? The pixmap technique will usually give you faster drawing speeds since you don't have to copy the data over your network. On the other hand, if memory in your X server is limited (e.g., in a diskless X terminal), you may have to use images or other techniques.
If you do use X images and your client is running on the same machine as the X server, you should consider using the MIT-SHM protocol extension for optimal performance. MIT-SHM is discussed in a later section.
The same principle applies to pixmaps used as X Toolkit label widget resources. A text label will use much less memory, and this can add up if you have lots of labels.
The same problem occurs if you grab the X server with XGrabServer. When one X client grabs the server, it prevents the user from interacting with any other X client and affects the performance of any output operations those clients were performing.
There's usually not much you can do about the drawing issue. You could try using smaller batches to periodically process events. The server grab problem, on the other hand, is usually avoidable: don't grab the server unless it is absolutely necessary. Only rarely is it necessary.
In most cases, you can greatly improve IPC performance if you use your operating system's IPC facilities to communicate directly between the two clients. Doing so can improve performance in two ways. First, you eliminate the X server bridge. Passing large amounts of data via the X server can significantly slow down the performance of the X server and, thus, the apparent responsiveness of X clients connected to the X server.
Second, you have your choice of IPC facilities and may be able to use one that gives better performance than your X server connection. This is especially true if the two X clients are running on one computer but the X server is running on a different computer.
An alternative to .Xdefaults is to put your application resources for X Toolkit applications in user defaults disk files. On UNIX systems, X Toolkit applications will search the directories specified by your XUSERFILESEARCHPATH environment variable for these files and use them in addition to system app-defaults and root property resources. Reading disk files is generally slower than reading X server properties, especially if the files are stored on remote file servers. On the other hand, because the application resource files are usually small (each contains resources for only one application) and .Xdefaults files can be large (resources for all applications), there is probably no significant speed difference.
Note that X clients not based on the X Toolkit may not properly search for these files, so you may still have to use .Xdefaults for those clients.
Another protocol extension that helps improve performance is the Multi-Buffering extension. Before this extension became available, applications had to directly manipulate pixel values and colormaps to get high performance multi-buffered effects (e.g., smooth animations or erasable overlays).[Lee92] The Multi-Buffering extension simplifies the implementation of these sorts of effects.
Since the XBM format is actually C programming language statements, you can significantly improve parsing performance by including the files in your program and letting the C compiler parse them at compile time. You can then process the compiled data with XCreateBitmapFromData or XCreatePixmapFromBitmapData.
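As an illustration, the bitmap below is exactly what an XBM file contains, so it can be pasted in or pulled in with #include "mark.xbm". The bitmap name mark and its pixels are hypothetical. The helper shows the XBM bit layout (rows padded to whole bytes, least significant bit leftmost), and the closing comment shows the XCreateBitmapFromData call that would send the compiled data to the server.

```c
#include <stdbool.h>

/* Contents of a hypothetical 10x2 XBM file: ordinary C declarations.
   Ten pixels need two bytes, so each row occupies two bytes. */
#define mark_width 10
#define mark_height 2
static unsigned char mark_bits[] = {
    0x01, 0x02,    /* row 0: pixels 0 and 9 set */
    0xff, 0x03 };  /* row 1: all ten pixels set */

/* Test one pixel. XBM rows are padded to whole bytes, and within each
   byte the least significant bit is the leftmost pixel. */
static bool xbm_pixel(const unsigned char *bits, int width, int x, int y) {
    int bytes_per_row = (width + 7) / 8;
    return (bits[y * bytes_per_row + x / 8] >> (x % 8)) & 1;
}

/* In an X program the compiled array goes straight to the server, with
   no file parsing at run time (dpy and drawable assumed to exist):
       Pixmap p = XCreateBitmapFromData(dpy, drawable, (char *) mark_bits,
                                        mark_width, mark_height);        */
```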
If you're using large bitmap files or many files, compiling them into your application can increase your code size somewhat. You can minimize the code size problem (on operating systems using demand paged memory management) by placing all the images in one code module: the pages will not be touched after they are first copied to the X server.
The above concept also applies to the popular (but non-standard) XPM color image file format. The XPM library provides functions for reading the format either from files or from compiled data structures.
While resource converters are usually installed by widgets, application writers who install their own converters can also take advantage of the mechanism.
When managing a group of widgets, you should try to manage them as a group (XtManageChildren instead of XtManageChild) to minimize the number of negotiations that occur. This is unnecessary, however, when managed widgets are added to a not yet realized parent, because the X Toolkit automatically groups the geometry negotiation when the parent is realized.
Also, note that many widgets have resources that control whether or not they will participate in geometry negotiations if their contents change after they are realized. If you can disable these resources, possibly only for short periods, you may be able to greatly improve application performance. Some examples of these resources for the Motif widget set are:
Because XtGetValues usually simply copies values from the widget record, batching calls to XtGetValues improves performance only by reducing the number of function calls. In most cases, this offers little to no performance gain.
The main purpose of gadgets is to improve performance by reducing the overhead of widgets. In early releases of X11, widgets used quite a bit more X server memory than did gadgets, but X11R4 servers have minimized this difference. Widgets also take somewhat more time to create and realize than do gadgets. You can sometimes notice this if your application uses hundreds or thousands of widgets.
On the other hand, gadgets have somewhat less functionality than do widgets. Gadgets do not have their own color or translation resources, which makes it impossible to manipulate these on a gadget-by-gadget basis. Also, gadgets require that their parents do all their event handling, which can cause somewhat more network traffic and client-side CPU usage than with widgets.
Which should you choose? I generally use widgets. If start-up time becomes a problem, switching to gadgets in some modules is an easy conversion.
Some people have speculated that shared libraries also save memory by allowing simultaneously running applications to share the text space of the shared library. Unfortunately, several studies have shown this not to be true. The dynamic memory management systems of modern operating systems are so efficient that shared libraries have no significant performance benefit.
A related area that frequently confuses beginners is whether or not "stripping" UNIX executables (removing debugging symbols) improves performance. It does not: because the symbols are not loaded with the executable, stripping has no effect on performance.
On the other hand, compiler optimizers can improve performance. How much they help depends on your code. They are most useful for improving the performance of computationally intensive algorithms.
If your application is disk intensive, you can often significantly improve performance in a few areas. First, consider buffering disk I/O when possible. UNIX operating systems do some kernel level buffering and the UNIX stdio library does more; you should use these when possible.
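The sketch below leans on stdio buffering rather than many small unbuffered reads: it gives the stream a larger buffer with setvbuf and reads in 8 KB blocks with fread. The function name and both buffer sizes are arbitrary choices for illustration, so measure on your own system before settling on values.

```c
#include <stdio.h>
#include <stddef.h>

/* Count the bytes in a file, reading through a large stdio buffer in
   8 KB blocks. Returns -1 if the file cannot be opened. */
static long count_file_bytes(const char *path) {
    char stdio_buf[64 * 1024];   /* stays valid until fclose below */
    char block[8 * 1024];
    size_t n, total = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    /* setvbuf must be called before any other operation on the stream. */
    setvbuf(fp, stdio_buf, _IOFBF, sizeof stdio_buf);
    while ((n = fread(block, 1, sizeof block, fp)) > 0)
        total += n;
    fclose(fp);
    return (long) total;
}
```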
More importantly, try to avoid any disk I/O across a network when possible. Networked file systems offer some nice features, but usually at a high performance cost. Accessing a (good quality) local disk drive will usually be much faster. This applies to all of the following disk I/O operations (if supported by your operating system):
You may also want to try special versions of malloc that help debug memory problems, including memory leaks as well as memory corruption problems. There are several such malloc's available, both free via the Internet and as commercial products. There are also several popular (and heavily advertised) commercial products that extend this idea to provide a full-featured memory analysis package.
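The core idea behind these tools can be sketched with a pair of counting wrappers: if the live-block count keeps growing while your application repeats the same operation, something is leaking. The dbg_ names are hypothetical, and real debugging mallocs do much more, such as recording call sites and guarding the ends of each block to catch corruption.

```c
#include <stdlib.h>

/* Number of blocks allocated through dbg_malloc and not yet freed. */
static size_t live_blocks = 0;

static void *dbg_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void dbg_free(void *p) {
    if (p != NULL) {   /* free(NULL) is a no-op, so do not count it */
        live_blocks--;
        free(p);
    }
}

static size_t dbg_live_blocks(void) {
    return live_blocks;
}
```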
Apparent X server memory leaks can also be caused by client actions. These leaks can easily be avoided with a few precautions. X clients that allocate pixmaps without freeing them will cause the server to allocate memory without bound. If your X client will be running for a long period (some run for weeks or months at a time), make sure you free server resources you no longer need, especially large ones like pixmaps. Also, don't use retained close down modes (XSetCloseDownMode) unless you can absolutely guarantee that the resources will be properly freed later.
Client-side memory leaks are more complicated than server-side leaks since there are many more library interfaces and the interfaces are less well defined. The three main programming levels are Xlib, the X Toolkit, and the widget set.
Xlib rarely allocates memory and those cases where it does (e.g., XCreateImage) are well documented.[Scheifler]
Xt allocates memory more often, but these cases are also generally well documented.[Asente]
Some widget sets are, unfortunately, less well documented. Some key questions are which strings and arrays are copied by XtGetValues, by XtSetValues, and by widget convenience functions. I discuss these issues in more detail in two separate papers.[Lee96-1, Lee96-2]
As with most engineering guidelines, these are not hard and fast rules. Most are tradeoffs that force you to choose between two or more design goals. You, the software engineer, must decide what is best for your project.
[Droms] Ralph Droms and Wayne Dyksen, "Performance Measurements of the X Window System Protocol," Software Practice and Experience, Vol. 20, No. S2 (special issue), October, 1990.
[Gajewska] Hania Gajewska, Mark Manasse, and Joel McCormack, "Why X Is Not Our Ideal Window System," Software Practice and Experience, Vol. 20, No. S2 (special issue), October, 1990.
[Jones] Oliver Jones, Introduction to the X Window System, Prentice-Hall, 1989.
[Lee92] Kenton Lee, "Graphics Effects by X Colormap Manipulation," The X Journal, May, 1992.
[Lee93] Kenton Lee, "The 40 Most Common X Programming Errors (And How To Avoid Repeating Them)," The X Journal, March, 1993.
[Lee96-1] Kenton Lee, "Debugging X Memory Leaks and Other Dynamic Memory Bugs," The X Advisor, February, 1996.
[Lee96-2] Kenton Lee, "Avoiding Motif Memory Leaks," The X Advisor, March, 1996.
[McCormack] Joel McCormack, "Writing Fast X Servers For Dumb Color Frame Buffers," Software Practice and Experience, Vol. 20, No. S2 (special issue), October, 1990.
[McMinds] Donald McMinds, Mastering OSF/Motif Widgets, second edition, Addison-Wesley, 1993.
[Mirchandani] Dinesh Mirchandani and Prabuddha Biswas, "Ethernet Performance of Remote DECwindows Applications," Digital Technical Journal, Vol. 2, No. 3, Summer, 1990.
[Mulder] Art Mulder, "How To Maximize the Performance of X," monthly posting to USENET comp.windows.x newsgroup and Internet xpert mailing list.
[Peterson] Chris Peterson and Sharon Chang, "Improving X Application Performance," The X Resource, Issue 3, Summer, 1992.
[Scheifler] Robert Scheifler and James Gettys, X Window System (Third Edition), Digital Press, 1992.
Ken has published over two dozen technical papers on the X Window System. Most are available over the World Wide Web at http://www.rahul.net/kenton/bib.html.
Ken may be reached by Internet electronic mail to kenton @ rahul.net or the World Wide Web at http://www.rahul.net/kenton/.
For more information on the X Window System, please visit my home page.