X Window System Application Performance Tuning

Kenton Lee
published in The X Journal, July, 1994

Copyright © 1994 Kenton Lee. All rights reserved.


ABSTRACT

The performance of X-based application programs goes beyond simple drawing speed. This paper discusses several performance metrics as well as techniques you can use to improve the performance of your applications.

CONTENTS

  1. INTRODUCTION
  2. PERFORMANCE AND SOFTWARE ENGINEERING
  3. PERFORMANCE GUIDELINES
  4. DESIGN GUIDELINES
  5. SOFTWARE ORGANIZATION
  6. XLIB REQUESTS AND REPLIES
  7. X EVENTS
  8. GRAPHICS ISSUES
  9. SERVER-SIDE FEATURES AND LIMITATIONS
  10. CLIENT-SIDE FEATURES AND LIMITATIONS
  11. MEMORY ISSUES
  12. CONCLUSION
  13. REFERENCES
  14. THE AUTHOR

INTRODUCTION

When many X Window System application programmers think of performance, they think of drawing speed, i.e., the time the application takes to draw or refresh its graphical displays. Experienced software engineers, however, know that application performance engineering can be somewhat more complex than solely optimizing drawing speed.

In this paper, I will examine several different aspects of X application performance. I will start by discussing performance metrics and general techniques for designing performance into your applications. Unfortunately, you can't always simultaneously optimize all types of performance, so the bulk of this paper will discuss possible software engineering tradeoffs that you should consider. To keep the paper to a reasonable length, I won't go into a lot of detail on particular issues; I will leave it to you to apply them to your particular applications.

Hopefully, this material will help you improve your new designs and/or solve problems in existing designs.

PERFORMANCE AND SOFTWARE ENGINEERING

In this section, I will discuss several popular metrics for X application performance. Then, I will discuss software engineering methodologies for using these metrics to design and/or fix your software.

Performance Metrics

Everyone wants their applications to perform well, but the term "performance" is vague and often left poorly defined. In the X world, there are usually several interrelated and often conflicting components of performance. Here are some easily measured aspects of X application performance. Later in this paper, I will discuss specific techniques for optimizing them.
  1. Drawing/refresh speed. Drawing speed is very important for applications that must draw complex graphics on the screen. With these applications, slow speeds can significantly slow down the user or cause poor aesthetics. On the other hand, many applications, especially those with simple text and window displays, redraw very quickly, so optimal drawing speed is not very important.
  2. Application responsiveness to user input. Response time is the time the application takes to respond to user input. Usually, this is measured from the user's point of view: does the application respond when the user expects a response?
  3. Application start-up/initialization time. X applications, especially those with many windows, often take non-trivial amounts of time to initially start up. As with response time, this is usually measured from the user's point of view: how long does the user have to wait before useful information is displayed and/or the user can input something?
  4. Client-server communications. X has a client-server architecture. The X client and X server may be running on the same computer or on different computers connected by a network. The level and pattern of communication between the client and the server is an important component of performance.
  5. CPU usage. A major part of performance is CPU usage. If the X client and X server are running on different CPUs, the two should be considered separately.
  6. Memory usage. X applications often use a lot of memory, both on the client and X server computers. If your system cannot fit it all into RAM, it will constantly have to page or swap data to your disk drive. This can cause performance problems, especially with networked file servers or very low memory machines (like X terminals).
  7. Graphics and disk I/O. Like memory usage, graphics and disk I/O are closely tied to performance. X often accesses these, either as a direct or as an indirect effect of the X programming interfaces. While indirect access patterns may vary between X implementations, some general design principles usually apply and are discussed below.
  8. Performance of other applications connected to the same X server. Two X clients connected to the same X server can interfere with each other's performance, either directly (e.g., grabs) or indirectly (e.g., X server load).
There are also some cases where you may want to sacrifice performance for important software engineering goals, such as:
  1. Code robustness (bugs, maintainability, etc.).
  2. Development time.
  3. Scarce X resources (color cells, grabs, screen space).
In later sections, I will discuss specific tradeoffs between these and performance.

High Level Application Design

The bulk of this paper will discuss specific software engineering tradeoffs you can make. These are primarily local tradeoffs that apply within code modules. Before we consider local tradeoffs, however, let's discuss some global issues. In most cases, global engineering is much more valuable than local engineering. Using a top-down engineering approach, resolving global issues first usually greatly simplifies and often eliminates low level problems.

You cannot easily give an interactive X application a single performance rating. Performance measurements can vary widely depending on the user's particular task. When studying the performance of an application, you must study it in a manner similar to the way the user is using the application.

If your application is well designed, you will probably have generated a set of typical user tasks or use cases for which your application will be used. This task analysis is critical to designing an easy-to-use X-based user interface. It is also very important for optimizing the performance characteristics of your application. In the analysis phase, you can create performance requirements, using the metrics discussed in the last section, for the most common user tasks. In the design phase, you should create designs that avoid general performance problems in the associated functional areas (including interactions between functional areas).

Remember, of course, to consider all relevant variables in your performance analysis, including data set sizes, client and server hardware, networks, etc. For example, if your users are always using large data sets, performance tests with small sets are not very useful.

Performance Troubleshooting

Unfortunately, you probably won't be able to identify all user tasks in your application's design phase. You may find performance problems during (hopefully) testing or (more often than you'd like) via reports from users.

Hopefully, the performance problem will be caused by an isolated bottleneck, rather than a major design flaw. If it is a bottleneck, you can usually work backward from the problem's symptoms to identify the problem. From there, you may be able to apply some of the guidelines and tradeoffs given below to improve the performance of a low level module.

If the problem is a major design flaw, you may be better off reworking everything from the design on down. As with any other bug fix, a poorly designed solution usually leads to more problems in the long run.

PERFORMANCE GUIDELINES

Now that we know how performance analysis can be used to improve your application, let's look at some specific ways to meet your performance goals. I will start with some general guidelines that probably apply to all X applications. These should help you with the overall design of your application. Later, I will cover some specific lower-level guidelines and tradeoffs, for the following X software areas:
  1. Xlib Requests and Replies.
  2. X Events.
  3. Graphics Issues.
  4. Server-Side Features and Limitations.
  5. Client-Side Features and Limitations (mostly the X Toolkit).
  6. Memory Issues.
Most likely, only a few of these will apply to any particular application. Hopefully, they will help you design specific algorithms or code modules.

Note that most of these items may have a very small effect if applied only once, but the benefits add up quickly if you are performing an operation hundreds or thousands of times. Look for the latter cases, which are often very common in graphics and interactive applications.

DESIGN GUIDELINES

In this section, I will discuss some high level design guidelines related to X application performance. These will be most useful in the analysis and design phases of your application development.

X Programming Models

As with any large programming system, X works more efficiently with certain program designs than with others. X also provides support and optimizations for certain designs. If you find yourself asking, "why doesn't X perform this generic functionality for me?", you may find that an alternative design will give you the same functionality with better performance and much less complexity. Try to resolve issues like this in your design phase, as they can be very difficult to work around later.

I can't list all possible flawed designs, but the three most popular are probably these:

  1. Interactive X applications (and almost all interactive non-X applications) work best using an event-driven single event loop model. Many beginners don't do this and their performance (and robustness) suffers. In fact, you'll have a hard time getting most X Toolkit features to work efficiently, if at all, with any other architecture.
  2. Avoid flushing the X request buffer or running the X protocol synchronously unless absolutely necessary. The built-in buffering mechanisms work very efficiently. Subverting this buffering can cause significant performance problems, especially with networked or graphics-intensive applications.
  3. If you're using the X Toolkit, avoid performing any type of Xlib request outside of widgets or gadgets. In particular, don't draw to the screen in callback functions. The X Toolkit's functionality is highly optimized to work with the widget hierarchy. To effectively work outside the hierarchy, you'll have to duplicate a lot of this functionality, adding complexity and inefficiency.

Performance vs. Code Complexity

If you can implement a particular application in one of two ways, should you always choose the better performing one? Often the faster of two similar algorithms will be somewhat more complex. It may also be less functional, more difficult and time consuming to implement and maintain, less flexible, or less portable.

As with most tradeoff decisions, the best answer depends on your application. In most cases, however, if the performance differences are small to moderate and the simpler algorithm can be implemented much more quickly, I will choose the simpler one. The simpler algorithm will generally lead to more robust and more easily maintained code. It will also help get the application or prototype running much sooner. If I later find that I need more performance in this module, I can then switch to the more complex algorithm.

SOFTWARE ORGANIZATION

Interactive applications often have bursty performance characteristics, i.e., they show poor performance only at certain times. Most of the time, they're just waiting for user input. You can often improve performance by organizing your code to lower the peaks. Two such techniques are discussed in this section.

Improving Start-up Time By Delayed Initializations

Often, not all of your windows are displayed when the application starts up. There may be dialog windows or subwindows that are displayed later, usually in response to certain system or user input. There are several ways to initialize these secondary windows, with some important performance tradeoffs.

Many programmers create all the secondary windows at start-up time, but leave them unmapped. In the X Toolkit and Motif, this is similar to realizing but not managing widgets. When you need the windows, they can be mapped or managed very quickly. Unfortunately, creating them at start-up time slows down your application when it may be trying to do a lot of other initializations, thus delaying the display of your primary windows. Also, the unmapped windows can use a lot of memory (both client and server), especially if they contain pixmaps.

An alternative implementation is to create the secondary windows when they need to be displayed. This avoids the above performance and memory problems, but causes the secondary windows to display more slowly.

A third choice is to use the X Toolkit work procedure mechanism (XtAppAddWorkProc). Your application would create all its primary windows at start up time, then create all the secondary windows via work procedures. Work procedures are only executed when the X Toolkit event loop would otherwise be waiting for input. This technique speeds up display times for both your primary and your secondary windows. It does not help with the memory use problem mentioned above, though.

To choose between these techniques, you should examine the performance needs of your application. Many people use different techniques for different parts of the same application. The first technique is probably inappropriate for memory intensive windows that are rarely, if ever, used by your application. The second technique is possibly inappropriate for windows that must be quickly displayed, such as popup menus. The third technique is a good performance compromise, but is a little more complex to implement and requires the X Toolkit.

Improving Response Time

Some of the initialization tradeoffs discussed above also apply to the problem of improving response time. When users interact with an application, they usually want an immediate response. If the user's command requires a lot of computing, you could delay that and just display what is immediately available. The remainder could be filled in later.

If delayed displays are not possible, you should provide some sort of indication that the application is processing the input. Some applications change the cursor to a stopwatch or hourglass shape. Others pop up a dialog message window. The immediate feedback informs the user that the input was accepted and gives the illusion that your application is responsive, even though the user still has to wait.

XLIB REQUESTS AND REPLIES

Xlib is the lowest level X programming interface. Here are some performance ideas for applications that use this level. Most deal with optimizing the performance of communication between the client and the X server via the X protocol.

Xlib Request Buffer

Xlib normally buffers X protocol requests to improve client-server communication performance. Xlib (and the X Toolkit) automatically flush the buffer at appropriate times, so applications rarely have to flush the request buffer explicitly. In most cases, programmers shouldn't worry about this and should add explicit flushing only if they find cases where the automatic flushing is insufficient. Unnecessary flushing will increase the load on the client, the X server, and the network.

Round Trip Requests

The most expensive Xlib requests are those that require a "round trip" to the X server. These are requests like XQueryTree and XGetWindowProperty that must wait for a reply from the X server. When a round trip request is executed, the Xlib request buffer is immediately flushed and Xlib blocks until a reply is received. If the X server is running on the same computer as the X client, the operating system must context switch from the client to the server and then back again after the server has processed the request and sent its reply. In general, you should try to avoid these round trip Xlib requests.

If your application frequently performs round trip requests, you should consider caching schemes or alternative architectures to minimize them.
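
One possible caching scheme can be sketched in plain C. This is a standalone model rather than Xlib code: query_server_geometry is a hypothetical stand-in for a blocking round trip request such as XGetGeometry, the reply values are made up, and the counter exists only to show how many round trips the cache avoids.

```c
/* Hypothetical geometry record; a real client would fill it from a reply. */
typedef struct {
    unsigned long window;   /* window identifier used as the cache key */
    int width, height;
    int valid;              /* nonzero once the entry holds server data */
} GeomEntry;

#define CACHE_SLOTS 64

static GeomEntry cache[CACHE_SLOTS];
static int round_trips = 0; /* counts simulated round trips */

/* Stand-in for a blocking round trip request (assumption, not an Xlib call). */
static void query_server_geometry(unsigned long window, int *w, int *h)
{
    round_trips++;
    *w = 100 + (int)(window % 10);  /* made-up reply for the model */
    *h = 50;
}

/* Return cached geometry, going to the "server" only on a miss. */
void get_geometry(unsigned long window, int *w, int *h)
{
    GeomEntry *e = &cache[window % CACHE_SLOTS];
    if (!e->valid || e->window != window) {
        query_server_geometry(window, &e->width, &e->height);
        e->window = window;
        e->valid = 1;
    }
    *w = e->width;
    *h = e->height;
}

/* Call when the client learns that the cached value has become stale. */
void invalidate_geometry(unsigned long window)
{
    GeomEntry *e = &cache[window % CACHE_SLOTS];
    if (e->valid && e->window == window)
        e->valid = 0;
}

int round_trip_count(void) { return round_trips; }
```

In this model, repeated lookups of the same window cost one round trip until the entry is invalidated. A real client would need a reliable way to learn about changes (for example, from events it already receives) before trusting such a cache.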

Note that Xlib and the X Toolkit have been designed to minimize the need for round trip requests. For example, when an X client initially connects to the X server it is allocated a collection of resource identifiers. Functions like XCreateWindow simply retrieve an ID from this collection rather than force a round trip. Also, the X Toolkit caches information like the widget sizes and positions so that the client does not have to perform an XGetGeometry round trip request. You should always take advantage of these built in optimizations when possible. Several others are mentioned in the following sections.

Xlib Batch Requests

Xlib often provides a function to perform one operation as well as a similar function to perform a batch of similar operations (e.g., XDrawPoint vs. XDrawPoints and XStoreColor vs. XStoreColors). You should generally try to use the batch versions of these functions as performance will usually be better, sometimes much better.

An exception to this rule is the functions that change graphics context attributes. Changing several attributes at once with XChangeGC is no more efficient than the XSet* GC functions because both write to the Xlib GC cache.

X EVENTS

Event handling is a major part of X application programming. It is also a performance problem area for many applications. Since it affects most X programming interfaces (Xlib, X Toolkit, widgets), I will dedicate a section to it.

Minimize Event Handling

In many applications, events and event handling are a major performance bottleneck. Large numbers of events can load down your network. Also, your client must read and process each event, even if it just checks the event's type and discards it. You can help optimize the performance of your client by both minimizing the number of events sent by the X server and also by minimizing the amount of processing done by the client when events are received.

Most events are requested via XSelectInput, window event mask attributes, widget translation tables, and/or widget event handlers. You should choose these carefully to avoid receiving unneeded events.

In addition, GraphicsExpose or NoExpose events may be automatically requested when you call XCopyArea or XCopyPlane and the graphics_exposures flag in the graphics context is set. The default is to send the events, so you should disable this (set the flag to False) unless you need the events.

Events may also be generated in response to certain arguments to Xlib functions. For example, if the last argument to XClearArea is True, Expose events are generated. You should, of course, only set these arguments if you need the events.

Sometimes, you are not interested in all events of a particular type. For example, you may be interested in EnterNotify events (reported when the pointer cursor enters a window), but only if the user is actually going to use the window. The detail and mode fields of the event structure give you additional information about the event. Most applications can safely ignore EnterNotify events with "virtual" details or "non-normal" modes.

Finally, Expose events have count fields that can help you decide if you need to process the event. See the next section for some Expose event handling ideas.

Expose Event Handling

Expose events are probably the most common and problematic type of event. Almost all applications can benefit by optimizing Expose event handling. Here are some ideas.

First, avoid performing unnecessary work in non-expose event handlers. Most applications redraw their displays in response to Expose events. Beginning programmers often also redraw in response to ConfigureNotify (resize) events. Since you will usually get an Expose event whenever your window resizes (especially if the window's bit_gravity attribute is set to ForgetGravity), automatically drawing in your ConfigureNotify event handler may be unnecessary.

Also, remember that the Expose event structure includes count and region (x, y, width, height) fields. By using one or both of these, you may be able to avoid unnecessary redrawing. Two popular schemes are:

  1. redraw only the affected region and
  2. ignore Expose events with count fields greater than 0.
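
The two schemes can be combined by accumulating a bounding box while count is greater than 0 and redrawing only that box when count reaches 0. The following is a minimal sketch in plain C, not Xlib code: ExposeInfo and DamageBox are hypothetical types that stand in for the relevant fields of an Expose event and for the client's own bookkeeping.

```c
/* Stand-in for the Expose event fields used here (not the Xlib structure). */
typedef struct {
    int x, y, width, height;
    int count;   /* number of Expose events still to come in this series */
} ExposeInfo;

typedef struct {
    int x1, y1, x2, y2;  /* accumulated bounding box; x2 and y2 are exclusive */
    int pending;         /* nonzero if the box holds at least one rectangle */
} DamageBox;

/* Merge one event into the box. Returns 1 when the series is complete
   (count == 0) and the caller should redraw the box, 0 otherwise. */
int accumulate_expose(DamageBox *box, const ExposeInfo *ev)
{
    int ex2 = ev->x + ev->width;
    int ey2 = ev->y + ev->height;
    if (!box->pending) {
        box->x1 = ev->x; box->y1 = ev->y;
        box->x2 = ex2;   box->y2 = ey2;
        box->pending = 1;
    } else {
        if (ev->x < box->x1) box->x1 = ev->x;
        if (ev->y < box->y1) box->y1 = ev->y;
        if (ex2 > box->x2)   box->x2 = ex2;
        if (ey2 > box->y2)   box->y2 = ey2;
    }
    return ev->count == 0;
}

/* Reset after redrawing so the next series starts fresh. */
void reset_damage(DamageBox *box) { box->pending = 0; }
```

A bounding box can cover more pixels than were actually exposed, so this trades some extra drawing for far fewer redraw passes. Whether that trade is worthwhile depends on how expensive your redraw code is.
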
Finally, consider maintaining an off-screen copy of your graphical displays if redrawing is a performance problem. You can draw once to a pixmap and, as Expose events arrive, copy from the pixmap to the screen. Pixmaps are expensive (1000 pixels by 1000 pixels by 8 bits deep is 1 megabyte of RAM) but the performance benefits can be worthwhile for complex graphical displays or for very fast refreshing (as may be required for smooth scrolling).
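
The memory figure above follows from simple arithmetic, which can be sketched as a small helper. The function name is hypothetical, and the result is only an estimate of the raw pixel data: a particular X server may add scanline padding or per-pixmap overhead that this sketch ignores.

```c
/* Approximate bytes of pixel data for a pixmap, ignoring any scanline
   padding or bookkeeping overhead a particular server may add. */
unsigned long pixmap_bytes(unsigned int width, unsigned int height,
                           unsigned int depth_bits)
{
    unsigned long bits = (unsigned long)width * height * depth_bits;
    return (bits + 7) / 8;   /* round up to whole bytes */
}
```

For the example in the text, pixmap_bytes(1000, 1000, 8) gives 1,000,000 bytes, while the same area at depth 1 needs only 125,000 bytes.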

Backing Store and Save Unders

Another way to reduce Expose event handling is to enable backing store and/or save unders. Basically, they are schemes for having the X server automatically preserve the contents of windows and restore them after exposures. Backing store applies to the window on which it is set. Save unders applies to windows that are obscured by the save unders window. Both are settable as window attributes (via XCreateWindow or XChangeWindowAttributes).

The two schemes are both implemented by copying window contents to off screen memory and both have the same drawback as the off-screen pixmap copy technique described in the previous section: heavy memory use. Neither backing store nor save unders is guaranteed (either to exist on a server or to be maintained for the life of your window), so your application must provide exposure handling code as well.

You should consider using backing store in place of the pixmap copy technique described above, but remember that you must provide an exposure handler in case the backing store fails.

Save unders were intended for very transient windows, such as popup menus. They can work well and are the default for X Toolkit transient shell widgets. Unfortunately, some X servers do not handle save unders very efficiently, so you may want to avoid using them or provide an option so that users can enable or disable them.

Note that Sun's olwm window manager has an option that causes save unders to automatically be set for all client transient windows even if the client did not request it. The option is called TransientsSaveUnder and its default is True. Unfortunately, the save unders implementation in Sun's xnews server is poor, so you probably want to disable this.

Pointer Motion Tracking

Tracking the pointer within a window can be a performance problem since it usually requires and/or generates a lot of network traffic. You should avoid doing it unless you're prepared to handle the performance hit. In most cases, your client should not attempt to do anything else at the same time.

If you must track the pointer within a window, X provides five main techniques for doing so:

  1. The motion history buffer (XGetMotionEvents).
  2. Synchronous polling (XQueryPointer loop).
  3. Selecting PointerMotion and/or ButtonMotion events.
  4. Same as 3 plus filtering using XPeekIfEvent.
  5. Selecting PointerMotionHints events and using XQueryPointer.
Each has different throughput performance and data detail characteristics, so you may want to try more than one to see which best meets your needs. The motion history technique can be difficult to use and is not supported by all X servers. Synchronous polling is the easiest to use, but is very CPU expensive and can interfere with other application functionality. Your best bet is most likely one of the other three.
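
The filtering idea behind technique 4 is to discard a motion event whenever a newer motion event is already waiting behind it. The following is a minimal model of that idea in plain C. It works on an ordinary array rather than on the Xlib event queue, and the Ev type and EV_* codes are hypothetical; a real client would apply the same test while peeking at its queue.

```c
#include <stddef.h>

enum { EV_MOTION = 1, EV_BUTTON = 2, EV_KEY = 3 };  /* hypothetical type codes */

typedef struct {
    int type;
    int x, y;
} Ev;

/* Compact the queue in place, dropping every motion event that is
   immediately followed by another motion event. Returns the new length. */
size_t compress_motion(Ev *q, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        int superseded = q[i].type == EV_MOTION &&
                         i + 1 < n && q[i + 1].type == EV_MOTION;
        if (!superseded)
            q[out++] = q[i];
    }
    return out;
}
```

Only runs of consecutive motion events are collapsed, so the position in effect at each button or key event is preserved. The cost is a loss of intermediate positions, which matters for applications such as freehand drawing.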

Note that many X tutorial books recommend the PointerMotionHints technique, but in practice it usually gives no better performance than any of the other techniques because an XQueryPointer round trip request is required after each event. It is also the most poorly defined (in terms of X server functionality), so it may behave differently on different systems. [Gajewska]

GRAPHICS ISSUES

Graphics is another performance problem area for many applications. Very high performance can be difficult in a client-server architecture, but there are still many things you can do. The ideas in this section can greatly improve the performance of applications requiring optimal graphics performance. Text-oriented applications will probably not benefit much.

Wide Lines

One important X graphics context attribute is line width. The default line width of 0 instructs the X server to draw using the fastest algorithm supported by your hardware. Because of the strict specifications for the results of non-zero line width lines (also called wide lines), these are often drawn much more slowly.

For certain graphical effects, you can get similar results by either drawing a simple object (e.g., a line or a circle) with a non-zero line width or by drawing one or more filled objects (e.g., a filled rectangle or two concentric filled circles) with a zero line width.

Which technique is faster? If you're only drawing a few objects, the difference is not significant. If you're drawing hundreds of objects, you should try both cases on your target X server and hardware. You may find that filled zero-width drawing is somewhat faster.

Graphics Context Functions

X provides a variety of graphics context drawing functions, e.g., GXcopy, GXxor, GXset, etc. In many cases, you can use two or more to get similar results. Which one should you choose?

In general, there are three basic parts to X drawing: reading the source pixels, reading the destination pixels, and writing the destination pixels. You can usually improve performance if you eliminate the need to perform some of those steps. For example, the GXset and GXclear functions just write 1 or 0 into the destination drawable, eliminating the first two steps. On the other hand, GXcopy (the default function) requires reading the source and writing the destination. GXor and GXxor require all three steps. The potential performance differences depend on your graphics hardware and X server, but can be significant.
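
The step counts described above can be made concrete with a small model in plain C. The DrawFn names are hypothetical stand-ins for five of the GC functions, and the access count is a simplification: real servers and graphics hardware may combine or skip steps in ways this sketch does not capture.

```c
/* Hypothetical names for five of the GC functions discussed in the text. */
typedef enum { FN_CLEAR, FN_COPY, FN_XOR, FN_OR, FN_SET } DrawFn;

/* Combine one source word and one destination word. */
unsigned int apply_fn(DrawFn fn, unsigned int src, unsigned int dst)
{
    switch (fn) {
    case FN_CLEAR: return 0u;
    case FN_COPY:  return src;
    case FN_XOR:   return src ^ dst;
    case FN_OR:    return src | dst;
    case FN_SET:   return ~0u;
    }
    return dst;  /* not reached for valid input */
}

/* Memory accesses per pixel word: an optional source read, an optional
   destination read, plus the destination write that every function needs. */
int accesses_per_word(DrawFn fn)
{
    int reads_src = (fn == FN_COPY || fn == FN_XOR || fn == FN_OR);
    int reads_dst = (fn == FN_XOR || fn == FN_OR);
    return reads_src + reads_dst + 1;
}
```

In this model, set and clear cost one access per word, copy costs two, and or and xor cost three, which matches the ordering given in the text.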

You can also add more steps to drawing by specifying stipples, tiles, clipmasks, plane masks, etc. The specific performance of these operations varies greatly between X servers and hardware. In general, however, stipples and tiles will slow down your performance. On the other hand, clipmasks and plane masks can improve performance by reducing the number of pixels involved.

XQueryBestSizes

Xlib provides XQueryBestCursor, XQueryBestStipple, and XQueryBestTile to ask the server for the optimal sizes of these items. If your application uses these items a lot and can easily configure their sizes at run time, you should consider using these functions to find the best sizes.

You probably shouldn't spend a lot of time handling the general case, however, since most X servers aren't too picky. Most X servers simply return the power of 2 closest to your desired size, possibly restricted to the maximum sizes supported by your hardware.
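
If you want a reasonable default without querying the server, the behavior described above can be approximated on the client side. This sketch is an assumption about typical behavior, not the algorithm of any particular X server: the function name is hypothetical, ties round up, and max_size is assumed to be a power of two.

```c
/* Nearest power of two to `want`, capped at `max_size` (assumed to be a
   power of two itself). Ties round up; a request of 0 is treated as 1. */
unsigned int nearest_pow2(unsigned int want, unsigned int max_size)
{
    unsigned int lo = 1;
    if (want < 1)
        want = 1;
    /* Grow lo to the largest power of two that is <= want and <= max_size. */
    while (lo <= want / 2 && lo <= max_size / 2)
        lo *= 2;
    /* Consider the next power of two only if it still fits under the cap. */
    if (lo <= max_size / 2) {
        unsigned int hi = lo * 2;
        if (hi - want <= want - lo)
            return hi;
    }
    return lo;
}
```

With a cap of 64, a request of 20 maps to 16, a request of 30 maps to 32, and a request of 100 is limited to 64.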

Shapes Of Graphical Objects

Most X servers break down filled graphical objects into a series of rectangles or spans. (A span is a set of contiguous pixels on one scan line.) The GC function is then performed on each rectangle or span. [McCormack]

Since objects with simpler shapes will reduce the number of rectangles or spans, you can improve graphics performance by using these, if possible. The "fastest" shapes are composed of a few large rectangles, but any convex shape will be better than a similar non-convex shape. Of course, a tradeoff like this is only possible if you are willing to sacrifice some functionality for potentially small amounts of additional drawing speed.
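
The effect of shape on span count can be illustrated with a small model in plain C. The shape is given as rows of text in which '#' marks a filled pixel; this representation and the function names are assumptions made for the sketch, and a real server may merge identical adjacent spans into rectangles.

```c
/* Count horizontal spans in one scan line of a shape mask, where '#'
   marks a filled pixel and any other character is empty. */
int spans_in_row(const char *row)
{
    int spans = 0;
    int inside = 0;
    for (; *row != '\0'; row++) {
        if (*row == '#') {
            if (!inside)
                spans++;
            inside = 1;
        } else {
            inside = 0;
        }
    }
    return spans;
}

/* Total spans over all scan lines of the mask. */
int total_spans(const char *const rows[], int nrows)
{
    int total = 0;
    for (int i = 0; i < nrows; i++)
        total += spans_in_row(rows[i]);
    return total;
}
```

A solid 6 by 3 block yields one span per scan line, while a U shape of the same size yields two spans on each line that crosses the gap. Convex shapes never need more than one span per scan line, which is why they tend to fill faster.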

Shaped Windows

A performance issue related to object shapes is the use of shaped (non-rectangular) windows. Shaped windows are available via the SHAPE X protocol extension.

While shaped windows can be useful, they can cause performance problems in many situations. Most importantly, Expose events always return rectangular exposure regions, so if a shaped window is moved or unmapped, the X server must send many Expose events to each exposed window to describe the shaped region. If the client owning the exposed window processes each event, its performance will be significantly affected. In general, you should avoid repeatedly resizing, moving, or unmapping shaped windows owned by your application. Also, if you expect many Expose events caused by shaped windows, you should write your Expose event handlers to compress the events as much as possible.

Colormap Tricks

With most X servers, you can manipulate the colormap much more quickly than you can manipulate raster memory. You can take advantage of this fact to greatly improve the speed of certain graphical functions, such as animations or interactive highlighting.

A previous paper [Lee92] detailed these techniques so I won't go into them here. The main concept, however, is to segment your color planes and manipulate your colormap so that different color plane permutations have different, pre-defined meanings.

X Toolkit GC Cache

The X Toolkit caches graphics contexts if you use the XtGetGC function. In most cases, you can save some memory and decrease your network traffic by using the Xt GC cache rather than creating your own GCs with XCreateGC. This is especially true if you'll be using similar GCs for many different widgets.

Graphics Context Sharing

If you're not using the X Toolkit GC cache, you may want to simulate it to get some of the same benefits. If you can, try to use the same GC for different drawing operations in the same window. For example, XDrawLine uses the line width attribute and XDrawText uses the font attribute. If you're doing both operations in the same window, set both attributes in one GC.

You can also use one GC for different windows as long as the windows have the same root, depth, and visual type.

These techniques are most useful if you are sharing GCs across dozens or hundreds of different windows (as is often the case with X Toolkit widgets). If you only have a few windows, the benefits will be small and the extra complexity of managing GC sharing is probably not worthwhile.

Note that there is no advantage to using one call to XChangeGC rather than several calls to GC convenience functions. Both interface to the Xlib GC cache, which caches the changes and does not send the updated GC to the X server until it is used in a drawing request.

Images vs. Pixmaps

X provides two mechanisms for storing off-screen images: server-side pixmaps and client-side images. In most cases, which you should choose is obvious. You must use images to read or write to disk files. You must use pixmaps for window and graphics context attributes.

In some cases, however, you have a choice. You can draw or refresh a window using either XCopyArea and a pixmap or XPutImage and an image. Which should you use? The pixmap technique will usually give you faster drawing speeds since you don't have to copy the data over your network. On the other hand, if memory in your X server is limited (e.g., in a diskless X terminal), you may have to use images or other techniques.

If you do use X images and your client is running on the same machine as the X server, you should consider using the MIT-SHM protocol extension for optimal performance. MIT-SHM is discussed in a later section.

SERVER-SIDE FEATURES AND LIMITATIONS

X uses a client-server architecture. This means that you must consider the performance of your X server as well as that of your client. X server performance can be more difficult to measure, unfortunately. This section discusses some general X server performance issues.

Large Pixmaps

While pixmaps provide useful functionality, they do use a lot of X server memory, as mentioned above. Many people use large pixmaps for window backgrounds, including the root window background. To improve performance, you should consider using a smaller pixmap instead. The X server will tile it across the window.

The same principle applies to pixmaps used as X Toolkit label widget resources. A text label will use much less memory, and this can add up if you have lots of labels.

Fonts

In addition to the resource caching issues mentioned above, each font your application uses requires processing and memory within your X server. If your applications use many different fonts, especially 16-bit fonts (e.g., Asian languages), the cost can be significant. Consider using only a few fonts in your application (and configuring your .Xdefaults so that all applications on your screen use the same fonts). This is probably easier on your eyes anyway.

Hogging the X Server

While batching drawing requests, as mentioned previously, is usually worth doing, if you're doing a lot of drawing you may find that it interferes with the responsiveness of your X server. The X server may be busy processing your large request and won't be able to respond quickly to input events for your client or for other X clients.

The same problem occurs if you grab the X server with XGrabServer. When one X client grabs the server, it prevents the user from interacting with any other X client and affects the performance of any output operations those clients were performing.

There's usually not much you can do about the drawing issue. You could try using smaller batches to periodically process events. The server grab problem, on the other hand, is usually avoidable: don't grab the server unless it is absolutely necessary. Only rarely is it necessary.

X Server as General IPC Mechanism

X provides several inter-process communication (IPC) facilities (client messages, window properties, selection conventions). This makes it inviting to application programmers seeking to easily implement a non-X-related communication mechanism between two X clients. Because all X communication facilities go through the X server, however, they aren't the best choice from a performance point of view.

In most cases, you can greatly improve IPC performance if you use your operating system's IPC facilities to communicate directly between the two clients. Doing so can improve performance in two ways. First, you eliminate the X server bridge. Passing large amounts of data via the X server can significantly slow down the performance of the X server and, thus, the apparent responsiveness of X clients connected to the X server.

Second, you have your choice of IPC facilities and may be able to use one that gives better performance than your X server connection. This is especially true if the two X clients are running on one computer but the X server is running on a different computer.
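As one possible illustration of direct communication on a UNIX system, the sketch below sets up a local socket pair and moves data across it without involving the X server. The helper names (open_direct_channel, send_all, recv_all) are hypothetical. A socket pair suits related processes (e.g., a parent and the child it forks); unrelated clients would need a named socket or some other rendezvous, which is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a connected pair of local stream sockets.
 * Returns 0 on success, -1 on failure. */
int open_direct_channel(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Write exactly len bytes, retrying after short writes and EINTR. */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; fails if the peer closes early. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In a real application, each process would keep one end of the pair and close the other after the fork.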

X Resource Database

Most users have a .Xdefaults file containing customizations for some X clients. When the user logs in, this file is copied into an X server window property. X clients copy this property when they start up. Of course, if your file is huge, this X server property and the copies in every application can use up a lot of memory.

An alternative to .Xdefaults is to put your application resources for X Toolkit applications in user defaults disk files. On UNIX systems, X Toolkit applications will search the directories specified by your XUSERFILESEARCHPATH environment variable for these files and use them in addition to system app-defaults and root property resources. Reading disk files is generally slower than reading X server properties, especially if the files are stored on remote file servers. On the other hand, because the application resource files are usually small (contain resources for only one application) and .Xdefaults files can be large (resources for all applications), there probably is no significant speed difference.

Note that X clients not based on the X Toolkit may not properly search for these files, so you may still have to use .Xdefaults for those clients.

X Protocol Extensions

Most X protocol extensions, e.g., PEX or SHAPE, add functionality to the core X protocol. One extension designed specifically to enhance performance is the MIT-SHM extension. MIT-SHM provides special versions of XPutImage and XGetImage that use shared memory (if available on your operating system) rather than the normal X server connection. This can greatly improve the performance of applications that need to quickly XPutImage many images, e.g., animations. MIT-SHM only works, however, with X servers supporting the extension and when the X client and X server are running on the same computer.

Another protocol extension that helps improve performance is the Multi-Buffering extension. Before this extension became available, applications had to directly manipulate pixel values and colormaps to get high performance multi-buffered effects (e.g., smooth animations or erasable overlays).[Lee92] The Multi-Buffering extension simplifies the implementation of these sorts of effects.

CLIENT-SIDE FEATURES AND LIMITATIONS

In previous sections, we discussed some issues related to X clients, such as graphics and event handling. In this section we'll discuss specific client-side issues. Most of them apply to X Toolkit applications.

XBM Bitmap File Format

Some applications store bitmap data in XBM disk files and read them in with XReadBitmapFile at start up time. This is convenient and can reduce your code size (a little). On the other hand, parsing these files at run time is very slow.

Since the XBM format is actually C programming language statements, you can significantly improve parsing performance by including the files in your program and letting the C compiler parse them at compile time. You can then process the compiled data with XCreateBitmapFromData or XCreatePixmapFromBitmapData.
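As a rough sketch of what this looks like, the block below shows XBM-style data as it appears once compiled into a program. The bitmap name "arrow" and the helper xbm_pixel are made up for illustration; in a real program the three definitions would normally come from an #include of the .xbm file rather than being typed inline.

```c
#include <assert.h>

/* Contents of a hypothetical arrow.xbm, normally pulled in with
 *     #include "arrow.xbm"
 * The compiler parses this once, at build time. */
#define arrow_width 8
#define arrow_height 8
static unsigned char arrow_bits[] = {
    0x18, 0x3c, 0x7e, 0xff, 0x18, 0x18, 0x18, 0x18
};

/* Return 1 if pixel (x, y) is set. XBM rows are padded to whole
 * bytes, and the least significant bit of each byte is the
 * leftmost pixel. */
int xbm_pixel(const unsigned char *bits, unsigned width,
              unsigned x, unsigned y)
{
    unsigned bytes_per_row = (width + 7) / 8;

    assert(x < width);
    return (bits[y * bytes_per_row + x / 8] >> (x % 8)) & 1;
}

/* With Xlib, the same data would be handed to the server with
 * something like:
 *     XCreateBitmapFromData(display, drawable, (char *) arrow_bits,
 *                           arrow_width, arrow_height);
 */
```

The xbm_pixel helper is only there to show that the compiled data is ordinary C memory; it is not needed to create the bitmap.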

If you're using large bitmap files or many files, compiling them into your application can increase your code size somewhat. You can minimize the code size problem (on operating systems using demand paged memory management) by placing all the images in one code module: the pages will not be touched after they are first copied to the X server.

The above concept also applies to the popular (but non-standard) XPM color image file format. The XPM library provides functions for reading the format either from files or from compiled data structures.

X Toolkit Resource Cache

When installing X Toolkit resource converters you can request that the X Toolkit cache the results of the conversions. For example, if one widget requests a read-only "red" color cell (possibly in response to some defaults file setting), the X Toolkit caches the result and the next widget that needs a "red" cell will get the cached value rather than requiring a round trip query to the server. This is a huge performance gain for applications performing large numbers of identical resource conversions (e.g., widget colors and fonts).

While resource converters are usually installed by widgets, application writers can also use the mechanism if they install any converters.

Manipulating Window/Widget Hierarchies

If you want to manipulate a hierarchy of windows, you often need only to manipulate the top window in the hierarchy, not any of the children. For example, unmapping the top level window automatically removes the children from the screen; unmapping the children too is a waste of CPU. The same concept applies to many widget functions, including realizing, managing, sensitizing, etc.

Widget Geometry Management

In addition to the above, you should be especially careful to minimize repeated widget geometry changes. Widgets use complex algorithms to negotiate sizes and positions with their parents and this negotiation may occur whenever a widget is managed, resized, or moved. If many widgets are involved, this process can be very slow.

When managing a group of widgets, you should try to manage them as a group (XtManageChildren instead of XtManageChild) to minimize the number of negotiations that occur. This is unnecessary, however, when managed widgets are added to a not yet realized parent, as the X Toolkit automatically groups the geometry negotiation when the parent is realized.

Also, note that many widgets have resources that control whether or not they will participate in geometry negotiations if their contents change after they are realized. If you can disable these resources, possibly only for short periods, you may be able to greatly improve application performance. Some examples of these resources for the Motif widget set are:

The resources also apply, of course, to all subclasses of the widget classes.

XtSetValues

Related to widget geometry management is the general concept of manipulating resource values via XtSetValues. In general, if you're going to be setting several resource values for a widget, you should try to group them and set them all at once with a single call to XtSetValues or XtVaSetValues. Widgets often resize, reposition, or redraw themselves after processing a set values call. Grouping resource changes reduces unnecessary widget overhead.

Because XtGetValues usually simply copies values from the widget record, batching calls to XtGetValues improves performance only by reducing the number of function calls. In most cases, this offers little to no performance gain.

Widgets vs. Gadgets

Some X Toolkit widget sets provide both widget and gadget versions of their primitive objects. When should you use one vs. the other?

The main purpose of gadgets is to improve performance by reducing the overhead of widgets. In early releases of X11, widgets used quite a bit more X server memory than did gadgets, but X11R4 servers have minimized this difference. Widgets also take somewhat more time to create and realize than do gadgets. You can sometimes notice this if your application uses hundreds or thousands of widgets.

On the other hand, gadgets have somewhat less functionality than do widgets. Gadgets do not have color or translation resources, which makes it impossible to manipulate these on a gadget-by-gadget basis. Also, gadgets require that their parents do all their event handling, which can cause somewhat more network traffic and client-side CPU usage than with widgets.

Which should you choose? I generally use widgets. If start-up time becomes a problem, switching to gadgets in some modules is an easy conversion.

Compiler and Linker Issues

Many operating systems now support a shared library scheme. Shared libraries allow subroutine libraries, such as the X libraries, to be dynamically linked at run time, rather than being statically linked at compile time. Since the libraries are not stored in the same disk files as the application and one copy of the library can be used by many applications, you can save large amounts of disk space by using shared libraries.

Some people have speculated that shared libraries also save memory by allowing simultaneously running applications to share the text space of the shared library. Unfortunately, several studies have shown this not to be true. The dynamic memory management systems of modern operating systems are so efficient that shared libraries have no significant performance benefit.

A related area that frequently confuses beginners is whether or not "stripping" UNIX executables (removing debugging symbols) improves performance. Because the symbols are not loaded with the executable, however, stripping has no effect on performance.

On the other hand, compiler optimizers can improve performance. How much they help depends on your code. They are most useful for improving the performance of computationally intensive algorithms.

Disk I/O Issues

Previously, I mentioned that reading XBM files at run-time is slower than compiling them into your program, partially because of the extra disk reads. Of course, any other disk I/O is a potential performance issue.

If your application is disk intensive, you can often significantly improve performance in a few areas. First, consider buffering disk I/O when possible. UNIX operating systems do some kernel level buffering and the UNIX stdio library does more; you should use these when possible.
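For example, a minimal sketch of reading a file through a fully buffered stdio stream might look like the following. The function name count_bytes_buffered is hypothetical, and a suitable buffer size is something you would have to measure on your own system.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in a file, reading through a fully buffered
 * stdio stream. Returns the byte count, or -1 on error. */
long count_bytes_buffered(const char *path, size_t buffer_size)
{
    FILE *fp;
    long total = 0;
    int c;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* setvbuf must be called before any other operation on the
     * stream. Passing NULL lets the library allocate the buffer. */
    if (setvbuf(fp, NULL, _IOFBF, buffer_size) != 0) {
        fclose(fp);
        return -1;
    }

    /* Each getc() is served from the buffer; the library goes back
     * to the file only when the buffer runs empty. */
    while ((c = getc(fp)) != EOF)
        total++;

    if (ferror(fp))
        total = -1;
    fclose(fp);
    return total;
}
```

The point of the sketch is that the loop makes one library call per byte but far fewer trips to the operating system.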

More importantly, try to avoid any disk I/O across a network when possible. Networked file systems offer some nice features, but usually at a high performance cost. Accessing a (good quality) local disk drive will usually be much faster. This applies to all of the following disk I/O operations (if supported by your operating system):

  1. compiling programs (including linking libraries)
  2. executing programs (if your virtual memory system pages off the executable disk file)
  3. swap partitions
  4. temporary file space
  5. data input and output

MEMORY ISSUES

In my last performance section, I will discuss some memory-related issues. Poor memory management can lead to poor performance, as well as poor robustness.

Malloc

The X Toolkit resource handling mechanism causes many calls to your system's malloc routine for small amounts of memory. Some system mallocs don't handle this very well, either allocating much more memory than is necessary or handling requests inefficiently. If you like low level hacking, you may want to link your application with a special version of malloc that is tuned for this behavior.
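To give an idea of what "tuned for small requests" can mean, here is a minimal sketch of a fixed-size block pool. It is not a replacement malloc and not any particular vendor's allocator; the names (Pool, pool_init, pool_alloc, pool_free, pool_destroy) and the 32-byte block size are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One block: either a link in the free list or 32 bytes of payload.
 * Payloads are aligned for pointers, not necessarily for wider types. */
typedef union Block {
    union Block *next;
    char payload[32];
} Block;

typedef struct {
    Block *slab;       /* single large allocation holding all blocks */
    Block *free_list;  /* blocks currently available */
} Pool;

/* Allocate the slab once and thread every block onto the free list. */
int pool_init(Pool *p, size_t nblocks)
{
    size_t i;

    assert(nblocks > 0);
    p->slab = malloc(nblocks * sizeof(Block));
    if (p->slab == NULL)
        return -1;
    p->free_list = NULL;
    for (i = 0; i < nblocks; i++) {
        p->slab[i].next = p->free_list;
        p->free_list = &p->slab[i];
    }
    return 0;
}

/* Pop a block in constant time; returns NULL when the pool is empty. */
void *pool_alloc(Pool *p)
{
    Block *b = p->free_list;

    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Push a block back onto the free list. */
void pool_free(Pool *p, void *ptr)
{
    Block *b = ptr;

    b->next = p->free_list;
    p->free_list = b;
}

void pool_destroy(Pool *p)
{
    free(p->slab);
    p->slab = NULL;
    p->free_list = NULL;
}
```

Because every block has the same size, there is no per-request header and no searching, which is the kind of behavior a small-request workload benefits from.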

You may also want to try special versions of malloc that help debug memory problems, including memory leaks as well as memory corruption problems. There are several such mallocs available, both free via the Internet and as commercial products. There are also several popular (and heavily advertised) commercial products that extend this idea to provide a full-featured memory analysis package.
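The simplest form of this idea can be sketched in a few lines: wrap malloc and free and count the allocations that are still outstanding. The names counted_malloc, counted_free, and counted_live are hypothetical, and real debugging mallocs record far more (call sites, sizes, guard bytes) than this sketch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of allocations made through counted_malloc and not yet freed. */
static long live_allocations;

void *counted_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        live_allocations++;
    return p;
}

void counted_free(void *p)
{
    if (p != NULL) {
        /* Catches a free with no matching allocation. */
        assert(live_allocations > 0);
        live_allocations--;
    }
    free(p);
}

/* A nonzero result at a point where everything should have been
 * released suggests a leak. */
long counted_live(void)
{
    return live_allocations;
}
```

Checking counted_live() before and after a repeated operation (e.g., opening and closing a dialog) is a cheap first test for a leak.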

Memory Leaks

Certain X servers are known to have memory leaks that cause them to grow without bound when certain X clients are run. There are also memory leaks in some versions of some X libraries. Unfortunately, there's probably not much you can do about these, short of harassing your vendor or getting a different X server.

Apparent X server memory leaks can also be caused by client actions. These memory leaks can be easily avoided with a few precautions. X clients that allocate pixmaps without freeing them will cause the server to allocate memory without bound. If your X client will be running for a long period (some run for weeks or months at a time), make sure you free server resources you no longer need, especially large ones like pixmaps. Also, don't use retained close down modes (XSetCloseDownMode) unless you can absolutely guarantee that the resources will be properly freed later.

Client-side memory leaks are more complicated than server-side leaks since there are many more library interfaces and the interfaces are less well defined. The three main programming levels are Xlib, the X Toolkit, and the widget set.

Xlib rarely allocates memory and those cases where it does (e.g., XCreateImage) are well documented.[Scheifler]

Xt allocates memory more often, but these are also generally well documented. [Asente]

Some widget sets are, unfortunately, less well documented. Some key questions are which strings and arrays are copied during widget XtGetValues, XtSetValues, and in convenience functions. I discuss these issues in more detail in two separate papers.[Lee96-1][Lee96-2]

CONCLUSION

In this paper, I have discussed techniques for designing performance into your X applications and for identifying and solving performance problems that do occur. As with most other aspects of software engineering, you can most easily achieve good performance through good design. Large performance improvements to an existing application may be difficult to achieve without changing large parts of the original design.

As with most engineering guidelines, these are not hard and fast rules. Most are tradeoffs that force you to choose between two or more design goals. You, the software engineer, must decide what is best for your project.

REFERENCES

[Asente] Paul Asente and Ralph Swick, X Window System Toolkit, Digital Press, 1990.

[Droms] Ralph Droms and Wayne Dyksen, "Performance Measurements of the X Window System Protocol," Software Practice and Experience, Vol. 20, No. S2 (special issue), October, 1990.

[Gajewska] Hania Gajewska, Mark Manasse, and Joel McCormack, "Why X Is Not Our Ideal Window System," Software Practice and Experience, Vol. 20, No. S2 (special issue), October, 1990.

[Jones] Oliver Jones, Introduction to the X Window System, Prentice-Hall, 1989.

[Lee92] Kenton Lee, "Graphics Effects by X Colormap Manipulation," The X Journal, May, 1992.

[Lee93] Kenton Lee, "The 40 Most Common X Programming Errors (And How To Avoid Repeating Them)," The X Journal, March, 1993.

[Lee96-1] Kenton Lee, "Debugging X Memory Leaks and Other Dynamic Memory Bugs," The X Advisor, February, 1996.

[Lee96-2] Kenton Lee, "Avoiding Motif Memory Leaks," The X Advisor, March, 1996.

[McCormack] Joel McCormack, "Writing Fast X Servers For Dumb Color Frame Buffers," Software Practice and Experience, Vol. 20, No. S2 (special issue), October, 1990.

[McMinds] Donald McMinds, Mastering OSF/Motif Widgets, second edition, Addison-Wesley, 1993.

[Mirchandani] Dinesh Mirchandani and Prabuddha Biswas, "Ethernet Performance of Remote DECwindows Applications," Digital Technical Journal, Vol. 2, No. 3, Summer, 1990.

[Mulder] Art Mulder, "How To Maximize the Performance of X," monthly posting to USENET comp.windows.x newsgroup and Internet xpert mailing list.

[Peterson] Chris Peterson and Sharon Chang, "Improving X Application Performance," The X Resource, Issue 3, Summer, 1992.

[Scheifler] Robert Scheifler and James Gettys, X Window System (Third Edition), Digital Press, 1992.


THE AUTHOR

Kenton Lee is an independent software consultant specializing in X Window System and OSF/Motif software development. He has been developing UNIX graphical user interface software since 1981.

Ken has published over two dozen technical papers on the X Window System. Most are available over the World Wide Web at http://www.rahul.net/kenton/bib.html.

Ken may be reached by Internet electronic mail to kenton @ rahul.net or the World Wide Web at http://www.rahul.net/kenton/.

