BYTE.com


Search BYTE.com
Write to Byte
Editorial Calendar

Categories
Previous Editions
Columns
Features


Print Archives
1994-1998

About Us
Byte Editorial Staff
Advertise with Byte
Privacy Policy

Free E-mail Newsletter from BYTE.com
Byte.com Update
Text only


Visit the home page Browse the four-year online archive Download platform-neutral CPU/FPU benchmarks Find information for advertisers, authors, vendors, subscribers

ArticlesThe Standard Template Library


October 1995 / Core Technologies / The Standard Template Library

Part of the draft C++ standard, STL provides the framework for building generic, highly reusable algorithms and data structures

Alexander Stepanov

In every programming language, there's a need for various data structures, such as vectors, lists, and associative arrays. Programmers also need fundamental algorithms -- for sorting, searching, and copying -- defined for the data structures. It has long been lamented that C++ doesn't provide a good set of standard data structures.

But at last this problem has been remedied. The Standard Template Library is a framework of data structures (called containers in STL) and algorithms accepted as part of the draft C++ standard. A reference implementation of STL has bee n put into the public domain by Hewlett-Packard (it can be downloaded from butler.hpl.hp.com), and a growing number of commercial vendors are now shipping STL.

In the short time since its release, STL has generated many emotional -- and conflicting -- assessments. On one hand, for example, Bjarne Stroustrup of Bell Laboratories calls it a "large, systematic, clean, formally sound, comprehensible, elegant, and efficient framework." On the other hand, Pamela Seymour of Leiden University writes that "STL looks like the machine language macro library of an anally retentive assembly language programmer."

Goal: Generality + Efficiency

STL is not an attempt to impose yet another standard on a suffering humanity. And it was not designed by or for a committee. It is the result of over 15 years of research in generic programming that I've done in different places, with different collaborators, and in different programming languages. I did this research with a concrete goal in mind: to find a way to write algorithms in the most general way, but in such a way that their abstractness would not impose any performance penalty.

What do I mean by "in the most general way"? Simply that an algorithm works on all data types for which it makes sense. For example, a linear-search algorithm is written in the most general way if it can search any data structure for which the operations of looking at data, going to the next data element, and indicating the end of the search range are defined. So, it should work for an array, a singly linked list, a doubly linked list, a file, and even a binary tree.

An algorithm should also work for portions of such structures. For example, you might want to search half a list or sum the set of elements in an array that are n spaces apart (i.e., a stride).

What do I mean when I say that an algorithm does not "impose any performance penalty"? In other words, how do you know that a generic algorithm is efficient? An algorithm is called rel atively efficient if it's as efficient as a nongeneric version written in the same language, and it's called absolutely efficient if it's as efficient as a nongeneric assembly language version.

For many years, I tried to achieve relative efficiency in more advanced languages (e.g., Ada and Scheme) but failed. My generic versions of even simple algorithms were not able to compete with built-in primitives. But in C++ I was finally able to not only accomplish relative efficiency but come very close to the more ambitious goal of absolute efficiency. To verify this, I spent countless hours looking at the assembly code generated by different compilers on different architectures.

I found that efficiency and generality were not mutually exclusive. In fact, quite the reverse is true. If a component is not efficient enough, it usually means that it's not abstract enough. This is because efficiency and abstractness both require a clean, orthogonal design. A similar phenomenon occurs in mathematics: Making a proof more abstract makes it more concise and elegant.

Orthogonal Component Space

The past 25 years have seen attempts to revolutionize programming by reducing all programs to a single conceptual primitive. Functional programming, for example, made everything into a function; the notions of states, addresses, and side effects were taboo. Then, with the advent of object-oriented programming (OOP), functions became taboo; everything became an object (with a state).

STL is heavily influenced by both functional programming and OOP. But it's not a single-paradigm library; rather, it's a library for general-purpose programming of von Neumann computers.

STL is based on an orthogonal decomposition of component space. For example, an array and a binary search should not be reduced to a single, fundamental notion. The two are quite different. An array is a data structure -- a component that holds data. A binary search is an algorithm -- a component that performs a computation on data stored in a data structure. As long as a data structure provides an adequate access method, you can use the binary-search algorithm on it. Only by respecting the fundamental differences of arrays and binary searches can efficiency and elegance be simultaneously achieved.

Iterators

The key to STL is the notion of iterators , which are generalized pointers that provide a glue for connecting algorithms and data structures. STL is indeed retrograde in its disregard of the current academic dogma suggesting that pointers are evil. Instead of hiding pointers behind value semantics, it makes them the corner-stone of the design. The decision to bring pointers back into the realm of respectability was based on a simple fact: Most things in programming resemble pointers in that they identify a location of data. For instance, Internet addresses, SCSI addresses, and file descriptors all function as pointers.

Consider the task of printing a list of productive employees (see the listing "Printing Names of Productive Employees" ). The employees' names are stored in a vector , an STL version of a one-dimensional dynamic array. To print the names of productive employees, you use the STL function remove_copy_if() , which scans the range of elements from its first argument up to, but not including, its second argument and copies those that do not satisfy a predicate (its fourth argument) into positions starting from its third argument. (For most people, the code is clearer than the explanation.) The functions begin() and end() return iterators pointing to the first element and past the last element in the vector, respectively. (STL requires that for every container, the number of valid iterators pointing to it is one greater than the number of elements in the container.) The STL component ostream_iterator provides an iterator-like interface to an output stream.

It's imp ortant to note that if you later decide to put employees' names in a list instead of in a vector, you do not have to change anything except the declaration of the variable all . The remove_copy_if() function works for vectors, lists, deques, and sets (which are all STL components), as well as for any user-defined container that provides STL-conforming iterators. It also works for regular C arrays.

Iterator Categories

STL classifies iterators into five categories: input, output, forward, bidirectional, and random-access. These iterator categories are sets of requirements for operations that are supported by concrete iterator types. An important experimental discovery I made was that hundreds of different practical algorithms can be written in terms of these abstract categories.

STL specifies a set of valid expressions for each category's iterators, as well as precise semantics for each iterator's usage. For example, given that i is a value of a typ e that belongs to a bidirectional iterator category, if ++i is defined, then -- (++i) == i . STL also prescribes certain complexity requirements for these expressions. Users are thereby guaranteed that algorithms written in terms of these abstract interfaces will work effectively.

Different algorithms require different kinds of iterators, and different algorithms are needed to perform different operations on different data structures. STL uses a novel language technique that selects the right algorithm at compile time, depending on the iterator category.

Generic Algorithms

The listing "An STL Implementation of remove_copy_if() " illustrates how STL deals with iterators. What's most striking is the fact that it looks just like regular C code; only the signature is different. In fact, I've found that C programmers find it quite easy to start programming in STL even when they don't know C++, because the underlying idioms are alrea dy familiar to them. The fact that all the iterator categories are abstracted from pointers ensures that there is an efficient implementation for them.

In computer science it's important to base abstractions on efficient models. In other words, I believe that remove_copy_if() is efficient because it generates good code when used with plain C arrays. In fact, if you use remove_copy_if() with STL function objects rather than with pointers to functions, as I did in the listing "Printing Names of Productive Employees," you can obtain code that is often just as efficient as hand-written assembly code.

The Future

It is my hope that STL will prove to be the beginning of a long process of developing systematic catalogs of highly parameterized software components. The ANSI/ISO C++ standard committee saw the promise of STL and provided a conduit through which generic programming could reach working programmers.

I'd like to use this opp ortunity to advocate the creation of an industrywide consortium for developing new generic components. No single company can accumulate the algorithmic expertise that is needed for such an activity. And it is in everybody's interest that all the fundamental algorithms and data structures be universally and inexpensively available.


ACKNOWLEDGMENTS

I would like to express my gratitude to Bjarne Stroustrup, Ross Towle, Jim Dehnert, Suresh Srinivas, Mary Loomis, and Larry Rosler for reviewing this article.


Printing Names of Productive Employees

vector<Employee> all;
bool is_manager(const Employee& x) {
      return x.title == manager  }
...
remove_copy_if(
  all.begin(),
  all.end(),
  ostream_iterator<Employee>(cout),
  is_manager);





An STL Implementation of remove_copy_if()

template <clas
s InputIterator, class OutputIterator, class Predicate>
OutputIterator remove_copy_if(InputIterator first, InputIterator last,
                               OutputIterator result, Predicate pred)  {
  while (first != last)  {
     if (!prod(*first)) *result++ = *first;
     ++ first;  }
  return result;  }





Alexander Stepanov is a member of the technical staff at Silicon Graphics, Inc. (Mountain View, CA), where he works on C++ libraries and tools. Prior to joining SGI, he worked for HP Labs, AT&T Bell Labs, Polytechnic University (Brooklyn, NY), and GE R&D. He has worked in such areas as generic software libraries, storage systems, OSes, compilers, and path-planning algorithms for robots. He can be contacted on the Internet at mailto:stepanov@mti.sgi.comor on BIX c/o "editors."

Up to the Core Technologies section contentsGo to previous article: Weaving a ThreadGo to next article: Internet Firewalls



Copyright © 2003 CMP Media LLC, Privacy Policy, Terms of Service
Site comments: webmaster@byte.com
SDMG Web Sites: Byte.com, C/C++ Users Journal, Dr. Dobb's Journal, MSDN Magazine, New Architect, SD Expo, SD Magazine, Sys Admin, The Perl Journal, UnixReview.com, Windows Developer Network