Overview of Taxonomies
This section defines taxonomy, describes what kind of information a taxonomy tries to structure, and shows how it structures this information. The business world has many taxonomies, as does the nonbusiness world. In fact, the world cannot do without taxonomies, since it is in our nature as human beings to classify. That is what a taxonomy is: a way of classifying or categorizing a set of things, specifically a classification in the form of a hierarchy. A hierarchy is simply a treelike structure. Like a tree, it has a root and branches. Each branching point is called a node.
If you look up taxonomy in the dictionary, the definition will read something like the following (from Merriam-Webster OnLine: http://www.m-w.com/):
The study of the general principles of scientific classification: SYSTEMATICS
CLASSIFICATION; especially: orderly classification of plants and animals
according to their presumed natural relationships
So, the two key ideas for a taxonomy are that it is a classification and that it is a tree. But now let's be a bit more precise about the information technology notion of a taxonomy. The rapid evolution of information technology has spawned terminology that is rooted in the dictionary definitions but defined slightly differently. The concepts behind that terminology, which are what the definitions capture, differ slightly because they describe engineering products rather than purely abstract or everyday natural-language constructs. Here is the information technology definition for a taxonomy:
The classification of information entities in the form of a hierarchy, according to
the presumed relationships of the real-world entities that they represent
A taxonomy is usually depicted with the root of the taxonomy on top, as in Figure 7.1. Each node of the taxonomy—including the root—is an information entity that stands for a real-world entity. Each link between nodes represents a special relation called the is subclassification of relation (if the link's arrow is pointing up toward the parent node) or is superclassification of (if the link's arrow is pointing down at the child node). Sometimes this special relation is defined more strictly to be is subclass of or is superclass of, where it is understood to mean that the information entities (which, remember, stand for the real-world entities) are classes of objects. This is probably terminology you are familiar with, as it is used in object-oriented programming. A class is a generic entity. In Figure 7.1, examples include the class Person, its subclasses of Employee and Manager, and its superclass of Agent (a legal entity, which can also include an Organization, as shown in the figure).
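For readers who know object-oriented programming, the stricter is subclass of relation can be sketched as a class hierarchy. The Python fragment below is an illustration only. The class names come from the description of Figure 7.1 above; placing Manager beneath Employee is an assumption, since the text says only that both are subclasses of Person.

```python
class Agent:
    """Root of the taxonomy: a legal entity."""


class Person(Agent):
    """A Person is a kind of Agent."""


class Organization(Agent):
    """An Organization is also a kind of Agent."""


class Employee(Person):
    """An Employee is a kind of Person."""


class Manager(Employee):
    """Assumed placement: a Manager is a kind of Employee."""


# "is subclass of" is transitive: Manager -> Employee -> Person -> Agent.
print(issubclass(Manager, Agent))        # True
# Siblings under the same parent are not subclasses of each other.
print(issubclass(Organization, Person))  # False
# Walking up from Manager lists ever more general classes.
print([cls.__name__ for cls in Manager.__mro__ if cls is not object])
```

Reading the last line from left to right corresponds to moving up the taxonomy from a specialized entity toward the root.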
As you go up the taxonomy toward the root at the top, the entities become more general. As you go down the taxonomy toward the leaves at the bottom, the entities become more specialized. Agent, for example, is more general than Person, which in turn is more general than Employee. This kind of classification system is sometimes called a generalization/specialization taxonomy.
Figure 7.1 A simple taxonomy. (Arrows in the figure denote the subclass of relation.)
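A taxonomy like the one in Figure 7.1 can be held as plain data in which each node records its single parent. The sketch below is a minimal illustration under stated assumptions: the names `PARENT`, `generalizations`, `is_subclassification_of`, and `children` are hypothetical, and the placement of Manager under Employee is assumed rather than taken from the figure.

```python
from typing import Optional

# Each entity maps to its parent; the root maps to None.
PARENT: dict[str, Optional[str]] = {
    "Agent": None,
    "Person": "Agent",
    "Organization": "Agent",
    "Employee": "Person",
    "Manager": "Employee",  # assumed placement
}


def generalizations(entity: str) -> list[str]:
    """Return the ancestors of entity, nearest parent first, root last."""
    chain = []
    node = PARENT[entity]
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def is_subclassification_of(child: str, ancestor: str) -> bool:
    """True if ancestor lies on the path from child up to the root."""
    return ancestor in generalizations(child)


def children(entity: str) -> list[str]:
    """Return the direct specializations of entity."""
    return [name for name, parent in PARENT.items() if parent == entity]


print(generalizations("Employee"))                 # ['Person', 'Agent']
print(children("Agent"))                           # ['Person', 'Organization']
print(is_subclassification_of("Manager", "Agent")) # True
```

Because every node has exactly one parent, following parent links always terminates at the root, which is what makes the structure a tree rather than a general graph.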
Taxonomies are good for classifying information entities semantically; that is, they help establish a simple semantics (semantics here just means "meaning" or a kind of meta data) for an information space. As such, they are related to other information technology knowledge products that you've probably heard about: meta data, schemas, thesauri, conceptual models, and ontologies. Whereas the next chapter discusses ontologies in some detail, this chapter helps you make the distinction among the preceding concepts.
A taxonomy is a semantic hierarchy in which information entities are related by either the subclassification of relation or the subclass of relation. The former is semantically weaker than the latter, so we make a distinction between semantically weaker and semantically stronger taxonomies. Although taxonomies are fairly weak semantically to begin with (they don't have the complexity to express rich meaning), the stronger taxonomies try to use the notion of a distinguishing property. Each information entity is distinguished by a property that makes it unique as a subclass of its parent entity (synonyms for property are attribute and quality). Consider the Linnaeus-like biological taxonomy shown in Figure 7.2, which has been simplified to show where humans fit in the taxonomy. In Figure 7.2, the property that distinguishes a specific subclass at the higher levels (closer to the root) is probably actually a large set of properties.
Consider the distinction between mammal and reptile under their parent subphylum Vertebrata (in Figure 7.2, a dotted line between Mammalia and Diapsida shows that they are at the same level of representation, both being subclassifications of Vertebrata). Although both mammals and reptiles have four legs (a common property), mammals are warm-blooded and reptiles are cold-blooded. So warm-bloodedness can be considered at least one of the properties that distinguish mammals from reptiles; there could be others. Another distinguishing property between mammals and reptiles is egg-laying. Although there are exceptions (the Australian platypus, for example), mammals in general