requires a human or an extremely smart compiler (perhaps driven by an expert system that uses knowledge) to detect and correct semantic errors, because, in general, computers and their software are not semantically aware. Changing that state of affairs, at least with respect to the Internet, is what this book is all about.
In the Web world, we typically deal with documents, so the syntax of documents, along with the markup languages that structure documents and data, is our primary interest. The syntax of documents involves strings of characters from some alphabet (for text) or some set of defined binary encodings (for graphics, video, and so on). The semantics is what those symbols are intended to mean in a human-defined domain (sometimes called, more formally, part of a universe of discourse); that is, what the documents mean. Syntactic symbols are meaningless unless they are given a semantic interpretation, in other words, mapped to objects in a model where that meaning is represented. Semantic interpretation is semantics: it is interpreting the syntactic symbols with respect to their intended meaning. In the Web world, XML has a syntax. A document that is marked up using XML is either syntactically correct or not, with respect to the syntax of XML. That means that certain constructs have to appear in a certain order, XML tags have to be closed by a delimiter, and so on.
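The distinction between syntactic correctness and meaning can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree parser; the function name is_well_formed and the sample documents are invented for illustration. The parser checks only the syntax of XML and has no notion of what the tags mean.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, i.e. is syntactically correct."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = "<book><title>Semantic Web</title></book>"
bad = "<book><title>Semantic Web</book>"  # <title> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A document such as `<age>-40</age>` would also pass this check: it is syntactically correct, and whether a negative age makes sense is a semantic question the parser cannot answer.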
Syntax is order and format, but it is also structure. Databases, Web objects, objects in the coming Semantic Web, models, and ontologies require structure.
Models generally require structure, a way of organizing and containing elements of the model. A database schema, for example, is primarily a way to both describe and prescribe the structure of a database. We understand the notion of description better now, after our discussion in the previous section. By prescribe, we mean that the objects of the database (the tables, columns, rows, and values) are required to adhere to the structure of the schema, the way the elements of the schema are organized and the way that certain elements are contained within other elements. This prescription is typically enforced by the Database Management System (DBMS), which we can think of, at least partially, as the interpreter or validator of the data with respect to the database schema. DBMSs typically do much more than this; here, however, a DBMS functions much like an XML parser/validator, which parses XML files (checking them against the syntactic specification of the requisite XML language version) and then validates them with respect to a particular, structure-defining DTD or a schema based on the XML Schema language.
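As a rough illustration of a schema prescribing structure and a DBMS enforcing it, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The employee table, its columns, and the helper try_insert are assumptions made up for this example, not taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema both describes and prescribes the structure of the data.
conn.execute("""
    CREATE TABLE employee (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER CHECK (age >= 0)
    )
""")


def try_insert(name, age):
    """Attempt an insert; return False if the DBMS rejects the row."""
    try:
        conn.execute(
            "INSERT INTO employee (name, age) VALUES (?, ?)", (name, age)
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("John", 35)        # conforms to the schema
rejected_null = try_insert(None, 35)     # violates NOT NULL
rejected_check = try_insert("Harry", -1) # violates the CHECK constraint
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the conforming row is stored. The DBMS here plays the validator role described above: it accepts or rejects data with respect to the schema, without knowing what an employee is.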
Conceptual models, such as those written in UML, are also concerned with structure. The structure in conceptual models is reflected partially in the inheritance hierarchies of the subclass relation: One class is a subclass of another class. Structure is also reflected in the part-of relation: One class is a part of or constitutes another class. Structure is also reflected in other arbitrary relations. In a UML model of a human resources application, for example, two classes may be in an employee-of relation (similar to the relation in a database conceptual schema, which is usually constructed in an entity-relation or extended-entity-relation model), as in Figure 8.2. Note that this is roughly the UML equivalent of the OKBC ontology in Figure 8.1 (though without the underlying logical richness and precision of the latter; richness and precision enable machine semantic interpretability).
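The three kinds of structure just named (subclass, part-of, and an arbitrary relation such as employee-of) can be sketched in code. This is only an approximation in Python dataclasses; the class names Person, Employee, Department, and Organization are hypothetical and are not a transcription of Figure 8.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Department:
    name: str


@dataclass
class Organization:
    name: str
    # part-of: each Department is a part of the Organization
    departments: List[Department] = field(default_factory=list)


@dataclass
class Person:
    name: str


@dataclass
class Employee(Person):  # subclass: every Employee is a Person
    # employee-of: an arbitrary relation between Employee and Organization
    employer: Optional[Organization] = None


acme = Organization("Acme", [Department("Human Resources")])
sue = Employee("Sue", employer=acme)
```

As with the UML model, this captures structure only. Nothing in the code states what being an employee implies, which is the logical richness an ontology language would add.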
Structure can typically be represented by a node-and-edge graphical notation, in other words, using nodes and edges (or links), where an edge can be directed (symbolized by an arrowhead pointing at the node the relation is directed toward, as in Figure 8.3). The general study of such node-and-edge models is called graph theory, where a graph is a more complicated data structure than a tree, which is a simpler hierarchic structure such as we saw in the previous chapter on taxonomies. A graph (think of a complex network) is more complicated than a tree because its links, whether directed or undirected, can connect nodes arbitrarily, whereas a tree is a data structure that has only edges or links (branches), a distinguished node called the root (as we saw in the last chapter) into which no edge enters, and a unique path from the root to every node.
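The definition of a tree given here (one root with no entering edge, and a unique path from the root to every node) can be checked mechanically. The sketch below is one possible way to do so in Python; the function is_rooted_tree and the sample taxonomy are illustrative assumptions. It relies on the fact that if exactly one node has no entering edge, every other node has exactly one entering edge, and every node is reachable from the root, then each path from the root is unique.

```python
def is_rooted_tree(nodes, edges):
    """Check whether directed edges (parent, child) form a rooted tree."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)

    # Exactly one root: the node into which no edge enters.
    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) != 1:
        return False
    root = roots[0]

    # Every other node must have exactly one entering edge.
    if any(indegree[n] != 1 for n in nodes if n != root):
        return False

    # Every node must be reachable from the root.
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return len(seen) == len(nodes)


# Hypothetical taxonomy: a tree.
nodes = ["Animal", "Mammal", "Bird", "Dog"]
taxonomy = [("Animal", "Mammal"), ("Animal", "Bird"), ("Mammal", "Dog")]

# One extra link gives "Dog" two paths from the root: a graph, not a tree.
network = taxonomy + [("Bird", "Dog")]

print(is_rooted_tree(nodes, taxonomy))  # True
print(is_rooted_tree(nodes, network))   # False
```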
The main difference between a graph and a tree is that a graph may have multiple paths to a node. A directed graph is a graph in which each edge is directed from one node to another (think of a relation like father-of, where the edge from John to Harry signifies that John is the father-of Harry). In an undirected graph there is no arrow, only a simple edge (think of the relation friend-of between John and Sue: John is the friend-of Sue and Sue is the friend-of John). A graph without cycles (links between a child node and one of its ancestors) is called an acyclic graph. A graph with cycles is called a cyclic graph. Directed graphs with cycles are called directed cyclic graphs; directed graphs without cycles are called directed acyclic graphs (DAGs) and are typically the data structure used for most complex structures, such as ER, UML, and ontology models. There is an implementation cost incurred with cycles (i.e., you have to detect cycles and so must keep additional bookkeeping information around when traversing your graph as, for example, in a search), so most models, though not all, do not permit them.
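To make the bookkeeping cost of cycles concrete, here is a minimal sketch of cycle detection in a directed graph, written in Python. It uses a depth-first search with three node states, which is one common technique and not necessarily what any particular modeling tool does. The function has_cycle and the father-of data (including the name Tom) are assumptions for illustration.

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: [successors]}."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    # The extra bookkeeping: one state per node, kept during traversal.
    state = {node: UNVISITED for node in graph}

    def visit(node):
        state[node] = IN_PROGRESS
        for successor in graph.get(node, ()):
            s = state.get(successor, UNVISITED)
            if s == IN_PROGRESS:
                # The edge leads back to a node on the current path.
                return True
            if s == UNVISITED and visit(successor):
                return True
        state[node] = DONE
        return False

    return any(
        state[node] == UNVISITED and visit(node) for node in list(graph)
    )


# father-of as a directed acyclic graph.
father_of = {"John": ["Harry"], "Harry": ["Tom"], "Tom": []}

# An edge from a descendant back to an ancestor creates a cycle.
with_cycle = {"John": ["Harry"], "Harry": ["Tom"], "Tom": ["John"]}

print(has_cycle(father_of))   # False
print(has_cycle(with_cycle))  # True
```

The state dictionary is exactly the kind of additional information a traversal must carry once cycles are possible. The recursive form is kept for readability; very deep graphs would need an iterative version to avoid Python's recursion limit.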