Introduction to Taxonomy

The Domain of Taxonomy

In simple terms, taxonomy is involved whenever one needs to organize what one has to say to users, whether those users are people or machines.

The root word "taxon" (plural "taxa") means category. Taxonomists may work across many disciplines to accurately identify categories and to establish their correct relationships. The result is a structural model of the things, attributes, and concepts that describe a collection, product, endeavor, or knowledge domain.

The reason for creating a taxonomy is to represent things in a way that allows the elements that make up the categories to be understood and to help other people to find and discuss those elements in a consistent way.

Some approaches to creating such categorical structures attempt a unified understanding of their objects; others emphasize flexibility and functionality.

The implications and nuances underlying this plain and concise description of the discipline and outcomes of taxonomy are important; they define the range and domain of taxonomy engagements.

One way of explaining the need for a taxonomy is as follows:

Taxonomies are metadata models for managing access and retrieval across the domain of an endeavor.

Metadata models present semantic frameworks that enable communication about, and reuse of, the objects or elements captured within the descriptive system.

As models, taxonomies are knowledge organization systems (KOS) that make explicit the information that is implicit in the elements, enabling common contextual interpretation.

The value proposition of taxonomy is that it is a pragmatic tool for sharing the meaning of content accurately.

The use cases for a taxonomy determine the appropriate presentational style. Presentational styles for taxonomies range from simple lists to sophisticated knowledge architectures.

The Range of Taxonomy

[Figure: Mind map showing the discipline of taxonomy as concerned with categorical structure, relationship types, object classes, attribute types, concept classes, and attribute values]

As an intellectual endeavor, taxonomy focuses on identifying, organizing, and defining the elements that make up the discourse used to discuss something. Just as language is broken down into nouns, verbs, and qualifiers, so too taxonomy is concerned with the structure of class categories and with the relationships and types of attributes, or properties, associated with them.

Complete taxonomy models establish the categorical structure and range of relationships that apply to:

  • Classes of things
  • Classes of concepts
  • Types of attributes
  • Ranges of attribute values
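As a rough illustration, the four kinds of elements listed above could be captured in a small data model. The sketch below is only one possible arrangement: the names AttributeType, TaxonomyClass, and the footwear example data are hypothetical and are not drawn from any particular taxonomy product.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class AttributeType:
    """A type of attribute together with its permitted range of values."""
    name: str
    allowed_values: FrozenSet[str]

    def accepts(self, value: str) -> bool:
        # A value is valid only if it falls inside the declared range.
        return value in self.allowed_values


@dataclass
class TaxonomyClass:
    """A class of things or concepts, optionally placed under a broader class."""
    name: str
    kind: str  # "thing" or "concept" (hypothetical labels for this sketch)
    parent: Optional["TaxonomyClass"] = None
    attributes: Dict[str, AttributeType] = field(default_factory=dict)

    def ancestors(self) -> List[str]:
        """Names of the broader classes, nearest first."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def inherited_attributes(self) -> Dict[str, AttributeType]:
        """Attribute types from the broadest class down to this one."""
        chain = []
        node: Optional[TaxonomyClass] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        merged: Dict[str, AttributeType] = {}
        for cls in reversed(chain):  # broadest first; narrower classes override
            merged.update(cls.attributes)
        return merged


# Hypothetical example data.
color = AttributeType("color", frozenset({"black", "brown", "tan"}))
footwear = TaxonomyClass("Footwear", kind="thing", attributes={"color": color})
boots = TaxonomyClass("Boots", kind="thing", parent=footwear)
durability = TaxonomyClass("Durability", kind="concept")
```

In this arrangement, boots picks up the color attribute from its broader class, and a value such as "purple" would be rejected because it falls outside the declared range.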

Discovery of the things, concepts, attributes, and attribute values, and their relationships, is accomplished by a variety of research approaches, including:

  • Reviewing document samples and existing vocabularies
  • Working with Subject Matter Experts (SMEs)
  • Task and workflow analysis
  • Group knowledge mining exercises

The discovery process has two goals; the first is to expose the language. Knowledge of the language has four aspects:

  • Vocabulary
  • Definitions and usage
  • Hierarchical relationships
  • Relationship patterns

Technically, only hierarchical maps of vocabulary are taxonomies. As a practical matter, however, the greatest value is realized by also establishing usage and relationship patterns. Similarly, some parts of the vocabulary most likely do not lend themselves to hierarchical mapping and will instead result in reference lists or authority files.
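To make the four aspects of the language concrete, the following minimal sketch records each term with a definition, an optional broader term, and a list of related terms, and keeps a non-hierarchical part of the vocabulary as a flat reference list. The Term structure, the field names, and the sample vocabulary are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    label: str                      # vocabulary
    definition: str                 # definition and usage
    broader: Optional[str] = None   # hierarchical relationship
    related: List[str] = field(default_factory=list)  # relationship pattern


# Hypothetical sample vocabulary, keyed by label.
vocabulary: Dict[str, Term] = {
    t.label: t
    for t in [
        Term("Beverages", "Drinkable liquids offered for sale."),
        Term("Coffee", "Drink brewed from roasted coffee beans.",
             broader="Beverages", related=["Mugs"]),
        Term("Tea", "Drink infused from tea leaves.", broader="Beverages"),
        Term("Mugs", "Handled cups for hot drinks."),
    ]
}

# Vocabulary with no useful hierarchy can remain a flat reference list.
country_codes: List[str] = ["CA", "MX", "US"]


def narrower(label: str) -> List[str]:
    """Labels of the terms whose broader term is the given label."""
    return sorted(t.label for t in vocabulary.values() if t.broader == label)


def path_to_root(label: str) -> List[str]:
    """The term followed by each successively broader term."""
    path = [label]
    while vocabulary[path[-1]].broader is not None:
        path.append(vocabulary[path[-1]].broader)
    return path
```

Only the broader links form the hierarchy proper; the related links and the flat list carry the usage and relationship patterns that a strict hierarchy cannot express.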

This introductory discussion has focused on word-based taxonomies. To understand the full range of taxonomy, recall that taxonomy is the discipline of creating structured semantics. As such, all types and applications of metadata fall within its purview. Digital Asset Management (DAM) is a particular type of taxonomy practice. Similarly, identifying, defining, and cross-mapping other metadata elements and their use fall within the ordinary scope of the work.

Knowledge Organization Systems

Knowledge Organization Systems (KOS) are formal approaches to organizing and managing knowledge. The common goal of these systems is to impose a structure that consistently makes resources available for use. If a resource cannot be found, it is functionally null.

Architecturally, all KOSs are some variety of taxonomy. At the same time, taxonomies are a type of KOS.

Important types of KOSs include the following:

  • Thesauri
  • Taxonomies
  • Ontologies
  • Subject-heading systems
  • Controlled vocabularies
  • Classification schemes

The difference between these approaches is the degree of logical rigor and level of detail (granularity) achievable based on the units of recall for which they were designed.

More important than differences of degree, however, are the fundamental assumptions about the map structure regarding the algebraic properties, especially those of independence and association, and use of set theory in determining the relationships of membership and identity. As a result of the different assumptions at work, there are two different topologies for developing taxonomies, and, in general, KOSs.

Appreciating the different topologies is important in understanding and using taxonomy and metadata schema products.

There are two major approaches to taxonomy construction:

  1. Pure hierarchical taxonomies are rooted in classical Aristotelian categorical descriptive logic and exemplified by the work of Linnaeus. In this structure, each object is permitted exactly one instance in the taxonomy. This is the structure most frequently envisioned in discussions of taxonomies. The rigor of its logic, while appropriate for some use cases, results in large vocabularies that require high-level expertise to create and use.
  2. Faceted taxonomies are built on the work of S. R. Ranganathan. This structure views objects as a matrix of values describable in multiple, independent sets of values (facets). The sets are mutually exclusive, and the values (terms) are each used exactly once, with well-defined meanings. Class and property values are applied orthogonally to information objects, allowing those objects to be located within the KOS through object-value pair relationships. Taxonomy systems based on this approach are significantly flatter and broader than classical taxonomies; they can be developed more quickly and used effectively by users with less domain expertise. As a result, faceted taxonomies are the dominant type produced for information and content management systems.
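The contrast between the two approaches can be sketched in a few lines. In the hypothetical example below, the hierarchical structure gives each object a single path, while the faceted structure describes the same objects with independent facet-value pairs. The data, the facet names, and the helper functions are illustrative assumptions, not a reference implementation of either approach.

```python
# Pure hierarchy: each object occupies exactly one place, given by one path.
hierarchy = {
    "hiking boot": ("Footwear", "Boots", "Hiking"),
    "rain boot": ("Footwear", "Boots", "Rain"),
    "trail runner": ("Footwear", "Shoes", "Running"),
}

# Faceted: independent facets, each with its own set of values.
facets = {
    "type": {"boot", "shoe"},
    "activity": {"hiking", "running", "everyday"},
    "material": {"leather", "rubber", "mesh"},
}

# Each object is described by one value per facet (facet-value pairs).
objects = {
    "hiking boot": {"type": "boot", "activity": "hiking", "material": "leather"},
    "rain boot": {"type": "boot", "activity": "everyday", "material": "rubber"},
    "trail runner": {"type": "shoe", "activity": "running", "material": "mesh"},
}


def values_are_unique(facet_map):
    """True when no value (term) appears in more than one facet."""
    total = sum(len(values) for values in facet_map.values())
    return total == len(set().union(*facet_map.values()))


def assignments_are_valid(object_map, facet_map):
    """True when every facet-value pair uses a declared facet and value."""
    return all(
        facet in facet_map and value in facet_map[facet]
        for pairs in object_map.values()
        for facet, value in pairs.items()
    )


def find_by_path(prefix):
    """Hierarchical lookup: objects located under the given path prefix."""
    prefix = tuple(prefix)
    return sorted(
        name for name, path in hierarchy.items() if path[:len(prefix)] == prefix
    )


def find_by_facets(**criteria):
    """Faceted lookup: objects matching every requested facet-value pair."""
    return sorted(
        name
        for name, pairs in objects.items()
        if all(pairs.get(facet) == value for facet, value in criteria.items())
    )
```

With the hierarchy, retrieval follows the one path the designer chose, for example find_by_path(("Footwear", "Boots")). With facets, any combination of criteria works without restructuring, for example find_by_facets(material="rubber", activity="everyday").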

The importance of the foregoing technical discussion rests in the fact that how a taxonomy is designed can, and should, play a fundamental role in application and web site development strategy. Moreover, it is decisive in determining the ease and efficiency with which internal and external users perform tasks and search for and retrieve highly salient, appropriately granular, and serendipitous results. Why care about serendipity? If you are in eCommerce, think up-sell opportunity.