The primary objective of a classification framework is to decouple categorization from object creation and management, and to provide a human-friendly, visually recognizable, flexible alternative to search indices of objects. Starting with the directory structure used by virtually every operating system, classification methods have long been in use, following the principle of decoupling classification from the creation, management, storage and identification of objects.
Taxonomy classification and its currently popular variation, tagging, are widely used in many web frameworks and desktop applications. Gmail started this trend by replicating a folder structure while allowing an object to be placed virtually in more than one bucket, perhaps following the soft linking that has existed in file systems for decades.
The idea took on a new look when web services such as Gmail and Flickr adopted taxonomy classification for the very specific purpose of classifying only a few selected objects; that is, users could choose to apply the classification or do without it. Similar classification concepts were later adopted by desktop applications, such as iTunes and Explorer in Windows Vista, and by web frameworks.
Apart from the classification framework's primary objective of decoupling categorization logic from content management, we focus on the following indispensable qualities of a web framework: scalability, reliability, extensibility and usability. The frequency of operations must also be taken into account when determining which factors are crucial. Implementation-specific details are discussed under design considerations.
The database is normalized to 3NF to ensure extensibility while remaining scalable. Frequently used queries, such as relationships, object mapping and leaf membership, are designed to be scalable. Reliability is achieved through hooks on the different states of an object, and vice versa, and possibly through cron tasks.
- Leaf: The atomic unit of taxonomy labeling, which is associated with items in order to categorize them. Leaves are referred to here as terms or tags interchangeably. A leaf might be known by several similar names, which we call aliases.
- Alias: An alternative name for a leaf. For example, university and college may mean the same thing, and it is redundant and misleading to have two different labels for a single term. Hence college can be made an alias of university, so that there is only one term, university, and whenever a user specifies college it is interpreted as university.
- Tree: An umbrella unit for leaves that acts as a bucket and defines the properties, rules and interactions of the leaves underneath it. It has the following properties:
  - Hierarchy: The type of hierarchy involved (see the structures below)
  - Controlled: Whether leaf creation is restricted to the admin forms (com_taxonomy)
  - Relations: The types of relations allowed: parent-child only, or peer-to-peer as well
- Tree mapping: A tree is useless unless it is mapped to an object manager, such as a component. A mapping involves one tree and one extension, and defines the following rules:
  - Required: Whether each object must be associated with a leaf from this tree under this mapping
  - Multiple: Whether multiple leaves can be associated with a single object
  - Weight: The factor that determines the priority of a tree when an extension requests all associated trees, or all associated terms for an object under that extension. Heavier weights sink.
- Leaf mapping: Guided by the tree mapping rules and the properties of the tree, a leaf is associated with an object belonging to the extension given in the tree mapping. This is also known as labeling or tagging.
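As an illustration of how this vocabulary fits together, the following is a minimal Python sketch and not the framework's implementation. The class names Tree and TreeMapping, their fields, and the label method are hypothetical names chosen for this example; only the concepts (leaf, alias, controlled, required, multiple, weight) come from the definitions above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Tree:
    """Hypothetical in-memory tree: a bucket of leaves plus their aliases."""
    name: str
    hierarchy: str = "flat"       # "flat", "single" or "multiple"
    controlled: bool = False      # True: leaves are created only in admin forms
    leaves: Dict[str, str] = field(default_factory=dict)   # lower-case term -> term
    aliases: Dict[str, str] = field(default_factory=dict)  # lower-case alias -> term

    def add_leaf(self, term: str, aliases: Iterable[str] = ()) -> None:
        self.leaves[term.lower()] = term
        for alias in aliases:
            self.aliases[alias.lower()] = term

    def resolve(self, word: str) -> Optional[str]:
        """Return the canonical term for a word, following aliases."""
        key = word.strip().lower()
        if key in self.leaves:
            return self.leaves[key]
        return self.aliases.get(key)


@dataclass
class TreeMapping:
    """Hypothetical mapping of one tree to one extension, with its rules."""
    tree: Tree
    extension: str
    required: bool = False
    multiple: bool = True
    weight: int = 0               # heavier weights sink

    def label(self, words: Iterable[str]) -> List[str]:
        """Resolve user-supplied words to leaves and enforce the mapping rules."""
        resolved: List[str] = []
        for word in words:
            leaf = self.tree.resolve(word)
            if leaf is None:
                if self.tree.controlled:
                    raise ValueError(f"unknown term in controlled tree: {word!r}")
                leaf = word.strip()
                self.tree.add_leaf(leaf)   # uncontrolled trees accept new leaves
            if leaf not in resolved:
                resolved.append(leaf)
        if self.required and not resolved:
            raise ValueError("at least one leaf is required")
        if not self.multiple and len(resolved) > 1:
            raise ValueError("only one leaf is allowed per object")
        return resolved
```

With a controlled tree holding the leaf university and the alias college, labeling an object with "College" would yield the single term university, while an unknown word would be rejected.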
Although a unified tree resembling the structure of domain names would be adequate, we consider a forest of trees, for reasons explained later under design considerations. The top-level member is a tree, whose attributes define its usage and structure. Leaves are members of a particular tree. Although unrestrained membership was considered, for the reasons explained under design considerations a leaf is confined to exactly one tree; it can still be used by multiple extensions, abiding by the concept of define once, use everywhere.
Several kinds of structures can be built under a taxonomy framework. However, there are three fundamental structures on top of which a complex design can be built, and we call them hierarchies.
1. Flat hierarchy
This is also known as a flat structure, tagging, floating terms, etc. A tree built with this structure offers its end users a simple, straightforward taxonomy system.
2. Single Hierarchy
This is what is commonly understood as a taxonomy tree, where a leaf is either the root leaf or a child of another leaf. A leaf can therefore have only one parent but many children, resembling a top-down hierarchy.
3. Multiple Hierarchies
The requirement of flexibility may bring in some rarely used features. Multiple hierarchy is one such feature, though still essential for completeness. It allows multiple parents, so that branching is possible both upwards and downwards.
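The difference between the three hierarchies comes down to how many parents a leaf may have. The sketch below, in Python, is one way this rule could be enforced; the Hierarchy class and its method names are hypothetical, and the cycle check is an extra assumption that the text above does not state.

```python
from collections import defaultdict

# Maximum number of parents per leaf for each hierarchy type (None = unlimited).
MAX_PARENTS = {"flat": 0, "single": 1, "multiple": None}


class Hierarchy:
    """Hypothetical helper that records parent links for the leaves of one tree."""

    def __init__(self, kind):
        if kind not in MAX_PARENTS:
            raise ValueError(f"unknown hierarchy type: {kind!r}")
        self.kind = kind
        self.parents = defaultdict(set)   # child leaf -> set of parent leaves

    def _ancestors(self, leaf):
        """Collect every leaf reachable by following parent links upwards."""
        seen, stack = set(), [leaf]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_parent(self, child, parent):
        current = self.parents.get(child, set())
        if parent in current:
            return                         # link already recorded
        limit = MAX_PARENTS[self.kind]
        if limit is not None and len(current) >= limit:
            raise ValueError(
                f"{self.kind} hierarchy allows at most {limit} parent(s) per leaf"
            )
        # Assumption: a leaf may not become its own ancestor.
        if child == parent or child in self._ancestors(parent):
            raise ValueError("parent link would create a cycle")
        self.parents[child].add(parent)
```

Under these assumptions a flat tree rejects every parent link, a single hierarchy rejects a second parent, and a multiple hierarchy accepts any number of parents as long as no cycle is formed.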
The intention of separating the logic of a structure from its building blocks is to maintain maximum flexibility, that is, to be able to achieve virtually any level of complexity in building a taxonomy system, while ensuring usability so that users are not led into the wilderness. Under this design, structures are built seamlessly on the aforementioned blocks.
(Refer to the vocabulary above for the meaning of the specific words used below.)
- Free tagging - Similar to the tagging feature available in WordPress, Gmail or Flickr. A flat-hierarchy, multiple, uncontrolled tree, and possibly not required.
- Categories - Similar to the categories used in Joomla! or WordPress. Hierarchical trees with single hierarchy, mostly controlled, and possibly required and single, which means the user must opt for one, and only one, leaf per object.
- User groups - Similar to Organic Groups used in Drupal. Hierarchical trees with single hierarchy, controlled, single, and required.
- Book - Hierarchical trees with single hierarchy, not controlled, required and single.
- Channels - Multiple hierarchies, not controlled, multiple and possibly required.
The examples above emphasize the coordination required between creating a tree and mapping it to an extension. Although a user or an extension could build a model by entering rows directly in the tables or by using the backend forms, careful planning is needed. This is simplified in the frontend forms (refer to com_taxonomy), where the user can simply pick a structure whose properties are then defined automatically.
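The idea of picking a structure whose properties are defined automatically could be sketched as a table of presets. The Python below is only an illustration: the Preset type, the PRESETS table and the build function are hypothetical names, and wherever the examples say "possibly" or "mostly" a default has been assumed and marked in a comment.

```python
from typing import NamedTuple


class Preset(NamedTuple):
    hierarchy: str     # "flat", "single" or "multiple"
    controlled: bool   # tree property
    multiple: bool     # tree mapping rule
    required: bool     # tree mapping rule


PRESETS = {
    "free_tagging": Preset("flat", False, True, False),   # "possibly not required"
    "categories": Preset("single", True, False, True),    # "mostly controlled", "possibly required"
    "user_groups": Preset("single", True, False, True),
    "book": Preset("single", False, False, True),
    "channels": Preset("multiple", False, True, True),    # "possibly required"
}


def build(preset_name, tree_name, extension):
    """Return the tree record and the tree mapping record for a chosen preset."""
    preset = PRESETS[preset_name]
    tree = {
        "name": tree_name,
        "hierarchy": preset.hierarchy,
        "controlled": preset.controlled,
    }
    tree_map = {
        "tree": tree_name,
        "extension": extension,
        "required": preset.required,
        "multiple": preset.multiple,
    }
    return tree, tree_map
```

Splitting the result into a tree record and a mapping record mirrors the separation described above: hierarchy and control belong to the tree, while required and multiple belong to the association between a tree and an extension.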
Both the ability to handle a large number of records and extensibility are considered in the design; hence the schema is normalized to 3NF. The following tables make up the taxonomy framework:
- tree: Builds the forest of trees; it stores all the information necessary to build a complete tree. All attributes are editable through the management component; however, a quick build will be made available for common types of trees (refer to the frontend of com_taxonomy).
- tree_map: Stores the correspondence between a tree and an extension, for example the content component. A tree can be linked to one extension or to more than one. This normalization enables reuse of a single tree by multiple applications. Weight determines the priority given to a particular linkage over similar linkages.
- leaf: Contains the atomic information about a term, such as the tree it belongs to. A weight is also stored to determine the order of preference when multiple leaves line up for a particular request.
- leaf_map: Matches terms with an item, for example a content post.
- leaf_hierachy: Contains the relationship between two leaves. The relationship types currently supported are parent and peer: parent translates to the parent-child relationship that makes up a hierarchical tree, and peer to the peer-to-peer relationship that makes up a cluster tree.
- leaf_alias: Contains the list of aliases for a leaf. It is essential when the taxonomy framework is used by components that interact directly with humans, where remembering the right term cannot be enforced. In such circumstances aliases let words like university, universities, etc. all resolve to the same term, for example college.
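To make the table list more concrete, here is a hedged sketch of what such a schema might look like, written in Python against an in-memory SQLite database. Only the table names come from the list above (including the spelling leaf_hierachy); every column name, type and constraint is an assumption made for this example, as are the helper functions create_schema and leaves_for_object.

```python
import sqlite3

# Table names follow the list above; all columns are assumed for illustration.
SCHEMA = """
CREATE TABLE tree (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    hierarchy  TEXT NOT NULL CHECK (hierarchy IN ('flat', 'single', 'multiple')),
    controlled INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tree_map (
    tree_id   INTEGER NOT NULL REFERENCES tree(id),
    extension TEXT NOT NULL,
    required  INTEGER NOT NULL DEFAULT 0,
    multiple  INTEGER NOT NULL DEFAULT 1,
    weight    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (tree_id, extension)
);
CREATE TABLE leaf (
    id      INTEGER PRIMARY KEY,
    tree_id INTEGER NOT NULL REFERENCES tree(id),
    name    TEXT NOT NULL,
    weight  INTEGER NOT NULL DEFAULT 0,
    UNIQUE (tree_id, name)
);
CREATE TABLE leaf_map (
    leaf_id   INTEGER NOT NULL REFERENCES leaf(id),
    extension TEXT NOT NULL,
    object_id INTEGER NOT NULL,
    PRIMARY KEY (leaf_id, extension, object_id)
);
CREATE TABLE leaf_hierachy (
    leaf_id    INTEGER NOT NULL REFERENCES leaf(id),
    related_id INTEGER NOT NULL REFERENCES leaf(id),
    relation   TEXT NOT NULL CHECK (relation IN ('parent', 'peer')),
    PRIMARY KEY (leaf_id, related_id, relation)
);
CREATE TABLE leaf_alias (
    leaf_id INTEGER NOT NULL REFERENCES leaf(id),
    alias   TEXT NOT NULL,
    PRIMARY KEY (leaf_id, alias)
);
"""


def create_schema(conn):
    """Create the six tables on an open SQLite connection."""
    conn.executescript(SCHEMA)


def leaves_for_object(conn, extension, object_id):
    """Return the terms attached to one object; lighter weights come first."""
    rows = conn.execute(
        """
        SELECT leaf.name
        FROM leaf_map
        JOIN leaf ON leaf.id = leaf_map.leaf_id
        JOIN tree_map ON tree_map.tree_id = leaf.tree_id
                     AND tree_map.extension = leaf_map.extension
        WHERE leaf_map.extension = ? AND leaf_map.object_id = ?
        ORDER BY tree_map.weight, leaf.weight, leaf.name
        """,
        (extension, object_id),
    )
    return [name for (name,) in rows]
```

The query orders first by the weight of the tree mapping and then by the weight of the leaf, which is one possible reading of the rule that heavier weights sink.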
Hierarchy / free terms
There are two popular taxonomy structures: the hierarchical tree (mostly single hierarchy) and free tagging (flat hierarchy). Although these are quite sufficient for content management tasks, more complex uses such as albums with free tagging, user groups and user roles (for granting privileges) require a wider variety of features. This is achieved by isolating a structure from its building blocks: hierarchies, extension mapping (and its properties) and tree properties.
Concentrated / Distributed
Sometimes it is necessary to focus on the features used 90% of the time, giving up total flexibility in order to boost performance.
In principle the design does not need trees at all, as in the domain name hierarchy, where everything starts at the 0th level (the root domain, the dot) and expands through 1st and 2nd level domains, all of which are represented equally. Similarly, leaves alone could have formed the taxonomy framework, with an extension mapped to a particular leaf that governs the properties of all its children.
However, we chose to implement multiple trees, sacrificing some flexibility for gains in performance and, more importantly, for the support of multiple hierarchies, or more precisely multiple parents. This allows greater extensibility in breadth and depth.
Define here, use everywhere
Following the popular paradigm "write once, use many", the taxonomy framework is expected to be the unified solution for all classification requirements. Hence it must be able to link with any implementation seamlessly. This is achieved by mapping a tree to a particular extension, thus making all the terms underneath available at the extension's disposal.
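A small Python sketch of this principle follows: a tree is defined once and then mapped to more than one extension. The Forest class, its methods and the extension names used in the usage note are illustrative assumptions, not part of the framework.

```python
from collections import defaultdict


class Forest:
    """Hypothetical registry: trees are defined once and mapped to many extensions."""

    def __init__(self):
        self.trees = {}                     # tree name -> list of terms
        self.mappings = defaultdict(list)   # extension -> [(weight, tree name)]

    def define_tree(self, name, terms):
        self.trees[name] = list(terms)

    def map_tree(self, name, extension, weight=0):
        if name not in self.trees:
            raise KeyError(name)
        self.mappings[extension].append((weight, name))

    def terms_for(self, extension):
        """All terms available to an extension; lighter trees come first."""
        terms = []
        for _weight, name in sorted(self.mappings.get(extension, [])):
            terms.extend(self.trees[name])
        return terms
```

For instance, a topics tree mapped to both com_content and a second extension would expose the same terms to each of them, without the terms being defined twice.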