Unified Medical Language System

Designed initially in 1986 by Donald Lindberg, M.D., Director of the US National Library of Medicine, the Unified Medical Language System (UMLS) is a compendium of many controlled vocabularies that also provides a mapping structure between them. The UMLS is composed of three main knowledge components: the Metathesaurus®, the Semantic Network and the SPECIALIST Lexicon.

The pairings below give a logical understanding of the structure and purpose of these three components:
 * Metathesaurus ↔ concepts
 * Semantic Network ↔ categories
 * SPECIALIST Lexicon ↔ indices

Purpose
The number of biomedical resources available to researchers is enormous. This is often a problem in itself, because a search of the medical literature retrieves a large volume of documents. The purpose of the UMLS is to enhance access to this literature by facilitating the development of computer systems that understand biomedical language. This is achieved by overcoming two significant barriers: "the variety of ways the same concepts are expressed in different machine-readable sources & by different people" and "the distribution of useful information among many disparate databases & systems". Three main tools are used to accomplish this: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon.

Applications
Applications of the three UMLS Knowledge Sources (the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon) range from the development stage to the end-user stage. Developers obtain the databases from the National Library of Medicine and can then modify them to meet the needs of end users. Such modifications can include changes to the construction of the databases, enhancements to how data is retrieved, and changes to the overall structure and linkage of the data. Data sources affected by these changes include biomedical and health-related data as well as health informatics resources. These efforts are intended to ensure that the end product serves the needs of its end users.

Metathesaurus
The Metathesaurus® forms the base of the UMLS. It comprises over 1 million biomedical concepts and 5 million concept names, drawn from over 100 controlled vocabularies and classification systems used in patient records, administrative health data, and bibliographic and full-text databases. Examples of these controlled vocabularies include ICD-9-CM, MeSH, SNOMED CT, LOINC, and RxNorm. The purpose of the Metathesaurus is to establish context and inter-context relationships between these coding systems and vocabularies, providing a common basis for information exchange among the variety of clinical databases and systems.

The Metathesaurus is organized by concept, or meaning, and each concept has specific attributes that define its meaning. Identical or nearly identical concepts from the different vocabularies are linked together along with their hierarchical context, and the relationships between concepts are explicitly represented.
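The concept-centred organization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Concept` class, the `build_name_index` helper, the identifier `C-EXAMPLE-1` and the vocabulary labels `VOCAB_A` and `VOCAB_B` are all hypothetical and do not reflect the actual Metathesaurus file formats or identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One meaning, plus every name the source vocabularies use for it."""
    concept_id: str  # hypothetical identifier, not a real UMLS one
    # Maps a source vocabulary label to the set of names it contributes.
    names: dict = field(default_factory=lambda: defaultdict(set))

    def add_name(self, source: str, name: str) -> None:
        self.names[source].add(name)


def build_name_index(concepts):
    """Map each lower-cased name to the concept it expresses."""
    index = {}
    for concept in concepts:
        for source_names in concept.names.values():
            for name in source_names:
                index[name.lower()] = concept
    return index


# Two hypothetical vocabularies use different names for the same meaning.
infarction = Concept("C-EXAMPLE-1")
infarction.add_name("VOCAB_A", "Myocardial infarction")
infarction.add_name("VOCAB_B", "Heart attack")

index = build_name_index([infarction])

# Both names resolve to the same concept object.
same_concept = index["heart attack"] is index["myocardial infarction"]
print(same_concept)
```

Indexing by name rather than by vocabulary is what lets a program translate a user's wording into whichever source vocabulary a target system expects.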

The scope of the Metathesaurus is determined by the scope of the source vocabularies. The Metathesaurus itself is produced by automated processing of machine-readable versions of the source vocabularies, followed by human editing and review. An individual can obtain a copy of the Metathesaurus, but it is intended primarily for system developers because it is a multi-purpose base resource for the entire UMLS.

The Metathesaurus can supply information to a software program that creates new data, answers a user's inquiries, allows the user to refine a query, or helps convert the user's vocabulary into the uniform vocabularies used by standardized classification systems. It can be used in clinical applications to query clinical databases and can be linked to patient records. Used in concert with the Semantic Network and SPECIALIST Lexicon, it gains further utility through more coherent results and greater relational power.

Semantic network
Semantic networks are knowledge representation schemes involving nodes and links (arcs or arrows) between nodes. The nodes represent objects or concepts and the links represent relations between nodes. This graphical representation assists in understanding the relationships of concepts.

The Semantic Network is one of the three knowledge sources of the Unified Medical Language System (UMLS). The network consists of semantic types and the semantic relationships that exist between them. There are 135 semantic types and 54 relationships. The network is designed to categorize the concepts in the UMLS Metathesaurus and to provide relationships among those concepts. Once a Metathesaurus concept is established, it is connected to the most specific semantic type available in the Semantic Network.

There are major groupings of semantic types, including organisms, anatomical structures and the like. The links among semantic types provide the structure for the network and show important relationships between the groupings and concepts. The primary link between semantic types is the "isa" link, which establishes the hierarchy used to decide the most specific semantic type to assign to a Metathesaurus concept. The network also has five major categories of non-hierarchical relationships: "physically related to", "spatially related to", "temporally related to", "functionally related to" and "conceptually related to". The information recorded for each semantic type includes an identifier, its position in the hierarchy, a definition and its associated relationships.
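The role of the "isa" link in picking the most specific semantic type can be illustrated with a small sketch. The hierarchy below is a tiny hand-written subset meant only for illustration; the type names should be checked against the Semantic Network release in use, and the `ISA` table and both functions are hypothetical helpers rather than part of any UMLS tool.

```python
# Child -> parent "isa" links. An illustrative subset, not the full network.
ISA = {
    "Physical Object": "Entity",
    "Organism": "Physical Object",
    "Anatomical Structure": "Physical Object",
    "Fully Formed Anatomical Structure": "Anatomical Structure",
}


def ancestors(semantic_type):
    """Follow "isa" links upward and return every broader type in order."""
    chain = []
    while semantic_type in ISA:
        semantic_type = ISA[semantic_type]
        chain.append(semantic_type)
    return chain


def most_specific(candidates):
    """Drop any candidate that is an ancestor of another candidate."""
    implied = set()
    for candidate in candidates:
        implied.update(ancestors(candidate))
    return [c for c in candidates if c not in implied]


print(ancestors("Fully Formed Anatomical Structure"))
print(most_specific(["Entity", "Anatomical Structure",
                     "Fully Formed Anatomical Structure"]))
```

Types on different branches, such as "Organism" and "Anatomical Structure" in this subset, are both kept because neither is an ancestor of the other.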

SPECIALIST Lexicon
The SPECIALIST Lexicon is the third of the Knowledge Sources supporting the Unified Medical Language System. It covers both common English vocabulary and biomedical terms, drawing on sources that include MEDLINE and the UMLS Metathesaurus, and it supplies lexical information to the SPECIALIST Natural Language Processing System. Each entry contains syntactic (how words are put together to create meaning), morphological (form and structure) and orthographic (spelling) information. Java programs associated with the SPECIALIST Lexicon help end users work through the variations in biomedical texts by relating words by their parts of speech, which can be helpful in web searches or searches through an electronic medical record.

Entries may be one-word or multiple-word terms. Records contain four parts: the base form (e.g. "run" for "running"); the part of speech (of which SPECIALIST recognizes eleven); a unique identifier; and any available spelling variants. For example, a query for "anesthetic" would return the following (Browne et al., 2000):
{base=anaesthetic spelling_variant=anesthetic entry=E0008769 cat=noun variants=reg}
{base=anaesthetic spelling_variant=anesthetic entry=E0008770 cat=adj variants=inv position=attrib(3)}
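The two records shown for "anesthetic" can be read into Python dictionaries with a short parser. This is a minimal sketch that assumes the single-line brace notation exactly as printed here, with space-separated `slot=filler` pairs and no spaces inside a filler; the distributed lexicon files may be laid out differently, and `parse_records` is a hypothetical helper.

```python
import re


def parse_records(text):
    """Split brace-delimited records into {slot: [fillers]} dictionaries."""
    records = []
    for body in re.findall(r"\{([^{}]*)\}", text):
        record = {}
        for token in body.split():
            slot, _, filler = token.partition("=")
            # A slot may repeat, so fillers are collected in a list.
            record.setdefault(slot, []).append(filler)
        records.append(record)
    return records


sample = (
    "{base=anaesthetic spelling_variant=anesthetic "
    "entry=E0008769 cat=noun variants=reg} "
    "{base=anaesthetic spelling_variant=anesthetic "
    "entry=E0008770 cat=adj variants=inv position=attrib(3)}"
)

records = parse_records(sample)
for record in records:
    print(record["entry"][0], record["cat"][0], record["base"][0])
```

Keeping each filler in a list leaves room for entries that carry several values for one slot, such as more than one spelling variant.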

The SPECIALIST Lexicon is available in two formats. The "unit record" format can be seen above and is composed of slots and fillers. A slot is the element (e.g. "base=" or "spelling_variant=") and the fillers are the values attributable to that slot for that entry. The "relational table" format is not yet normalized and contains a great deal of duplicated data.
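To see why an unnormalized table duplicates data, consider flattening slot-and-filler records into rows. The column layout used here (entry, base, spelling, category) is an assumption made for illustration and is not the layout of the distributed relational tables; `to_rows` is a hypothetical helper.

```python
# Slot-and-filler records for the "anesthetic" example, written by hand.
records = [
    {"entry": "E0008769", "base": "anaesthetic",
     "spelling_variant": ["anesthetic"], "cat": "noun"},
    {"entry": "E0008770", "base": "anaesthetic",
     "spelling_variant": ["anesthetic"], "cat": "adj"},
]


def to_rows(records):
    """Emit one (entry, base, spelling, category) row per spelling."""
    rows = []
    for record in records:
        spellings = [record["base"]] + record["spelling_variant"]
        for spelling in spellings:
            rows.append((record["entry"], record["base"],
                         spelling, record["cat"]))
    return rows


rows = to_rows(records)
for row in rows:
    print(row)

# The base form is repeated on every row, which is the duplication
# that a normalized design would factor out into its own table.
base_repeats = sum(1 for row in rows if row[1] == "anaesthetic")
print(base_repeats)
```

Two unit records become four rows in this sketch, with the base form and category repeated on each one.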