METADATA
The term metadata is ambiguous, as it is used for two fundamentally different concepts (types). Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be "data about the containers of data". Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description (resulting in a disambiguating neologism) would be "data about data contents" or "content about content", thus metacontent. Descriptive, Guide and the NISO concept of administrative metadata are all subtypes of metacontent.
Metadata (metacontent) is traditionally found in the card catalogues of libraries. By describing the contents and context of data files, metadata greatly increases the quality of the original data or files. For example, a webpage may include metadata specifying what language it is written in, what tools were used to create it, and where to go for more on the subject, allowing browsers to automatically improve the experience of users.
Definition
Metadata (metacontent) is defined as data providing information about one or more aspects of the data, such as:
- Means of creation of the data
- Purpose of the data
- Time and date of creation
- Creator or author of data
- Placement on a computer network where the data was created
- Standards used
For example, a digital image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.
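As a rough illustration of the text-document case, the sketch below derives such a record in Python. The class name, the field names and the 80-character summary rule are assumptions made for this example and do not follow any particular standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentMetadata:
    """Hypothetical record holding metadata about one text document."""
    author: str
    written_on: date
    word_count: int
    summary: str


def describe_document(text: str, author: str, written_on: date) -> DocumentMetadata:
    # The length can be derived from the content itself; the author and date
    # must be supplied, because they cannot be read off the text.
    words = text.split()
    summary = text.strip()[:80]  # assumed rule: first 80 characters
    return DocumentMetadata(author, written_on, len(words), summary)


meta = describe_document("Metadata is data about data.", "A. Writer", date(2012, 1, 1))
print(meta)
```

The record is itself ordinary data, which is why it can be stored and managed like any other data.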
Metadata is data. As such, metadata can be stored and managed in a database, often called a registry or repository. However, without context it is impossible to identify metadata just by looking at it, because a user would not know whether a given value is metadata or just data.
Libraries
Metadata has been used in various forms as a means of cataloging archived information. The Dewey Decimal System employed by libraries for the classification of library materials is an early example of metadata usage. Library catalogues used 3x5 inch cards to display a book's title, author, subject matter, and a brief plot synopsis, along with an abbreviated alpha-numeric identification system which indicated the physical location of the book within the library's shelves. Such data helps classify, aggregate, identify, and locate a particular book. Another form of older metadata collection is the US Census Bureau's use of what is known as the "Long Form." The Long Form asks questions that are used to create demographic data and to find patterns of distribution.
The term was coined in 1968 by Philip Bagley, one of the pioneers of computerized document retrieval.
Since then the fields of information management, information science, information technology, librarianship and GIS have widely adopted the term. In these fields the word metadata is defined as "data about data".
While this is the generally accepted definition, various disciplines have adopted their own more specific explanations and uses of the term.
For the purposes of this article, an "object" refers to any of the following:
- A physical item such as a book, CD, DVD, map, chair, table, flower pot, etc.
- An electronic file such as a digital image, digital photo, document, program file, database table, etc.
Photographs
Metadata may be written into a digital photo file that will identify who owns it, copyright & contact information, what camera created the file, along with exposure information and descriptive information such as keywords about the photo, making the file searchable on the computer and/or the Internet. Some metadata is written by the camera and some is input by the photographer and/or software after downloading to a computer.
Photographic metadata standards are developed and maintained by several organizations. They include, but are not limited to:
- IPTC Information Interchange Model IIM (International Press Telecommunications Council)
- IPTC Core Schema for XMP
- XMP – Extensible Metadata Platform (an Adobe standard)
- Exif – Exchangeable image file format, maintained by CIPA (Camera & Imaging Products Association) and published by JEITA (Japan Electronics and Information Technology Industries Association)
- Dublin Core (Dublin Core Metadata Initiative – DCMI)
- PLUS (Picture Licensing Universal System)
Video
Metadata is particularly useful in video, where the contents are not directly understandable by a computer but efficient search is desirable. Information about the contents, such as transcripts of conversations and text descriptions of scenes, makes the video searchable.
Web pages
Web pages often include metadata in the form of meta tags. Description and keywords meta tags are commonly used to describe the Web page's content. Search engines can use this data when adding pages to their search index, although the weight given to each tag varies between engines.
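A minimal sketch of how a program might read description and keywords meta tags, using the HTMLParser class from Python's standard library. The MetaTagParser class and the sample page are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip meta tags that carry no name/content pair (e.g. charset).
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page used only for illustration.
page = """<html><head>
<meta charset="utf-8">
<meta name="description" content="An overview of metadata.">
<meta name="keywords" content="metadata, cataloguing, Dublin Core">
</head><body></body></html>"""

parser = MetaTagParser()
parser.feed(page)
print(parser.meta["description"])
print(parser.meta["keywords"].split(", "))
```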
Creation of metadata
Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when a file was created, who created it, when it was last updated, file size and file extension.
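The automated case can be sketched with Python's standard library, which exposes what the operating system records about a file. The function name and dictionary keys are assumptions for this example. Creation time and creator are left out because how they are recorded differs between platforms.

```python
import os
import tempfile
from datetime import datetime, timezone


def elementary_metadata(path):
    """Return basic metadata the operating system records for a file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "extension": os.path.splitext(path)[1],
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    sample_path = handle.name

record = elementary_metadata(sample_path)
os.remove(sample_path)
print(record["size_bytes"], record["extension"], record["last_modified"])
```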
Metadata types
Metadata is applied across a large variety of fields, so there is no single model of metadata types, only specialised and well-accepted models within particular fields. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata.
Structural metadata is used to describe the structure of computer systems such as tables, columns and indexes.
Guide metadata is used to help humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into two similar categories: technical metadata and business metadata. Technical metadata corresponds to internal metadata, and business metadata to external metadata. Kimball adds a third category named process metadata. On the other hand, NISO distinguishes between three types of metadata: descriptive, structural and administrative.
Descriptive metadata is the information used to search and locate an object, such as title, author, subjects, keywords and publisher; structural metadata gives a description of how the components of the object are organised; and administrative metadata refers to the technical information, including file type. Two sub-types of administrative metadata are rights management metadata and preservation metadata.
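The NISO grouping can be pictured as a single record split into three parts. The sketch below is only an illustration: the record, its field names and all of its values are invented, and NISO does not prescribe this layout.

```python
# Hypothetical record for a digitised book; all values are invented.
record = {
    "descriptive": {
        "title": "An Example Book",
        "author": "A. Writer",
        "keywords": ["example", "sample"],
    },
    "structural": {
        # How the components of the object are organised.
        "page_order": ["page-001.tif", "page-002.tif"],
    },
    "administrative": {
        "file_type": "image/tiff",
        "rights": {"holder": "Example Library"},            # rights management
        "preservation": {"checksum_algorithm": "SHA-256"},  # preservation
    },
}


def find_by_keyword(records, keyword):
    """Search and location rely only on the descriptive part of each record."""
    return [r for r in records if keyword in r["descriptive"]["keywords"]]


matches = find_by_keyword([record], "example")
print(len(matches))
```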
Metadata structures
Metadata syntax
Metadata (metacontent) syntax refers to the rules created to structure the fields or elements of metadata (metacontent). A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, HTML, XML and RDF.
A common example of (guide) metacontent is the bibliographic classification, the subject, the Dewey Decimal class number. There is always an implied statement in any "classification" of some object. To classify an object as, for example, Dewey class number 514 (Topology) (e.g. a book has this number on the spine), the implied statement is: "<book><subject heading><514>". This is a subject-predicate-object triple, or more importantly, a class-attribute-value triple. The first two elements of the triple (class, attribute) are pieces of some structural metadata having a defined semantic. The third element is a value, preferably from some controlled vocabulary, some reference (master) data. The combination of the metadata and master data elements results in a statement which is a metacontent statement, i.e. "metacontent = metadata + master data". All these elements can be thought of as "vocabulary". Both metadata and master data are vocabularies which can be assembled into metacontent statements. There are many sources of these vocabularies, both meta and master data: UML, EDIFACT, XSD, Dewey/UDC/LoC, SKOS, ISO-25964, Pantone, Linnaean Binomial Nomenclature etc. Using controlled vocabularies for the components of metacontent statements, whether for indexing or finding, is endorsed by ISO-25964: "If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved." This is particularly relevant when considering that internet search engines such as Google largely index and then match text strings, with little intelligence or "inferencing" occurring.
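The relation "metacontent = metadata + master data" can be sketched in a few lines of Python. This is an illustration under stated assumptions: SCHEMA, DEWEY_CLASSES, Statement and make_statement are hypothetical names, and the controlled vocabulary holds only the single Dewey class mentioned in the text.

```python
from typing import NamedTuple

# Structural metadata: the attributes that a class of object may carry.
SCHEMA = {"book": {"subject heading"}}

# Master (reference) data: a tiny controlled vocabulary of Dewey classes.
# Only the 514 entry comes from the text; a real vocabulary is far larger.
DEWEY_CLASSES = {"514": "Topology"}


class Statement(NamedTuple):
    """A class-attribute-value triple, i.e. one metacontent statement."""
    cls: str
    attribute: str
    value: str


def make_statement(cls, attribute, value):
    # The class and attribute must be defined by the structural metadata ...
    if attribute not in SCHEMA.get(cls, set()):
        raise ValueError(f"{cls!r} has no attribute {attribute!r}")
    # ... and the value must come from the controlled vocabulary.
    if value not in DEWEY_CLASSES:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return Statement(cls, attribute, value)


statement = make_statement("book", "subject heading", "514")
print(statement, DEWEY_CLASSES[statement.value])
```

Rejecting values outside the vocabulary is what guides indexer and searcher towards the same term for the same concept.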
Hierarchical, linear and planar schemata
Metadata schemas can be hierarchical in nature, where relationships exist between metadata elements and elements are nested so that parent-child relationships exist between them. An example of a hierarchical metadata schema is the IEEE LOM schema, where metadata elements may belong to a parent metadata element. Metadata schemas can also be one-dimensional, or linear, where each element is completely discrete from other elements and classified according to one dimension only. An example of a linear metadata schema is the Dublin Core schema. Metadata schemas are often two-dimensional, or planar, where each element is completely discrete from other elements but classified according to two orthogonal dimensions.
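The three shapes can be contrasted with plain Python data structures. This is a simplified sketch: the values are invented, the hierarchical element names only loosely follow IEEE LOM categories, and the two dimensions of the planar example are hypothetical.

```python
# Linear: a flat set of discrete elements (Dublin Core element names,
# with values invented for illustration).
linear = {"title": "An Example Book", "creator": "A. Writer", "date": "2012"}

# Hierarchical: elements nested under parent elements.
hierarchical = {
    "general": {"title": "An Example Lesson", "language": "en"},
    "lifeCycle": {"version": "1.0"},
}

# Planar: each element is classified along two orthogonal dimensions,
# modelled here as a (kind, level) key.
planar = {
    ("descriptive", "item"): "An Example Book",
    ("descriptive", "collection"): "Example Collection",
    ("administrative", "item"): "image/tiff",
}


def paths(schema, prefix=()):
    """List the path to every leaf element of a (possibly nested) schema."""
    found = []
    for key, value in schema.items():
        if isinstance(value, dict):
            found.extend(paths(value, prefix + (key,)))
        else:
            found.append(prefix + (key,))
    return found


print(paths(linear))        # every path has length one
print(paths(hierarchical))  # paths include the parent element
```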
Metadata hypermapping
In all cases where the metadata schemata exceed the planar depiction, some type of hypermapping is required to enable the display and viewing of metadata according to a chosen aspect and to serve special views. Hypermapping frequently applies to the layering of geographical and geological information overlays.
Granularity
Granularity is a term that applies to data as well as to metadata. The degree to which metadata is structured is referred to as its granularity. Metadata with high granularity allows for deeper structured information and enables greater levels of technical manipulation; however, a lower level of granularity means that metadata can be created at considerably lower cost but will not provide information that is as detailed. The major impact of granularity is not only on creation and capture but also on maintenance. As soon as the metadata structures become outdated, access to the referred data becomes outdated as well. Hence the choice of granularity should take into account the effort to create the metadata as well as the effort to maintain it.
Metadata standards
International standards apply to metadata. Much work is being accomplished in the national and international standards communities, especially ANSI (American National Standards Institute) and ISO (International Organization for Standardization), to reach consensus on standardizing metadata and registries.
The core standard is ISO/IEC 11179-1:2004 [11] and subsequent standards (see ISO/IEC 11179). All registrations published so far according to this standard cover just the definition of metadata and serve neither the structuring of metadata storage or retrieval nor any administrative standardisation. It is important to note that this standard refers to metadata as data about containers of data and not to metadata (metacontent) as data about data contents. It should also be noted that this standard originally describes itself as a "data element" registry, describing disembodied data elements, and explicitly disavows the capability of containing complex structures. Thus the original term "data element" is more applicable than the later applied buzzword "metadata".
Metadata usage
Data virtualization has emerged as a new software technology to complete the virtualization stack in the enterprise. Metadata is used in data virtualization servers, which are enterprise infrastructure components alongside database and application servers. Metadata in these servers is saved in a persistent repository and describes business objects in various enterprise systems and applications.
Statistics and census services
Standardisation work has had a large impact on efforts to build metadata systems in the statistical community. Several metadata standards are described, and their importance to statistical agencies is discussed. Applications of the standards at the Census Bureau, Environmental Protection Agency, Bureau of Labor Statistics, Statistics Canada, and many others are described. Emphasis is on the impact a metadata registry can have in a statistical agency.
Library and information science
Libraries employ metadata in library catalogues, most commonly as part of an Integrated Library Management System (ILMS). Metadata is obtained by cataloguing resources such as books, periodicals, DVDs, web pages or digital images. This data is stored in the ILMS using the MARC metadata standard. The purpose is to direct patrons to the physical or electronic location of the items or areas they seek, as well as to provide a description of the items in question.
More recent and specialised instances of library metadata include the establishment of digital libraries, including e-print repositories and digital image libraries. While often based on library principles, their focus on non-librarian use, especially in providing metadata, means they do not follow traditional or common cataloguing approaches. Given the custom nature of the included materials, metadata fields are often specially created, e.g. taxonomic classification fields, location fields, keywords or copyright statements. Standard file information such as file size and format is usually included automatically.